A syllable-based method for Vietnamese text compression
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F16%3A86099063" target="_blank" >RIV/61989100:27240/16:86099063 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1145/2857546.2857564" target="_blank" >http://dx.doi.org/10.1145/2857546.2857564</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1145/2857546.2857564" target="_blank" >10.1145/2857546.2857564</a>
Alternative languages
Result language
English
Title in original language
A syllable-based method for Vietnamese text compression
Result description in original language
Text compression reduces the size of text files, which increases transfer rates and saves storage space. Many approaches have been proposed for languages such as English, Chinese, Turkish, Japanese, and French. In this paper, we propose a method for compressing Vietnamese text using syllables, based on morphology and dictionaries. Our method first splits each morphosyllable into a consonant and a syllable, then encodes it using dictionaries of consonants and syllables. Because Vietnamese has six tones, we build six different syllable dictionaries. We collected a test set of 20 text files of different sizes to evaluate our system. Experimental results show that our system performs well, with a compression ratio of around 73%. Compared with WinZIP version 19.51 and WinRAR version 5.212, our method achieves a higher compression ratio when the text file is small. Our method can therefore be applied efficiently to compress short texts such as SMS messages and text messages on social networks. (C) 2016 ACM.
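The splitting and dictionary lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the dictionary layout or the bit-level encoding, so the following are all assumptions made for illustration: the onset list, the function names, the (onset, tone, rhyme) code triple, and the choice to leave the tone mark on the stored rhyme. Input is assumed to be lowercase syllables separated by whitespace, with no punctuation handling.

```python
import unicodedata

# Combining marks of the five marked tones (grave, acute, hook above, tilde,
# dot below). Tone index 0 is reserved for the unmarked level tone.
TONE_MARKS = ["\u0300", "\u0301", "\u0309", "\u0303", "\u0323"]

# Assumed list of initial consonants, longest first so "ngh" wins over "ng".
ONSETS = sorted(
    ["b", "c", "ch", "d", "đ", "g", "gh", "gi", "h", "k", "kh", "l", "m", "n",
     "ng", "ngh", "nh", "p", "ph", "qu", "r", "s", "t", "th", "tr", "v", "x"],
    key=len, reverse=True)


def split_morphosyllable(word):
    """Return (onset, rhyme, tone) for one lowercase syllable."""
    word = unicodedata.normalize("NFC", word)
    tone = 0
    # Decompose only to detect which tone mark, if any, is present.
    for ch in unicodedata.normalize("NFD", word):
        if ch in TONE_MARKS:
            tone = TONE_MARKS.index(ch) + 1
    for onset in ONSETS:
        if word.startswith(onset) and len(word) > len(onset):
            return onset, word[len(onset):], tone
    return "", word, tone  # syllable with no initial consonant


def build_dictionaries(syllables):
    """Build one onset dictionary and six rhyme dictionaries (one per tone)."""
    onset_ids = {}
    rhyme_ids = [dict() for _ in range(6)]
    for s in syllables:
        onset, rhyme, tone = split_morphosyllable(s)
        onset_ids.setdefault(onset, len(onset_ids))
        rhyme_ids[tone].setdefault(rhyme, len(rhyme_ids[tone]))
    return onset_ids, rhyme_ids


def encode(text, onset_ids, rhyme_ids):
    """Map each syllable to (onset id, tone, rhyme id); unknown syllables raise KeyError."""
    codes = []
    for s in text.lower().split():
        onset, rhyme, tone = split_morphosyllable(s)
        codes.append((onset_ids[onset], tone, rhyme_ids[tone][rhyme]))
    return codes


def decode(codes, onset_ids, rhyme_ids):
    """Invert encode() by looking the ids back up in the dictionaries."""
    onsets = {i: o for o, i in onset_ids.items()}
    rhymes = [{i: r for r, i in d.items()} for d in rhyme_ids]
    return " ".join(onsets[o] + rhymes[t][r] for o, t, r in codes)
```

In this sketch the tone index only selects which of the six rhyme dictionaries is consulted, so decoding is a plain concatenation of onset and rhyme. A real compressor would additionally pack the ids into a compact bit layout, which is outside what the abstract describes.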
Classification
Type
D - Article in proceedings
CEP field
IN - Informatics
OECD FORD field
—
Result continuities
Project
—
Continuities
S - Specific research at universities
Others
Publication year
2016
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Title of the article in the proceedings
ACM IMCOM 2016: Proceedings of the 10th International Conference on Ubiquitous Information Management and Communication
ISBN
978-1-4503-4142-4
ISSN
—
e-ISSN
—
Number of pages
6
Pages from-to
1-6
Publisher name
Association for Computing Machinery
Place of publication
New York
Event location
Danang
Event date
4 January 2016
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
—