A syllable-based method for Vietnamese text compression
The result's identifiers
Result code in IS VaVaI
RIV/61989100:27240/16:86099063 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F16%3A86099063)
Result on the web
http://dx.doi.org/10.1145/2857546.2857564
DOI - Digital Object Identifier
10.1145/2857546.2857564
Alternative languages
Result language
English
Original language name
A syllable-based method for Vietnamese text compression
Original language description
Text compression reduces the size of a text file, which increases the transfer rate and saves storage space. Many approaches have been proposed for this problem in several languages, such as English, Chinese, Turkish, Japanese, and French. In this paper, we propose a method for compressing Vietnamese text using syllables, based on morphology and dictionaries. Our method first splits a morphosyllable into a consonant and a syllable, then encodes it using dictionaries of consonants and syllables. Based on the six tone marks of Vietnamese, we build six different dictionaries of syllables. We collect a test set of 20 text files of different sizes to demonstrate our system. Experimental results show that our system achieves good performance, with a compression ratio of around 73%. Compared with WinZIP version 19.51 and WinRAR version 5.212, our method achieves a higher compression ratio when the text file is small. Our method can therefore be applied efficiently to compress short texts such as SMS messages and text messages on social networks. (C) 2016 ACM.
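The splitting and dictionary steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the splitting rules, dictionary contents, or code layout, so the list of initial consonants, the function names (`split_syllable`, `build_dictionaries`, `encode`), and the tuple output are all assumptions made for illustration. The sketch treats an unmarked syllable as the sixth tone class, lowercases its input, and handles ambiguous initials such as "gi" and "qu" only naively.

```python
import unicodedata

# Combining tone marks after NFD decomposition; class 0 means "no mark".
# (Assumed numbering, chosen only for this sketch.)
TONE_MARKS = {
    "\u0300": 1,  # grave
    "\u0301": 2,  # acute
    "\u0309": 3,  # hook above
    "\u0303": 4,  # tilde
    "\u0323": 5,  # dot below
}

# Hypothetical list of initial consonants, longest first so "ngh" wins over "ng".
INITIALS = sorted(
    ["ngh", "ng", "nh", "ch", "gh", "gi", "kh", "ph", "qu", "th", "tr",
     "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n", "p", "r", "s",
     "t", "v", "x"],
    key=len, reverse=True)


def split_syllable(word):
    """Return (initial consonant, toneless remainder, tone class 0-5)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    tone = 0
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]   # remember the tone, drop the mark
        else:
            kept.append(ch)         # keep letters and vowel diacritics
    toneless = unicodedata.normalize("NFC", "".join(kept))
    for ini in INITIALS:
        # Require a non-empty remainder after the initial consonant.
        if toneless.startswith(ini) and len(toneless) > len(ini):
            return ini, toneless[len(ini):], tone
    return "", toneless, tone


def build_dictionaries(words):
    """Build one consonant dictionary and six remainder dictionaries (one per tone)."""
    initials = {}
    rhymes = [dict() for _ in range(6)]
    for w in words:
        ini, rhyme, tone = split_syllable(w)
        initials.setdefault(ini, len(initials))
        rhymes[tone].setdefault(rhyme, len(rhymes[tone]))
    return initials, rhymes


def encode(word, initials, rhymes):
    """Encode a syllable as (consonant index, tone class, remainder index)."""
    ini, rhyme, tone = split_syllable(word)
    return initials[ini], tone, rhymes[tone][rhyme]


sample = ["tiếng", "việt", "tiền"]
initials, rhymes = build_dictionaries(sample)
code = encode("việt", initials, rhymes)
```

In a real compressor the three indices would be packed into a fixed or variable number of bits; the bit layout is left out here because the abstract does not specify one.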
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
—
Continuities
S - Specific university research
Others
Publication year
2016
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
ACM IMCOM 2016: Proceedings of the 10th International Conference on Ubiquitous Information Management and Communication
ISBN
978-1-4503-4142-4
ISSN
—
e-ISSN
—
Number of pages
6
Pages from-to
1-6
Publisher name
Association for Computing Machinery
Place of publication
New York
Event location
Danang
Event date
Jan 4, 2016
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—