A syllable-based method for Vietnamese text compression

The result's identifiers

  • Result code in IS VaVaI

    RIV/61989100:27240/16:86099063 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F16%3A86099063)

  • Result on the web

    http://dx.doi.org/10.1145/2857546.2857564

  • DOI - Digital Object Identifier

    10.1145/2857546.2857564

Alternative languages

  • Result language

    English

  • Original language name

    A syllable-based method for Vietnamese text compression

  • Original language description

    Text compression is a technique that reduces the size of a text file, which increases the transfer rate and saves storage space. Many approaches have been proposed to tackle this problem in several languages, such as English, Chinese, Turkish, Japanese, and French. In this paper, we propose a method to compress Vietnamese text using syllables, based on morphology and dictionaries. Our method first splits a morphosyllable into a consonant and a syllable, then encodes it based on dictionaries of consonants and syllables. Because the Vietnamese language has six tone marks, we build six different dictionaries of syllables. We collected a test set of 20 text files of different sizes to demonstrate our system. Experimental results show that our system achieves good performance, with a compression ratio of around 73%. In comparison with WinZIP version 19.51 and WinRAR version 5.212, our method achieves a higher compression ratio when the text file is small. Our method can therefore be applied efficiently to compress short texts such as SMS messages and text messages on social networks. (C) 2016 ACM.

  • Czech name

  • Czech description
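The abstract above describes splitting each morphosyllable into a consonant and a syllable, then encoding both through dictionaries, with one syllable dictionary per tone. The following is only a rough sketch of that idea, not the authors' implementation: the abstract does not give the dictionary layout or the bit-level encoding, so the names `split_syllable` and `SyllableCodec`, the list of initial consonants, and the `(consonant index, tone, syllable index)` code format are all assumptions made for illustration. It handles lowercase syllables only and raises `KeyError` for anything not seen when the dictionaries were built.

```python
import unicodedata

# Combining marks for the five marked tones; tone 0 is the unmarked (level) tone.
TONE_MARKS = {
    "\u0300": 1,  # grave
    "\u0301": 2,  # acute
    "\u0309": 3,  # hook above
    "\u0303": 4,  # tilde
    "\u0323": 5,  # dot below
}

# Assumed list of initial consonants, longest first so "ngh" wins over "ng" and "n".
INITIALS = sorted(
    ["ngh", "ch", "gh", "kh", "ng", "nh", "ph", "th", "tr",
     "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n",
     "p", "q", "r", "s", "t", "v", "x"],
    key=len, reverse=True)


def split_syllable(syllable):
    """Return (initial consonant, remaining syllable part, tone index 0-5)."""
    text = unicodedata.normalize("NFC", syllable.lower())
    tone = 0
    # Decompose to expose the tone mark as a separate combining character.
    for ch in unicodedata.normalize("NFD", text):
        tone = TONE_MARKS.get(ch, tone)
    for initial in INITIALS:
        if text.startswith(initial):
            return initial, text[len(initial):], tone
    return "", text, tone  # syllable starts with a vowel


class SyllableCodec:
    """Hypothetical codec: one consonant dictionary plus six per-tone dictionaries."""

    def __init__(self, training_syllables):
        self.consonants = []
        self.rhymes = [[] for _ in range(6)]
        self._c_index = {}
        self._r_index = [{} for _ in range(6)]
        for s in training_syllables:
            initial, rhyme, tone = split_syllable(s)
            if initial not in self._c_index:
                self._c_index[initial] = len(self.consonants)
                self.consonants.append(initial)
            if rhyme not in self._r_index[tone]:
                self._r_index[tone][rhyme] = len(self.rhymes[tone])
                self.rhymes[tone].append(rhyme)

    def encode(self, syllable):
        initial, rhyme, tone = split_syllable(syllable)
        return self._c_index[initial], tone, self._r_index[tone][rhyme]

    def decode(self, code):
        c, tone, r = code
        return self.consonants[c] + self.rhymes[tone][r]


codec = SyllableCodec("tiếng việt nghĩa anh".split())
codes = [codec.encode(s) for s in "tiếng việt".split()]
restored = " ".join(codec.decode(c) for c in codes)
```

Splitting the syllable dictionary by tone keeps each individual dictionary smaller, so an index into it could be stored in fewer bits; how the paper actually packs these indices is not stated in the abstract.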

Classification

  • Type

    D - Article in proceedings

  • CEP classification

    IN - Informatics

  • OECD FORD branch

Result continuities

  • Project

  • Continuities

    S - Specific university research

Others

  • Publication year

    2016

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    ACM IMCOM 2016: Proceedings of the 10th International Conference on Ubiquitous Information Management and Communication

  • ISBN

    978-1-4503-4142-4

  • ISSN

  • e-ISSN

  • Number of pages

    6

  • Pages from-to

    1-6

  • Publisher name

    Association for Computing Machinery

  • Place of publication

    New York

  • Event location

    Danang

  • Event date

    Jan 4, 2016

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article