N-Gram-Based Text Compression
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F16%3A86099098" target="_blank" >RIV/61989100:27240/16:86099098 - isvavai.cz</a>
Result on the web
<a href="http://downloads.hindawi.com/journals/cin/2016/9483646.pdf" target="_blank" >http://downloads.hindawi.com/journals/cin/2016/9483646.pdf</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1155/2016/9483646" target="_blank" >10.1155/2016/9483646</a>
Alternative languages
Result language
English
Title in the original language
N-Gram-Based Text Compression
Description in the original language
We propose an efficient method for compressing Vietnamese text using n-gram dictionaries. It achieves a significantly better compression ratio than state-of-the-art methods on the same dataset. Given a text, the proposed method first splits it into n-grams and then encodes them based on n-gram dictionaries. In the encoding phase, we use a sliding window whose size ranges from bigrams to five-grams to obtain the best encoding stream. Each n-gram is encoded in two to four bytes, depending on its corresponding n-gram dictionary. We collected a 2.5 GB text corpus from several Vietnamese news agencies to build n-gram dictionaries from unigrams to five-grams, obtaining dictionaries of 12 GB in total. To evaluate our method, we assembled a test set of 10 text files of different sizes. The experimental results indicate that our method achieves a compression ratio of around 90% and outperforms state-of-the-art methods. (C) 2016 Vu H. Nguyen et al.
Title in English
N-Gram-Based Text Compression
Description in English
We propose an efficient method for compressing Vietnamese text using n-gram dictionaries. It achieves a significantly better compression ratio than state-of-the-art methods on the same dataset. Given a text, the proposed method first splits it into n-grams and then encodes them based on n-gram dictionaries. In the encoding phase, we use a sliding window whose size ranges from bigrams to five-grams to obtain the best encoding stream. Each n-gram is encoded in two to four bytes, depending on its corresponding n-gram dictionary. We collected a 2.5 GB text corpus from several Vietnamese news agencies to build n-gram dictionaries from unigrams to five-grams, obtaining dictionaries of 12 GB in total. To evaluate our method, we assembled a test set of 10 text files of different sizes. The experimental results indicate that our method achieves a compression ratio of around 90% and outperforms state-of-the-art methods. (C) 2016 Vu H. Nguyen et al.
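The sliding-window encoding described in the abstract can be sketched as follows. This is an illustrative reconstruction only, not the authors' implementation: the dictionary layout, the `(n, index)` code pairs, and the literal fallback for out-of-dictionary words are assumptions made for the example; in the paper each code is packed into two to four bytes depending on the dictionary it comes from.

```python
# Hypothetical sketch of greedy sliding-window n-gram encoding with
# per-length dictionaries, as outlined in the abstract. The data
# structures and fallback handling are illustrative assumptions.

def build_codebooks(ngram_dicts):
    """Map each n-gram (a tuple of words) to its index in its dictionary,
    keyed by n-gram length."""
    return {n: {g: i for i, g in enumerate(d)} for n, d in ngram_dicts.items()}

def encode(words, codebooks, max_n=5):
    """Greedily encode a word sequence, always preferring the longest
    n-gram (up to max_n) found in the corresponding dictionary.
    Emits (n, index) pairs; (0, word) is a literal fallback for
    words missing from every dictionary."""
    out = []
    i = 0
    while i < len(words):
        for n in range(min(max_n, len(words) - i), 0, -1):
            gram = tuple(words[i:i + n])
            if n in codebooks and gram in codebooks[n]:
                out.append((n, codebooks[n][gram]))
                i += n
                break
        else:
            out.append((0, words[i]))  # unseen word: store it literally
            i += 1
    return out
```

For example, with a toy bigram dictionary containing `("xin", "chào")`, the sequence `["xin", "chào", "xin"]` encodes as one bigram code followed by one unigram code, showing why longer matches shrink the output stream.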
Classification
Type
J<sub>x</sub> - Unclassified - Article in a professional journal (Jimp, Jsc and Jost)
CEP field
IN - Informatics
OECD FORD field
—
Result continuities
Project
—
Continuities
S - Specific research at universities
Others
Year of implementation
2016
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Journal name
Computational Intelligence and Neuroscience
ISSN
1687-5265
e-ISSN
—
Journal volume
2016
Journal issue within the volume
2016
Country of the journal publisher
US - United States of America
Number of pages of the result
11
Pages from-to
1-11
Article UT WoS code
000388857100001
EID of the result in the Scopus database
2-s2.0-84999683585