Improvement of text compression using subset of words
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F14%3A86092523" target="_blank" >RIV/61989100:27240/14:86092523 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1166/asl.2014.5282" target="_blank" >http://dx.doi.org/10.1166/asl.2014.5282</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1166/asl.2014.5282" target="_blank" >10.1166/asl.2014.5282</a>
Alternative languages
Result language
English
Title in the original language
Improvement of text compression using subset of words
Description of the result in the original language
This paper describes a novel approach to text compression based on a combination of the character-based and word-based approaches. The new approach uses a subset of words to improve text compression. The number of words used in the algorithm is based on the size and the content of the compressed texts. The ideal number of words with respect to the compression algorithm used and the compressed data is also investigated in this paper. Several source files are evaluated, and different numbers of words are combined with the characters to achieve better compression. Moreover, three different compression algorithms are evaluated. The experiments investigate the effect of combining words with characters on different text files from the standard compression corpora and on different compression algorithms. The results show that these combinations are always better than the pure word or the pure character approach. Moreover, a few ideas about necessary numbers of words for
Title in English
Improvement of text compression using subset of words
Description of the result in English
This paper describes a novel approach to text compression based on a combination of the character-based and word-based approaches. The new approach uses a subset of words to improve text compression. The number of words used in the algorithm is based on the size and the content of the compressed texts. The ideal number of words with respect to the compression algorithm used and the compressed data is also investigated in this paper. Several source files are evaluated, and different numbers of words are combined with the characters to achieve better compression. Moreover, three different compression algorithms are evaluated. The experiments investigate the effect of combining words with characters on different text files from the standard compression corpora and on different compression algorithms. The results show that these combinations are always better than the pure word or the pure character approach. Moreover, a few ideas about necessary numbers of words for
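As a rough illustration of the mixed character/word idea described in the abstract, the Python sketch below splits a text into symbols so that a chosen subset of words is kept whole and everything else is emitted character by character. The function name `mixed_tokens`, the choice of the `k` most frequent words as the subset, and the regex definition of a word are assumptions made for this example only; the record does not say how the paper selects its subset or which three compression algorithms it evaluates.

```python
import re
from collections import Counter


def mixed_tokens(text, k):
    """Split text into a mixed symbol stream.

    Assumption for illustration: the word subset is the k most frequent
    alphabetic words in the text. Words in the subset become single
    symbols; all other text is emitted as individual characters.
    """
    words = re.findall(r"[A-Za-z]+", text)
    subset = {w for w, _ in Counter(words).most_common(k)}

    tokens = []
    # Each piece is either a whole run of letters or one non-letter character.
    for piece in re.findall(r"[A-Za-z]+|[^A-Za-z]", text):
        if piece in subset:
            tokens.append(piece)   # word kept as one symbol
        else:
            tokens.extend(piece)   # fall back to single characters
    return tokens, subset


sample = "the cat and the dog"
tokens, subset = mixed_tokens(sample, k=1)
print(subset)
print(tokens)
```

With `k=0` the stream degenerates to the pure character approach, and with a large enough `k` every word becomes its own symbol, so `k` is the knob that corresponds to the "number of words" discussed in the abstract. The resulting symbol stream would then be handed to whichever back-end compressor is being studied.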
Classification
Type
J<sub>x</sub> - Unclassified - Article in a specialist periodical (Jimp, Jsc and Jost)
CEP field
IN - Informatics
OECD FORD field
—
Result linkages
Project
<a href="/cs/project/GPP202%2F11%2FP142" target="_blank" >GPP202/11/P142: Optimization and parallelization of compression methods</a><br>
Linkages
P - Research and development project financed from public funds (with a link to CEP)
Other
Year of application
2014
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Name of the periodical
Advanced Science Letters
ISSN
1936-6612
e-ISSN
—
Volume of the periodical
20
Issue of the periodical within the volume
1
Country of the periodical's publisher
US - United States of America
Number of pages of the result
5
Pages from-to
312-316
UT WoS code of the article
—
EID of the result in the Scopus database
—