Improvement of text compression using subset of words
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F14%3A86092523" target="_blank" >RIV/61989100:27240/14:86092523 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1166/asl.2014.5282" target="_blank" >http://dx.doi.org/10.1166/asl.2014.5282</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1166/asl.2014.5282" target="_blank" >10.1166/asl.2014.5282</a>
Alternative languages
Result language
English
Original language name
Improvement of text compression using subset of words
Original language description
This paper describes a novel approach to text compression that combines the character-based and word-based approaches. The new approach uses a subset of words to improve text compression. The number of words used in the algorithm depends on the size and the content of the compressed texts. The ideal number of words with respect to the compression algorithm used and the compressed data is also investigated. Several source files are evaluated, and different numbers of words are combined with characters to achieve better compression. Moreover, three different compression algorithms are evaluated. The experiments investigate the effect of combining words with characters on different text files from the standard compression corpora and with different compression algorithms. The results show that these combinations are always better than the pure word or the pure character approach. Moreover a few ideas about necessary numbers of words for
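The description above mentions combining a subset of words with characters but does not give the selection rule or name the three compression algorithms. The following is only a minimal sketch of the general idea, assuming the subset is simply the k most frequent words. The function names (`top_words`, `tokenize`, `detokenize`, `entropy_bits`) are hypothetical, and zeroth-order entropy stands in for a real compressor.

```python
import math
import re
from collections import Counter


def top_words(text, k):
    """Return the k most frequent alphabetic words (assumed selection rule)."""
    counts = Counter(re.findall(r"[A-Za-z]+", text))
    return {word for word, _ in counts.most_common(k)}


def tokenize(text, vocab):
    """Build a mixed symbol stream: words in vocab stay whole,
    everything else is emitted one character at a time."""
    symbols = []
    for piece in re.findall(r"[A-Za-z]+|[^A-Za-z]", text):
        if piece in vocab:
            symbols.append(piece)   # one symbol for the whole word
        else:
            symbols.extend(piece)   # fall back to single characters
    return symbols


def detokenize(symbols):
    """Invert tokenize: concatenating the symbols restores the text."""
    return "".join(symbols)


def entropy_bits(symbols):
    """Zeroth-order entropy of the stream in bits. This is a rough proxy
    only: it ignores the cost of storing the word subset itself."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for k in (0, 1, 2, 5):
        stream = tokenize(sample, top_words(sample, k))
        print(k, len(stream), round(entropy_bits(stream), 1))
```

With k = 0 the stream is purely character-based, and with k at least the number of distinct words every word becomes a single symbol. Sweeping k between these extremes is one way to explore the trade-off the description refers to, although a realistic comparison would also have to account for transmitting the chosen words.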
Czech name
—
Czech description
—
Classification
Type
J<sub>x</sub> - Unclassified - Peer-reviewed scientific article (Jimp, Jsc and Jost)
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/GPP202%2F11%2FP142" target="_blank" >GPP202/11/P142: Optimization and parallelization of compression methods</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2014
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Advanced Science Letters
ISSN
1936-6612
e-ISSN
—
Volume of the periodical
20
Issue of the periodical within the volume
1
Country of publishing house
US - UNITED STATES
Number of pages
5
Pages from-to
312-316
UT code for WoS article
—
EID of the result in the Scopus database
—