Compression of a Set of Files with Natural Language Content

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21240%2F15%3A00218065" target="_blank" >RIV/68407700:21240/15:00218065 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1093/comjnl/bxu046" target="_blank" >http://dx.doi.org/10.1093/comjnl/bxu046</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1093/comjnl/bxu046" target="_blank" >10.1093/comjnl/bxu046</a>

Alternative languages

Result language
English
Title in the original language
Compression of a Set of Files with Natural Language Content
Result description in the original language
An algorithm for very efficient compression of a set of natural language text files is presented. Not only is a very good compression ratio reached, but the compression method also allows fast pattern matching in the compressed text, which is an attractive property especially for search engines. Much information is stored in the form of large collections of text files. Web search engines can store web pages in raw text form to build so-called snippets or to apply so-called positional ranking functions to them. Furthermore, there are many similar contexts, such as the storage of emails, application logs or databases of text files (literary works or technical reports). In this paper, we address the problem of compressing a large collection of text files distributed across a cluster of computers, where single files need to be randomly accessed in a very short time. The compression algorithm is based on a word-based approach and the idea of combination of two statistic
Title in English
Compression of a Set of Files with Natural Language Content
Result description in English
An algorithm for very efficient compression of a set of natural language text files is presented. Not only is a very good compression ratio reached, but the compression method also allows fast pattern matching in the compressed text, which is an attractive property especially for search engines. Much information is stored in the form of large collections of text files. Web search engines can store web pages in raw text form to build so-called snippets or to apply so-called positional ranking functions to them. Furthermore, there are many similar contexts, such as the storage of emails, application logs or databases of text files (literary works or technical reports). In this paper, we address the problem of compressing a large collection of text files distributed across a cluster of computers, where single files need to be randomly accessed in a very short time. The compression algorithm is based on a word-based approach and the idea of combination of two statistic

Classification

Type
J<sub>x</sub> - Unclassified - Article in a specialist periodical (Jimp, Jsc and Jost)
CEP field
IN - Informatics
OECD FORD field
—

Result continuities

Project
<a href="/cs/project/GA13-03253S" target="_blank" >GA13-03253S: Processing of text and tree structures and their applications</a><br>
Continuities
P - Research and development project financed from public funds (with a link to CEP)

Others

Publication year
2015
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Name of the periodical
Computer Journal
ISSN
0010-4620
e-ISSN
—
Volume of the periodical
58
Issue of the periodical within the volume
5
Country of the publisher
GB - United Kingdom of Great Britain and Northern Ireland
Number of pages
17
Pages from-to
1169-1185
UT code for WoS article
000359139000010
EID of the result in the Scopus database
2-s2.0-84929464949
2-s2.0-84929464949
