Positional Inverted Self-index

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21240%2F16%3A00241718" target="_blank" >RIV/68407700:21240/16:00241718 - isvavai.cz</a>
Výsledek na webu
—
DOI - Digital Object Identifier
—

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
Positional Inverted Self-index
Popis výsledku v původním jazyce
We address the problem of positional indexing in natural language domain. The positional inverted index contains the information of the word positions. Thus, it is able to recover the original text file, which implies that it is not necessary to store the original file. Our Positional Inverted Self-Index (PISI) stores the word position gaps encoded by variable byte code. The inverted lists of single terms are combined into one inverted list that represents a backbone of the text file since it stores the sequence of the indexed words of the original file. The inverted list is synchronized with presentation layer that stores separators, stopwords, as well as variants of the indexed words. The Huffman coding is used to encode the presentation layer. The trade-off between compression effectiveness and the decompression speed can be tuned by different parameters of the index algorithm. ] x
Název v anglickém jazyce
Positional Inverted Self-index
Popis výsledku anglicky
We address the problem of positional indexing in natural language domain. The positional inverted index contains the information of the word positions. Thus, it is able to recover the original text file, which implies that it is not necessary to store the original file. Our Positional Inverted Self-Index (PISI) stores the word position gaps encoded by variable byte code. The inverted lists of single terms are combined into one inverted list that represents a backbone of the text file since it stores the sequence of the indexed words of the original file. The inverted list is synchronized with presentation layer that stores separators, stopwords, as well as variants of the indexed words. The Huffman coding is used to encode the presentation layer. The trade-off between compression effectiveness and the decompression speed can be tuned by different parameters of the index algorithm. ] x

Klasifikace

Druh
D - Stať ve sborníku
CEP obor
IN - Informatika
OECD FORD obor
—

Návaznosti výsledku

Projekt
<a href="/cs/project/GA13-03253S" target="_blank" >GA13-03253S: Zpracování textových a stromových struktur a jejich aplikace</a><br>
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)

Ostatní

Rok uplatnění
2016
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název statě ve sborníku
Proceedings of Data Compression Conference 2016
ISBN
978-1-5090-1853-6
ISSN
1068-0314
e-ISSN
—
Počet stran výsledku
1
Strana od-do
627-627
Název nakladatele
IEEE Computer Society Press
Místo vydání
New York
Místo konání akce
Snowbird
Datum konání akce
29. 3. 2016
Typ akce podle státní příslušnosti
WRD - Celosvětová akce
Kód UT WoS článku
000391353800106

Podobné výsledky(10)

Towards Efficient Positional Inverted Index Compression of a Set of Files with Natural Language Content Multimedia Index Engine

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Positional Inverted Self-index

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)