All

What are you looking for?

All
Projects
Results
Organizations

Quick search

  • Projects supported by TA ČR
  • Excellent projects
  • Projects with the highest public support
  • Current projects

Smart search

  • That is how I find a specific +word
  • That is how I leave the -word out of the results
  • “That is how I can find the whole phrase”

Suffix Array for Large Alphabet

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F08%3A00101220" target="_blank" >RIV/00216208:11320/08:00101220 - isvavai.cz</a>

  • Result on the web

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    angličtina

  • Original language name

    Suffix Array for Large Alphabet

  • Original language description

    Burrows-Wheeler Transform (BWT) is used as the main part in block compression which has a good balance of speed and compression ratio. Suffix arrays are used in the coding phase of BWT and we focus on creating them for an alphabet larger than 256 symbols. The motivation for this work has been software project XBW - an application for compression of large XML files. The role of BWT is to reorder input before applying other algorithms. We describe and implement three families of algorithms for encoding. The first is inspired by the work of Sadakane and further improved by Larsson. The second family includes algorithm by Seward and algorithm by Itoh further improved by Kao. Finally we present algorithm by Karkkainen and Sanders for constructing su+-x arrays in linear time. As our main result we show that for textual data using syllables or words as symbols of alphabet improves both run time and compression ratio of block compression.

  • Czech name

    Suffixové pole pro velkou abecedu

  • Czech description

    Burrows-Wheelerova transformace (BWT) se používá jako hlavní část blokové komprese, díky dobrému kompresnímu poměru a přijatelné rychlosti komprese. Pro BWT se využívá struktury suffixového pole, my jsme se zaměřili na abecedy větší než 256 symbolů. Motivací pro tento článek byl projekt XBW, jehož hlavní částí je BWT. Popsali jsme, implementovali a změřili různé rodiny algoritmů pro tvorbu suffixového pole. Hlavním výsledkem bylo, že použití slabikových nebo slovních metod pro kompresi textu pomocí BWTvylepší nejen kompresní poměr ale i čas komprese.

Classification

  • Type

    D - Article in proceedings

  • CEP classification

    JC - Computer hardware and software

  • OECD FORD branch

Result continuities

  • Project

    Result was created during the realization of more than one project. More information in the Projects tab.

  • Continuities

    P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)<br>Z - Vyzkumny zamer (s odkazem do CEZ)

Others

  • Publication year

    2008

  • Confidentiality

    S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Data specific for result type

  • Article name in the collection

    2008 Data Compression Conference (DCC 2008)

  • ISBN

    0-7695-3121-0

  • ISSN

  • e-ISSN

  • Number of pages

    1

  • Pages from-to

  • Publisher name

    IEEE Computer Society Press

  • Place of publication

    Snowbird, Utah, USA

  • Event location

    Snowbird, Utah, USA

  • Event date

    Jan 1, 2008

  • Type of event by nationality

    WRD - Celosvětová akce

  • UT code for WoS article

    000255196800093