Streaming algorithms for embedding and computing edit distance in the low distance regime
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F16%3A10331498" target="_blank" >RIV/00216208:11320/16:10331498 - isvavai.cz</a>
Výsledek na webu
<a href="http://dx.doi.org/10.1145/2897518.2897577" target="_blank" >http://dx.doi.org/10.1145/2897518.2897577</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1145/2897518.2897577" target="_blank" >10.1145/2897518.2897577</a>
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
Streaming algorithms for embedding and computing edit distance in the low distance regime
Popis výsledku v původním jazyce
The Hamming and the edit metrics are two common notions of measuring distances between pairs of strings x,y lying in the Boolean hypercube. The edit distance between x and y is defined as the minimum number of character insertion, deletion, and bit flips needed for converting x into y. Whereas, the Hamming distance between x and y is the number of bit flips needed for converting x to y. In this paper we study a randomized injective embedding of the edit distance into the Hamming distance with a small distortion. We show a randomized embedding with quadratic distortion. Namely, for any x,y satisfying that their edit distance equals k, the Hamming distance between the embedding of x and y is O(k2) with high probability. This improves over the distortion ratio of O( n * n) obtained by Jowhari (2012) for small values of k. Moreover, the embedding output size is linear in the input size and the embedding can be computed using a single pass over the input. We provide several applications for this embedding. Among our results we provide a one-pass (streaming) algorithm for edit distance running in space O(s) and computing edit distance exactly up-to distance s1/6. This algorithm is based on kernelization for edit distance that is of independent interest.
Název v anglickém jazyce
Streaming algorithms for embedding and computing edit distance in the low distance regime
Popis výsledku anglicky
The Hamming and the edit metrics are two common notions of measuring distances between pairs of strings x,y lying in the Boolean hypercube. The edit distance between x and y is defined as the minimum number of character insertion, deletion, and bit flips needed for converting x into y. Whereas, the Hamming distance between x and y is the number of bit flips needed for converting x to y. In this paper we study a randomized injective embedding of the edit distance into the Hamming distance with a small distortion. We show a randomized embedding with quadratic distortion. Namely, for any x,y satisfying that their edit distance equals k, the Hamming distance between the embedding of x and y is O(k2) with high probability. This improves over the distortion ratio of O( n * n) obtained by Jowhari (2012) for small values of k. Moreover, the embedding output size is linear in the input size and the embedding can be computed using a single pass over the input. We provide several applications for this embedding. Among our results we provide a one-pass (streaming) algorithm for edit distance running in space O(s) and computing edit distance exactly up-to distance s1/6. This algorithm is based on kernelization for edit distance that is of independent interest.
Klasifikace
Druh
D - Stať ve sborníku
CEP obor
IN - Informatika
OECD FORD obor
—
Návaznosti výsledku
Projekt
<a href="/cs/project/GA14-10003S" target="_blank" >GA14-10003S: Omezené typy výpočtů: algoritmy, modely, složitost</a><br>
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)
Ostatní
Rok uplatnění
2016
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Údaje specifické pro druh výsledku
Název statě ve sborníku
Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016
ISBN
978-1-4503-4132-5
ISSN
—
e-ISSN
—
Počet stran výsledku
14
Strana od-do
712-725
Název nakladatele
ACM 2016
Místo vydání
New York, NY, USA
Místo konání akce
Cambridge, MA, USA
Datum konání akce
18. 6. 2016
Typ akce podle státní příslušnosti
WRD - Celosvětová akce
Kód UT WoS článku
—