Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F18%3A00101119" target="_blank" >RIV/00216224:14330/18:00101119 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1007/978-3-662-58384-5_3" target="_blank" >http://dx.doi.org/10.1007/978-3-662-58384-5_3</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-662-58384-5_3" target="_blank" >10.1007/978-3-662-58384-5_3</a>
Alternative languages
Result language
English
Title in original language
Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries
Result description in original language
The current era of digital data explosion calls for content-based similarity search techniques, since traditional searchable metadata such as annotations are not always available. In our work, we focus on a scenario where similarity search is used in the context of stream processing, which is one of the suitable approaches for dealing with huge amounts of data. Our goal is to maximize the throughput of processed queries when a slight delay is acceptable. We propose a technique that dynamically reorders the queries coming from the stream in order to use our caching mechanism in huge data spaces more effectively. We achieved significantly higher throughput than a baseline that uses neither reordering nor caching. Moreover, unlike some other caching techniques, our proposal does not incur any additional precision loss in the similarity search. In addition to throughput maximization, we also study the potential of trading off throughput for low delays (waiting times). The proposed technique can be parameterized by the amount of throughput that can be sacrificed.
Title in English
Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries
Result description in English
The current era of digital data explosion calls for content-based similarity search techniques, since traditional searchable metadata such as annotations are not always available. In our work, we focus on a scenario where similarity search is used in the context of stream processing, which is one of the suitable approaches for dealing with huge amounts of data. Our goal is to maximize the throughput of processed queries when a slight delay is acceptable. We propose a technique that dynamically reorders the queries coming from the stream in order to use our caching mechanism in huge data spaces more effectively. We achieved significantly higher throughput than a baseline that uses neither reordering nor caching. Moreover, unlike some other caching techniques, our proposal does not incur any additional precision loss in the similarity search. In addition to throughput maximization, we also study the potential of trading off throughput for low delays (waiting times). The proposed technique can be parameterized by the amount of throughput that can be sacrificed.
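The idea of reordering buffered queries so that a cache is reused more often can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the fixed-size window, the LRU cache, and the names `LRUCache`, `reorder_stream`, `count_misses`, and `partition_of` are all assumptions made for illustration. Queries are buffered in windows and grouped by the data partition they touch. The answers are unchanged and only the processing order differs.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Iterable, Iterator, List


class LRUCache:
    """Tiny LRU cache that counts how many loads (misses) it performs."""

    def __init__(self, capacity: int, load: Callable[[Hashable], object]):
        self.capacity = capacity
        self.load = load
        self.items: "OrderedDict[Hashable, object]" = OrderedDict()
        self.misses = 0

    def get(self, key: Hashable) -> object:
        if key in self.items:
            self.items.move_to_end(key)      # mark as most recently used
            return self.items[key]
        self.misses += 1
        value = self.load(key)               # stands in for an expensive read
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used entry
        return value


def reorder_stream(queries: Iterable, partition_of: Callable, window: int) -> Iterator:
    """Buffer up to `window` queries, then emit them grouped by partition.

    `sorted` is stable, so queries hitting the same partition keep their
    arrival order. Partition keys are assumed to be mutually comparable.
    """
    buffer: List = []
    for q in queries:
        buffer.append(q)
        if len(buffer) == window:
            yield from sorted(buffer, key=partition_of)
            buffer.clear()
    yield from sorted(buffer, key=partition_of)  # flush the last partial window


def count_misses(queries: Iterable, partition_of: Callable, capacity: int) -> int:
    """Process queries in the given order and return the number of cache misses."""
    cache = LRUCache(capacity, load=lambda key: f"data-{key}")
    for q in queries:
        cache.get(partition_of(q))
    return cache.misses


# Hypothetical stream: each query is identified by the partition it needs.
stream = [0, 1, 2, 0, 1, 2, 0, 1, 2]
same = lambda q: q

print(count_misses(stream, same, capacity=1))                           # 9
print(count_misses(reorder_stream(stream, same, 9), same, capacity=1))  # 3
```

In this toy setting the window size plays the role of the delay parameter: a larger window groups more queries per partition and lowers the miss count, but each query may wait longer before it is processed.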
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result links
Project
<a href="/cs/project/GA16-18889S" target="_blank" >GA16-18889S: Analytika pro velká nestrukturovaná data</a>
Links
P - Research and development project financed from public funds (with a link to CEP)
Other
Publication year
2018
Data confidentiality code
S - Complete and truthful project data are not subject to protection under special legal regulations
Data specific to the result type
Article title in the proceedings
Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII
ISBN
9783662583838
ISSN
0302-9743
e-ISSN
—
Number of pages
28
Pages from-to
61-88
Publisher name
Springer
Place of publication
Berlin, Heidelberg
Event location
Berlin, Heidelberg
Event date
1. 1. 2018
Event type by nationality
CST - National event
UT WoS article code
—