Nearest-neighbor Search from Large Datasets using Narrow Sketches
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F22%3A00125541" target="_blank" >RIV/00216224:14330/22:00125541 - isvavai.cz</a>
Result on the web
<a href="https://www.scitepress.org/PublicationsDetail.aspx?ID=s5xL4A2YSOs=&t=1" target="_blank" >https://www.scitepress.org/PublicationsDetail.aspx?ID=s5xL4A2YSOs=&t=1</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.5220/0010817600003122" target="_blank" >10.5220/0010817600003122</a>
Alternative languages
Result language
angličtina
Original language name
Nearest-neighbor Search from Large Datasets using Narrow Sketches
Original language description
We consider the nearest-neighbor search on large-scale high-dimensional datasets that cannot fit in the main memory. Sketches are bit strings that compactly express data points. Although it is usually thought that wide sketches are needed for high-precision searches, we use relatively narrow sketches such as 22-bit or 24-bit, to select a small set of candidates for the search. We use an asymmetric distance between data points and sketches as the criteria for candidate selection, instead of traditionally used Hamming distance. It can be considered a distance partially restoring quantization error. We utilize an efficient one-by-one sketch enumeration in the order of the partially restored distance to realize a fast candidate selection. We use two datasets to demonstrate the effectiveness of the method: YFCC100M-HNfc6 consisting of about 100 million 4,096 dimensional image descriptors and DEEP1B consisting of 1 billion 96 dimensional vectors. Using a standard desktop computer, we condu cted a nearest-neighbor search for a query on datasets stored on SSD, where vectors are represented by 8-bit integers. The proposed method executes the search in 5.8 seconds for the 400GB dataset YFCC100M, and 0.24 seconds for the 100GB dataset DEEP1B, while keeping the recall of 90%.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
<a href="/en/project/EF16_019%2F0000822" target="_blank" >EF16_019/0000822: CyberSecurity, CyberCrime and Critical Information Infrastructures Center of Excellence</a><br>
Continuities
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)
Others
Publication year
2022
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Data specific for result type
Article name in the collection
Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM
ISBN
9789897585494
ISSN
—
e-ISSN
—
Number of pages
10
Pages from-to
401-410
Publisher name
SciTePress
Place of publication
Portugal
Event location
online
Event date
Jan 1, 2022
Type of event by nationality
WRD - Celosvětová akce
UT code for WoS article
000819122200044