All

What are you looking for?

All
Projects
Results
Organizations

Quick search

  • Projects supported by TA ČR
  • Excellent projects
  • Projects with the highest public support
  • Current projects

Smart search

  • That is how I find a specific +word
  • That is how I leave the -word out of the results
  • “That is how I can find the whole phrase”

Nearest-neighbor Search from Large Datasets using Narrow Sketches

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F22%3A00125541" target="_blank" >RIV/00216224:14330/22:00125541 - isvavai.cz</a>

  • Result on the web

    <a href="https://www.scitepress.org/PublicationsDetail.aspx?ID=s5xL4A2YSOs=&t=1" target="_blank" >https://www.scitepress.org/PublicationsDetail.aspx?ID=s5xL4A2YSOs=&t=1</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.5220/0010817600003122" target="_blank" >10.5220/0010817600003122</a>

Alternative languages

  • Result language

    angličtina

  • Original language name

    Nearest-neighbor Search from Large Datasets using Narrow Sketches

  • Original language description

    We consider the nearest-neighbor search on large-scale high-dimensional datasets that cannot fit in the main memory. Sketches are bit strings that compactly express data points. Although it is usually thought that wide sketches are needed for high-precision searches, we use relatively narrow sketches such as 22-bit or 24-bit, to select a small set of candidates for the search. We use an asymmetric distance between data points and sketches as the criteria for candidate selection, instead of traditionally used Hamming distance. It can be considered a distance partially restoring quantization error. We utilize an efficient one-by-one sketch enumeration in the order of the partially restored distance to realize a fast candidate selection. We use two datasets to demonstrate the effectiveness of the method: YFCC100M-HNfc6 consisting of about 100 million 4,096 dimensional image descriptors and DEEP1B consisting of 1 billion 96 dimensional vectors. Using a standard desktop computer, we condu cted a nearest-neighbor search for a query on datasets stored on SSD, where vectors are represented by 8-bit integers. The proposed method executes the search in 5.8 seconds for the 400GB dataset YFCC100M, and 0.24 seconds for the 100GB dataset DEEP1B, while keeping the recall of 90%.

  • Czech name

  • Czech description

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

    <a href="/en/project/EF16_019%2F0000822" target="_blank" >EF16_019/0000822: CyberSecurity, CyberCrime and Critical Information Infrastructures Center of Excellence</a><br>

  • Continuities

    P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)

Others

  • Publication year

    2022

  • Confidentiality

    S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Data specific for result type

  • Article name in the collection

    Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM

  • ISBN

    9789897585494

  • ISSN

  • e-ISSN

  • Number of pages

    10

  • Pages from-to

    401-410

  • Publisher name

    SciTePress

  • Place of publication

    Portugal

  • Event location

    online

  • Event date

    Jan 1, 2022

  • Type of event by nationality

    WRD - Celosvětová akce

  • UT code for WoS article

    000819122200044