Reproducible experiments with Learned Metric Index Framework
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F23%3A00131386" target="_blank" >RIV/00216224:14330/23:00131386 - isvavai.cz</a>
Result on the web
<a href="https://www.sciencedirect.com/science/article/pii/S0306437923000911" target="_blank" >https://www.sciencedirect.com/science/article/pii/S0306437923000911</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.is.2023.102255" target="_blank" >10.1016/j.is.2023.102255</a>
Alternative languages
Result language
angličtina
Original language name
Reproducible experiments with Learned Metric Index Framework
Original language description
This work is a companion reproducible paper of a previous paper (Antol et al., 2021) in which we presented an alternative to the traditional paradigm of similarity searching in metric spaces called the Learned Metric Index. Inspired by the advance in learned indexing of structured data, we used machine learning models to replace index pivots, thus posing similarity search as a classification problem. This implementation proved to be more than competitive with the conventional methods in terms of speed and recall, proving the concept as viable. The aim of this publication is to make our source code, datasets, and experiments publicly available. For this purpose, we create a collection of Python3 software libraries, YAML reproducible experiment files, and JSON ground-truth files, all bundled in a Docker image – the Learned Metric Index Framework (LMIF) – which can be run using any Docker-compatible operating system on a CPU with Advanced vector extensions (AVX). We introduce a reproducibility protocol for our experiments using LMIF and provide a closer look at the experimental process. We introduce new experimental results by running the reproducibility protocol introduced herein and discussing the differences with the results reported in our primary work (Antol et al., 2021). Finally, we make an argument that these results can be considered weakly reproducible (in both of the performance metrics), since they point to the same conclusions derived in the primary paper.
Czech name
—
Czech description
—
Classification
Type
J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
20206 - Computer hardware and architecture
Result continuities
Project
Result was created during the realization of more than one project. More information in the Projects tab.
Continuities
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)<br>S - Specificky vyzkum na vysokych skolach
Others
Publication year
2023
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Data specific for result type
Name of the periodical
Information systems
ISSN
0306-4379
e-ISSN
0306-4379
Volume of the periodical
118
Issue of the periodical within the volume
1
Country of publishing house
NL - THE KINGDOM OF THE NETHERLANDS
Number of pages
16
Pages from-to
102255
UT code for WoS article
001050259000001
EID of the result in the Scopus database
2-s2.0-85166232879