Reproducible experiments with Learned Metric Index Framework

Result identifiers

Result code in IS VaVaI
RIV/00216224:14330/23:00131386 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F23%3A00131386)
Result on the web
https://www.sciencedirect.com/science/article/pii/S0306437923000911
DOI - Digital Object Identifier
10.1016/j.is.2023.102255 (http://dx.doi.org/10.1016/j.is.2023.102255)

Alternative languages

Result language
English
Title in original language
Reproducible experiments with Learned Metric Index Framework
Result description in original language
This work is a companion reproducible paper of a previous paper (Antol et al., 2021) in which we presented an alternative to the traditional paradigm of similarity searching in metric spaces called the Learned Metric Index. Inspired by the advance in learned indexing of structured data, we used machine learning models to replace index pivots, thus posing similarity search as a classification problem. This implementation proved to be more than competitive with the conventional methods in terms of speed and recall, proving the concept as viable. The aim of this publication is to make our source code, datasets, and experiments publicly available. For this purpose, we create a collection of Python3 software libraries, YAML reproducible experiment files, and JSON ground-truth files, all bundled in a Docker image – the Learned Metric Index Framework (LMIF) – which can be run using any Docker-compatible operating system on a CPU with Advanced vector extensions (AVX). We introduce a reproducibility protocol for our experiments using LMIF and provide a closer look at the experimental process. We introduce new experimental results by running the reproducibility protocol introduced herein and discussing the differences with the results reported in our primary work (Antol et al., 2021). Finally, we make an argument that these results can be considered weakly reproducible (in both of the performance metrics), since they point to the same conclusions derived in the primary paper.
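The idea of posing similarity search as a classification problem can be illustrated with a minimal sketch. It is not the LMIF code or its API: the function names, the random-seed partitioning, and the small logistic-regression classifier are all assumptions chosen for illustration. A classifier is trained to predict which bucket a point belongs to, a query visits only the most probable buckets, and recall is measured against a brute-force ground truth.

```python
import math
import random

def knn(query, points, k):
    """Exact k nearest neighbours by brute force (serves as ground truth)."""
    return sorted(points, key=lambda p: math.dist(query, p))[:k]

def predict_proba(weights, x):
    """Softmax over linear scores; the last entry of each row is the bias."""
    scores = [sum(w * v for w, v in zip(row, x)) + row[-1] for row in weights]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_classifier(points, labels, n_buckets, epochs=50, lr=0.1):
    """Multinomial logistic regression trained with plain SGD (illustrative model)."""
    dim = len(points[0])
    weights = [[0.0] * (dim + 1) for _ in range(n_buckets)]
    for _ in range(epochs):
        for x, y in zip(points, labels):
            probs = predict_proba(weights, x)
            for b in range(n_buckets):
                grad = probs[b] - (1.0 if b == y else 0.0)
                for j in range(dim):
                    weights[b][j] -= lr * grad * x[j]
                weights[b][dim] -= lr * grad
    return weights

def build_index(points, n_buckets, seed=0):
    """Assumed partitioning: label each point by its nearest randomly chosen seed,
    then train the classifier to reproduce those labels."""
    rng = random.Random(seed)
    seeds = rng.sample(points, n_buckets)
    labels = [min(range(n_buckets), key=lambda b: math.dist(p, seeds[b]))
              for p in points]
    weights = train_classifier(points, labels, n_buckets)
    buckets = [[] for _ in range(n_buckets)]
    for p, y in zip(points, labels):
        buckets[y].append(p)
    return weights, buckets

def search(weights, buckets, query, k, n_visit):
    """Visit the n_visit most probable buckets and rank their contents exactly."""
    probs = predict_proba(weights, query)
    order = sorted(range(len(buckets)), key=lambda b: -probs[b])
    candidates = [p for b in order[:n_visit] for p in buckets[b]]
    return knn(query, candidates, k)

def recall(found, truth):
    """Fraction of the true neighbours that the approximate search returned."""
    return len(set(found) & set(truth)) / len(truth)

rng = random.Random(42)
data = [(rng.random(), rng.random()) for _ in range(300)]
weights, buckets = build_index(data, n_buckets=4)
query = (0.5, 0.5)
truth = knn(query, data, k=10)
partial = recall(search(weights, buckets, query, 10, n_visit=1), truth)
full = recall(search(weights, buckets, query, 10, n_visit=4), truth)
print(f"recall with 1 bucket: {partial:.2f}, with all buckets: {full:.2f}")
```

Visiting fewer buckets reduces the number of distance computations at the cost of recall; visiting every bucket degenerates to an exact scan. This is the speed versus recall trade-off the abstract refers to.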

Classification

Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
20206 - Computer hardware and architecture

Result linkages

Project
The result was created during the implementation of multiple projects. More information is available in the Projects tab.
Linkages
P - Research and development project financed from public sources (with a link to CEP)
S - Specific research at universities

Other

Year of implementation
2023
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Periodical name
Information Systems
ISSN
0306-4379
e-ISSN
0306-4379
Periodical volume
118
Issue number within the volume
1
Publisher's country
NL - Netherlands
Number of pages
16
Pages from-to
102255
UT WoS article code
001050259000001
Result EID in Scopus
2-s2.0-85166232879

Similar results (10)

The Art of Reproducible Machine Learning: A Survey of Methodology in Word Vector Experiments
Application of Distance Metric Learning to Automated Malware Detection
Zlepšení klasifikace malwarových rodin pomocí naučené vzdálenosti pro nízké dimenze (Improving malware family classification using a learned distance for low dimensions)
