Overcoming Long Inference Time of Nearest Neighbors Analysis in Regression and Uncertainty Prediction
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21240%2F24%3A00375001" target="_blank" >RIV/68407700:21240/24:00375001 - isvavai.cz</a>
Výsledek na webu
<a href="https://doi.org/10.1007/s42979-024-02670-2" target="_blank" >https://doi.org/10.1007/s42979-024-02670-2</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/s42979-024-02670-2" target="_blank" >10.1007/s42979-024-02670-2</a>
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
Overcoming Long Inference Time of Nearest Neighbors Analysis in Regression and Uncertainty Prediction
Popis výsledku v původním jazyce
The intuitive approach of comparing like with like, forms the basis of the so-called nearest neighbor analysis, which is central to many machine learning algorithms. Nearest neighbor analysis is easy to interpret, analyze, and reason about. It is widely used in advanced techniques such as uncertainty estimation in regression models, as well as the renowned k-nearest neighbor-based algorithms. Nevertheless, its high inference time complexity, which is dataset size dependent even in the case of its faster approximated version, restricts its applications and can considerably inflate the application cost. In this paper, we address the problem of high inference time complexity. By using gradient-boosted regression trees as a predictor of the labels obtained from nearest neighbor analysis, we demonstrate a significant increase in inference speed, improving by several orders of magnitude. We validate the effectiveness of our approach on a real-world European Car Pricing Dataset with approximately rows for both residual cost and price uncertainty prediction. Moreover, we assess our method’s performance on the most commonly used tabular benchmark datasets to demonstrate its scalability. The link is to github repository where the code is available: https://github.com/koutefra/uncertainty_experiments.
Název v anglickém jazyce
Overcoming Long Inference Time of Nearest Neighbors Analysis in Regression and Uncertainty Prediction
Popis výsledku anglicky
The intuitive approach of comparing like with like, forms the basis of the so-called nearest neighbor analysis, which is central to many machine learning algorithms. Nearest neighbor analysis is easy to interpret, analyze, and reason about. It is widely used in advanced techniques such as uncertainty estimation in regression models, as well as the renowned k-nearest neighbor-based algorithms. Nevertheless, its high inference time complexity, which is dataset size dependent even in the case of its faster approximated version, restricts its applications and can considerably inflate the application cost. In this paper, we address the problem of high inference time complexity. By using gradient-boosted regression trees as a predictor of the labels obtained from nearest neighbor analysis, we demonstrate a significant increase in inference speed, improving by several orders of magnitude. We validate the effectiveness of our approach on a real-world European Car Pricing Dataset with approximately rows for both residual cost and price uncertainty prediction. Moreover, we assess our method’s performance on the most commonly used tabular benchmark datasets to demonstrate its scalability. The link is to github repository where the code is available: https://github.com/koutefra/uncertainty_experiments.
Klasifikace
Druh
J<sub>SC</sub> - Článek v periodiku v databázi SCOPUS
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Návaznosti výsledku
Projekt
—
Návaznosti
S - Specificky vyzkum na vysokych skolach<br>I - Institucionalni podpora na dlouhodoby koncepcni rozvoj vyzkumne organizace
Ostatní
Rok uplatnění
2024
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Údaje specifické pro druh výsledku
Název periodika
SN Computer Science
ISSN
2662-995X
e-ISSN
2661-8907
Svazek periodika
5
Číslo periodika v rámci svazku
5
Stát vydavatele periodika
SG - Singapurská republika
Počet stran výsledku
12
Strana od-do
—
Kód UT WoS článku
—
EID výsledku v databázi Scopus
2-s2.0-85191305190