Classification with Costly Features in Hierarchical Deep Sets
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F24%3A00375997" target="_blank" >RIV/68407700:21230/24:00375997 - isvavai.cz</a>
Výsledek na webu
<a href="https://doi.org/10.1007/s10994-024-06565-4" target="_blank" >https://doi.org/10.1007/s10994-024-06565-4</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/s10994-024-06565-4" target="_blank" >10.1007/s10994-024-06565-4</a>
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
Classification with Costly Features in Hierarchical Deep Sets
Popis výsledku v původním jazyce
Classification with costly features (CwCF) is a classification problem that includes the cost of features in the optimization criteria. Individually for each sample, its features are sequentially acquired to maximize accuracy while minimizing the acquired features' cost. However, existing approaches can only process data that can be expressed as vectors of fixed length. In real life, the data often possesses rich and complex structure, which can be more precisely described with formats such as XML or JSON. The data is hierarchical and often contains nested lists of objects. In this work, we extend an existing deep reinforcement learning-based algorithm with hierarchical deep sets and hierarchical softmax, so that it can directly process this data. The extended method has greater control over which features it can acquire and, in experiments with seven datasets, we show that this leads to superior performance. To showcase the real usage of the new method, we apply it to a real-life problem of classifying malicious web domains, using an online service.
Název v anglickém jazyce
Classification with Costly Features in Hierarchical Deep Sets
Popis výsledku anglicky
Classification with costly features (CwCF) is a classification problem that includes the cost of features in the optimization criteria. Individually for each sample, its features are sequentially acquired to maximize accuracy while minimizing the acquired features' cost. However, existing approaches can only process data that can be expressed as vectors of fixed length. In real life, the data often possesses rich and complex structure, which can be more precisely described with formats such as XML or JSON. The data is hierarchical and often contains nested lists of objects. In this work, we extend an existing deep reinforcement learning-based algorithm with hierarchical deep sets and hierarchical softmax, so that it can directly process this data. The extended method has greater control over which features it can acquire and, in experiments with seven datasets, we show that this leads to superior performance. To showcase the real usage of the new method, we apply it to a real-life problem of classifying malicious web domains, using an online service.
Klasifikace
Druh
J<sub>imp</sub> - Článek v periodiku v databázi Web of Science
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Návaznosti výsledku
Projekt
Výsledek vznikl pri realizaci vícero projektů. Více informací v záložce Projekty.
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)
Ostatní
Rok uplatnění
2024
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Údaje specifické pro druh výsledku
Název periodika
Machine Learning
ISSN
0885-6125
e-ISSN
1573-0565
Svazek periodika
113
Číslo periodika v rámci svazku
7
Stát vydavatele periodika
US - Spojené státy americké
Počet stran výsledku
36
Strana od-do
4487-4522
Kód UT WoS článku
001229224200001
EID výsledku v databázi Scopus
2-s2.0-85193792574