Large-dimensionality small-instance set feature selection: A hybrid bio-inspired heuristic approach
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F18%3A10241746" target="_blank" >RIV/61989100:27240/18:10241746 - isvavai.cz</a>
Result on the web
<a href="https://reader.elsevier.com/reader/sd/pii/S2210650216303042?token=542D17B5F4CE16DB9A9E5E17C8D447F364F9C65D9AE8A862748ADFE650DB2D83FF8B210E6955EA4F85CB77A8E011D848" target="_blank" >https://reader.elsevier.com/reader/sd/pii/S2210650216303042?token=542D17B5F4CE16DB9A9E5E17C8D447F364F9C65D9AE8A862748ADFE650DB2D83FF8B210E6955EA4F85CB77A8E011D848</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.swevo.2018.02.021" target="_blank" >10.1016/j.swevo.2018.02.021</a>
Alternative languages
Result language
English
Title in the original language
Large-dimensionality small-instance set feature selection: A hybrid bio-inspired heuristic approach
Result description in the original language
Selecting a representative set of features remains a crucial and challenging problem in machine learning. The problem becomes harder in either of two situations: a very large number of attributes (large dimensionality), or a very small number of instances or time points (small-instance set). The first situation poses problems for machine learning algorithms, because the search space of relevant feature combinations becomes impossible to explore in reasonable time and with reasonable computational resources. The second leaves insufficient data to learn from (insufficient examples). In this work, we address both issues at the same time. The methods we propose are heuristics inspired by nature (in particular, by biology). We propose a hybrid of two methods which has the advantage of providing good learning from fewer examples and a fair selection of features from a very large set, all while ensuring high classification accuracy. The methods used are antlion optimization (ALO), grey wolf optimization (GWO), and a combination of the two (ALO-GWO). We test their performance on datasets with almost 50,000 features and fewer than 200 instances. The results look promising when compared with other methods such as genetic algorithms (GA) and particle swarm optimization (PSO).
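This record does not give the update rules or the fitness function used in the paper, so the following is only a minimal sketch of how a wrapper-style, swarm-based feature search of this kind can look. It assumes the standard grey wolf optimizer update, a 0.5 threshold to turn positions into feature masks, and leave-one-out 1-nearest-neighbour accuracy as the fitness. It is not the ALO-GWO hybrid itself. All function names, the weight `alpha`, and the toy data generator are hypothetical.

```python
import random


def make_toy_data(n=20, dim=12, seed=1):
    """Hypothetical toy set: only feature 0 carries class signal, the rest is noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[(3.0 * label if j == 0 else 0.0) + rng.gauss(0, 1) for j in range(dim)]
         for label in y]
    return X, y


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only the features where mask is 1."""
    idx = [j for j, m in enumerate(mask) if m]
    if not idx:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_d, best_label = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in idx)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, mask, alpha=0.99):
    """Accuracy plus a small reward for sparsity (alpha is an assumed weight)."""
    frac_selected = sum(mask) / len(mask)
    return alpha * loo_1nn_accuracy(X, y, mask) + (1 - alpha) * (1 - frac_selected)


def binarize(position):
    """Threshold a continuous position in [0, 1]^dim into a 0/1 feature mask."""
    return [1 if p > 0.5 else 0 for p in position]


def gwo_feature_selection(X, y, n_wolves=8, n_iter=20, seed=0):
    """Simplified grey-wolf-style search over feature masks."""
    rng = random.Random(seed)
    dim = len(X[0])
    wolves = [[rng.random() for _ in range(dim)] for _ in range(n_wolves)]
    best_mask, best_fit = None, -1.0
    for t in range(n_iter + 1):
        ranked = sorted(wolves, key=lambda w: fitness(X, y, binarize(w)), reverse=True)
        leaders = [list(w) for w in ranked[:3]]  # alpha, beta, delta wolves
        top_fit = fitness(X, y, binarize(leaders[0]))
        if top_fit > best_fit:
            best_fit, best_mask = top_fit, binarize(leaders[0])
        if t == n_iter:
            break
        a = 2.0 * (1 - t / n_iter)  # exploration factor, decreasing from 2 towards 0
        for w in wolves:
            for j in range(dim):
                pulls = []
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[j] - w[j])
                    pulls.append(leader[j] - A * D)
                # Average the three leader-guided moves and clamp to [0, 1].
                w[j] = min(1.0, max(0.0, sum(pulls) / 3))
    return best_mask, best_fit
```

Calling `gwo_feature_selection(*make_toy_data())` returns a 0/1 mask and its fitness. With tens of thousands of features and under 200 instances, as in the datasets described above, each fitness evaluation stays cheap in the number of instances but the mask space grows exponentially, which is why a heuristic search is used instead of enumeration.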
Classification
Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific research at universities
Other
Year of implementation
2018
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Periodical name
Swarm and Evolutionary Computation
ISSN
2210-6502
e-ISSN
—
Periodical volume
42
Issue of the periodical within the volume
October
Country of the periodical's publisher
US - United States of America
Number of pages of the result
14
Pages from-to
29-42
UT WoS code of the article
000445716200003
EID of the result in the Scopus database
2-s2.0-85043393542