Evolutionary Feature Subset Selection with Compression-based Entropy Estimation
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F16%3A86099076" target="_blank" >RIV/61989100:27240/16:86099076 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1145/2908812.2908853" target="_blank" >http://dx.doi.org/10.1145/2908812.2908853</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1145/2908812.2908853" target="_blank" >10.1145/2908812.2908853</a>
Alternative languages
Result language
English
Title in original language
Evolutionary Feature Subset Selection with Compression-based Entropy Estimation
Result description in original language
Modern massive data sets often comprise millions of records and thousands of features, and processing them efficiently with traditional methods is increasingly challenging. Feature selection methods form a family of traditional instruments for data dimensionality reduction. They aim to select subsets of data features so that the loss of information contained in the full data set is minimized. Evolutionary feature selection methods have shown a good ability to identify feature subsets in very-high-dimensional data sets. Their efficiency depends, among other things, on the particular optimization algorithm, the feature subset representation, and the objective function definition. In this paper, two evolutionary methods for fixed-length subset selection are employed to find feature subsets on the basis of their entropy, estimated by a fast data compression algorithm. The soundness of the fitness criterion, the ability of the investigated methods to find good feature subsets, and the usefulness of the selected feature subsets for practical data mining are evaluated using two well-known data sets and several widely used classification algorithms.
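The idea in the description, scoring a fixed-length feature subset by an entropy estimate obtained from a data compressor, might be sketched as follows. This is only an illustration under stated assumptions: the record does not name the compressor, the two evolutionary methods, or the direction of optimization, so the use of `zlib`, the swap mutation with (1+1)-style acceptance, the choice to maximize the estimate, and the function names `compressed_bits_per_record` and `evolve_subset` are all hypothetical.

```python
import random
import zlib


def compressed_bits_per_record(rows, subset):
    """Rough entropy estimate: compressed size in bits per record.

    Assumption: zlib stands in for the unnamed fast compressor.
    """
    # Serialize only the selected columns, one record per line.
    payload = "\n".join(",".join(str(row[i]) for i in subset) for row in rows)
    data = payload.encode("utf-8")
    return 8.0 * len(zlib.compress(data, 6)) / len(rows)


def evolve_subset(rows, k, generations=200, seed=0):
    """Search fixed-length subsets of size k with a swap mutation.

    Assumption: a higher entropy estimate is treated as better.
    """
    rng = random.Random(seed)
    n_features = len(rows[0])
    best = sorted(rng.sample(range(n_features), k))
    best_fit = compressed_bits_per_record(rows, best)
    for _ in range(generations):
        candidate = list(best)
        outside = [i for i in range(n_features) if i not in candidate]
        if not outside:
            break  # k equals the number of features; nothing to swap in
        # Swap one selected feature for an unselected one, so the
        # subset length stays fixed at k.
        candidate[rng.randrange(k)] = rng.choice(outside)
        candidate.sort()
        fit = compressed_bits_per_record(rows, candidate)
        if fit >= best_fit:
            best, best_fit = candidate, fit
    return best, best_fit
```

Calling `evolve_subset(rows, k)` on a list of equal-length records returns the selected column indices together with their estimate. A population-based method would replace the single-candidate loop but could reuse the same fitness function.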
Classification
Type
D - Article in proceedings
CEP field
IN - Informatics
OECD FORD field
—
Result continuities
Project
<a href="/cs/project/GJ16-25694Y" target="_blank" >GJ16-25694Y: Multi-paradigm data mining algorithms based on search, fuzzy technologies, and bio-inspired computation</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2016
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Article title in proceedings
GECCO'16 : proceedings of the 2016 Genetic and evolutionary computation conference
ISBN
978-1-4503-4206-3
ISSN
—
e-ISSN
—
Number of pages
8
Pages from-to
933-940
Publisher name
Association for Computing Machinery
Place of publication
New York
Event location
Denver
Event date
20 July 2016
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
000382659200118