Testing of Inductive Preprocessing Algorithm

Result description

Data preprocessing is a very important part of the knowledge discovery process. Data mining systems contain tens of preprocessing methods (for example, methods for missing data imputation, data reduction, discretization, data enrichment, etc.), and it is usually not clear which methods to use. Selecting preprocessing methods appropriate for a particular dataset requires strong experience and a lot of experimenting. In this paper we test our extension of the inductive approach to data preprocessing. We developed an inductive preprocessing method which utilizes a genetic algorithm to compose, from scratch, a sequence of preprocessing methods which fits the data and allows a successful model to be created. To test our automatic preprocessing, we utilize several real-world datasets available from the UCI Machine Learning Repository.
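
The idea of letting a genetic algorithm compose a sequence of preprocessing methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three step functions, the `MAX_LEN` limit, the mutation and crossover operators, and above all the toy `fitness` function (which stands in for the quality of a model trained on the preprocessed data) are assumptions made purely for illustration.

```python
import random

# Hypothetical step catalogue: each step maps a column (list of floats,
# None = missing value) to a new column. Not the paper's actual method set.
def impute_mean(col):
    known = [v for v in col if v is not None]
    mean = sum(known) / len(known) if known else 0.0
    return [mean if v is None else v for v in col]

def min_max_scale(col):
    known = [v for v in col if v is not None]
    if not known:
        return list(col)
    lo = min(known)
    span = (max(known) - lo) or 1.0
    return [None if v is None else (v - lo) / span for v in col]

def discretize(col, bins=4):
    known = [v for v in col if v is not None]
    if not known:
        return list(col)
    lo = min(known)
    width = ((max(known) - lo) / bins) or 1.0
    return [None if v is None else float(min(int((v - lo) / width), bins - 1))
            for v in col]

STEPS = {"impute_mean": impute_mean, "min_max_scale": min_max_scale,
         "discretize": discretize}
MAX_LEN = 4  # assumed upper bound on sequence length

def apply_sequence(seq, col):
    # Apply the named steps in order.
    for name in seq:
        col = STEPS[name](col)
    return col

def fitness(seq, col):
    # Toy stand-in for model quality (assumption): penalise missing values,
    # values outside [0, 1], and long sequences.
    out = apply_sequence(seq, col)
    missing = sum(v is None for v in out)
    out_of_range = sum(v is not None and not 0.0 <= v <= 1.0 for v in out)
    return -missing - out_of_range - 0.1 * len(seq)

def mutate(seq, rng):
    # Add, drop, or replace one step; length stays within 1..MAX_LEN.
    seq = list(seq)
    op = rng.choice(["add", "drop", "swap"])
    if op == "add" and len(seq) < MAX_LEN:
        seq.insert(rng.randrange(len(seq) + 1), rng.choice(list(STEPS)))
    elif op == "drop" and len(seq) > 1:
        del seq[rng.randrange(len(seq))]
    else:
        seq[rng.randrange(len(seq))] = rng.choice(list(STEPS))
    return seq

def crossover(a, b, rng):
    # One-point crossover: non-empty head of a plus a tail of b, truncated.
    child = a[:rng.randint(1, len(a))] + b[rng.randint(0, len(b)):]
    return child[:MAX_LEN]

def evolve(col, generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    names = list(STEPS)
    pop = [[rng.choice(names) for _ in range(rng.randint(1, MAX_LEN))]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(s, col), reverse=True)
        elite = pop[: pop_size // 2]  # keep the better half unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=lambda s: fitness(s, col))
```

Calling `evolve([2.0, None, 10.0, 4.0, None, 6.0])` returns a list of step names. Under this toy fitness, a sequence such as `["impute_mean", "min_max_scale"]` scores -0.2, because it leaves no missing values and no values outside [0, 1]. In a realistic setting the fitness would instead be the validation performance of a model trained on the preprocessed dataset.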

Keywords

Inductive preprocessing, UCI

The result's identifiers

Result code in IS VaVaI
RIV/68407700:21230/09:00159932 - isvavai.cz
Alternative codes found
RIV/68407700:21240/09:00159932
Result on the web
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.215.1155&rep=rep1&type=pdf
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Czech name
—
Czech description
—

Classification

Type
O - Miscellaneous
CEP classification
IN - Informatics
OECD FORD branch
—

Result continuities

Project
KJB201210701: Automated Knowledge Extraction
Continuities
Z - Research plan (with a reference to CEZ)

Others

Publication year
2009
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
