Missing Data Imputation and the Inductive Modelling
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F07%3A03132900" target="_blank" >RIV/68407700:21230/07:03132900 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Missing Data Imputation and the Inductive Modelling
Original language description
Missing data is a big problem in simulation for data mining and data analysis. Real-world applications often contain missing data. Many data-mining methods are unable to create models from data that contain missing values. The traditional approach is to delete vectors with missing data. Unfortunately, this approach may decrease the accuracy of the models, and in the worst case all data in the dataset may be deleted. For this reason, many different imputation techniques have been developed, and some are widely used. In this paper, we present a comparison of several well-known techniques for missing data imputation. The presented techniques include imputation of the mean value, zero, the value from the nearest input vector, and a few others. We show which techniques are best at estimating missing values. To test the imputation methods, we used several different datasets. We compare the imputation methods in two ways. The first is to compare imputed data with the original data.
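The three techniques named in the description (zero, mean value, value from the nearest input vector) and the first comparison (imputed data against original data) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy dataset, the use of `None` to mark a missing value, the distance over shared observed columns, and RMSE as the error measure are all assumptions made for illustration.

```python
import math

def impute_constant(rows, value=0.0):
    """Replace every missing entry (None) with a fixed value such as zero."""
    return [[value if x is None else x for x in row] for row in rows]

def impute_mean(rows):
    """Replace each missing entry with the mean of the observed values in its column."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if x is None else x for j, x in enumerate(row)] for row in rows]

def impute_nearest(rows):
    """Copy each missing entry from the nearest row in which that entry is observed.

    Assumption: distance is the root mean squared difference over the
    columns observed in both rows. Entries with no donor row stay None.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    result = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, x in enumerate(row):
            if x is None:
                donors = [r for k, r in enumerate(rows) if k != i and r[j] is not None]
                if donors:
                    result[i][j] = min(donors, key=lambda r: distance(row, r))[j]
    return result

def rmse_on_missing(original, masked, imputed):
    """RMSE between original and imputed values, taken only at the masked positions."""
    errors = [(o - p) ** 2
              for orow, mrow, prow in zip(original, masked, imputed)
              for o, m, p in zip(orow, mrow, prow) if m is None]
    return math.sqrt(sum(errors) / len(errors))

# Hypothetical toy data: one value is hidden so it can be compared afterwards.
original = [[1.0, 2.0], [2.2, 4.4], [3.0, 6.0], [4.0, 8.0]]
masked = [[1.0, 2.0], [2.2, None], [3.0, 6.0], [4.0, 8.0]]

for name, method in [("zero", impute_constant),
                     ("mean", impute_mean),
                     ("nearest", impute_nearest)]:
    print(name, round(rmse_on_missing(original, masked, method(masked)), 3))
```

Hiding known values and measuring the error only at the hidden positions keeps the comparison independent of any downstream model; a lower value means the imputed entries are closer to the originals.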
Czech name
Nahrazování chybějících dat a induktivní modelování
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/KJB201210701" target="_blank" >KJB201210701: Automated Knowledge Extraction</a>
Continuities
Z - Research plan (with a link to CEZ)
Others
Publication year
2007
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the 6th EUROSIM Congress on Modelling and Simulation
ISBN
978-3-901608-32-2
ISSN
—
e-ISSN
—
Number of pages
8
Pages from-to
—
Publisher name
ARGESIM
Place of publication
Vienna
Event location
Ljubljana
Event date
Sep 9, 2007
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—