Missing Features Reconstruction and Its Impact on Classification Accuracy

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21240%2F19%3A00331729" target="_blank" >RIV/68407700:21240/19:00331729 - isvavai.cz</a>

  • Result on the web

    <a href="https://link.springer.com/chapter/10.1007%2F978-3-030-22744-9_16" target="_blank" >https://link.springer.com/chapter/10.1007%2F978-3-030-22744-9_16</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.1007/978-3-030-22744-9_16" target="_blank" >10.1007/978-3-030-22744-9_16</a>

Alternative languages

  • Result language

    English

  • Original language name

    Missing Features Reconstruction and Its Impact on Classification Accuracy

  • Original language description

    In real-world applications, we can encounter situations where a well-trained model has to be used to predict from a damaged dataset. The damage caused by missing or corrupted values can occur either at the level of individual instances or at the level of entire features. Both situations have a negative impact on the usability of the model on such a dataset. This paper focuses on the scenario where entire features are missing, which can be understood as a specific case of transfer learning. Our aim is to experimentally study the influence of various imputation methods on the performance of several classification models. The impact of imputation is studied for traditional methods such as k-NN, linear regression, and MICE, compared with modern imputation methods such as the multi-layer perceptron (MLP) and gradient boosted trees (XGBT). For linear regression, MLP, and XGBT we also propose two approaches to using them for multiple-feature imputation. The experiments were performed on both real-world and artificial datasets with continuous features, with the number of missing features varying from a single feature up to 50% of all features. The results show that MICE and linear regression are generally good imputers regardless of the conditions. On the other hand, the performance of MLP and XGBT is strongly dataset-dependent. Their performance is the best in some cases, but more often they perform worse than MICE or linear regression. (A minimal, hedged code sketch of this impute-then-classify setup follows this list.)

  • Czech name

  • Czech description
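
Aside: the abstract above describes an impute-then-classify experiment. The snippet below is a minimal sketch of that setup, not the authors' code. It assumes scikit-learn, an illustrative dataset (load_breast_cancer), a RandomForestClassifier, and only two of the imputers named in the abstract (k-NN and linear regression); all names and parameters are assumptions chosen for illustration.

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.impute import KNNImputer
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # A "well-trained" classifier fitted on the undamaged training data.
    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Simulate the damaged test set: one entire feature column is missing.
    missing = 0
    others = [j for j in range(X.shape[1]) if j != missing]

    def accuracy(X_eval):
        return accuracy_score(y_test, clf.predict(X_eval))

    # k-NN imputation: neighbours of each damaged row come from the training set.
    X_knn = X_test.copy()
    X_knn[:, missing] = np.nan
    knn = KNNImputer(n_neighbors=5).fit(X_train)
    print("k-NN imputation:", accuracy(knn.transform(X_knn)))

    # Linear-regression imputation: predict the missing feature from the others.
    reg = LinearRegression().fit(X_train[:, others], X_train[:, missing])
    X_lr = X_test.copy()
    X_lr[:, missing] = reg.predict(X_test[:, others])
    print("Linear regression imputation:", accuracy(X_lr))

    # Reference point: accuracy on the intact test set.
    print("No missing feature:", accuracy(X_test))

The imputers are fitted only on the complete training data, mirroring the transfer-learning reading of the problem in the abstract: the damage appears only at prediction time.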

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

    <a href="/en/project/GA18-18080S" target="_blank" >GA18-18080S: Fusion-Based Knowledge Discovery in Human Activity Data</a><br>

  • Continuities

    P - Research and development project financed from public sources (with a link to CEP)

Others

  • Publication year

    2019

  • Confidentiality

    S - Complete and true data about the project are not subject to protection under special legal regulations

Data specific for result type

  • Proceedings (collection) name

    Computational Science – ICCS 2019

  • ISBN

    978-3-030-22744-9

  • ISSN

  • e-ISSN

  • Number of pages

    14

  • Pages from-to

    207-220

  • Publisher name

    Springer, Cham

  • Place of publication

  • Event location

    Faro

  • Event date

    Jun 12, 2019

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article