Avoiding Anomalies in Data Stream Learning
The result's identifiers
Result code in IS VaVaI
RIV/00216224:14330/13:00070032 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F13%3A00070032)
Result on the web
http://link.springer.com/chapter/10.1007%2F978-3-642-40897-7_4
DOI - Digital Object Identifier
10.1007/978-3-642-40897-7_4 (http://dx.doi.org/10.1007/978-3-642-40897-7_4)
Alternative languages
Result language
English
Original language name
Avoiding Anomalies in Data Stream Learning
Original language description
The presence of anomalies in data compromises data quality and can reduce the effectiveness of learning algorithms. Standard data mining methodologies treat data cleaning as a pre-processing step before the learning task. The problem of data cleaning is exacerbated when learning in the computational model of data streams. In this paper we present a streaming algorithm for learning classification rules that is able to detect contextual anomalies in the data. Contextual anomalies are surprising attribute values in the context defined by the conditional part of the rule. For each example we compute the degree of anomaliness based on the probability of the attribute values given the conditional part of the rule covering the example. Examples with a high degree of anomaliness are signaled to the user and are not used to train the classifier. The experimental evaluation on real-world data sets shows the ability to discover anomalous examples in the data.
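The idea in the description can be illustrated with a minimal sketch. It is not the algorithm from the paper: the class `RuleStats`, the function `process`, the Laplace-smoothed probability estimate, the averaged score, and the `threshold` and `warmup` values are all assumptions made for illustration, and only nominal attributes are handled. The sketch keeps value counts for the examples covered by one rule, scores a new covered example by how improbable its attribute values are given those counts, and skips training on examples whose score is high.

```python
from collections import Counter, defaultdict


class RuleStats:
    """Value counts for the examples one rule has covered (hypothetical helper)."""

    def __init__(self):
        self.n = 0                          # examples used for training so far
        self.counts = defaultdict(Counter)  # attribute -> value -> count

    def prob(self, attr, value):
        # Laplace-smoothed estimate of P(attr = value | rule covers the example).
        seen = self.counts[attr]
        k = len(seen) + (0 if value in seen else 1)  # distinct values incl. this one
        return (seen[value] + 1) / (self.n + k)

    def anomaly_score(self, example):
        # Mean "surprise" over attributes: 0 is typical, values near 1 are unusual.
        # Assumes a non-empty dict of nominal attribute values.
        return sum(1.0 - self.prob(a, v) for a, v in example.items()) / len(example)

    def update(self, example):
        self.n += 1
        for a, v in example.items():
            self.counts[a][v] += 1


def process(stats, example, threshold=0.8, warmup=5):
    """Score an example covered by the rule; train on it only if not flagged.

    threshold and warmup are illustrative values, not taken from the paper.
    """
    score = stats.anomaly_score(example)
    if stats.n >= warmup and score > threshold:
        return True, score      # signaled to the user, not used for training
    stats.update(example)
    return False, score
```

In this sketch a flagged example leaves the rule's statistics untouched, which mirrors the statement that anomalous examples are not used to train the classifier.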
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
LG13010: Czech Republic representation in the European Research Consortium for Informatics and Mathematics (ERCIM)
Continuities
P - Research and development project financed from public funds (with a link to CEP)
S - Specific research at universities
Others
Publication year
2013
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Discovery Science, Proceedings of 16th International Conference DS 2013
ISBN
9783642408960
ISSN
0302-9743
e-ISSN
—
Number of pages
15
Pages from-to
49-63
Publisher name
Springer
Place of publication
Berlin Heidelberg
Event location
Singapore
Event date
Oct 6, 2013
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—