Generalised linear model-based algorithm for detection of outliers in environmental data and comparison with semi-parametric outlier detection methods
Result identifiers
Result code in IS VaVaI
RIV/62156489:43110/19:43915809 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F62156489%3A43110%2F19%3A43915809)
Alternative codes found
RIV/60162694:G42__/19:00536896
Result on the web
https://doi.org/10.1016/j.apr.2019.01.010
DOI - Digital Object Identifier
10.1016/j.apr.2019.01.010
Alternative languages
Result language
English
Title in original language
Generalised linear model-based algorithm for detection of outliers in environmental data and comparison with semi-parametric outlier detection methods
Result description in original language
Outliers are often present in large datasets of air pollutant concentrations. Existing methods for detecting outliers in environmental data fall into three groups, depending on the character of the data: methods for time series, methods for time series measured simultaneously with accompanying variables, and methods for spatial data. Many methods proposed for the automatic detection of outliers in time series are limited by the assumption that the distribution of the analysed variable is known. Because environmental variables are often influenced by accompanying factors, their distribution is difficult to estimate. Using the available information about accompanying variables, together with methods designed for time series measured simultaneously with such variables, can significantly improve outlier detection. This paper presents a method for the automatic detection of outliers in PM10 aerosols measured simultaneously with accompanying variables. The method is based on a generalised linear model and subsequent analysis of its residuals, and it exploits the additional information carried by the accompanying variables. The results of the proposed procedure are compared with those of two distribution-free outlier detection methods for time series previously proposed by the authors. A simulation-based comparison of all three procedures showed that the procedure presented in this paper effectively detects outliers that deviate by at least 5 standard deviations from the mean of the neighbouring observations, and that it outperforms both distribution-free outlier detection methods for time series.
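The general idea in the description (fit a generalised linear model on an accompanying variable, then analyse the residuals) can be illustrated with a minimal sketch. This is not the authors' algorithm: the Gamma family with log link, the single covariate (a hypothetical wind speed), the median/MAD residual rule, the threshold of 5 and the synthetic data are all assumptions made only for illustration, since the record does not specify them.

```python
import math
import random
import statistics


def ols_line(x, z):
    """Least-squares intercept and slope for z ~ b0 + b1 * x."""
    mx, mz = statistics.fmean(x), statistics.fmean(z)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b1 = sxz / sxx
    return mz - b1 * mx, b1


def fit_gamma_log_glm(x, y, n_iter=25):
    """Fit log(E[y]) = b0 + b1 * x by IRLS (assumed Gamma family, log link)."""
    # Starting values: ordinary regression on log(y); requires y > 0.
    b0, b1 = ols_line(x, [math.log(yi) for yi in y])
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        # For Gamma + log link the IRLS weights are constant, so each step is
        # an unweighted regression of the working response on x.
        z = [e + (yi - math.exp(e)) / math.exp(e) for e, yi in zip(eta, y)]
        b0, b1 = ols_line(x, z)
    return b0, b1


def flag_outliers(x, y, threshold=5.0):
    """Return indices whose Pearson residual is far from the residual median."""
    b0, b1 = fit_gamma_log_glm(x, y)
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    resid = [(yi - m) / m for yi, m in zip(y, mu)]  # Gamma Pearson residuals
    med = statistics.median(resid)
    # Robust scale estimate (MAD scaled to match a normal standard deviation).
    scale = 1.4826 * statistics.median([abs(r - med) for r in resid])
    return [i for i, r in enumerate(resid) if abs(r - med) > threshold * scale]


# Synthetic data (hypothetical): PM10 decreases with wind speed.
random.seed(0)
n = 200
wind = [random.uniform(0.0, 10.0) for _ in range(n)]
true_mu = [math.exp(3.5 - 0.1 * w) for w in wind]
pm10 = [random.gammavariate(25.0, m / 25.0) for m in true_mu]
pm10[50] = 4.0 * true_mu[50]  # inject one gross outlier at index 50

flagged = flag_outliers(wind, pm10)
print(flagged)
```

Because the residuals are computed relative to the fitted mean, a value that is unremarkable in the raw series but implausible given the covariate can still be flagged, which is the benefit the description attributes to using accompanying variables.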
Classification
Type
Jimp - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10103 - Statistics and probability
Result linkages
Project
—
Linkages
I - Institutional support for the long-term conceptual development of a research organisation
Other
Year of implementation
2019
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Name of the periodical
Atmospheric Pollution Research
ISSN
1309-1042
e-ISSN
—
Volume of the periodical
10
Issue of the periodical within the volume
4
Country of the periodical's publisher
TR - Republic of Turkey
Number of pages
9
Pages from-to
1015-1023
UT code of the article in WoS
000472996900002
EID of the result in the Scopus database
2-s2.0-85067862378