Implementation of DBSCAN Clustering Algorithm within the Framework of the Objective Clustering Inductive Technology Based on R and Knime Tools
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F44555601%3A13440%2F19%3A43894606" target="_blank" >RIV/44555601:13440/19:43894606 - isvavai.cz</a>
Result on the web
<a href="http://ric.zntu.edu.ua/article/view/163652" target="_blank" >http://ric.zntu.edu.ua/article/view/163652</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.15588/1607-3274-2019-1-8" target="_blank" >10.15588/1607-3274-2019-1-8</a>
Alternative languages
Result language
English
Title in original language
Implementation of DBSCAN Clustering Algorithm within the Framework of the Objective Clustering Inductive Technology Based on R and Knime Tools
Result description in original language
Context. The paper considers data clustering within the framework of the objective clustering inductive technology and describes a practical implementation of the resulting hybrid model using R and KNIME together. The object of the study is a hybrid data clustering model that combines the DBSCAN clustering algorithm with the objective clustering inductive technology. Objective. The aim of the work is to create a hybrid objective clustering model based on the DBSCAN algorithm and to implement it in practice using both R and KNIME. Method. Inductive methods of complex systems modelling are used to determine the optimal parameters of the DBSCAN algorithm within the framework of the objective clustering inductive technology. The practical implementation of this technology involves: the use of two equal power subsets, which contain the same number of pairwise similar objects; calculation of the internal and external clustering quality criteria; and calculation of the complex balance criterion, whose maximum value corresponds to the best clustering in terms of the criteria used. The process has two main stages. First, the optimal value of the EPS parameter is determined for each minPts value within the range over which minPts is varied; this stage yields a chart of the complex balance criterion versus EPS for each minPts value. Then the intermediate results are analysed to select the optimal solution, which must both maximise the complex balance criterion and suit the aims of the current clustering task.
Results. The hybrid model was implemented in KNIME using plugins written in R. The efficiency of the model was tested on several data sets: low-dimensional data from the School of Computing of the University of Eastern Finland; Fisher's iris data; and gene expression profiles of patients examined for lung cancer. Conclusions. The simulation results show high efficiency of the proposed method: the studied objects were assigned to clusters correctly in all cases. The method reduces the reproducibility error because the optimal parameters of the clustering algorithm are chosen from both the clustering results obtained on each equal power subset separately and the difference between the clustering results on the two subsets.
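The two-stage parameter search described in the abstract can be illustrated with a minimal Python sketch (the published work uses R plugins in KNIME; Python is used here only for illustration). The abstract does not give the formulas for the internal, external, or complex balance criteria, so the versions below are hypothetical stand-ins: `internal_quality`, `external_agreement`, `balance`, `split_into_similar_halves`, and `sweep` are illustrative names, and the greedy nearest-neighbour pairing is only one assumed way of building two equal power subsets of pairwise similar objects.

```python
import math
from itertools import combinations

NOISE = -1

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (-1 marks noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nb)
        cluster += 1
    return labels

def split_into_similar_halves(points):
    """Assumed split: pair each point with its nearest unpaired neighbour."""
    remaining = list(range(len(points)))
    a, b = [], []
    while len(remaining) >= 2:
        i = remaining.pop(0)
        j = min(remaining, key=lambda k: math.dist(points[i], points[k]))
        remaining.remove(j)
        a.append(points[i])
        b.append(points[j])
    return a, b

def internal_quality(points, labels):
    """Hypothetical internal criterion: compactness weighted by non-noise share."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        if NOISE in (labels[i], labels[j]):
            continue
        d = math.dist(points[i], points[j])
        (within if labels[i] == labels[j] else between).append(d)
    if not within or not between:
        return 0.0  # fewer than two clusters: nothing to compare
    compactness = 1.0 - (sum(within) / len(within)) / (sum(between) / len(between))
    coverage = sum(l != NOISE for l in labels) / len(labels)
    return max(0.0, compactness) * coverage

def external_agreement(la, lb):
    """Hypothetical external criterion: Rand-style agreement over paired objects."""
    def same(labels, i, j):
        return labels[i] == labels[j] and labels[i] != NOISE
    pairs = list(combinations(range(len(la)), 2))
    return sum(same(la, i, j) == same(lb, i, j) for i, j in pairs) / len(pairs)

def balance(a, b, eps, min_pts):
    """Hypothetical balance criterion: geometric mean of the three scores."""
    la, lb = dbscan(a, eps, min_pts), dbscan(b, eps, min_pts)
    product = internal_quality(a, la) * internal_quality(b, lb) * external_agreement(la, lb)
    return product ** (1 / 3)

def sweep(points, eps_grid, min_pts_grid):
    """Stage 1: for each minPts, return the eps that maximises the balance score."""
    a, b = split_into_similar_halves(points)
    best = {}
    for m in min_pts_grid:
        scores = {eps: balance(a, b, eps, m) for eps in eps_grid}
        best_eps = max(scores, key=scores.get)
        best[m] = (best_eps, scores[best_eps])
    return best
```

`sweep` covers only the first stage: it returns, for each minPts value, the EPS value with the highest balance score. The second stage, choosing among these candidates according to the aims of the clustering task, is left to the analyst, as in the abstract.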
Classification
Type
J<sub>ost</sub> - Other articles in peer-reviewed periodicals
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
—
Linkages
I - Institutional support for the long-term conceptual development of a research organisation
Other
Year of implementation
2019
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific to the result type
Periodical name
Radio Electronics, Computer Science, Control
ISSN
1607-3274
e-ISSN
—
Periodical volume
2019
Issue number within the volume
1
Country of the periodical's publisher
UA - Ukraine
Number of pages of the result
10
Pages from-to
77-88
UT WoS code of the article
000465002700006
EID of the result in the Scopus database
—