Efficient algorithms for mining clickstream patterns using pseudo-IDLists
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F70883521%3A28140%2F20%3A63524899" target="_blank" >RIV/70883521:28140/20:63524899 - isvavai.cz</a>
Result on the web
<a href="https://www.sciencedirect.com/science/article/pii/S0167739X19314475" target="_blank" >https://www.sciencedirect.com/science/article/pii/S0167739X19314475</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.future.2020.01.034" target="_blank" >10.1016/j.future.2020.01.034</a>
Alternative languages
Result language
English
Title in original language
Efficient algorithms for mining clickstream patterns using pseudo-IDLists
Result description in original language
Sequential pattern mining is an important task in data mining. Its subproblem, clickstream pattern mining, is attracting growing research interest due to the growth of the Internet and the need to analyze online customer behavior. To date, only a few works have been proposed specifically for mining clickstream patterns. Although general sequential pattern mining algorithms can be applied, their performance may suffer and they may require more resources than a dedicated method for mining clickstreams. In this paper, we present pseudo-IDList, a novel data structure that is better suited to clickstream pattern mining. Based on this structure, a vertical format algorithm named CUP (Clickstream pattern mining Using Pseudo-IDList) is proposed. Furthermore, we propose a pruning heuristic named DUB (Dynamic intersection Upper Bound) to improve the proposed algorithm. Experiments on four real-life clickstream databases show that the proposed methods are effective and efficient in terms of runtime and memory consumption. © 2020 Elsevier B.V.
Classification
Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
The result was created during the implementation of multiple projects.
Linkages
P - Research and development project financed from public funds (with a link to CEP)
Other
Year of implementation
2020
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Name of the periodical
FUTURE GENERATION COMPUTER SYSTEMS
ISSN
0167-739X
e-ISSN
—
Volume of the periodical
107
Issue of the periodical within the volume
Not stated
Country of the periodical's publisher
NL - Netherlands
Number of pages of the result
13
Pages from-to
18-30
UT WoS code of the article
000527331800002
EID of the result in the Scopus database
2-s2.0-85078857727