Random-Forest-Based Analysis of URL Paths
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F67985807%3A_____%2F17%3A00478626" target="_blank" >RIV/67985807:_____/17:00478626 - isvavai.cz</a>
Result on the web
<a href="http://ceur-ws.org/Vol-1885/129.pdf" target="_blank">http://ceur-ws.org/Vol-1885/129.pdf</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in the original language
Random-Forest-Based Analysis of URL Paths
Result description in the original language
Malicious web sites are one of the key sources of malware spread: they either trick the user into installing malware that imitates legitimate software or, in the case of various exploit kits, initiate malware installation without any user action. The most common technique against such web sites is blacklisting. However, it provides little to no information about new sites never seen before. Therefore, there has been important research into predicting malicious web sites based on their features. This work-in-progress paper presents a lightweight prediction method that uses solely lexical features of the site URL and classification by random forests. To this end, three feature-extraction approaches have been elaborated and investigated on real-world data sets with respect to precision and recall. The obtained results indicate that there is almost never a significant difference between the considered methods and that, despite the limitation to lexical features of the site URL, they achieve an impressive performance in terms of area under the precision-recall curve for the path parts of URLs.
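The description above does not spell out which lexical features the paper extracts, so the following is only a minimal sketch of what lexical feature extraction from the path part of a URL could look like. The function name `path_features` and all five features are illustrative assumptions, not the authors' three extraction methods. Vectors of this kind could then be passed to any random forest implementation.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def path_features(url):
    """Return a few purely lexical features of the URL path.

    The feature set is hypothetical: it illustrates the idea of using
    only the characters of the path, not the paper's actual features.
    """
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    n = len(path)

    # Shannon entropy of the character distribution in the path.
    counts = Counter(path)
    entropy = (
        -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    )

    return {
        "path_length": float(n),
        "segment_count": float(len(segments)),
        "digit_ratio": sum(ch.isdigit() for ch in path) / n if n else 0.0,
        "max_segment_length": float(max((len(s) for s in segments), default=0)),
        "char_entropy": entropy,
    }


if __name__ == "__main__":
    # Example URL is made up for illustration.
    print(path_features("http://example.com/a/b1/setup.exe"))
```

Because the features depend only on the URL string, they can be computed without fetching the page, which is what makes such a method lightweight.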
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
<a href="/cs/project/GA17-01251S" target="_blank" >GA17-01251S: Metalearning for the extraction of rules with numerical consequents</a>
Linkages
P - Research and development project financed from public funds (with a link to CEP)
Other
Year of implementation
2017
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Article name in the proceedings
Proceedings ITAT 2017: Information Technologies - Applications and Theory
ISBN
978-1974274741
ISSN
1613-0073
e-ISSN
—
Number of pages
7
Pages from-to
129-135
Publisher name
Technical University & CreateSpace Independent Publishing Platform
Place of publication
Aachen & Charleston
Event location
Martinské hole
Event date
22 September 2017
Event type by nationality
EUR - European event
UT WoS article code
—