Random-Forest-Based Analysis of URL Paths
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F67985807%3A_____%2F17%3A00478626" target="_blank" >RIV/67985807:_____/17:00478626 - isvavai.cz</a>
Result on the web
<a href="http://ceur-ws.org/Vol-1885/129.pdf" target="_blank" >http://ceur-ws.org/Vol-1885/129.pdf</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Random-Forest-Based Analysis of URL Paths
Original language description
Malicious web sites are one of the key sources of malware spread: they either trick the user into installing malware that imitates legitimate software or, in the case of various exploit kits, initiate malware installation without any user action. The most common technique against such web sites is blacklisting. However, it provides little to no information about new sites that have never been seen before. Therefore, there has been important research into predicting malicious web sites based on their features. This work-in-progress paper presents a lightweight prediction method that uses solely lexical features of the site URL and classification by random forests. To this end, three possibilities of feature extraction have been elaborated and investigated on real-world data sets with respect to precision and recall. The obtained results indicate that there is almost never a significant difference between the considered methods, and that, despite the limitation to lexical features of the site URL, they achieve impressive performance in terms of area under the precision-recall curve for the path parts of URLs.
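As a minimal sketch of what "lexical features of the URL path" can look like in practice, the following Python function computes a few simple statistics from the path part of a URL. The function name and the specific features are hypothetical illustrations; the record does not state which three feature extraction variants the paper uses, and the random forest classification step is not shown here.

```python
from urllib.parse import urlsplit
import string


def path_lexical_features(url: str) -> dict:
    """Compute illustrative lexical features from the path part of a URL.

    The feature set is an assumption made for this sketch, not the
    feature set of the paper.
    """
    # Keep only the path; scheme, host, query and fragment are ignored.
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    length = len(path)

    digits = sum(c.isdigit() for c in path)
    letters = sum(c.isalpha() for c in path)
    # Punctuation other than the segment separator "/".
    specials = sum(c in string.punctuation and c != "/" for c in path)

    return {
        "path_length": length,
        "segment_count": len(segments),
        "longest_segment": max((len(s) for s in segments), default=0),
        "digit_ratio": digits / length if length else 0.0,
        "letter_ratio": letters / length if length else 0.0,
        "special_count": specials,
    }


if __name__ == "__main__":
    print(path_lexical_features("http://example.com/a/b12/index.php?id=3"))
```

Feature dictionaries of this kind could then be turned into numeric vectors and passed to any random forest implementation, with precision and recall evaluated on held-out URLs.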
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
<a href="/en/project/GA17-01251S" target="_blank" >GA17-01251S: Metalearning for Extraction of Rules with Numerical Consequents</a><br>
Continuities
P - Research and development project financed from public sources (with a link to CEP)
Others
Publication year
2017
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings ITAT 2017: Information Technologies - Applications and Theory
ISBN
978-1974274741
ISSN
1613-0073
e-ISSN
—
Number of pages
7
Pages from-to
129-135
Publisher name
Technical University & CreateSpace Independent Publishing Platform
Place of publication
Aachen & Charleston
Event location
Martinské hole
Event date
Sep 22, 2017
Type of event by nationality
EUR - European event
UT code for WoS article
—