Phishing Email Detection based on Named Entity Recognition
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F19%3A00330505" target="_blank" >RIV/68407700:21230/19:00330505 - isvavai.cz</a>
Alternative codes found
RIV/68407700:21240/19:00330505 RIV/68407700:21730/19:00330505
Result on the web
<a href="http://www.scitepress.org/ProceedingsDetails.aspx?ID=2JXfLZNuB94=&t=1" target="_blank" >http://www.scitepress.org/ProceedingsDetails.aspx?ID=2JXfLZNuB94=&t=1</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.5220/0007314202520256" target="_blank" >10.5220/0007314202520256</a>
Alternative languages
Result language
English
Title in the original language
Phishing Email Detection based on Named Entity Recognition
Result description in the original language
This work evaluates two phishing detection algorithms, both based on named entity recognition (NER), on live traffic of Email.cz. The first algorithm was proposed in (Ramanathan and Wechsler, 2013). It uses NER and latent Dirichlet allocation (LDA) as feature extractors for a random forest classifier. This algorithm achieved a 100% F-measure on the publicly available testing dataset. We use this algorithm as the baseline for our newly proposed solution. The proposed solution takes the companies detected by NER and compares the URLs present in the email content to the company URL profile (based on history). The company URL profile contains domains which are frequently mentioned in legitimate traffic from that domain. The advantage of the proposed solution is that it does not need a phishing dataset, which is hard to obtain, especially for languages other than English. Our solution outperforms the baseline solution. Both solutions are able to detect previously undetected phishing attacks. A combination of the solutions achieves a 100% F-measure on a portion of the live traffic.
Title in English
Phishing Email Detection based on Named Entity Recognition
Result description in English
This work evaluates two phishing detection algorithms, both based on named entity recognition (NER), on live traffic of Email.cz. The first algorithm was proposed in (Ramanathan and Wechsler, 2013). It uses NER and latent Dirichlet allocation (LDA) as feature extractors for a random forest classifier. This algorithm achieved a 100% F-measure on the publicly available testing dataset. We use this algorithm as the baseline for our newly proposed solution. The proposed solution takes the companies detected by NER and compares the URLs present in the email content to the company URL profile (based on history). The company URL profile contains domains which are frequently mentioned in legitimate traffic from that domain. The advantage of the proposed solution is that it does not need a phishing dataset, which is hard to obtain, especially for languages other than English. Our solution outperforms the baseline solution. Both solutions are able to detect previously undetected phishing attacks. A combination of the solutions achieves a 100% F-measure on a portion of the live traffic.
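The description above outlines a company URL profile built from historical legitimate traffic. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes that company names have already been extracted from each email by some NER step (not shown). The class name CompanyUrlProfiles, the helper domain_of, the min_share threshold, and the example company and domains are all hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


class CompanyUrlProfiles:
    """Hypothetical per-company profile of domains seen in legitimate email."""

    def __init__(self, min_share: float = 0.05):
        # Assumed threshold: a domain counts as "frequent" if it makes up
        # at least this share of the URLs observed for the company.
        self.min_share = min_share
        self.counts = defaultdict(Counter)  # company -> domain -> count

    def observe(self, company: str, urls: list[str]) -> None:
        """Record URLs from a legitimate email that mentions the company."""
        for url in urls:
            domain = domain_of(url)
            if domain:
                self.counts[company][domain] += 1

    def profile(self, company: str) -> set[str]:
        """Return the domains frequently seen for the company so far."""
        counter = self.counts.get(company)
        if not counter:
            return set()
        total = sum(counter.values())
        return {d for d, n in counter.items() if n / total >= self.min_share}

    def is_suspicious(self, company: str, urls: list[str]) -> bool:
        """Flag an email whose URLs fall outside the company's profile."""
        known = self.profile(company)
        if not known:
            return False  # no history for this company, so no verdict
        domains = {domain_of(u) for u in urls} - {""}
        return any(d not in known for d in domains)


profiles = CompanyUrlProfiles()
for _ in range(9):
    profiles.observe("Example Bank", ["https://www.bank.example/login"])
profiles.observe("Example Bank", ["https://cdn.bank.example/logo.png"])

legit = profiles.is_suspicious("Example Bank", ["https://bank.example/news"])
phish = profiles.is_suspicious(
    "Example Bank", ["http://bank.example.attacker.example/login"]
)
```

Because the profile is learned only from legitimate history, this sketch needs no labelled phishing examples, which matches the advantage the description claims. A real deployment would also have to decide how to treat companies with little or no history; here such emails are simply not flagged.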
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
—
Linkages
I - Institutional support for the long-term conceptual development of a research organization
Other
Year of implementation
2019
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Title of the article in the proceedings
Proceedings of the 5th International Conference on Information Systems Security and Privacy
ISBN
978-989-758-359-9
ISSN
—
e-ISSN
—
Number of pages of the result
5
Pages from-to
252-256
Publisher name
SciTePress
Place of publication
Madeira
Event location
Praha
Event date
23. 2. 2019
Event type by nationality of participants
WRD - Worldwide event
UT WoS code of the article
—