The Importance of Token Granularity Matching of Pre-trained Word Vectors for Deep Learning-Based Spam Classification
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F21%3A10441733" target="_blank" >RIV/00216208:11320/21:10441733 - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.1109/ICNLP52887.2021.00007" target="_blank" >https://doi.org/10.1109/ICNLP52887.2021.00007</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ICNLP52887.2021.00007" target="_blank" >10.1109/ICNLP52887.2021.00007</a>
Alternative languages
Result language
English
Original language name
The Importance of Token Granularity Matching of Pre-trained Word Vectors for Deep Learning-Based Spam Classification
Original language description
Spam email detection is an active research topic, and the most efficient detection methods are based on deep learning. Given the extensive use of pre-trained word vectors in deep neural networks, this paper studies the impact of pre-trained word vector models on a Text-CNN-based spam classification model and uses token granularity matching to optimize how the Word2Vec pre-trained word vector model represents spam emails. A comparison of classification accuracy and time complexity with and without token granularity matching shows that Word2Vec pre-trained word vectors combined with token granularity processing can improve the performance of the Text-CNN model on spam email classification.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2021
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings - 2021 3rd International Conference on Natural Language Processing, ICNLP 2021
ISBN
978-1-66541-411-1
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
129-133
Publisher name
IEEE Conference Publishing Services
Place of publication
Piscataway
Event location
Beijing
Event date
Mar 26, 2021
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—