Review Spam Detection Using Word Embeddings and Deep Neural Networks
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216275%3A25410%2F19%3A39914919" target="_blank" >RIV/00216275:25410/19:39914919 - isvavai.cz</a>
Result on the web
<a href="https://link.springer.com/chapter/10.1007/978-3-030-19823-7_28" target="_blank" >https://link.springer.com/chapter/10.1007/978-3-030-19823-7_28</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-030-19823-7_28" target="_blank" >10.1007/978-3-030-19823-7_28</a>
Alternative languages
Result language
English
Title in the original language
Review Spam Detection Using Word Embeddings and Deep Neural Networks
Result description in the original language
Review spam (fake review) detection is increasingly important given the rapid growth of internet purchases. Therefore, sophisticated spam filters must be designed to tackle the problem. Traditional machine learning algorithms use review content and other features to detect review spam. However, as demonstrated in related studies, the linguistic context of words may be of particular importance for text categorization. To enhance the performance of review spam detection, we propose a novel content-based approach that considers both bag-of-words and word context. More precisely, our approach utilizes n-grams and the skip-gram word embedding method to build a vector model. As a result, a high-dimensional feature representation is generated. To handle this representation and classify review spam accurately, a deep feed-forward neural network is used in the second step. To verify our approach, we use two hotel review datasets, including positive and negative reviews. We show that the proposed detection system outperforms other popular algorithms for review spam detection in terms of accuracy and area under the ROC curve. Importantly, the system provides balanced performance on both classes, legitimate and spam, irrespective of review polarity.
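The first step of the description above (n-grams plus skip-gram word context feeding one feature vector) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy vocabulary, the two-dimensional toy embeddings, and the choice to average word vectors per review are all assumptions made for illustration; the record does not say how the embeddings are aggregated or how the network is configured, so the feed-forward classifier itself is not shown.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgram_pairs(tokens, window=2):
    """Return the (center, context) pairs a skip-gram model is trained on."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def review_features(tokens, vocab, embeddings, dim):
    """Concatenate n-gram counts over `vocab` with the mean word vector.

    Averaging the word vectors is an assumption of this sketch, not a
    detail taken from the record.
    """
    counts = Counter(ngrams(tokens, 1) + ngrams(tokens, 2))
    bow = [float(counts[g]) for g in vocab]
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if vectors:
        mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    else:
        mean = [0.0] * dim  # no known words: fall back to a zero vector
    return bow + mean


# Hypothetical toy data; real embeddings would come from a trained model.
tokens = "the room was very clean".split()
vocab = [("room",), ("clean",), ("very", "clean")]
embeddings = {"room": [1.0, 0.0], "clean": [0.0, 1.0]}

features = review_features(tokens, vocab, embeddings, dim=2)
print(features)  # [1.0, 1.0, 1.0, 0.5, 0.5]
```

In a full pipeline the resulting vector would be the input to the feed-forward classifier mentioned in the description, and the pairs from `skipgram_pairs` would be the training examples for the embedding model.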
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific university research<br>I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2019
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific to the result type
Article name in the collection
IFIP Advances in Information and Communication Technology. Vol. 559
ISBN
978-3-030-19822-0
ISSN
1868-4238
e-ISSN
—
Number of pages
11
Pages from-to
340-350
Publisher name
Springer
Place of publication
Berlin
Event location
Hersonissos
Event date
24 May 2019
Type of event by nationality
EUR - European event
UT code for WoS article
—