Using of n-grams from morphological tags for fake news classification

The result's identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216275%3A25410%2F21%3A39917745" target="_blank" >RIV/00216275:25410/21:39917745 - isvavai.cz</a>
Result on the web
<a href="https://peerj.com/articles/cs-624/#" target="_blank" >https://peerj.com/articles/cs-624/#</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.7717/peerj-cs.624" target="_blank" >10.7717/peerj-cs.624</a>

Alternative languages

Result language
English
Original language name
Using of n-grams from morphological tags for fake news classification
Original language description
Research into techniques for effective fake news detection has become both necessary and attractive. These techniques draw on many research disciplines, including morphological analysis. Several researchers have stated that simple content-related n-grams and POS tagging have proven insufficient for fake news classification. However, in the last decade they have not published empirical results that would confirm these statements experimentally. Considering this contradiction, the main aim of the paper is to experimentally evaluate the potential of the combined use of n-grams and POS tags for the correct classification of fake and true news. A dataset of published fake and real news about the current Covid-19 pandemic was pre-processed using morphological analysis. As a result, n-grams of POS tags were prepared and further analysed. Three techniques based on POS tags were proposed and applied to different groups of n-grams in the pre-processing phase of fake news detection. The n-gram size was examined first. Subsequently, the most suitable depth of the decision trees for sufficient generalization was determined. Finally, the performance measures of models based on the proposed techniques were compared with the standard reference TF-IDF technique. The performance measures considered are accuracy, precision, recall and F1-score, evaluated with 10-fold cross-validation. At the same time, the question of whether the TF-IDF technique can be improved using POS tags was examined in detail. The results showed that the newly proposed techniques are comparable with the traditional TF-IDF technique. It can also be stated that morphological analysis can improve the baseline TF-IDF technique: two performance measures of the model, precision for fake news and recall for real news, were statistically significantly improved.
Czech name
—
Czech description
—
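The pre-processing idea named in the description (n-grams built from POS tags, then weighted and compared against TF-IDF) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the toy tag sequences are hand-written, the helper names `pos_ngrams` and `tfidf` are hypothetical, and the TF-IDF variant (term frequency times the natural log of N/df) is one common formulation, not necessarily the one used in the paper. The record does not state which tagger or classifier settings the authors used.

```python
import math
from collections import Counter


def pos_ngrams(tags, n):
    """Return the n-grams over a sequence of POS tags as joined strings."""
    return ["_".join(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def tfidf(documents):
    """Compute TF-IDF weights for a list of token lists.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        vectors.append(
            {term: (c / total) * math.log(n_docs / df[term])
             for term, c in counts.items()}
        )
    return vectors


# Hypothetical, hand-tagged toy documents (Universal POS style tags).
# In practice the tags would come from a morphological analyser.
tagged_docs = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["PRON", "VERB", "ADJ", "NOUN"],
    ["DET", "NOUN", "VERB", "ADJ"],
]

# Treat each POS-tag bigram as a "term" and weight it per document.
bigram_docs = [pos_ngrams(tags, 2) for tags in tagged_docs]
weights = tfidf(bigram_docs)
```

Vectors of this kind could then serve as input features for a decision tree evaluated with 10-fold cross-validation, as the description outlines; that step is not shown here.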

Classification

Type
J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
<a href="/en/project/GA19-15498S" target="_blank" >GA19-15498S: Modelling emotions in verbal and nonverbal managerial communication to predict corporate financial risk</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)

Others

Publication year
2021
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

Name of the periodical
PeerJ Computer Science
ISSN
2376-5992
e-ISSN
—
Volume of the periodical
7
Issue of the periodical within the volume
19.7.2021
Country of publishing house
GB - UNITED KINGDOM
Number of pages
27
Pages from-to
"e624"
UT code for WoS article
000700069900001
EID of the result in the Scopus database
2-s2.0-85112643670

Similar results(10)

A deep learning approach to building a framework for Urdu POS and NER
TwIdw—A Novel Method for Feature Extraction from Unstructured Texts
Comparison of fake and real news based on morphological analysis
