Feature extraction from unstructured texts as a combination of the morphological and the syntactic analysis and its usage in fake news classification tasks

The result's identifiers

Result code in IS VaVaI
RIV/00216275:25410/23:39920745 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216275%3A25410%2F23%3A39920745)
Result on the web
https://link.springer.com/article/10.1007/s00521-023-08967-2
DOI - Digital Object Identifier
10.1007/s00521-023-08967-2 (http://dx.doi.org/10.1007/s00521-023-08967-2)

Alternative languages

Result language
English
Original language name
Feature extraction from unstructured texts as a combination of the morphological and the syntactic analysis and its usage in fake news classification tasks
Original language description
This paper proposes a new technique for feature extraction, an essential part of natural language processing. Feature extraction transforms unstructured text into a format that computers can process, namely a vector of numbers. The study evaluates and compares three methods: M1, the baseline TfIdf method; M2, which combines TfIdf with POS tags; and M3, a novel technique called MDgwPosF that incorporates weighted TfIdf values based on word depths and the relative frequency of POS tags. The primary focus is on how M3 performs in comparison with M1 and M2. Two different datasets and feed-forward, LSTM and GRU neural networks were used. The feed-forward model using the proposed MDgwPosF method with a moderate topology achieved the best performance across various measures. The automatically created dataset yielded better results than the manually created one. The differences between methods and between topologies were not statistically significant, whereas the differences between the classification models were. MDgwPosF achieved higher accuracy than the baseline TfIdf, indicating that incorporating additional information into the vector can enhance the performance of TfIdf.
Czech name
—
Czech description
—

Classification

Type
Jimp - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation

Others

Publication year
2023
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

Name of the periodical
Neural Computing and Applications
ISSN
0941-0643
e-ISSN
1433-3058
Volume of the periodical
35
Issue of the periodical within the volume
29
Country of publishing house
US - UNITED STATES
Number of pages
13
Pages from-to
22055-22067
UT code for WoS article
001066965500041
EID of the result in the Scopus database
2-s2.0-85170046366

Similar results(10)

Language-Independent Approach for Morphological Disambiguation
TwIdw—A Novel Method for Feature Extraction from Unstructured Texts
Using of n-grams from morphological tags for fake news classification
