TwIdw—A Novel Method for Feature Extraction from Unstructured Texts
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3AFUR2H2ZI" target="_blank" >RIV/00216208:11320/23:FUR2H2ZI - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85161544753&doi=10.3390%2fapp13116438&partnerID=40&md5=e95ec9fb96be72f2f9485421db49d986" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85161544753&doi=10.3390%2fapp13116438&partnerID=40&md5=e95ec9fb96be72f2f9485421db49d986</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.3390/app13116438" target="_blank" >10.3390/app13116438</a>
Alternative languages
Result language
English
Original language name
TwIdw—A Novel Method for Feature Extraction from Unstructured Texts
Original language description
"Featured Application: The research has a potential application in the field of fake news detection. By using the feature extraction technique, TwIdw, proposed in this paper, more relevant and informative features can be extracted from the text data, which can lead to an enhancement in the accuracy of the classification models employed in these tasks. This research proposes a novel technique for fake news classification using natural language processing (NLP) methods. The proposed technique, TwIdw (Term weight–inverse document weight), is used for feature extraction and is based on TfIdf, with the term frequencies replaced by the depth of the words in documents. The effectiveness of the TwIdw technique is compared to another feature extraction method—basic TfIdf. Classification models were created using the random forest and feedforward neural networks, and within those, three different datasets were used. The feedforward neural network method with the KaiDMML dataset showed an increase in accuracy of up to 3.9%. The random forest method with TwIdw was not as successful as the neural network method and only showed an increase in accuracy with the KaiDMML dataset (1%). The feedforward neural network, on the other hand, showed an increase in accuracy with the TwIdw technique for all datasets. Precision and recall measures also confirmed good results, particularly for the neural network method. The TwIdw technique has the potential to be used in various NLP applications, including fake news classification and other NLP classification problems. © 2023 by the authors."
Czech name
—
Czech description
—
Classification
Type
J<sub>SC</sub> - Article in a specialist periodical, which is included in the SCOPUS database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2023
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
"Applied Sciences (Switzerland)"
ISSN
2076-3417
e-ISSN
—
Volume of the periodical
13
Issue of the periodical within the volume
11
Country of publishing house
US - UNITED STATES
Number of pages
15
Pages from-to
1-15
UT code for WoS article
—
EID of the result in the Scopus database
2-s2.0-85161544753