TwIdw—A Novel Method for Feature Extraction from Unstructured Texts
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3AFUR2H2ZI" target="_blank" >RIV/00216208:11320/23:FUR2H2ZI - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85161544753&doi=10.3390%2fapp13116438&partnerID=40&md5=e95ec9fb96be72f2f9485421db49d986" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85161544753&doi=10.3390%2fapp13116438&partnerID=40&md5=e95ec9fb96be72f2f9485421db49d986</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.3390/app13116438" target="_blank" >10.3390/app13116438</a>
Alternative languages
Result language
English
Title in original language
TwIdw—A Novel Method for Feature Extraction from Unstructured Texts
Result description in original language
"Featured Application: The research has a potential application in the field of fake news detection. By using the feature extraction technique, TwIdw, proposed in this paper, more relevant and informative features can be extracted from the text data, which can lead to an enhancement in the accuracy of the classification models employed in these tasks. This research proposes a novel technique for fake news classification using natural language processing (NLP) methods. The proposed technique, TwIdw (Term weight–inverse document weight), is used for feature extraction and is based on TfIdf, with the term frequencies replaced by the depth of the words in documents. The effectiveness of the TwIdw technique is compared to another feature extraction method—basic TfIdf. Classification models were created using the random forest and feedforward neural networks, and within those, three different datasets were used. The feedforward neural network method with the KaiDMML dataset showed an increase in accuracy of up to 3.9%. The random forest method with TwIdw was not as successful as the neural network method and only showed an increase in accuracy with the KaiDMML dataset (1%). The feedforward neural network, on the other hand, showed an increase in accuracy with the TwIdw technique for all datasets. Precision and recall measures also confirmed good results, particularly for the neural network method. The TwIdw technique has the potential to be used in various NLP applications, including fake news classification and other NLP classification problems. © 2023 by the authors."
Classification
Type
J<sub>SC</sub> - Article in a periodical indexed in the SCOPUS database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2023
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Name of the periodical
"Applied Sciences (Switzerland)"
ISSN
2076-3417
e-ISSN
—
Volume of the periodical
13
Issue of the periodical within the volume
11
Country of the periodical's publisher
US - United States of America
Number of pages
15
Pages from-to
1-15
UT WoS code of the article
—
EID of the result in the Scopus database
2-s2.0-85161544753