Modified frequency-based term weighting schemes for text classification
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F62690094%3A18450%2F17%3A50013621" target="_blank" >RIV/62690094:18450/17:50013621 - isvavai.cz</a>
Result on the web
<a href="http://www.sciencedirect.com/science/article/pii/S156849461730251X?via%3Dihub" target="_blank" >http://www.sciencedirect.com/science/article/pii/S156849461730251X?via%3Dihub</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.asoc.2017.04.069" target="_blank" >10.1016/j.asoc.2017.04.069</a>
Alternative languages
Result language
English
Original language name
Modified frequency-based term weighting schemes for text classification
Original language description
With the rapid growth of textual content on the Internet, automatic text categorization is a comparatively more effective solution for information organization and knowledge management. Feature selection, one of the basic phases in statistical-based text categorization, crucially depends on the term weighting methods. In order to improve the performance of text categorization, this paper proposes four modified frequency-based term weighting schemes, namely mTF, mTFIDF, TFmIDF, and mTFmIDF. The proposed term weighting schemes take the amount of missing terms into account when calculating the weight of existing terms. The proposed schemes show the highest performance for an SVM classifier, with a micro-average F1 classification performance value of 97%. Moreover, benchmarking results on the Reuters-21578, 20Newsgroups, and WebKB text-classification datasets, using different classification algorithms such as SVM and KNN, show that the proposed schemes mTF, mTFIDF, and mTFmIDF outperform other weighting schemes such as TF, TFIDF, and Entropy. Additionally, statistical significance tests show a significant enhancement of the classification performance based on the modified schemes.
Czech name
—
Czech description
—
Classification
Type
J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2017
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Applied Soft Computing
ISSN
1568-4946
e-ISSN
—
Volume of the periodical
58
Issue of the periodical within the volume
September
Country of publishing house
NL - THE KINGDOM OF THE NETHERLANDS
Number of pages
14
Pages from-to
193-206
UT code for WoS article
000405457500015
EID of the result in the Scopus database
2-s2.0-85018921015