Word-Graph vs. bag-of-words feature extraction for solving author identification problem
The result's identifiers
Result code in IS VaVaI
RIV/61989100:27510/19:10243815 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27510%2F19%3A10243815)
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Word-Graph vs. bag-of-words feature extraction for solving author identification problem
Original language description
In this paper we examine multiple methods for solving the problem of text vectorization in the context of text classification. We compare two variants of the traditional Bag-of-Words technique with a newly proposed Word-Graph approach, which represents a text document as a graph and measures similarities between graph structures. We further propose modifications to the Word-Graph method that may improve classification accuracy. Experiments on an author identification problem, using a dataset of speeches made during meetings of the Slovak National Parliament, show that the Word-Graph approach achieves accuracy similar to that of the traditional methods. The proposed modifications significantly improve performance when the number of documents per class in the training set is imbalanced. (C) 2019 VSB-Technical University of Ostrava. All rights reserved.
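To make the contrast in the description concrete, the following is a minimal sketch of the two kinds of representation. It is not the paper's method: this record does not describe how the Word-Graph is built or how graph similarity is measured, so the adjacency-based edges, the cosine similarity, and all function names (`bag_of_words`, `word_graph`, `cosine`) are assumptions made purely for illustration.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Bag-of-Words: map each word to its count; word order is discarded."""
    return Counter(tokenize(text))


def word_graph(text):
    """Illustrative word graph (assumed construction, not the paper's):
    a directed edge (w1, w2) is weighted by how often w2 directly follows w1."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two hypothetical documents that share most words but differ in word order.
doc_a = "the law was passed by the parliament"
doc_b = "the parliament passed the law"

bow_similarity = cosine(bag_of_words(doc_a), bag_of_words(doc_b))
graph_similarity = cosine(word_graph(doc_a), word_graph(doc_b))

print(f"bag-of-words similarity: {bow_similarity:.3f}")
print(f"word-graph similarity:   {graph_similarity:.3f}")
```

In this toy example the word-count vectors of the two documents are close, while the edge-based vectors are further apart, because the edges also encode which word follows which. Whether that extra information helps classification is the question the paper's experiments address.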
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2019
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the 13th International Conference on Strategic Management and its Support by Information Systems: May 21th-22th, 2019, Ostrava, Czech Republic
ISBN
978-80-248-4305-6
ISSN
2570-5776
e-ISSN
—
Number of pages
8
Pages from-to
418-425
Publisher name
VŠB - Technical University of Ostrava
Place of publication
Ostrava
Event location
Ostrava
Event date
May 21, 2019
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—