Improving Multi-label Document Classification of Czech News Articles
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F15%3A43926586" target="_blank" >RIV/49777513:23520/15:43926586 - isvavai.cz</a>
Result on the web
<a href="http://link.springer.com/chapter/10.1007/978-3-319-24033-6_35" target="_blank" >http://link.springer.com/chapter/10.1007/978-3-319-24033-6_35</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-24033-6_35" target="_blank" >10.1007/978-3-319-24033-6_35</a>
Alternative languages
Result language
English
Original language name
Improving Multi-label Document Classification of Czech News Articles
Original language description
In this paper, we present our improvement of a multi-label document classifier for text filtering in a corpus of Czech news articles, where the relevant topics of an arbitrary document are to be assigned automatically. Different vector space models, classifiers, and thresholding strategies were investigated, and performance was measured in terms of sample-wise average F1 score. The results show that the performance of our baseline naive Bayes classifier can be improved by a relative 25% when using a linear SVC classifier with a sublinear tf-idf vector space model, and by a further relative 6.1% when using a regressor-based sample-wise thresholding strategy.
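The description reports performance as a sample-wise average F1 score. As a minimal sketch of how such a metric is commonly computed, assuming each document's topics are represented as a set of label strings: the function names, the handling of empty label sets, and the toy labels below are illustrative assumptions and are not taken from the paper.

```python
def sample_f1(true_labels, predicted_labels):
    """F1 score for a single document, comparing two sets of topic labels."""
    true_set, pred_set = set(true_labels), set(predicted_labels)
    if not true_set and not pred_set:
        # Assumption: predicting nothing for a document with no topics is perfect.
        return 1.0
    overlap = len(true_set & pred_set)  # correctly assigned topics
    # F1 = 2*TP / (2*TP + FP + FN) = 2*|overlap| / (|true| + |predicted|)
    return 2 * overlap / (len(true_set) + len(pred_set))


def samplewise_average_f1(gold, predicted):
    """Mean of the per-document F1 scores over a corpus."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one document is required")
    scores = [sample_f1(t, p) for t, p in zip(gold, predicted)]
    return sum(scores) / len(scores)


# Toy data (hypothetical topic labels, not from the Czech news corpus).
gold = [{"politics", "economy"}, {"sport"}, {"culture", "film"}]
predicted = [{"politics"}, {"sport"}, {"culture", "music"}]

score = samplewise_average_f1(gold, predicted)
print(round(score, 4))
```

Averaging per document, rather than per label, weights every article equally regardless of how many topics it carries, which suits a filtering task where each article is judged on its own set of assigned topics.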
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20205 - Automation and control systems
Result continuities
Project
<a href="/en/project/GBP103%2F12%2FG084" target="_blank" >GBP103/12/G084: Center for Large Scale Multi-modal Data Interpretation</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2015
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Text, Speech, and Dialogue, 18th International Conference, TSD 2015, Pilsen, Czech Republic, September 14-17, 2015, Proceedings
ISBN
978-3-319-24032-9
ISSN
0302-9743
e-ISSN
—
Number of pages
9
Pages from-to
307-315
Publisher name
Springer
Place of publication
Berlin
Event location
Plzeň, Czech Republic
Event date
Sep 14, 2015
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000365947800035