Investigation of Latent Semantic Analysis for Clustering of Czech News Articles
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F14%3A%230002973" target="_blank" >RIV/46747885:24220/14:#0002973 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1109/DEXA.2014.54" target="_blank" >http://dx.doi.org/10.1109/DEXA.2014.54</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/DEXA.2014.54" target="_blank" >10.1109/DEXA.2014.54</a>
Alternative languages
Result language
English
Original language name
Investigation of Latent Semantic Analysis for Clustering of Czech News Articles
Original language description
This paper studies the use of Latent Semantic Analysis (LSA) for automatic clustering of Czech news articles. We show that LSA is capable of yielding good results in this task as it allows us to reduce the problem of synonymy. This is a very important factor particularly for Czech, which belongs to a group of highly inflective and morphologically rich languages. The experimental evaluation of our clustering scheme and investigation of LSA is performed on query- and category-based test sets. The obtained results demonstrate that the automatic system yields values of the Rand index that are lower, by 20% in absolute terms, than the accuracy of human cluster annotations. We also show which similarity metric should be used for cluster merging and the effect of dimension reduction on clustering accuracy.
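The Rand index mentioned in the description can be illustrated with a minimal sketch. It uses the standard pair-counting definition (the fraction of item pairs on which two clusterings agree), which may differ in detail from the evaluation protocol used in the paper. The function name `rand_index` and the toy labels are assumptions made for illustration only; they are not data from the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two items: nothing to disagree on
    agreements = 0
    for i, j in pairs:
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        if same_in_a == same_in_b:
            agreements += 1
    return agreements / len(pairs)


# Hypothetical toy example: six articles, human labels vs. system clusters.
human = ["sport", "sport", "politics", "politics", "culture", "culture"]
system = [0, 0, 1, 1, 1, 2]
score = rand_index(human, system)
print(f"Rand index: {score:.2f}")
```

Cluster names do not need to match between the two labelings, because only the same-cluster or different-cluster relation of each pair is compared.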
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JC - Computer hardware and software
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/TA01011204" target="_blank" >TA01011204: Living Archives</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2014
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proc. of the 25th International Workshop on Database and Expert Systems Applications (DEXA), 2014
ISBN
978-1-4799-5721-7
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
223-227
Publisher name
IEEE
Place of publication
Germany
Event location
Munich, Germany
Event date
Jan 1, 2014
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—