Investigation of Latent Semantic Analysis for Clustering of Czech News Articles

The result's identifiers

  • Result code in IS VaVaI

    RIV/46747885:24220/14:#0002973 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F14%3A%230002973)

  • Result on the web

    http://dx.doi.org/10.1109/DEXA.2014.54

  • DOI - Digital Object Identifier

    10.1109/DEXA.2014.54

Alternative languages

  • Result language

    English

  • Original language name

    Investigation of Latent Semantic Analysis for Clustering of Czech News Articles

  • Original language description

    This paper studies the use of Latent Semantic Analysis (LSA) for automatic clustering of Czech news articles. We show that LSA is capable of yielding good results in this task as it allows us to reduce the problem of synonymy. This is a very important factor particularly for Czech, which belongs to a group of highly inflective and morphologically rich languages. The experimental evaluation of our clustering scheme and investigation of LSA is performed on query- and category-based test sets. The results show that the automatic system yields Rand index values that are 20% lower in absolute terms than the accuracy of human cluster annotations. We also show which similarity metric should be used for cluster merging and the effect of dimension reduction on clustering accuracy.

  • Czech name

  • Czech description

Classification

  • Type

    D - Article in proceedings

  • CEP classification

    JC - Computer hardware and software

  • OECD FORD branch

Result continuities

  • Project

    TA01011204: Living Archives (/en/project/TA01011204)

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)

Others

  • Publication year

    2014

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Proc. of the 25th International Workshop on Database and Expert Systems Applications (DEXA), 2014

  • ISBN

    978-1-4799-5721-7

  • ISSN

  • e-ISSN

  • Number of pages

    5

  • Pages from-to

    223-227

  • Publisher name

    IEEE

  • Place of publication

    Germany

  • Event location

    Munich, Germany

  • Event date

    Jan 1, 2014

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article