Unsupervised Document Classification and Topic Detection

The result's identifiers

  • Result code in IS VaVaI

    RIV/49777513:23520/17:43932650 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F17%3A43932650)

  • Result on the web

    https://link.springer.com/chapter/10.1007%2F978-3-319-66429-3_75

  • DOI - Digital Object Identifier

    10.1007/978-3-319-66429-3_75 (http://dx.doi.org/10.1007/978-3-319-66429-3_75)

Alternative languages

  • Result language

    English

  • Original language name

    Unsupervised Document Classification and Topic Detection

  • Original language description

    This article presents a method for pre-processing the feature vectors that represent text documents, which are subsequently classified using unsupervised methods. The main goal is to show that state-of-the-art classification methods can be improved by a particular data preparation process. Two methods are tested: standard K-means clustering and Latent Dirichlet allocation (LDA). Both are widely used in text processing. The algorithms are applied to two data sets in two different languages. The first, 20NewsGroup, is a widely used benchmark for the classification of English documents. The second was selected from a large body of Czech news articles and was used mainly to compare the performance of the tested methods on a less frequently studied language. Furthermore, the unsupervised methods are compared with supervised ones in order to ascertain, in some sense, an upper bound for the task.

  • Czech name

  • Czech description
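
As a rough illustration of the pipeline the description above refers to (documents turned into feature vectors, then grouped by K-means), the following minimal sketch may help. It is not the authors' method: the record does not say what their pre-processing step is, so this sketch assumes plain TF-IDF weighting with L2 normalisation as a stand-in. The toy corpus, the function names (`tfidf_vectors`, `kmeans`) and the fixed seed documents are all hypothetical and chosen only to keep the example deterministic.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (a dict) per document.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1. This is a
    generic weighting, not the pre-processing proposed in the paper.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vec = {
            t: count * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
            for t, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_distance(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    keys = set(a) | set(b)
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for k, w in v.items():
            total[k] += w
    return {k: w / len(vectors) for k, w in total.items()}


def kmeans(vectors, seed_indices, max_iter=20):
    """Standard K-means; centroids start at the given seed documents."""
    centroids = [dict(vectors[i]) for i in seed_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every document.
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


# Hypothetical toy corpus: two sports documents, two space documents.
docs = [
    "the team won the hockey game",
    "the hockey team lost the final game",
    "nasa launched a new space probe",
    "the space probe reached orbit",
]
vectors = tfidf_vectors(docs)
labels = kmeans(vectors, seed_indices=(0, 2))
print(labels)
```

On real data such as 20NewsGroup, a library implementation with random restarts would be the usual choice; the fixed seeds here only make the toy run reproducible.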

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    20205 - Automation and control systems

Result continuities

  • Project

    DG16P02B048: System for permanent preservation of documentation and presentation of historical sources from the period of totalitarian regimes

  • Continuities

    P - Research and development project financed from public sources (with a link to CEP)

Others

  • Publication year

    2017

  • Confidentiality

    S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Speech and Computer: 19th International Conference, SPECOM 2017, Hatfield, UK, September 12-16, 2017, Proceedings

  • ISBN

    978-3-319-66428-6

  • ISSN

    0302-9743

  • e-ISSN

    not stated

  • Number of pages

    9

  • Pages from-to

    748-756

  • Publisher name

    Springer

  • Place of publication

    Cham

  • Event location

    Hatfield, Hertfordshire, United Kingdom

  • Event date

    Sep 12, 2017

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article