All

What are you looking for?

All
Projects
Results
Organizations

Quick search

  • Projects supported by TA ČR
  • Excellent projects
  • Projects with the highest public support
  • Current projects

Smart search

  • That is how I find a specific +word
  • That is how I leave the -word out of the results
  • “That is how I can find the whole phrase”

Revealing Groups of Semantically Close Textual Documents by Clustering: Problems and Possibilities

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F62156489%3A43110%2F15%3A43906670" target="_blank" >RIV/62156489:43110/15:43906670 - isvavai.cz</a>

  • Result on the web

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    angličtina

  • Original language name

    Revealing Groups of Semantically Close Textual Documents by Clustering: Problems and Possibilities

  • Original language description

    The chapter introduces clustering as a family of algorithms that can be successfully used to organize text documents into groups without prior knowledge of these groups. The chapter also demonstrates using unsupervised clustering to group large amount ofunlabeled textual data (customer reviews written informally in five natural languages) so it can be used later for further analysis. The attention is paid to the process of selecting clustering algorithms, their parameters, methods of data preprocessing, and to the methods of evaluating the results by a human expert with an assistance of computers, too. The feasibility has been demonstrated by a number of experiments with external evaluation using known labels and expert validation with an assistance of a computer. It has been found that it is possible to apply the same procedures, including clustering, cluster validation, and detection of topics and significant words for different natural languages with satisfactory results.

  • Czech name

  • Czech description

Classification

  • Type

    C - Chapter in a specialist book

  • CEP classification

    IN - Informatics

  • OECD FORD branch

Result continuities

  • Project

  • Continuities

    S - Specificky vyzkum na vysokych skolach

Others

  • Publication year

    2015

  • Confidentiality

    S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Data specific for result type

  • Book/collection name

    Modern Computational Models of Semantic Discovery in Natural Language

  • ISBN

    978-1-4666-8690-8

  • Number of pages of the result

    41

  • Pages from-to

    71-111

  • Number of pages of the book

    334

  • Publisher name

    IGI Global

  • Place of publication

    Hershey

  • UT code for WoS chapter