Revealing Groups of Semantically Close Textual Documents by Clustering: Problems and Possibilities
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F62156489%3A43110%2F17%3A43910950" target="_blank" >RIV/62156489:43110/17:43910950 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.4018/978-1-5225-1759-7.ch081" target="_blank" >http://dx.doi.org/10.4018/978-1-5225-1759-7.ch081</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.4018/978-1-5225-1759-7.ch081" target="_blank" >10.4018/978-1-5225-1759-7.ch081</a>
Alternative languages
Result language
English
Original language name
Revealing Groups of Semantically Close Textual Documents by Clustering: Problems and Possibilities
Original language description
The chapter introduces clustering as a family of algorithms that can be successfully used to organize text documents into groups without prior knowledge of these groups. It also demonstrates the use of unsupervised clustering to group a large amount of unlabeled textual data (customer reviews written informally in five natural languages) so that it can be used later for further analysis. Attention is paid to the selection of clustering algorithms and their parameters, to data preprocessing methods, and to methods for evaluating the results by a human expert with computer assistance. Feasibility has been demonstrated by a number of experiments using external evaluation against known labels and computer-assisted expert validation. It has been found that the same procedures, including clustering, cluster validation, and detection of topics and significant words, can be applied to different natural languages with satisfactory results.
Czech name
—
Czech description
—
Classification
Type
C - Chapter in a specialist book
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific university research
Others
Publication year
2017
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Book/collection name
Artificial Intelligence: Concepts, Methodologies, Tools, and Applications
ISBN
978-1-5225-1759-7
Number of pages of the result
40
Pages from-to
1981-2020
Number of pages of the book
3048
Publisher name
IGI Global
Place of publication
Hershey
UT code for WoS chapter
—