Comparison of Selected Methods for Document Clustering
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F67985807%3A_____%2F11%3A00356107" target="_blank" >RIV/67985807:_____/11:00356107 - isvavai.cz</a>
Alternative codes found
RIV/61384399:31140/11:00036039
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Comparison of Selected Methods for Document Clustering
Original language description
Seventeen cluster analysis techniques proposed for document clustering are compared in terms of internal and external clustering quality measures and computing time. They are combinations of three basic methods (direct, repeated bisection, and agglomerative) and five clustering criterion functions for solution assessment (two intra-cluster, one inter-cluster, and two complex ones), all implemented in the CLUTO software package. In addition, for the agglomerative method, single linkage and complete linkage were applied as criterion functions. The 20 Newsgroups collection, in a binary vector representation of e-mail messages, was used to compare the methods. The experiments showed that, in terms of entropy and purity, the direct method gives the best results. In terms of computing time, the repeated bisection (divisive) method was the fastest.
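The description names entropy and purity as the external quality measures. As a minimal sketch of how such measures are commonly computed, the Python below assumes the usual size-weighted definitions, with entropy normalised by the logarithm of the number of classes. The function name `purity_and_entropy` and its arguments are hypothetical and are not taken from the paper or from CLUTO.

```python
from collections import Counter
from math import log


def purity_and_entropy(labels, clusters):
    """Return (purity, entropy) of a clustering against known class labels.

    Assumed definitions (not confirmed by the record itself):
      purity  = sum over clusters of (largest class count in cluster) / n
      entropy = sum over clusters of (n_r / n) * E_r, where
                E_r = -(1 / log q) * sum_i p_i * log(p_i),
                p_i = share of class i in cluster r, q = number of classes.
    """
    if len(labels) != len(clusters):
        raise ValueError("labels and clusters must have the same length")

    n = len(labels)
    q = len(set(labels))

    # Count how many documents of each class fall into each cluster.
    per_cluster = {}
    for label, cluster in zip(labels, clusters):
        per_cluster.setdefault(cluster, Counter())[label] += 1

    purity = 0.0
    entropy = 0.0
    for counts in per_cluster.values():
        n_r = sum(counts.values())
        purity += max(counts.values()) / n
        if q > 1:  # with a single class the entropy is zero by convention
            e_r = -sum((k / n_r) * log(k / n_r) for k in counts.values()) / log(q)
            entropy += (n_r / n) * e_r
    return purity, entropy
```

Under these definitions, a clustering that matches the classes exactly has purity 1 and entropy 0; higher purity and lower entropy indicate better agreement with the known classes.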
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
BB - Applied statistics, operational research
OECD FORD branch
—
Result continuities
Project
Result was created during the realization of more than one project. More information in the Projects tab.
Continuities
Z - Research plan (with a link to CEZ)
Others
Publication year
2011
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Advances in Intelligent Web Mastering - 3
ISBN
978-3-642-18028-6
ISSN
—
e-ISSN
—
Number of pages
10
Pages from-to
—
Publisher name
Springer
Place of publication
Berlin
Event location
Fribourg
Event date
Jan 26, 2011
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—