Application of Multinomial Mixture Model to Text Classification
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F03%3A03087707" target="_blank" >RIV/68407700:21230/03:03087707 - isvavai.cz</a>
Alternative codes found
RIV/67985556:_____/03:16030062
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Application of Multinomial Mixture Model to Text Classification
Original language description
The goal of text document classification is to assign a new document to one of the predefined classes based on its contents. In this paper, a mixture of multinomial distributions is proposed as a model for class-conditional distributions in the document classification task. A bag-of-words approach to vector document representation is employed. It is shown that the proposed model can improve the accuracy of the Bayes document classifier in comparison with Bayes classifiers based on the multivariate Bernoulli model, the multinomial model, and the multivariate Bernoulli mixture model. Experimental results on the Reuters and the Newsgroups data sets indicate the effectiveness of the multinomial mixture model. Furthermore, an increase in classification accuracy is achieved for small training data sets when multiclass Bhattacharyya distance is used instead of average mutual information as a
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JC - Computer hardware and software
OECD FORD branch
—
Result continuities
Project
Result was created during the realization of more than one project. More information in the Projects tab.
Continuities
Z - Research plan (with a link to CEZ)
Others
Publication year
2003
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Pattern Recognition and Image Analysis
ISBN
3-540-40217-9
ISSN
—
e-ISSN
—
Number of pages
8
Pages from-to
646-653
Publisher name
Springer
Place of publication
Berlin
Event location
Puerto de Andratx, Mallorca
Event date
Jun 4, 2003
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—