Vector representation of context networks of latent topics
The result's identifiers
Result code in IS VaVaI
RIV/68407700:21230/13:00214138 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F13%3A00214138)
Alternative codes found
RIV/68407700:21240/13:00214138
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Vector representation of context networks of latent topics
Original language description
Transforming text documents into real-valued vectors is an essential step in text mining tasks such as classification, clustering and information retrieval. The extracted vectors serve as inputs for data mining models. The large vocabularies of natural languages imply a high dimensionality of input vectors; hence a substantial dimensionality reduction has to be made. We propose a new approach to the vector representation of text documents. Our representation takes into account the order of the latent topics that generate the observed words; an extracted document vector includes information about the adjacency of words in a document. We show experimentally that the proposed representation makes it possible to build document classifiers of higher accuracy using shorter document vectors. Short but informative document vectors save memory for storing data, allow the use of simpler models that learn faster, and significantly reduce overfitting.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2013
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the World Congress on Engineering 2013
ISBN
978-988-19251-0-7
ISSN
2078-0958
e-ISSN
—
Number of pages
5
Pages from-to
286-290
Publisher name
Newswood Limited - International Association of Engineers
Place of publication
Hong Kong
Event location
London
Event date
May 3, 2013
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—