Vector representation of context networks of latent topics

Result identifiers

Result code in IS VaVaI
RIV/68407700:21230/13:00214138 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F13%3A00214138)
Alternative codes found
RIV/68407700:21240/13:00214138
Result on the web
—
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Title in the original language
Vector representation of context networks of latent topics
Result description in the original language
Transforming text documents into real-valued vectors is an essential step in text mining tasks such as classification, clustering and information retrieval. The extracted vectors serve as inputs for data mining models. The large vocabularies of natural languages imply high-dimensional input vectors; hence a substantial dimensionality reduction is required. We propose a new approach to the vector representation of text documents. Our representation takes into account the order of the latent topics that generate the observed words, so an extracted document vector includes information about the adjacency of words in a document. Our experiments show that the proposed representation makes it possible to build more accurate document classifiers using shorter document vectors. Short but informative document vectors save memory, allow the use of simpler models that learn faster, and significantly reduce overfitting.
Title in English
Vector representation of context networks of latent topics
Result description in English
Transforming text documents into real-valued vectors is an essential step in text mining tasks such as classification, clustering and information retrieval. The extracted vectors serve as inputs for data mining models. The large vocabularies of natural languages imply high-dimensional input vectors; hence a substantial dimensionality reduction is required. We propose a new approach to the vector representation of text documents. Our representation takes into account the order of the latent topics that generate the observed words, so an extracted document vector includes information about the adjacency of words in a document. Our experiments show that the proposed representation makes it possible to build more accurate document classifiers using shorter document vectors. Short but informative document vectors save memory, allow the use of simpler models that learn faster, and significantly reduce overfitting.

Classification

Type
D - Article in proceedings
CEP field
IN - Informatics
OECD FORD field
—

Result continuities

Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation

Others

Publication year
2013
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Article title in proceedings
Proceedings of the World Congress on Engineering 2013
ISBN
978-988-19251-0-7
ISSN
2078-0958
e-ISSN
—
Number of pages
5
Pages from-to
286-290
Publisher name
Newswood Limited - International Association of Engineers
Place of publication
Hong Kong
Event location
London
Event date
3. 5. 2013
Event type by nationality
WRD - Worldwide event
UT code for WoS article
—

Similar results (10)

Klasifikace textových dokumentů použitím směsových modelů (Classification of text documents using mixture models)
Opinion mining of consumer reviews using deep neural networks with word-sentiment associations
Application of Finite Mixtures to Text Document Classification
