Vector representation of context networks of latent topics

The result's identifiers

  • Result code in IS VaVaI

    RIV/68407700:21230/13:00214138 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F13%3A00214138)

  • Alternative codes found

    RIV/68407700:21240/13:00214138

  • Result on the web

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Original language name

    Vector representation of context networks of latent topics

  • Original language description

    Transforming text documents into real-valued vectors is an essential step for text mining tasks such as classification, clustering and information retrieval. The extracted vectors serve as inputs for data mining models. The large vocabularies of natural languages imply a high dimensionality of input vectors; hence a substantial dimensionality reduction has to be made. We propose a new approach to the vector representation of text documents. Our representation takes into account the order of the latent topics that generate the observed words; an extracted document vector includes information about the adjacency of words in a document. We demonstrated experimentally that the proposed representation makes it possible to build document classifiers of higher accuracy using shorter document vectors. Short but informative document vectors save memory for storing data, allow the use of simpler models that learn faster, and significantly reduce overfitting.

  • Czech name

  • Czech description
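
As an illustration of the idea in the original language description above (a document vector that records which latent topics occur next to each other), here is a minimal Python sketch. It is not the authors' method: it assumes each word has already been assigned a topic id by some topic model, and the function name `topic_adjacency_vector` and its parameters are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def topic_adjacency_vector(topic_sequence: Sequence[int], num_topics: int) -> List[float]:
    """Return a flattened num_topics x num_topics vector of adjacent topic-pair frequencies.

    Assumption: topic_sequence[i] is the latent topic id (0 .. num_topics - 1)
    assigned to the i-th word of the document by a separately trained topic model.
    """
    # Count each ordered pair (topic of word i, topic of word i + 1).
    pair_counts = Counter(zip(topic_sequence, topic_sequence[1:]))
    total = sum(pair_counts.values())

    vector = [0.0] * (num_topics * num_topics)
    if total == 0:
        # Documents with fewer than two words have no adjacent pairs.
        return vector

    for (first, second), count in pair_counts.items():
        # Row-major position of the pair in the flattened matrix.
        vector[first * num_topics + second] = count / total
    return vector


# Usage: a four-word document whose words were assigned topics 0, 1, 1, 0.
doc_vector = topic_adjacency_vector([0, 1, 1, 0], num_topics=2)
```

The vector length depends only on the number of topics, not on the vocabulary size, which is the kind of dimensionality reduction the description refers to.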

Classification

  • Type

    D - Article in proceedings

  • CEP classification

    IN - Informatics

  • OECD FORD branch

Result continuities

  • Project

  • Continuities

    I - Institutional support for the long-term conceptual development of a research organization

Others

  • Publication year

    2013

  • Confidentiality

    S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Proceedings of the World Congress on Engineering 2013

  • ISBN

    978-988-19251-0-7

  • ISSN

    2078-0958

  • e-ISSN

  • Number of pages

    5

  • Pages from-to

    286-290

  • Publisher name

    Newswood Limited - International Association of Engineers

  • Place of publication

    Hong Kong

  • Event location

    London

  • Event date

    May 3, 2013

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article