Text Mining with Latent Semantic Analysis
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F62156489%3A43110%2F15%3A43909062" target="_blank" >RIV/62156489:43110/15:43909062 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Language of the result
English
Title in the original language
Text Mining with Latent Semantic Analysis
Description of the result in the original language
Latent semantic analysis is presented as an advanced model for representing text data, used to eliminate the problems of processing large numbers of text documents. The paper describes information retrieval, classification, and clustering experiments with real-world document collections in four different natural languages. Different preprocessing techniques are applied to the data, and the experiments are carried out both with the original data and with the data projected into a reduced-dimensionality space after singular value decomposition. The results show that the effect of preprocessing differed depending on the language of the documents, and that singular value decomposition has a positive impact mainly on the computational aspects of the text-mining processes.
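The projection into a reduced-dimensionality space mentioned above can be illustrated with a small sketch. This is not the paper's implementation: it assumes NumPy, raw term counts without any weighting or preprocessing, a hypothetical toy term-by-document matrix, the common LSA convention of folding a query in as q_hat = S_k^{-1} U_k^T q, and cosine similarity for ranking. All function names are hypothetical.

```python
import numpy as np


def lsa_fit(term_doc: np.ndarray, k: int):
    """Rank-k truncated SVD of a term-by-document matrix A (terms x documents).

    Returns U_k, the k largest singular values s_k, and V_k^T, so that
    A is approximated by U_k @ diag(s_k) @ V_k^T.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def fold_in(term_vec: np.ndarray, u_k: np.ndarray, s_k: np.ndarray) -> np.ndarray:
    """Project a term vector (e.g. a query) into the k-dimensional latent space.

    Assumed convention: q_hat = S_k^{-1} U_k^T q, which maps column j of A
    exactly onto column j of V_k^T.
    """
    return (u_k.T @ term_vec) / s_k


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0


def rank_documents(term_doc: np.ndarray, query: np.ndarray, k: int):
    """Rank documents by cosine similarity to the query in the reduced space."""
    u_k, s_k, vt_k = lsa_fit(term_doc, k)
    q_hat = fold_in(query, u_k, s_k)
    # Each column of V_k^T holds one document's k latent coordinates.
    scores = [cosine(q_hat, vt_k[:, j]) for j in range(term_doc.shape[1])]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Hypothetical toy data: rows are the terms car, automobile, flower, petal;
    # columns are four documents (raw term counts, no weighting).
    A = np.array([[2, 1, 0, 0],
                  [1, 2, 0, 0],
                  [0, 0, 1, 2],
                  [0, 0, 2, 1]], dtype=float)
    order, scores = rank_documents(A, A[:, 0], k=2)
    print(order, [round(x, 3) for x in scores])
```

With k = 2, the first two documents end up with identical latent coordinates even though they weight "car" and "automobile" differently, while the two flower documents score close to zero. Comparisons also run in k dimensions instead of one per term, which is the kind of computational saving the abstract attributes to singular value decomposition.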
Classification
Type
O - Other results
CEP field
IN - Informatics
OECD FORD field
—
Result relations
Project
—
Relations
S - Specific research at universities
Other
Year of application
2015
Data confidentiality code
S - Complete and accurate project data are not subject to protection under special legal regulations