Revealing Prevailing Semantic Contents of Clusters Generated from Untagged Freely Written Text Documents in Natural Languages
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F62156489%3A43110%2F13%3A00215855" target="_blank" >RIV/62156489:43110/13:00215855 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1007/978-3-642-40585-3_55" target="_blank" >http://dx.doi.org/10.1007/978-3-642-40585-3_55</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-642-40585-3_55" target="_blank" >10.1007/978-3-642-40585-3_55</a>
Alternative languages
Result language
English
Title in original language
Revealing Prevailing Semantic Contents of Clusters Generated from Untagged Freely Written Text Documents in Natural Languages
Result description in original language
The presented work deals with the automatic detection of the semantic content of groups of text documents freely written in various natural languages. A large original set of untagged documents is split into a requested number of clusters according to the user's needs. Each cluster is then treated as a class, and a classifier (a decision tree) is induced. The words used by the tree represent significant terms that define the semantics of the individual clusters. The importance (weight) of the terms combined in individual tree branches is computed according to their particular significance from the viewpoint of correct classification: a certain word combined with other words may lead to different classes, but a specific class can strongly prevail. The results are demonstrated on large data sets composed of many hotel-service customer reviews written in six different natural languages.
Title in English
Revealing Prevailing Semantic Contents of Clusters Generated from Untagged Freely Written Text Documents in Natural Languages
Result description in English
The presented work deals with the automatic detection of the semantic content of groups of text documents freely written in various natural languages. A large original set of untagged documents is split into a requested number of clusters according to the user's needs. Each cluster is then treated as a class, and a classifier (a decision tree) is induced. The words used by the tree represent significant terms that define the semantics of the individual clusters. The importance (weight) of the terms combined in individual tree branches is computed according to their particular significance from the viewpoint of correct classification: a certain word combined with other words may lead to different classes, but a specific class can strongly prevail. The results are demonstrated on large data sets composed of many hotel-service customer reviews written in six different natural languages.
Classification
Type
D - Article in proceedings
CEP field
IN - Informatics
OECD FORD field
—
Result continuities
Project
—
Continuities
Z - Research plan (with a link to CEZ)
Others
Publication year
2013
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Title of the article in the proceedings
Text, Speech, and Dialogue
ISBN
978-3-642-40584-6
ISSN
—
e-ISSN
—
Number of pages of the result
8
Pages from-to
434-441
Publisher name
Springer
Place of publication
Heidelberg New York Dordrecht London
Event location
Pilsen
Event date
1 September 2013
Event type by nationality
WRD - Worldwide event
UT WoS article code
—