Automated Mining of Relevant N-grams in Relation to Predominant Topics of Text Documents
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F62156489%3A43110%2F15%3A43906663" target="_blank" >RIV/62156489:43110/15:43906663 - isvavai.cz</a>
Result on the web
<a href="http://link.springer.com/chapter/10.1007%2F978-3-319-24033-6_52" target="_blank" >http://link.springer.com/chapter/10.1007%2F978-3-319-24033-6_52</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-24033-6_52" target="_blank" >10.1007/978-3-319-24033-6_52</a>
Alternative languages
Result language
English
Title in the original language
Automated Mining of Relevant N-grams in Relation to Predominant Topics of Text Documents
Result description in the original language
The article describes a method focused on the automatic analysis of large collections of short Internet textual documents, freely written in various natural languages and represented as sparse vectors, to reveal what multi-word phrases are relevant in relation to a given basic categorization. In addition, the revealed phrases serve for discovering additional, different predominant topics that are not explicitly expressed by the basic categories. The main idea is to look for n-grams, where an n-gram is a collocation of n consecutive words. This leads to the problem of relevant feature selection, where a feature is an n-gram that provides more information than an individual word. The feature selection is carried out by entropy minimization, which returns a set of combined relevant n-grams and can be used for creating rules, decision trees, or information retrieval. The results are demonstrated for English, German, Spanish, and Russian customer reviews of hotel services publicly available on t
Title in English
Automated Mining of Relevant N-grams in Relation to Predominant Topics of Text Documents
Result description in English
The article describes a method focused on the automatic analysis of large collections of short Internet textual documents, freely written in various natural languages and represented as sparse vectors, to reveal what multi-word phrases are relevant in relation to a given basic categorization. In addition, the revealed phrases serve for discovering additional, different predominant topics that are not explicitly expressed by the basic categories. The main idea is to look for n-grams, where an n-gram is a collocation of n consecutive words. This leads to the problem of relevant feature selection, where a feature is an n-gram that provides more information than an individual word. The feature selection is carried out by entropy minimization, which returns a set of combined relevant n-grams and can be used for creating rules, decision trees, or information retrieval. The results are demonstrated for English, German, Spanish, and Russian customer reviews of hotel services publicly available on t
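The idea in the description above, scoring n-grams by how much they reduce the entropy of the category labels, can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenization, the information-gain scoring, the function names, and the four toy reviews are all assumptions made for illustration only.

```python
import math
from collections import Counter


def ngrams(text, n):
    """Return the set of n-grams (tuples of n consecutive lowercased words)."""
    words = text.lower().split()  # assumption: naive whitespace tokenization
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def entropy(labels):
    """Shannon entropy, in bits, of a list of category labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(doc_ngrams, labels, gram):
    """Entropy reduction from splitting documents on the presence of gram."""
    present = [y for grams, y in zip(doc_ngrams, labels) if gram in grams]
    absent = [y for grams, y in zip(doc_ngrams, labels) if gram not in grams]
    total = len(labels)
    remainder = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - remainder


def rank_ngrams(docs, labels, n):
    """Rank all n-grams in docs by information gain, highest first."""
    doc_ngrams = [ngrams(d, n) for d in docs]
    vocabulary = set().union(*doc_ngrams)
    scored = [(information_gain(doc_ngrams, labels, g), g) for g in vocabulary]
    # Sort by descending gain; ties are broken alphabetically by n-gram.
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Hypothetical toy reviews, not data from the article.
docs = [
    "the room was very clean",
    "very clean room and friendly staff",
    "the room was very dirty",
    "very dirty bathroom and rude staff",
]
labels = ["positive", "positive", "negative", "negative"]

for gain, gram in rank_ngrams(docs, labels, n=2)[:3]:
    print(f"{gain:.3f}  {' '.join(gram)}")
```

In this toy collection the bigrams "very clean" and "very dirty" separate the two categories perfectly, whereas "the room" occurs equally in both and carries no information about the category.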
Classification
Type
D - Article in proceedings
CEP field
IN - Informatics
OECD FORD field
—
Result continuities
Project
—
Continuities
S - Specific research at universities
Others
Publication year
2015
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Article name in the proceedings
Text, Speech and Dialogue (TSD 2015)
ISBN
978-3-319-24032-9
ISSN
0302-9743
e-ISSN
—
Number of pages
9
Pages from-to
461-469
Publisher name
Springer Switzerland
Place of publication
Cham
Event location
Plzeň
Event date
14. 9. 2015
Type of event by nationality
WRD - Worldwide event
UT WoS code of the article
000365947800052