Automated Mining of Relevant N-grams in Relation to Predominant Topics of Text Documents
The result's identifiers
Result code in IS VaVaI
RIV/62156489:43110/15:43906663 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F62156489%3A43110%2F15%3A43906663)
Result on the web
http://link.springer.com/chapter/10.1007%2F978-3-319-24033-6_52
DOI - Digital Object Identifier
10.1007/978-3-319-24033-6_52 (http://dx.doi.org/10.1007/978-3-319-24033-6_52)
Alternative languages
Result language
English
Original language name
Automated Mining of Relevant N-grams in Relation to Predominant Topics of Text Documents
Original language description
The article describes a method for the automatic analysis of large collections of short Internet text documents, freely written in various natural languages and represented as sparse vectors, to reveal which multi-word phrases are relevant in relation to a given basic categorization. In addition, the revealed phrases serve for discovering additional predominant topics that are not explicitly expressed by the basic categories. The main idea is to look for n-grams, where an n-gram is a collocation of n consecutive words. This leads to the problem of relevant feature selection, where a feature is an n-gram that provides more information than an individual word. The feature selection is carried out by entropy minimization, which returns a set of combined relevant n-grams and can be used for creating rules, decision trees, or information retrieval. The results are demonstrated for English, German, Spanish, and Russian customer reviews of hotel services publicly available on t
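The description above names two steps: extracting n-grams as runs of n consecutive words, and selecting the relevant ones by entropy minimization. The record does not state the exact procedure, so the following is only a minimal sketch under assumptions: it uses information gain (the drop in class entropy when documents are split on the presence of an n-gram) as one common entropy-based criterion, and the function names `ngrams`, `entropy`, and `rank_ngrams` as well as the toy reviews are hypothetical, not taken from the article.

```python
import math
from collections import Counter


def ngrams(text, n):
    """Return all n-grams (tuples of n consecutive lowercase tokens)."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_ngrams(docs, labels, n=2):
    """Rank n-grams by information gain with respect to the class labels.

    Assumption: information gain stands in for the article's
    'entropy minimization'; the original criterion may differ.
    """
    base = entropy(labels)
    doc_sets = [set(ngrams(d, n)) for d in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for gram in vocab:
        present = [y for s, y in zip(doc_sets, labels) if gram in s]
        absent = [y for s, y in zip(doc_sets, labels) if gram not in s]
        # Weighted entropy that remains after splitting on this n-gram.
        remainder = (len(present) * entropy(present)
                     + len(absent) * entropy(absent)) / len(labels)
        scores[gram] = base - remainder
    # Highest gain first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy reviews with a basic positive/negative categorization.
docs = [
    "very clean room",
    "very clean room and friendly staff",
    "very dirty room",
    "very dirty room and rude staff",
]
labels = ["pos", "pos", "neg", "neg"]

ranked = rank_ngrams(docs, labels, n=2)
scores = dict(ranked)
for gram, gain in ranked[:4]:
    print(" ".join(gram), round(gain, 3))
```

In this toy data, bigrams that occur in only one class separate the categories fully, while a bigram such as "room and" that occurs equally in both classes carries no information about the category.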
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
—
Continuities
S - Specific research at universities
Others
Publication year
2015
Confidentiality
S - Complete and truthful data about the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Text, Speech and Dialogue (TSD 2015)
ISBN
978-3-319-24032-9
ISSN
0302-9743
e-ISSN
—
Number of pages
9
Pages from-to
461-469
Publisher name
Springer Switzerland
Place of publication
Cham
Event location
Plzeň
Event date
Sep 14, 2015
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000365947800052