Automated Mining of Relevant N-grams in Relation to Predominant Topics of Text Documents

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F62156489%3A43110%2F15%3A43906663" target="_blank" >RIV/62156489:43110/15:43906663 - isvavai.cz</a>
Result on the web
<a href="http://link.springer.com/chapter/10.1007%2F978-3-319-24033-6_52" target="_blank" >http://link.springer.com/chapter/10.1007%2F978-3-319-24033-6_52</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-24033-6_52" target="_blank" >10.1007/978-3-319-24033-6_52</a>

Result language
English
Original language name
Automated Mining of Relevant N-grams in Relation to Predominant Topics of Text Documents
Original language description
The article describes a method focused on the automatic analysis of large collections of short Internet textual documents, freely written in various natural languages and represented as sparse vectors, to reveal what multi-word phrases are relevant in relation to a given basic categorization. In addition, the revealed phrases serve for discovering additional different predominant topics, which are not explicitly expressed by the basic categories. The main idea is to look for n-grams where an n-gram is a collocation of n consecutive words. This leads to the problem of relevant feature selection where a feature is an n-gram that provides more information than an individual word. The feature selection is carried out by entropy minimization which returns a set of combined relevant n-grams and can be used for creating rules, decision trees, or information retrieval. The results are demonstrated for English, German, Spanish, and Russian customer reviews of hotel services publicly available on t
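The description above names two ingredients: n-grams as runs of n consecutive words, and feature selection by entropy minimization. The record does not spell out the exact procedure, so the following is only a minimal sketch under stated assumptions: whitespace tokenization, n-gram presence as a binary feature, and information gain (the reduction in class entropy after splitting on a feature) as the selection criterion. The function names and the toy reviews are hypothetical and do not come from the article.

```python
import math
from collections import Counter


def ngrams(text, n):
    """Return all n-grams (tuples of n consecutive lowercased tokens)."""
    tokens = text.lower().split()  # assumption: plain whitespace tokenization
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, gram):
    """Entropy reduction obtained by splitting documents on presence of `gram`."""
    n = len(gram)
    present = [y for d, y in zip(docs, labels) if gram in ngrams(d, n)]
    absent = [y for d, y in zip(docs, labels) if gram not in ngrams(d, n)]
    total = len(labels)
    remainder = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - remainder


def rank_ngrams(docs, labels, n=2):
    """Rank every n-gram in the collection by information gain, highest first."""
    candidates = {g for d in docs for g in ngrams(d, n)}
    scored = [(information_gain(docs, labels, g), g) for g in candidates]
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Hypothetical toy reviews with a basic two-class categorization.
docs = [
    "the room was very clean",
    "very clean and quiet room",
    "the room was very dirty",
    "very dirty and noisy room",
]
labels = ["pos", "pos", "neg", "neg"]

ranked = rank_ngrams(docs, labels, n=2)
for gain, gram in ranked[:3]:
    print(" ".join(gram), round(gain, 3))
```

In this toy collection the bigrams that separate the two classes perfectly rank first, while a bigram such as "the room", which occurs equally in both classes, has zero gain. A real collection would need proper tokenization per language and a sparse representation rather than repeated scans of the documents.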
Czech name
—
Czech description
—

Publication year
2015
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
