Distributional Semantics in Language Modeling
The result's identifiers
Result code in IS VaVaI
RIV/49777513:23520/15:43925553 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F15%3A43925553)
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Distributional Semantics in Language Modeling
Original language description
Language models are crucial for many tasks in natural language processing, and n-grams are probably the best way to build them. Huge effort is being invested in improving n-gram language models. By introducing external knowledge (morphology, syntax, etc.) into the models, a significant improvement can be achieved. The models can, however, also be improved without external knowledge, and better smoothing is an excellent example of such an improvement. By discovering hidden patterns in unlabeled training corpora, we can enhance language modeling with information that is already present in the corpora. This thesis studies three different ways of discovering latent information. Global semantics is modeled by latent Dirichlet allocation and brings long-range dependencies into language models. Word clusters given by semantic spaces enrich these language models with short-range semantics. Finally, our own unsupervised stemming algorithm is used to further enhance the performance of language models.
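To illustrate the kind of combination the abstract describes, here is a minimal sketch (not the thesis code) of an n-gram language model interpolated with a class-based component built from word clusters; the cluster assignments, smoothing constant, and interpolation weight are illustrative assumptions.

from collections import Counter

def train_counts(corpus, clusters):
    """corpus: list of token lists; clusters: dict mapping word -> cluster id (assumed given)."""
    uni, bi = Counter(), Counter()      # word unigram / bigram counts
    cuni, cbi = Counter(), Counter()    # cluster unigram / bigram counts
    for sent in corpus:
        toks = ["<s>"] + sent
        cls = [clusters.get(t, "<unk>") for t in toks]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
        cuni.update(cls)
        cbi.update(zip(cls, cls[1:]))
    return uni, bi, cuni, cbi

def prob(w_prev, w, model, clusters, lam=0.7, alpha=0.1, v=10_000):
    """P(w | w_prev) as a linear mix of a word-bigram and a cluster-bigram estimate.
    Additive smoothing stands in for the more advanced smoothing studied in the thesis."""
    uni, bi, cuni, cbi = model
    p_word = (bi[(w_prev, w)] + alpha) / (uni[w_prev] + alpha * v)
    c_prev, c = clusters.get(w_prev, "<unk>"), clusters.get(w, "<unk>")
    p_trans = (cbi[(c_prev, c)] + alpha) / (cuni[c_prev] + alpha * v)   # P(c | c_prev)
    p_emit = (uni[w] + alpha) / (cuni[c] + alpha * v)                   # approx. P(w | c)
    return lam * p_word + (1 - lam) * p_trans * p_emit

corpus = [["language", "models", "need", "smoothing"],
          ["language", "models", "use", "clusters"]]
clusters = {"language": 0, "models": 0, "need": 1, "use": 1, "smoothing": 2, "clusters": 2}
model = train_counts(corpus, clusters)
print(prob("language", "models", model, clusters))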
Czech name
—
Czech description
—
Classification
Type
O - Miscellaneous
CEP classification
JD - Use of computers, robotics and its application
OECD FORD branch
—
Result continuities
Project
—
Continuities
S - Specific university research
Others
Publication year
2015
Confidentiality
S - Complete and true project data are not subject to protection under special legal regulations