Semantic Spaces for Improving Language Modeling
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F14%3A43918497" target="_blank" >RIV/49777513:23520/14:43918497 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1016/j.csl.2013.05.001" target="_blank" >http://dx.doi.org/10.1016/j.csl.2013.05.001</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.csl.2013.05.001" target="_blank" >10.1016/j.csl.2013.05.001</a>
Alternative languages
Result language
English
Original language name
Semantic Spaces for Improving Language Modeling
Original language description
Language models are crucial for many tasks in NLP (Natural Language Processing), and n-grams are the best way to build them. A huge effort is being invested in improving n-gram language models. Introducing external information (morphology, syntax, partitioning into documents, etc.) into the models can yield a significant improvement. The models can, however, also be improved without any external information; smoothing is an excellent example of such an improvement. In this article we show another way of improving the models that likewise requires no external information. We examine patterns that can be found in large corpora by building semantic spaces (HAL, COALS, BEAGLE, and others described in this article). These semantic spaces have never before been tested in language modeling. Our method uses semantic spaces and clustering to build classes for a class-based language model. The class-based model is then coupled with a standard n-gram model to create a very effective language model. Our…
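The pipeline the description outlines (semantic space, then clustering into word classes, then a class-based model combined with an n-gram model) can be illustrated with a minimal Python sketch. It rests on assumptions not stated in this record: a HAL-style space built from a sliding window with linearly decreasing weights, spherical k-means (cosine similarity) standing in for whatever clustering the authors actually used, bigram order, add-one smoothing for the word model, and linear interpolation with a fixed weight `lam` as the way the two models are "coupled". All function and class names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def hal_space(tokens, window=3):
    """HAL-style space (assumed variant): a word's vector concatenates
    distance-weighted counts of its left ("L", w) and right ("R", w) neighbours."""
    space = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            if i - d < 0:
                break
            left = tokens[i - d]
            weight = window - d + 1  # nearer neighbours weigh more
            space[word][("L", left)] += weight
            space[left][("R", word)] += weight
    return space


def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def cosine(a, b):
    # Both vectors are unit length, so the dot product equals the cosine.
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def spherical_kmeans(space, freq, k, iterations=10):
    """Hypothetical clustering step: returns {word: class_id}."""
    words = sorted(space, key=lambda w: (-freq[w], w))
    k = min(k, len(words))
    unit = {w: normalize(space[w]) for w in words}
    centroids = [unit[w] for w in words[:k]]  # init: the k most frequent words
    assign = {}
    for _ in range(iterations):
        assign = {w: max(range(k), key=lambda c: cosine(unit[w], centroids[c]))
                  for w in words}
        for c in range(k):
            total = Counter()
            for w in words:
                if assign[w] == c:
                    total.update(unit[w])
            if total:  # keep the old centroid if the cluster emptied
                centroids[c] = normalize(total)
    return assign


class InterpolatedClassLM:
    """P(w|h) = lam * P_bigram(w|h) + (1 - lam) * P(c(w)|c(h)) * P(w|c(w))."""

    def __init__(self, tokens, word2class, lam=0.7):
        self.lam = lam
        self.vocab = sorted(set(tokens))
        self.cls = word2class
        self.n_classes = len(set(word2class.values()))
        self.unigram = Counter(tokens)
        self.bigram = Counter(zip(tokens, tokens[1:]))
        self.history = Counter(tokens[:-1])
        classes = [word2class[w] for w in tokens]
        self.class_bigram = Counter(zip(classes, classes[1:]))
        self.class_history = Counter(classes[:-1])
        self.class_total = Counter(classes)

    def p_bigram(self, w, h):
        # Add-one smoothed word bigram (assumed smoothing choice).
        return (self.bigram[(h, w)] + 1) / (self.history[h] + len(self.vocab))

    def p_class(self, w, h):
        cw, ch = self.cls[w], self.cls[h]
        if self.class_history[ch]:
            p_cc = self.class_bigram[(ch, cw)] / self.class_history[ch]
        else:
            p_cc = 1 / self.n_classes  # unseen history class: uniform over classes
        return p_cc * self.unigram[w] / self.class_total[cw]

    def prob(self, w, h):
        return self.lam * self.p_bigram(w, h) + (1 - self.lam) * self.p_class(w, h)


corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "a cat lay on a mat . a dog lay on a rug .").split()
space = hal_space(corpus, window=2)
classes = spherical_kmeans(space, Counter(corpus), k=4)
lm = InterpolatedClassLM(corpus, classes, lam=0.7)
print(round(sum(lm.prob(w, "the") for w in lm.vocab), 6))  # 1.0
```

Because both component models are proper distributions over the vocabulary, their interpolation is one as well, which the final line checks for the history "the". In practice `lam` would be tuned on held-out data rather than fixed.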
Czech name
—
Czech description
—
Classification
Type
J<sub>x</sub> - Unclassified - Peer-reviewed scientific article (Jimp, Jsc and Jost)
CEP classification
JD - Use of computers, robotics and its application
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/ED1.1.00%2F02.0090" target="_blank" >ED1.1.00/02.0090: NTIS - New Technologies for Information Society</a><br>
Continuities
P - Research and development project financed from public sources (with a link to CEP)<br>S - Specific research at universities
Others
Publication year
2014
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Computer Speech and Language
ISSN
0885-2308
e-ISSN
—
Volume of the periodical
28
Issue of the periodical within the volume
1
Country of publishing house
US - UNITED STATES
Number of pages
18
Pages from-to
192-209
UT code for WoS article
—
EID of the result in the Scopus database
—