Latent semantics in language models

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F15%3A43924529" target="_blank" >RIV/49777513:23520/15:43924529 - isvavai.cz</a>
Výsledek na webu
<a href="http://dx.doi.org/10.1016/j.csl.2015.01.004" target="_blank" >http://dx.doi.org/10.1016/j.csl.2015.01.004</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.csl.2015.01.004" target="_blank" >10.1016/j.csl.2015.01.004</a>

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
Latent semantics in language models
Popis výsledku v původním jazyce
This paper investigates three different sources of information and their integration into language modelling. Global semantics is modelled by Latent Dirichlet allocation and brings long range dependencies into language models. Word clusters given by semantic spaces enrich these language models with short range semantics. Finally, our own stemming algorithm is used to further enhance the performance of language modelling for inflectional languages. Our research shows that these three sources of information enrich each other and their combination dramatically improves language modelling. All investigated models are acquired in a fully unsupervised manner. We show the efficiency of our methods for several languages such as Czech, Slovenian, Slovak, Polish, Hungarian, and English, proving their multilingualism. The perplexity tests are accompanied by machine translation tests that prove the ability of the proposed models to improve the performance of a real-world application.
Název v anglickém jazyce
Latent semantics in language models
Popis výsledku anglicky
This paper investigates three different sources of information and their integration into language modelling. Global semantics is modelled by Latent Dirichlet allocation and brings long range dependencies into language models. Word clusters given by semantic spaces enrich these language models with short range semantics. Finally, our own stemming algorithm is used to further enhance the performance of language modelling for inflectional languages. Our research shows that these three sources of information enrich each other and their combination dramatically improves language modelling. All investigated models are acquired in a fully unsupervised manner. We show the efficiency of our methods for several languages such as Czech, Slovenian, Slovak, Polish, Hungarian, and English, proving their multilingualism. The perplexity tests are accompanied by machine translation tests that prove the ability of the proposed models to improve the performance of a real-world application.

Klasifikace

Druh
J<sub>x</sub> - Nezařazeno - Článek v odborném periodiku (Jimp, Jsc a Jost)
CEP obor
JD - Využití počítačů, robotika a její aplikace
OECD FORD obor
—

Návaznosti výsledku

Projekt
<a href="/cs/project/ED1.1.00%2F02.0090" target="_blank" >ED1.1.00/02.0090: NTIS - Nové technologie pro informační společnost</a><br>
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)<br>S - Specificky vyzkum na vysokych skolach

Ostatní

Rok uplatnění
2015
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název periodika
Computer Speech and language
ISSN
0885-2308
e-ISSN
—
Svazek periodika
33
Číslo periodika v rámci svazku
1
Stát vydavatele periodika
GB - Spojené království Velké Británie a Severního Irska
Počet stran výsledku
21
Strana od-do
88-108
Kód UT WoS článku
—
EID výsledku v databázi Scopus
—

Podobné výsledky(10)

Distributional Semantics in Language Modeling Semantic Spaces for Improving language Modeling Language Modeling with Syntactic and Semantic Representation for Sentence Acceptability Predictions

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Latent semantics in language models

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)