Latent Tree Language Model
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F16%3A43930573" target="_blank" >RIV/49777513:23520/16:43930573 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
angličtina
Original language name
Latent Tree Language Model
Original language description
In this paper we introduce Latent Tree Language Model (LTLM), a novel approach to language modeling that encodes syntax and semantics of a given sentence as a tree of word roles. The learning phase iteratively updates the trees by moving nodes according to Gibbs sampling. We introduce two algorithms to infer a tree for a given sentence. The first one is based on Gibbs sampling. It is fast, but does not guarantee to find the most probable tree. The second one is based on dynamic programming. It is slower, but guarantees to find the most probable tree. We provide comparison of both algorithms. We combine LTLM with 4-gram Modified Kneser-Ney language model via linear interpolation. Our experiments with English and Czech corpora show significant perplexity reductions (up to 46% for English and 49% for Czech) compared with standalone 4-gram Modified Kneser-Ney language model.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JD - Use of computers, robotics and its application
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/LO1506" target="_blank" >LO1506: Sustainability support of the centre NTIS - New Technologies for the Information Society</a><br>
Continuities
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)
Others
Publication year
2016
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Data specific for result type
Article name in the collection
International Conference on Empirical Methods in Natural Language Processing (EMNLP 2016)
ISBN
978-1-945626-25-8
ISSN
—
e-ISSN
—
Number of pages
11
Pages from-to
436-446
Publisher name
The Association for Computational Linguistics
Place of publication
New York
Event location
Austin, Texas, USA
Event date
Nov 1, 2016
Type of event by nationality
WRD - Celosvětová akce
UT code for WoS article
—