Online LDA-Based Language Model Adaptation
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F18%3A43952476" target="_blank" >RIV/49777513:23520/18:43952476 - isvavai.cz</a>
Result on the web
<a href="https://link.springer.com/chapter/10.1007%2F978-3-030-00794-2_36" target="_blank" >https://link.springer.com/chapter/10.1007%2F978-3-030-00794-2_36</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-030-00794-2_36" target="_blank" >10.1007/978-3-030-00794-2_36</a>
Alternative languages
Result language
English
Original language name
Online LDA-Based Language Model Adaptation
Original language description
In this paper, we present our improvements in online topic-based language model adaptation. Our aim is to enhance automatic speech recognition of multi-topic speech that has to be recognized in real time (online). Latent Dirichlet Allocation (LDA) is an unsupervised topic model designed to uncover hidden semantic relationships between words and documents in a text corpus and thus reveal latent topics automatically. We use LDA to cluster the text corpus and to predict topics online from partial hypotheses during real-time speech recognition. Based on detected topic changes in the speech, we adapt the language model on the fly. We demonstrate the improvement of our system on the task of online subtitling of TV news, where we achieved an 18% relative reduction in perplexity and a 3.52% relative reduction in WER over the non-adapted system.
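The idea of predicting topics from a partial hypothesis and adapting word probabilities accordingly can be illustrated with a toy sketch. This is not the authors' implementation: the topic-word table `TOPICS` is a hypothetical stand-in for a trained LDA model, the inference is a simple EM "folding-in" with fixed topic-word probabilities rather than full LDA inference, and all function names are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical topic-word distributions, standing in for a trained LDA model.
TOPICS: Dict[str, Dict[str, float]] = {
    "sport": {"goal": 0.4, "match": 0.4, "vote": 0.1, "party": 0.1},
    "politics": {"goal": 0.1, "match": 0.1, "vote": 0.4, "party": 0.4},
}


def infer_topic_mixture(
    words: List[str],
    topics: Dict[str, Dict[str, float]],
    iterations: int = 20,
) -> Dict[str, float]:
    """Estimate topic weights for a partial hypothesis.

    Uses EM with the topic-word probabilities held fixed (an assumption;
    a simplification of LDA inference).
    """
    names = list(topics)
    weights = {t: 1.0 / len(names) for t in names}
    # Ignore words that no topic knows about.
    known = [w for w in words if any(w in topics[t] for t in names)]
    if not known:
        return weights  # no evidence: keep the uniform mixture
    for _ in range(iterations):
        counts = {t: 0.0 for t in names}
        for w in known:
            # E-step: responsibility of each topic for this word.
            joint = {t: weights[t] * topics[t].get(w, 1e-9) for t in names}
            total = sum(joint.values())
            for t in names:
                counts[t] += joint[t] / total
        # M-step: renormalize expected counts into topic weights.
        weights = {t: counts[t] / len(known) for t in names}
    return weights


def dominant_topic(weights: Dict[str, float]) -> str:
    """Return the topic with the largest weight."""
    return max(weights, key=lambda t: weights[t])


def adapted_probability(
    word: str,
    weights: Dict[str, float],
    topics: Dict[str, Dict[str, float]],
) -> float:
    """Unigram probability of a word under the topic mixture."""
    return sum(weights[t] * topics[t].get(word, 0.0) for t in topics)
```

In such a sketch, a topic change could be flagged whenever `dominant_topic` differs between two consecutive partial hypotheses, at which point the mixture weights would be re-estimated and the adapted probabilities recomputed.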
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20205 - Automation and control systems
Result continuities
Project
<a href="/en/project/GBP103%2F12%2FG084" target="_blank" >GBP103/12/G084: Center for Large Scale Multi-modal Data Interpretation</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2018
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Lecture Notes in Computer Science
ISBN
978-3-030-00793-5
ISSN
0302-9743
e-ISSN
not stated
Number of pages
8
Pages from-to
334-341
Publisher name
Springer
Place of publication
Cham
Event location
Brno, Czech Republic
Event date
Sep 11, 2018
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—