Exploiting Linguistic Knowledge in Language Modeling of Czech Spontaneous Speech

The result's identifiers

Result code in IS VaVaI
RIV/49777513:23520/06:00000452 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F06%3A00000452)
Result on the web
—
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Original language name
Exploiting Linguistic Knowledge in Language Modeling of Czech Spontaneous Speech
Original language description
In this paper, we present a method for incorporating available linguistic information into a statistical language model used in an ASR system for transcribing spontaneous speech. We employ the class-based language model paradigm and use morphological tags as the basis for the word-to-class mapping. Since the number of distinct tags is at least one order of magnitude lower than the number of words, even in tasks with moderately sized vocabularies, the tag-based model can be estimated rather robustly even from relatively small text corpora. Unfortunately, this robustness goes hand in hand with the restricted predictive ability of the class-based model. Hence we apply a two-pass recognition strategy, in which the first pass is performed with a standard word-based n-gram model and the resulting lattices are rescored in the second pass using the class-based model.
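The two-pass idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a deterministic word-to-tag mapping, add-one smoothing, a class-based bigram of the usual form P(word | tag) * P(tag | previous tag), and an N-best list standing in for the lattices. All function names, the toy tags N and V, the toy sentences, and the interpolation weight are hypothetical.

```python
import math
from collections import Counter


def train_class_bigram(tagged_sentences):
    """Count tag bigrams and word emissions from sentences of (word, tag) pairs."""
    tag_bigrams, tag_context = Counter(), Counter()
    emissions, tag_counts = Counter(), Counter()
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, tag in sentence:
            tag_bigrams[(prev, tag)] += 1
            tag_context[prev] += 1
            emissions[(word, tag)] += 1
            tag_counts[tag] += 1
            prev = tag
    return tag_bigrams, tag_context, emissions, tag_counts


def class_logprob(words, word_to_tag, model, n_tags, vocab_size):
    """Log-probability under P(tag | prev tag) * P(word | tag), add-one smoothed."""
    tag_bigrams, tag_context, emissions, tag_counts = model
    logp, prev = 0.0, "<s>"
    for word in words:
        tag = word_to_tag[word]  # assumption: each word has exactly one tag
        p_tag = (tag_bigrams[(prev, tag)] + 1) / (tag_context[prev] + n_tags)
        p_word = (emissions[(word, tag)] + 1) / (tag_counts[tag] + vocab_size)
        logp += math.log(p_tag) + math.log(p_word)
        prev = tag
    return logp


def rescore(hypotheses, word_to_tag, model, n_tags, vocab_size, weight=0.5):
    """Second pass: add the weighted class-based score to each first-pass score."""
    def combined(hyp):
        words, first_pass_score = hyp
        return first_pass_score + weight * class_logprob(
            words, word_to_tag, model, n_tags, vocab_size)
    return max(hypotheses, key=combined)[0]


# Toy data (hypothetical): two tags, four words.
corpus = [[("dogs", "N"), ("bark", "V")], [("cats", "N"), ("sleep", "V")]]
word_to_tag = {"dogs": "N", "cats": "N", "bark": "V", "sleep": "V"}
model = train_class_bigram(corpus)

# First-pass (word-based) log scores slightly prefer the wrong word order.
nbest = [(["dogs", "bark"], -5.0), (["bark", "dogs"], -4.9)]
best = rescore(nbest, word_to_tag, model, n_tags=2, vocab_size=4)
print(best)
```

In this toy setting the tag sequence N V has been seen in training while V N has not, so the class-based score can overturn the small first-pass preference. A real system would rescore lattice paths rather than a two-entry list and would have to handle words with several possible morphological tags.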
Czech name
Využití lingvistických znalostí v jazykovém modelování spontánní mluvené češtiny
Czech description
V článku představujeme metodu, která umožňuje využití lingvistické informace v jazykovém modelu, který je pak zapojen do systému rozpoznávání spontánní řeči. Využíváme přitom princip třídového jazykového modelu - pro rozdělení slov do tříd používáme morfologické značky. Vzhledem k tomu, že počet různých značek je minimálně o jeden řád nižší než počet různých slov ve slovníku středního rozsahu, značkový model může být robustně natrénován i z relativně malého množství dat. Bohužel, tato robustnost je vykoupena omezenou prediktivní silou třídového modelu. Proto aplikujeme dvouprůchodovou strategii rozpoznávání, kde první pr

Classification

Type
D - Article in proceedings
CEP classification
JD - Use of computers, robotics and its application
OECD FORD branch
—

Result continuities

Project
1P05ME786: Spontaneous automatic speech recognition in large audioarchives
Continuities
P - Research and development project financed from public funds (with a link to CEP)

Others

Publication year
2006
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

Article name in the collection
Proceedings of LREC 2006
ISBN
2-9517408-2-4
ISSN
—
e-ISSN
—
Number of pages
4
Pages from-to
2600-2603
Publisher name
ELRA
Place of publication
Paris
Event location
Genoa
Event date
Jan 1, 2006
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—

Similar results(10)

Fitting class-based language models into weighted finite-state transducer framework
Fitting class-based language models into weighted finite-state transducer framework
Class-Based Language Model Application for Czech Language
