Využití lingvistických znalostí v jazykovém modelování spontánní mluvené češtiny

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F06%3A00000452" target="_blank" >RIV/49777513:23520/06:00000452 - isvavai.cz</a>
Výsledek na webu
—
DOI - Digital Object Identifier
—

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
Exploiting Linguistic Knowledge in Language Modeling of Czech Spontaneous Speech
Popis výsledku v původním jazyce
In our paper, we present a method for incorporating available linguistic information into a statistical language model that is used in ASR system for transcribing spontaneous speech. We employ the class-based language model paradigm and use the morphological tags as the basis for world-to-class mapping. Since the number of different tags is at least by one order of magnitude lower than the number of words even in the tasks with moderately-sized vocabularies, the tag-based model can be rather robustly estimated using even the relatively small text corpora. Unfortunately, this robustness goes hand in hand with restricted predictive ability of the class-based model. Hence we apply the two-pass recognition strategy, where the first pass is performed with the standard word-based n-gram and the resulting lattices are rescored in the second pass using the aforementioned class-based model.
Název v anglickém jazyce
Exploiting Linguistic Knowledge in Language Modeling of Czech Spontaneous Speech
Popis výsledku anglicky
In our paper, we present a method for incorporating available linguistic information into a statistical language model that is used in ASR system for transcribing spontaneous speech. We employ the class-based language model paradigm and use the morphological tags as the basis for world-to-class mapping. Since the number of different tags is at least by one order of magnitude lower than the number of words even in the tasks with moderately-sized vocabularies, the tag-based model can be rather robustly estimated using even the relatively small text corpora. Unfortunately, this robustness goes hand in hand with restricted predictive ability of the class-based model. Hence we apply the two-pass recognition strategy, where the first pass is performed with the standard word-based n-gram and the resulting lattices are rescored in the second pass using the aforementioned class-based model.

Klasifikace

Druh
D - Stať ve sborníku
CEP obor
JD - Využití počítačů, robotika a její aplikace
OECD FORD obor
—

Návaznosti výsledku

Projekt
<a href="/cs/project/1P05ME786" target="_blank" >1P05ME786: Automatická analýza spontánní řeči v rozsáhlých archívech audionahrávek</a><br>
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)

Ostatní

Rok uplatnění
2006
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název statě ve sborníku
Proceedings of LREC 2006
ISBN
2-9517408-2-4
ISSN
—
e-ISSN
—
Počet stran výsledku
4
Strana od-do
2600-2603
Název nakladatele
ELRA
Místo vydání
Paris
Místo konání akce
Janov
Datum konání akce
1. 1. 2006
Typ akce podle státní příslušnosti
WRD - Celosvětová akce
Kód UT WoS článku
—

Podobné výsledky(10)

Fitting class-based language models into weighted finite-state transducer framework Fitting class-based language models into weighted finite-state transducer framework Aplikace jazykového modelu založeného na třídách pro český jazyk

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Využití lingvistických znalostí v jazykovém modelování spontánní mluvené češtiny

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)