Coverage of Spontaneous Conversational Speech from Nijmegen Corpus of Casual Czech by General ASR Language Models
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F11%3A00185972" target="_blank" >RIV/68407700:21230/11:00185972 - isvavai.cz</a>
Result on the web
<a href="http://mirjamernestus.ruhosting.nl/Ernestus/Workshop2011.php" target="_blank" >http://mirjamernestus.ruhosting.nl/Ernestus/Workshop2011.php</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in the original language
Coverage of Spontaneous Conversational Speech from Nijmegen Corpus of Casual Czech by General ASR Language Models
Result description in the original language
Large Vocabulary Continuous Speech Recognition (LVCSR), one of the frequent applications of speech technology, is now used in a growing number of applications in everyday life. Consequently, the need to recognize spontaneous speech also arises; however, such speech differs strongly in character from non-spontaneous speech, and its specific phenomena are not expected to be covered by a standard general Language Model (LM). In this contribution we analyze the Nijmegen Corpus of Casual Czech (NCCCz) from the point of view of several publicly available LMs. We analyze the rate of Out-Of-Vocabulary (OOV) words; the rate of word fragments, repetitions, and repeated starts; the perplexity computed at the text level on the NCCCz transcriptions; and LVCSR performance on the recordings using the above-mentioned LMs.
Title in English
Coverage of Spontaneous Conversational Speech from Nijmegen Corpus of Casual Czech by General ASR Language Models
Result description in English
Large Vocabulary Continuous Speech Recognition (LVCSR), one of the frequent applications of speech technology, is now used in a growing number of applications in everyday life. Consequently, the need to recognize spontaneous speech also arises; however, such speech differs strongly in character from non-spontaneous speech, and its specific phenomena are not expected to be covered by a standard general Language Model (LM). In this contribution we analyze the Nijmegen Corpus of Casual Czech (NCCCz) from the point of view of several publicly available LMs. We analyze the rate of Out-Of-Vocabulary (OOV) words; the rate of word fragments, repetitions, and repeated starts; the perplexity computed at the text level on the NCCCz transcriptions; and LVCSR performance on the recordings using the above-mentioned LMs.
Classification
Type
O - Other results
CEP field
JA - Electronics and optoelectronics, electrical engineering
OECD FORD field
—
Result linkages
Project
<a href="/cs/project/GA102%2F08%2F0707" target="_blank" >GA102/08/0707: Speech recognition in real-world conditions</a><br>
Linkages
Z - Research plan (with a link to CEZ)
Other
Year of implementation
2011
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations