Methods for Rapid Development of Automatic Speech Recognition System for Russian

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F15%3A00002968" target="_blank" >RIV/46747885:24220/15:00002968 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1109/ECMSM.2015.7208686" target="_blank" >http://dx.doi.org/10.1109/ECMSM.2015.7208686</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ECMSM.2015.7208686" target="_blank" >10.1109/ECMSM.2015.7208686</a>

Alternative languages

Result language
English
Title in the original language
Methods for Rapid Development of Automatic Speech Recognition System for Russian
Result description in the original language
In this paper we present our approach to the rapid and efficient development of an automatic speech recognition (ASR) system for Russian. We try to utilize our tools, procedures and data previously designed and collected for other Slavic languages, Czech and Slovak. We show how we build a large corpus of texts acquired from major publishers' web pages and convert it from Cyrillic to Latin to simplify further processing. The corpus is used to create a representative lexicon with 218K words and 259K pronunciations and a probabilistic language model. When training the acoustic model (AM), we use the GlobalPhone database of recordings and a largely automated scheme that includes bootstrapping with an existing Czech AM and several iterative steps to gradually improve both phonetic annotations and the target Russian AM. The recent prototype of the Russian ASR system is evaluated on the test part of the GlobalPhone database and achieves an 18.2% word error rate.
Title in English
Methods for Rapid Development of Automatic Speech Recognition System for Russian
Result description in English
In this paper we present our approach to the rapid and efficient development of an automatic speech recognition (ASR) system for Russian. We try to utilize our tools, procedures and data previously designed and collected for other Slavic languages, Czech and Slovak. We show how we build a large corpus of texts acquired from major publishers' web pages and convert it from Cyrillic to Latin to simplify further processing. The corpus is used to create a representative lexicon with 218K words and 259K pronunciations and a probabilistic language model. When training the acoustic model (AM), we use the GlobalPhone database of recordings and a largely automated scheme that includes bootstrapping with an existing Czech AM and several iterative steps to gradually improve both phonetic annotations and the target Russian AM. The recent prototype of the Russian ASR system is evaluated on the test part of the GlobalPhone database and achieves an 18.2% word error rate.
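The word error rate quoted in the description is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The record does not say which scoring tool the authors used, so the following is only a minimal sketch of that standard definition; the function name `word_error_rate` and the sample strings are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substitution and one deletion over four words.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a rate of 18.2% corresponds to roughly 18 word-level errors per 100 reference words. Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.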

Classification

Type
D - Article in proceedings
CEP field
JC - Computer hardware and software
OECD FORD field
—

Result continuities

Project
<a href="/cs/project/TA04010199" target="_blank" >TA04010199: MULTILINMEDIA - Multilingual platform for monitoring and analysis of multimedia</a><br>
Continuities
P - Research and development project financed from public funds (with a link to CEP)

Others

Publication year
2015
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Article name in the proceedings
2015 IEEE International Workshop of Electronics, Control, Measurement, Signals and their application to Mechatronics
ISBN
978-1-4799-6972-2
ISSN
—
e-ISSN
—
Number of pages
6
Pages from-to
26-31
Publisher name
IEEE
Place of publication
Czech Republic
Event location
Czech Republic, Liberec
Event date
—
Event type by nationality
WRD - Worldwide event
UT WoS article code
000363814500011

Similar results (10)

Impact of Phonetic Annotation Precision on Automatic Speech Recognition Systems
A Study on Adapting Czech Automatic Speech Recognition System to Croatian Language
Peculiarities of translation of multi-word Czech oikonyms with the preposition “na” into Russian (using linguistic corpora)
