Dealing with Bilingualism in Automatic Transcription of Historical Archive of Czech Radio

Result identifiers

Result code in IS VaVaI
RIV/46747885:24220/13:#0002789 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F13%3A%230002789)
Result on the web
http://dx.doi.org/10.1007/978-3-642-41190-8_26
DOI - Digital Object Identifier
10.1007/978-3-642-41190-8_26

Alternative languages

Result language
English
Title in the original language
Dealing with Bilingualism in Automatic Transcription of Historical Archive of Czech Radio
Result description in the original language
One of the biggest challenges in the automatic transcription of the historical audio archive of Czech and Czechoslovak radio is bilingualism. Two closely related languages, Czech and Slovak, are mixed in many archive documents. Both were official languages of the former Czechoslovakia (1918-1992), and both were used in the media. The two languages are considered similar, although they differ in more than 75% of their lexical inventories, which complicates automatic speech-to-text conversion. In this paper, we present and objectively measure the difference between the two languages. We then propose a method suitable for automatic identification of two acoustically and lexically similar languages. It is based on employing two size-optimized parallel lexicons and language models. On large test data, we show that the two languages can be distinguished with almost 99% accuracy. Moreover, the language identification module can be easily incorporated into a two-pass decoding scheme with almost n

Classification

Type
D - Article in proceedings
CEP field
JC - Computer hardware and software
OECD FORD field
—

Result linkages

Project
DF11P01OVV013: Zpřístupnění archivu Českého rozhlasu pro sofistikované vyhledávání (Making the Czech Radio archive accessible for sophisticated search)
Linkages
P - Research and development project financed from public funds (with a link to CEP)

Other

Year of implementation
2013
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Article name in the proceedings
New Trends in Image Analysis and Processing - ICIAP 2013
ISBN
9783642411892
ISSN
0302-9743
e-ISSN
—
Number of pages
9
Pages from-to
238-246
Publisher name
Springer-Verlag Berlin Heidelberg
Place of publication
Germany, Berlin
Event location
Italy, Naples
Event date
9 September 2013
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
—

Similar results (10)

An Engine for Online Video Search in Large Archives of the Holocaust Testimonies
Databáze heslářů
False friends: About the incorrect use of the words and their etymology
