Using Various Types of Multimedia Resources to Train System for Automatic Transcription of Czech Historical Oral Archives

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F13%3A%230002790" target="_blank" >RIV/46747885:24220/13:#0002790 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1007/978-3-642-41190-8_25" target="_blank" >http://dx.doi.org/10.1007/978-3-642-41190-8_25</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-642-41190-8_25" target="_blank" >10.1007/978-3-642-41190-8_25</a>

Alternative languages

Result language
English
Title in the original language
Using Various Types of Multimedia Resources to Train System for Automatic Transcription of Czech Historical Oral Archives
Result description in the original language
Historical spoken documents represent a unique segment of national cultural heritage. To open the large Czech Radio audio archive to the research community and the public, we have been developing a system that automatically transcribes the archive files, indexes them, and makes them searchable. The transcription of contemporary (1 or 2 decades old) documents is based on a lexicon and a statistical language model (LM) built from a large amount of recent text available in electronic form. For older periods (before 1990), however, digital texts do not exist. Therefore, we needed a) to find resources that represent the language of those times, b) to convert them from their original form to text, c) to use this text to create epoch-specific lexicons and LMs, and eventually d) to apply them in the developed speech recognition system. In our case, the main resources included: scanned historical newspapers, shorthand notes from the national parliament and subtitles from
Title in English
Using Various Types of Multimedia Resources to Train System for Automatic Transcription of Czech Historical Oral Archives
Result description in English
Historical spoken documents represent a unique segment of national cultural heritage. To open the large Czech Radio audio archive to the research community and the public, we have been developing a system that automatically transcribes the archive files, indexes them, and makes them searchable. The transcription of contemporary (1 or 2 decades old) documents is based on a lexicon and a statistical language model (LM) built from a large amount of recent text available in electronic form. For older periods (before 1990), however, digital texts do not exist. Therefore, we needed a) to find resources that represent the language of those times, b) to convert them from their original form to text, c) to use this text to create epoch-specific lexicons and LMs, and eventually d) to apply them in the developed speech recognition system. In our case, the main resources included: scanned historical newspapers, shorthand notes from the national parliament and subtitles from

Classification

Type
D - Article in proceedings
CEP field
JC - Computer hardware and software
OECD FORD field
—

Result linkages

Project
<a href="/cs/project/DF11P01OVV013" target="_blank" >DF11P01OVV013: Zpřístupnění archivu Českého rozhlasu pro sofistikované vyhledávání (Making the Czech Radio Archive Accessible for Sophisticated Search)</a><br>
Linkages
P - Research and development project financed from public funds (with a link to CEP)

Other

Year of publication
2013
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Title of the article in the proceedings
New Trends in Image Analysis and Processing - ICIAP 2013
ISBN
9783642411892
ISSN
0302-9743
e-ISSN
—
Number of pages
10
Pages from-to
228-237
Publisher name
Springer-Verlag Berlin Heidelberg
Place of publication
Germany, Berlin
Event location
Italy, Naples
Event date
9. 9. 2013
Event type by nationality of participants
WRD - Worldwide event
UT WoS code of the article
—

Similar results (10)

Downdating lexicon and language model for automatic transcription of Czech historical spoken documents
Lexicon-based vs. Lexicon-free ASR for Norwegian Parliament Speech Transcription
Long-term preservation and access to data from web archives: WebArchiv of the National Library of the Czech Republic
