Using Various Types of Multimedia Resources to Train System for Automatic Transcription of Czech Historical Oral Archives
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F13%3A%230002790" target="_blank" >RIV/46747885:24220/13:#0002790 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1007/978-3-642-41190-8_25" target="_blank" >http://dx.doi.org/10.1007/978-3-642-41190-8_25</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-642-41190-8_25" target="_blank" >10.1007/978-3-642-41190-8_25</a>
Alternative languages
Result language
English
Original language name
Using Various Types of Multimedia Resources to Train System for Automatic Transcription of Czech Historical Oral Archives
Original language description
Historical spoken documents represent a unique segment of national cultural heritage. In order to disclose the large Czech Radio audio archive to the research community and to the public, we have been developing a system whose aim is to transcribe the archive files automatically, index them and make them searchable. The transcription of contemporary (1 or 2 decades old) documents is based on the lexicon and statistical language model (LM) built from a large amount of recent texts available in electronic form. From the older periods (before 1990), however, digital texts do not exist. Therefore, we needed a) to find resources that represent the language of those times, b) to convert them from their original form to text, c) to utilize this text for creating epoch-specific lexicons and LMs, and eventually, d) to apply them in the developed speech recognition system. In our case, the main resources included: scanned historical newspapers, shorthand notes from the national parliament and subtitles from
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JC - Computer hardware and software
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/DF11P01OVV013" target="_blank" >DF11P01OVV013: Disclosure of the Czech Radio archive for sophisticated search</a><br>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2013
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
New Trends in Image Analysis and Processing - ICIAP 2013
ISBN
9783642411892
ISSN
0302-9743
e-ISSN
—
Number of pages
10
Pages from-to
228-237
Publisher name
Springer-Verlag Berlin Heidelberg
Place of publication
Germany, Berlin
Event location
Italy, Naples
Event date
Sep 9, 2013
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—