Cross-Lingual Adaptation of Broadcast Transcription System to Polish Language Using Public Data Sources
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F15%3A%230003428" target="_blank" >RIV/46747885:24220/15:#0003428 - isvavai.cz</a>
Alternative codes found
RIV/46747885:24220/15:00002973
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Cross-Lingual Adaptation of Broadcast Transcription System to Polish Language Using Public Data Sources
Original language description
We present methods and procedures designed for cost-efficient adaptation of an existing speech recognition system to Polish. The system (originally built for the Czech language) is adapted using common texts and speech recordings accessible from Polish web pages. The most critical part, an acoustic model (AM) for Polish, is built in several steps, which include: a) an initial bootstrapping phase that utilizes an existing Czech AM, b) a lightly-supervised iterative scheme for automatic collection and annotation of Polish speech data, and finally c) acquisition of a large amount of broadcast data in an unsupervised way. The developed system has been evaluated in the task of automatic content monitoring of major Polish TV and radio stations. Its transcription accuracy (measured on a set of four complete TV news shows with a total duration of 105 minutes) reaches almost 80 %. For clean studio speech, its accuracy exceeds 92 %.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JC - Computer hardware and software
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/TA04010199" target="_blank" >TA04010199: MULTILINMEDIA - Multilingual Multimedia Monitoring and Analyzing Platform</a><br>
Continuities
P - Research and development project financed from public funds (with a link to CEP)<br>I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2015
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
7th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics
ISBN
978-83-932640-8-7
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
181-185
Publisher name
Fundacja Uniwersytetu im. Adama Mickiewicza w Poznaniu
Place of publication
Poland
Event location
Poland, Poznań
Event date
Jan 1, 2015
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—