Cross-Lingual Adaptation of Broadcast Transcription System to Polish Language Using Public Data Sources

The result's identifiers

  • Result code in IS VaVaI

    RIV/46747885:24220/15:#0003428 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F15%3A%230003428)

  • Alternative codes found

    RIV/46747885:24220/15:00002973

  • Result on the web

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Original language name

    Cross-Lingual Adaptation of Broadcast Transcription System to Polish Language Using Public Data Sources

  • Original language description

    We present methods and procedures designed for cost-efficient adaptation of an existing speech recognition system to Polish. The system, originally built for the Czech language, is adapted using common texts and speech recordings accessible from Polish web pages. The most critical part, an acoustic model (AM) for Polish, is built in several steps: a) an initial bootstrapping phase that utilizes the existing Czech AM, b) a lightly supervised iterative scheme for automatic collection and annotation of Polish speech data, and c) acquisition of a large amount of broadcast data in an unsupervised way. The developed system has been evaluated in the task of automatic content monitoring of major Polish TV and radio stations. Its transcription accuracy, measured on a set of four complete TV news shows with a total duration of 105 minutes, reaches almost 80 %. For clean studio speech, its accuracy exceeds 92 %.

  • Czech name

  • Czech description
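
    The lightly supervised step (b) in the description above can be illustrated with a minimal sketch. It assumes, as is common for this kind of scheme, that each audio segment comes with an approximate transcript from the web and that a segment is kept for training only when the recognizer output agrees closely with that transcript. The function names, the dictionary keys, and the 0.9 threshold are hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher


def word_match_ratio(hypothesis: str, reference: str) -> float:
    """Similarity of two word sequences: 2 * matched words / total words."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not hyp and not ref:
        return 1.0
    # autojunk=False keeps frequent words from being ignored on long inputs.
    return SequenceMatcher(None, hyp, ref, autojunk=False).ratio()


def select_segments(segments, threshold=0.9):
    """Keep segments whose recognizer output closely matches the loose transcript.

    Each segment is a dict with hypothetical keys "hypothesis" (recognizer
    output) and "reference" (approximate transcript found on the web).
    """
    return [
        seg
        for seg in segments
        if word_match_ratio(seg["hypothesis"], seg["reference"]) >= threshold
    ]
```

    In an iterative setup, the segments returned by `select_segments` would be added to the training set, the acoustic model retrained, and the remaining audio transcribed again with the improved model.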

Classification

  • Type

    D - Article in proceedings

  • CEP classification

    JC - Computer hardware and software

  • OECD FORD branch

Result continuities

  • Project

    TA04010199: MULTILINMEDIA - Multilingual Multimedia Monitoring and Analyzing Platform

  • Continuities

    P - Research and development project financed from public sources (with a link to CEP)
    I - Institutional support for the long-term conceptual development of a research organization

Others

  • Publication year

    2015

  • Confidentiality

    S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    7th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics

  • ISBN

    978-83-932640-8-7

  • ISSN

  • e-ISSN

  • Number of pages

    5

  • Pages from-to

    181-185

  • Publisher name

    Fundacja Uniwersytetu im. Adama Mickiewicza w Poznaniu

  • Place of publication

    Poland

  • Event location

    Poland, Poznań

  • Event date

    Jan 1, 2015

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article