Pronunciation Variants and ASR of Colloquial Speech: A Case Study on Czech

The result's identifiers

  • Result code in IS VaVaI

    RIV/00216208:11210/18:10379673 - isvavai.cz: https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F18%3A10379673

  • Result on the web

    http://www.lrec-conf.org/proceedings/lrec2018/summaries/833.html

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Original language name

    Pronunciation Variants and ASR of Colloquial Speech: A Case Study on Czech

  • Original language description

    A standard ASR system is built using three types of mutually related language resources: apart from speech recordings and orthographic transcripts, a pronunciation component maps tokens in the transcripts to their phonetic representations. Its implementation is lexicon-based (whether by way of simple lookup or of a stochastic grapheme-to-phoneme converter trained on the source lexicon), rule-based, or a hybrid of the two. Whichever approach ends up being taken (as determined primarily by the writing system of the language in question), little attention is usually paid to pronunciation variants stemming from connected speech processes, hypoarticulation, and other phenomena typical of colloquial speech, mostly because the resource is seldom directly empirically derived. This paper presents a case study on the automatic recognition of colloquial Czech, using a pronunciation dictionary extracted from the ORTOFON corpus of informal spontaneous Czech, which is manually phonetically transcribed. The performance of the dictionary is compared to a standard rule-based pronunciation component, as evaluated against a subset of the ORTOFON corpus (multiple speakers recorded on a single compact device) and the Vystadial telephone speech corpus, for which prior benchmarks are available.

  • Czech name

  • Czech description
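
The description above distinguishes lexicon-based, rule-based, and hybrid pronunciation components. A minimal sketch of the hybrid idea follows. It is not the paper's implementation: the lexicon entries, the phone symbols, and the names `LEXICON`, `rule_based`, and `pronounce` are all hypothetical, the variants are invented for illustration rather than taken from the ORTOFON corpus, and the rule set is a toy that handles only the digraph "ch".

```python
from typing import Dict, List

# Hypothetical toy lexicon: each token maps to one or more pronunciation
# variants. Entries are invented for illustration, not taken from ORTOFON.
LEXICON: Dict[str, List[str]] = {
    "protože": ["p r o t o ž e", "p r o ž e"],
    "jsem": ["j s e m", "s e m"],
}


def rule_based(token: str) -> str:
    """Toy grapheme-to-phoneme rules: one phone per letter, 'ch' as one phone."""
    phones = []
    i = 0
    while i < len(token):
        if token[i:i + 2] == "ch":
            phones.append("x")  # treat the digraph as a single phone
            i += 2
        else:
            phones.append(token[i])
            i += 1
    return " ".join(phones)


def pronounce(token: str) -> List[str]:
    """Hybrid lookup: use lexicon variants if present, otherwise apply rules."""
    token = token.lower()
    return LEXICON.get(token) or [rule_based(token)]


if __name__ == "__main__":
    for word in ["jsem", "chata"]:
        print(word, pronounce(word))
```

In this sketch, a lexicon derived from transcribed speech can list several variants per token, while the rule-based fallback always yields exactly one canonical form, which is the contrast the description draws.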

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    60203 - Linguistics

Result continuities

  • Project

    LM2015044: Czech National Corpus

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)

Others

  • Publication year

    2018

  • Confidentiality

    S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

  • ISBN

    979-10-95546-00-9

  • ISSN

  • e-ISSN

    not specified

  • Number of pages

    6

  • Pages from-to

    2704-2709

  • Publisher name

    European Language Resources Association (ELRA)

  • Place of publication

    Miyazaki

  • Event location

    Miyazaki

  • Event date

    May 7, 2018

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article