Robust Recognition of Conversational Telephone Speech via Multi-Condition Training and Data Augmentation

The result's identifiers

  • Result code in IS VaVaI

    RIV/46747885:24220/18:00006134 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F18%3A00006134)

  • Result on the web

    http://dx.doi.org/10.1007/978-3-030-00794-2_35

  • DOI - Digital Object Identifier

    10.1007/978-3-030-00794-2_35

Alternative languages

  • Result language

    English

  • Original language name

    Robust Recognition of Conversational Telephone Speech via Multi-Condition Training and Data Augmentation

  • Original language description

    In this paper, we focus on automatic recognition of conversational telephone speech in a scenario where no genuine telephone recordings are available for training. The training set contains only data from a significantly different domain, such as recordings of broadcast news. A significant mismatch arises between training and test conditions, which degrades the performance of the resulting recognition system. We aim to reduce this mismatch using data augmentation. Speech compression and a narrow-band spectrum are characteristic features of telephone speech. We apply these effects to the training dataset artificially in order to make it more similar to the desired test conditions. Using this augmented dataset, we subsequently train an acoustic model. Our experiments show that the augmented models achieve accuracy close to that of a model trained on genuine telephone data. Moreover, when the augmentation is applied to real-world telephone data, further accuracy gains are achieved. © Springer Nature Switzerland AG 2018.

  • Czech name

  • Czech description

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    20206 - Computer hardware and architecture

Result continuities

  • Project

    TH03010018: DeepSpot - Multilingual technology for spotting and instant alerting

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)

Others

  • Publication year

    2018

  • Confidentiality

    S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) - 21st International Conference on Text, Speech, and Dialogue, TSD 2018

  • ISBN

    978-3-030-00793-5

  • ISSN

    0302-9743

  • e-ISSN

  • Number of pages

    10

  • Pages from-to

    324-333

  • Publisher name

    Springer Verlag

  • Place of publication

  • Event location

    Brno, Czech Republic

  • Event date

    Jan 1, 2018

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article