Free English and Czech telephone speech corpus shared under the CC-BY-SA 3.0 license

The result's identifiers

  • Result code in IS VaVaI

    RIV/00216208:11320/14:10289384 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F14%3A10289384)

  • Result on the web

    http://www.lrec-conf.org/proceedings/lrec2014/index.html

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Original language name

    Free English and Czech telephone speech corpus shared under the CC-BY-SA 3.0 license

  • Original language description

    We present a dataset of telephone conversations in English and Czech, developed to train acoustic models for automatic speech recognition (ASR) in spoken dialogue systems (SDSs). The data comprise 45 hours of speech in English and over 18 hours in Czech. All audio data and a large part of the transcriptions were collected using crowdsourcing; the rest was transcribed by hired transcribers. We release the data together with scripts for data re-processing and building acoustic models using the HTK and Kaldi ASR toolkits. We publish the trained models described in this paper as well. The data are released under the CC-BY-SA 3.0 license; the scripts are licensed under Apache 2.0. In the paper, we report on the methodology of collecting the data, on the size and properties of the data, and on the scripts and their use. We verify the usability of the datasets by training and evaluating acoustic models using the presented data and scripts.

  • Czech name

  • Czech description

Classification

  • Type

    D - Article in proceedings

  • CEP classification

    IN - Informatics

  • OECD FORD branch

Result continuities

  • Project

    The result was created in the course of more than one project.

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)
    S - Specific research at universities

Others

  • Publication year

    2014

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014)

  • ISBN

    978-2-9517408-8-4

  • ISSN

  • e-ISSN

  • Number of pages

    5

  • Pages from-to

    4423-4427

  • Publisher name

    European Language Resources Association

  • Place of publication

    Reykjavík, Iceland

  • Event location

    Reykjavík, Iceland

  • Event date

    May 26, 2014

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article