Free English and Czech telephone speech corpus shared under the CC-BY-SA 3.0 license
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F14%3A10289384" target="_blank" >RIV/00216208:11320/14:10289384 - isvavai.cz</a>
Result on the web
<a href="http://www.lrec-conf.org/proceedings/lrec2014/index.html" target="_blank" >http://www.lrec-conf.org/proceedings/lrec2014/index.html</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Free English and Czech telephone speech corpus shared under the CC-BY-SA 3.0 license
Original language description
We present a dataset of telephone conversations in English and Czech, developed to train acoustic models for automatic speech recognition (ASR) in spoken dialogue systems (SDSs). The data comprise 45 hours of speech in English and over 18 hours in Czech. All audio data and a large part of the transcriptions were collected using crowdsourcing; the rest was transcribed by hired transcribers. We release the data together with scripts for data re-processing and building acoustic models using the HTK and Kaldi ASR toolkits. We publish the trained models described in this paper as well. The data are released under the CC-BY-SA 3.0 license; the scripts are licensed under Apache 2.0. In the paper, we report on the methodology of collecting the data, on the size and properties of the data, and on the scripts and their use. We verify the usability of the datasets by training and evaluating acoustic models using the presented data and scripts.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
Result was created during the realization of more than one project. More information in the Projects tab.
Continuities
P - Research and development project financed from public funds (with a link to CEP)<br>S - Specific research at universities
Others
Publication year
2014
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014)
ISBN
978-2-9517408-8-4
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
4423-4427
Publisher name
European Language Resources Association
Place of publication
Reykjavík, Iceland
Event location
Reykjavík, Iceland
Event date
May 26, 2014
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—