Robust Recognition of Conversational Telephone Speech via Multi-Condition Training and Data Augmentation
The result's identifiers
Result code in IS VaVaI
RIV/46747885:24220/18:00006134 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F18%3A00006134)
Result on the web
http://dx.doi.org/10.1007/978-3-030-00794-2_35
DOI - Digital Object Identifier
10.1007/978-3-030-00794-2_35
Alternative languages
Result language
English
Original language name
Robust Recognition of Conversational Telephone Speech via Multi-Condition Training and Data Augmentation
Original language description
In this paper, we focus on automatic recognition of conversational telephone speech in a scenario where no genuine telephone recordings are available for training. The training set contains only data from a significantly different domain, such as recordings of broadcast news. A significant mismatch therefore arises between training and test conditions, which degrades the performance of the resulting recognition system. We aim to reduce this mismatch using data augmentation. Speech compression and a narrow-band spectrum are significant features of telephone speech. We apply these effects to the training dataset artificially in order to make it more similar to the desired test conditions. Using this augmented dataset, we subsequently train an acoustic model. Our experiments show that the augmented models achieve accuracy close to that of a model trained on genuine telephone data. Moreover, when the augmentation is applied to real-world telephone data, further accuracy gains are achieved. © Springer Nature Switzerland AG 2018.
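The augmentation idea in the description above (artificially imposing a narrow-band spectrum and speech compression on wide-band training audio) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' pipeline: the record does not name the codecs or filters used, so the sketch substitutes a crude pair-averaging downsampler (for example 16 kHz to 8 kHz) and a continuous mu-law round trip in the style of G.711. The function names are hypothetical.

```python
import math

MU = 255  # assumed mu-law parameter (the value used by G.711 mu-law telephony)


def downsample_by_two(samples):
    """Halve the sample rate (e.g. 16 kHz -> 8 kHz) by averaging adjacent pairs.

    Averaging is only a crude low-pass filter; a real pipeline would use a
    proper anti-aliasing filter and a telephone band-pass.
    """
    return [(samples[i] + samples[i + 1]) / 2.0
            for i in range(0, len(samples) - 1, 2)]


def mu_law_encode(x):
    """Compress one sample in [-1, 1] to an 8-bit code in 0..255."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * MU))


def mu_law_decode(code):
    """Expand an 8-bit code back to a sample in [-1, 1]."""
    y = 2.0 * code / MU - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def simulate_telephone(samples):
    """Hypothetical augmentation: narrow-band step, then lossy compression."""
    narrow = downsample_by_two(samples)
    return [mu_law_decode(mu_law_encode(s)) for s in narrow]


if __name__ == "__main__":
    # One second of a 440 Hz tone sampled at 16 kHz stands in for clean speech.
    clean = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    degraded = simulate_telephone(clean)
    print(len(clean), len(degraded))
```

In a setup like the one described, the degraded copies would be used (alone or alongside the clean originals) to train the acoustic model, so that training conditions resemble the telephone test conditions.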
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20206 - Computer hardware and architecture
Result continuities
Project
TH03010018: DeepSpot - Multilingual technology for spotting and instant alerting
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2018
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) - 21st International Conference on Text, Speech, and Dialogue, TSD 2018
ISBN
978-3-030-00793-5
ISSN
0302-9743
e-ISSN
—
Number of pages
10
Pages from-to
324-333
Publisher name
Springer Verlag
Place of publication
—
Event location
Brno, Czech Republic
Event date
Jan 1, 2018
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—