Various DNN-HMM architectures used in acoustic modeling with single-speaker and single-channel
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F21%3A43962798" target="_blank" >RIV/49777513:23520/21:43962798 - isvavai.cz</a>
Result on the web
<a href="https://link.springer.com/chapter/10.1007/978-3-030-89579-2_8" target="_blank" >https://link.springer.com/chapter/10.1007/978-3-030-89579-2_8</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-030-89579-2_8" target="_blank" >10.1007/978-3-030-89579-2_8</a>
Alternative languages
Result language
English
Original language name
Various DNN-HMM architectures used in acoustic modeling with single-speaker and single-channel
Original language description
In this paper, we discuss some interesting features of training a special acoustic model for only one speaker with a constant acoustic background (acoustic channel). Currently, the LF-MMI method achieves the best results in many speech recognition tasks. A typical LF-MMI training procedure uses a special 1-state HMM topology that has different pdfs at the self-loop and forward transitions. We would like to discuss the replacement of this typical LF-MMI HMM by different types of HMM topologies (1-, 2- and 3-state HMM topologies that have outputs associated with states). Next, we discuss the advantages of using biphone context modeling over using the triphone context or even simpler context-free monophone. We also address the effect of the amount of training data and the context of DNN on WER, and all this with regard to a special acoustic model with one speaker and an almost constant acoustic channel.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20205 - Automation and control systems
Result continuities
Project
<a href="/en/project/EF17_048%2F0007267" target="_blank" >EF17_048/0007267: Research and Development of Intelligent Components of Advanced Technologies for the Pilsen Metropolitan Area (InteCom)</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)<br>I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2021
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Statistical Language and Speech Processing, 9th International Conference, SLSP 2021, Cardiff, UK, November 23–25, 2021, Proceedings
ISBN
978-3-030-89578-5
ISSN
0302-9743
e-ISSN
1611-3349
Number of pages
12
Pages from-to
85-96
Publisher name
Springer
Place of publication
Cham
Event location
Cardiff, United Kingdom
Event date
Nov 23, 2021
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—