Increasing the Accuracy of the ASR System by Prolonging Voiceless Phonemes in the Speech of Patients Using the Electrolarynx
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F20%3A43959812" target="_blank" >RIV/49777513:23520/20:43959812 - isvavai.cz</a>
Result on the web
<a href="https://link.springer.com/chapter/10.1007/978-3-030-60276-5_54" target="_blank" >https://link.springer.com/chapter/10.1007/978-3-030-60276-5_54</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-030-60276-5_54" target="_blank" >10.1007/978-3-030-60276-5_54</a>
Alternative languages
Result language
English
Original language name
Increasing the Accuracy of the ASR System by Prolonging Voiceless Phonemes in the Speech of Patients Using the Electrolarynx
Original language description
Patients who have undergone total laryngectomy and use an electrolarynx for voice production suffer from poor intelligibility. In many cases, this leads to a fear of speaking to strangers, even over the phone. Automatic Speech Recognition (ASR) systems could help patients overcome this problem in many ways. Unfortunately, even state-of-the-art ASR systems cannot provide results comparable to those obtained for conventional speakers. The problem is mainly caused by the similarity between voiced and unvoiced phoneme pairs. In many cases, a language model can help to resolve the issue, but only if the word context is sufficiently long. Therefore, adjusting the acoustic data and/or the acoustic model is necessary to increase recognition accuracy. In this paper, we propose elongating voiceless phonemes to improve recognition accuracy and enriching the ASR system with a model that takes this elongation into account. The idea of elongation is verified in a set of ASR experiments with artificially elongated voiceless phonemes. To enrich the ASR system, a DNN model for rescoring lattices based on phoneme duration is proposed. The new system is compared with a standard ASR system. It is also verified that an ASR system created using elongated synthetic data can successfully recognize actual elongated data pronounced by a real speaker.
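The description above mentions experiments with artificially elongated voiceless phonemes but does not say how the elongation was carried out. The following is only a minimal sketch of the general idea at the level of a phoneme alignment, not the authors' method. The function name `elongate_voiceless`, the `VOICELESS` label set, and the default factor of 2.0 are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# One aligned segment: (phoneme label, start time in seconds, end time in seconds).
Segment = Tuple[str, float, float]

# Hypothetical set of voiceless phoneme labels; a real inventory depends on
# the language and on the phone set used by the recognizer.
VOICELESS = {"p", "t", "k", "f", "s", "sh", "ch", "h"}


def elongate_voiceless(segments: List[Segment], factor: float = 2.0) -> List[Segment]:
    """Return a new alignment in which each voiceless segment lasts `factor` times longer.

    Assumption: segments are in time order and treated as contiguous, so any
    gaps between them are collapsed and later segments are shifted to follow on.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    result: List[Segment] = []
    # Start the new timeline where the original alignment started.
    cursor = segments[0][1] if segments else 0.0
    for label, start, end in segments:
        duration = end - start
        if label in VOICELESS:
            duration *= factor  # stretch only the voiceless phonemes
        result.append((label, cursor, cursor + duration))
        cursor += duration
    return result


if __name__ == "__main__":
    alignment = [("s", 0.0, 0.1), ("a", 0.1, 0.3), ("t", 0.3, 0.35)]
    for label, start, end in elongate_voiceless(alignment):
        print(f"{label}: {start:.2f} to {end:.2f} s")
```

In this toy alignment the two voiceless segments double in length while the vowel keeps its duration, so the utterance grows from 0.35 s to about 0.5 s. Stretching the audio itself would need a separate time-scale modification step, which is outside this sketch.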
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20205 - Automation and control systems
Result continuities
Project
<a href="/en/project/TN01000024" target="_blank" >TN01000024: National Competence Center - Cybernetics and Artificial Intelligence</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2020
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
22nd International Conference, SPECOM 2020, St. Petersburg, Russia, October 7–9, 2020, Proceedings
ISBN
978-3-030-60275-8
ISSN
0302-9743
e-ISSN
1611-3349
Number of pages
10
Pages from-to
562-571
Publisher name
Springer
Place of publication
Cham
Event location
St. Petersburg; Russian Federation
Event date
Oct 7, 2020
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—