Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech
The result's identifiers
Result code in IS VaVaI
RIV/49777513:23520/22:43965705 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F22%3A43965705)
Result on the web
https://www.isca-speech.org/archive/interspeech_2022/lehecka22_interspeech.html
DOI - Digital Object Identifier
10.21437/Interspeech.2022-10439 (http://dx.doi.org/10.21437/Interspeech.2022-10439)
Alternative languages
Result language
English
Original language name
Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech
Original language description
In this paper, we present our progress in pretraining Czech monolingual audio transformers from a large dataset containing more than 80 thousand hours of unlabeled speech, and subsequently fine-tuning the model on automatic speech recognition tasks using a combination of in-domain data and almost 6 thousand hours of out-of-domain transcribed speech. We are presenting a large palette of experiments with various fine-tuning setups evaluated on two public datasets (CommonVoice and VoxPopuli) and one extremely challenging dataset from the MALACH project. Our results show that monolingual Wav2Vec 2.0 models are robust ASR systems, which can take advantage of large labeled and unlabeled datasets and successfully compete with state-of-the-art LVCSR systems. Moreover, Wav2Vec models proved to be good zero-shot learners when no training data are available for the target ASR task.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20205 - Automation and control systems
Result continuities
Project
The result was created during the realization of more than one project.
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2022
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISBN
—
ISSN
2308-457X
e-ISSN
—
Number of pages
5
Pages from-to
1831-1835
Publisher name
Red Hook
Place of publication
New York
Event location
Incheon, Korea
Event date
Sep 18, 2022
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—