Under-Represented Speech Dataset from Open Data: Case Study on the Romanian Language

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AU9ASJ3FY" target="_blank" >RIV/00216208:11320/25:U9ASJ3FY - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85206580176&doi=10.3390%2fapp14199043&partnerID=40&md5=476c2e940fb6ecdc26782e719321a107" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85206580176&doi=10.3390%2fapp14199043&partnerID=40&md5=476c2e940fb6ecdc26782e719321a107</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.3390/app14199043" target="_blank" >10.3390/app14199043</a>

Alternative languages

Result language
English
Title in original language
Under-Represented Speech Dataset from Open Data: Case Study on the Romanian Language
Result description in original language
This paper introduces the USPDATRO dataset. This is a speech dataset, in the Romanian language, constructed from open data, focusing on under-represented voice types (children, young and old people, and female voices). The paper covers the methodology behind the dataset construction, specific details regarding the dataset, and evaluation of existing Romanian Automatic Speech Recognition (ASR) systems, with different architectures. Results indicate that more under-represented speech content is needed in the training of ASR systems. Our approach can be extended to other low-resourced languages, as long as open data are available. © 2024 by the authors.
Title in English
Under-Represented Speech Dataset from Open Data: Case Study on the Romanian Language
Result description in English
This paper introduces the USPDATRO dataset. This is a speech dataset, in the Romanian language, constructed from open data, focusing on under-represented voice types (children, young and old people, and female voices). The paper covers the methodology behind the dataset construction, specific details regarding the dataset, and evaluation of existing Romanian Automatic Speech Recognition (ASR) systems, with different architectures. Results indicate that more under-represented speech content is needed in the training of ASR systems. Our approach can be extended to other low-resourced languages, as long as open data are available. © 2024 by the authors.

Classification

Type
J<sub>SC</sub> - Article in a periodical indexed in the SCOPUS database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
—
Linkages
—

Other

Year of publication
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Periodical name
Applied Sciences (Switzerland)
ISSN
2076-3417
e-ISSN
—
Periodical volume
14
Issue number within the volume
19
Country of the periodical's publisher
US - United States of America
Number of pages
13
Pages from-to
1-13
UT WoS code of the article
—
EID of the result in the Scopus database
2-s2.0-85206580176

Similar results (10)

Lessons Learned in Transcribing 5000 h of Air Traffic Control Communications for Robust Automatic Speech Understanding
Killkan: The Automatic Speech Recognition Dataset for Kichwa with Morphosyntactic Information
