Language and Speech Technology for Central Kurdish Varieties
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3A8U3MTELJ" target="_blank" >RIV/00216208:11320/25:8U3MTELJ - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85195964000&partnerID=40&md5=52f9723714e981ca63c623da8b1eb3dc" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85195964000&partnerID=40&md5=52f9723714e981ca63c623da8b1eb3dc</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
angličtina
Original language name
Language and Speech Technology for Central Kurdish Varieties
Original language description
Kurdish, an Indo-European language spoken by over 30 million speakers, is considered a dialect continuum and known for its diversity in language varieties. Previous studies addressing language and speech technology for Kurdish handle it in a monolithic way as a macro-language, resulting in disparities for dialects and varieties for which there are few resources and tools available. In this paper, we take a step towards developing resources for language and speech technology for varieties of Central Kurdish, creating a corpus by transcribing movies and TV series as an alternative to fieldwork. Additionally, we report the performance of machine translation, automatic speech recognition, and language identification as downstream tasks evaluated on Central Kurdish subdialects. Data and models are publicly available under an open license at https://github.com/sinaahmadi/CORDI. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Data specific for result type
Article name in the collection
Jt. Int. Conf. Comput. Linguist., Lang. Resour. Eval., LREC-COLING - Main Conf. Proc.
ISBN
978-249381410-4
ISSN
—
e-ISSN
—
Number of pages
12
Pages from-to
10034-10045
Publisher name
European Language Resources Association (ELRA)
Place of publication
—
Event location
Torino, Italia
Event date
Jan 1, 2025
Type of event by nationality
WRD - Celosvětová akce
UT code for WoS article
—