Out-of-Domain Dependency Parsing for Dialects of Arabic: A Case Study
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/25:GEE838V6 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AGEE838V6)
Result on the web
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85204296612&partnerID=40&md5=ea1b8f8170ff8ac4a67c7aaed1a6c080
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Out-of-Domain Dependency Parsing for Dialects of Arabic: A Case Study
Original language description
We study dependency parsing for four Arabic dialects (Gulf, Levantine, Egyptian, and Maghrebi). Since no syntactically annotated data exist for Arabic dialects, we train the parser on a Modern Standard Arabic (MSA) corpus, which creates an out-of-domain setting. We investigate methods to close the gap between the source data (MSA) and the target data (dialects), e.g., training on sentences that are syntactically similar to the test data. For testing, we manually annotate a small data set drawn from a dialectal corpus. We focus on two linguistic phenomena that are difficult to parse: Idafa and coordination. We find that adding in-domain MSA data improves results, whereas adding dialectal embeddings yields only minor improvements. ©2024 Association for Computational Linguistics.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Complete and accurate data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
ArabicNLP - Arabic Natural Language Processing Conference, Proceedings of the Conference
ISBN
979-8-89176-132-2
ISSN
—
e-ISSN
—
Number of pages
13
Pages from-to
170-182
Publisher name
Association for Computational Linguistics (ACL)
Place of publication
—
Event location
Bangkok
Event date
Jan 1, 2025
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—