Curras + Baladi: Towards a Levantine Corpus
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/22:REZ3NVKA - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F22%3AREZ3NVKA)
Result on the web
https://aclanthology.org/2022.lrec-1.82
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Curras + Baladi: Towards a Levantine Corpus
Original language description
This paper makes two contributions: a full revision of the Palestinian morphologically annotated corpus (Curras) and a newly annotated Lebanese corpus (Baladi). Together, the two corpora can be used as a more general Levantine corpus. Baladi consists of around 9.6K morphologically annotated tokens. Each token was manually annotated with several morphological features, using LDC's SAMA lemmas and tags. Inter-annotator evaluation on most features yields a Kappa of 78.5% and an F1-score of 90.1%. Curras was revised by refining all annotations for accuracy, normalization and unification of POS tags, and linking with SAMA lemmas. This revision was also important to ensure that the two corpora are compatible and can help bridge the nuanced linguistic gaps between the two highly mutually intelligible dialects. Both corpora are publicly available through a web portal.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2022
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the Thirteenth Language Resources and Evaluation Conference
ISBN
979-10-95546-72-6
ISSN
—
e-ISSN
—
Number of pages
10
Pages from-to
769-778
Publisher name
European Language Resources Association
Place of publication
—
Event location
Marseille, France
Event date
Jan 1, 2022
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—