Verifying Annotation Agreement without Multiple Experts: A Case Study with Gujarati SNACS
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/23:3B9Y7J64 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3A3B9Y7J64)
Result on the web
https://aclanthology.org/2023.findings-acl.696/
DOI - Digital Object Identifier
10.18653/v1/2023.findings-acl.696 (http://dx.doi.org/10.18653/v1/2023.findings-acl.696)
Alternative languages
Result language
English
Original language name
Verifying Annotation Agreement without Multiple Experts: A Case Study with Gujarati SNACS
Original language description
"Good datasets are a foundation of NLP research, and form the basis for training and evaluating models of language use. While creating datasets, the standard practice is to verify the annotation consistency using a committee of human annotators. This norm assumes that multiple annotators are available, which is not the case for highly specialized tasks or low-resource languages. In this paper, we ask: Can we evaluate the quality of a dataset constructed by a single human annotator? To address this question, we propose four weak verifiers to help estimate dataset quality, and outline when each may be employed. We instantiate these strategies for the task of semantic analysis of adpositions in Gujarati, a low-resource language, and show that our weak verifiers concur with a double-annotation study. As an added contribution, we also release the first dataset with semantic annotations in Gujarati along with several model baselines."
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development belongs under 2.2, social aspects under 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2023
Confidentiality
S - Complete and truthful data about the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
"Findings of the Association for Computational Linguistics: ACL 2023"
ISBN
978-1-959429-62-3
ISSN
—
e-ISSN
—
Number of pages
18
Pages from-to
10941-10958
Publisher name
ACL
Place of publication
Toronto, Canada
Event location
Toronto, Canada
Event date
Jul 9-14, 2023
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—