Verifying Annotation Agreement without Multiple Experts: A Case Study with Gujarati SNACS

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3A3B9Y7J64" target="_blank" >RIV/00216208:11320/23:3B9Y7J64 - isvavai.cz</a>
Result on the web
<a href="https://aclanthology.org/2023.findings-acl.696/" target="_blank" >https://aclanthology.org/2023.findings-acl.696/</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.18653/v1/2023.findings-acl.696" target="_blank" >10.18653/v1/2023.findings-acl.696</a>

Alternative languages

Result language
English
Title in original language
Verifying Annotation Agreement without Multiple Experts: A Case Study with Gujarati SNACS
Result description in original language
"Good datasets are a foundation of NLP research, and form the basis for training and evaluating models of language use. While creating datasets, the standard practice is to verify the annotation consistency using a committee of human annotators. This norm assumes that multiple annotators are available, which is not the case for highly specialized tasks or low-resource languages. In this paper, we ask: Can we evaluate the quality of a dataset constructed by a single human annotator? To address this question, we propose four weak verifiers to help estimate dataset quality, and outline when each may be employed. We instantiate these strategies for the task of semantic analysis of adpositions in Gujarati, a low-resource language, and show that our weak verifiers concur with a double-annotation study. As an added contribution, we also release the first dataset with semantic annotations in Gujarati along with several model baselines."
Title in English
Verifying Annotation Agreement without Multiple Experts: A Case Study with Gujarati SNACS
Result description in English
"Good datasets are a foundation of NLP research, and form the basis for training and evaluating models of language use. While creating datasets, the standard practice is to verify the annotation consistency using a committee of human annotators. This norm assumes that multiple annotators are available, which is not the case for highly specialized tasks or low-resource languages. In this paper, we ask: Can we evaluate the quality of a dataset constructed by a single human annotator? To address this question, we propose four weak verifiers to help estimate dataset quality, and outline when each may be employed. We instantiate these strategies for the task of semantic analysis of adpositions in Gujarati, a low-resource language, and show that our weak verifiers concur with a double-annotation study. As an added contribution, we also release the first dataset with semantic annotations in Gujarati along with several model baselines."

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
—
Continuities
—

Others

Publication year
2023
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Title of the article in the proceedings
"Findings of the Association for Computational Linguistics: ACL 2023"
ISBN
978-1-959429-62-3
ISSN
—
e-ISSN
—
Number of pages
18
Pages from-to
10941-10958
Publisher name
ACL
Place of publication
Toronto, Canada
Event location
Toronto, Canada
Event date
1. 1. 2023
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—

Similar results (10)

IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP
LSCP: Enhanced Large Scale Colloquial Persian Language Understanding
