Two Reproductions of a Human-Assessed Comparative Evaluation of a Semantic Error Detection System

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F22%3A10457002" target="_blank" >RIV/00216208:11320/22:10457002 - isvavai.cz</a>
Výsledek na webu
<a href="https://aclanthology.org/2022.inlg-genchal.9/" target="_blank" >https://aclanthology.org/2022.inlg-genchal.9/</a>
DOI - Digital Object Identifier
—

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
Two Reproductions of a Human-Assessed Comparative Evaluation of a Semantic Error Detection System
Popis výsledku v původním jazyce
In this paper, we present the results of two reproduction studies for the human evaluation originally reported by Dušek and Kasner (2020) in which the authors comparatively evaluated outputs produced by a semantic error detection system for data-to-text generation against reference outputs. In the first reproduction, the original evaluators repeat the evaluation, in a test of the repeatability of the original evaluation. In the second study, two new evaluators carry out the evaluation task, in a test of the reproducibility of the original evaluation under otherwise identical conditions. We describe our approach to reproduction, and present and analyse results, finding different degrees of reproducibility depending on result type, data and labelling task. Our resources are available and open-sourced.
Název v anglickém jazyce
Two Reproductions of a Human-Assessed Comparative Evaluation of a Semantic Error Detection System
Popis výsledku anglicky
In this paper, we present the results of two reproduction studies for the human evaluation originally reported by Dušek and Kasner (2020) in which the authors comparatively evaluated outputs produced by a semantic error detection system for data-to-text generation against reference outputs. In the first reproduction, the original evaluators repeat the evaluation, in a test of the repeatability of the original evaluation. In the second study, two new evaluators carry out the evaluation task, in a test of the reproducibility of the original evaluation under otherwise identical conditions. We describe our approach to reproduction, and present and analyse results, finding different degrees of reproducibility depending on result type, data and labelling task. Our resources are available and open-sourced.

Klasifikace

Druh
D - Stať ve sborníku
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Návaznosti výsledku

Projekt
—
Návaznosti
S - Specificky vyzkum na vysokych skolach

Ostatní

Rok uplatnění
2022
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název statě ve sborníku
Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges
ISBN
978-1-955917-60-5
ISSN
—
e-ISSN
—
Počet stran výsledku
10
Strana od-do
52-61
Název nakladatele
Association for Computational Linguistics
Místo vydání
Stroudsburg, PA, USA
Místo konání akce
Waterville, ME, USA
Datum konání akce
18. 7. 2022
Typ akce podle státní příslušnosti
WRD - Celosvětová akce
Kód UT WoS článku
—

Podobné výsledky(10)

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP ReproHum #0669-08: Reproducing Sentiment Transfer Evaluation With a Little Help from the Authors: Reproducing Human Evaluation of an MT Error Detector

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Two Reproductions of a Human-Assessed Comparative Evaluation of a Semantic Error Detection System

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)