Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/23:10475892 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3A10475892)
Result on the web
https://aclanthology.org/2023.insights-1.1/
DOI - Digital Object Identifier
10.18653/v1/2023.insights-1.1 (http://dx.doi.org/10.18653/v1/2023.insights-1.1)
Alternative languages
Result language
English
Original language name
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
Original language description
We report our efforts to identify a set of previous human evaluations in NLP suitable for a coordinated study of what makes human evaluations in NLP more or less reproducible. Our findings include that just 13% of papers had (i) sufficiently low barriers to reproduction and (ii) enough obtainable information to be considered for reproduction, and that all but one of the experiments we selected for reproduction were found to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding, that the great majority of human evaluations in NLP are not repeatable and/or not reproducible and/or too flawed to justify reproduction, paints a dire picture, but it also presents an opportunity to rethink how human evaluations in NLP are designed and reported.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
R - Project of the European Commission Framework Programme (Projekt Rámcového programu EK)
Others
Publication year
2023
Confidentiality
S - Complete and accurate data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
The Fourth Workshop on Insights from Negative Results in NLP: Proceedings of the Workshop
ISBN
978-1-959429-49-4
ISSN
—
e-ISSN
—
Number of pages
10
Pages from-to
1-10
Publisher name
Association for Computational Linguistics
Place of publication
Stroudsburg, PA, USA
Event location
Dubrovnik, Croatia
Event date
May 5, 2023
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—