Challenges to Evaluating the Generalization of Coreference Resolution Models: A Measurement Modeling Perspective
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/25:4TNH2RWA (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3A4TNH2RWA)
Result on the web
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85205326133&partnerID=40&md5=3c6551d74b97e2c16eb35165302506a8
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Challenges to Evaluating the Generalization of Coreference Resolution Models: A Measurement Modeling Perspective
Original language description
It is increasingly common to evaluate the same coreference resolution (CR) model on multiple datasets. Do these multi-dataset evaluations allow us to draw meaningful conclusions about model generalization? Or, do they rather reflect the idiosyncrasies of a particular experimental setup (e.g., the specific datasets used)? To study this, we view evaluation through the lens of measurement modeling, a framework commonly used in the social sciences for analyzing the validity of measurements. By taking this perspective, we show how multi-dataset evaluations risk conflating different factors concerning what, precisely, is being measured. This in turn makes it difficult to draw more generalizable conclusions from these evaluations. For instance, we show that across seven datasets, measurements intended to reflect CR model generalization are often correlated with differences in both how coreference is defined and how it is operationalized; this limits our ability to draw conclusions regarding the ability of CR models to generalize across any singular dimension. We believe the measurement modeling framework provides the needed vocabulary for discussing challenges surrounding what is actually being measured by CR evaluations. © 2024 Association for Computational Linguistics.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Complete and truthful project data are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proc. Annu. Meet. Assoc. Comput. Linguist.
ISBN
979-889176099-8
ISSN
0736-587X
e-ISSN
—
Number of pages
16
Pages from-to
15380-15395
Publisher name
Association for Computational Linguistics (ACL)
Place of publication
—
Event location
Hybrid, Bangkok
Event date
Aug 11-16, 2024
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—