Active Learning Efficiency Benchmark for Coreference Resolution Including Advanced Uncertainty Representations
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F23%3A00374834" target="_blank" >RIV/68407700:21230/23:00374834 - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.1109/CISDS61173.2023.00016" target="_blank" >https://doi.org/10.1109/CISDS61173.2023.00016</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/CISDS61173.2023.00016" target="_blank" >10.1109/CISDS61173.2023.00016</a>
Alternative languages
Result language
English
Original language name
Active Learning Efficiency Benchmark for Coreference Resolution Including Advanced Uncertainty Representations
Original language description
Active learning accelerates model training by iteratively expanding the training data based on the model's own feedback. The approach has proven particularly useful in natural language processing and other machine learning domains. While active learning has been studied extensively for conventional classification tasks, its application to more specialized tasks such as neural coreference resolution still leaves room for improvement. In this work, we apply active learning to neural coreference resolution and set a benchmark: a 39% reduction in the annotations required for training, while preserving the performance of the original model trained on the full data. We compare various uncertainty sampling techniques together with Bayesian modifications of coreference resolution models and conduct a comprehensive analysis of annotation effort. The results show that the best-performing techniques maximize the number of labels annotated within previously selected documents, which makes them effective while preserving performance.
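The abstract does not spell out the selection procedure, so the following is only a hypothetical sketch, not the authors' implementation. It illustrates entropy-based uncertainty sampling over mention-level antecedent distributions, with an optional preference for mentions from documents that already contain annotations (the behaviour the abstract attributes to the best-performing techniques). Averaging several stochastic forward passes stands in for a Bayesian model variant such as Monte Carlo dropout; the names `Mention`, `select_batch`, and `prefer_seen_docs` are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Mention:
    """Hypothetical unlabeled mention awaiting an antecedent annotation."""
    doc_id: str
    mention_id: int
    # One antecedent distribution per stochastic forward pass (assumed, e.g. MC dropout);
    # a deterministic model would supply a single distribution.
    samples: list = field(default_factory=list)


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_distribution(samples):
    """Average the per-pass distributions into one predictive distribution."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def predictive_entropy(m):
    return entropy(mean_distribution(m.samples))


def select_batch(pool, k, annotated_docs, prefer_seen_docs=True):
    """Pick k mentions for annotation, most uncertain first.

    With prefer_seen_docs=True, mentions from documents that were already
    partly annotated are ranked ahead of mentions from new documents.
    """
    def key(m):
        seen = prefer_seen_docs and m.doc_id in annotated_docs
        return (seen, predictive_entropy(m))
    return sorted(pool, key=key, reverse=True)[:k]


# Usage with made-up distributions over two candidate antecedents.
pool = [
    Mention("doc1", 0, [[0.9, 0.1], [0.8, 0.2]]),
    Mention("doc2", 1, [[0.5, 0.5], [0.4, 0.6]]),
    Mention("doc1", 2, [[0.7, 0.3], [0.6, 0.4]]),
]
print([m.mention_id for m in select_batch(pool, 2, {"doc1"})])  # [2, 0]
print([m.mention_id for m in select_batch(pool, 2, {"doc1"}, prefer_seen_docs=False)])  # [1, 2]
```

Ranking by the tuple `(seen, entropy)` keeps the document preference strict; a softer trade-off between uncertainty and annotation cost would need a weighted score instead.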
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
<a href="/en/project/TL05000057" target="_blank" >TL05000057: The Signal and the Noise in the Era of Journalism 5.0 - A Comparative Perspective of Journalistic Genres of Automated Content</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2023
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Proceedings title
2023 2nd International Conference on Frontiers of Communications, Information System and Data Science
ISBN
979-8-3503-8147-4
ISSN
—
e-ISSN
—
Number of pages
8
Pages from-to
40-47
Publisher name
IEEE Computer Society
Place of publication
Los Alamitos
Event location
Xi’an
Event date
Nov 24, 2023
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—