Applying Natural Annotation and Curriculum Learning to Named Entity Recognition for Under-Resourced Languages

The result's identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F22%3A8VUHD3N4" target="_blank" >RIV/00216208:11320/22:8VUHD3N4 - isvavai.cz</a>
Result on the web
<a href="https://aclanthology.org/2022.coling-1.394" target="_blank" >https://aclanthology.org/2022.coling-1.394</a>
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Original language name
Applying Natural Annotation and Curriculum Learning to Named Entity Recognition for Under-Resourced Languages
Original language description
Current practices in building new NLP models for low-resourced languages rely either on Machine Translation of training sets from better resourced languages or on cross-lingual transfer from them. Still, we can see a considerable performance gap between the models originally trained within better resourced languages and the models transferred from them. In this study we test the possibility of (1) using natural annotation to build synthetic training sets from resources not initially designed for the target downstream task and (2) employing curriculum learning methods to select the most suitable examples from synthetic training sets. We test this hypothesis across seven Slavic languages and across three curriculum learning strategies on Named Entity Recognition as the downstream task. We also test the possibility of fine-tuning the synthetic resources to reflect linguistic properties, such as the grammatical case and gender, both of which are important for the Slavic languages. We demonstrate the possibility to achieve the mean F1 score of 0.78 across the three basic entity types for Belarusian starting from zero resources in comparison to the baseline of 0.63 using the zero-shot transfer from English. For comparison, the English model trained on the original set achieves the mean F1 score of 0.75. The experimental results are available from https://github.com/ValeraLobov/SlavNER
Czech name
—
Czech description
—

Classification

Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
—
Continuities
—

Others

Publication year
2022
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

Article name in the collection
Proceedings of the 29th International Conference on Computational Linguistics
ISBN
—
ISSN
2951-2093
e-ISSN
—
Number of pages
13
Pages from-to
4468-4480
Publisher name
International Committee on Computational Linguistics
Place of publication
—
Event location
Gyeongju, Republic of Korea
Event date
Jan 1, 2022
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—

Similar results(10)

Enhancing Cross-Lingual Sarcasm Detection by a Prompt Learning Framework with Data Augmentation and Contrastive Learning
Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties
Can Pretrained English Language Models Benefit Non-English NLP Systems in Low-Resource Scenarios?
