UkraiNER: A New Corpus and Annotation Scheme Towards Comprehensive Entity Recognition
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AAJNW2MLV" target="_blank" >RIV/00216208:11320/25:AJNW2MLV - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85195968083&partnerID=40&md5=f137210b16c92b9d973b83933fe54cdb" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85195968083&partnerID=40&md5=f137210b16c92b9d973b83933fe54cdb</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in the original language
UkraiNER: A New Corpus and Annotation Scheme Towards Comprehensive Entity Recognition
Result description in the original language
Named entity recognition as it is traditionally envisioned excludes in practice a significant part of the entities of potential interest for real-world applications: nested, discontinuous, non-named entities. Despite various attempts to broaden their coverage, subsequent annotation schemes have achieved little adoption in the literature and the most restrictive variant of NER remains the default. This is partly due to the complexity of those annotations and their format. In this paper, we introduce a new annotation scheme that offers higher comprehensiveness while preserving simplicity, together with an annotation tool to implement that scheme. We also release the corpus UkraiNER, comprised of 10,000 French sentences in the geopolitical news domain and manually annotated with comprehensive entity recognition. Our baseline experiments on UkraiNER provide a first point of comparison to facilitate future research (82 F1 for comprehensive entity recognition, 87 F1 when focusing on traditional nested NER), as well as various insights on the composition and challenges that this corpus presents for state-of-the-art named entity recognition models. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.
Title in English
UkraiNER: A New Corpus and Annotation Scheme Towards Comprehensive Entity Recognition
Result description in English
Named entity recognition as it is traditionally envisioned excludes in practice a significant part of the entities of potential interest for real-world applications: nested, discontinuous, non-named entities. Despite various attempts to broaden their coverage, subsequent annotation schemes have achieved little adoption in the literature and the most restrictive variant of NER remains the default. This is partly due to the complexity of those annotations and their format. In this paper, we introduce a new annotation scheme that offers higher comprehensiveness while preserving simplicity, together with an annotation tool to implement that scheme. We also release the corpus UkraiNER, comprised of 10,000 French sentences in the geopolitical news domain and manually annotated with comprehensive entity recognition. Our baseline experiments on UkraiNER provide a first point of comparison to facilitate future research (82 F1 for comprehensive entity recognition, 87 F1 when focusing on traditional nested NER), as well as various insights on the composition and challenges that this corpus presents for state-of-the-art named entity recognition models. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
—
Linkages
—
Other
Year of publication
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Title of the article in the proceedings
Jt. Int. Conf. Comput. Linguist., Lang. Resour. Eval., LREC-COLING - Main Conf. Proc.
ISBN
978-249381410-4
ISSN
—
e-ISSN
—
Number of pages
12
Pages from-to
16941-16952
Publisher name
European Language Resources Association (ELRA)
Place of publication
—
Event location
Torino, Italia
Event date
1. 1. 2025
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
—