Benchmarking pre-trained language models for multilingual NER: TraSpaS at the BSNLP2021 shared task

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F21%3A10441730" target="_blank" >RIV/00216208:11320/21:10441730 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Title in original language
Benchmarking pre-trained language models for multilingual NER: TraSpaS at the BSNLP2021 shared task
Result description in original language
In this paper we describe TraSpaS, a submission to the third shared task on named entity recognition hosted as part of the Balto-Slavic Natural Language Processing (BSNLP) Workshop. In it, we evaluate various pre-trained language models on the NER task using three open-source NLP toolkits: a character-level language model with Stanza, language-specific BERT-style models with SpaCy, and adapter-enabled XLM-R with Trankit. Our results show that the Trankit-based models outperformed those based on the other two toolkits, even when trained on smaller amounts of data. Our code is available at https://github.com/NaiveNeuron/slavner-2021.
Title in English
Benchmarking pre-trained language models for multilingual NER: TraSpaS at the BSNLP2021 shared task
Result description in English
In this paper we describe TraSpaS, a submission to the third shared task on named entity recognition hosted as part of the Balto-Slavic Natural Language Processing (BSNLP) Workshop. In it, we evaluate various pre-trained language models on the NER task using three open-source NLP toolkits: a character-level language model with Stanza, language-specific BERT-style models with SpaCy, and adapter-enabled XLM-R with Trankit. Our results show that the Trankit-based models outperformed those based on the other two toolkits, even when trained on smaller amounts of data. Our code is available at https://github.com/NaiveNeuron/slavner-2021.

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
—
Continuities
—

Others

Publication year
2021
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Article name in the proceedings
Proceedings of the 8th BSNLP Workshop on Balto-Slavic Natural Language Processing, BSNLP 2021 - Co-located with the 16th European Chapter of the Association for Computational Linguistics, EACL 2021
ISBN
978-1-954085-14-5
ISSN
—
e-ISSN
—
Number of pages
10
Pages from-to
105-114
Publisher name
Association for Computational Linguistics
Place of publication
Stroudsburg
Event location
Kyiv
Event date
20 April 2021
Event type by nationality
WRD - Worldwide event
UT WoS article code
—

