Can Stanza be Used for Part-of-Speech Tagging Historical Polish?

Result identifiers

Result code in IS VaVaI
RIV/00216208:11320/25:7G92CHBB (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3A7G92CHBB)
Result on the web
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85188704085&partnerID=40&md5=eecd47b630c5f35b672f0f87020d0931
DOI - Digital Object Identifier
—

Alternative languages

Language of the result
English
Title in the original language
Can Stanza be Used for Part-of-Speech Tagging Historical Polish?
Description in the original language
The goal of this paper is to evaluate the performance of Stanza, a part-of-speech (POS) tagger developed for modern Polish, on historical text, in order to assess whether it could be used to automate the annotation of other historical texts. While the reliability of POS taggers on historical data has been discussed before, most of that research focuses on languages whose grammar differs from that of Polish, so its results are not necessarily applicable here. Stanza is evaluated on two sets of 10,286 and 3,270 manually annotated tokens from a piece of historical Polish writing (1899), and its errors are analyzed both qualitatively and quantitatively. The results show that the tagger performs well, especially on Universal Part-of-Speech (UPOS) tags, which is promising for its use in automatic annotation in larger projects, and they pinpoint some common features of misclassified tokens. © 2024 Association for Computational Linguistics.

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
—
Linkages
—

Other

Year of application
2024
Data confidentiality code
S - Complete and true data about the project are not subject to protection under special legal regulations

Data specific to the result type

Proceedings title
EACL - Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Student Research Workshop
ISBN
979-889176090-5
ISSN
—
e-ISSN
—
Number of pages
6
Pages (from-to)
44-49
Publisher
Association for Computational Linguistics (ACL)
Place of publication
—
Event location
St. Julian's
Event date
1. 1. 2025
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
—

Similar results (10)

Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis
KernelTagger – a PoS Tagger for Very Small Amount of Training Data
Evaluation of Three Welsh Language POS Taggers
