Evaluation of Three Welsh Language POS Taggers
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F22%3AQVPDCYA5" target="_blank" >RIV/00216208:11320/22:QVPDCYA5 - isvavai.cz</a>
Result on the web
<a href="https://aclanthology.org/2022.cltw-1.5" target="_blank" >https://aclanthology.org/2022.cltw-1.5</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Evaluation of Three Welsh Language POS Taggers
Original language description
In this paper we describe our quantitative and qualitative evaluation of three Welsh language part-of-speech (POS) taggers. Following an introductory section, we explore some of the issues facing POS taggers, discuss the state of the art in English language tagging, and describe the three Welsh language POS taggers evaluated in this paper, namely WNLT2, CyTag and TagTeg. In section 3 we describe the challenges involved in evaluating POS taggers that use different tagsets, and introduce our mapping of the taggers' individual tagsets to an Intermediate Tagset used to facilitate their comparative evaluation. Section 4 introduces our benchmarking corpus as an important component of our methodology. In section 5 we describe how inconsistencies in text tokenization between the taggers complicate such evaluations, and discuss the method used to overcome this complication. Section 6 illustrates how we annotated the benchmark corpus, while section 7 describes the scoring method used. Section 8 provides an in-depth analysis of the results, and section 9 concludes with a summary of the work.
Keywords: POS Tagger, Welsh, Evaluation, Machine Learning
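The idea of mapping each tagger's own tagset onto a shared Intermediate Tagset before scoring can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tagger names `tagger_a` and `tagger_b`, the tag labels, the `UNK` fallback, and the use of plain token-level accuracy are hypothetical, and do not reproduce the actual tagsets of WNLT2, CyTag or TagTeg or the scoring method described in the paper.

```python
from typing import Dict, List

# Hypothetical per-tagger mappings into a shared intermediate tagset.
# The tag names are invented for illustration; they are not the real tagsets.
TAG_MAPS: Dict[str, Dict[str, str]] = {
    "tagger_a": {"NN": "NOUN", "VB": "VERB", "JJ": "ADJ"},
    "tagger_b": {"n": "NOUN", "v": "VERB", "adj": "ADJ"},
}


def to_intermediate(tagger: str, tags: List[str]) -> List[str]:
    """Map a tagger's native tags to intermediate tags; unknown tags become 'UNK'."""
    mapping = TAG_MAPS[tagger]
    return [mapping.get(tag, "UNK") for tag in tags]


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of tokens whose intermediate tag matches the gold tag."""
    if len(predicted) != len(gold):
        # Assumes tokenization differences were already reconciled upstream.
        raise ValueError("token sequences must be aligned before scoring")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy gold annotation for a four-token sentence (assumed data).
gold = ["NOUN", "VERB", "ADJ", "NOUN"]
pred_a = to_intermediate("tagger_a", ["NN", "VB", "JJ", "VB"])
pred_b = to_intermediate("tagger_b", ["n", "v", "adj", "n"])
score_a = accuracy(pred_a, gold)
score_b = accuracy(pred_b, gold)
```

Once both outputs are expressed in the same intermediate labels, they can be compared against a single gold annotation. The length check reflects the tokenization issue noted in the description: the sequences have to be aligned token for token before any per-token score is meaningful.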
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2022
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the 4th Celtic Language Technology Workshop within LREC2022
ISBN
979-10-95546-73-3
ISSN
—
e-ISSN
—
Number of pages
10
Pages from-to
30-39
Publisher name
European Language Resources Association
Place of publication
—
Event location
Marseille, France
Event date
Jan 1, 2022
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—