Early Experiments on Automatic Annotation of Portuguese Medieval Texts
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F22%3AM7D6CSYF" target="_blank" >RIV/00216208:11320/22:M7D6CSYF - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.1007/978-3-031-16802-4_44" target="_blank" >https://doi.org/10.1007/978-3-031-16802-4_44</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-031-16802-4_44" target="_blank" >10.1007/978-3-031-16802-4_44</a>
Alternative languages
Result language
English
Original language name
Early Experiments on Automatic Annotation of Portuguese Medieval Texts
Original language description
This paper presents the challenges encountered and the solutions adopted in the lemmatization and part-of-speech (PoS) tagging of a corpus of Old Portuguese texts (up to 1525), paving the way for automatic annotation of these Medieval texts. A highly granular tagset, previously devised for Modern Portuguese, was adapted to this end. A large text (∼155 thousand words) was manually annotated for PoS and lemmata and used to train an initial PoS-tagger model. When applied to two other texts, the resulting model attained 91.2% precision on a textual variant of the same text and 67.4% on a new, unseen text. A second model was then trained on the data from all three texts and applied to two further unseen texts, achieving a precision of 77.3% and 82.4%, respectively.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2022
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Linking Theory and Practice of Digital Libraries
ISBN
978-3-031-16802-4
ISSN
—
e-ISSN
—
Number of pages
8
Pages from-to
442-449
Publisher name
Springer International Publishing
Place of publication
—
Event location
Cham
Event date
Jan 1, 2022
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000867565900044