Morphological analysis and disambiguation for Breton
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/21:10441632 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F21%3A10441632)
Result on the web
https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=Ihw-0KwxJ0
DOI - Digital Object Identifier
10.1007/s10579-020-09510-8 (http://dx.doi.org/10.1007/s10579-020-09510-8)
Alternative languages
Result language
English
Original language name
Morphological analysis and disambiguation for Breton
Original language description
In this paper we present an extended description of two resources for natural language processing of Breton: a morphological analyser and a constraint grammar-based disambiguator. The constraint grammar was developed using a novel methodology in which a linguist and a language consultant created rules to solve specific disambiguation errors in a machine translation system. In addition, we introduce a new morphologically disambiguated corpus of Breton and evaluate both the morphological analyser and the constraint grammar for coverage and accuracy. For comparison, we use the same corpus to train several reference systems for part-of-speech tagging and lemmatisation and compare their performance. The experiments show that our system outperforms the reference systems by a wide margin when the reference systems are trained without an external full-form list, and performs comparably when they are trained with a full-form list generated from our morphological analyser.
Czech name
—
Czech description
—
Classification
Type
J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2021
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Language Resources and Evaluation
ISSN
1574-020X
e-ISSN
1574-0218
Volume of the periodical
55
Issue of the periodical within the volume
2
Country of publishing house
NL - THE KINGDOM OF THE NETHERLANDS
Number of pages
43
Pages from-to
431-473
UT code for WoS article
000590025300001
EID of the result in the Scopus database
2-s2.0-85096077940