Enhancing deep neural networks with morphological information
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3ANGN4ZHWI" target="_blank" >RIV/00216208:11320/23:NGN4ZHWI - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85125658123&doi=10.1017%2fS1351324922000080&partnerID=40&md5=75618ed03193dbd1cbae6c9d5a06655a" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85125658123&doi=10.1017%2fS1351324922000080&partnerID=40&md5=75618ed03193dbd1cbae6c9d5a06655a</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1017/s1351324922000080" target="_blank" >10.1017/s1351324922000080</a>
Alternative languages
Result language
English
Original language name
Enhancing deep neural networks with morphological information
Original language description
"Deep learning approaches are superior in natural language processing due to their ability to extract informative features and patterns from languages. The two most successful neural architectures are LSTM and transformers, used in large pretrained language models such as BERT. While cross-lingual approaches are on the rise, most current natural language processing techniques are designed and applied to English, and less-resourced languages are lagging behind. In morphologically rich languages, information is conveyed through morphology, for example, through affixes modifying stems of words. The existing neural approaches do not explicitly use the information on word morphology. We analyse the effect of adding morphological features to LSTM and BERT models. As a testbed, we use three tasks available in many less-resourced languages: named entity recognition (NER), dependency parsing (DP) and comment filtering (CF). We construct baselines involving LSTM and BERT models, which we adjust by adding additional input in the form of part of speech (POS) tags and universal features. We compare the models across several languages from different language families. Our results suggest that adding morphological features has mixed effects depending on the quality of features and the task. The features improve the performance of LSTM-based models on the NER and DP tasks, while they do not benefit the performance on the CF task. For BERT-based models, the added morphological features only improve the performance on DP when they are of high quality (i.e., manually checked) while not showing any practical improvement when they are predicted. Even for high-quality features, the improvements are less pronounced in language-specific BERT variants compared to massively multilingual BERT models. As in NER and CF datasets manually checked features are not available, we only experiment with predicted features and find that they do not cause any practical improvement in performance. 
© The Author(s), 2022. Published by Cambridge University Press."
Czech name
—
Czech description
—
Classification
Type
J<sub>SC</sub> - Article in a specialist periodical, which is included in the SCOPUS database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2023
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
"Natural Language Engineering"
ISSN
1351-3249
e-ISSN
—
Volume of the periodical
29
Issue of the periodical within the volume
2
Country of publishing house
US - UNITED STATES
Number of pages
26
Pages from-to
360-385
UT code for WoS article
—
EID of the result in the Scopus database
2-s2.0-85125658123