Part-of-Speech and Morphological Tagging of Algerian Judeo-Arabic
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F22%3A7VX7S4A8" target="_blank" >RIV/00216208:11320/22:7VX7S4A8 - isvavai.cz</a>
Result on the web
<a href="https://nejlt.ep.liu.se/article/view/4315" target="_blank" >https://nejlt.ep.liu.se/article/view/4315</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.3384/nejlt.2000-1533.2022.4315" target="_blank" >10.3384/nejlt.2000-1533.2022.4315</a>
Alternative languages
Result language
English
Title in original language
Part-of-Speech and Morphological Tagging of Algerian Judeo-Arabic
Result description in original language
Most linguistic studies of Judeo-Arabic, the ensemble of dialects spoken and written by Jews in Arab lands, are qualitative in nature and rely on laborious manual annotation work, and are therefore limited in scale. In this work, we develop automatic methods for morpho-syntactic tagging of Algerian Judeo-Arabic texts published by Algerian Jews in the 19th–20th centuries, based on a linguistically tagged corpus. First, we describe our semi-automatic approach for preprocessing these texts. Then, we experiment with both an off-the-shelf morphological tagger and several specially designed neural network taggers. Finally, we perform a real-world evaluation of new texts that were never tagged before in comparison with human expert annotators. Our experimental results demonstrate that these methods can dramatically speed up and improve the linguistic research pipeline, enabling linguists to study these dialects on a much greater scale.
Classification
Type
J<sub>ost</sub> - Other articles in peer-reviewed periodicals
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
—
Linkages
—
Other
Year of application
2022
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Periodical title
Northern European Journal of Language Technology
ISSN
2000-1533
e-ISSN
1744-4217
Periodical volume
8
Issue number within the volume
1
Publisher country
US - United States of America
Number of pages
21
Pages from-to
1-21
UT WoS article code
—
Result EID in the Scopus database
—