Resources for Turkish dependency parsing: introducing the BOUN Treebank and the BoAT annotation tool
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F22%3A10440982" target="_blank" >RIV/00216208:11320/22:10440982 - isvavai.cz</a>
Výsledek na webu
<a href="https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=4R~5cPCLdM" target="_blank" >https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=4R~5cPCLdM</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/s10579-021-09558-0" target="_blank" >10.1007/s10579-021-09558-0</a>
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
Resources for Turkish dependency parsing: introducing the BOUN Treebank and the BoAT annotation tool
Popis výsledku v původním jazyce
In this paper, we introduce the resources that we developed for Turkish dependency parsing, which include a novel manually annotated treebank (BOUN Treebank), along with the guidelines we adopted, and a new annotation tool (BoAT). The manual annotation process that we employed was shaped and implemented by a team of four linguists and five Natural Language Processing (NLP) specialists. Decisions regarding the annotation of the BOUN Treebank were made in line with the Universal Dependencies (UD) framework as well as our recent efforts for unifying the Turkish UD treebanks through manual re-annotation. To the best of our knowledge, the BOUN Treebank is the largest Turkish UD treebank. It contains a total of 9761 sentences from various topics including biographical texts, national newspapers, instructional texts, popular culture articles, and essays. In addition, we report the parsing results of a state-of-the-art dependency parser obtained over the BOUN Treebank as well as two other treebanks in Turkish. Our results demonstrate that the unification of the Turkish annotation scheme and the introduction of a more comprehensive treebank lead to improved performance with regards to dependency parsing.
Název v anglickém jazyce
Resources for Turkish dependency parsing: introducing the BOUN Treebank and the BoAT annotation tool
Popis výsledku anglicky
In this paper, we introduce the resources that we developed for Turkish dependency parsing, which include a novel manually annotated treebank (BOUN Treebank), along with the guidelines we adopted, and a new annotation tool (BoAT). The manual annotation process that we employed was shaped and implemented by a team of four linguists and five Natural Language Processing (NLP) specialists. Decisions regarding the annotation of the BOUN Treebank were made in line with the Universal Dependencies (UD) framework as well as our recent efforts for unifying the Turkish UD treebanks through manual re-annotation. To the best of our knowledge, the BOUN Treebank is the largest Turkish UD treebank. It contains a total of 9761 sentences from various topics including biographical texts, national newspapers, instructional texts, popular culture articles, and essays. In addition, we report the parsing results of a state-of-the-art dependency parser obtained over the BOUN Treebank as well as two other treebanks in Turkish. Our results demonstrate that the unification of the Turkish annotation scheme and the introduction of a more comprehensive treebank lead to improved performance with regards to dependency parsing.
Klasifikace
Druh
J<sub>imp</sub> - Článek v periodiku v databázi Web of Science
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Návaznosti výsledku
Projekt
—
Návaznosti
—
Ostatní
Rok uplatnění
2022
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Údaje specifické pro druh výsledku
Název periodika
Language Resources and Evaluation
ISSN
1574-020X
e-ISSN
—
Svazek periodika
56
Číslo periodika v rámci svazku
1
Stát vydavatele periodika
NL - Nizozemsko
Počet stran výsledku
49
Strana od-do
259-307
Kód UT WoS článku
000715667900001
EID výsledku v databázi Scopus
2-s2.0-85118617288