Building a Part-of-Speech Tagged Corpus for Drenjongke (Bhutia)
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/20:10426992 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F20%3A10426992)
Result on the web
https://www.aclweb.org/anthology/2020.aacl-srw.9
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Building a Part-of-Speech Tagged Corpus for Drenjongke (Bhutia)
Original language description
This research paper reports on the creation of the first Drenjongke corpus, based on texts taken from a phrase book for beginners written in the Tibetan script. A corpus of sentences was created after correcting errors in the text scanned through optical character recognition (OCR). A total of 34 Part-of-Speech (PoS) tags were defined based on manual annotation performed by the three authors, one of whom is a native speaker of Drenjongke. The first corpus of the Drenjongke language comprises 275 sentences and 1379 tokens, which we plan to expand with other materials to promote further studies of this language.
Czech name
—
Czech description
—
Classification
Type
O - Miscellaneous
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2020
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations