Universal Dependency Treebanks for Low-Resource Indian Languages: The Case of Bhojpuri
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/20:10424486 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F20%3A10424486)
Result on the web
https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/WILDRE-5book.pdf#page=43
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Universal Dependency Treebanks for Low-Resource Indian Languages: The Case of Bhojpuri
Original language description
This paper presents the first dependency treebank for Bhojpuri, an Indo-Aryan language and one of the resource-poor languages of India. The objective of the Bhojpuri Treebank (BHTB) project is to provide a substantial, syntactically annotated treebank for Bhojpuri that supports the development of language technology tools. The project is also intended to aid cross-lingual learning and typological research. The treebank currently consists of 4,881 tokens annotated according to the Universal Dependencies (UD) scheme. We develop a Bhojpuri tagger and parser using a machine learning approach. The model achieves 57.49% unlabeled attachment score (UAS), 45.50% labeled attachment score (LAS), 79.69% UPOS accuracy and 77.64% XPOS accuracy. Finally, we discuss the linguistic analysis and the annotation process of the Bhojpuri UD treebank.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
The result was created during the realization of more than one project.
Continuities
P - Research and development project financed from public sources (with a link to CEP)
I - Institutional support for the long-term conceptual development of a research organization
Others
Publication year
2020
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the LREC 2020 WILDRE5 – 5th Workshop on Indian Language Data: Resources and Evaluation
ISBN
979-10-95546-67-2
ISSN
—
e-ISSN
—
Number of pages
6
Pages from-to
33-38
Publisher name
European Language Resources Association
Place of publication
Paris, France
Event location
Marseille, France
Event date
May 16, 2020
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—