Aalamaram: A Large-Scale Linguistically Annotated Treebank for the Tamil Language

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AZPC3NT23" target="_blank" >RIV/00216208:11320/25:ZPC3NT23 - isvavai.cz</a>
Výsledek na webu
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85195195757&partnerID=40&md5=87a815daf7cbcfbce6daa970a198049f" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85195195757&partnerID=40&md5=87a815daf7cbcfbce6daa970a198049f</a>
DOI - Digital Object Identifier
—

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
Aalamaram: A Large-Scale Linguistically Annotated Treebank for the Tamil Language
Popis výsledku v původním jazyce
Tamil is a relatively low-resource language in the field of Natural Language Processing (NLP). Recent years have seen a growth in Tamil NLP datasets in Natural Language Understanding (NLU) or Natural Language Generation (NLG) tasks, but high-quality linguistic resources remain scarce. In order to alleviate this gap in resources, this paper introduces Aalamaram, a treebank with rich linguistic annotations for the Tamil language. It is hitherto the largest publicly available Tamil treebank with almost 10,000 sentences from diverse sources and is annotated for the tasks of Part-of-speech (POS) tagging, Named Entity Recognition (NER), Morphological Parsing and Dependency Parsing. Close attention has also been paid to multi-word segmentation, especially in the context of Tamil clitics. Although the treebank is based largely on the Universal Dependencies (UD) specifications, significant effort has been made to adjust the annotation rules according to the idiosyncrasies and complexities of the Tamil language, thereby providing a valuable resource for linguistic research and NLP developments. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.
Název v anglickém jazyce
Aalamaram: A Large-Scale Linguistically Annotated Treebank for the Tamil Language
Popis výsledku anglicky
Tamil is a relatively low-resource language in the field of Natural Language Processing (NLP). Recent years have seen a growth in Tamil NLP datasets in Natural Language Understanding (NLU) or Natural Language Generation (NLG) tasks, but high-quality linguistic resources remain scarce. In order to alleviate this gap in resources, this paper introduces Aalamaram, a treebank with rich linguistic annotations for the Tamil language. It is hitherto the largest publicly available Tamil treebank with almost 10,000 sentences from diverse sources and is annotated for the tasks of Part-of-speech (POS) tagging, Named Entity Recognition (NER), Morphological Parsing and Dependency Parsing. Close attention has also been paid to multi-word segmentation, especially in the context of Tamil clitics. Although the treebank is based largely on the Universal Dependencies (UD) specifications, significant effort has been made to adjust the annotation rules according to the idiosyncrasies and complexities of the Tamil language, thereby providing a valuable resource for linguistic research and NLP developments. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.

Klasifikace

Druh
D - Stať ve sborníku
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Návaznosti výsledku

Projekt
—
Návaznosti
—

Ostatní

Rok uplatnění
2024
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název statě ve sborníku
Workshop Indian Lang. Data Resour. Eval., WILDRE LREC-COLING - Workshop Proc.
ISBN
978-249381437-1
ISSN
—
e-ISSN
—
Počet stran výsledku
11
Strana od-do
73-83
Název nakladatele
European Language Resources Association (ELRA)
Místo vydání
—
Místo konání akce
Torino, Italia
Datum konání akce
1. 1. 2025
Typ akce podle státní příslušnosti
WRD - Celosvětová akce
Kód UT WoS článku
—

Podobné výsledky(10)

Resources for Turkish Dependency Parsing: introducing the BOUN Treebank and the BoAT annotation tool TamilTB: An Effort Towards Building a Dependency Treebank for Tamil Resources for Turkish dependency parsing: introducing the BOUN Treebank and the BoAT annotation tool

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Aalamaram: A Large-Scale Linguistically Annotated Treebank for the Tamil Language

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)