cEnTam: Creation and Validation of a New English-Tamil Bilingual Corpus
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F20%3A10427024" target="_blank" >RIV/00216208:11320/20:10427024 - isvavai.cz</a>
Result on the web
<a href="https://www.aclweb.org/anthology/2020.bucc-1.10" target="_blank" >https://www.aclweb.org/anthology/2020.bucc-1.10</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
angličtina
Original language name
cEnTam: Creation and Validation of a New English-Tamil Bilingual Corpus
Original language description
Natural Language Processing (NLP), is the field of artificial intelligence that gives the computer the ability to interpret, perceive and extract appropriate information from human languages. Contemporary NLP is predominantly a data driven process. It employs machine learning and statistical algorithms to learn language structures from textual corpus. While application of NLP in English, certain European languages such as Spanish, German, etc. and Chinese, Arabic has been tremendous, it is not so, in many Indian languages. There are obvious advantages in creating aligned bilingual and multilingual corpora. Machine translation, cross-lingual information retrieval, content availability and linguistic comparison are a few of the most sought after applications of such parallel corpora. This paper explains and validates a parallel corpus we created for English-Tamil bilingual pair.
Czech name
—
Czech description
—
Classification
Type
O - Miscellaneous
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2020
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů