Workflows for kickstarting RBMT in virtually No-Resource Situation
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F19%3A10427162" target="_blank" >RIV/00216208:11320/19:10427162 - isvavai.cz</a>
Result on the web
<a href="https://www.aclweb.org/anthology/W19-6803" target="_blank" >https://www.aclweb.org/anthology/W19-6803</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in original language
Workflows for kickstarting RBMT in virtually No-Resource Situation
Result description in original language
In this article we describe work-in-progress best practices for starting work on rule-based machine translation (RBMT) for a language that has virtually no pre-existing digital resources for NLP use. We use the Karelian language as a case study: at the beginning of our project there were no publicly available corpora, parallel or analysed monolingual, no analysers, and no translation tools or language models. We show workflows that we have found useful for curating and developing the necessary NLP resources for the language. Our workflow is also aimed at no-resource work in the sense of no funding and scarce access to native informants; we show that building core NLP resources in parallel can alleviate the problems therein.
Title in English
Workflows for kickstarting RBMT in virtually No-Resource Situation
Result description in English
In this article we describe work-in-progress best practices for starting work on rule-based machine translation (RBMT) for a language that has virtually no pre-existing digital resources for NLP use. We use the Karelian language as a case study: at the beginning of our project there were no publicly available corpora, parallel or analysed monolingual, no analysers, and no translation tools or language models. We show workflows that we have found useful for curating and developing the necessary NLP resources for the language. Our workflow is also aimed at no-resource work in the sense of no funding and scarce access to native informants; we show that building core NLP resources in parallel can alleviate the problems therein.
Classification
Type
O - Other results
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
—
Linkages
—
Other
Year of implementation
2019
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations