Building NLP Pipeline for Russian with a Handful of Linguistic Knowledge
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F16%3A10372129" target="_blank" >RIV/00216208:11320/16:10372129 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
angličtina
Original language name
Building NLP Pipeline for Russian with a Handful of Linguistic Knowledge
Original language description
This work addresses the issue of building a free NLP pipeline for processing Russian texts from plain text to morphologically and syntactically annotated structures in CONLL format. The pipeline is written in python3. Segmentation is provided by our own module. Mystem with numerous postprocessing fixes is used for lemmatization and morphology tagging. Finally, syntactical annotation is obtained with MaltParser utilizing our own model trained on SynTagRus, which was converted into CONLL format for this purpose, with its morphological tagset being converted into Mystem/Russian National Corpus tagset
Czech name
—
Czech description
—
Classification
Type
O - Miscellaneous
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specificky vyzkum na vysokych skolach
Others
Publication year
2016
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů