An Improved Bulgarian Natural Language Processing Pipeline
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3A7JMFCF9C" target="_blank" >RIV/00216208:11320/23:7JMFCF9C - isvavai.cz</a>
Result on the web
<a href="https://www.researchgate.net/profile/Melania-Berbatova/publication/371081880_An_Improved_Bulgarian_Natural_Language_Processing_Pipeline/links/64787b68b3dfd73b7758815e/An-Improved-Bulgarian-Natural-Language-Processing-Pipeline.pdf" target="_blank" >https://www.researchgate.net/profile/Melania-Berbatova/publication/371081880_An_Improved_Bulgarian_Natural_Language_Processing_Pipeline/links/64787b68b3dfd73b7758815e/An-Improved-Bulgarian-Natural-Language-Processing-Pipeline.pdf</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.60063/gsu.fmi.110.37-50" target="_blank" >10.60063/gsu.fmi.110.37-50</a>
Alternative languages
Result language
English
Original language name
An Improved Bulgarian Natural Language Processing Pipeline
Original language description
"In this paper, we present a language pipeline for processing Bulgarian language data. The pipeline consists of the following steps: tokenization, sentence splitting, part-of-speech tagging, dependency parsing, named entity recognition, lemmatization, and word sense disambiguation. The first two components are based on rules and lists of words specific to the Bulgarian language, while the rest of the components use machine learning algorithms trained on universal dependency data and pretrained word vectors. The pipeline is implemented in the Python library spaCy and achieves significant results on all the included subtasks. The pipeline is open source and is available on Github for use by researchers and developers for a variety of natural language processing and text analysis tasks."
Czech name
—
Czech description
—
Classification
Type
J<sub>ost</sub> - Miscellaneous article in a specialist periodical
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2023
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
"ANNUAL OF SOFIA UNIVERSITY “ST. KLIMENT OHRIDSKI” FACULTY OF MATHEMATICS AND INFORMATICS"
ISSN
1313-9215
e-ISSN
—
Volume of the periodical
110
Issue of the periodical within the volume
2023
Country of publishing house
US - UNITED STATES
Number of pages
14
Pages from-to
37-50
UT code for WoS article
—
EID of the result in the Scopus database
—