Methods for Detoxification of Texts for the Russian Language
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F21%3A10441629" target="_blank" >RIV/00216208:11320/21:10441629 - isvavai.cz</a>
Result on the web
<a href="https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=T6M_T4JBMO" target="_blank" >https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=T6M_T4JBMO</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.3390/mti5090054" target="_blank" >10.3390/mti5090054</a>
Alternative languages
Result language
English
Original language name
Methods for Detoxification of Texts for the Russian Language
Original language description
We introduce the first study of the automatic detoxification of Russian texts to combat offensive language. This kind of textual style transfer can be used to process toxic content on social media or to eliminate toxicity in automatically generated texts. While much work has been done in this field for English, there has been no prior work on detoxification for Russian. We propose two types of models: an approach based on the BERT architecture that performs local corrections, and a supervised approach based on a pretrained GPT-2 language model. We compare these methods with several baselines. In addition, we provide the training datasets and describe the evaluation setup and metrics for both automatic and manual evaluation. The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.
Czech name
—
Czech description
—
Classification
Type
J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2021
Confidentiality
S - Complete and truthful data about the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Multimodal Technologies and Interaction [online]
ISSN
2414-4088
e-ISSN
—
Volume of the periodical
5
Issue of the periodical within the volume
9
Country of publishing house
CH - SWITZERLAND
Number of pages
26
Pages from-to
54 (article number)
UT code for WoS article
000702843200001
EID of the result in the Scopus database
2-s2.0-85114864317