Methods for Detoxification of Texts for the Russian Language

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F21%3A10441629" target="_blank" >RIV/00216208:11320/21:10441629 - isvavai.cz</a>
Result on the web
<a href="https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=T6M_T4JBMO" target="_blank" >https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=T6M_T4JBMO</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.3390/mti5090054" target="_blank" >10.3390/mti5090054</a>

Alternative languages

Result language
English
Title in the original language
Methods for Detoxification of Texts for the Russian Language
Result description in the original language
We introduce the first study of the automatic detoxification of Russian texts to combat offensive language. This kind of textual style transfer can be used for processing toxic content on social media or for eliminating toxicity in automatically generated texts. While much work has been done for the English language in this field, there are no works on detoxification for the Russian language. We suggest two types of models: an approach based on the BERT architecture that performs local corrections, and a supervised approach based on a pretrained GPT-2 language model. We compare these methods with several baselines. In addition, we provide the training datasets and describe the evaluation setup and metrics for automatic and manual evaluation. The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.
Title in English
Methods for Detoxification of Texts for the Russian Language
Result description in English
We introduce the first study of the automatic detoxification of Russian texts to combat offensive language. This kind of textual style transfer can be used for processing toxic content on social media or for eliminating toxicity in automatically generated texts. While much work has been done for the English language in this field, there are no works on detoxification for the Russian language. We suggest two types of models: an approach based on the BERT architecture that performs local corrections, and a supervised approach based on a pretrained GPT-2 language model. We compare these methods with several baselines. In addition, we provide the training datasets and describe the evaluation setup and metrics for automatic and manual evaluation. The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.

Classification

Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
—
Linkages
—

Other

Year of implementation
2021
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Periodical name
Multimodal Technologies and Interaction [online]
ISSN
2414-4088
e-ISSN
—
Periodical volume
5
Issue number within the volume
9
Publisher country
CH - Swiss Confederation
Number of pages
26
Pages from-to
54
UT WoS article code
000702843200001
Result EID in the Scopus database
2-s2.0-85114864317

Similar results (10)

Multilingual Embeddings for Clustering Cultural Events
Train Hard, Finetune Easy: Multilingual Denoising for RDF-to-Text Generation
Text Detoxification as Style Transfer in English and Hindi
