Preserving Semantics in Textual Adversarial Attacks

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F23%3A00369249" target="_blank" >RIV/68407700:21230/23:00369249 - isvavai.cz</a>
Alternative codes found
RIV/68407700:21730/23:00369249
Result on the web
<a href="https://doi.org/10.3233/FAIA230376" target="_blank" >https://doi.org/10.3233/FAIA230376</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.3233/FAIA230376" target="_blank" >10.3233/FAIA230376</a>

Alternative languages

Result language
English
Title in the original language
Preserving Semantics in Textual Adversarial Attacks
Result description in the original language
The growth of hateful online content, or hate speech, has been associated with a global increase in violent crimes against minorities [23]. Harmful online content can be produced easily, automatically and anonymously. Although some form of automatic detection is already achieved through NLP text classifiers, these classifiers can be fooled by adversarial attacks. To strengthen existing systems and stay ahead of attackers, we need better adversarial attacks. In this paper, we show that up to 70% of adversarial examples generated by adversarial attacks should be discarded because they do not preserve semantics. We address this core weakness and propose a new, fully supervised sentence embedding technique called Semantics-Preserving-Encoder (SPE). Our method outperforms existing sentence encoders used in adversarial attacks, achieving a 1.2x to 5.1x better real attack success rate. We release our code as a plugin that can be used in any existing adversarial attack to improve its quality and speed up its execution. (The code, datasets and test examples are available at https://github.com/DavidHerel/semantics-preserving-encoder.)

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
—
Linkages
R - EC Framework Programme project

Other

Year of implementation
2023
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific to the result type

Title of the article in the proceedings
European Conference on Artificial Intelligence 2023
ISBN
978-1-64368-436-9
ISSN
0922-6389
e-ISSN
—
Number of pages
8
Pages from-to
1036-1043
Publisher name
IOS Press
Place of publication
Amsterdam
Event location
Kraków
Event date
30 September 2023
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
—

Similar results (10)

Adversarial Examples by Perturbing High-level Features in Intermediate Decoder Layers
Balancing the Style-Content Trade-Off in Sentiment Transfer Using Polarity-Aware Denoising
Towards a Robust Deep Neural Network Against Adversarial Texts: A Survey
