Preserving Semantics in Textual Adversarial Attacks
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F23%3A00369249" target="_blank" >RIV/68407700:21230/23:00369249 - isvavai.cz</a>
Alternative codes found
RIV/68407700:21730/23:00369249
Result on the web
<a href="https://doi.org/10.3233/FAIA230376" target="_blank" >https://doi.org/10.3233/FAIA230376</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.3233/FAIA230376" target="_blank" >10.3233/FAIA230376</a>
Alternative languages
Result language
English
Original language name
Preserving Semantics in Textual Adversarial Attacks
Original language description
The growth of hateful online content, or hate speech, has been associated with a global increase in violent crimes against minorities [23]. Harmful online content can be produced easily, automatically and anonymously. Although some form of automatic detection is already achieved through NLP text classifiers, these classifiers can be fooled by adversarial attacks. To strengthen existing systems and stay ahead of attackers, we need better adversarial attacks. In this paper, we show that up to 70% of adversarial examples generated by adversarial attacks should be discarded because they do not preserve semantics. We address this core weakness and propose a new, fully supervised sentence embedding technique called Semantics-Preserving-Encoder (SPE). Our method outperforms existing sentence encoders used in adversarial attacks, achieving a 1.2x to 5.1x better real attack success rate. We release our code as a plugin that can be used in any existing adversarial attack to improve its quality and speed up its execution. (The code, datasets and test examples are available at https://github.com/DavidHerel/semantics-preserving-encoder.)
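The role a sentence encoder typically plays inside an adversarial attack, as a filter that discards candidates drifting too far in meaning from the original text, can be sketched roughly as follows. This is a minimal illustration only, not the authors' SPE model and not the interface of their released plugin: `toy_embed`, `cosine`, `filter_candidates` and the 0.7 threshold are hypothetical names and values chosen for the example, and the bag-of-words encoder is a deliberately crude stand-in for a real sentence encoder.

```python
import math
from collections import Counter


def toy_embed(sentence):
    """Hypothetical stand-in encoder: bag-of-words counts (NOT the SPE model)."""
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty sentence is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def filter_candidates(original, candidates, threshold=0.7, embed=toy_embed):
    """Keep only candidates whose embedding stays close to the original's.

    The threshold is an assumed value for illustration, not one from the paper.
    """
    ref = embed(original)
    return [c for c in candidates if cosine(ref, embed(c)) >= threshold]


original = "the movie was really good"
candidates = [
    "the movie was really great",  # one word swapped, high word overlap
    "the film was truly awful",    # several words swapped, low word overlap
    "the movie was really good",   # unchanged text
]
kept = filter_candidates(original, candidates)
print(kept)
```

Passing a different `embed` function swaps the encoder without touching the filtering logic. The toy encoder also shows why the choice of encoder matters: it measures only word overlap, so it keeps "great" for "good" on surface similarity alone and would equally keep a single-word swap that inverts the meaning.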
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
R - Project of the EC Framework Programme
Others
Publication year
2023
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
European Conference on Artificial Intelligence 2023
ISBN
978-1-64368-436-9
ISSN
0922-6389
e-ISSN
—
Number of pages
8
Pages from-to
1036-1043
Publisher name
IOS Press
Place of publication
Amsterdam
Event location
Kraków
Event date
Sep 30, 2023
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—