Joint Energy-Based Model for Robust Speech Classification System against Dirty-Label Backdoor Poisoning Attacks
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F23%3APU150862" target="_blank" >RIV/00216305:26230/23:PU150862 - isvavai.cz</a>
Výsledek na webu
<a href="https://ieeexplore.ieee.org/document/10389697" target="_blank" >https://ieeexplore.ieee.org/document/10389697</a>
DOI - Digital Object Identifier
—
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
Joint Energy-Based Model for Robust Speech Classification System against Dirty-Label Backdoor Poisoning Attacks
Popis výsledku v původním jazyce
Our novel technique utilizes a Joint Energy-based Model (JEM) that integrates both discriminative and generative approaches to increase resistance against dirty-label backdoor attacks. Our approach is especially effective when the trigger is short or hardly perceivable. We simulate the attack on the Speech Commands Dataset consisting of 1 s audio clips. During training, we use JEM to model a view of the input implemented by a randomly selected 610 ms window. During inference, we combine all (40) possible views utilizing a generative part of JEM. The resulting system has slightly decreased accuracy but significantly increased resistance shown in multiple scenarios. Interestingly, replacing JEM with a standard discriminative model (Disc) provides increased resistance with a lesser effect compared to JEM but maintains accuracy. We introduce an extension motivated by semi-supervised training that further improves JEM but not Disc. JEM can also benefit from Gaussian noise during evaluation.
Název v anglickém jazyce
Joint Energy-Based Model for Robust Speech Classification System against Dirty-Label Backdoor Poisoning Attacks
Popis výsledku anglicky
Our novel technique utilizes a Joint Energy-based Model (JEM) that integrates both discriminative and generative approaches to increase resistance against dirty-label backdoor attacks. Our approach is especially effective when the trigger is short or hardly perceivable. We simulate the attack on the Speech Commands Dataset consisting of 1 s audio clips. During training, we use JEM to model a view of the input implemented by a randomly selected 610 ms window. During inference, we combine all (40) possible views utilizing a generative part of JEM. The resulting system has slightly decreased accuracy but significantly increased resistance shown in multiple scenarios. Interestingly, replacing JEM with a standard discriminative model (Disc) provides increased resistance with a lesser effect compared to JEM but maintains accuracy. We introduce an extension motivated by semi-supervised training that further improves JEM but not Disc. JEM can also benefit from Gaussian noise during evaluation.
Klasifikace
Druh
O - Ostatní výsledky
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Návaznosti výsledku
Projekt
—
Návaznosti
I - Institucionalni podpora na dlouhodoby koncepcni rozvoj vyzkumne organizace
Ostatní
Rok uplatnění
2023
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů