Learnable Sparse Filterbank for Speaker Verification

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F22%3APU146107" target="_blank" >RIV/00216305:26230/22:PU146107 - isvavai.cz</a>
Result on the web
<a href="https://www.isca-speech.org/archive/pdfs/interspeech_2022/peng22e_interspeech.pdf" target="_blank" >https://www.isca-speech.org/archive/pdfs/interspeech_2022/peng22e_interspeech.pdf</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.21437/Interspeech.2022-11309" target="_blank" >10.21437/Interspeech.2022-11309</a>

Result language
angličtina
Original language name
Learnable Sparse Filterbank for Speaker Verification
Original language description
Recently, feature extraction with learnable filters was extensively investigated with speaker verification systems, with filters learned both in time- and frequency-domains. Most of the learned schemes however end up with filters close to their initialization (e.g. Mel filterbank) or filters strongly limited by their constraints. In this paper, we propose a novel learnable sparse filterbank, named LearnSF, by exclusively optimizing the sparsity of the filterbank, that does not explicitly constrain the filters to follow pre-defined distribution. After standard pre-processing (STFT and square of the magnitude spectrum), the learnable sparse filterbank is employed, with its normalized outputs fed into a neural network predicting the speaker identity. We evaluated the performance of the proposed approach on both VoxCeleb and CNCeleb datasets. The experimental results demonstrate the effectiveness of the proposed LearnSF compared to both widely-used acoustic features and existing parameterized learnable front-ends.
Czech name
—
Czech description
—

Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Project
Result was created during the realization of more than one project. More information in the Projects tab.
Continuities
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)

Publication year
2022
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Article name in the collection
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISBN
—
ISSN
1990-9772
e-ISSN
—
Number of pages
5
Pages from-to
5110-5114
Publisher name
International Speech Communication Association
Place of publication
Incheon
Event location
Incheon Korea
Event date
Sep 18, 2022
Type of event by nationality
WRD - Celosvětová akce
UT code for WoS article
—

Similar results(10)