Spoken Term Detection and Relevance Score Estimation Using Dot-Product of Pronunciation Embeddings
The result's identifiers
Result code in IS VaVaI
RIV/49777513:23520/21:43962417 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F21%3A43962417)
Result on the web
https://www.isca-speech.org/archive/interspeech_2021/svec21_interspeech.html
DOI - Digital Object Identifier
10.21437/Interspeech.2021-1704 (http://dx.doi.org/10.21437/Interspeech.2021-1704)
Alternative languages
Result language
English
Original language name
Spoken Term Detection and Relevance Score Estimation Using Dot-Product of Pronunciation Embeddings
Original language description
The paper describes a novel approach to Spoken Term Detection (STD) in large spoken archives using deep LSTM networks. The work builds on the previous approach of using Siamese neural networks for STD and naturally extends it to directly localize a spoken term and estimate its relevance score. The phoneme confusion network generated by a phoneme recognizer is processed by a deep LSTM network which projects each segment of the confusion network into an embedding space. The searched term is projected into the same embedding space using another deep LSTM network. The relevance score is then computed as a simple dot-product in the embedding space and calibrated using a sigmoid function to predict the probability of occurrence. The location of the searched term is estimated from the sequence of output probabilities. The deep LSTM networks are trained in a self-supervised manner from paired recognition hypotheses on the word and phoneme levels. The method is experimentally evaluated on MALACH data in English and Czech.
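The description above outlines a dual-encoder scoring scheme: one encoder embeds confusion-network segments, another embeds the query term, and a dot-product followed by a sigmoid gives per-segment occurrence probabilities. The following minimal PyTorch sketch illustrates that scheme only; it is not the authors' implementation, and all module names, feature dimensions, and the toy inputs are illustrative assumptions.

# Sketch of the dot-product relevance scoring described in the abstract.
# All names and dimensions are hypothetical, not taken from the paper's code.
import torch
import torch.nn as nn


class SegmentEncoder(nn.Module):
    """Projects each confusion-network segment into the shared embedding space."""

    def __init__(self, input_dim: int, embed_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, embed_dim, batch_first=True)

    def forward(self, segments: torch.Tensor) -> torch.Tensor:
        # segments: (batch, num_segments, input_dim) -> (batch, num_segments, embed_dim)
        outputs, _ = self.lstm(segments)
        return outputs


class TermEncoder(nn.Module):
    """Projects the searched term (a phoneme feature sequence) into the same space."""

    def __init__(self, input_dim: int, embed_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, embed_dim, batch_first=True)

    def forward(self, term: torch.Tensor) -> torch.Tensor:
        # term: (batch, num_phonemes, input_dim) -> one embedding per term: (batch, embed_dim)
        _, (hidden, _) = self.lstm(term)
        return hidden[-1]


def occurrence_probabilities(segment_emb: torch.Tensor, term_emb: torch.Tensor) -> torch.Tensor:
    """Dot-product relevance score per segment, calibrated by a sigmoid."""
    # (batch, num_segments, embed_dim) x (batch, embed_dim) -> (batch, num_segments)
    scores = torch.einsum("bse,be->bs", segment_emb, term_emb)
    return torch.sigmoid(scores)


# Toy usage with random features standing in for real confusion-network inputs.
seg_enc, term_enc = SegmentEncoder(40, 128), TermEncoder(40, 128)
probs = occurrence_probabilities(seg_enc(torch.randn(1, 50, 40)),
                                 term_enc(torch.randn(1, 7, 40)))
print(probs.shape)  # torch.Size([1, 50])

In the paper's setting, the sequence of such probabilities over the confusion-network segments is what the term localization is estimated from; here the sigmoid simply maps the raw dot-product score to an occurrence probability.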
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20205 - Automation and control systems
Result continuities
Project
VJ01010108: Robust processing of recordings for operations and security
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2021
Confidentiality
S - Complete and accurate data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISBN
978-1-71383-690-2
ISSN
2308-457X
e-ISSN
—
Number of pages
5
Pages from-to
851-855
Publisher name
International Speech Communication Association
Place of publication
Red Hook, NY
Event location
Brno, Czech Republic
Event date
Aug 30, 2021
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—