Spoken Term Detection and Relevance Score Estimation Using Dot-Product of Pronunciation Embeddings

Result identifiers

Result code in IS VaVaI
RIV/49777513:23520/21:43962417 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F21%3A43962417)
Result on the web
https://www.isca-speech.org/archive/interspeech_2021/svec21_interspeech.html
DOI - Digital Object Identifier
10.21437/Interspeech.2021-1704 (http://dx.doi.org/10.21437/Interspeech.2021-1704)

Alternative languages

Result language
English
Title in the original language
Spoken Term Detection and Relevance Score Estimation Using Dot-Product of Pronunciation Embeddings
Result description in the original language
The paper describes a novel approach to Spoken Term Detection (STD) in large spoken archives using deep LSTM networks. The work builds on the previous approach of using Siamese neural networks for STD and naturally extends it to directly localize a spoken term and estimate its relevance score. The phoneme confusion network generated by a phoneme recognizer is processed by a deep LSTM network, which projects each segment of the confusion network into an embedding space. The searched term is projected into the same embedding space using another deep LSTM network. The relevance score is then computed as a simple dot-product in the embedding space and calibrated with a sigmoid function to predict the probability of occurrence. The location of the searched term is then estimated from the sequence of output probabilities. The deep LSTM networks are trained in a self-supervised manner from paired recognition hypotheses on the word and phoneme levels. The method is experimentally evaluated on MALACH data in English and Czech.
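The scoring and localization steps in the description can be illustrated with a minimal sketch. It assumes the two LSTM encoders have already produced the embeddings; the toy vectors, the calibration parameters `scale` and `bias`, the 0.5 threshold, and the run-grouping rule in `locate_term` are illustrative assumptions and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relevance_probabilities(segment_embs, query_emb, scale=1.0, bias=0.0):
    """Dot-product of each segment embedding with the query embedding,
    calibrated by a sigmoid. `scale` and `bias` are hypothetical
    calibration parameters."""
    probs = []
    for seg in segment_embs:
        score = sum(s * q for s, q in zip(seg, query_emb))
        probs.append(sigmoid(scale * score + bias))
    return probs


def locate_term(probs, threshold=0.5):
    """Group consecutive segments whose probability reaches the threshold.
    Returns (first_segment, last_segment, peak_probability) per run.
    This grouping rule is an assumption made for illustration."""
    hits = []
    start = None
    for t, p in enumerate(probs):
        if p >= threshold and start is None:
            start = t
        elif p < threshold and start is not None:
            hits.append((start, t - 1, max(probs[start:t])))
            start = None
    if start is not None:
        hits.append((start, len(probs) - 1, max(probs[start:])))
    return hits


# Toy stand-ins for the encoder outputs (2-dimensional for readability).
query_emb = [1.0, 0.0]
segment_embs = [[-3.0, 0.0], [-2.0, 1.0], [4.0, 0.0],
                [5.0, 2.0], [-1.0, 0.0], [-4.0, 1.0]]

probs = relevance_probabilities(segment_embs, query_emb)
hits = locate_term(probs)
print(hits)  # one hit spanning segments 2..3, peak probability about 0.993
```

Keeping the score a plain dot-product means the segment embeddings can be computed once for the whole archive, with only the query embedding computed at search time.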

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
20205 - Automation and control systems

Result continuities

Project
VJ01010108: Robustní zpracování nahrávek pro operativu a bezpečnost (Robust processing of recordings for operations and security)
Continuities
P - Research and development project financed from public funds (with a link to CEP)

Others

Year of implementation
2021
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Title of the article in the proceedings
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISBN
978-1-71383-690-2
ISSN
2308-457X
e-ISSN
—
Number of pages
5
Pages from-to
851-855
Publisher name
International Speech Communication Association
Place of publication
Red Hook, NY
Event location
Brno, Czech Republic
Event date
30 August 2021
Event type by nationality of participants
WRD - Worldwide event
UT WoS code of the article
—

Similar results (10)

An Analysis of the RNN-Based Spoken Term Detection Training
Transformer-Based Encoder-Encoder Architecture for Spoken Term Detection
Kombinace slovního a fonémového přístupu k vyhledávání klíčových frází (Combining word-based and phoneme-based approaches to key-phrase search)
