A relevance score estimation for spoken term detection based on RNN-generated pronunciation embeddings
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F17%3A43932653" target="_blank" >RIV/49777513:23520/17:43932653 - isvavai.cz</a>
Result on the web
<a href="https://pdfs.semanticscholar.org/a8ad/654be9b7b1c3914ac69a697850fc4657473b.pdf" target="_blank" >https://pdfs.semanticscholar.org/a8ad/654be9b7b1c3914ac69a697850fc4657473b.pdf</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.21437/Interspeech.2017-1087" target="_blank" >10.21437/Interspeech.2017-1087</a>
Alternative languages
Result language
English
Title in the original language
A relevance score estimation for spoken term detection based on RNN-generated pronunciation embeddings
Result description in the original language
In this paper, we present a novel method for term score estimation. The method is primarily designed for scoring out-of-vocabulary (OOV) terms; however, it could also estimate scores for in-vocabulary results. The term score is computed as the cosine distance between two pronunciation embeddings. The first is generated from the grapheme representation of the searched term, while the second is computed from the recognized phoneme confusion network. The embeddings are generated by a specifically trained recurrent neural network (RNN) built on the idea of Siamese neural networks. The RNN is trained on word-level and phone-level recognition results in an unsupervised fashion, without the need for any hand-labeled data. The method is evaluated on the MALACH data in two languages, English and Czech. The results are compared with two baseline methods for OOV term detection.
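The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes the two pronunciation embeddings are already available as equal-length lists of floats; the vectors below are made-up placeholders, not outputs of the authors' network, and the function names are hypothetical. The abstract does not say whether the final score is the cosine similarity itself or one minus it, so both are shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """One common definition of cosine distance: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical embeddings (assumption: in the paper these would come from
# the grapheme branch and the phoneme-confusion-network branch of the RNN).
query_embedding = [0.2, 0.7, -0.1]
hit_embedding = [0.25, 0.6, 0.0]

similarity = cosine_similarity(query_embedding, hit_embedding)
distance = cosine_distance(query_embedding, hit_embedding)
```

Under this definition, a distance near 0 means the two embeddings point in nearly the same direction, which would correspond to a likely match between the searched term and the recognized segment.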
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
20205 - Automation and control systems
Result linkages
Project
<a href="/cs/project/GBP103%2F12%2FG084" target="_blank" >GBP103/12/G084: Center for Large Scale Multi-modal Data Interpretation</a>
Linkages
P - Research and development project financed from public funds (with a link to CEP)
Other
Year of implementation
2017
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Title of the article in the proceedings
Proceedings of the 18th Annual Conference of the International Speech Communication Association (Interspeech 2017)
ISBN
978-1-5108-4876-4
ISSN
1990-9772
e-ISSN
—
Number of pages
5
Pages from-to
2934-2938
Publisher name
Curran Associates, Inc.
Place of publication
Red Hook, NY
Event location
Stockholm, Sweden
Event date
20 August 2017
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
000457505000607