Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3A10475953" target="_blank" >RIV/00216208:11320/23:10475953 - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.21437/Interspeech.2023-2225" target="_blank" >https://doi.org/10.21437/Interspeech.2023-2225</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.21437/Interspeech.2023-2225" target="_blank" >10.21437/Interspeech.2023-2225</a>
Alternative languages
Result language
English
Title in the original language
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
Description in the original language
Blockwise self-attentional encoder models have recently emerged as one promising end-to-end approach to simultaneous speech translation. These models employ a blockwise beam search with hypothesis reliability scoring to determine when to wait for more input speech before translating further. However, this method maintains multiple hypotheses until the entire speech input is consumed; this scheme cannot directly show a single incremental translation to users. Further, this method lacks mechanisms for controlling the quality vs. latency tradeoff. We propose a modified incremental blockwise beam search incorporating local agreement or hold-n policies for quality-latency control. We apply our framework to models with limited and full-context encoders, with the latter demonstrating that offline models can be effectively converted to online models. Experimental results on MuST-C show 0.6-3.6 BLEU improvement without changing latency or 0.8-1.4 s latency improvement without changing quality.
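The two quality-latency policies named in the abstract can be illustrated with a minimal Python sketch. It assumes that, after each new block of speech, the beam search returns one best hypothesis (a token list) that extends the output already shown to the user; the beam search and reliability scoring themselves are omitted. Hold-n commits all but the last n tokens of the current best hypothesis; local agreement commits the prefix on which the best hypotheses of two consecutive blocks agree. The function names (`hold_n`, `local_agreement`, `simulate`) and the toy data are hypothetical; this is not the authors' implementation.

```python
from typing import List, Optional, Sequence, Tuple

Tokens = List[str]


def hold_n(best_hyp: Sequence[str], committed: Tokens, n: int) -> Tokens:
    """Hold-n: commit every token of the best hypothesis except the last n.

    Assumption: best_hyp extends `committed` (search continues from the
    emitted prefix). Returns only the newly committed tokens.
    """
    stable_len = max(len(best_hyp) - n, len(committed))
    return list(best_hyp[len(committed):stable_len])


def common_prefix(a: Sequence[str], b: Sequence[str]) -> Tokens:
    """Longest common prefix of two token sequences."""
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break
        k += 1
    return list(a[:k])


def local_agreement(prev_hyp: Sequence[str], curr_hyp: Sequence[str],
                    committed: Tokens) -> Tokens:
    """Local agreement: commit the prefix shared by the best hypotheses of
    two consecutive blocks, minus the tokens already committed."""
    agreed = common_prefix(prev_hyp, curr_hyp)
    return agreed[len(committed):]


def simulate(block_hyps: Sequence[Sequence[str]], policy: str,
             n: int = 2) -> Tuple[Tokens, List[Tokens]]:
    """Replay per-block best hypotheses (hypothetical input) under a policy.

    Returns the final committed output and the tokens emitted after each block.
    """
    committed: Tokens = []
    emitted: List[Tokens] = []
    prev: Optional[Sequence[str]] = None
    for i, hyp in enumerate(block_hyps):
        if i == len(block_hyps) - 1:
            # Source fully read (assumption): flush the rest of the final hypothesis.
            new = list(hyp[len(committed):])
        elif policy == "hold-n":
            new = hold_n(hyp, committed, n)
        elif policy == "local-agreement":
            # The first block has no predecessor to agree with, so nothing is emitted.
            new = [] if prev is None else local_agreement(prev, hyp, committed)
        else:
            raise ValueError(f"unknown policy: {policy}")
        committed.extend(new)
        emitted.append(new)
        prev = hyp
    return committed, emitted


if __name__ == "__main__":
    # Made-up best hypotheses after each of four speech blocks.
    hyps = [["the"], ["the", "cat"], ["the", "cat", "sat"],
            ["the", "cat", "sat", "down"]]
    print(simulate(hyps, "hold-n", n=2)[1])
    # → [[], [], ['the'], ['cat', 'sat', 'down']]
    print(simulate(hyps, "local-agreement")[1])
    # → [[], ['the'], ['cat'], ['sat', 'down']]
```

On this toy input, hold-2 shows its first word one block later than local agreement; a larger n delays output further in exchange for fewer chances of committing a token that later search would revise.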
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
<a href="/cs/project/GX19-26934X" target="_blank" >GX19-26934X: Neural Representations in Multi-modal and Multi-lingual Modelling</a><br>
Linkages
P - Research and development project funded from public sources (with a link to CEP)
Other
Year of application
2023
Data confidentiality code
S - Complete and accurate data on the project are not subject to protection under special legal regulations
Result-type-specific data
Proceedings title
Proceedings of the 24th Annual Conference of the International Speech Communication Association
ISBN
—
ISSN
1990-9772
e-ISSN
—
Number of pages
5
Pages (from-to)
3979-3983
Publisher
International Speech Communication Association
Place of publication
Baixas, France
Event venue
Dublin, Ireland
Event date
20 August 2023
Event type by geographic scope
WRD - Worldwide event
Article UT WoS code
—