Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3A10475953" target="_blank" >RIV/00216208:11320/23:10475953 - isvavai.cz</a>
Výsledek na webu
<a href="https://doi.org/10.21437/Interspeech.2023-2225" target="_blank" >https://doi.org/10.21437/Interspeech.2023-2225</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.21437/Interspeech.2023-2225" target="_blank" >10.21437/Interspeech.2023-2225</a>

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
Popis výsledku v původním jazyce
Blockwise self-attentional encoder models have recently emerged as one promising end-to-end approach to simultaneous speech translation. These models employ a blockwise beam search with hypothesis reliability scoring to determine when to wait for more input speech before translating further. However, this method maintains multiple hypotheses until the entire speech input is consumed - this scheme cannot directly show a single incremental translation to users. Further, this method lacks mechanisms for controlling the quality vs. latency tradeoff. We propose a modified incremental blockwise beam search incorporating local agreement or hold-n policies for quality-latency control. We apply our framework to models with limited and full-context encoders, with the latter demonstrating that offline models can be effectively converted to online models. Experimental results on MuST-C show 0.6-3.6 BLEU improvement without changing latency or 0.8-1.4 s latency improvement without changing quality.
Název v anglickém jazyce
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
Popis výsledku anglicky
Blockwise self-attentional encoder models have recently emerged as one promising end-to-end approach to simultaneous speech translation. These models employ a blockwise beam search with hypothesis reliability scoring to determine when to wait for more input speech before translating further. However, this method maintains multiple hypotheses until the entire speech input is consumed - this scheme cannot directly show a single incremental translation to users. Further, this method lacks mechanisms for controlling the quality vs. latency tradeoff. We propose a modified incremental blockwise beam search incorporating local agreement or hold-n policies for quality-latency control. We apply our framework to models with limited and full-context encoders, with the latter demonstrating that offline models can be effectively converted to online models. Experimental results on MuST-C show 0.6-3.6 BLEU improvement without changing latency or 0.8-1.4 s latency improvement without changing quality.

Klasifikace

Druh
D - Stať ve sborníku
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Návaznosti výsledku

Projekt
<a href="/cs/project/GX19-26934X" target="_blank" >GX19-26934X: Neuronové reprezentace v multimodálním a mnohojazyčném modelování</a><br>
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)

Ostatní

Rok uplatnění
2023
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název statě ve sborníku
Proceedings of the 24st Annual Conference of the International Speech Communication Association
ISBN
—
ISSN
1990-9772
e-ISSN
—
Počet stran výsledku
5
Strana od-do
3979-3983
Název nakladatele
International Speech Communication Association
Místo vydání
Baixas, France
Místo konání akce
Dublin, Ireland
Datum konání akce
20. 8. 2023
Typ akce podle státní příslušnosti
WRD - Celosvětová akce
Kód UT WoS článku
—

Podobné výsledky(10)

Towards Efficient Simultaneous Speech Translation: CUNI-KIT System for Simultaneous Track at IWSLT 2023 Presenting Simultaneous Translation in Limited Space The IWSLT 2021 BUT Speech Translation Systems

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)