Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/23:10475953 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3A10475953)
Result on the web
https://doi.org/10.21437/Interspeech.2023-2225
DOI - Digital Object Identifier
10.21437/Interspeech.2023-2225 (http://dx.doi.org/10.21437/Interspeech.2023-2225)
Alternative languages
Result language
English
Original language name
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
Original language description
Blockwise self-attentional encoder models have recently emerged as one promising end-to-end approach to simultaneous speech translation. These models employ a blockwise beam search with hypothesis reliability scoring to determine when to wait for more input speech before translating further. However, this method maintains multiple hypotheses until the entire speech input is consumed, so it cannot directly show a single incremental translation to users. Further, this method lacks mechanisms for controlling the quality vs. latency tradeoff. We propose a modified incremental blockwise beam search incorporating local agreement or hold-n policies for quality-latency control. We apply our framework to models with limited and full-context encoders, with the latter demonstrating that offline models can be effectively converted to online models. Experimental results on MuST-C show 0.6-3.6 BLEU improvement without changing latency or 0.8-1.4 s latency improvement without changing quality.
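The description above names two policies, local agreement and hold-n, without defining them. The sketch below shows the form in which these policies are commonly described: hold-n withholds the last n tokens of the current best hypothesis, and local agreement shows only the prefix shared by the best hypotheses of two consecutive input blocks. It is an illustration under those assumptions, not the authors' implementation. The function names and the example tokens are hypothetical.

```python
from typing import List, Sequence


def hold_n(hypothesis: Sequence[str], n: int) -> List[str]:
    """Assumed hold-n policy: withhold the last n tokens of the best hypothesis."""
    if n < 0:
        raise ValueError("n must be non-negative")
    keep = max(len(hypothesis) - n, 0)
    return list(hypothesis[:keep])


def local_agreement(previous: Sequence[str], current: Sequence[str]) -> List[str]:
    """Assumed local agreement policy: longest common prefix of the best
    hypotheses produced after two consecutive input blocks."""
    prefix: List[str] = []
    for prev_tok, cur_tok in zip(previous, current):
        if prev_tok != cur_tok:
            break
        prefix.append(cur_tok)
    return prefix


# Hypothetical best hypotheses after two consecutive speech blocks.
block_1 = ["we", "will", "go"]
block_2 = ["we", "will", "meet", "tomorrow"]

print(local_agreement(block_1, block_2))  # ['we', 'will']
print(hold_n(block_2, 2))                 # ['we', 'will']
```

In this form, a larger n for hold-n, or requiring agreement before showing a token, delays output in exchange for stability, which is the kind of quality-latency control the description refers to.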
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
GX19-26934X: Neural Representations in Multi-modal and Multi-lingual Modeling (/en/project/GX19-26934X)
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2023
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the 24th Annual Conference of the International Speech Communication Association
ISBN
—
ISSN
1990-9772
e-ISSN
—
Number of pages
5
Pages from-to
3979-3983
Publisher name
International Speech Communication Association
Place of publication
Baixas, France
Event location
Dublin, Ireland
Event date
Aug 20, 2023
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—