Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

The result's identifiers

  • Result code in IS VaVaI

    RIV/00216208:11320/23:10475953 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3A10475953)

  • Result on the web

    https://doi.org/10.21437/Interspeech.2023-2225

  • DOI - Digital Object Identifier

    10.21437/Interspeech.2023-2225 (http://dx.doi.org/10.21437/Interspeech.2023-2225)

Alternative languages

  • Result language

    English

  • Original language name

    Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

  • Original language description

    Blockwise self-attentional encoder models have recently emerged as one promising end-to-end approach to simultaneous speech translation. These models employ a blockwise beam search with hypothesis reliability scoring to determine when to wait for more input speech before translating further. However, this method maintains multiple hypotheses until the entire speech input is consumed - this scheme cannot directly show a single incremental translation to users. Further, this method lacks mechanisms for controlling the quality vs. latency tradeoff. We propose a modified incremental blockwise beam search incorporating local agreement or hold-n policies for quality-latency control. We apply our framework to models with limited and full-context encoders, with the latter demonstrating that offline models can be effectively converted to online models. Experimental results on MuST-C show 0.6-3.6 BLEU improvement without changing latency or 0.8-1.4 s latency improvement without changing quality.

  • Czech name

  • Czech description
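
The two policies named in the original language description above can be illustrated with a minimal sketch. It follows the way hold-n and local agreement are commonly defined in the simultaneous translation literature: hold-n withholds the last n tokens of the current best hypothesis, and local agreement emits only the prefix on which the best hypotheses of two consecutive input blocks agree. The function names and the token-list representation are illustrative assumptions. This is not the paper's implementation, and it does not model the blockwise beam search or the hypothesis reliability scoring.

```python
from typing import List, Sequence


def hold_n(best_hypothesis: Sequence[str], n: int) -> List[str]:
    """Hold-n (assumed definition): emit the best hypothesis without its last n tokens."""
    if n <= 0:
        return list(best_hypothesis)
    return list(best_hypothesis[:-n])


def local_agreement(previous: Sequence[str], current: Sequence[str]) -> List[str]:
    """Local agreement (assumed definition): emit the longest common prefix of the
    best hypotheses produced after two consecutive input blocks."""
    prefix: List[str] = []
    for prev_token, curr_token in zip(previous, current):
        if prev_token != curr_token:
            break
        prefix.append(prev_token)
    return prefix


def newly_stable(committed: Sequence[str], stable: Sequence[str]) -> List[str]:
    """Return only the tokens that extend what was already shown to the user."""
    return list(stable[len(committed):])


if __name__ == "__main__":
    # Toy hypotheses after two consecutive speech blocks (hypothetical data).
    block_1 = ["I", "am", "going", "home"]
    block_2 = ["I", "am", "going", "to", "the", "station"]

    print(hold_n(block_2, 2))
    print(local_agreement(block_1, block_2))
    print(newly_stable(["I", "am"], local_agreement(block_1, block_2)))
```

In this sketch a larger n, or requiring agreement across blocks, delays output but makes the shown prefix less likely to change, which is the quality versus latency tradeoff the description refers to.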

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

    GX19-26934X: Neural Representations in Multi-modal and Multi-lingual Modeling

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)

Others

  • Publication year

    2023

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Proceedings of the 24th Annual Conference of the International Speech Communication Association

  • ISBN

  • ISSN

    1990-9772

  • e-ISSN

  • Number of pages

    5

  • Pages from-to

    3979-3983

  • Publisher name

    International Speech Communication Association

  • Place of publication

    Baixas, France

  • Event location

    Dublin, Ireland

  • Event date

    Aug 20, 2023

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article