Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition

Result identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F19%3APU136135" target="_blank" >RIV/00216305:26230/19:PU136135 - isvavai.cz</a>

  • Result on the web

    <a href="https://ieeexplore.ieee.org/document/8683380" target="_blank" >https://ieeexplore.ieee.org/document/8683380</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.1109/ICASSP.2019.8683380" target="_blank" >10.1109/ICASSP.2019.8683380</a>

Alternative languages

  • Result language

    English

  • Original language name

    Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition

  • Original language description

    In this paper, we explore several new schemes for training a seq2seq model to integrate a pre-trained language model (LM). Our proposed fusion methods focus on the memory cell state and the hidden state in the seq2seq decoder's long short-term memory (LSTM), and, unlike prior studies, the memory cell state is updated by the LM. This means the memory retained by the main seq2seq model is adjusted by the external LM. These fusion methods have several variants, depending on the architecture of the memory cell update and on the use of the memory cell and hidden states, which directly affects the final label inference. We performed experiments to show the effectiveness of the proposed methods in a monolingual ASR setup on the Librispeech corpus and in a transfer-learning setup from a multilingual ASR (MLASR) base model to a low-resource language. On Librispeech, our best model with multi-level decoding improved WER by 3.7% and 2.4% relative on test clean and test other, respectively, over the shallow-fusion baseline. In transfer learning from an MLASR base model to the IARPA Babel Swahili model, the best scheme improved the transferred model on the eval set by 9.9% and 9.8% relative in CER and WER, respectively, over the 2-stage transfer baseline. (A minimal, hypothetical sketch of this cell-state fusion idea appears after this list.)

  • Czech name

  • Czech description
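
Below is a minimal sketch of the cell-state fusion idea described in the abstract above, written in PyTorch (an assumption; the record does not name the authors' framework). It shows one hypothetical decoder step in which a gate, computed from the decoder and LM hidden states, writes a projected LM state into the LSTM memory cell, the point of departure from shallow or deep fusion. All class, parameter, and dimension names are illustrative, not the authors' API.

    import torch
    import torch.nn as nn

    class CellControlFusion(nn.Module):
        """One decoder step where an external, pre-trained LM adjusts the
        seq2seq decoder's LSTM memory cell (hypothetical variant)."""

        def __init__(self, input_dim: int, dec_dim: int, lm_dim: int):
            super().__init__()
            self.dec_cell = nn.LSTMCell(input_dim, dec_dim)    # seq2seq decoder LSTM
            self.lm_proj = nn.Linear(lm_dim, dec_dim)          # map LM state into decoder space
            self.gate = nn.Linear(dec_dim + lm_dim, dec_dim)   # how much LM info enters the cell

        def forward(self, x, dec_state, lm_hidden):
            h, c = self.dec_cell(x, dec_state)
            # Gate driven by both the decoder hidden state and the LM hidden state.
            g = torch.sigmoid(self.gate(torch.cat([h, lm_hidden], dim=-1)))
            # Unlike shallow/deep fusion, the memory cell itself is adjusted by
            # the (projected) LM state, so the LM shapes what the decoder retains.
            c = c + g * torch.tanh(self.lm_proj(lm_hidden))
            return h, c

    # Illustrative usage with made-up dimensions:
    step = CellControlFusion(input_dim=320, dec_dim=512, lm_dim=256)
    x = torch.randn(8, 320)                                # decoder input for a batch of 8
    state = (torch.zeros(8, 512), torch.zeros(8, 512))     # initial (hidden, cell) states
    lm_h = torch.randn(8, 256)                             # hidden state of the pre-trained LM
    h, c = step(x, state, lm_h)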

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

    <a href="/en/project/LQ1602" target="_blank" >LQ1602: IT4Innovations excellence in science</a><br>

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)

Others

  • Publication year

    2019

  • Confidentiality

    S - Complete and true project data are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  • ISBN

    978-1-5386-4658-8

  • ISSN

  • e-ISSN

  • Number of pages

    5

  • Pages from-to

    6191-6195

  • Publisher name

    IEEE Signal Processing Society

  • Place of publication

    Brighton

  • Event location

    Brighton

  • Event date

    May 12, 2019

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article

    000482554006084