Recovery of Rare Words in Lecture Speech

Result identifiers

  • Result code in IS VaVaI

    RIV/00216305:26230/10:PU89608 (isvavai.cz: https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F10%3APU89608)

Alternative languages

  • Result language

    English

  • Original language name

    Recovery of Rare Words in Lecture Speech

  • Original language description

    The vocabulary used in speech usually consists of two types of words: a limited set of common words shared across multiple documents, and a virtually unlimited set of rare words, each of which may appear only a few times and only in particular documents. Any given rare word is absent from most documents. Words of the first type are typically included in the language model of an automatic speech recognizer (ASR) and are therefore commonly referred to as in-vocabulary (IV). Words of the second type are missing from the language model and are therefore called out-of-vocabulary (OOV). These words, however, usually carry important information. We use a hybrid word/sub-word recognizer to detect OOV words occurring in English talks and to describe them as sequences of sub-words. We detected about one third of all OOV words and were able to recover the correct spelling for 26.2% of all detections by using a phoneme-to-grapheme (P2G) conversion trained on the recognition dictionary. By omitting detections […]
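
The IV/OOV distinction described in the abstract can be made concrete with a small Python sketch. The vocabulary, transcript, and function names below are hypothetical toy illustrations, not data or code from the paper. The second function multiplies the two reported rates to estimate the share of all OOV words whose spelling is recovered; this assumes the 26.2% is measured over correctly detected OOV words only, which the abstract does not state.

```python
# Toy sketch of the in-vocabulary (IV) / out-of-vocabulary (OOV) split and
# of the recovery arithmetic discussed in the abstract. All data and
# function names here are hypothetical illustrations, not the authors' code.


def split_iv_oov(tokens, vocabulary):
    """Partition tokens into IV and OOV lists relative to an ASR vocabulary."""
    iv = [t for t in tokens if t in vocabulary]
    oov = [t for t in tokens if t not in vocabulary]
    return iv, oov


def share_of_oov_recovered(detection_rate, recovery_per_detection):
    """Estimate the fraction of all OOV words whose spelling is recovered.

    ASSUMPTION: recovery_per_detection is measured over correctly detected
    OOV words only (no false alarms in its denominator). The abstract does
    not say this, so the product is only a rough estimate.
    """
    return detection_rate * recovery_per_detection


if __name__ == "__main__":
    # Hypothetical recognizer vocabulary and lecture transcript.
    vocabulary = {"the", "speech", "recognizer", "uses", "a", "language", "model"}
    transcript = "the recognizer uses a kneser ney language model".split()

    iv, oov = split_iv_oov(transcript, vocabulary)
    print("IV: ", iv)
    print("OOV:", oov)  # ['kneser', 'ney']
    print(f"OOV rate: {len(oov) / len(transcript):.1%}")  # 25.0%

    # Rates from the abstract: about 1/3 of OOV words detected,
    # 26.2% of detections spelled correctly via P2G.
    est = share_of_oov_recovered(1 / 3, 0.262)
    print(f"Estimated share of all OOV words recovered: {est:.1%}")  # 8.7%
```

Under the stated assumption, roughly 8.7% of all OOV words would end up correctly spelled; if false alarms are counted among the detections, the real end-to-end figure could differ.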

Classification

  • Type

    D - Article in proceedings

  • CEP classification

    JC - Computer hardware and software

Result continuities

  • Project

    GA102/08/0707: Speech Recognition under Real-World Conditions

  • Continuities

    Z - Research plan (with a reference to CEZ)
    S - Specific research at universities

Others

  • Publication year

    2010

  • Confidentiality

    S - Complete and truthful project data are not subject to protection under special legislation

Data specific to the result type

  • Proceedings title

    Proceedings of Text, Speech and Dialogue 2010

  • ISBN

    978-3-642-15759-2

  • Number of pages

    8

  • Publisher name

    Springer Verlag

  • Place of publication

    Brno

  • Event location

    Brno

  • Event date

    Sep 6, 2010

  • Type of event by nationality

    WRD - Worldwide event