Recovery of Rare Words in Lecture Speech
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F10%3APU89608" target="_blank" >RIV/00216305:26230/10:PU89608 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
angličtina
Original language name
Recovery of Rare Words in Lecture Speech
Original language description
The vocabulary used in speech usually consists of two types of words: a limited set of common words, shared across multiple documents, and a virtually unlimited set of rare words, each of which might appear a few times only in particular documents. In most documents, however, these rare words are not seen at all. The first type of words is typically included in the language model of an automatic speech recognizer (ASR) and is thus widely referred to as invocabulary (IV). Words of the second type are missing in the language model and thus are called out-of-vocabulary (OOV). However, these words usually carry important information. We use a hybrid word/sub-word recognizer to detect OOV words occurring in English talks and describe them as sequences of sub-words.We detected about one third of all OOV words, and were able to recover the correct spelling for 26.2% of all detections by using a phoneme-to-grapheme (P2G) conversion trained on the recognition dictionary. By omitting detections
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JC - Computer hardware and software
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/GA102%2F08%2F0707" target="_blank" >GA102/08/0707: Speech Recognition under Real-World Conditions</a><br>
Continuities
Z - Vyzkumny zamer (s odkazem do CEZ)<br>S - Specificky vyzkum na vysokych skolach
Others
Publication year
2010
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Data specific for result type
Article name in the collection
Proc. Text, Speech and Dialogue 2010
ISBN
978-3-642-15759-2
ISSN
—
e-ISSN
—
Number of pages
8
Pages from-to
—
Publisher name
Springer Verlag
Place of publication
Brno
Event location
Brno
Event date
Sep 6, 2010
Type of event by nationality
WRD - Celosvětová akce
UT code for WoS article
—