Digits to Words Converter for Slavic Languages in Systems of Automatic Speech Recognition
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F17%3A00004821" target="_blank" >RIV/46747885:24220/17:00004821 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1007/978-3-319-66429-3_30" target="_blank" >http://dx.doi.org/10.1007/978-3-319-66429-3_30</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-66429-3_30" target="_blank" >10.1007/978-3-319-66429-3_30</a>
Alternative languages
Result language
angličtina
Original language name
Digits to Words Converter for Slavic Languages in Systems of Automatic Speech Recognition
Original language description
In this paper, a system for digits to words conversion for almost all Slavic languages is proposed. This system was developed for improvement of text corpora which we are using for building of a lexicon or for training of language models and acoustic models in the task of Large Vocabulary Continuous Speech Recognition (LVCSR). Strings of digits, some other special characters (%, €, $, …) or abbreviations of physical units (km, m, cm, kg, 1, °C, etc.) occur very often in our text corpora. It is in about 5% cases. The strings of digits or special characters are usually omitted if a lexicon is being built or if the language model is being trained. The task of digits to words conversion in non-inflected languages (e.g. English) is solved by relatively simple conversion or lookup table. The problem is more complex in inflected Slavic languages. The string of digits can be converted into several different word combinations. It depends on the context and resulting words are inflected by gender or cases. The main goal of this research was to find the rules (patterns) for conversion of string of digits into words for Slavic languages. The second goal was to unify this patterns over Slavic languages and to integrate them to the universal system for digits to words conversion.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20204 - Robotics and automatic control
Result continuities
Project
<a href="/en/project/TA04010199" target="_blank" >TA04010199: MULTILINMEDIA - Multilingual Multimedia Monitoring and Analyzing Platform</a><br>
Continuities
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)
Others
Publication year
2017
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Data specific for result type
Article name in the collection
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISBN
9783319664286
ISSN
0302-9743
e-ISSN
—
Number of pages
10
Pages from-to
312-321
Publisher name
Springer Verlag
Place of publication
Německo
Event location
Hatfield; United Kingdom
Event date
Jan 1, 2017
Type of event by nationality
WRD - Celosvětová akce
UT code for WoS article
—