Digits to Words Converter for Slavic Languages in Systems of Automatic Speech Recognition

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F17%3A00004821" target="_blank" >RIV/46747885:24220/17:00004821 - isvavai.cz</a>
Výsledek na webu
<a href="http://dx.doi.org/10.1007/978-3-319-66429-3_30" target="_blank" >http://dx.doi.org/10.1007/978-3-319-66429-3_30</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-66429-3_30" target="_blank" >10.1007/978-3-319-66429-3_30</a>

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
Digits to Words Converter for Slavic Languages in Systems of Automatic Speech Recognition
Popis výsledku v původním jazyce
In this paper, a system for digits to words conversion for almost all Slavic languages is proposed. This system was developed for improvement of text corpora which we are using for building of a lexicon or for training of language models and acoustic models in the task of Large Vocabulary Continuous Speech Recognition (LVCSR). Strings of digits, some other special characters (%, €, $, …) or abbreviations of physical units (km, m, cm, kg, 1, °C, etc.) occur very often in our text corpora. It is in about 5% cases. The strings of digits or special characters are usually omitted if a lexicon is being built or if the language model is being trained. The task of digits to words conversion in non-inflected languages (e.g. English) is solved by relatively simple conversion or lookup table. The problem is more complex in inflected Slavic languages. The string of digits can be converted into several different word combinations. It depends on the context and resulting words are inflected by gender or cases. The main goal of this research was to find the rules (patterns) for conversion of string of digits into words for Slavic languages. The second goal was to unify this patterns over Slavic languages and to integrate them to the universal system for digits to words conversion.
Název v anglickém jazyce
Digits to Words Converter for Slavic Languages in Systems of Automatic Speech Recognition
Popis výsledku anglicky
In this paper, a system for digits to words conversion for almost all Slavic languages is proposed. This system was developed for improvement of text corpora which we are using for building of a lexicon or for training of language models and acoustic models in the task of Large Vocabulary Continuous Speech Recognition (LVCSR). Strings of digits, some other special characters (%, €, $, …) or abbreviations of physical units (km, m, cm, kg, 1, °C, etc.) occur very often in our text corpora. It is in about 5% cases. The strings of digits or special characters are usually omitted if a lexicon is being built or if the language model is being trained. The task of digits to words conversion in non-inflected languages (e.g. English) is solved by relatively simple conversion or lookup table. The problem is more complex in inflected Slavic languages. The string of digits can be converted into several different word combinations. It depends on the context and resulting words are inflected by gender or cases. The main goal of this research was to find the rules (patterns) for conversion of string of digits into words for Slavic languages. The second goal was to unify this patterns over Slavic languages and to integrate them to the universal system for digits to words conversion.

Klasifikace

Druh
D - Stať ve sborníku
CEP obor
—
OECD FORD obor
20204 - Robotics and automatic control

Návaznosti výsledku

Projekt
<a href="/cs/project/TA04010199" target="_blank" >TA04010199: MULTILINMEDIA - Multilinguální platforma pro monitoring a analýzu multimédií</a><br>
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)

Ostatní

Rok uplatnění
2017
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název statě ve sborníku
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISBN
9783319664286
ISSN
0302-9743
e-ISSN
—
Počet stran výsledku
10
Strana od-do
312-321
Název nakladatele
Springer Verlag
Místo vydání
Německo
Místo konání akce
Hatfield; United Kingdom
Datum konání akce
1. 1. 2017
Typ akce podle státní příslušnosti
WRD - Celosvětová akce
Kód UT WoS článku
—

Podobné výsledky(10)

Automatic Symbol Processing for Language Model Building in Slavic Languages Typologické aspekty dynamiky slovní zásoby Hyph-bench: Benchmark Dataset of Hyphenated Words for Generating Hyphenation Patterns

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Digits to Words Converter for Slavic Languages in Systems of Automatic Speech Recognition

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)