Automatic Symbol Processing for Language Model Building in Slavic Languages
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F16%3A00000307" target="_blank" >RIV/46747885:24220/16:00000307 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Automatic Symbol Processing for Language Model Building in Slavic Languages
Original language description
To adapt an existing automatic speech recognition system to a new language, we need a large corpus of texts to create a lexicon and a language model, and a database of annotated recordings to train an acoustic model. The texts in the corpus (or in the annotations) usually contain not only words but also other symbols, mainly strings of digits, special characters and some frequent abbreviations of units. The common feature of all these symbols is that there is no straightforward correspondence between their printed form and their spoken form. The main goal of this work was to develop efficient tools for the automatic translation of symbols or symbolic terms into words for almost all Slavic languages. In this paper we present our study of the basic elements and production rules in Slavic languages, which was used to design our universal text pre- and post-processing tools.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JC - Computer hardware and software
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/TA04010199" target="_blank" >TA04010199: MULTILINMEDIA - Multilingual Multimedia Monitoring and Analyzing Platform</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2016
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proc. of Information Technologies Applications and Theory Conference - ITAT 2016
ISBN
978-1-5370-1674-0
ISSN
1613-0073
e-ISSN
—
Number of pages
5
Pages from-to
37-41
Publisher name
Slovenská spoločnosť pre umelú inteligenciu
Place of publication
Slovak Republic
Event location
Slovak Republic
Event date
Jan 1, 2016
Type of event by nationality
EUR - European event
UT code for WoS article
—