Study of Large Data Resources for Multilingual Training and System Porting

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F16%3APU121609" target="_blank" >RIV/00216305:26230/16:PU121609 - isvavai.cz</a>

  • Result on the web

    <a href="http://www.sciencedirect.com/science/article/pii/S1877050916300382" target="_blank" >http://www.sciencedirect.com/science/article/pii/S1877050916300382</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.1016/j.procs.2016.04.024" target="_blank" >10.1016/j.procs.2016.04.024</a>

Alternative languages

  • Result language

    English

  • Original language name

    Study of Large Data Resources for Multilingual Training and System Porting

  • Original language description

    This study investigates the behavior of a feature extraction neural network model trained on a large amount of single-language data ("source language") on a set of under-resourced target languages. The coverage of the source-language acoustic space was changed in two ways: (1) by changing the amount of training data and (2) by altering the level of detail of the acoustic units (by changing the triphone clustering). We observe the effect of these changes on the performance on the target languages in two scenarios: (1) the source-language NNs were used directly, and (2) the NNs were first ported to the target language. The results show that increasing the coverage as well as the level of detail of the source language improves the target-language system performance in both scenarios. In the first one, both source-language characteristics have about the same effect. In the second scenario, the amount of data in the source language is more important than the level of detail. The possibility of including large data in the multilingual training set was also investigated. Our experiments point out a possible risk of over-weighting the NNs towards the source language with large data. This degrades the performance on part of the target languages compared to the setting where the amounts of data per language are balanced. (An illustrative porting sketch follows this list.)

  • Czech name

  • Czech description
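The description above distinguishes two scenarios for reusing a source-language feature extraction NN on an under-resourced target language, the second of which ports the network before use. Below is a minimal sketch of that porting step, assuming a simple bottleneck feed-forward architecture in PyTorch; the layer sizes, unit counts, and fine-tuning setup are illustrative assumptions, not details taken from the result.

    import torch
    import torch.nn as nn

    # Hypothetical sketch of the porting scenario described above; the
    # architecture, layer sizes and unit counts are assumptions, not
    # taken from the paper.
    class FeatureExtractorNN(nn.Module):
        """Bottleneck feature-extraction NN trained on the source language."""
        def __init__(self, n_inputs, n_source_units, bottleneck=80):
            super().__init__()
            self.hidden = nn.Sequential(
                nn.Linear(n_inputs, 1500), nn.Sigmoid(),
                nn.Linear(1500, bottleneck), nn.Sigmoid(),   # bottleneck features
                nn.Linear(bottleneck, 1500), nn.Sigmoid(),
            )
            # The output layer classifies source-language clustered triphone
            # states; its size reflects the "level of detail" of the units.
            self.output = nn.Linear(1500, n_source_units)

        def forward(self, x):
            return self.output(self.hidden(x))

    def port_to_target(source_nn, n_target_units):
        """Scenario (2): keep the source-trained hidden layers, replace the
        output layer with one sized for the target-language units, then
        fine-tune on the (small) target-language data."""
        source_nn.output = nn.Linear(source_nn.output.in_features, n_target_units)
        return source_nn

    # Example: a source NN covering 3000 clustered triphone states is ported
    # to a target language with 120 units and prepared for fine-tuning.
    nn_source = FeatureExtractorNN(n_inputs=440, n_source_units=3000)
    nn_target = port_to_target(nn_source, n_target_units=120)
    optimizer = torch.optim.SGD(nn_target.parameters(), lr=0.01)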

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

    <a href="/en/project/TA04011311" target="_blank" >TA04011311: Meeting assistant (MINT)</a><br>

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)

Others

  • Publication year

    2016

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Procedia Computer Science

  • ISBN

  • ISSN

    1877-0509

  • e-ISSN

  • Number of pages

    8

  • Pages from-to

    15-22

  • Publisher name

    Elsevier Science

  • Place of publication

    Yogyakarta

  • Event location

    Yogyakarta

  • Event date

    May 7, 2016

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article

    000387446500002