All

What are you looking for?

All
Projects
Results
Organizations

Quick search

  • Projects supported by TA ČR
  • Excellent projects
  • Projects with the highest public support
  • Current projects

Smart search

  • That is how I find a specific +word
  • That is how I leave the -word out of the results
  • “That is how I can find the whole phrase”

Domain adaptation of statistical machine translation with domain-focused web crawling

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F15%3A10317963" target="_blank" >RIV/00216208:11320/15:10317963 - isvavai.cz</a>

  • Result on the web

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    angličtina

  • Original language name

    Domain adaptation of statistical machine translation with domain-focused web crawling

  • Original language description

    In this paper, we tackle the problem of domain adaptation of statistical machine translation (SMT) by exploiting domain-specific data acquired by domain-focused crawling of text from the World Wide Web. We design and empirically evaluate a procedure forauto- matic acquisition of monolingual and parallel text and their exploitation for system training, tuning, and testing in a phrase-based SMT framework. We present a strategy for using such resources depending on their availability and quantity supported by results of a large-scale evaluation carried out for the domains of environment and labour legislation, two language pairs (English-French and English-Greek) and in both directions: into and from English. In general, MT systems trained and tuned on ageneral domain perform poorly on specific domains and we show that such systems can be adapted successfully by retuning model parameters using small amounts of parallel in-domain data, and may be further improved by using additional mono

  • Czech name

  • Czech description

Classification

  • Type

    J<sub>x</sub> - Unclassified - Peer-reviewed scientific article (Jimp, Jsc and Jost)

  • CEP classification

    AI - Linguistics

  • OECD FORD branch

Result continuities

  • Project

    <a href="/en/project/GBP103%2F12%2FG084" target="_blank" >GBP103/12/G084: Center for Large Scale Multi-modal Data Interpretation</a><br>

  • Continuities

    S - Specificky vyzkum na vysokych skolach<br>I - Institucionalni podpora na dlouhodoby koncepcni rozvoj vyzkumne organizace

Others

  • Publication year

    2015

  • Confidentiality

    S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Data specific for result type

  • Name of the periodical

    Language Resources and Evaluation

  • ISSN

    1574-020X

  • e-ISSN

  • Volume of the periodical

    49

  • Issue of the periodical within the volume

    1

  • Country of publishing house

    NL - THE KINGDOM OF THE NETHERLANDS

  • Number of pages

    47

  • Pages from-to

    147-193

  • UT code for WoS article

    000349889400006

  • EID of the result in the Scopus database

    2-s2.0-84925467070