Domain adaptation of statistical machine translation with domain-focused web crawling
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F15%3A10317963" target="_blank" >RIV/00216208:11320/15:10317963 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
angličtina
Original language name
Domain adaptation of statistical machine translation with domain-focused web crawling
Original language description
In this paper, we tackle the problem of domain adaptation of statistical machine translation (SMT) by exploiting domain-specific data acquired by domain-focused crawling of text from the World Wide Web. We design and empirically evaluate a procedure forauto- matic acquisition of monolingual and parallel text and their exploitation for system training, tuning, and testing in a phrase-based SMT framework. We present a strategy for using such resources depending on their availability and quantity supported by results of a large-scale evaluation carried out for the domains of environment and labour legislation, two language pairs (English-French and English-Greek) and in both directions: into and from English. In general, MT systems trained and tuned on ageneral domain perform poorly on specific domains and we show that such systems can be adapted successfully by retuning model parameters using small amounts of parallel in-domain data, and may be further improved by using additional mono
Czech name
—
Czech description
—
Classification
Type
J<sub>x</sub> - Unclassified - Peer-reviewed scientific article (Jimp, Jsc and Jost)
CEP classification
AI - Linguistics
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/GBP103%2F12%2FG084" target="_blank" >GBP103/12/G084: Center for Large Scale Multi-modal Data Interpretation</a><br>
Continuities
S - Specificky vyzkum na vysokych skolach<br>I - Institucionalni podpora na dlouhodoby koncepcni rozvoj vyzkumne organizace
Others
Publication year
2015
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Data specific for result type
Name of the periodical
Language Resources and Evaluation
ISSN
1574-020X
e-ISSN
—
Volume of the periodical
49
Issue of the periodical within the volume
1
Country of publishing house
NL - THE KINGDOM OF THE NETHERLANDS
Number of pages
47
Pages from-to
147-193
UT code for WoS article
000349889400006
EID of the result in the Scopus database
2-s2.0-84925467070