Type

Conference Proceedings

Authors

Peyman Passban
Gideon Maillette de Buy Wenniger
Andy Way
Dimitar Shterionov
Alberto Poncelas

Subjects

Linguistics

Topics

corpus based machine translation, machine translation, statistical machine translation, data quality, state building, statistical mt, neural machine translation, state of the art

Investigating Backtranslation in Neural Machine Translation (2018)

Abstract

A prerequisite for training corpus-based machine translation (MT) systems – either Statistical MT (SMT) or Neural MT (NMT) – is the availability of high-quality parallel data. This is arguably more important today than ever before, as NMT has been shown in many studies to outperform SMT, but mostly when large parallel corpora are available; in cases where data is limited, SMT can still outperform NMT. Recently, researchers have shown that back-translating monolingual data can be used to create synthetic parallel corpora, which in turn can be used in combination with authentic parallel data to train a high-quality NMT system. Given that large collections of new parallel text become available only quite rarely, backtranslation has become the norm when building state-of-the-art NMT systems, especially in resource-poor scenarios. However, we assert that there are many unknown factors regarding the actual effects of back-translated data on the translation capabilities of an NMT model. Accordingly, in this work we investigate how using back-translated data as a training corpus – both as a standalone dataset and in combination with human-generated parallel data – affects the performance of an NMT model. We use incrementally larger amounts of back-translated data to train a range of NMT systems for German-to-English, and analyse the resulting translation performance.
Collections

Ireland -> Dublin City University -> Publication Type = Conference or Workshop Item
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools: Faculty of Engineering and Computing: School of Computing
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres: ADAPT
Ireland -> Dublin City University -> Status = Published
