Type

Conference Proceedings

Authors

Andy Way
Sylwia Ozdowska

Subjects

Linguistics

Topics
state of the art machine translating source language system art parallel corpora pb smt training data phrase based statistical machine translation

Optimal bilingual data for French-English PB-SMT (2009)

Abstract We investigate the impact of the original source language (SL) on French–English PB-SMT. We train four configurations of a state-of-the-art PB-SMT system based on French–English parallel corpora which differ in terms of the original SL, and conduct experiments in both translation directions. We see that data containing original French and English translated from French is optimal when building a system translating from French into English. Conversely, using data comprising exclusively French and English translated from several other languages is suboptimal regardless of the translation direction. Accordingly, the clamour for more data needs to be tempered somewhat; unless the quality of such data is controlled, more training data can cause translation performance to decrease drastically, by up to 38% relative BLEU in our experiments.
Collections Ireland -> Dublin City University -> Publication Type = Conference or Workshop Item
Ireland -> Dublin City University -> Subject = Computer Science
Ireland -> Dublin City University -> Status = Published
Ireland -> Dublin City University -> Subject = Computer Science: Machine translating
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres: National Centre for Language Technology (NCLT)
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres

Full list of authors on original publication

Andy Way, Sylwia Ozdowska

Experts in our system

1
Andy Way
Dublin City University
Total Publications: 229