Type

Conference Proceedings

Authors

Andy Way
Sivaji Bandyopadhyay
Pavel Pecina
Sudip Kumar Naskar
Santanu Pal

Subjects

Linguistics

Topics
multi word expressions pb smt target language phrase based statistical machine translation machine translating compound named entities word alignment

Handling named entities and compound verbs in phrase-based statistical machine translation (2010)

Abstract Data preprocessing plays a crucial role in phrase-based statistical machine translation (PB-SMT). In this paper, we show how single-tokenization of two types of multi-word expressions (MWE), namely named entities (NE) and compound verbs, as well as their prior alignment can boost the performance of PB-SMT. Single-tokenization of compound verbs and named entities (NE) provides significant gains over the baseline PB-SMT system. Automatic alignment of NEs substantially improves the overall MT performance, and thereby the word alignment quality indirectly. For establishing NE alignments, we transliterate source NEs into the target language and then compare them with the target NEs. Target language NEs are first converted into a canonical form before the comparison takes place. Our best system achieves statistically significant improvements (4.59 BLEU points absolute, 52.5% relative improvement) on an English—Bangla translation task.
Collections Ireland -> Dublin City University -> Publication Type = Conference or Workshop Item
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools: Faculty of Engineering and Computing: School of Computing
Ireland -> Dublin City University -> Subject = Computer Science
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres: Centre for Next Generation Localisation (CNGL)
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools
Ireland -> Dublin City University -> Status = Published
Ireland -> Dublin City University -> Subject = Computer Science: Machine translating
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools: Faculty of Engineering and Computing
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres

Full list of authors on original publication

Andy Way, Sivaji Bandyopadhyay, Pavel Pecina, Sudip Kumar Naskar, Santanu Pal

Experts in our system

1
Andy Way
Dublin City University
Total Publications: 229
 
2
Pavel Pecina
Dublin City University
Total Publications: 12