Conference Proceedings


Andrzej Zydron
Andy Way
Jinhua Du



Keywords: statistical machine translation, domain adaptation, English language, machine translating, BabelNet, SMT, unknown words, OOVs

Using BabelNet to improve OOV coverage in SMT (2016)

Abstract: Out-of-vocabulary words (OOVs) are a ubiquitous and difficult problem in statistical machine translation (SMT). This paper studies different strategies for using BabelNet to alleviate the negative impact of OOVs. BabelNet is a multilingual encyclopedic dictionary and semantic network that not only includes lexicographic and encyclopedic terms but also connects concepts and named entities in a very large network of semantic relations. Taking advantage of the knowledge in BabelNet, this paper proposes three methods for obtaining translations of OOVs to improve system performance: using BabelNet data directly as training data, applying domain-adaptation techniques, and querying the BabelNet API. Experimental results on the English–Polish and English–Chinese language pairs show that domain adaptation makes better use of BabelNet knowledge and outperforms the other methods. The results also demonstrate that BabelNet is a useful resource for improving the translation performance of SMT systems.
Collections (Ireland, Dublin City University):
Publication Type: Conference or Workshop Item
DCU Faculties and Centres: DCU Faculties and Schools: Faculty of Engineering and Computing: School of Computing
DCU Faculties and Centres: Research Initiatives and Centres: ADAPT
Status: Published
Subject: Computer Science: Machine translating
