Type

Journal Article

Authors

Andy Way
Gideon Maillette de Buy Wenniger
Alberto Poncelas

Subjects

Linguistics

Topics
machine translating english n grams feature decay algorithms fda machine translation alignment algorithms entropy

Applying N-gram alignment entropy to improve feature decay algorithms (2017)

Abstract

Data selection is a popular step in machine translation pipelines. Feature Decay Algorithms (FDA) is a data selection technique that has shown good performance in several tasks. FDA aims to maximize the coverage of the test set's n-grams. Intuitively, however, more ambiguous n-grams require more training examples for their translation probabilities to be estimated adequately. This ambiguity can be measured by alignment entropy. In this paper we propose two methods for calculating alignment entropies for n-grams of any size, which can be used to improve the performance of FDA. We evaluate substituting the n-gram-specific entropy values computed by these methods for the parameters of both the exponential and the linear decay factors of FDA. Experiments on German-to-English and Czech-to-English translation demonstrate that using alignment entropies can improve the quality of FDA's results.

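
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the paper's two entropy methods, so the entropy here is plain Shannon entropy over counts of aligned target phrases, and `entropy_decay`, `fda_select`, and the scoring formula are hypothetical names and simplifications chosen for illustration only.

```python
import math
from collections import Counter


def alignment_entropy(target_counts):
    """Shannon entropy (bits) of the aligned-target distribution of one source n-gram."""
    total = sum(target_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in target_counts.values() if c > 0)


def entropy_decay(entropy, base=0.5):
    """Hypothetical mapping (an assumption, not from the paper):
    entropy 0 gives the base decay; higher entropy makes the value decay more slowly."""
    return base ** (1.0 / (1.0 + entropy))


def ngrams(tokens, max_n=2):
    """All n-grams of the token list for n = 1..max_n, as tuples."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def fda_select(pool, test_ngrams, k, decay_for):
    """Simplified greedy selection with exponentially decaying n-gram values.

    Each test-set n-gram starts with value 1 and is multiplied by decay_for(ngram)
    every time a selected sentence contains it. A sentence's score is the sum of
    the current values of its test-set n-grams, divided by its length.
    """
    times_selected = Counter()
    remaining = list(pool)
    chosen = []

    def score(sentence):
        tokens = sentence.split()
        feats = set(ngrams(tokens)) & test_ngrams
        value = sum(decay_for(f) ** times_selected[f] for f in feats)
        return value / max(len(tokens), 1)

    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=score)
        remaining.remove(best)
        chosen.append(best)
        for f in set(ngrams(best.split())) & test_ngrams:
            times_selected[f] += 1
    return chosen
```

In this sketch, an ambiguous n-gram (one whose aligned targets are spread over several translations) has higher entropy, so passing `lambda f: entropy_decay(entropies[f])` as `decay_for` (with `entropies` a hypothetical dictionary of per-n-gram entropies) keeps its value high for longer and lets the selection gather more training sentences that contain it.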
Collections

Ireland -> Dublin City University -> Publication Type = Article
Ireland -> Dublin City University -> Status = Published
Ireland -> Dublin City University -> Subject = Computer Science: Machine translating
Ireland -> Dublin City University -> Subject = Computer Science: Algorithms

Experts in our system

1. Andy Way, Dublin City University. Total Publications: 229
2. Gideon Maillette de Buy Wenniger, Dublin City University. Total Publications: 6
3. Alberto Poncelas, Dublin City University. Total Publications: 8