Type

Journal Article

Authors

Andy Way
Haithem Afli
Pintu Lohar

Subjects

Linguistics

Topics
user generated content
machine translation
source language
social networking
machine translating
social media
sentiment classification
translation quality

Maintaining sentiment polarity in translation of user-generated content (2017)

Abstract

The advent of social media has shaken the very foundations of how we share information, with Twitter, Facebook, and LinkedIn among many well-known social networking platforms that facilitate information generation and distribution. However, the maximum 140-character restriction in Twitter encourages users to (sometimes deliberately) write somewhat informally in most cases. As a result, machine translation (MT) of user-generated content (UGC) becomes much more difficult for such noisy texts. In addition to translation quality being affected, this phenomenon may also negatively impact sentiment preservation in the translation process. That is, a sentence with positive sentiment in the source language may be translated into a sentence with negative or neutral sentiment in the target language. In this paper, we analyse both sentiment preservation and MT quality per se in the context of UGC, focusing especially on whether sentiment classification helps improve sentiment preservation in MT of UGC. We build four different experimental setups for tweet translation: (i) using a single MT model trained on the whole Twitter parallel corpus, (ii) using multiple MT models based on sentiment classification, (iii) using MT models including additional out-of-domain data, and (iv) adding MT models based on the phrase-table fill-up method to accompany the sentiment translation models, with the aim of improving MT quality while maintaining sentiment polarity. Our empirical evaluation shows that despite a slight deterioration in MT quality, our system significantly outperforms the Baseline MT system (without using sentiment classification) in terms of sentiment preservation. We also demonstrate that using an MT engine that conveys a sentiment different from that of the UGC can even worsen both the translation quality and sentiment preservation.

Collections

Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools: Faculty of Engineering and Computing: School of Computing
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres: ADAPT
Ireland -> Dublin City University -> Publication Type = Article
Ireland -> Dublin City University -> Status = Published
Ireland -> Dublin City University -> Subject = Computer Science: Machine translating

Experts in our system

1
Andy Way
Dublin City University
Total Publications: 229
 
2
Haithem Afli
Dublin City University
Total Publications: 14
 
3
Pintu Lohar
Dublin City University
Total Publications: 10