Type

Conference Proceedings

Authors

Andy Way
Haithem Afli
Pintu Lohar
Henny Sluyter-Gäthje

Subjects

Linguistics

Topics
real time sentiment translation bilingual corpus social networking online parallel corpus parallel data tweets machine translating

FooTweets: a bilingual parallel corpus of World Cup tweets (2018)

Abstract

The way information spreads through society has changed significantly over the past decade with the advent of online social networking. Twitter, one of the most widely used social networking websites, is known as the real-time, public microblogging network where news breaks first. Most users love it for its iconic 140-character limit and unfiltered feed that shows them news and opinions in the form of tweets. Tweets are usually multilingual in nature and of varying quality. However, machine translation (MT) of Twitter data is a challenging task, mainly for two reasons: (i) tweets are informal in nature (i.e., they violate linguistic norms), and (ii) parallel resources for Twitter data are scarce on the Internet. In this paper, we develop FooTweets, a first parallel corpus of tweets for the English–German language pair. We extract 4,000 English tweets from the FIFA 2014 World Cup and manually translate them into German with a special focus on the informal nature of the tweets. In addition, we annotate all the tweets with sentiment scores between 0 and 1, depending on the degree of sentiment associated with them. This data has recently been used to build sentiment translation engines, and an extensive evaluation revealed that such a resource is very useful in machine translation of user-generated content.

Collections

Ireland -> Dublin City University -> Publication Type = Conference or Workshop Item
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools: Faculty of Engineering and Computing: School of Computing
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres: ADAPT
Ireland -> Dublin City University -> Status = Published
Ireland -> Dublin City University -> Subject = Computer Science: Machine translating
