Type

Conference Proceedings

Authors

Andy Way
Teresa Lynn
Meghan Dowling

Subjects

Linguistics

Topics
user-generated content, machine translation, machine translating, Irish, English, Irish language, crowd-sourcing, data collection, minority languages

A crowd-sourcing approach for translations of minority language user-generated content (UGC) (2017)

Abstract

Data sparsity is a common problem for machine translation of minority and less-resourced languages. While collecting standard, grammatical text can be challenging enough, collecting parallel user-generated content can be even more difficult. In this paper we describe an approach to collecting English↔Irish translations of user-generated content (tweets) that overcomes some of these hurdles. We show how a crowd-sourced data collection campaign, tailored to our target audience (the Irish language community), proved successful in gathering data for a niche domain. We also discuss the reliability of crowd-sourced English↔Irish tweet translations in terms of quality, reporting on a self-rating approach alongside ratings from qualified reviewers.

Collections

Ireland -> Dublin City University -> Publication Type = Conference or Workshop Item
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools: Faculty of Engineering and Computing: School of Computing
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres: ADAPT
Ireland -> Dublin City University -> Subject = Humanities: Irish language
Ireland -> Dublin City University -> Status = Unpublished
Ireland -> Dublin City University -> Subject = Computer Science: Machine translating
