Linking localisation and language resources (2012)

Abstract Industrial localisation is changing from the periodic translation of large bodies of content to a long tail of small, heterogeneous translations processed in an agile and demand-driven manner. Software localisation and crowd-sourced translation already practise continuous, fine-grained distribution of translation work. This requires close integration and round-trip interoperability between content creation and localisation processes, while at the same time recording the provenance of translated content to maximise its reuse in future translation tasks and, increasingly, in training Statistical Machine Translation (SMT) engines. This work adopts a Linked Data approach to integrating the content translation round-trip process with the logging of process quality assurance provenance. This integration enables a pull-based interoperability model that supports continuous synchronisation of content and process metadata between the generating organisation and any number of language service providers or translators. We present a platform architecture for sharing, searching and interlinking Linked Localisation and Language Data (termed L3Data) on the web. This is accomplished using a semantic schema for L3Data that is compatible with existing localisation data exchange standards and can be used to support the round-trip sharing of language resources. The paper describes our approach to the development of the L3Data schema and of the data management processes, web-based tools and data sharing infrastructure that use it. An initial proof-of-concept prototype is presented, which implements a web application that segments and machine translates content for crowd-sourced post-editing and rating.
