Type

Conference Proceedings

Authors

Josef van Genabith
Djamé Seddah
Joachim Wagner
Jennifer Foster

Subjects

Computer Science

Topics
section 23, corpus, machine translating, national, self training, parsers, training data, domain

Adapting WSJ-trained parsers to the British National Corpus using in-domain self-training (2007)

Abstract

We introduce a set of 1,000 gold standard parse trees for the British National Corpus (BNC) and perform a series of self-training experiments with Charniak and Johnson’s reranking parser and BNC sentences. We show that retraining this parser with a combination of one million BNC parse trees (produced by the same parser) and the original WSJ training data yields improvements of 0.4% on WSJ Section 23 and 1.7% on the new BNC gold standard set.

Collections

Ireland -> Dublin City University -> Publication Type = Conference or Workshop Item
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools: Faculty of Engineering and Computing: School of Computing
Ireland -> Dublin City University -> Subject = Computer Science
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools
Ireland -> Dublin City University -> Status = Published
Ireland -> Dublin City University -> Subject = Computer Science: Machine translating
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres: National Centre for Language Technology (NCLT)
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools: Faculty of Engineering and Computing
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres

Full list of authors on original publication

Josef van Genabith, Djamé Seddah, Joachim Wagner, Jennifer Foster
