Type

Conference Proceedings

Authors

Andy Way
Josef van Genabith
Mairead McCarthy
Aoife Cahill

Subjects

Computer Science

Topics
parsing probabilistic context free grammar penn ii structures automatic annotation section 23 machine translating algorithm

Parsing with PCFGs and automatic f-structure annotation (2002)

Abstract The development of large coverage, rich unification- (constraint-) based grammar resources is very time consuming, expensive and requires lots of linguistic expertise. In this paper we report initial results on a new methodology that attempts to partially automate the development of substantial parts of large coverage, rich unification- (constraint-) based grammar resources. The method is based on a treebank resource (in our case Penn-II) and an automatic f-structure annotation algorithm that annotates treebank trees with proto-f-structure information. Based on these, we present two parsing architectures: in our pipeline architecture we first extract a PCFG from the treebank following the method of (Charniak,1996), use the PCFG to parse new text, automatically annotate the resulting trees with our f-structure annotation algorithm and generate proto-f-structures. By contrast, in the integrated architecture we first automatically annotate the treebank trees with f-structure information and then extract an annotated PCFG (A-PCFG) from the treebank. We then use the A-PCFG to parse new text to generate proto-f-structures. Currently our best parsers achieve more than 81% f-score on the 2400 trees in section 23 of the Penn-II treebank and more than 60% f-score on gold-standard proto-f-structures for 105 randomly selected trees from section 23.
Collections Ireland -> Dublin City University -> Publication Type = Conference or Workshop Item
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools: Faculty of Engineering and Computing: School of Computing
Ireland -> Dublin City University -> Subject = Computer Science
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools
Ireland -> Dublin City University -> Status = Published
Ireland -> Dublin City University -> Subject = Computer Science: Machine translating
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres: National Centre for Language Technology (NCLT)
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools: Faculty of Engineering and Computing
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres

Full list of authors on original publication

Andy Way, Josef van Genabith, Mairead McCarthy, Aoife Cahill

Experts in our system

1
Andy Way
Dublin City University
Total Publications: 229
 
2
Josef van Genabith
Dublin City University
Total Publications: 115