Improved CCG Parsing with Semi-supervised Supertagging
Abstract
Current supervised parsers are limited by the size of their labelled training data, making it important to improve them with unlabelled data. We show how a state-of-the-art CCG parser can be enhanced by predicting lexical categories using unsupervised vector-space embeddings of words. The use of word embeddings enables our model to generalize better from the labelled data, and allows us to accurately assign lexical categories without depending on a POS-tagger. Our approach leads to substantial improvements in dependency parsing results over the standard supervised CCG parser when evaluated on Wall Street Journal (0.8%), Wikipedia (1.8%) and biomedical (3.4%) text. We compare the performance of two recently proposed approaches for classification using a wide variety of word embeddings. We also give a detailed error analysis demonstrating where using embeddings outperforms traditional feature sets, and showing how including POS features can decrease accuracy.
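To make the classification setup concrete, the sketch below shows one plausible formulation of supertagging from embeddings, not necessarily the exact model used in the paper: each word's CCG lexical category is predicted from the concatenated embeddings of a small context window around it, with no POS features in the input. The embedding table, category set, window size, and weights are all illustrative assumptions; in practice the embeddings would come from unsupervised training on unlabelled text and the classifier weights from the labelled data.

```python
import numpy as np

# Toy pre-trained word embeddings (illustrative; a real model would load
# unsupervised vector-space embeddings trained on large unlabelled corpora).
EMB_DIM = 4
embeddings = {
    "<s>":    np.array([0.0, 0.0, 0.0, 0.0]),
    "</s>":   np.array([0.0, 0.0, 0.0, 0.1]),
    "the":    np.array([0.1, 0.2, 0.0, 0.3]),
    "cat":    np.array([0.4, 0.1, 0.5, 0.0]),
    "sleeps": np.array([0.2, 0.6, 0.1, 0.2]),
}

# A handful of CCG lexical categories (supertags) as the label set.
CATEGORIES = ["NP", "N", "NP/N", "S\\NP", "(S\\NP)/NP"]

WINDOW = 1  # words of context on each side of the target word
rng = np.random.default_rng(0)

# Hypothetical linear classifier parameters; in practice these would be
# learned from the labelled supertagged training data.
W = rng.normal(size=(len(CATEGORIES), EMB_DIM * (2 * WINDOW + 1)))
b = np.zeros(len(CATEGORIES))

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def supertag(sentence):
    """Predict a lexical category for each word from windowed embeddings.

    Note: the input consists only of word embeddings -- no POS tags --
    matching the abstract's claim of not depending on a POS-tagger.
    """
    padded = ["<s>"] * WINDOW + sentence + ["</s>"] * WINDOW
    tags = []
    for i in range(WINDOW, WINDOW + len(sentence)):
        window = padded[i - WINDOW : i + WINDOW + 1]
        x = np.concatenate([embeddings[w] for w in window])
        probs = softmax(W @ x + b)
        tags.append(CATEGORIES[int(probs.argmax())])
    return tags

print(supertag(["the", "cat", "sleeps"]))
```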
Author Biography
Mike Lewis
PhD student at the University of Edinburgh