Parsing with Traces: An O(n^4) Algorithm and a Structural Representation

Jonathan K. Kummerfeld; Dan Klein

Vol. 5 (2017)

TACL approved

Parsing with Traces: An O(n^4) Algorithm and a Structural Representation

Published 2017-11-25

Jonathan K. Kummerfeld
Dan Klein

Jonathan K. Kummerfeld
University of California, Berkeley

Dan Klein
University of California, Berkeley

Abstract

General treebank analyses are graph structured, but parsers are typically restricted to tree structures for efficiency and modeling reasons. We propose a new representation and algorithm for a class of graph structures that is flexible enough to cover almost all treebank structures, while still admitting efficient learning and inference. In particular, we consider directed, acyclic, one-endpoint-crossing graph structures, which cover most long-distance dislocation, shared argumentation, and similar tree-violating linguistic phenomena. We describe how to convert phrase structure parses, including traces, to our new representation in a reversible manner. Our dynamic program uniquely decomposes structures, is sound and complete, and covers 97.3% of the Penn English Treebank. We also implement a proof-of-concept parser that recovers a range of null elements and trace types.

PDF (presented at EMNLP 2017)