Discriminative Lexical Semantic Segmentation with Gaps: Running the MWE Gamut

Nathan Schneider; Emily Danchik; Chris Dyer; Noah A. Smith

Vol. 2 (2014)

TACL approved

Published 2014-04-30

Nathan Schneider
Carnegie Mellon University

Emily Danchik
Carnegie Mellon University

Chris Dyer
Carnegie Mellon University

Noah A. Smith
Carnegie Mellon University

Abstract

We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation. Our approach generalizes a standard chunking representation to encode a subset of projective MWEs containing gaps, thereby enabling efficient sequence tagging algorithms for feature-rich discriminative models. Experiments on a new dataset of English web text offer the first linguistically-driven evaluation of MWE identification with truly heterogeneous expression types. Our statistical sequence model greatly outperforms a lookup-based segmentation procedure, achieving 60% F₁ for MWE identification.

PDF (Presented at ACL 2014)