Discriminative Lexical Semantic Segmentation with Gaps: Running the MWE Gamut
Published: 2014-04-30
Nathan Schneider, Emily Danchik, Chris Dyer, Noah A. Smith
Nathan Schneider (Carnegie Mellon University)
Emily Danchik (Carnegie Mellon University)
Chris Dyer (Carnegie Mellon University)
Noah A. Smith (Carnegie Mellon University)
Abstract
We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation. Our approach generalizes a standard chunking representation to encode a subset of projective MWEs containing gaps, thereby enabling efficient sequence tagging algorithms for feature-rich discriminative models. Experiments on a new dataset of English web text offer the first linguistically-driven evaluation of MWE identification with truly heterogeneous expression types. Our statistical sequence model greatly outperforms a lookup-based segmentation procedure, achieving 60% F1 for MWE identification.
PDF (Presented at ACL 2014)