Learning Composition Models for Phrase Embeddings

Mo Yu; Mark Dredze

Vol. 3 (2015)

TACL approved

Published 2015-05-12

Mo Yu
Harbin Institute of Technology

Mark Dredze
Johns Hopkins University

Abstract

Lexical embeddings can serve as useful representations of words for a variety of NLP tasks, but learning embeddings for phrases can be challenging. While a separate embedding can be learned for each word, doing so for every phrase is infeasible. We construct phrase embeddings by learning how to compose word embeddings using features that capture phrase structure and context. We propose efficient unsupervised and task-specific learning objectives that scale our model to large datasets. We demonstrate improvements on both language modeling and several phrase semantic similarity tasks with various phrase lengths. We make the implementation of our model and the datasets available for general use.
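
As a rough illustration of composing word embeddings with feature-dependent weights, the following is a minimal sketch and not the authors' released implementation. It assumes one plausible form: each word's vector is scaled per dimension by a weight that is a linear function of that word's features, and the scaled vectors are summed. The function name `compose_phrase`, the parameters `alpha` and `bias`, and the single "is head word" feature are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def compose_phrase(
    word_vectors: Sequence[Sequence[float]],
    word_features: Sequence[Sequence[float]],
    alpha: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Compose a phrase vector as a feature-weighted sum of word vectors.

    For word i and dimension j, the (assumed) weight is
        lam = bias[j] + sum_k alpha[j][k] * word_features[i][k]
    and the phrase vector is the sum over words of lam * word_vectors[i][j].
    """
    dim = len(bias)
    phrase = [0.0] * dim
    for vec, feats in zip(word_vectors, word_features):
        for j in range(dim):
            # Per-dimension weight from a linear function of the word's features.
            lam = bias[j] + sum(a * f for a, f in zip(alpha[j], feats))
            phrase[j] += lam * vec[j]
    return phrase


# Toy two-word phrase with 2-dimensional embeddings (made-up values).
vectors = [[1.0, 0.0], [0.0, 2.0]]
# One hypothetical feature per word: 1.0 if the word is the phrase head.
features = [[0.0], [1.0]]
alpha = [[0.5], [0.5]]   # hypothetical learned feature weights, one row per dimension
bias = [0.5, 0.5]        # hypothetical learned bias per dimension

phrase_vec = compose_phrase(vectors, features, alpha, bias)
print(phrase_vec)
```

In this toy setup the head word receives a weight of 1.0 and the modifier a weight of 0.5, so the head contributes more to the phrase vector. In a trained model, `alpha` and `bias` would be learned from data rather than set by hand.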

Presented at NAACL 2016