Encoding Prior Knowledge with Eigenword Embeddings

Dominique Osborne; Shashi Narayan; Shay B. Cohen

Transactions of the Association for Computational Linguistics (TACL), Vol. 4 (2016)

Published 2016-07-29

Dominique Osborne
University of Strathclyde

Shashi Narayan
University of Edinburgh

Shay B. Cohen
University of Edinburgh

Abstract

Canonical correlation analysis (CCA) is a method for reducing the dimension of data represented using two views. It has been previously used to derive word embeddings, where one view indicates a word, and the other view indicates its context. We describe a way to incorporate prior knowledge into CCA, give a theoretical justification for it, and test it by deriving word embeddings and evaluating them on a myriad of datasets.
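As a rough illustration of the word/context setup described in the abstract, here is a sketch of plain two-view CCA over a word-by-context co-occurrence matrix. It is not the authors' prior-knowledge method, whose details are not given here. The function name `cca_word_embeddings`, the diagonal covariance approximation (row and column count totals), and the toy counts are all assumptions made for illustration.

```python
import numpy as np


def cca_word_embeddings(cooc, k, eps=1e-8):
    """Plain two-view CCA on a word-by-context count matrix (illustrative sketch).

    cooc[i, j] counts how often word i (view 1) occurs with context j (view 2).
    Assumption: each view's covariance is approximated by a diagonal matrix of
    total counts. This is NOT the prior-knowledge variant described in the paper.
    """
    cooc = np.asarray(cooc, dtype=float)
    # Diagonal "covariances": total count for each word and for each context.
    word_counts = cooc.sum(axis=1)
    ctx_counts = cooc.sum(axis=0)
    d_w = 1.0 / np.sqrt(word_counts + eps)
    d_c = 1.0 / np.sqrt(ctx_counts + eps)
    # Whitened cross-covariance: D_w^{-1/2} C D_c^{-1/2}.
    omega = d_w[:, None] * cooc * d_c[None, :]
    # Thin SVD; singular values are the canonical correlations.
    u, s, _ = np.linalg.svd(omega, full_matrices=False)
    # Undo the word-side whitening to obtain k-dimensional word embeddings.
    word_emb = d_w[:, None] * u[:, :k]
    return word_emb, s[:k]


# Toy example: 4 words, 3 contexts (hypothetical counts).
cooc = np.array([[4, 0, 1],
                 [3, 1, 0],
                 [0, 5, 2],
                 [0, 4, 3]])
emb, corr = cca_word_embeddings(cooc, k=2)
print(emb.shape)  # (4, 2)
```

With this normalization the leading canonical correlation is 1 and its embedding column is constant across words, so that direction carries no information. Practical code often discards it and keeps the next `k` directions.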

Presented at EACL 2017.