A Sense-Topic Model for Word Sense Induction with Unsupervised Data Enrichment

Jing Wang; Mohit Bansal; Kevin Gimpel; Brian D. Ziebart; Clement T. Yu

Vol. 3 (2015)

TACL approved

Published 2015-01-20

Jing Wang
University of Illinois at Chicago, Chicago, IL, 60607, USA

Mohit Bansal
Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA

Kevin Gimpel
Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA

Brian D. Ziebart
University of Illinois at Chicago, Chicago, IL, 60607, USA

Clement T. Yu
University of Illinois at Chicago, Chicago, IL, 60607, USA

Abstract

Word sense induction (WSI) seeks to automatically discover the senses of a word in a corpus via unsupervised methods. We propose a sense-topic model for WSI, which treats sense and topic as two separate latent variables to be inferred jointly. Topics are informed by the entire document, while senses are informed by the local context surrounding the ambiguous word. We also discuss unsupervised ways of enriching the original corpus in order to improve model performance, including using neural word embeddings and external corpora to expand the context of each data instance. We demonstrate significant improvements over the previous state-of-the-art, achieving the best results reported to date on the SemEval-2013 WSI task.

PDF (presented at NAACL 2015)

Author Biography

Jing Wang

Computer Science