Back to Basics for Monolingual Alignment: Exploiting Word Similarity and Contextual Evidence

Md Arafat Sultan; Steven Bethard; Tamara Sumner

Vol. 2 (2014)

TACL approved

Published 2014-05-31

Md Arafat Sultan
Institute of Cognitive Science and Department of Computer Science, University of Colorado Boulder

Steven Bethard

Tamara Sumner
Institute of Cognitive Science and Department of Computer Science, University of Colorado Boulder

Abstract

We present a simple, easy-to-replicate monolingual aligner that demonstrates state-of-the-art performance while relying on almost no supervision and a very small number of external resources. Based on the hypothesis that words with similar meanings represent potential pairs for alignment if located in similar contexts, we propose a system that operates by finding such pairs. In two intrinsic evaluations on alignment test data, our system achieves F1 scores of 88–92%, a 1–3% absolute improvement over the previous best system. Moreover, in two extrinsic evaluations our aligner outperforms existing aligners, and even a naive application of the aligner approaches state-of-the-art performance in each extrinsic task.

PDF (Presented at ACL 2014)