Optimizing Statistical Machine Translation for Text Simplification

Wei Xu; Courtney Napoles; Ellie Pavlick; Quanze Chen; Chris Callison-Burch

Vol. 4 (2016)

TACL approved

Published 2016-07-27

Wei Xu
University of Pennsylvania Computer and Information Science Department

Courtney Napoles
Johns Hopkins University Department of Computer Science

Ellie Pavlick
University of Pennsylvania Computer and Information Science Department

Quanze Chen
University of Pennsylvania

Chris Callison-Burch
University of Pennsylvania Computer and Information Science Department

Abstract

Most recent sentence simplification systems use basic machine translation models to learn lexical and syntactic paraphrases from a manually simplified parallel corpus. These methods are limited by the quality and quantity of manually simplified corpora, which are expensive to build. In this paper, we conduct an in-depth adaptation of statistical machine translation to perform text simplification, taking advantage of large-scale paraphrases learned from bilingual texts and a small amount of manual simplifications with multiple references. Our work is the first to design automatic metrics that are effective for tuning and evaluating simplification systems, which will facilitate iterative development for this task.

Presented at ACL 2016.