Multilingual Denoising Pre-training for Neural Machine Translation

Yinhan Liu; Jiatao Gu; Naman Goyal; Xian Li; Sergey Edunov; Marjan Ghazvininejad; Mike Lewis; Luke Zettlemoyer

Vol. 8 (2020)

TACL approved

Published 2020-11-27

Yinhan Liu
Birch.AI

Jiatao Gu
Facebook AI Research

Naman Goyal
Facebook AI Research

Xian Li
Facebook AI Applied Research

Sergey Edunov
Facebook AI Research

Marjan Ghazvininejad
Facebook AI Research

Mike Lewis
Facebook AI Research

Luke Zettlemoyer
Facebook AI Research

Abstract

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART, a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, whereas previous approaches have focused only on the encoder, on the decoder, or on reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low-resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show that it enables transfer to language pairs with no bi-text or that were not in the pre-training corpus, and we present an extensive analysis of which factors contribute the most to effective pre-training.
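The BART objective referred to above trains a model to reconstruct an original document from a corrupted copy of it. Below is a minimal Python sketch of one such corruption, assuming BART-style sentence permutation followed by span masking (each masked span is replaced by a single mask token). The names `add_noise`, `sample_poisson`, and `MASK`, the word-level tokens, and the defaults (35% of tokens masked, span lengths drawn from a Poisson distribution with λ = 3.5) are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import List

MASK = "<mask>"  # hypothetical mask symbol; real models use a vocabulary token


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Sample from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def add_noise(sentences: List[List[str]], mask_ratio: float = 0.35,
              lam: float = 3.5, seed: int = 0) -> List[str]:
    """Permute sentence order, then replace random spans with one MASK each."""
    rng = random.Random(seed)
    order = list(sentences)
    rng.shuffle(order)  # 1) sentence permutation within the instance
    tokens = [tok for sent in order for tok in sent]
    n = len(tokens)
    budget = round(mask_ratio * n)  # how many tokens to hide in total
    covered = [False] * n  # token lies inside some masked span
    starts = [False] * n   # token is the first position of a masked span
    masked, attempts = 0, 0
    while masked < budget and attempts < 100 * n:
        attempts += 1
        # Simplification (assumption): zero-length spans are bumped to 1.
        length = min(max(1, sample_poisson(lam, rng)), budget - masked)
        start = rng.randrange(n - length + 1)
        if any(covered[start:start + length]):
            continue  # keep spans disjoint
        for i in range(start, start + length):
            covered[i] = True
        starts[start] = True
        masked += length
    # 2) span masking: each span collapses to a single MASK token
    return [MASK if starts[i] else tok
            for i, tok in enumerate(tokens) if not covered[i] or starts[i]]
```

A pre-training pair would then be (noised tokens, original document), with the decoder trained to regenerate the original. In practice this would operate on subword tokens rather than whole words.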

Article available at MIT Press. Presented at EMNLP 2020.