Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
Abstract
We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture from a standard NMT system, but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT with a single model. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on the WMT’14 and WMT’15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and show some interesting examples obtained when mixing languages.
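The target-language token described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `<2xx>` token format and the helper name are illustrative, and in practice the token is prepended before wordpiece segmentation and fed to the trained NMT model.

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend an artificial token telling a single multilingual NMT
    model which language to produce. The source text is otherwise
    unchanged; no per-pair model or architecture change is needed."""
    return f"<2{target_lang}> {source_sentence}"

# The same English input, routed to two different target languages
# purely by the leading token:
print(add_target_token("Hello, how are you?", "es"))  # -> "<2es> Hello, how are you?"
print(add_target_token("Hello, how are you?", "ja"))  # -> "<2ja> Hello, how are you?"
```

Because the token is just another vocabulary item, the model learns the mapping from token to output language from the parallel data itself, which is what makes implicit bridging between unseen pairs possible.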
Author Biography
Melvin Johnson
Software Engineer at Google