Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean

Abstract


We propose a simple solution for using a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture of a standard NMT system; instead, it introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables multilingual NMT with a single model. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on the WMT’14 and WMT’15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and show some interesting examples that arise when mixing languages.
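
To make the artificial-token mechanism concrete, below is a minimal sketch in Python. It is illustrative only: the helper name and data are invented here, and the real system applies shared wordpiece segmentation (omitted below) before feeding sentences to the model. The paper's own examples use target-language tokens of the form <2es>.

```python
# Illustrative sketch of the artificial-token trick: the translation
# direction is selected entirely by a reserved token prepended to the
# source sentence, so the NMT architecture itself is unchanged.

def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend a token such as '<2es>' telling the shared model which
    target language to produce (helper name is hypothetical)."""
    return f"<2{target_lang}> {source_sentence}"

# Training mixes all language pairs in one model; each example simply
# carries its own target-language token.
training_examples = [
    (add_target_token("Hello, how are you?", "es"), "Hola, ¿cómo estás?"),
    (add_target_token("Hola, ¿cómo estás?", "en"), "Hello, how are you?"),
]

# Zero-shot translation: a direction never seen during training
# (e.g. Portuguese→Spanish when only pt↔en and es↔en were trained)
# is requested in exactly the same way.
zero_shot_input = add_target_token("Olá, como vai?", "es")
print(zero_shot_input)  # -> "<2es> Olá, como vai?"
```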

