
Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

Abstract

We propose a simple solution for using a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture from a standard NMT system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT using a single model. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on the WMT’14 and WMT’15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and show some interesting examples observed when mixing languages.
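To make the core idea concrete, here is a minimal sketch of the input preprocessing the abstract describes: an artificial token naming the target language is prepended to the source sentence before it is fed to an otherwise unmodified NMT model. The `<2xx>` token format and the `add_target_token` helper are illustrative assumptions, not an exact reproduction of the authors' pipeline, and the shared wordpiece tokenization step is omitted.

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend an artificial target-language token (assumed "<2xx>" format)
    to the source text; the model architecture itself is unchanged."""
    return f"<2{target_lang}> {source_sentence}"


# A single model and shared vocabulary serve every language pair, so
# requesting a direction never seen in training (zero-shot) only changes
# this leading token, not the model.
print(add_target_token("Hello, how are you?", "es"))
# -> "<2es> Hello, how are you?"
print(add_target_token("Hallo, wie geht es dir?", "fr"))
# -> "<2fr> Hallo, wie geht es dir?"  (e.g., a zero-shot German->French request)
```

Because the target language is encoded in the input rather than in the architecture, adding a language pair requires only training data with the appropriate token, which is what enables the implicit bridging and zero-shot behavior reported above.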

PDF (presented at EMNLP 2017)

Author Biography

Melvin Johnson

Software Engineer at Google

