Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
Abstract
We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture from a standard NMT system, but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT with a single model. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on the WMT’14 and WMT’15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and show some interesting examples obtained when mixing languages.
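The target-language token described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `<2xx>` token format and the helper name are illustrative, and in practice the token is prepended before wordpiece segmentation and fed to the trained NMT model.

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend an artificial token telling a single multilingual NMT
    model which language to produce. The source text is otherwise
    unchanged; no per-pair model or architecture change is needed."""
    return f"<2{target_lang}> {source_sentence}"

# The same English input, routed to two different target languages
# purely by the leading token:
print(add_target_token("Hello, how are you?", "es"))  # -> "<2es> Hello, how are you?"
print(add_target_token("Hello, how are you?", "ja"))  # -> "<2ja> Hello, how are you?"
```

Because the token is just another vocabulary item, the model learns the mapping from token to output language from the parallel data itself, which is what makes implicit bridging between unseen pairs possible.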
Author Biography
Melvin Johnson
Software Engineer at Google