
A Neural Generative Model for Joint Learning Topics and Topic-Specific Word Embeddings

Abstract

We propose a novel generative model that explores both local and global context to jointly learn topics and topic-specific word embeddings. In particular, we assume that global latent topics are shared across documents; a word is generated by a hidden semantic vector encoding its contextual semantic meaning; and its context words are generated conditional on both the hidden semantic vector and the global latent topics. Topics are trained jointly with the word embeddings. The trained model maps words to topic-dependent embeddings, which naturally addresses the issue of word polysemy. Experimental results show that the proposed model outperforms word-level embedding methods on both word similarity evaluation and word sense disambiguation. Furthermore, the model extracts more coherent topics than existing neural topic models and other models that jointly learn topics and word embeddings. Finally, the model can be easily integrated with existing deep contextualized word embedding learning methods to further improve performance on downstream tasks such as sentiment classification.
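For illustration, the generative story described above can be sketched as follows. This is a minimal sketch, not the paper's implementation: the distributional choices (Gaussian semantic vectors, a softmax over the vocabulary), the dimensionalities, and all variable names are assumptions introduced here, and the variational training of the topics and embeddings is omitted.

import numpy as np

rng = np.random.default_rng(0)

V, K, D = 5000, 50, 300                      # vocabulary size, number of topics, embedding dimension (assumed values)
topic_vectors = rng.normal(size=(K, D))      # global latent topics, shared across documents
word_embeddings = rng.normal(size=(V, D))    # word embedding matrix

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def generate_context(pivot_word_id, topic_id, window=4):
    # Hidden semantic vector encoding the pivot word's contextual meaning
    # (assumed here to be Gaussian around the pivot word's embedding).
    z = rng.normal(loc=word_embeddings[pivot_word_id], scale=1.0)
    # Context words are generated conditional on both the hidden semantic
    # vector and the global latent topic (combined additively for illustration).
    logits = word_embeddings @ (z + topic_vectors[topic_id])
    return rng.choice(V, size=window, p=softmax(logits))

print(generate_context(pivot_word_id=42, topic_id=3))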

Article at MIT Press. Presented at EMNLP 2020.
