Comparing Apples to Apple: The Effects of Stemmers on Topic Models

Alexandra Schofield; David Mimno

Vol. 4 (2016)

TACL approved

Published 2016-07-12

Alexandra Schofield
Cornell University

David Mimno
Cornell University

Abstract

Rule-based stemmers such as the Porter stemmer are frequently used to preprocess English corpora for topic modeling. In this work, we train and evaluate topic models on a variety of corpora using several stemming algorithms. We examine several quantitative measures of the resulting models, including likelihood, coherence, model stability, and entropy. Although stemmers are frequently used in topic modeling, we find that they produce no meaningful improvement in likelihood or coherence and can in fact degrade topic stability.
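
To make the preprocessing step concrete, the following is a minimal sketch of how a rule-based stemmer conflates word types before a topic model sees them. The function `toy_stem`, its two suffix rules, and the three example documents are hypothetical and chosen only for illustration; this is not the Porter stemmer, nor any of the stemmers evaluated in the paper.

```python
from collections import Counter


def toy_stem(token: str) -> str:
    """Hypothetical plural-stripping rule for illustration; NOT the Porter stemmer."""
    # Rule 1: "companies" -> "company" (skip very short words such as "pies")
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    # Rule 2: drop a final "s", but leave "ss" endings such as "class" alone
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


# Made-up documents, lowercased and split on whitespace
docs = [
    "apples and apple pies",
    "the apple company sells phones",
    "companies sell apples",
]
tokens = [t for d in docs for t in d.lower().split()]

raw_counts = Counter(tokens)                           # word types before stemming
stemmed_counts = Counter(toy_stem(t) for t in tokens)  # word types after stemming

print(len(raw_counts), "types before stemming")
print(len(stemmed_counts), "types after stemming")
print(stemmed_counts["apple"], "tokens now share the type 'apple'")
```

In this toy corpus the fruit sense and the company sense of "apple" end up under a single type, which is the kind of conflation the title alludes to. Whether such merging helps or hurts a topic model is the empirical question the paper examines.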

PDF (presented at EMNLP 2016)

Author Biography

Alexandra Schofield

PhD Student, Department of Computer Science

David Mimno

Assistant Professor, Department of Information Science