Questionable Answers in Question Answering Research: Reproducibility and Variability of Published Results

Matt Crane

Vol. 6 (2018)

TACL approved

Published 2018-04-27

Matt Crane
University of Waterloo

Abstract

"Based on theoretical reasoning it has been suggested that the reliability of findings published in the scientific literature decreases with the popularity of a research field" (Pfeiffer and Hoffmann, 2009). Deep learning is very popular, and the ability to reproduce results is an important part of science. There is growing concern within the deep learning community about the reproducibility of the results that are presented. In this paper we present a number of controllable, yet unreported, effects that can substantially change the effectiveness of a sample model, and thus the reproducibility of those results. Through these environmental effects we show that distributing source code alone, commonly believed to be all that reproducibility requires, is not enough: source code without a reproducible environment does not mean anything at all. In addition, the range of results produced by these effects can be larger than the majority of incremental improvements reported.

Article at MIT Press PDF (presented at NAACL 2018)