
Measuring and Improving Consistency in Pretrained Language Models

Abstract

Consistency of a model, that is, the invariance of its behavior under meaning-preserving alternations in its input, is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge? To this end, we create ParaRel, a high-quality resource of English cloze-style query paraphrases, containing a total of 328 paraphrases for 38 relations. Using ParaRel, we show that the consistency of all PLMs we experiment with is poor, though with high variance between relations. Our analysis of the representational spaces of PLMs suggests that they are poorly structured and currently not suitable for representing knowledge robustly. Finally, we propose a method for improving model consistency and experimentally demonstrate its effectiveness.
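
To make the notion concrete, the sketch below shows the kind of paraphrase-consistency probe the abstract describes, using the HuggingFace transformers fill-mask pipeline. The model choice (bert-base-cased) and the two prompt templates are illustrative assumptions for this sketch, not actual ParaRel entries.

```python
# A minimal sketch of a paraphrase-consistency probe for a masked PLM.
# Assumptions: bert-base-cased as the probed model; the two cloze prompts
# below are made-up examples, not taken from ParaRel.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-cased")

# Two meaning-preserving paraphrases of the same relational query
# (capital-of), each phrased as a cloze-style prompt.
paraphrases = [
    "The capital of France is [MASK].",
    "France's capital city is [MASK].",
]

# Take the top-ranked prediction for each paraphrase.
top_predictions = [unmasker(p)[0]["token_str"] for p in paraphrases]

# The model is consistent on this pair if both paraphrases elicit the
# same top-ranked answer, regardless of whether that answer is correct.
consistent = top_predictions[0] == top_predictions[1]
print(top_predictions, "consistent:", consistent)
```

Note that the check deliberately ignores factual correctness: a model counts as consistent on a pair if it gives the same top-ranked answer to both phrasings, which is the invariance under meaning-preserving rephrasing that the abstract targets.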


Presented at EMNLP 2021. Article at MIT Press.