Towards General Natural Language Understanding with Probabilistic Worldbuilding

Abulhair Saparov; Tom Mitchell

Vol. 10 (2022)

TACL approved

Published 2022-04-05

Abulhair Saparov
Carnegie Mellon University

Tom Mitchell
Carnegie Mellon University

Abstract

We introduce the Probabilistic Worldbuilding Model (PWM), a new fully-symbolic Bayesian model of semantic parsing and reasoning, as a first step in a research program toward more domain- and task-general NLU and AI. Humans create internal mental models of their observations which greatly aid in their ability to understand and reason about a large variety of problems. In PWM, the meanings of sentences, acquired facts about the world, and intermediate steps in reasoning are all expressed in a human-readable formal language, with the design goal of interpretability. PWM is Bayesian, designed specifically to be able to generalize to new domains and new tasks. We derive and implement an inference algorithm that reads sentences by parsing and abducing updates to its latent world model that capture the semantics of those sentences, and evaluate it on two out-of-domain question-answering datasets: (1) ProofWriter and (2) a new dataset we call FictionalGeoQA, designed to be more representative of real language but still simple enough to focus on evaluating reasoning ability, while being robust against heuristics. Our method outperforms baselines on both, thereby demonstrating its value as a proof-of-concept.

Presented at ACL 2022. Article available at MIT Press.