Skip to main navigation menu Skip to main content Skip to site footer

FoVer: First-Order Logic Verification for Natural Language Reasoning

Abstract

Large Language Models (LLMs) have shown remarkable capabilities in various tasks, including logical reasoning. However, their propensity for generating incorrect or inconsistent responses remains a significant concern. To address this issue, we propose FoVer (First-order logic Verification), an automated pipeline that verifies the logical correctness of reasoning texts using first-order logic. The pipeline operates in two main steps: (1) LLM-driven translation of natural language into executable logical expressions, and (2) automated logical verification using the Z3 theorem prover. We evaluate FoVer on specialized logical datasets (ProofWriter and FOLIO) and real-world LLM outputs (REVEAL). The results demonstrate that FoVer significantly outperforms existing methods in logical verification, showing notable improvements in reliability and accuracy across both ideal and practical scenarios. The pipeline also demonstrates its potential in identifying annotation errors in existing datasets, and it could be utilized for constructing new logical reasoning datasets. This work presents a significant step forward in enhancing the reliability and trustworthiness of LLM outputs, particularly in tasks requiring logical integrity.

Article at MIT Press Presented at EMNLP 2025