Transformers as Transducers

Lena Strobl; Dana Angluin; David Chiang; Jonathan Rawski; Ashish Sabharwal

Vol. 13 (2025)

TACL approved

Published 2025-12-25

Lena Strobl
Umeå University, Allen Institute for Artificial Intelligence

Dana Angluin
Yale University

David Chiang
University of Notre Dame

Jonathan Rawski
MIT & San Jose State University

Ashish Sabharwal
Allen Institute for AI

Abstract

We study the sequence-to-sequence mapping capacity of transformers by relating them to finite transducers, and find that they can express surprisingly large classes of (total functional) transductions. We do so using variants of RASP, a programming language designed to help people "think like transformers," as an intermediate representation. We extend the existing Boolean variant B-RASP to sequence-to-sequence transductions and show that it computes exactly the first-order rational transductions (such as string rotation). Then, we introduce two new extensions. B-RASP[pos] enables calculations on positions (such as copying the first half of a string) and contains all first-order regular transductions. S-RASP adds prefix sum, which enables additional arithmetic operations (such as squaring a string) and contains all first-order polyregular transductions. Finally, we show that masked average-hard attention transformers can simulate S-RASP.
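
To make the example transductions named in the abstract concrete, the following is a minimal Python sketch of their input-output behavior only. It is not RASP code and not the authors' construction. The function names are hypothetical, and the exact definitions are assumptions, since the abstract does not spell them out: rotation is taken to move the last symbol to the front, the first half is taken to be the first floor(n/2) symbols, and squaring is taken to map a string w of length n to n copies of w.

```python
from itertools import accumulate


def rotate_right(w: str) -> str:
    # Assumed reading of "string rotation": the last symbol moves to the front.
    return w[-1:] + w[:-1]


def copy_first_half(w: str) -> str:
    # Assumed reading: keep the first floor(len(w) / 2) symbols.
    return w[: len(w) // 2]


def square(w: str) -> str:
    # Assumed reading of "squaring a string": len(w) copies of w,
    # so the output length is len(w) ** 2.
    return w * len(w)


def prefix_sum(values: list[int]) -> list[int]:
    # Running total at each position, the arithmetic primitive that
    # the abstract says S-RASP adds.
    return list(accumulate(values))
```

Under these assumptions, the output of `square` has length quadratic in the input length, which is consistent with the abstract placing it in the polyregular rather than the regular class.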

Presented at ACL 2025. Article at MIT Press.