Few-shot Multilingual Open-domain QA from 5 Examples

Fan Jiang; Tom Drummond; Trevor Cohn

Vol. 13 (2025)

TACL approved


Published 2025-12-25

Fan Jiang
The University of Melbourne

Tom Drummond
The University of Melbourne

Trevor Cohn
The University of Melbourne

Abstract

Recent approaches to multilingual open-domain question answering (MLODQA) have achieved promising results given abundant language-specific training data.
However, the considerable annotation cost limits the applicability of these methods to underrepresented languages.
We introduce a few-shot learning approach that generates large-scale multilingual training data for MLODQA from large language models (LLMs) with minimal supervision.
Our method begins with large-scale self-supervised pre-training by exploiting WikiData, followed by training on high-quality synthetic multilingual data generated by prompting LLMs with few-shot examples.
The resulting model, FsModQA, significantly outperforms existing few-shot and supervised baselines on MLODQA as well as on cross-lingual and monolingual retrieval.
We further show that our method can be extended to effective zero-shot adaptation to new languages through a cross-lingual prompting strategy that uses only English-supervised data, making it a generally applicable solution for MLODQA tasks that does not require costly large-scale annotation.
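The abstract does not specify the prompt format, so the following is only a minimal Python sketch, assuming a simple passage/question/answer template, of how a handful of few-shot demonstrations might be assembled into a generation prompt for an LLM. The names QAExample, build_fewshot_prompt, and parse_generation are hypothetical; the actual LLM call and any quality filtering of the synthetic data are omitted. Supplying English demonstrations together with a different target_lang loosely mirrors the cross-lingual prompting idea, though the paper's real prompts may differ.

```python
from dataclasses import dataclass


@dataclass
class QAExample:
    # Hypothetical container for one demonstration (passage, question, answer).
    passage: str
    question: str
    answer: str


def build_fewshot_prompt(examples, target_passage, target_lang):
    """Assemble a few-shot prompt asking an LLM to write a question (and its
    short answer) in target_lang about target_passage. The template is an
    illustrative assumption, not the paper's actual prompt."""
    if not examples:
        raise ValueError("need at least one demonstration")
    lines = [
        f"Write a question in {target_lang} that is answerable from the "
        "passage, followed by its short answer."
    ]
    for i, ex in enumerate(examples, 1):
        lines.append(f"Example {i}")
        lines.append(f"Passage: {ex.passage}")
        lines.append(f"Question: {ex.question}")
        lines.append(f"Answer: {ex.answer}")
        lines.append("")  # blank line between demonstrations
    lines.append(f"Passage: {target_passage}")
    lines.append(f"Question ({target_lang}):")
    return "\n".join(lines)


def parse_generation(text):
    """Split a completion of the form '<question>\\nAnswer: <answer>' into a
    (question, answer) pair; return None if the output is malformed."""
    if "Answer:" not in text:
        return None
    question, answer = text.split("Answer:", 1)
    question, answer = question.strip(), answer.strip()
    if not question or not answer:
        return None
    return question, answer


if __name__ == "__main__":
    # English demonstration, Japanese target: a cross-lingual prompting setup.
    demos = [QAExample("Paris is the capital of France.",
                       "What is the capital of France?", "Paris")]
    print(build_fewshot_prompt(demos, "東京は日本の首都である。", "Japanese"))
```

In a real pipeline each prompt would be sent to an LLM, and the parsed question/answer pairs would be kept only if they pass whatever consistency checks the data-generation stage applies.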

Article at MIT Press

Author Biography

Fan Jiang

First-year PhD student at the School of Computing and Information Systems, The University of Melbourne.