
Few-shot Multilingual Open-domain QA from 5 Examples

Abstract

Recent approaches to multilingual open-domain question answering (MLODQA) have achieved promising results given abundant language-specific training data.
However, the considerable annotation cost limits the applicability of these methods to underrepresented languages.
We introduce a few-shot learning approach that generates large-scale multilingual data from large language models (LLMs) with minimal supervision for MLODQA.
Our method begins with large-scale self-supervised pre-training by exploiting WikiData, followed by training on high-quality synthetic multilingual data generated by prompting LLMs with few-shot examples.
The resulting model, FsModQA, significantly outperforms existing few-shot and supervised baselines on MLODQA as well as on cross-lingual and monolingual retrieval.
We further show our method can be extended for effective zero-shot adaptation to new languages through a cross-lingual prompting strategy with only English-supervised data, making it a general and applicable solution for MLODQA tasks without costly large-scale annotation.
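
As an illustration of the few-shot prompting idea described in the abstract, the sketch below assembles a prompt from a handful of annotated examples and a new passage-answer pair. It is a minimal sketch and not the authors' implementation: the QAExample class, the build_few_shot_prompt function, the instruction wording, and the prompt layout are all hypothetical, and no LLM is called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QAExample:
    """One annotated demonstration (hypothetical structure)."""
    passage: str
    question: str
    answer: str


def build_few_shot_prompt(
    examples: List[QAExample],
    passage: str,
    answer: str,
    target_language: str,
) -> str:
    """Assemble a text prompt asking for a question in target_language.

    The prompt layout is an assumption made for illustration only.
    """
    parts = [
        f"Write a question in {target_language} that the passage answers."
    ]
    # Each demonstration shows the passage, the answer, and the question.
    for ex in examples:
        parts.append(
            f"Passage: {ex.passage}\nAnswer: {ex.answer}\nQuestion: {ex.question}"
        )
    # The final block leaves the question open for the model to complete.
    parts.append(f"Passage: {passage}\nAnswer: {answer}\nQuestion:")
    return "\n\n".join(parts)
```

Under these assumptions, passing English demonstrations together with a different target_language loosely mirrors the cross-lingual prompting setting: the demonstrations stay in English while the instruction requests output in the new language.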

Article at MIT Press

Author Biography

Fan Jiang

First-year PhD student at the School of Computing and Information Systems, University of Melbourne