Source-Free Domain Adaptation for Question Answering with Masked Self-training

Maxwell Juncheng Yin; Boyu Wang; Yue Dong; Charles Ling

Vol. 12 (2024)

TACL approved

Published 2024-06-15

Maxwell Juncheng Yin
Western University

Boyu Wang

Yue Dong

Charles Ling

Abstract

Previous unsupervised domain adaptation (UDA) methods for question answering (QA) require access to source domain data while fine-tuning the model for the target domain. Source domain data may, however, contain sensitive information and should be protected. In this study, we investigate a more challenging setting, source-free UDA, in which we have only the pretrained source model and target domain data, without access to source domain data. We propose a novel self-training approach to QA models that integrates a specially designed mask module for domain adaptation. The mask is auto-adjusted to extract key domain knowledge when trained on the source domain. To maintain previously learned domain knowledge, certain mask weights are frozen during adaptation, while other weights are adjusted to mitigate domain shifts with pseudo-labeled samples generated in the target domain. Our empirical results on four benchmark datasets suggest that our approach significantly enhances the performance of pretrained QA models on the target domain, and even outperforms models that have access to the source data during adaptation.
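The two mechanisms named in the abstract, pseudo-labeling target-domain samples and updating only the unfrozen mask weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how pseudo-labels are filtered or which mask weights are frozen, so the confidence threshold, the span-scoring rule (product of start and end probabilities), the plain gradient step, and every function name below are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple


def best_span(start_probs: List[float], end_probs: List[float],
              max_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, confidence) for the highest-scoring span with start <= end.

    Confidence is the product of the start and end probabilities
    (an assumed scoring rule, common in extractive QA).
    """
    best = (0, 0, 0.0)
    for i, p_start in enumerate(start_probs):
        for j in range(i, min(i + max_len, len(end_probs))):
            score = p_start * end_probs[j]
            if score > best[2]:
                best = (i, j, score)
    return best


def pseudo_label(start_probs: List[float], end_probs: List[float],
                 threshold: float = 0.5) -> Optional[Tuple[int, int]]:
    """Keep the predicted span as a pseudo-label only if it is confident enough.

    The threshold is a hypothetical hyperparameter, not a value from the paper.
    """
    start, end, confidence = best_span(start_probs, end_probs)
    return (start, end) if confidence >= threshold else None


def masked_update(mask: List[float], grads: List[float],
                  frozen: List[bool], lr: float = 0.1) -> List[float]:
    """Apply one gradient step to the mask weights, skipping frozen entries.

    Frozen entries stand in for the mask weights that preserve source-domain
    knowledge; how they are chosen is not specified in the abstract.
    """
    return [m if is_frozen else m - lr * g
            for m, g, is_frozen in zip(mask, grads, frozen)]
```

In this sketch, a target-domain sample whose best span falls below the threshold yields `None` and would simply be left out of that self-training round, while `masked_update` leaves every frozen mask weight unchanged regardless of its gradient.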

Presented at NAACL 2024. Article at MIT Press.