Simultaneous Selection and Adaptation of Source Data via Four-Level Optimization

Pengtao Xie; Xingchen Zhao; Xuehai He

Vol. 12 (2024)

TACL approved

Published 2024-05-25

Pengtao Xie
Carnegie Mellon University

Xingchen Zhao

Xuehai He

Abstract

In many NLP applications, source data is collected to mitigate data deficiency in a target task and to help train the target model. Existing transfer learning methods either select a subset of source examples that are close to the target domain or adapt all source examples into the target domain, and then train the target model on the selected or adapted examples. Selection incurs significant information loss, while adaptation risks moving source examples that were already in the target domain outside of it. To address these limitations, we propose a framework based on four-level optimization that simultaneously selects and adapts source data. Our method automatically identifies in-domain and out-of-domain source examples and processes each accordingly: selection for in-domain examples and adaptation for out-of-domain examples. Experiments on various datasets demonstrate the effectiveness of the proposed method.

Article at MIT Press