Improving Low-Resource Cross-lingual Parsing with Expected Statistic Regularization

Thomas Effland; Michael Collins

Vol. 11 (2023)

TACL approved

Published 2023-01-12

Thomas Effland
Columbia University

Michael Collins
Google Research

Abstract

We present Expected Statistic Regularization (ESR), a novel regularization technique that uses low-order multi-task structural statistics to shape model distributions for semi-supervised learning on low-resource datasets. We study ESR in the context of cross-lingual transfer for syntactic analysis (POS tagging and labeled dependency parsing) and present several classes of low-order statistic functions that bear on model behavior. Experimentally, we evaluate the proposed statistics with ESR for unsupervised transfer on 5 diverse target languages and show that all statistics, when estimated accurately, improve both POS accuracy and labeled attachment score (LAS), with the best statistic improving POS by +7.0 and LAS by +8.5 on average. We also present semi-supervised transfer and learning curve experiments showing that ESR provides significant gains over strong cross-lingual-transfer-plus-fine-tuning baselines given modest amounts of labeled data. These results indicate that ESR is a promising approach that complements model-transfer approaches for cross-lingual parsing.