Minimally-Supervised Morphological Segmentation using Adaptor Grammars

Kairit Sirts; Sharon Goldwater

Vol. 1 (2013)

TACL approved


Published 2013-05-31


Kairit Sirts
Tallinn University of Technology

Sharon Goldwater
The University of Edinburgh

Abstract

This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation. We compare three training methods: unsupervised training, semi-supervised training, and a novel model selection method. In the model selection method, we train unsupervised Adaptor Grammars using an over-articulated metagrammar, then use a small labelled data set to select which of the potential morph boundaries identified by the metagrammar should be returned in the final output. We evaluate on five languages and show that semi-supervised training improves over unsupervised training, while the model selection method yields the best average results across all languages and is competitive with state-of-the-art semi-supervised systems. Moreover, this method makes it possible to tune performance for different evaluation metrics or downstream tasks.
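The model selection step summarised above can be illustrated with a small sketch. It rests on assumptions that are ours and not a description of the paper's exact procedure. Each candidate boundary proposed by the over-articulated metagrammar is assumed to carry a label naming the grammar level that produced it. Selection is then treated as an exhaustive search for the subset of labels whose boundaries maximise boundary F1 on the small labelled set. The label names (`Prefix`, `SubMorph`, `Suffix`), the function names, and the toy data are all hypothetical.

```python
from itertools import combinations


def boundary_f1(predicted, gold):
    """Micro-averaged boundary F1 over aligned lists of boundary-position sets."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        tp += len(pred & ref)   # boundaries found in both
        fp += len(pred - ref)   # spurious predicted boundaries
        fn += len(ref - pred)   # missed gold boundaries
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_boundary_labels(candidates, gold):
    """Hypothetical model selection: pick the subset of boundary labels
    (grammar levels) whose boundaries give the highest F1 on labelled data.

    candidates: one dict {boundary_position: label} per labelled word.
    gold: one set of gold boundary positions per word, aligned with candidates.
    Returns (best_label_set, best_f1). Exhaustive, so only for a few labels.
    """
    labels = sorted({lab for cand in candidates for lab in cand.values()})
    best_labels, best_f1 = frozenset(), -1.0
    for k in range(len(labels) + 1):
        for subset in combinations(labels, k):
            keep = set(subset)
            # Keep only the boundaries produced by the chosen levels.
            predicted = [{pos for pos, lab in cand.items() if lab in keep}
                         for cand in candidates]
            f1 = boundary_f1(predicted, gold)
            if f1 > best_f1:
                best_labels, best_f1 = frozenset(keep), f1
    return best_labels, best_f1


# Toy labelled data (positions are character offsets where a boundary falls).
candidates = [
    {2: "Prefix", 5: "SubMorph", 7: "Suffix"},  # un|hap|pi|ness
    {3: "SubMorph", 4: "Suffix"},               # wal|k|ed
]
gold = [{2, 7}, {4}]                            # un|happi|ness, walk|ed
best_labels, best_f1 = select_boundary_labels(candidates, gold)
print(sorted(best_labels), best_f1)  # ['Prefix', 'Suffix'] 1.0
```

Swapping `boundary_f1` for another scoring function is one way to picture the abstract's point that selection can be tuned to different evaluation metrics or downstream tasks.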

Presented at ACL 2013.