Many Languages, One Parser

Waleed Ammar; George Mulcaire; Miguel Ballesteros; Chris Dyer; Noah A. Smith

Vol. 4 (2016)

TACL approved

Published 2016-07-28

Waleed Ammar
Carnegie Mellon University

George Mulcaire
University of Washington

Miguel Ballesteros
Pompeu Fabra University

Chris Dyer
Carnegie Mellon University

Noah A. Smith
University of Washington

Abstract

We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities, making it more effective to learn from limited annotations. Our parser's performance compares favorably to strong baselines in a range of data scenarios, including when the target language has a large treebank, a small treebank, or no treebank for training.

PDF (presented at ACL 2016)