Multilingual Autoregressive Entity Linking

Nicola De Cao; Ledell Wu; Kashyap Popat; Mikel Artetxe; Naman Goyal; Mikhail Plekhanov; Luke Zettlemoyer; Nicola Cancedda; Sebastian Riedel; Fabio Petroni

Vol. 10 (2022)

TACL approved

Published 2022-03-25

Nicola De Cao
University of Amsterdam, University of Edinburgh, Facebook AI

Ledell Wu
Facebook AI

Kashyap Popat
Facebook AI

Mikel Artetxe
Facebook AI

Naman Goyal
Facebook AI

Mikhail Plekhanov
Facebook AI

Luke Zettlemoyer
Facebook AI, University of Washington

Nicola Cancedda
Facebook AI

Sebastian Riedel
Facebook AI, University College London

Fabio Petroni
Facebook AI

Abstract

We present mGENRE, a sequence-to-sequence system for the Multilingual Entity Linking (MEL) problem: the task of resolving language-specific mentions to a multilingual Knowledge Base (KB). For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token-by-token, in an autoregressive fashion. The autoregressive formulation allows us to effectively cross-encode the mention string and entity names, capturing more interactions than the standard dot product between mention and entity vectors. It also enables fast search within a large KB, even for mentions that do not appear in mention tables, with no need for large-scale vector indices. While prior MEL work uses a single representation for each entity, we match against entity names in as many languages as possible, which allows us to exploit language connections between the source input and the target name. Moreover, in a zero-shot setting on languages with no training data at all, mGENRE treats the target language as a latent variable that is marginalized at prediction time. This leads to an improvement of over 50% in average accuracy. We show the efficacy of our approach through extensive evaluation, including experiments on three popular MEL benchmarks, where we establish new state-of-the-art results. Source code is available at https://github.com/facebookresearch/GENRE.
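To make the marginalization step concrete, the following is a minimal sketch, not the mGENRE implementation. It assumes that a decoder has already produced candidate (entity name, language) strings with log-probabilities, and that each candidate maps to a KB entity. The function names, the entity identifiers `E1` and `E2`, and the probabilities are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginalize_over_languages(candidates):
    """Sum candidate probabilities per entity, treating language as latent.

    candidates: iterable of (entity_id, language, log_prob) tuples, where
    log_prob is the (assumed) model score of generating that entity's name
    in that language. Returns {entity_id: log of the summed probability}.
    """
    grouped = defaultdict(list)
    for entity_id, _language, log_prob in candidates:
        grouped[entity_id].append(log_prob)
    return {entity_id: logsumexp(lps) for entity_id, lps in grouped.items()}


# Hypothetical decoder output: E1 is reachable through two language names.
candidates = [
    ("E1", "en", math.log(0.30)),
    ("E1", "de", math.log(0.25)),
    ("E2", "en", math.log(0.40)),
]

scores = marginalize_over_languages(candidates)
best = max(scores, key=scores.get)
```

In this toy input the single highest-scoring candidate belongs to `E2`, but `E1` wins once the probability mass of its names in both languages is summed, which is the effect that marginalizing over the target language is meant to capture.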

Presented at ACL 2022. Article at MIT Press.