Context-Aware Machine Translation with Source Coreference Explanation

Huy Hien Vu; Hidetaka Kamigaito; Taro Watanabe

Vol. 12 (2024)

TACL approved

Published 2024-07-17

Huy Hien Vu
The Division of Information Science, Nara Institute of Science and Technology

Hidetaka Kamigaito
The Division of Information Science, Nara Institute of Science and Technology

Taro Watanabe
The Division of Information Science, Nara Institute of Science and Technology

Abstract

Classification systems are evaluated in countless papers. However, we find that evaluation practice is often nebulous. Frequently, metrics are selected without justification, and blurry terminology invites misconceptions. For instance, many works use so-called 'macro' metrics to rank systems (e.g., 'macro F1') but do not clearly specify what they would expect from such a 'macro' metric. This is problematic, since the choice of metric can affect paper findings as well as shared task rankings, and thus the selection process should be as clear as possible.

Starting from the intuitive concepts of bias and prevalence, we perform an analysis of common evaluation metrics, considering the expectations expressed in papers. Equipped with a thorough understanding of the metrics, we survey metric selection in recent shared tasks of Natural Language Processing. The results show that metric choices are often not supported with convincing arguments, an issue that can make any ranking seem arbitrary. This work aims to provide an overview and guidance for more informed and transparent metric selection, fostering meaningful evaluation.

Presented at EMNLP 2024. Article at MIT Press.