Transactions of the Association for Computational Linguistics
https://transacl.org/index.php/tacl
<div id="journalDescription"> <p>Welcome to the TACL submission site!</p> <p>Transactions of the Association for Computational Linguistics (TACL) is an <a href="http://www.aclweb.org/">ACL</a>-sponsored journal <a href="https://www.mitpressjournals.org/loi/tacl">published by MIT Press</a> that publishes papers in all areas of computational linguistics and natural language processing. TACL has the following features:</p> <ul> <li>TACL publishes conference-length papers, but has a journal-style reviewing process (for example, the option for an action editor to recommend the “revise and resubmit” category for a paper).</li> <li>Papers appearing at TACL are eligible for a presentation at certain ACL-sponsored conferences. Thus the model combines the benefits of a journal, with the benefits of being able to present the work at a major conference. (Presentation is optional; authors do not have to present their papers at the conference).</li> <li>TACL accepts submissions all year (the 1st day of each month is a submission deadline).</li> <li class="hover">TACL is committed to fast-turnaround reviewing.</li> </ul> Links: <a href="https://transacl.org/ojs/index.php/tacl/about/submissions" target="_blank" rel="noopener">Information for authors, including submission instructions</a>; <a href="https://transacl.org/ojs/index.php/tacl/about/editorialPolicies#peerReviewProcess" target="_blank" rel="noopener">review process</a>; <a href="https://transacl.org/ojs/index.php/tacl/about/history">annual reports</a> (includes journal statistics presented to the ACL); <a href="https://transacl.org/ojs/index.php/tacl/about/editorialPolicies#custom-0">publication ethics statement</a>.</div> <div> </div> <div id="announcementsHome"> </div>The MIT Pressen-USTransactions of the Association for Computational Linguistics2307-387X<p>Copyright for TACL papers is held by the Association for Computational Linguistics, and articles are distributed under Creative Commons License CC-BY.</p><p>However, for TACL papers published in volumes 1 or 2 or in volume 3 up to and including page 403, the original published pdfs may not include a note about licensing or may mention Creative Commons License CC-BY-NC-SA <span class="__postbox-detected-content __postbox-detected-date">4.0 instead.</span></p><p> </p><p> </p>Language Varieties of Italy: Technology Challenges and Opportunities
https://transacl.org/index.php/tacl/article/view/5485
<p>Italy is characterized by a one-of-a-kind linguistic diversity landscape in Europe, which implicitly encodes local knowledge, cultural traditions, artistic expressions, and the history of its speakers. However, most local languages and dialects in Italy are at risk of disappearing within a few generations. The NLP community has recently begun to engage with endangered languages, including those of Italy. Yet, most efforts assume that these varieties are under-resourced language monoliths with an established written form and homogeneous functions and needs, and are thus highly interchangeable with each other and with high-resource, standardized languages. In this paper, we introduce the linguistic context of Italy and challenge the default machine-centric assumptions of NLP for Italy's language varieties. We advocate for a paradigm shift from machine-centric to speaker-centric NLP, and provide recommendations and opportunities for work that prioritizes languages and their speakers over technological advances. To facilitate this process, we finally propose building a local community towards responsible, participatory efforts aimed at supporting the vitality of the languages and dialects of Italy.</p>
Alan Ramponi
Copyright (c) 2024 Association for Computational Linguistics
http://creativecommons.org/licenses/by/4.0
2024-01-12 2024-01-12 12
Cultural Adaptation of Recipes
https://transacl.org/index.php/tacl/article/view/5655
<p>Building upon the considerable advances in Large Language Models (LLMs), we are now equipped to address more sophisticated tasks demanding a nuanced understanding of cross-cultural contexts. A key example is recipe adaptation, which goes beyond simple translation to include a grasp of ingredients, culinary techniques, and dietary preferences specific to a given culture. We introduce a new task involving the translation and cultural adaptation of recipes between Chinese- and English-speaking cuisines. To support this investigation, we present CulturalRecipes, a unique dataset comprising automatically paired recipes written in Mandarin Chinese and English. This dataset is further enriched with a human-written and curated test set. On this intricate task of cross-cultural recipe adaptation, we evaluate the performance of various methods, including GPT-4 and other LLMs, traditional machine translation, and information retrieval techniques. Our comprehensive analysis includes both automatic and human evaluation metrics. While GPT-4 exhibits impressive abilities in adapting Chinese recipes into English, it still lags behind human expertise when translating English recipes into Chinese. This underscores the multifaceted nature of cultural adaptation. We anticipate that these insights will significantly contribute to future research on culturally aware language models and their practical application in culturally diverse contexts.</p>
Yong Cao, Yova Kementchedjhieva, Ruixiang Cui, Antonia Karamolegkou, Li Zhou, Megan Dare, Lucia Donatelli, Daniel Hershcovich
Copyright (c) 2024 Association for Computational Linguistics
http://creativecommons.org/licenses/by/4.0
2024-02-03 2024-02-03 12
Benchmarking Large Language Models for News Summarization
https://transacl.org/index.php/tacl/article/view/5015
<p>Large language models (LLMs) have shown promise for automatic summarization, but the reasons behind their successes are poorly understood. By conducting a human evaluation of ten LLMs across different pretraining methods, prompts, and model scales, we make two important observations. First, we find that instruction tuning, not model size, is the key to LLMs' zero-shot summarization capability. Second, existing studies have been limited by low-quality references, leading to underestimates of human performance and lower few-shot and finetuning performance. To better evaluate LLMs, we perform human evaluation over high-quality summaries we collect from freelance writers. Despite major stylistic differences, such as the amount of paraphrasing, we find that LLM summaries are judged to be on par with human-written summaries.</p>
Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori Hashimoto
Copyright (c) 2024 Association for Computational Linguistics
https://creativecommons.org/licenses/by/4.0
2024-02-03 2024-02-03 12
Addressing the Binning Problem in Calibration Assessment through Scalar Annotations
https://transacl.org/index.php/tacl/article/view/5555
<p>Computational linguistics models commonly target the prediction of discrete (categorical) labels. When assessing how well calibrated these model predictions are, popular evaluation schemes require practitioners to manually determine a binning scheme: grouping labels into bins to approximate the true label posterior. The problem is that these metrics are sensitive to binning decisions. We consider two solutions to the binning problem that apply at the stage of data annotation: collecting either distributed (redundant) labels or direct scalar value assignments.</p><p>In this paper, we show that although both approaches address the binning problem by evaluating instance-level calibration, direct scalar assignment is significantly more cost-effective. We provide theoretical analysis and empirical evidence to support our proposal that dataset creators adopt scalar annotation protocols to enable a higher-quality assessment of model calibration.</p>
Zhengping Jiang, Anqi Liu, Benjamin Van Durme
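The binning sensitivity described in this abstract can be seen in a small synthetic sketch: the same predictions yield different expected calibration error (ECE) values under different bin counts, whereas scalar annotations allow an instance-level comparison with no binning hyperparameter at all. The snippet below is an illustrative example on made-up data, not the authors' evaluation code.

```python
# Illustrative sketch: ECE depends on the chosen binning scheme, while an instance-level
# comparison against scalar annotations needs no bins. Synthetic data only.
import numpy as np

def ece(confidences, correct, n_bins):
    """Expected calibration error under an equal-width binning scheme."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total, n = 0.0, len(confidences)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            total += mask.sum() / n * gap
    return total

def instance_level_error(confidences, scalar_labels):
    """With scalar annotations, each predicted probability is compared directly
    against its scalar target; no binning decision is required."""
    return float(np.mean(np.abs(confidences - scalar_labels)))

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 1000)
correct = (rng.random(1000) < conf).astype(float)             # hypothetical 0/1 outcomes
scalar = np.clip(conf + rng.normal(0, 0.05, 1000), 0.0, 1.0)  # hypothetical scalar labels

print(ece(conf, correct, n_bins=5), ece(conf, correct, n_bins=20))  # value shifts with bin count
print(instance_level_error(conf, scalar))                           # no binning hyperparameter
```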
Copyright (c) 2024 Association for Computational Linguistics
http://creativecommons.org/licenses/by/4.0
2024-02-19 2024-02-19 12
Lost in the Middle: How Language Models Use Long Contexts
https://transacl.org/index.php/tacl/article/view/5757
<p>While recent language models have the ability to take long contexts as input, relatively little is known about how well they <em>use</em> longer contexts. We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts: multi-document question answering and key-value retrieval. We find that performance can degrade significantly when changing the position of relevant information, indicating that current language models do not robustly make use of information in long input contexts. In particular, we observe that performance is often highest when relevant information occurs at the beginning or end of the input context, and degrades significantly when models must access relevant information in the middle of long contexts, even for explicitly long-context models. Our analysis provides a better understanding of how language models use their input context and provides new evaluation protocols for future long-context language models.</p>
Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang
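As a rough illustration of the key-value retrieval setup mentioned in the abstract, the sketch below builds a synthetic JSON context of random key-value pairs and varies the position of the queried key; the commented-out `generate` call is a hypothetical stand-in for whatever model is being probed, not the authors' released code.

```python
# Illustrative sketch of a key-value retrieval probe: place the queried key at different
# positions in a long context and check whether the model recovers its value.
import json
import uuid

def build_kv_prompt(n_pairs: int, query_position: int):
    """Build a long JSON context of random key-value pairs and a query for one of them."""
    pairs = [(str(uuid.uuid4()), str(uuid.uuid4())) for _ in range(n_pairs)]
    query_key, query_value = pairs[query_position]
    context = json.dumps(dict(pairs), indent=1)
    prompt = (
        "Extract the value corresponding to the specified key from the JSON object below.\n\n"
        f"{context}\n\n"
        f'Key: "{query_key}"\nCorresponding value:'
    )
    return prompt, query_value

# Sweep the position of the relevant key from the beginning to the end of the context.
for pos in (0, 37, 74):                 # first, middle, and last of 75 pairs
    prompt, gold = build_kv_prompt(n_pairs=75, query_position=pos)
    # answer = generate(prompt)         # hypothetical call to the model under evaluation
    # print(pos, gold in answer)        # accuracy as a function of position
```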
Copyright (c) 2024 Association for Computational Linguistics
http://creativecommons.org/licenses/by/4.0
2024-02-19 2024-02-19 12
mGPT: Few-shot Learners Go Multilingual
https://transacl.org/index.php/tacl/article/view/5497
<p>This paper introduces mGPT, a multilingual variant of GPT-3, pretrained on 61 languages from 25 linguistically diverse language families using Wikipedia and the C4 corpus. We detail the design and pretraining procedure. The models undergo intrinsic and extrinsic evaluation: language modeling in all languages, downstream evaluation on cross-lingual NLU datasets and benchmarks in 33 languages, and world knowledge probing in 23 languages. The in-context learning abilities are on par with contemporaneous language models while covering a larger number of languages, including underrepresented and low-resource languages of the Commonwealth of Independent States and the small peoples of Russia. The source code and the language models are publicly available under the MIT license.</p>
Oleh Shliazhko, Alena Fenogenova, Maria Tikhonova, Anastasia Kozlova, Vladislav Mikhailov, Tatiana Shavrina
Copyright (c) 2024 Association for Computational Linguistics
http://creativecommons.org/licenses/by/4.0
2024-02-03 2024-02-03 12
Metric-Free Learning Network with Dual Relations Propagation for Few-Shot Aspect Category Sentiment Analysis
https://transacl.org/index.php/tacl/article/view/5707
<p>Few-shot Aspect Category Sentiment Analysis (ACSA) is a crucial task for aspect-based sentiment analysis, which aims to detect sentiment polarity for a given aspect category in a sentence with limited data. However, existing few-shot learning methods focus on distance metrics between the query and support sets to classify queries, heavily relying on aspect distributions in the embedding space. Thus, they suffer from overlapping distributions of aspect embeddings caused by irrelevant sentiment noise among sentences with multiple sentiment aspects, leading to misclassifications. To solve these issues, we propose a metric-free method for few-shot ACSA, which models the associated relations among the aspects of support and query sentences by Dual Relations Propagation (DRP), addressing the adverse effect of overlapping distributions. Specifically, DRP uses the dual relations (similarity and diversity) among the aspects of support and query sentences to explore intra-cluster commonality and inter-cluster uniqueness, alleviating sentiment noise and enhancing aspect features. In addition, the dual relations are transformed from support-query to class-query relations to promote query inference by learning class knowledge. Experiments show that we achieve convincing performance on few-shot ACSA, with an average improvement of 2.93% in accuracy and 2.10% in F1 score in the 3-way 1-shot setting.</p>
Shiman Zhao, Yutao Xie, Wei Chen, Tengjiao Wang, Jiahui Yao, Jiabin Zheng
Copyright (c) 2024 Association for Computational Linguistics
http://creativecommons.org/licenses/by/4.0
2024-02-03 2024-02-03 12
Red Teaming Language Model Detectors with Language Models
https://transacl.org/index.php/tacl/article/view/5565
<p>The prevalence and strong capability of large language models (LLMs) present significant safety and ethical risks if exploited by malicious users. To prevent the potentially deceptive usage of LLMs, recent works have proposed algorithms to detect LLM-generated text and protect LLMs. In this paper, we investigate the robustness and reliability of these LLM detectors under adversarial attacks. We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation. In both strategies, we leverage an auxiliary LLM to generate the word replacements or the instructional prompt. Unlike previous works, we consider a challenging setting where the auxiliary LLM can also be protected by a detector. Experiments reveal that our attacks effectively compromise the performance of all detectors in the study with plausible generations, underscoring the urgent need to improve the robustness of LLM-generated text detection systems. Code is available at <a href="https://github.com/shizhouxing/LLM-Detector-Robustness">https://github.com/shizhouxing/LLM-Detector-Robustness</a>.</p>
Zhouxing Shi, Yihan Wang, Fan Yin, Xiangning Chen, Kai-Wei Chang, Cho-Jui Hsieh
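The first attack strategy in this abstract (contextual synonym replacement) can be sketched as a simple greedy loop: an auxiliary LLM proposes candidate substitutions, and a replacement is kept only if the detector's "machine-generated" score drops. The `propose_synonyms` and `detector_score` callables below are hypothetical stand-ins, not the released implementation.

```python
# Illustrative sketch of a greedy word-substitution attack against an LLM-text detector.
from typing import Callable, List

def synonym_attack(text: str,
                   propose_synonyms: Callable[[str, str], List[str]],
                   detector_score: Callable[[str], float],
                   max_edits: int = 10) -> str:
    """Replace words with context-aware synonyms while the detector score decreases."""
    words = text.split()
    best_text, best_score = text, detector_score(text)
    edits = 0
    for i, word in enumerate(words):
        if edits >= max_edits:
            break
        for candidate in propose_synonyms(word, text):    # auxiliary LLM supplies candidates
            trial = " ".join(words[:i] + [candidate] + words[i + 1:])
            score = detector_score(trial)
            if score < best_score:                         # lower = judged less "machine-like"
                best_text, best_score, words[i] = trial, score, candidate
                edits += 1
                break
    return best_text
```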
Copyright (c) 2024 Association for Computational Linguistics
http://creativecommons.org/licenses/by/4.0
2024-02-19 2024-02-19 12
Exploring Human-Like Translation Strategy with Large Language Models
https://transacl.org/index.php/tacl/article/view/5857
<p>Large language models (LLMs) have demonstrated impressive capabilities in general scenarios, exhibiting a level of aptitude that approaches, and in some aspects even surpasses, human-level intelligence. Among their numerous skills, the translation abilities of LLMs have received considerable attention. Compared to typical machine translation that focuses solely on source-to-target mapping, LLM-based translation can potentially mimic the human translation process, which may take preparatory steps to ensure high-quality translation. This work explores this possibility by proposing the MAPS framework, which stands for Multi-Aspect Prompting and Selection. Specifically, we enable LLMs to first analyze the given source sentence and induce three aspects of translation-related knowledge (keywords, topics, and relevant demonstrations) to guide the final translation process. Moreover, we employ a selection mechanism based on quality estimation to filter out noisy and unhelpful knowledge. Both automatic (3 LLMs × 11 directions × 2 automatic metrics) and human evaluations (a preference study and MQM) demonstrate the effectiveness of MAPS. Further analysis shows that by mimicking the human translation process, MAPS reduces various translation errors such as hallucination, ambiguity, mistranslation, awkward style, untranslated text, and omission. Source code is available at <a href="https://github.com/zwhe99/MAPS-mt">https://github.com/zwhe99/MAPS-mt</a>.</p>
Zhiwei He, Tian Liang, Wenxiang Jiao, Zhuosheng Zhang, Yujiu Yang, Rui Wang, Zhaopeng Tu, Shuming Shi, Xing Wang
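A minimal sketch of the prompting-and-selection idea described in the abstract: elicit keywords, topics, and demonstrations from the source sentence, produce one candidate translation per knowledge aspect, and keep the candidate preferred by a quality-estimation scorer. The `llm` and `qe_score` callables and the prompt wording are hypothetical, not the released MAPS implementation.

```python
# Illustrative sketch of multi-aspect prompting followed by quality-estimation-based selection.
def maps_translate(source: str, src_lang: str, tgt_lang: str, llm, qe_score) -> str:
    aspects = {
        "keywords": f"Extract keywords from the {src_lang} sentence and translate them into {tgt_lang}:\n{source}",
        "topics": f"Describe the topic of this {src_lang} sentence:\n{source}",
        "demos": f"Write a {src_lang}-{tgt_lang} translation pair similar in style to:\n{source}",
    }
    knowledge = {name: llm(prompt) for name, prompt in aspects.items()}

    candidates = [llm(f"Translate into {tgt_lang}:\n{source}")]   # knowledge-free baseline
    for name, hint in knowledge.items():
        candidates.append(
            llm(f"Given the {name}:\n{hint}\nTranslate into {tgt_lang}:\n{source}")
        )
    # Selection step: quality estimation filters out noisy or unhelpful knowledge.
    return max(candidates, key=lambda cand: qe_score(source, cand))
```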
Copyright (c) 2024 Association for Computational Linguistics
http://creativecommons.org/licenses/by/4.0
2024-03-19 2024-03-19 12
Unifying Structured Data as Graph for Data-to-Text Pre-Training
https://transacl.org/index.php/tacl/article/view/5009
<p>Data-to-text (D2T) generation aims to transform structured data into natural language text. Data-to-text pre-training has proved to be powerful in enhancing D2T generation and yields impressive performance. However, previous pre-training methods either oversimplified structured data into a sequence without considering input structures or designed training objectives tailored for a specific data structure (e.g., table or knowledge graph). In this paper, we unify different types of structured data (i.e., tables, key-value data, and knowledge graphs) into the graph format and cast different data-to-text generation tasks as graph-to-text generation. To effectively exploit the structural information of the input graph, we propose a structure-enhanced pre-training method for D2T generation by designing a structure-enhanced Transformer. Concretely, we devise a position matrix for the Transformer, encoding relative positional information of connected nodes in the input graph. In addition, we propose a new attention matrix to incorporate graph structures into the original Transformer by taking the available explicit connectivity structure into account. Extensive experiments on six benchmark datasets show the effectiveness of our model. Our source code is available at <a href="https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/unid2t">https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/unid2t</a>.</p>
Shujie Li, Liang Li, Ruiying Geng, Min Yang, Binhua Li, Guanghu Yuan, Wanwei He, Shao Yuan, Can Ma, Fei Huang, Yongbin Li
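The two structural signals mentioned in the abstract (a relative position matrix over connected nodes and an attention matrix derived from explicit connectivity) can be illustrated with a small NumPy sketch under simplifying assumptions: distances are shortest-path hops and the attention mask only admits direct connections plus self-loops. This is not the released code.

```python
# Illustrative sketch: build a relative position matrix and a connectivity-based attention
# mask for a small undirected graph, as inputs to a structure-aware Transformer.
from collections import deque
import numpy as np

def graph_matrices(n_nodes: int, edges: list[tuple[int, int]], max_dist: int = 8):
    adj = [[] for _ in range(n_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Relative position matrix: BFS shortest-path distance, clipped to max_dist.
    position = np.full((n_nodes, n_nodes), max_dist, dtype=np.int64)
    for start in range(n_nodes):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            position[start, v] = min(d, max_dist)

    # Attention mask: attend only along explicit connections (plus self-loops).
    mask = position <= 1
    return position, mask

# Example: a tiny star-shaped knowledge graph with node 0 connected to nodes 1-3.
pos, mask = graph_matrices(4, [(0, 1), (0, 2), (0, 3)])
```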
Copyright (c) 2024 Association for Computational Linguistics
https://creativecommons.org/licenses/by/4.0
2024-03-19 2024-03-19 12
AmbiFC: Fact-Checking Ambiguous Claims with Evidence
https://transacl.org/index.php/tacl/article/view/5523
<p>Automated fact-checking systems verify claims against evidence to predict their veracity. In real-world scenarios, the retrieved evidence may not unambiguously support or refute the claim and may yield conflicting but valid interpretations. Existing fact-checking datasets assume that the models developed with them predict a single veracity label for each claim, thus discouraging the handling of such ambiguity. To address this issue, we present AmbiFC, a fact-checking dataset with 10k claims derived from real-world information needs. It contains fine-grained evidence annotations of 50k passages from 5k Wikipedia pages. We analyze the disagreements arising from ambiguity when comparing claims against evidence in AmbiFC, observing a strong correlation of annotator disagreement with linguistic phenomena such as underspecification and probabilistic reasoning. We develop models for predicting veracity that handle this ambiguity via soft labels, and find that a pipeline that learns the label distribution for sentence-level evidence selection and veracity prediction yields the best performance. We compare models trained on different subsets of AmbiFC and show that models trained on the ambiguous instances perform better when faced with the identified linguistic phenomena.</p>
Max Glockner, Ieva Staliūnaitė, James Thorne, Gisela Vallejo, Andreas Vlachos, Iryna Gurevych
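A minimal sketch of what "predicting veracity via soft labels" can look like in practice: instead of a single gold label, the training target is the distribution of annotator judgments, and the model is fit with a KL-divergence loss. The three-way label set and tensor shapes below are assumptions for illustration, not the dataset authors' pipeline.

```python
# Illustrative sketch of soft-label veracity training with a KL-divergence objective.
import torch
import torch.nn.functional as F

def soft_label_loss(logits: torch.Tensor, annotator_counts: torch.Tensor) -> torch.Tensor:
    """logits: (batch, 3) over an assumed {supporting, neutral, refuting} label set;
    annotator_counts: (batch, 3) raw counts of annotator votes per class."""
    target = annotator_counts / annotator_counts.sum(dim=-1, keepdim=True)  # soft labels
    log_probs = F.log_softmax(logits, dim=-1)
    return F.kl_div(log_probs, target, reduction="batchmean")

# Example: one claim with unanimous annotators and one with genuine disagreement.
logits = torch.randn(2, 3)
counts = torch.tensor([[5.0, 0.0, 0.0], [2.0, 1.0, 2.0]])
loss = soft_label_loss(logits, counts)
```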
Copyright (c) 2024 Association for Computational Linguistics
http://creativecommons.org/licenses/by/4.0
2024-01-12 2024-01-12 12
Text Attribute Control via Closed-Loop Disentanglement
https://transacl.org/index.php/tacl/article/view/5733
<p>Changing an attribute of a text without changing its content usually requires first disentangling the text into separate attribute and content representations. At inference time, the representation of one attribute is then set to a different value, with the expectation that the corresponding attribute of the text changes accordingly. The usual way to achieve disentanglement is to add constraints on the latent space of an encoder-decoder architecture, including adversarial-based constraints and mutual-information-based constraints. However, previous semi-supervised approaches to attribute change are usually not sufficient to guarantee both successful attribute change and content preservation. In this paper, we propose a novel approach that achieves robust control of attributes while enhancing content preservation. In this approach, we use a semi-supervised contrastive learning method to encourage the disentanglement of attributes in latent spaces. Unlike previous works, we re-disentangle the reconstructed sentence and compare the re-disentangled latent space with the original latent space, which forms a closed-loop disentanglement process. This also helps content preservation. In addition, the contrastive learning method can replace mutual-information minimization and adversarial training in the disentanglement process, which reduces the computation cost. We conducted experiments on three text datasets: the Yelp Service review dataset, the Amazon Product review dataset, and the GoEmotions dataset. The experimental results show the effectiveness of our model.</p>
Lei Sha, Thomas Lukasiewicz
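A minimal sketch of the closed-loop consistency idea described above, assuming hypothetical `encoder`/`decoder` modules that map between inputs and (content, attribute) latents; in practice the reconstruction must stay differentiable (e.g., via soft token embeddings), which this sketch glosses over. It is not the authors' implementation.

```python
# Sketch of a closed-loop disentanglement objective (assumed interfaces, not the paper's code).
import torch
import torch.nn.functional as F

def closed_loop_loss(encoder, decoder, inputs: torch.Tensor) -> torch.Tensor:
    """encoder(x) -> (content, attribute) latents; decoder(content, attribute) -> reconstruction.
    The reconstruction is re-encoded and its latents are pulled toward the original ones."""
    content, attribute = encoder(inputs)             # first disentanglement
    reconstruction = decoder(content, attribute)     # reconstruct the sentence (kept differentiable)
    content2, attribute2 = encoder(reconstruction)   # re-disentangle the reconstruction
    # Closed-loop consistency: the re-disentangled latents should match the original latents.
    return F.mse_loss(content2, content.detach()) + F.mse_loss(attribute2, attribute.detach())
```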
Copyright (c) 2024 Association for Computational Linguistics
http://creativecommons.org/licenses/by/4.0
2024-03-19 2024-03-19 12
An Energy-based Model for Word-level AutoCompletion in Computer-aided Translation
https://transacl.org/index.php/tacl/article/view/5361
<p>Word-level AutoCompletion (WLAC) is a rewarding yet challenging task in computer-aided translation. Existing work addresses this task with a classification model based on a neural network that maps the hidden vector of the input context to its corresponding label (i.e., the candidate target word is treated as a label). Since the context hidden vector does not itself take the label into account and is projected to the label through a linear classifier, the model cannot sufficiently leverage valuable information from the source sentence, as verified in our experiments, which eventually hinders its overall performance. To alleviate this issue, this work proposes an energy-based model for WLAC, which enables the context hidden vector to capture crucial information from the source sentence. Unfortunately, training and inference suffer from efficiency and effectiveness challenges, so we employ three simple yet effective strategies to put our model into practice. Experiments on four standard benchmarks demonstrate that our reranking-based approach achieves substantial improvements (about 6.07%) over the previous state-of-the-art model. Further analyses show that each strategy of our approach contributes to the final performance.</p>
Cheng Yang, Guoping Huang, Mo Yu, Zhirui Zhang, Siheng Li, Mingming Yang, Shuming Shi, Yujiu Yang, Lemao Liu
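One way to read the "reranking-based approach" mentioned in the abstract is as a two-stage pipeline: a cheap baseline classifier proposes top-k candidate target words, and a more expressive energy function that jointly encodes the source sentence, translation context, typed prefix, and candidate rescores that small set. The sketch below uses hypothetical `classifier_topk` and `energy` callables and is not the paper's implementation.

```python
# Illustrative sketch of word-level autocompletion via energy-based reranking of candidates.
from typing import Callable, List

def wlac_rerank(source: str,
                context: str,
                typed_prefix: str,
                classifier_topk: Callable[[str, str, str, int], List[str]],
                energy: Callable[[str, str, str, str], float],
                k: int = 10) -> str:
    """Propose k candidates with a cheap classifier, then pick the lowest-energy one."""
    candidates = classifier_topk(source, context, typed_prefix, k)   # inexpensive proposals
    # The energy model is evaluated only on the small candidate set, keeping inference tractable.
    return min(candidates, key=lambda word: energy(source, context, typed_prefix, word))
```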
Copyright (c) 2024 Association for Computational Linguistics
http://creativecommons.org/licenses/by/4.0
2024-02-19 2024-02-19 12