Pre-trained Language Models Return Distinguishable Probability Distributions to Unfaithfully Hallucinated Texts

November 2024

Abstract

In this work, we show that pre-trained language models return distinguishable generation probability and uncertainty distributions to unfaithfully hallucinated texts, regardless of their size and structure. Examining 24 models on 6 data sets, we find that 88-98% of cases return statistically significantly distinguishable generation probability and uncertainty distributions. Using this general phenomenon, we showcase a hallucination-reducing training algorithm. Our algorithm outperforms the baselines, achieving higher faithfulness metrics while maintaining sound general text quality measures.

Type

Conference paper

Publication

Findings of the Association for Computational Linguistics: EMNLP 2024

Taehun Cha

Ph.D. Candidate

Donghun Lee

Assistant Professor