Towards improving open-box hallucination detection in large language models (LLMs).

Suresh, Malavika; Aljundi, Rahaf; Nkisi-Orji, Ikechukwu; Wiratunga, Nirmalie

Abstract

Due to the increasing availability of Large Language Models (LLMs) through both proprietary and open-source releases, the adoption of LLMs across applications has increased drastically, making them commonplace in day-to-day life. Yet the problem of detecting and mitigating hallucinations in these models remains an open challenge. This work considers the problem of open-box hallucination detection, i.e., detecting hallucinations when there is full access to the generation process. Recent work has shown that simple binary probes constructed on the model activation space can act as reliable hallucination detectors. This work extends probing-based detection methods by considering the activation space at multiple layers, components and token positions during generation. Experiments are conducted across two LLMs and three open-domain fact-recall datasets. The results indicate that hallucinations can be detected at various layers and token positions during the generation process, suggesting the potential for saving compute costs through early detection, as well as for improving detection performance by designing more sophisticated probing methods.
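The probing approach the abstract describes can be illustrated with a short sketch: extract hidden activations at a chosen layer and token position, then fit a binary classifier on them. This is a minimal illustration under assumed tooling (HuggingFace Transformers and scikit-learn); the model name, example texts, labels, and the (layer, position) choice are hypothetical placeholders, not the paper's actual experimental setup.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from sklearn.linear_model import LogisticRegression

    MODEL_NAME = "gpt2"  # placeholder; the paper's two LLMs are not fixed here

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
    model.eval()

    def activation(text: str, layer: int, position: int) -> torch.Tensor:
        """Hidden state of `text` at the given layer and token position."""
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs)
        # out.hidden_states is a tuple of (num_layers + 1) tensors of shape
        # (batch, seq_len, hidden_dim); index a single layer and token.
        return out.hidden_states[layer][0, position]

    # Toy data: model generations with hallucination labels assumed given
    # (1 = hallucinated, 0 = factual); real probes use many labelled examples.
    texts = ["The capital of France is Paris.", "The capital of France is Rome."]
    labels = [0, 1]

    layer, position = 6, -1  # e.g. a middle layer, last generated token
    X = torch.stack([activation(t, layer, position) for t in texts]).numpy()

    # A simple binary probe on the activation space; sweeping over
    # (layer, position) pairs mirrors, in spirit, the paper's comparison
    # across layers and token positions.
    probe = LogisticRegression(max_iter=1000).fit(X, labels)
    print("training accuracy:", probe.score(X, labels))

Sweeping the same probe over all layers and positions, and comparing detection accuracy, is what would reveal whether hallucinations are detectable early in the generation process.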

Citation

SURESH, M., ALJUNDI, R., NKISI-ORJI, I. and WIRATUNGA, N. 2024. Towards improving open-box hallucination detection in large language models (LLMs). In Martin, K., Salimi, P. and Wijayasekara, V. (eds.) 2024. SICSA REALLM workshop 2024: proceedings of the SICSA (Scottish Informatics and Computer Science Alliance) REALLM (Reasoning, explanation and applications of large language models) workshop (SICSA REALLM workshop 2024), 17 October 2024, Aberdeen, UK. CEUR workshop proceedings, 3822. Aachen: CEUR-WS [online], pages 1-10. Available from: https://ceur-ws.org/Vol-3822/paper1.pdf

Presentation Conference Type: Conference Paper (published)
Conference Name: 2024 SICSA (Scottish Informatics and Computer Science Alliance) REALLM (Reasoning, explanation and applications of large language models) workshop (SICSA REALLM workshop 2024)
Start Date: Oct 17, 2024
Acceptance Date: Oct 1, 2024
Online Publication Date: Oct 17, 2024
Publication Date: Nov 4, 2024
Deposit Date: Dec 5, 2024
Publicly Available Date: Dec 5, 2024
Publisher: CEUR-WS
Peer Reviewed: Yes
Pages: 1-10
Series Title: CEUR workshop proceedings
Series Number: 3822
Series ISSN: 1613-0073
Keywords: Large language models (LLMs); Hallucination; Model probing
Public URL: https://rgu-repository.worktribe.com/output/2613504
Publisher URL: https://ceur-ws.org/Vol-3822/
