MALAVIKA SURESH m.suresh@rgu.ac.uk
Research Student
Towards improving open-box hallucination detection in large language models (LLMs).
Suresh, Malavika; Aljundi, Rahaf; Nkisi-Orji, Ikechukwu; Wiratunga, Nirmalie
Authors
Rahaf Aljundi
Dr Ikechukwu Nkisi-Orji i.nkisi-orji@rgu.ac.uk
Chancellor's Fellow
Professor Nirmalie Wiratunga n.wiratunga@rgu.ac.uk
Associate Dean for Research
Contributors
Dr Kyle Martin k.martin3@rgu.ac.uk
Editor
PEDRAM SALIMI p.salimi@rgu.ac.uk
Editor
Mr Vihanga Wijayasekara v.wijayasekara@rgu.ac.uk
Editor
Abstract
The increasing availability of Large Language Models (LLMs) through both proprietary and open-source releases has drastically increased their adoption across applications, making them commonplace in day-to-day life. Yet detecting and mitigating hallucinations in these models remains an open challenge. This work considers the problem of open-box hallucination detection, i.e., detecting hallucinations when there is full access to the generation process. Recent work has shown that simple binary probes constructed on the model activation space can act as reliable hallucination detectors. This work extends probing-based detection methods by considering the activation space at multiple layers, components and token positions during generation. Experiments are conducted across two LLMs and three open-domain fact recall datasets. The results indicate that hallucinations can be detected at various layers as well as token positions during the generation process. This indicates the potential for saving compute costs through early detection as well as for improving detection performance by designing more sophisticated probing methods.
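As a rough illustration of what a binary probe on activations at several layers might look like, the sketch below trains one logistic-regression probe per layer and compares their held-out accuracy. It is not the authors' implementation: the activations are synthetic random vectors rather than real LLM hidden states, and the names `train_probe`, `synthetic_activations` and `probe_layers`, together with the `signals` parameter that controls how separable each simulated layer is, are hypothetical choices made for this example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(X, y, lr=0.1, epochs=200):
    """Fit a logistic-regression probe with plain batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, X, y):
    """Fraction of examples whose predicted label matches the true label."""
    correct = 0
    for x, label in zip(X, y):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == (label == 1))
    return correct / len(X)


def synthetic_activations(n, dim, signal, rng):
    """ASSUMPTION: stand-in for hidden states. Label 1 ("hallucination")
    shifts the first coordinate by +signal, label 0 by -signal."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += signal if label else -signal
        X.append(x)
        y.append(label)
    return X, y


def probe_layers(signals, n_train=200, n_test=200, dim=8, seed=0):
    """Train one probe per simulated layer; return held-out accuracy per layer."""
    rng = random.Random(seed)
    results = {}
    for layer, signal in enumerate(signals):
        X_tr, y_tr = synthetic_activations(n_train, dim, signal, rng)
        X_te, y_te = synthetic_activations(n_test, dim, signal, rng)
        w, b = train_probe(X_tr, y_tr)
        results[layer] = probe_accuracy(w, b, X_te, y_te)
    return results


if __name__ == "__main__":
    for layer, acc in probe_layers([0.0, 1.0, 3.0]).items():
        print(f"simulated layer {layer}: probe accuracy {acc:.2f}")
```

In a real setting the synthetic vectors would be replaced by activations recorded from a chosen layer, component and token position during generation, with labels indicating whether the generated answer was a hallucination. Comparing probe accuracy across layers is then one way to see how early in the forward pass the signal becomes detectable.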
Citation
SURESH, M., ALJUNDI, R., NKISI-ORJI, I. and WIRATUNGA, N. 2024. Towards improving open-box hallucination detection in large language models (LLMs). In Martin, K., Salimi, P. and Wijayasekara, V. (eds.) 2024. SICSA REALLM workshop 2024: proceedings of the SICSA (Scottish Informatics and Computer Science Alliance) REALLM (Reasoning, explanation and applications of large language models) workshop (SICSA REALLM workshop 2024), 17 October 2024, Aberdeen, UK. CEUR workshop proceedings, 3822. Aachen: CEUR-WS [online], pages 1-10. Available from: https://ceur-ws.org/Vol-3822/paper1.pdf
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | 2024 SICSA (Scottish Informatics and Computer Science Alliance) REALLM (Reasoning, explanation and applications of large language models) workshop (SICSA REALLM workshop 2024) |
Start Date | Oct 17, 2024 |
Acceptance Date | Oct 1, 2024 |
Online Publication Date | Oct 17, 2024 |
Publication Date | Nov 4, 2024 |
Deposit Date | Dec 5, 2024 |
Publicly Available Date | Dec 5, 2024 |
Publisher | CEUR-WS |
Peer Reviewed | Peer Reviewed |
Pages | 1-10 |
Series Title | CEUR-workshop proceedings |
Series Number | 3822 |
Series ISSN | 1613-0073 |
Keywords | Large language models (LLMs); Hallucination; Model probing |
Public URL | https://rgu-repository.worktribe.com/output/2613504 |
Publisher URL | https://ceur-ws.org/Vol-3822/ |
Files
SURESH 2024 Towards improving open-box (VOR)
(311 Kb)
PDF
Publisher Licence URL
https://creativecommons.org/licenses/by/4.0/
Copyright Statement
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).