
AGREE: a feature attribution aggregation framework to address explainer disagreements with alignment metrics.

Pirie, Craig; Wiratunga, Nirmalie; Wijekoon, Anjana; Moreno-Garcia, Carlos Francisco

Contributors

Lukas Malburg
Editor

Deepika Verma
Editor

Abstract

As deep learning models become increasingly complex, practitioners rely more on post hoc explanation methods to understand the decisions of black-box learners. However, there is growing concern about the reliability of feature attribution explanations, which are key to explaining machine learning models. Studies have shown that some explainable artificial intelligence (XAI) methods are highly sensitive to noise and that explanations can vary significantly between techniques. As a result, practitioners often employ multiple methods to reach a consensus on the reliability of their models, which can lead to disagreements among explainers. Although some literature has formalised and reviewed this problem, few solutions have been proposed. In this paper, we propose a novel case-based approach to evaluating disagreement among explainers and advance AGREE, an explainer aggregation approach to resolving the disagreement problem based on explanation weights. Our approach addresses both local and global explainer disagreement by utilising information from the neighbourhood spaces of feature attribution vectors. We evaluate our approach against simpler feature overlap metrics by weighting the latent space of a k-NN predictor using consensus feature importance and observing the performance degradation. For local explanations in particular, our method captures a more precise estimate of disagreement than the baseline methods and is robust against high dimensionality. This can lead to increased trust in ML models, which is essential for their successful adoption in real-world applications.
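The evaluation idea described in the abstract, aggregating several explainers' feature attributions into a consensus importance vector and using it to weight the latent space of a k-NN predictor, can be sketched as below. This is a minimal illustration, not the paper's AGREE algorithm: the simple mean-of-normalised-attributions consensus rule and the helper names `consensus_weights` and `knn_accuracy` are assumptions made for this sketch.

```python
import numpy as np

def consensus_weights(attributions):
    """Aggregate attribution vectors into one consensus importance vector.

    attributions: array of shape (n_explainers, n_features), one feature
    attribution vector per explainer. Each explainer's absolute attributions
    are normalised to sum to 1, then averaged across explainers.
    NOTE: a simple mean-consensus rule, assumed for illustration only.
    """
    mags = np.abs(attributions)
    norm = mags / (mags.sum(axis=1, keepdims=True) + 1e-12)
    return norm.mean(axis=0)

def knn_accuracy(X_train, y_train, X_test, y_test, w, k=3):
    """Accuracy of a k-NN classifier under feature-weighted Euclidean distance.

    Weighting the distance metric by w mimics weighting the predictor's
    latent space with consensus feature importance; comparing accuracy under
    different w reveals how much performance degrades.
    """
    correct = 0
    for x, y in zip(X_test, y_test):
        d = np.sqrt((((X_train - x) ** 2) * w).sum(axis=1))
        neighbours = np.argsort(d)[:k]          # k nearest by weighted distance
        pred = np.bincount(y_train[neighbours]).argmax()  # majority vote
        correct += pred == y
    return correct / len(y_test)

# Tiny synthetic demo: feature 0 is informative, feature 1 is noise.
X_train = np.array([[0.0, 3.0], [0.1, -4.0], [1.0, 2.0], [1.1, -3.0]])
y_train = np.array([0, 0, 1, 1])
w = consensus_weights(np.array([[0.8, 0.1, 0.1], [0.6, 0.2, 0.2]]))
```

Stronger degradation when down-weighting a feature indicates the explainers' consensus ranked that feature as genuinely important, which is what makes this a usable alignment signal.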

Citation

PIRIE, C., WIRATUNGA, N., WIJEKOON, A. and MORENO-GARCIA, C.F. 2023. AGREE: a feature attribution aggregation framework to address explainer disagreements with alignment metrics. In Malburg, L. and Verma, D. (eds.) Workshop proceedings of the 31st International conference on case-based reasoning (ICCBR-WS 2023), 17 July 2023, Aberdeen, UK. CEUR workshop proceedings, 3438. Aachen: CEUR-WS [online], pages 184-199. Available from: https://ceur-ws.org/Vol-3438/paper_14.pdf

Conference Name Workshops of the 31st International conference on case-based reasoning (ICCBR-WS 2023)
Conference Location Aberdeen, UK
Start Date Jul 17, 2023
Acceptance Date Jun 14, 2023
Online Publication Date Jul 17, 2023
Publication Date Aug 7, 2023
Deposit Date Aug 21, 2023
Publicly Available Date Aug 21, 2023
Publisher CEUR Workshop Proceedings
Pages 184-199
Series Title CEUR workshop proceedings
Series Number 3438
Series ISSN 1613-0073
Keywords XAI; Case alignment; AGREE; Disagreement problem; Feature attribution
Public URL https://rgu-repository.worktribe.com/output/2009671
Publisher URL https://ceur-ws.org/Vol-3438/paper_14.pdf
