AGREE: a feature attribution aggregation framework to address explainer disagreements with alignment metrics.

Pirie, Craig; Wiratunga, Nirmalie; Wijekoon, Anjana; Moreno-Garcia, Carlos Francisco

Authors

Anjana Wijekoon



Contributors

Lukas Malburg
Editor

Deepika Verma
Editor

Abstract

As deep learning models become increasingly complex, practitioners are relying more on post hoc explanation methods to understand the decisions of black-box learners. However, there is growing concern about the reliability of feature attribution explanations, which are key to explaining machine learning models. Studies have shown that some explainable artificial intelligence (XAI) methods are highly sensitive to noise and that explanations can vary significantly between techniques. As a result, practitioners often employ multiple methods to reach a consensus on the reliability of their models, which can lead to disagreements among explainers. Although some literature has formalised and reviewed this problem, few solutions have been proposed. In this paper, we propose a novel case-based approach to evaluating disagreement among explainers and advance AGREE, an explainer aggregation approach to resolving the disagreement problem based on explanation weights. Our approach addresses the problem of both local and global explainer disagreement by utilising information from the neighbourhood spaces of feature attribution vectors. We evaluate our approach against simpler feature overlap metrics by weighting the latent space of a k-NN predictor using consensus feature importance and observing the performance degradation. For local explanations in particular, our method captures a more precise estimate of disagreement than the baseline methods and is robust against high dimensionality. This can lead to increased trust in ML models, which is essential for their successful adoption in real-world applications.
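
For readers unfamiliar with the "simpler feature overlap metrics" that the abstract names as baselines, the following is a minimal sketch of one common form of such a metric: the fraction of top-k features shared by two attribution vectors. It is an illustrative assumption, not the AGREE method and not necessarily the exact baseline used in the paper; the function names `top_k_features` and `feature_agreement` and the example vectors are hypothetical.

```python
import numpy as np


def top_k_features(attribution, k):
    """Return the indices of the k features with the largest absolute attribution."""
    attribution = np.asarray(attribution, dtype=float)
    # Sort by descending magnitude; a stable sort keeps ties in index order.
    order = np.argsort(-np.abs(attribution), kind="stable")
    return set(order[:k].tolist())


def feature_agreement(attr_a, attr_b, k):
    """Fraction of the top-k features that two explainers have in common (0.0 to 1.0)."""
    shared = top_k_features(attr_a, k) & top_k_features(attr_b, k)
    return len(shared) / k


# Hypothetical attribution vectors from two explainers for the same instance.
explainer_a = [0.50, -0.30, 0.05, 0.10]
explainer_b = [0.40, 0.02, -0.35, 0.15]

score = feature_agreement(explainer_a, explainer_b, k=2)
```

A score of 1.0 means both explainers rank the same k features highest, and 0.0 means their top-k sets are disjoint. A metric of this kind looks only at set overlap for a single instance, which is the sort of coarse estimate the abstract contrasts with neighbourhood-based alignment.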

Citation

PIRIE, C., WIRATUNGA, N., WIJEKOON, A. and MORENO-GARCIA, C.F. 2023. AGREE: a feature attribution aggregation framework to address explainer disagreements with alignment metrics. In Malburg, L. and Verma, D. (eds.) Workshop proceedings of the 31st International conference on case-based reasoning (ICCBR-WS 2023), 17 July 2023, Aberdeen, UK. CEUR workshop proceedings, 3438. Aachen: CEUR-WS [online], pages 184-199. Available from: https://ceur-ws.org/Vol-3438/paper_14.pdf

Presentation Conference Type: Conference Paper (published)
Conference Name: Workshops of the 31st International conference on case-based reasoning (ICCBR-WS 2023)
Start Date: Jul 17, 2023
Acceptance Date: Jun 14, 2023
Online Publication Date: Jul 17, 2023
Publication Date: Aug 7, 2023
Deposit Date: Aug 21, 2023
Publicly Available Date: Aug 21, 2023
Publisher: CEUR-WS
Peer Reviewed: Peer Reviewed
Pages: 184-199
Series Title: CEUR workshop proceedings
Series Number: 3438
Series ISSN: 1613-0073
Keywords: XAI; Case alignment; AGREE; Disagreement problem; Feature attribution
Public URL: https://rgu-repository.worktribe.com/output/2009671
Publisher URL: https://ceur-ws.org/Vol-3438/paper_14.pdf
