Robustness and Usefulness in AI Explanation Methods

🔬 Research summary by Connor Wright, our Partnerships Manager.

Overview: Given the hype surrounding explainability methods and their promise to offer insight into black-box algorithms, many have opted in to using them. Yet this work aims to show that their use is not always appropriate, as the methods at hand have some clear shortcomings.

Introduction

Explainability methods within the machine learning (ML) space have received much attention, given their potential to provide insight into black-box algorithms. Explanation methods can influence a consumer’s trust in a model and can speed up auditors’ work. However, some practitioners rely too heavily on these tools, undermining their value. This work focuses specifically on post-hoc explanation methods, which are applied to models after they have been trained, and reflects on three of them: LIME, SmoothGrad and SHAP. After explaining what these methods entail, the paper evaluates what they offer and where they fall short.

Key Insights

Local Interpretable Model-agnostic Explanations (LIME)

LIME aims to explain the outcomes of any classifier, no matter its “complexity” or “linearity” (p. 2). It does this by fitting an interpretable model (such as a sparse linear model) that mimics the original model’s behaviour in the neighbourhood of the prediction being explained. To build that neighbourhood, LIME perturbs the input of interest (generates noisy variants of it) and labels each variant with the original model’s prediction. The surrogate is chosen from a family of candidate interpretable models, balancing how faithfully it reproduces those labels against its own complexity, for example by limiting how many features it may use.
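The perturb, label, weight and fit loop described above can be sketched in plain Python. This is a minimal, simplified sketch for tabular inputs, not the `lime` package’s API: the names `lime_explain` and `solve_linear_system`, the Gaussian perturbation scale `noise`, and the exponential proximity kernel controlled by `kernel_width` are illustrative assumptions. It also skips LIME’s step of capping the surrogate’s complexity by selecting only a few features.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update a and b together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution from the last row upwards.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def lime_explain(predict, x, n_samples=500, noise=0.5, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around instance x.

    Returns [intercept, w_1, ..., w_d]; each w_j approximates how strongly
    feature j drives `predict` in the neighbourhood of x.
    Perturbation scale and kernel form are assumptions of this sketch.
    """
    rng = random.Random(seed)
    d = len(x)
    # 1. Perturb: sample noisy variants of x.
    samples = [[xi + rng.gauss(0.0, noise) for xi in x] for _ in range(n_samples)]
    # 2. Label: query the black-box model on every variant.
    labels = [predict(z) for z in samples]
    # 3. Weight: variants closer to x count more (exponential kernel).
    weights = [
        math.exp(-sum((zi - xi) ** 2 for zi, xi in zip(z, x)) / kernel_width ** 2)
        for z in samples
    ]
    # 4. Fit: weighted least squares via normal equations (X^T W X) b = X^T W y.
    design = [[1.0] + z for z in samples]  # leading 1.0 column = intercept
    p = d + 1
    xtwx = [[sum(w * row[i] * row[j] for w, row in zip(weights, design))
             for j in range(p)] for i in range(p)]
    xtwy = [sum(w * row[i] * y for w, row, y in zip(weights, design, labels))
            for i in range(p)]
    return solve_linear_system(xtwx, xtwy)


# Usage: explain a hypothetical logistic "black box" at one input point.
def black_box(z):
    return 1.0 / (1.0 + math.exp(-(2.0 * z[0] - 3.0 * z[1])))


print(lime_explain(black_box, [0.2, 0.1]))  # [intercept, w_1, w_2]
```

For a black box that is genuinely linear, the surrogate recovers its coefficients; for a nonlinear model such as the logistic function above, the weights only describe its behaviour near the chosen point, which is what makes the explanation local.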

SmoothGrad

SmoothGrad is an extension of any gradient-based explainability method, rather than being one itself. Mainly used with convolutional neural networks for image recognition, it was introduced to address a known problem with gradient methods: the heatmaps they produce tend to be visually noisy. It generates multiple noisy versions of an image through perturbation, computes the gradient-based explanation for each, and averages those explanations together. This smooths over sharp fluctuations in the gradient, yielding cleaner heatmaps.
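The noise-then-average idea can be sketched without any deep learning framework. This minimal sketch assumes you already have a function `grad_fn` returning the gradient of the model’s output with respect to its input (in practice that would come from an autodiff library); `smoothgrad`, `sigma` and `n_samples` are illustrative names and defaults, not values taken from the paper. The toy model is hypothetical: its gradient, 1 + 5cos(50x), oscillates sharply, mimicking the noisy gradients SmoothGrad was designed to tame.

```python
import math
import random


def smoothgrad(grad_fn, x, n_samples=50, sigma=0.15, seed=0):
    """Average the gradient over noisy copies of x (SmoothGrad-style).

    grad_fn(x) must return the gradient of the model's output score with
    respect to the input x, as a list the same length as x.
    """
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        # Perturb every input feature with Gaussian noise of scale sigma.
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        g = grad_fn(noisy)
        total = [t + gi for t, gi in zip(total, g)]
    # The averaged gradient is the smoothed sensitivity map.
    return [t / n_samples for t in total]


# Hypothetical toy model f(x) = sum(x_i + 0.1 * sin(50 * x_i)).
# Its gradient, 1 + 5 * cos(50 * x_i), swings between -4 and 6.
def toy_grad(x):
    return [1.0 + 5.0 * math.cos(50.0 * xi) for xi in x]


point = [0.3, 0.7]
print("raw gradient:", toy_grad(point))
print("smoothed:    ", smoothgrad(toy_grad, point, n_samples=2000))
```

Because the noise spans many oscillations of the cosine term, the averaged gradient lands close to the underlying trend of 1 per feature, whereas the raw gradient at a single point can fall anywhere between -4 and 6. A larger `sigma` smooths more aggressively but also blurs genuine local detail; in practice it is usually set relative to the range of the input values.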

Shapley Additive Explanations (SHAP)

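SHAP uses cooperative game theory to explain model predictions. It treats each input feature as a player in a game and computes its Shapley value: that feature’s share of the difference between the model’s prediction for an example and a baseline (average) prediction.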
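To make the Shapley idea concrete, here is a brute-force sketch that enumerates every coalition of features. It is not the `shap` package’s API, which estimates these values far more efficiently and typically averages over a background dataset; the function name `shapley_values` and the choice to represent an "absent" feature by a single baseline value are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    A feature absent from a coalition is replaced by its baseline value,
    which is one (assumed) way to define the game's value function.
    Cost grows as 2**d, so this is only practical for a few features.
    """
    d = len(x)

    def value(coalition):
        # Features in the coalition keep their real value; others use the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(d)]
        return predict(z)

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight |S|! (d - |S| - 1)! / d! for coalitions S without i.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for combo in combinations(others, size):
                coalition = set(combo)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


# Usage: a hypothetical model with one additive term and one interaction.
def model(z):
    return 3.0 * z[0] + 2.0 * z[1] * z[2]


print(shapley_values(model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))  # approximately [3.0, 6.0, 6.0]
```

Feature 0 contributes its additive term of 3 on its own, while the interaction term (2 × 2 × 3 = 12) is split equally between features 1 and 2. The values sum to the prediction minus the baseline prediction (15 − 0), the "additive" property SHAP is named after.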

The three methods fall under two broader methodologies: perturbation-based (adding noise to the inputs and assessing how the model’s output changes) and gradient-based (using the gradients of the model’s output with respect to its inputs to explain the outcome). The paper evaluates them against three criteria (p. 4):

Criterion 1: Local explanations

Local explanation captures the ability of an explanation method to explain the model’s prediction for an individual example from the data. It matters because most of the adverse effects individuals feel come from specific predictions the model makes about them. All three explanation methods offer this feature.

Criterion 2: Visualize

Methods that produce accessible and quickly comprehensible visualisations are hugely valuable. SmoothGrad creates heatmaps in conjunction with the gradient methods used in convolutional neural networks. SHAP comes with a Python package that generates force plots showing how each feature pushes an individual prediction up or down. These explanations can also be applied across the whole dataset, displaying how a particular feature affects the model’s predictions overall.

LIME, however, does not offer this. Only the attributes of the interpretable surrogate model it generates can be visualised, which gives no direct insight into the larger model at hand.

Criterion 3: Model-agnostic

Model-agnostic methods can be applied to any model, whatever its internal structure or the format of its input data (whether video, text or image). SHAP and LIME both meet this criterion, since they only need the model’s predictions, from which they build their own interpretable explanations. SmoothGrad, however, requires the underlying model to be differentiable, because it relies on gradients.

Weaknesses of explainability methods

Galinkin points out that the core question explanation methods ought to answer is: to whom, and under what circumstances, is the model interpretable? Two types of factors affect a model’s interpretability:

Human factors

Reaching for fairness in ML often involves a practitioner envisioning an ideal world, which, given its subjectivity, can work to the detriment of specific populations. If practitioners instead apply general notions of fairness, everyone could end up worse off. For example, a practitioner’s ideal world might be one in which a facial recognition system does not account for people’s skin colour, resulting in recurrent misidentifications.

The robustness of explainer methods

Perturbation methods cope less well than gradient methods with small changes to the input data: a slight change to an input can produce a markedly different explanation, meaning they are less robust. Nevertheless, practitioners are sometimes drawn in by the pleasing visualisations these methods produce, overlooking other forms of representation that would offer more insight into the model. In this sense, practitioners can be guilty of over-relying on and over-using these explanation methods when other approaches would be more informative. Sometimes, a simple linear model fits just as well.
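One way to make "less robust" concrete is to measure how far an explanation moves when the input moves only slightly. The sketch below uses a ratio-based, local Lipschitz-style estimate as an illustrative choice; it is an assumption, not the metric used in the paper, and `explanation_sensitivity`, `radius` and the two toy explainers are hypothetical. Any explainer that maps an input to a list of attributions (such as the sketches for LIME or SmoothGrad) could be plugged in.

```python
import math
import random


def explanation_sensitivity(explain, x, radius=0.1, n_samples=100, seed=0):
    """Estimate how sharply an explanation changes near x.

    Returns the largest ratio ||explain(x) - explain(x')|| / ||x - x'|| over
    random points x' in a box of half-width `radius` around x.
    Larger values indicate a less robust explanation.
    """
    rng = random.Random(seed)
    base = explain(x)
    worst = 0.0
    for _ in range(n_samples):
        x_prime = [xi + rng.uniform(-radius, radius) for xi in x]
        dx = math.dist(x, x_prime)
        if dx == 0.0:
            continue  # skip the (unlikely) case of an identical point
        de = math.dist(base, explain(x_prime))
        worst = max(worst, de / dx)
    return worst


# Hypothetical explainers for a one-feature input.
def stable_explainer(z):
    return [2.0]  # same attribution everywhere, like a linear model's weight


def jittery_explainer(z):
    return [5.0 * math.cos(50.0 * z[0])]  # attribution swings with tiny input changes


print(explanation_sensitivity(stable_explainer, [0.0]))  # 0.0
print(explanation_sensitivity(jittery_explainer, [0.0]))
```

A robust explainer scores near zero, while one whose attributions flip under tiny input changes scores high. This also illustrates the closing point of the paragraph: a linear model’s explanation (its fixed weights) is perfectly stable by this measure.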

Between the lines

I think this paper does well to highlight the extent to which practitioners can be blinded by the hype surrounding different methodologies. Even when a method offers pleasing representations, employing it may not be necessary. Instead of focusing on what an explainer method can offer, perhaps we should first compare it against other methods to see what work it is actually doing for our models. In this sense, there is a difference between being useful and being appropriate: while explainer methods may be useful, they may not always be the appropriate choice.