🔬 Research Summary by Hidde Fokkema, a PhD student under the supervision of Dr. Tim van Erven, working on the topic of mathematically formalizing Explainable AI.
[Original paper by Hidde Fokkema, Damien Garreau, and Tim van Erven]
Overview: Generating counterfactual explanations is a popular method of providing users with an explanation of a decision made by a machine learning model. However, in our paper, we show that providing such explanations to many users can adversely affect the model's predictive accuracy. We characterize where this drop in accuracy can happen and show the consequences of possible solutions.
Introduction
As machine learning models are increasingly deployed in areas such as finance and healthcare, the need for interpretable machine learning, or at least for explanations of individual predictions, has grown. One proposed method of providing explanations is the counterfactual explanation. This type of explanation offers the user recourse: a concrete change to their situation that would have led to a different decision.
The prototypical example for this method involves a credit loan applicant and a loan provider. The provider might have an automated system in place that rejects or accepts your application based on the information that you provide. If your application is denied, a counterfactual explanation would be: “If your income had been $2000 higher and your debt $5000 lower, your application would have been accepted.” In recent years, many different methods have been proposed to generate such explanations, taking into account criteria such as feasibility, actionability, and the effort required by the suggested action.
These kinds of explanations are desirable, but if you investigate the impact of providing them to many applicants, there is a considerable chance that the practice will hurt the predictive power of the automated system! On average, more people will default on their credit loans. This is harmful to the loan provider, but also to the applicants. In our paper, we determine when this harmful effect can appear and quantify its impact. We also show empirically that the increase in risk occurs in many different scenarios, and we discuss several ways to strategically protect against this phenomenon, along with their possible downsides.
How recourse can hurt predictive accuracy
Counterfactual Explanations change the data
In our loan example, the loan provider has a model that classifies applicants as creditworthy or uncreditworthy. This classifier divides the space of possible applicants into two regions: on one side of the decision boundary you are deemed creditworthy, and on the other side you are not. Whenever you are in the uncreditworthy region, a counterfactual explanation points you to the closest point just across this boundary. This means that if a large portion of the people in that region receive a counterfactual explanation, the distribution of applicants will change, as they start moving toward the boundary. However, the loan provider's model was constructed assuming the original distribution of applicants!
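To make the geometry concrete, here is a minimal sketch of a closest-point counterfactual for a linear credit model. The model, the two features, and all numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Assumed linear credit model: score = w . x + b, accept when score >= 0.
# Features (illustrative): [monthly income in $1000s, total debt in $1000s].
w = np.array([0.5, -0.3])
b = -20.0

def closest_counterfactual(x, w, b, eps=1e-3):
    """Project a rejected profile x onto the boundary w.x + b = 0 and step a
    tiny distance eps past it: the nearest accepted point in Euclidean distance."""
    score = w @ x + b
    return x - (score / (w @ w)) * w + eps * w / np.linalg.norm(w)

applicant = np.array([30.0, 40.0])        # score = -17, so rejected
suggestion = closest_counterfactual(applicant, w, b)
print("suggested profile:", suggestion)   # roughly [55, 25]
print("new score:", w @ suggestion + b)   # just above 0
```

If every rejected applicant follows such a suggestion, the probability mass that used to sit deep inside the rejected region piles up right next to the decision boundary, which is exactly the distribution shift described above.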
The impact on accuracy
The observation that providing recourse to many applicants changes the underlying distribution of applicants leads to the question of what the actual impact of this change is. To answer this question, we need to look at the decision boundary mentioned above. If a point lies close to the decision boundary, but on the uncreditworthy side, the model is actually quite uncertain about that applicant's creditworthiness: if the same applicant had supplied slightly different information, they would have been classified as creditworthy. In other words, the model will be wrong more often for points that lie close to the decision boundary than for points that are far away. Normally, this does not cause any problems, since the number of points close to the decision boundary is small compared to the number of points further away. But remember, the counterfactual explanations moved many points closer to the decision boundary! This effect essentially ensures that the model will be less accurate after many applicants receive a counterfactual explanation.
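A rough simulation can make this argument quantitative. The following sketch is our own illustration, not an experiment from the paper: repayment is assumed to follow a logistic probability of a one-dimensional credit score, the classifier accepts whenever that probability is at least 1/2, and recourse moves every rejected applicant to just above the threshold, where the model is maximally uncertain.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_repay(score):
    # Assumed true probability of repaying the loan as a function of a 1-D credit score.
    return 1.0 / (1.0 + np.exp(-score))

def accuracy(scores, rng):
    # The model accepts when the repayment probability is at least 1/2;
    # actual repayment is drawn from that same probability.
    accept = p_repay(scores) >= 0.5
    repay = rng.random(scores.shape) < p_repay(scores)
    return np.mean(accept == repay)

scores = rng.normal(0.0, 2.0, size=100_000)   # original population of applicants

# Recourse: every rejected applicant moves to just above the decision boundary (score 0).
scores_after = np.where(p_repay(scores) < 0.5, 1e-3, scores)

print("accuracy before recourse:", accuracy(scores, rng))        # roughly 0.79
print("accuracy after recourse: ", accuracy(scores_after, rng))  # roughly 0.65
```

The drop comes purely from where the applicants end up: the moved applicants genuinely have close to a 50/50 chance of repaying, so the model cannot do much better than a coin flip on them.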
There is another effect that potentially plays a role. If a loan applicant knows what needs to be done to get the loan, they could maliciously follow this advice without changing the underlying behavior that caused them to be deemed uncreditworthy. This is a well-known problem addressed in the field of strategic classification. It is interesting to note that the drop in accuracy we derive holds even when there is no malicious intent involved! Gaming of this kind only enlarges the observed drop in accuracy.
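To separate the two effects, the sketch below reuses the toy setup from the previous snippet (again an illustrative assumption) and compares honest recourse, where an applicant's repayment probability really improves to that of the new profile, with gaming, where the profile changes but the repayment probability stays what it was.

```python
import numpy as np

rng = np.random.default_rng(1)

def p_repay(score):
    # Same assumed repayment probability as in the previous sketch.
    return 1.0 / (1.0 + np.exp(-score))

scores = rng.normal(0.0, 2.0, size=100_000)
rejected = p_repay(scores) < 0.5
new_scores = np.where(rejected, 1e-3, scores)   # every rejected applicant moves just over the boundary
accept = p_repay(new_scores) >= 0.5             # everyone is now accepted

# Honest recourse: repayment follows the new profile. Gaming: it still follows the old one.
repay_honest = rng.random(scores.shape) < p_repay(new_scores)
repay_gaming = rng.random(scores.shape) < p_repay(scores)

print("accuracy under honest recourse:", np.mean(accept == repay_honest))
print("accuracy under gaming:         ", np.mean(accept == repay_gaming))
```

Both numbers fall below the pre-recourse accuracy, but gaming pushes it down further, matching the observation that dishonest behavior only enlarges the drop.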
Strategizing
After seeing these results, a loan provider might want to protect itself against this accuracy drop. The optimal choice, as it turns out, is to construct a model that makes it harder to get your application accepted in the first place. Then, providing recourse only allows the applicants that would have been accepted under the original model to move into the region where the loan is granted; for all other denied applicants, reaching the accepted region is too difficult or takes too much effort. If the loan provider can construct such a model, the accuracy will be the same as before, which is good for the loan provider. The unfortunate side effect, however, is that the whole process of obtaining a loan has become more difficult for the applicants.
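Here is a sketch of this defensive strategy in the same one-dimensional toy setting. Assuming applicants are only willing to improve their score by at most some budget delta (our assumption for illustration), the provider can shift its acceptance threshold up by exactly that budget. Only applicants who would already have been accepted by the original threshold can then still reach the new one, so the provider loses no accuracy; everyone else simply faces a stricter test.

```python
import numpy as np

rng = np.random.default_rng(2)

def p_repay(score):
    # Same assumed repayment probability as in the earlier sketches.
    return 1.0 / (1.0 + np.exp(-score))

scores = rng.normal(0.0, 2.0, size=100_000)
delta = 1.0                        # assumed maximum effort an applicant will spend on recourse
threshold = 0.0 + delta            # provider shifts the original threshold (0.0) up by delta

# Rejected applicants within reach of the new threshold move just across it; the rest give up.
can_reach = (scores < threshold) & (scores >= threshold - delta)
scores_after = np.where(can_reach, threshold + 1e-3, scores)

accept = scores_after >= threshold
repay = rng.random(scores.shape) < p_repay(scores_after)
print("accuracy with shifted threshold and recourse:", np.mean(accept == repay))

# Baseline for comparison: original threshold, no recourse at all.
accept0 = scores >= 0.0
repay0 = rng.random(scores.shape) < p_repay(scores)
print("baseline accuracy, no recourse:              ", np.mean(accept0 == repay0))

print("fraction of applicants forced to exert effort:", np.mean(can_reach))
```

In this toy example the shifted threshold fully avoids the accuracy drop (it even gains a little, because the applicants who move end up with a higher repayment probability), but the applicants who would previously have been accepted outright now have to spend effort to get the same loan.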
Conclusion
The example used throughout this summary illustrates the general result we prove in the paper. There are, of course, many ways to make this example more realistic, and such extensions are also covered by our results. These results show that providing counterfactual explanations to your users can have negative effects for both parties. This does not mean that counterfactual explanations should never be used, however. In situations where you would not need to act on the counterfactual explanation, it can still provide useful insight into how the model works.
Between the lines
A large amount of research is devoted to developing new counterfactual methods that are faster, better, and applicable to many different models. However, our results highlight a question that is rarely considered: when does a counterfactual explanation benefit both the user and the provider? When explanations are needed for a particular application, this question should be answered first, and the answer may well be that other forms of explanation are more desirable.