Research summary: A Case for Humans-in-the-Loop: Decisions in the Presence of Erroneous Algorithmic Scores

Summary contributed by Abhishek Gupta (@atg_abhishek), founder of the Montreal AI Ethics Institute.

*Authors of full paper & link at the bottom

Mini-summary: The paper highlights important considerations in the design of automated systems used in “mission-critical” contexts, that is, where such systems make decisions with significant impacts on human lives. The authors use the case study of a risk-assessment scoring system that helps streamline the screening process for child welfare services cases. The paper considers the phenomena of algorithmic aversion and automation bias, keeping in mind omission and commission errors and the ability of humans to recognize such errors and act accordingly. It details how designing systems in which humans have the autonomy to consider additional information and override the system’s recommendations leads to demonstrably better results. It also points out that this is more feasible where humans have training and experience in making decisions without an automated system.

Full summary:

The paper highlights the risks of full automation and the importance of designing decision pipelines that give humans autonomy, avoiding the so-called token human problem in human-in-the-loop systems. For example, studies of the impact that automated decision aids have had on incarceration rates and on decisions taken by judges have observed that the magnitude of the impact is much smaller than expected. This has been attributed to judges varying widely in how closely they adhere to the outputs of these decision aids.

Two phenomena are identified: algorithmic aversion and automation bias. In algorithmic aversion, users don’t trust the system enough because of prior erroneous results. In automation bias, users trust the system more than they should, overlooking the cases where it is wrong.

Two kinds of errors also arise in the use of automated systems: omission errors and commission errors. Omission errors occur when humans fail to detect errors made by the system because the system does not flag them. Commission errors occur when humans act on erroneous recommendations by the system, failing to incorporate contradictory or external information.
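To make the distinction concrete, here is a minimal sketch under a simplified binary reading of these definitions. The `Case` fields and the `classify_error` helper are hypothetical names introduced for illustration and are not from the paper. The sketch assumes the system either flags a case or not, the human either acts or not, and the correct decision is known in hindsight.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    # All three fields are illustrative assumptions, not the paper's data model.
    system_flagged: bool  # did the system recommend taking action?
    human_acted: bool     # did the human decision-maker take action?
    action_needed: bool   # correct decision, assumed known in hindsight


def classify_error(case: Case) -> Optional[str]:
    """Label a case as 'omission', 'commission', or None.

    Only errors where the human followed a wrong recommendation are labeled.
    A human who overrides the system, rightly or wrongly, gets None here.
    """
    system_wrong = case.system_flagged != case.action_needed
    human_followed = case.human_acted == case.system_flagged
    if not (system_wrong and human_followed):
        return None
    # Followed a wrong flag -> commission; followed a wrong non-flag -> omission.
    return "commission" if case.system_flagged else "omission"
```

For instance, a case the system did not flag, on which the human took no action even though action was needed, is labeled an omission error. If the human had acted anyway, the sketch returns `None` because the human overrode the system.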

One of the case studies the paper considers is a child welfare screening system whose aim is to help streamline incoming caseloads and determine whether a case warrants a deeper look. A notable observation was that the decisions of the humans assisted by the system were better calibrated with the assessed score than with the score the system actually showed them. In particular, even when the scores shown by the system were low, the call workers drew on their experience and external information to screen in these cases rather than ignoring them as the system recommended. Essentially, they were able to overcome omission errors by the system, which showcases how giving users autonomy leads to better results than relying on complete automation. The authors’ study showed higher precision in the post-deployment period, meaning that a larger share of the screened-in referrals went on to receive services. This demonstrates that the combination of humans and automated systems, where humans have autonomy, led to better results than using humans alone or relying fully on automated systems.
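The notion of precision used here can be sketched in a few lines: of the referrals that were screened in, what fraction went on to receive services? The function name, the data layout, and the toy numbers below are assumptions made for illustration. They are not the paper's code or its results.

```python
from typing import Iterable, Tuple


def screen_in_precision(referrals: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of screened-in referrals that later received services.

    Each referral is a (screened_in, received_services) pair.
    """
    outcomes = [served for screened_in, served in referrals if screened_in]
    if not outcomes:
        raise ValueError("no screened-in referrals to compute precision over")
    return sum(outcomes) / len(outcomes)


# Made-up toy data, NOT figures from the paper.
pre_deployment = [(True, True), (True, False), (True, False), (False, False)]
post_deployment = [(True, True), (True, True), (True, False), (False, False)]

print(screen_in_precision(pre_deployment))
print(screen_in_precision(post_deployment))
```

In this toy data, one of three screened-in referrals received services before deployment and two of three after, so the second value is higher. That is the direction of change the summary describes, not its magnitude.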

One of the important points highlighted in the paper is that when inputs related to previous child welfare history were being miscalculated, the degree of autonomy granted to the workers gave them access to the correct information in the data systems. They could take that information into consideration and so make better-informed decisions. However, this was only possible because the workers had been trained extensively in handling these screen-ins before the study and thus had experience to draw on. They had the essential skills to parse and interpret the raw data. By contrast, in catastrophic automation failures such as the Air France flight a few years ago, when the autopilot disengaged and handed control back to the pilots, the decisions made were poor because the human pilots had never trained without the assistance of the automated system. This limited not only their ability to make decisions independently of the automated system but also their ability to judge when the system might be making mistakes and so avoid omission and commission errors.

The authors conclude that such automated systems should be designed so that humans are trained not only to acknowledge that the system can make errors but also to fall back on “manual” methods, so that they are not paralyzed into inaction.

Original paper by Maria De-Arteaga, Riccardo Fogliato, and Alexandra Chouldechova: https://arxiv.org/abs/2002.08035