🔬 Research Summary by Furkan Gursoy and Ioannis A. Kakadiaris.
Furkan Gursoy is a Postdoctoral Fellow at the Computational Biomedicine Lab, University of Houston, TX, USA.
Ioannis A. Kakadiaris is a Hugh Roy and Lillie Cranz Cullen University Professor of Computer Science, Electrical & Computer Engineering, and Biomedical Engineering, and the Director of the Computational Biomedicine Lab at the University of Houston, Houston, TX, USA.
[Original paper by Furkan Gursoy and Ioannis A. Kakadiaris]
Overview: AI systems have been increasingly employed to make or assist critical decisions that impact human lives. Minimizing the risks and harms of an AI system requires a careful assessment of its socio-technical context, interpretability and explainability, transparency, nondiscrimination, robustness, and privacy and security. This paper proposes a System Accountability Benchmark, a criteria framework for auditing machine learning-based automated decision systems, and System Cards that visually present the outcomes of such audits.
Introduction
Automated decision systems are employed to inform many critical decisions impacting human lives, such as criminal risk prediction, public education decisions, and credit risk assessment. Such systems are increasingly complex and often conceal the social factors underlying their design, and the earlier enthusiasm surrounding them is diminishing because those systems are not inherently free from bias, opaqueness, lack of explainability, and maleficence. These issues must be accounted for to ensure the systems meet legal and societal expectations.
This work proposes a unifying and comprehensive framework, System Accountability Benchmark and System Cards, based on an extensive analysis of the literature on socio-technical context, interpretability and explainability, transparency, nondiscrimination, robustness, and privacy and security concerning AI systems.
Key Insights
AI Accountability Framework
The System Accountability Benchmark covers the whole lifecycle of an AI system, spanning its different aspects and components. It produces System Cards for machine learning-based automated decision systems (Figure 1).
Figure 1. Automated Decision Systems Meet System Accountability Benchmark to Generate System Cards. (The leftmost image: Alina Constantin / Better Images of AI / Handmade A.I / CC-BY 4.0).
System Accountability Benchmark
The System Accountability Benchmark provides a comprehensive criteria framework to evaluate compliance with accountability standards. It reflects state-of-the-art developments in accountable systems, serves as a checklist for future algorithm audits, and paves the way for sequential lines of future research.
The first version of the System Accountability Benchmark consists of 50 criteria organized in a four-by-four matrix. Its rows correspond to aspects of the software; its columns correspond to categories of accountability criteria.
The four aspects of the benchmark are: Data, Model, Code, and System.
- Data refers to the aspects related to the properties and characteristics of the data that the model learns from or works with.
- Model refers to the properties and behavior of the machine learning-based decision model(s) that the system utilizes.
- Code refers to the system’s source code, including the code surrounding the decision model(s) in its development and use.
- System refers to the software and its socio-technical context as a whole.
The categories of the criteria are: Documentation, Assessment, Mitigation, and Assurance.
- Documentation covers the criteria related to the managerial, operational, and informational record-keeping for the development and use of the software.
- Assessment covers the criteria that involve estimating the abilities or qualities of the system and its components.
- Mitigation covers the criteria that can be utilized to prevent or mitigate potential shortcomings detected in the assessment.
- Assurance covers the criteria that aim to provide guarantees regarding the software and its use.
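The four aspects and four categories above define the cells of the benchmark's matrix. As a minimal sketch of how such a framework might be represented programmatically, the snippet below places criteria into matrix cells; the criterion names used here are illustrative placeholders, not the 50 actual criteria defined in the paper.

```python
from dataclasses import dataclass

# The four aspects (rows) and four categories (columns) named in the benchmark.
ASPECTS = ("Data", "Model", "Code", "System")
CATEGORIES = ("Documentation", "Assessment", "Mitigation", "Assurance")

@dataclass
class Criterion:
    """A single benchmark criterion, assigned to one cell of the 4x4 matrix."""
    name: str
    aspect: str
    category: str

    def __post_init__(self) -> None:
        # Guard against criteria placed outside the framework's matrix.
        assert self.aspect in ASPECTS, f"unknown aspect: {self.aspect}"
        assert self.category in CATEGORIES, f"unknown category: {self.category}"

# Hypothetical example entries; the real criteria come from the paper.
criteria = [
    Criterion("Datasheet provided", "Data", "Documentation"),
    Criterion("Bias audit performed", "Model", "Assessment"),
]

# Group criterion names into the 16 matrix cells for reporting.
matrix = {(a, c): [] for a in ASPECTS for c in CATEGORIES}
for crit in criteria:
    matrix[(crit.aspect, crit.category)].append(crit.name)
```

An auditor's checklist or a System Card renderer could then iterate over the 16 cells of `matrix` to report per-cell coverage.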
System Cards
A System Card is the overall outcome of the evaluation for a specific machine learning-based automated decision system. It is visualized as four concentric circles, where each circle corresponds to a column in the System Accountability Benchmark framework. Each criterion is denoted by an arc within its respective circle. The color of each arc indicates the evaluation outcome for the respective criterion: the worst outcome is shown in red, the best outcome in green, and intermediate outcomes in diverging colors between red and green. Figure 2 shows a sample System Card.
Figure 2. A Sample System Card.
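The red-to-green arc coloring described above can be sketched as a simple mapping from a normalized evaluation outcome to a hex color. This is only an assumed linear interpolation for illustration; the actual palette used in the paper's System Cards may differ.

```python
def outcome_color(score: float) -> str:
    """Map a normalized evaluation outcome in [0, 1] to a hex color on a
    red-to-green diverging scale (0 = worst outcome, 1 = best outcome)."""
    score = max(0.0, min(1.0, score))       # clamp to the valid range
    red = int(round(255 * (1 - score)))     # worst outcomes pull toward red
    green = int(round(255 * score))         # best outcomes pull toward green
    return f"#{red:02x}{green:02x}00"
```

A renderer could call `outcome_color` once per criterion to color its arc, e.g. `outcome_color(0.0)` yields pure red and `outcome_color(1.0)` pure green.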
Between the lines:
Considering the cumulative nature of science and the infancy of the algorithmic accountability field, the System Accountability Benchmark is not a final solution to every problem at once. However, as a tangible, comprehensive, and relatively specific framework, the System Accountability Benchmark helps move current practices toward real-world impact, ignites further discussion on the topic, and motivates efforts to hold automated decision systems socially accountable in practice.
The System Accountability Benchmark paves the way for three primary lines of future work, which are also sequentially linked.
- Establishing specific and full-fledged guidelines or automated tools to evaluate the proposed criteria.
- Developing mature auditing procedures that elaborate on what to audit, when to audit, who audits, and how audits must be conducted.
- Generating System Cards for real-world ML-based automated decision systems.