🔬 Research Summary by Xuenan Cao and Roozbeh Yousefzadeh. Xuenan Cao, Ph.D., is a scholar of media, communication, and contemporary China. Roozbeh Yousefzadeh, Ph.D., works on the mathematics of deep learning.
[Original paper by Roozbeh Yousefzadeh, Xuenan Cao]
Overview: AI models can extrapolate in socially significant ways, outside the range of data they have been trained on, often without our knowledge. This paper draws on several of our studies, arguing that models should report when and in which ways they extrapolate. Our studies suggest that policymakers should incorporate this requirement in AI regulations. In the absence of such regulations, civil society and individuals may consider using the legal system to inquire about extrapolations performed by automated systems.
Introduction
Extrapolation describes how machine learning models can be clueless when they encounter unfamiliar samples (i.e., samples outside the “convex hull” of their training set, as we explain below). Consider a young and well-educated immigrant applying for a car loan. An automated system evaluates the loan request. Extrapolation might happen for this applicant because she is an immigrant, relatively young, and very well educated: the model has not seen any profile of this kind. The model might not make a sound choice because the information about this applicant is not similar to the samples on which the model has been trained. It would be reasonable to have a loan officer review the model’s decision. Similarly, in a clinical setting, when a nurse encounters a patient with unfamiliar features, the case may be elevated to a physician. But in the use of AI systems, this common-sense approach is somehow lost, and so far there is no requirement for AI models to report when they are clueless.
We report that AI models frequently extrapolate outside the range of data they are familiar with, without notifying users and stakeholders. We provide mathematical algorithms that can calculate the extent of extrapolation. Knowing whether a model has extrapolated (or not) is a fundamental piece of information that helps explain AI models to the people affected by them.
Key Insights
The Right to AI Explainability
AI and ML models have earned the title of “black box” because we lack methods to explain how they make decisions. A consensus has formed in the research community and among policymakers about the right to a reasonable explanation for people affected by decisions made by Artificial Intelligence models. In 2021, the European Commission’s draft AI Act laid out a highly sophisticated product safety framework to rank and regulate the risks of AI-driven systems. Yet the act devotes only one page to transparency issues. It hovers above the key concern of the right to explanation without landing precisely on it. Instead of dwelling on the negatives, this paper offers ways to clear the roadblocks to promoting AI transparency.
Clueless AI: Extrapolation Explained
AI, broadly defined, is a set of mathematical methods that automate the learning process. Using certain algorithms, a model learns from a training set (the data on which the model is trained), then uses what it has learned to make predictions in the world at large. In the case of the loan application, a model may learn from the mortgage payment behavior of many clients and make reasonably accurate predictions for new applicants. If an applicant’s profile falls outside the range of information in the training set, the model has to extrapolate to generate an output.
In mathematics, there are well-defined algorithms for verifying whether a model is extrapolating and, if so, in which directions and dimensions. A training set, however small or large, forms a convex hull. Think of it as a dome. Any new sample will fall either within that convex hull or outside it. When a new data point is outside the convex hull of the corresponding training set, the model needs to extrapolate to process it. Conversely, when a new data point is within the convex hull of its training set, the model interpolates. Whether a model has extrapolated is a piece of information lying at the heart of the right to explanation. In automated decision-making, if a model is making vital decisions or predictions about a patient whose features are not similar to any sample it has seen before, the model should be required to report this and, possibly, elevate the case to human experts.
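Formally, a new sample lies inside the convex hull of a training set exactly when it can be written as a convex combination of the training samples, that is, a weighted average with non-negative weights that sum to one, and checking this amounts to solving a small linear feasibility program. The sketch below is a minimal illustration of that check, not the algorithm from the original paper; it uses SciPy’s linear-programming solver, and the loan features and numbers are hypothetical.

```python
# Minimal sketch: test whether a new sample lies inside the convex hull of a
# training set by solving a feasibility linear program. This illustrates the
# general idea, not the algorithm from the original paper.
import numpy as np
from scipy.optimize import linprog

def is_inside_convex_hull(X_train, query):
    """Return True if `query` is a convex combination of the rows of `X_train`,
    i.e., the model would be interpolating for this sample."""
    n_samples = X_train.shape[0]
    # Look for weights a >= 0 with sum(a) = 1 and X_train.T @ a = query.
    A_eq = np.vstack([X_train.T, np.ones((1, n_samples))])
    b_eq = np.concatenate([query, [1.0]])
    res = linprog(
        c=np.zeros(n_samples),   # pure feasibility problem: nothing to minimize
        A_eq=A_eq,
        b_eq=b_eq,
        bounds=(0, None),        # weights must be non-negative
        method="highs",
    )
    return res.success           # infeasible => outside the hull => extrapolation

# Hypothetical loan data: columns are (age, years of education, years in the country).
X_train = np.array([
    [45, 12, 45],
    [38, 16, 38],
    [52, 14, 30],
    [29, 12, 29],
])
applicant = np.array([27, 22, 3])  # young, highly educated, recent immigrant

if not is_inside_convex_hull(X_train, applicant):
    print("The model would extrapolate for this applicant; flag for human review.")
```

For this hypothetical applicant, whose years of education exceed anything in the training data, the feasibility problem has no solution, so the check reports extrapolation and the case can be flagged for human review.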
Why We Should Not Trivialize Extrapolation
In the research community, there have been discussions about whether machine learning models interpolate or extrapolate. Some researchers assume that models predominantly interpolate between their training samples (think of it as being under the dome, or within the convex hull) and do not often extrapolate. Yet in all the datasets we have investigated, models extrapolate frequently enough for the issue to be taken seriously.
On the other hand, a group of researchers recently reported that in datasets with more than 100 features, learning always amounts to extrapolation. Some scholars have argued that since extrapolation happens so frequently, it must be trivial. Two issues arise. First, this claim leaves out the many cases where datasets have fewer than 100 features. Second, and more importantly, it can be used to dismiss extrapolation as a useful concept for unpacking the “black box.”
If we continue to believe that extrapolation is trivial, people affected by it may not be entitled to know about this fundamental issue.
Between the lines
There has been resistance in the research community to accepting that extrapolation does happen, and frequently enough to matter. Another narrative pushes the debate to the other extreme: all decisions are extrapolation, which renders the concept less useful for AI transparency. This paper argues against both claims.
First, it shows that extrapolation does happen, with social consequences; it also shows that extrapolation is not always the case, and is therefore worth reporting when it does occur. When AI models are clueless in making a decision, they should report that. Then, instead of the extrapolation staying hidden, a human agent can review the decision and judge whether extrapolation has led to desirable or undesirable outcomes. When stakeholders inquire about AI decisions, they should also have the right to know whether extrapolation has happened.
This paper offers practical clauses that would be helpful to include in AI regulations, such as the National AI Initiative Act in the US and the AI Act by the European Commission. We suggest that the following clause be appended to Article 13 of the AI Act: “Any decisions made by an AI system should be accompanied with the information on whether the model has extrapolated. If extrapolation is performed, AI systems should also report the attributes of extrapolation.”