🔬 Research Summary by Lydia T. Liu, a postdoctoral researcher at Cornell University and incoming assistant professor of computer science at Princeton University.
[Original paper by Lydia T. Liu, Solon Barocas, Jon Kleinberg, and Karen Levy]
Overview: In many social impact domains, including education and healthcare, machine learning is commonly used to predict future outcomes, such as student success or disease risk. However, a significant challenge arises when attempting to act on these predictions effectively, as there are often numerous possible interventions for each individual. This paper demonstrates via a probabilistic model of prediction and intervention that pure outcome prediction rarely results in the most effective policy for taking action, even when combined with other measurements.
In today’s data-driven world, the extensive use of machine learning to predict important outcomes in people’s lives now parallels the ubiquity of weather forecasts. These algorithms offer a proverbial glimpse into the future, from predicting a student’s successful graduation to predicting a patient’s disease risk. Yet, beneath the surface of these predictions lies a complex challenge: how do we transform them into actionable, real-world interventions that improve the outcomes we truly care about?
In this paper, we provide theoretical evidence that there are inherent limitations in effectively translating outcome predictions into actionable interventions when the interventions required for each individual vary depending on their latent needs. For example, in an educational setting, different students may need different interventions based on their needs; in a medical setting, different patients may need different interventions. Using a simple mathematical model that considers these latent needs, actions, and data measurements, we show that merely predicting future outcomes is often not the best strategy for effective interventions. Our findings indicate that unless there’s only one clear-cut action that guarantees better outcomes for all individuals, outcome prediction doesn’t maximize the effectiveness of our interventions. This is true even when outcome predictions are combined with additional data. Instead, measurements of the hidden factors that can be acted upon significantly improve the utility of interventions.
Problem: An example from education
When predicting a student’s academic performance at the secondary level, the ultimate goal is to enhance their educational outcomes. However, merely predicting a student’s future academic performance doesn’t automatically lead to improvement unless specific actions are taken, like offering additional tutoring or financial support. The effectiveness of these actions depends on the student’s unique situation, whether they lack prerequisites or face time constraints due to multiple jobs. School officials must collect data or measurements, such as diagnostic tests, past grades, and income surveys, to understand the students’ latent needs. Collecting these measurements can be resource-intensive, demanding both time and labor. Consequently, school officials face these critical questions: what should they measure to best predict the student’s future academic performance, and what should they measure to best improve it? Furthermore, they must discern when these measurement requirements actually align so that they can focus scarce resources on measurements that bring the most benefits.
Our data-driven decision-making model comprises four key elements: latent states of individuals, measurements, outcomes, and actions to enact change in the latent states. An institutional decision maker, whom we refer to as the planner, makes measurements for each individual in a population and takes actions based on those measurements for each individual to influence their future outcome. This model can be formally represented as a graphical model, where each individual is treated as a random draw from this model. Latent states are random variables in this graphical model, and the outcome and measurements are functions of the latent states. Actions are costly to the planner, and each action modifies the value of some latent state.
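To make the four elements concrete, here is a minimal Python sketch of one toy instance of such a model. Everything specific in it is an illustrative assumption rather than something taken from the paper: the two latent states (`has_prerequisites`, `has_time`), the rule that the outcome requires both, the 50/50 sampling probabilities, the action costs, and all function names.

```python
import random
from dataclasses import dataclass, replace

# Hypothetical toy instance; all names and numbers are illustrative assumptions.
@dataclass(frozen=True)
class Student:
    has_prerequisites: int  # latent state, 0 or 1
    has_time: int           # latent state, 0 or 1

def outcome(s: Student) -> int:
    """Outcome as a function of the latent states (assumed: both needs must be met)."""
    return s.has_prerequisites & s.has_time

# Measurements are functions of the latent states.
MEASUREMENTS = {
    "diagnostic_test": lambda s: s.has_prerequisites,
    "jobs_survey": lambda s: s.has_time,
}

# Each action is costly and sets one latent state to 1: (field it changes, cost).
ACTIONS = {
    "tutoring": ("has_prerequisites", 0.1),
    "financial_aid": ("has_time", 0.1),
}

def apply_action(s: Student, action: str) -> Student:
    """Return a copy of the student with the targeted latent state set to 1."""
    field, _cost = ACTIONS[action]
    return replace(s, **{field: 1})

def sample_student(rng: random.Random) -> Student:
    """Each individual is a random draw from the model (assumed independent 50/50)."""
    return Student(int(rng.random() < 0.5), int(rng.random() < 0.5))

rng = random.Random(0)
population = [sample_student(rng) for _ in range(5)]
for s in population:
    print(s, "outcome:", outcome(s))
```

In this sketch the planner never sees a `Student` directly, only the values returned by the entries of `MEASUREMENTS`, and must pick an entry of `ACTIONS` on that basis.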
Prediction and Intervention Task
The prediction task is to use the measurement(s) collected to predict the outcome of an individual. A measurement has a high prediction value if the planner can use it to build a highly accurate outcome predictor. For example, if midterm GPA (measurement) is very predictive of successful graduation (outcome), then midterm GPA has a high prediction value.
On the other hand, the intervention task is to use the measurement(s) to choose an action that most improves the individual’s outcome. The higher the action value of a measurement, the more informative it is for taking actions to improve the outcome cost-effectively. Suppose that there are two possible actions for improving a student’s likelihood of graduation: financial aid to reduce time spent on part-time jobs or additional tutoring for academic prerequisites. In this case, midterm GPA might have a lower action value than, say, a diagnostic test or a survey on part-time jobs, which are measurements that can actually suggest a particular action to take.
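The two notions can be compared numerically on a small example. The sketch below is a toy under stated assumptions, not the paper's formal definitions: two independent binary latent states that are each 1 with probability 1/2, an outcome that requires both, actions that each cost 1/10 of a unit of outcome, at most one action per individual, prediction value taken as the accuracy of the best predictor, and action value taken as the expected net gain over doing nothing.

```python
from fractions import Fraction
from itertools import product

# Toy assumptions (not from the paper): states are (prereq, time), uniform over 4 states.
STATES = list(product([0, 1], repeat=2))
PROB = {s: Fraction(1, 4) for s in STATES}
COST = Fraction(1, 10)

def outcome(s):
    """Assumed outcome rule: success requires both latent needs to be met."""
    return s[0] & s[1]

# Each action: (how it changes the latent state, its cost).
ACTIONS = {
    "none": (lambda s: s, Fraction(0)),
    "tutoring": (lambda s: (1, s[1]), COST),
    "financial_aid": (lambda s: (s[0], 1), COST),
}

def groups(measure):
    """Partition the states by the value the measurement reports."""
    out = {}
    for s in STATES:
        out.setdefault(measure(s), []).append(s)
    return out

def prediction_value(measure):
    """Accuracy of the best outcome predictor that sees only the measurement."""
    total = Fraction(0)
    for states in groups(measure).values():
        p1 = sum(PROB[s] for s in states if outcome(s) == 1)
        p0 = sum(PROB[s] for s in states if outcome(s) == 0)
        total += max(p0, p1)
    return total

def action_value(measure):
    """Expected net gain in outcome when the best action is chosen per measured value."""
    baseline = sum(PROB[s] * outcome(s) for s in STATES)
    total = Fraction(0)
    for states in groups(measure).values():
        total += max(
            sum(PROB[s] * (outcome(act(s)) - cost) for s in states)
            for act, cost in ACTIONS.values()
        )
    return total - baseline

measurements = {"outcome_itself": outcome, "diagnostic_test": lambda s: s[0]}
for name, m in measurements.items():
    print(name, prediction_value(m), action_value(m))
```

Under these assumptions the script prints `outcome_itself 1 7/40` and `diagnostic_test 3/4 2/5`. Knowing the outcome predicts perfectly but leaves the planner guessing which action to take, while the diagnostic test predicts less accurately yet points to the right action more often.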
Prediction and Intervention can be misaligned
Imagine there were a single action, a sufficient action, that improved outcomes for all individuals who were not going to succeed otherwise. Then all the planner has to do is predict each individual’s outcome and perform the sufficient action. This is an overly idealistic assumption in most applications, yet this is the assumption we are making if we rely on outcome prediction to improve outcomes.
We found that for a large class of outcome models, where all the random variables take a value of either 0 or 1, the highest prediction-value measurement, that is, knowing the outcome itself, generally does not maximize action value. In fact, its action value can always be strictly improved when the outcome model does not have a sufficient action.
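The role of a sufficient action can be illustrated with two toy binary outcome models. This is a hedged sketch, not the paper's construction: it assumes two independent latent states that are each 1 with probability 1/2, actions costing 1/10 each, at most one action per individual, and action value defined as expected net gain over doing nothing. In `needs_either`, tutoring alone fixes everyone who would otherwise fail, so it acts as a sufficient action; in `needs_both`, no single action does.

```python
from fractions import Fraction
from itertools import product

# Toy assumptions (not from the paper): states are (prereq, time), each with probability 1/4.
STATES = list(product([0, 1], repeat=2))
PROB = Fraction(1, 4)
COST = Fraction(1, 10)
ACTIONS = [
    (lambda s: s, Fraction(0)),    # do nothing
    (lambda s: (1, s[1]), COST),   # tutoring sets prereq to 1
    (lambda s: (s[0], 1), COST),   # financial aid sets time to 1
]

def action_value(outcome, measure):
    """Expected net gain when the best single action is chosen per measured value."""
    groups = {}
    for s in STATES:
        groups.setdefault(measure(s), []).append(s)
    best = sum(
        max(sum(PROB * (outcome(act(s)) - cost) for s in g) for act, cost in ACTIONS)
        for g in groups.values()
    )
    return best - sum(PROB * outcome(s) for s in STATES)

def needs_both(s):
    """No sufficient action: success requires both latent needs to be met."""
    return s[0] & s[1]

def needs_either(s):
    """Sufficient action exists: meeting one need is enough to succeed."""
    return s[0] | s[1]

for outcome in (needs_both, needs_either):
    from_outcome = action_value(outcome, measure=outcome)       # planner sees only the outcome
    from_latents = action_value(outcome, measure=lambda s: s)   # planner sees both latent states
    print(outcome.__name__, from_outcome, from_latents)
```

With these numbers the script prints `needs_both 7/40 9/20` and `needs_either 9/40 9/40`. When a sufficient action exists, knowing the outcome is as good for acting as knowing the latent states; when it does not, the outcome measurement leaves action value on the table.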
Even when you can take multiple measurements, we show that in most outcome models, the sets of measurements that maximize action value tend not to include the measurement that predicts the outcome itself.
Between the lines
In today’s world, data-driven predictions are increasingly crucial in high-stakes decision-making across diverse sectors: education, healthcare, employment, and beyond. Practitioners developing such predictive models must discern whether forecasting an individual’s outcome genuinely benefits them, considering the available actions for intervention. As exemplified in the case of student success at the secondary level, an excessive focus on predictions can lead to a misallocation of resources if the resulting predictions lack action value. This paper aims to establish a universal framework for reasoning about predictions and interventions, abstracting from specific domains while empowering practitioners to clearly distinguish the data requirements for prediction versus those needed for effective intervention.
Looking ahead, it becomes increasingly critical to holistically evaluate deployed machine learning algorithms, moving beyond the dependence on accuracy metrics such as test error, F1 score, or the AUC. Our framework highlights the importance of metrics like action value in pursuing the broader goal of enhancing outcomes. It underscores the need for further research to explore these holistic evaluation approaches.