🔬 Research Summary by Subho Majumdar, the founder of AI Vulnerability Database, co-founder of Bias Buccaneers and Trustworthy ML Initiative.
[Towards Algorithmic Fairness in Space-Time: Filling in Black Holes by C Flynn, A Guha, S Majumdar, D Srivastava, and Z Zhou]
[Detecting Bias in the Presence of Spatial Autocorrelation by S Majumdar, C Flynn, and R Mitra]
Overview: Given the recent deluge of research in algorithmic fairness, the lack of attention devoted to fairness problems in spatiotemporal data is surprising. These two papers initiate the systematic study of spatial fairness, motivating the need for spatial techniques applicable to real-life situations and proposing a framework for algorithmic bias detection in spatial data.
Introduction
The viral Augmented Reality (AR) game Pokémon GO came out in 2016. Multiple studies since then have shown that the design of this game, especially the placement of ‘monster’ locations, perpetuates existing geographical biases: it favors urban areas but disadvantages minority-dense ones. Later studies found similar biases in other public-facing decision systems, such as the distribution of bikeshare stations in a city or the response to COVID-19.
Around the same time Pokémon GO came out, algorithmic fairness entered popular coverage and discussion. Over the past few years, awareness of and research on fairness and equity in data-driven systems have grown rapidly. However, most fairness methods assume data samples to be independent and identically distributed (i.i.d.). This assumption does not carry over to spatiotemporal data. For example, data from nearby locations are often more strongly correlated with each other than data from locations far apart. This is called spatial autocorrelation. Due to spatial autocorrelation and a few other factors, fairness methods designed for i.i.d. data often fall short when applied to spatiotemporal data.
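Spatial autocorrelation is commonly summarized with Moran's I, a standard statistic that is positive when neighboring locations have similar values and negative when they have dissimilar ones. The sketch below is only an illustration under stated assumptions: the 4x4 grid, the "rook" neighbor definition, and the helper names `grid_adjacency` and `morans_i` are hypothetical choices, not taken from the papers.

```python
import numpy as np

def grid_adjacency(rows, cols):
    """Binary 'rook' adjacency: grid cells that share an edge are neighbors."""
    n = rows * cols
    w = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:   # neighbor to the east
                w[i, i + 1] = w[i + 1, i] = 1.0
            if r + 1 < rows:   # neighbor to the south
                w[i, i + cols] = w[i + cols, i] = 1.0
    return w

def morans_i(values, weights):
    """Moran's I = (n / sum(w)) * (z' W z) / (z' z), with z the centered values."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

w = grid_adjacency(4, 4)
# Illustrative data: two rows of 0s above two rows of 1s (similar values cluster).
clustered = np.repeat([0.0, 0.0, 1.0, 1.0], 4)
# Illustrative data: every cell differs from all of its neighbors.
checkerboard = np.indices((4, 4)).sum(axis=0).ravel() % 2

print("clustered:   ", morans_i(clustered, w))
print("checkerboard:", morans_i(checkerboard, w))
```

On this toy grid the clustered pattern gives a clearly positive value (about 0.67) and the checkerboard gives -1, whereas i.i.d. data would be expected to sit near zero.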
In two recent papers, my coauthors and I propose a roadmap and take first steps towards rigorous research on algorithmic fairness techniques for spatiotemporal data. The first paper puts our previous work in a wider context, laying out a broad vision of the challenges, opportunities, and progress in spatiotemporal fairness research. The second paper proposes bias detection methods that adjust for inherent spatial autocorrelation to quantify the actual association between a feature of interest and a demographic feature (such as race, gender, or income).
Key Insights
Sources of Bias
Based on recent work on geospatial data and related news, our first paper identified five major sources of bias in spatiotemporal data:
- Geographical differences that are inherent to a location. A good example is the case of minority populations concentrated in a flood-prone part of a city.
- Mental Maps are implicit maps shaped by history and prejudice. They can divert important resources away from minority populations, or subject them to unnecessary scrutiny (e.g. predictive policing).
- Spatial clustering, i.e. similar people living close by, can unintentionally bias spatial data-driven models.
- Spatiotemporal data glitches happen when there is non-random missingness in the data. In this case, even when you apply safeguards, data bias can result in model bias.
- Spatiotemporal feedback loops happen when the above problems are not accounted for, and end up affecting future data collection and analysis cycles.
These five sources of bias often occur simultaneously. To devise effective detection and mitigation, it is important to understand the specific confounding effect, i.e. which sources are at play in a given use case.
Opportunities
A big challenge of spatiotemporal fairness is how to untangle the above sources of bias. We divide the efforts to tackle this broad problem into four areas. For each of these areas, we lay out research directions as a roadmap for algorithmic fairness researchers.
Measurement
To detect bias in spatiotemporal data, the research community needs to account for both the dynamic nature of underlying human mobility patterns and the interplay of underlying factors. Researchers should also recognize that feedback loops for future data may be more muted for minority-heavy areas.
Our bias detection paper (the second one listed above) was the earliest work to deal with this issue. Based on a technique called spatial filtering, we proposed producing spatially uncorrelated versions of a feature of interest and a demographic feature, which can then be used for bias measurement. Because this method (a) works for discrete or continuous features and (b) is agnostic to the bias detection metric, it is broadly applicable.
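To make the idea of spatial filtering concrete, here is a minimal sketch of one common variant, Moran eigenvector filtering: regress each feature on the leading eigenvectors of the doubly centered neighbor matrix and keep the residuals. This is an illustration under stated assumptions, not necessarily the exact procedure in the paper. The grid, the synthetic data, the choice of five eigenvectors, and the function names are all hypothetical.

```python
import numpy as np

def grid_adjacency(rows, cols):
    """Binary 'rook' adjacency: grid cells that share an edge are neighbors."""
    n = rows * cols
    w = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                w[i, i + 1] = w[i + 1, i] = 1.0
            if r + 1 < rows:
                w[i, i + cols] = w[i + cols, i] = 1.0
    return w

def moran_eigenvectors(w, n_vectors):
    """Eigenvectors of M W M (M = centering matrix) with the largest eigenvalues.
    These describe the smoothest, most positively autocorrelated map patterns."""
    n = w.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    eigvals, eigvecs = np.linalg.eigh(centering @ w @ centering)
    order = np.argsort(eigvals)[::-1][:n_vectors]
    return eigvecs[:, order]

def spatial_filter(x, basis):
    """Residual of x after least-squares regression on an intercept and the basis."""
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones(len(x)), basis])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ coef

rows = cols = 10
w = grid_adjacency(rows, cols)
basis = moran_eigenvectors(w, n_vectors=5)  # 5 is an arbitrary illustrative choice

# Synthetic data (assumption): the outcome and the demographic feature share one
# smooth spatial pattern but are otherwise independent noise.
r, c = np.indices((rows, cols))
pattern = (np.sin(2 * np.pi * (r + 1) / (rows + 1))
           * np.sin(np.pi * (c + 1) / (cols + 1))).ravel()
rng = np.random.default_rng(0)
outcome = 4.0 * pattern + rng.normal(size=rows * cols)
demographic = 4.0 * pattern + rng.normal(size=rows * cols)

raw_corr = np.corrcoef(outcome, demographic)[0, 1]
filtered_corr = np.corrcoef(spatial_filter(outcome, basis),
                            spatial_filter(demographic, basis))[0, 1]
print("raw correlation:     ", raw_corr)
print("filtered correlation:", filtered_corr)
```

In this toy setup the raw correlation is driven by the shared spatial pattern, and the filtered correlation should be much closer to zero. Any bias metric, not just correlation, could be computed on the filtered features. In real data the number of eigenvectors would need a principled selection rule, which this sketch does not attempt.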
Data Gathering
Measurement issues can only be mitigated by crafting better datasets. Researchers and practitioners must combine two approaches: collecting more data from underrepresented regions, and improving the efficacy and quality of existing data through other means such as publicly available data, semi-supervised models, and weakly supervised models.
Algorithmic Mitigation
After biases have been quantified on satisfactory-quality data, they need to be mitigated. As with fairness methods for i.i.d. data, there is an opportunity to develop pre-processing, in-processing, or post-processing mitigation techniques. There is also a chance to adapt fairness mitigation methods from network data to address fairness problems in geospatial network analysis.
Spatial Experimentation
Finally, shorter-scale experiments and surveys can close the loop on many spatiotemporal data problems. For example, exploration-exploitation approaches may be able to break biased feedback loops by bringing in more diversity. Context-aware/causal models can also guide more effective experiments, leading to quick-fire testing of mitigation strategies.
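As a toy illustration of how exploration can break a biased feedback loop, the sketch below uses epsilon-greedy selection, one simple exploration-exploitation strategy, to decide which region to collect data from next. Everything here is hypothetical: the three regions, their success rates, and the biased starting history are made up for illustration and are not from the papers.

```python
import random

def simulate(epsilon, rounds=2000, seed=0):
    """Return how often each region is visited under epsilon-greedy selection."""
    rng = random.Random(seed)
    true_rates = [0.3, 0.5, 0.7]   # hypothetical per-region success rates
    estimates = [0.5, 0.0, 0.0]    # biased history: only region 0 has any data
    counts = [1, 0, 0]
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: pick a region uniformly at random.
            region = rng.randrange(len(true_rates))
        else:
            # Exploit: pick the region that currently looks best.
            region = max(range(len(estimates)), key=lambda i: estimates[i])
        reward = 1.0 if rng.random() < true_rates[region] else 0.0
        counts[region] += 1
        # Running-mean update of the estimated rate for the visited region.
        estimates[region] += (reward - estimates[region]) / counts[region]
    return counts

greedy_counts = simulate(epsilon=0.0)
exploring_counts = simulate(epsilon=0.1)
print("pure exploitation:", greedy_counts)
print("with exploration: ", exploring_counts)
```

With `epsilon=0.0` the process never leaves region 0, because the other regions have no data and therefore never look attractive. A small amount of exploration lets the other regions be sampled and their estimates corrected.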
Between the lines
Spatiotemporal data is inherently political. Before diving head-first into math and code, fairness researchers working in this area need to collaborate closely with policymakers and stakeholders to understand the nuances of the underlying social problems. Conversely, policymakers should consciously recognize the power of data-driven decisions and listen to data experts while setting geoeconomic policies.