6 Ways Machine Learning Threatens Social Justice

This guest post was written by Eric Siegel, PhD (@predictanalytic). He is the founder of Predictive Analytics World and the instructor of Coursera’s Machine Learning for Everyone.

Originally published on Big Think

Overview: When you harness the power and potential of machine learning, there are also some drastic downsides that you’ve got to manage. Deploying machine learning, you face the risk that it be discriminatory, biased, inequitable, exploitative, or opaque. In this article, I cover 6 ways that machine learning threatens social justice – linking to short videos that dive deeply into each one – and reach an incisive conclusion: The remedy is to take on machine learning standardization as a form of social activism.

When you use machine learning, you aren’t just optimizing models and streamlining business. You’re governing. In essence, the models embody policies that control access to opportunities and resources for many people. They drive consequential decisions as to whom to investigate, incarcerate, set up on a date, or medicate – or to whom to grant a loan, insurance coverage, housing, or a job.

For the same reason that machine learning is valuable – that it drives operational decisions more effectively – it also wields power in the impact it has on millions of individuals’ lives. Threats to social justice arise when that impact is detrimental, when models systematically limit the opportunities of underprivileged or protected groups.

Here are six ways machine learning threatens social justice:

For each threat, follow the link(s) to view short videos that provide a deep dive. Most videos are from Machine Learning for Everyone, my course series on Coursera, which is also free to access in its entirety.

1) Blatantly discriminatory models are predictive models that base decisions partly or entirely on a protected class. Protected classes include race, religion, national origin, gender, gender identity, sexual orientation, pregnancy, and disability status. By taking one of these characteristics as an input, the model’s outputs – and the decisions driven by the model – are based at least in part on membership in a protected class. Although models rarely do so directly, there is precedent and support for doing so.

This would mean that a model could explicitly hinder, for example, black defendants for being black. So, imagine sitting across from a person being evaluated for a job, a loan, or even parole. When they ask you how the decision process works, you inform them, “For one thing, our algorithm penalized your score by seven points because you’re black.” This may sound shocking and sensationalistic, but I’m only literally describing what the model would do, mechanically, if race were permitted as a model input. [ Deep dive videos: part I, part II, part III ]

2) Machine bias. Even when protected classes are not provided as a direct model input, we find, in some cases, that model predictions are still inequitable. This is because other variables end up serving as proxies for protected classes. This is a bit complicated, since it turns out that models that are fair in one sense are unfair in another.

For example, some crime risk models succeed in flagging both black and white defendants with equal precision – each flag tells the same probabilistic story, regardless of race – and yet the models falsely flag black defendants more often than white ones. A crime-risk model called COMPAS, which is sold to law enforcement across the US, falsely flags white defendants at a rate of 23.5%, and black defendants at 44.9%. In other words, black defendants who don’t deserve it are erroneously flagged almost twice as often as white defendants who don’t deserve it. [Deep dive videos: part I, part II, part III]
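To make the difference between these two fairness measures concrete, here is a minimal Python sketch. The confusion-matrix counts are invented for illustration and are not COMPAS data; only the 23.5% and 44.9% rates come from the text above. It shows how flags can have identical precision for two groups while one group is falsely flagged far more often, which can happen when the outcome being predicted occurs at different rates in the two groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one group (hypothetical numbers)."""
    tp: int  # flagged, and the outcome occurred
    fp: int  # flagged, but the outcome did not occur
    fn: int  # not flagged, but the outcome occurred
    tn: int  # not flagged, and the outcome did not occur

    def precision(self) -> float:
        # Of everyone flagged, the share for whom the flag was correct.
        return self.tp / (self.tp + self.fp)

    def false_positive_rate(self) -> float:
        # Of everyone for whom the outcome did not occur, the share flagged anyway.
        return self.fp / (self.fp + self.tn)


# Illustrative groups of 1,000 people each, with different base rates.
group_a = Confusion(tp=300, fp=200, fn=100, tn=400)
group_b = Confusion(tp=150, fp=100, fn=150, tn=600)

for name, group in (("A", group_a), ("B", group_b)):
    print(
        f"group {name}: precision={group.precision():.2f}, "
        f"false positive rate={group.false_positive_rate():.2f}"
    )

# Ratio of the two false positive rates quoted in the article.
ratio = 44.9 / 23.5
print(f"ratio of quoted false positive rates: {ratio:.2f}")
```

In this toy example both groups get a precision of 0.60, yet group A's false positive rate is more than double group B's. The last line confirms the "almost twice" figure: the quoted rates differ by a factor of about 1.91.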

3) Inferring sensitive attributes – predicting pregnancy and beyond. Machine learning predicts sensitive information about individuals, such as sexual orientation, whether they’re pregnant, whether they’ll quit their job, and whether they’re going to die. Researchers have shown that it is possible to predict race based on Facebook likes. These predictive models deliver dynamite.

In a particularly extraordinary case, officials in China use facial recognition to identify and track the Uighurs, a minority ethnic group systematically oppressed by the government. This is the first known case of a government using machine learning to profile by ethnicity. One Chinese start-up valued at more than $1 billion said its software could recognize “sensitive groups of people.” Its website said, “If originally one Uighur lives in a neighborhood, and within 20 days six Uighurs appear, it immediately sends alarms” to law enforcement. [Deep dive video]

4) A lack of transparency. A computer can keep you in jail, or deny you a job, a loan, insurance coverage, or housing – and yet you cannot face your accuser. The predictive models generated by machine learning to drive these weighty decisions are generally kept locked up as a secret, unavailable for audit, inspection, or interrogation. Such models, inaccessible to the public, perpetrate a lack of due process and a lack of accountability.

Two ethical standards oppose this shrouding of electronically assisted decisions: 1) model transparency, the standard that predictive models be accessible, inspectable, and understandable; and 2) the right to explanation, the standard that consequential decisions driven or informed by a predictive model always be held up to that standard of transparency. Meeting those standards would mean, for example, that a defendant be told which factors contributed to their crime risk score – which aspects of their background, circumstances, or past behavior caused the defendant to be penalized. This would provide the defendant the opportunity to respond accordingly, establishing context, explanations, or perspective on these factors. [Deep dive video on transparency, explainable machine learning, and “the right to explanation”]
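As a rough illustration of what an explanation could look like in practice, the sketch below assumes a simple additive scoring model, where each factor contributes its weight times its value. The factor names and weights are hypothetical and do not come from any real risk model. Many deployed models are not additive, in which case other explanation techniques would be needed.

```python
# Hypothetical weights of an additive score (assumed for illustration only).
WEIGHTS = {
    "prior_arrests": 2.0,
    "age_under_25": 3.0,
    "employed": -1.5,
}


def explain(features: dict[str, float]) -> list[tuple[str, float]]:
    """Return each factor's contribution to the score, largest magnitude first."""
    contributions = {
        name: WEIGHTS[name] * value for name, value in features.items()
    }
    return sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)


# A hypothetical individual being scored.
person = {"prior_arrests": 2, "age_under_25": 1, "employed": 1}

explanation = explain(person)
score = sum(contribution for _, contribution in explanation)

print(f"score: {score}")
for factor, contribution in explanation:
    print(f"  {factor}: {contribution:+.1f}")
```

A breakdown like this is the kind of information a person could respond to, for example by contesting an incorrect record or adding context to a factor that counted against them.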

5) Predatory micro-targeting. Powerlessness begets powerlessness – and that cycle can magnify for consumers when machine learning increases the efficiency of activities designed to maximize profit for companies. Improving the micro-targeting of marketing and the predictive pricing of insurance and credit can magnify the cycle of poverty. For example, highly targeted ads are more adept than ever at exploiting vulnerable consumers and separating them from their money.

And insurance pricing can lead to the same result. With insurance, the name of the game is to charge more for those at higher risk. Left unchecked, this process can quickly slip into predatory pricing. For example, a churn model may find that elderly policyholders don’t tend to shop around and defect to better offers, so there’s less of an incentive to keep their policy premiums in check. And pricing premiums based on other life factors also contributes to a cycle of poverty. For example, individuals with poor credit ratings are charged more for car insurance. In fact, a low credit score can increase your premium more than an at-fault car accident. [Deep dive video]

6) The coded gaze. If a group of people is underrepresented in the data from which the machine learns, the resulting model won’t work as well for members of that group. This results in exclusionary experiences and discriminatory practices. This phenomenon can occur for both facial image processing and speech recognition. [Deep dive video]

Recourse: Establish machine learning standards as a form of social activism

To address these problems, take on machine learning standardization as a form of social activism. We must establish standards that go beyond nice-sounding yet vague platitudes such as “be fair”, “avoid bias”, and “ensure accountability”. Without being precisely defined, these catch phrases are subjective and do little to guide concrete action. Unfortunately, such broad language is fairly common among the principles released by many companies. In so doing, companies protect their public image more than they protect the public.

Your role is critical. As someone involved in initiatives to deploy machine learning, you have a powerful, influential voice – quite possibly much more important and powerful than you realize. You are one of a relatively small number of people who mold and set the trajectory for systems that automatically dictate the rights and resources that great numbers of consumers and citizens gain access to. You’re in a unique position to defend the civil rights of hundreds of thousands or millions of people and therefore, I would say, you have a unique responsibility to do so.

Famed machine learning leader and educator Andrew Ng drove it home: “AI is a superpower that enables a small team to affect a huge number of people’s lives… It’s important that you… make sure the work you do leaves society better off.”

And Allan Sammy, Director, Data Science and Audit Analytics at Canada Post, clarified why the onus is on you: “A decision made by an organization’s analytic model is a decision made by that entity’s senior management team.”

Implementing ethical data science is as important as ensuring a self-driving car knows when to put on the brakes.

Establishing well-formed ethical standards for machine learning will be an intensive, ongoing process. For more, watch this short video, in which I provide some specifics meant to kick-start the process.

Follow Eric on Twitter at @predictanalytic.