
Fair Interpretable Representation Learning with Correction Vectors

June 19, 2022

🔬 Research Summary by Mattia Cerrato and Marius Köppel

Marius Köppel is a Ph.D. student at the Johannes Gutenberg-Universität Mainz working on FPGA-based data acquisition systems, lepton flavor violation, and fairness in Machine Learning.

Mattia Cerrato is a Post-Doc at the Johannes Gutenberg-Universität Mainz working at the intersection of interpretability and fairness in Machine Learning.

[Original paper by Mattia Cerrato, Alesia Vallenas Coronel, Marius Köppel, Alexander Segner, Roberto Esposito, Stefan Kramer]


Overview: Neural networks are inherently opaque. While it is possible to train them to learn “fair representations”, it is still hard to make sense of their decisions on an individual basis. This is at odds with legal requirements in the EU. We propose a new technique to open the “black box” of fair neural networks.


Introduction

AI methodologies based on deep neural networks are nowadays hugely popular, especially in Computer Vision and Natural Language Processing. Compared to other AI/machine learning techniques, deep neural networks promise a higher level of performance and thus enable a plethora of applications.

In recent years, the research community and the general public have been focusing more and more on the limitations of AI and on the failure cases in which this set of technologies, and deep neural networks in particular, may impact people’s well-being negatively.

Perhaps two of the most famous failure points in applications employing deep neural networks have been reported by Forbes and The Guardian: “Google Photos Tags Two African-Americans As Gorillas Through Facial Recognition Software” [1] and “A beauty contest was judged by AI and the robots didn’t like dark skin” [2]. There is therefore real cause for concern that negative societal biases may be transferred to machine learning models, leading to models that discriminate in turn.

While different approaches show that “unfair” behavior may be avoided in deep neural network models via group fairness techniques, we still do not know much about their internal reasoning.  Therefore, we are still unable to answer individual appeals. The legal status of these models is thus unclear. As an example, the General Data Protection Regulation (GDPR) in the EU states that individuals subjected to automatic decision making have a “right to an explanation” [3]. 

In this space, our research constrains deep neural networks so that they are decomposable and therefore human-readable. Our framework is centered around the concept of a correction vector, i.e., a vector that lives in feature space and therefore remains interpretable: it represents the “fairness correction” applied to each data point so that the results are statistically fair.

Our recent paper focuses on answering the following three questions:

  • Are the proposed models both fair and accurate?
  • Are the interpretable models as fair as their non-interpretable counterparts?
  • Are the correction vectors interpretable?

Key Insights

Group Fairness in Machine Learning

One possible approach to avoid discriminating models is to remove information about the sensitive attribute (e.g. ethnicity) from the model’s internal representations. These techniques are commonly referred to as “fair representation learning”, where a projection f : X → Z from feature space X into a latent space Z is learned. It can be shown that the information about the sensitive attribute s is minimal in the latent space.
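To make this concrete, below is a minimal sketch of adversarial fair representation learning in PyTorch. The dimensions, layer sizes, and training loop are illustrative assumptions rather than the architecture used in the paper: an encoder learns f : X → Z for a downstream task while an adversary that tries to recover the sensitive attribute s from Z is used to penalize any remaining sensitive information.

```python
import torch
import torch.nn as nn

# Illustrative dimensions and architectures (assumptions, not taken from the paper).
D_X, D_Z = 10, 8

encoder = nn.Sequential(nn.Linear(D_X, D_Z), nn.ReLU())   # f : X -> Z
task_head = nn.Linear(D_Z, 1)                              # predicts the label y from z
adversary = nn.Linear(D_Z, 1)                               # tries to recover s from z

bce = nn.BCEWithLogitsLoss()
opt_model = torch.optim.Adam(
    list(encoder.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)

def training_step(x, y, s, lam=1.0):
    z = encoder(x)

    # 1) Adversary update: learn to predict the sensitive attribute s from z.
    adv_loss = bce(adversary(z.detach()), s)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

    # 2) Model update: solve the task while making s hard to predict from z.
    task_loss = bce(task_head(z), y)
    leak_penalty = -bce(adversary(z), s)   # reward the encoder when the adversary fails
    loss = task_loss + lam * leak_penalty
    opt_model.zero_grad(); loss.backward(); opt_model.step()
    return loss.item()
```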

This reasoning is at the core of commonly employed “fair” neural networks, which learn a mapping into an opaque latent space Z and then a decision Y from it. Our correction vector framework instead learns a debiased but interpretable representation directly in feature space X, giving individual users and analysts further insight into the debiasing process. In the paper, for example, we report the average correction learned by one of our models on the LSAT feature of the Law Students dataset, in which the task is to grant law school admissions in a balanced fashion across student ethnicities.

Our approach is to learn fair corrections for each of the dimensions in X. The corrections are then added to the original features, so that the semantics of the algorithm are as clear as possible: for each feature, one can see exactly how it has been changed to counteract the bias in the data. We compute these corrections with two different methodologies, described in the next two sections.
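As a rough sketch (not the authors’ exact implementation), the correction vector idea can be expressed in a few lines of PyTorch: a hypothetical correction network outputs one value per original feature, that output is added to the input, and the per-feature corrections can be inspected directly.

```python
import torch
import torch.nn as nn

D_X = 10  # number of input features (illustrative)

# Hypothetical correction network: its output has one entry per original
# feature, so it never leaves feature space and can be read directly.
correction_net = nn.Sequential(nn.Linear(D_X, 32), nn.ReLU(), nn.Linear(32, D_X))

def fair_features(x):
    w = correction_net(x)   # correction vector, same shape as x
    return x + w, w         # debiased yet still interpretable features

# Inspecting the learned corrections feature by feature, e.g. the average
# correction applied to a single feature such as LSAT in the Law Students data:
x = torch.randn(256, D_X)   # placeholder batch
x_fair, w = fair_features(x)
print("mean correction per feature:", w.mean(dim=0))
```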

Explicit Computation of Correction Vectors with Feedforward Networks

Explicit computation requires constraining a neural network architecture so that it does not leave feature space. Different fairness methodologies may be constrained in such a way; in the paper, a gradient reversal-based neural network is constrained for interpretability and made part of the proposed framework. The learned correction vector w matches X in dimensionality, and can therefore be summed with the original representation X and analyzed for interpretability.
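The following sketch, assuming a standard gradient reversal layer and illustrative layer sizes (not the paper’s exact architecture), shows how a feedforward network can be constrained so that its representation never leaves feature space: the network outputs a correction vector w of the same dimensionality as X, adds it to the input, and gradient reversal discourages the corrected features from carrying sensitive information.

```python
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; multiplies gradients by -lambda on the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

D_X = 10  # illustrative feature dimensionality

correction_net = nn.Sequential(nn.Linear(D_X, 32), nn.ReLU(), nn.Linear(32, D_X))
task_head = nn.Linear(D_X, 1)   # predicts the label from the corrected features
adv_head = nn.Linear(D_X, 1)    # tries to recover the sensitive attribute

def forward(x, lam=1.0):
    w = correction_net(x)        # correction vector, same dimensionality as X
    x_fair = x + w               # the representation never leaves feature space
    y_logit = task_head(x_fair)
    # Gradient reversal pushes the correction network to remove sensitive information.
    s_logit = adv_head(GradientReversal.apply(x_fair, lam))
    return y_logit, s_logit, w
```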

Implicit Computation of Correction Vectors with Normalizing Flows

Implicit computation relies on a pair of invertible normalizing flow models that map individuals belonging to different groups onto a single group. Here, a correction vector may still be computed, as the new representations remain interpretable in feature space.
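A minimal sketch of the implicit route, with a toy invertible transform standing in for trained normalizing flows (an assumption for illustration): individuals from one group are mapped through their group’s flow into a shared latent space and back out through the other group’s inverse flow, and the correction vector is recovered as the feature-space difference.

```python
import torch
import torch.nn as nn

class AffineFlow(nn.Module):
    """Toy invertible transform standing in for a trained normalizing flow (assumption)."""
    def __init__(self, dim):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x):   # feature space -> shared latent space
        return (x - self.shift) * torch.exp(-self.log_scale)

    def inverse(self, z):   # latent space -> feature space
        return z * torch.exp(self.log_scale) + self.shift

D_X = 10
flow_a, flow_b = AffineFlow(D_X), AffineFlow(D_X)   # one flow per protected group

def implicit_correction(x_group_a):
    # Map group-a individuals into the shared latent space, then back out
    # through group b's inverse flow, so that everyone is expressed in a
    # single feature-space representation.
    z = flow_a(x_group_a)
    x_mapped = flow_b.inverse(z)
    # The correction vector is recovered implicitly as a feature-space difference.
    return x_mapped, x_mapped - x_group_a
```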

Experimentation / Conclusions

In this paper we presented a framework for interpretable fair learning based around the computation of correction vectors. Experiments showed that existing methodologies may be constrained for explicit computation of correction vectors with negligible losses in performance. In light of recent developments in regulation at the EU level, we argue that our correction vector framework is able to open up the black box of fair DNNs. Under the GDPR, it remains to be seen which content requirements must be met when disclosing “the logic involved” in an automated decision and, secondly, how such disclosure can be realized in practice. The “black box” problem, and the way it limits legal compliance in practice, does not seem to have been considered when the GDPR was drafted.

Thus, the transparency requirements of the GDPR must be interpreted retrospectively with regard to the functioning of DNNs.

Between the lines

AI methodologies are facing more stringent legal requirements by the day. While the GDPR was not written with black-box models in mind, it should be clear that providing interpretable fair models is the way forward for further research. This draws more attention to the intersection of legal interpretation and machine learning technology; furthering our understanding of how end users may come to trust AI systems will be more and more important.

References

[1] https://www.forbes.com/sites/mzhang/2015/07/01/google-photos-tags-two-african-americans-as-gorillas-through-facial-recognition-software/?sh=7cc29def713d

[2] https://www.theguardian.com/technology/2016/sep/08/artificial-intelligence-beauty-contest-doesnt-like-black-people

[3] Malgieri, Gianclaudio, The Concept of Fairness in the GDPR: A Linguistic and Contextual Interpretation (January 10, 2020). Proceedings of FAT* ’20, January 27–30, 2020. ACM, New York, NY, USA, 14 pages. DOI: 10.1145/3351095.3372868, Available at SSRN: https://ssrn.com/abstract=3517264

