Fair Interpretable Representation Learning with Correction Vectors

June 19, 2022

🔬 Research Summary by Mattia Cerrato and Marius Köppel

Marius Köppel is a PhD student at the Johannes Gutenberg-Universität Mainz working on FPGA-based data acquisition systems, lepton flavor violation, and fairness in Machine Learning.

Mattia Cerrato is a Post-Doc at the Johannes Gutenberg-Universität Mainz working at the intersection of interpretability and fairness in Machine Learning.

[Original paper by Mattia Cerrato, Alesia Vallenas Coronel, Marius Köppel, Alexander Segner, Roberto Esposito, Stefan Kramer]


Overview: Neural networks are inherently opaque. While it is possible to train them to learn “fair representations”, it is still hard to make sense of their decisions on an individual basis. This is at odds with legal requirements in the EU. We propose a new technique to open the “black box” of fair neural networks.


Introduction

AI methodologies based on deep neural networks are hugely popular nowadays, especially in Computer Vision and Natural Language Processing. Compared to other AI/machine learning techniques, deep neural networks promise a higher level of performance, thus enabling a plethora of applications.

In recent years, the research community and the general public have been focusing more and more on the limitations of AI and on the failure cases in which this set of technologies, and deep neural networks in particular, may be impacting people’s well-being negatively.

Perhaps two of the most famous failure points in applications employing deep neural networks have been reported by Forbes and The Guardian: “Google Photos Tags Two African-Americans As Gorillas Through Facial Recognition Software” [1] and “A beauty contest was judged by AI and the robots didn’t like dark skin” [2]. There is therefore genuine cause for concern that negative societal biases may transfer to machine learning models, leading to models that discriminate in turn.

While different approaches show that “unfair” behavior may be avoided in deep neural network models via group fairness techniques, we still do not know much about their internal reasoning. Therefore, we are still unable to answer individual appeals, and the legal status of these models is thus unclear. As an example, the General Data Protection Regulation (GDPR) in the EU states that individuals subjected to automated decision-making have a “right to an explanation” [3].

In this space, our research constrains deep neural networks so that they are decomposable and therefore human-readable. Our framework is centered around the concept of a correction vector, i.e. a vector of features that is interpretable in feature space and represents the “fairness correction” each data point is subjected to so that the results are statistically fair.

Our recent paper focuses on answering the following three questions:

  • Are the proposed models both fair and accurate?
  • Are the interpretable models as fair as their non-interpretable counterparts?
  • Are the correction vectors interpretable?

Key Insights

Group Fairness in Machine Learning

One possible approach to avoiding discriminatory models is to remove information about the sensitive attribute (e.g. ethnicity) from the model’s internal representations. These techniques are commonly referred to as “fair representation learning”: a projection f : X → Z from feature space X into a latent space Z is learned such that the information about the sensitive attribute s retained in the latent space is minimal.
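
As a rough illustration of this setup, the sketch below implements fair representation learning with an adversarial (“min-max”) objective, which is one common way to make the latent space Z uninformative about s. The architecture, layer sizes, and the trade-off weight `lam` are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch of fair representation learning: an encoder maps features X
# into a latent space Z; a task head predicts the label y from Z, while an
# adversary tries to recover the sensitive attribute s from Z. Training the
# encoder to fool the adversary pushes information about s out of Z.
# Layer sizes and the weighting `lam` are illustrative assumptions.

n_features, n_latent = 10, 8
encoder = nn.Sequential(nn.Linear(n_features, n_latent), nn.ReLU())
task_head = nn.Linear(n_latent, 1)   # predicts y (binary, as logits)
adversary = nn.Linear(n_latent, 1)   # tries to predict s (binary, as logits)
bce = nn.BCEWithLogitsLoss()
lam = 1.0                            # fairness/accuracy trade-off weight

opt_model = torch.optim.Adam(
    list(encoder.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)

def training_step(x, y, s):
    """x: (batch, n_features); y, s: (batch, 1) float tensors in {0, 1}."""
    # 1) update the adversary to predict s from the (detached) representation
    z = encoder(x).detach()
    adv_loss = bce(adversary(z), s)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # 2) update encoder + task head: be accurate on y, but make the
    #    adversary's job hard (maximize its loss, hence the minus sign)
    z = encoder(x)
    loss = bce(task_head(z), y) - lam * bce(adversary(z), s)
    opt_model.zero_grad()
    loss.backward()
    opt_model.step()
    return loss.item()
```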

This reasoning is at the core of commonly employed “fair” neural networks, which learn a mapping into an opaque latent space Z and then a decision Y from it. Our correction vector framework instead learns a debiased but still interpretable representation in the original feature space X, which provides individual users and analysts with further insight into the debiasing process. In the paper, for example, we report the average correction learned by one of our models on the LSAT feature of the Law Students dataset, in which the task is to decide law school admissions in a balanced fashion across student ethnicities.

Our approach is to learn a fair correction for each of the dimensions in X. The corrections are then added to the original features, so that the semantics of the algorithm are as clear as possible: for each feature, one can obtain a clear understanding of how it has been changed to counteract the bias in the data. We compute these corrections via two different methodologies, described in the following sections.
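
As a toy illustration of how such a correction is read, consider the hypothetical two-feature example below; the feature names echo the Law Students dataset, but the numbers are made up for illustration and are not results from the paper.

```python
# Illustrative only: reading a learned correction vector in feature space.
# The correction values here are hypothetical, not outputs of the paper's models.
features   = {"LSAT": 36.0, "GPA": 3.2}
correction = {"LSAT": -1.5, "GPA": 0.1}   # hypothetical learned correction w

corrected = {k: features[k] + correction[k] for k in features}
for k in features:
    print(f"{k}: {features[k]} -> {corrected[k]}  (correction {correction[k]:+})")
```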

Explicit Computation of Correction Vectors with Feedforward Networks

Explicit computation requires constraining a neural network architecture so that it never leaves feature space. Different fairness methodologies may be constrained in such a way; for instance, a gradient reversal-based neural network can be constrained for interpretability and incorporated into the proposed framework. The learned correction vector w matches the dimensionality of X, and can then be summed with the original representation X and analyzed for interpretability.
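
The sketch below shows one way this explicit computation could be wired up in PyTorch, with a gradient reversal layer driving the adversarial debiasing. `CorrectionVectorNet` is a hypothetical name, and the layer sizes and reversal strength `lam` are illustrative assumptions rather than the authors’ exact architecture.

```python
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign on the way back,
    so improving the adversary pushes sensitive information out upstream."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class CorrectionVectorNet(nn.Module):
    """Sketch of explicit correction-vector computation: the network outputs a
    correction w with the same dimensionality as X, and the debiased
    representation is simply x + w, which never leaves feature space."""
    def __init__(self, n_features, lam=1.0):
        super().__init__()
        self.correction = nn.Sequential(
            nn.Linear(n_features, n_features), nn.ReLU(),
            nn.Linear(n_features, n_features),  # w, same size as X
        )
        self.task_head = nn.Linear(n_features, 1)  # predicts y
        self.adversary = nn.Linear(n_features, 1)  # tries to predict s
        self.lam = lam

    def forward(self, x):
        w = self.correction(x)            # interpretable correction vector
        x_fair = x + w                    # corrected features, still in feature space
        y_logit = self.task_head(x_fair)
        s_logit = self.adversary(GradientReversal.apply(x_fair, self.lam))
        return y_logit, s_logit, w
```

Training would minimize the task loss on `y_logit` plus the adversary loss on `s_logit`; because the gradient is reversed at the adversary’s input, the correction network is driven to strip information about s from x + w while the per-feature correction w stays readable.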

Implicit Computation of Correction Vectors with Normalizing Flows

Implicit computation relies on a pair of invertible normalizing flow models to map individuals belonging to different groups into a single, common representation. Here, a correction vector may still be computed, as the new representations are still interpretable in feature space.
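
A toy sketch of the implicit idea follows, with a trivial element-wise affine transform standing in for the learned invertible normalizing flows (the models in the paper are more expressive); `flow_a`, `flow_b`, their training, and the feature dimensionality are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

class AffineFlow(nn.Module):
    """Toy invertible map standing in for a normalizing flow:
    z = x * exp(log_scale) + shift, inverted exactly below."""
    def __init__(self, n_features):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(n_features))
        self.shift = nn.Parameter(torch.zeros(n_features))

    def forward(self, x):
        return x * torch.exp(self.log_scale) + self.shift

    def inverse(self, z):
        return (z - self.shift) * torch.exp(-self.log_scale)

n_features = 10
flow_a = AffineFlow(n_features)  # invertible model fitted on group A
flow_b = AffineFlow(n_features)  # invertible model fitted on group B

def implicit_correction(x_a):
    """Translate group-A individuals into group B's frame via the shared
    latent space; the correction vector is the difference in feature space."""
    z = flow_a(x_a)                    # map into the shared latent space
    x_translated = flow_b.inverse(z)   # map back out through the other flow
    return x_translated - x_a          # correction vector, readable per feature
```

Because the translated point lives in the original feature space, the correction is recovered simply by differencing, so it remains readable per feature just as in the explicit case.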

Experimentation / Conclusions

In this paper we presented a framework for interpretable fair learning based around the computation of correction vectors. Experiments showed that existing methodologies may be constrained for explicit computation of correction vectors with negligible losses in performance. In light of recent developments in regulation at the EU level, we argue that our correction vector framework is able to open up the black box of fair DNNs. Under the GDPR, it remains to be seen which content requirements must be met when disclosing “the logic involved” and, secondly, how they are to be realized in practice. The “black box” problem and the way it limits legal compliance in practice does not seem to have been considered when the GDPR was drafted.

Thus, the transparency requirements of the GDPR must be interpreted retrospectively with regard to the functioning of DNNs.

Between the lines

AI methodologies are facing more stringent legal requirements by the day. While the GDPR was not written with black-box models in mind, it should be clear that providing interpretable fair models is the way to go for further research. This will draw more attention to the intersection of legal interpretation and machine learning technology. Furthering our understanding of how end users may come to trust AI systems will become more and more important.

References

[1] “Google Photos Tags Two African-Americans As Gorillas Through Facial Recognition Software,” Forbes, July 1, 2015. https://www.forbes.com/sites/mzhang/2015/07/01/google-photos-tags-two-african-americans-as-gorillas-through-facial-recognition-software/?sh=7cc29def713d

[2] “A beauty contest was judged by AI and the robots didn’t like dark skin,” The Guardian, September 8, 2016. https://www.theguardian.com/technology/2016/sep/08/artificial-intelligence-beauty-contest-doesnt-like-black-people

[3] Malgieri, Gianclaudio. “The Concept of Fairness in the GDPR: A Linguistic and Contextual Interpretation.” Proceedings of FAT* ’20, January 27–30, 2020, ACM, New York, NY, USA. DOI: 10.1145/3351095.3372868. Available at SSRN: https://ssrn.com/abstract=3517264

