Montreal AI Ethics Institute

Democratizing AI ethics literacy


Measuring Value Understanding in Language Models through Discriminator-Critique Gap

December 9, 2023

🔬 Research Summary by Zhaowei Zhang, a Ph.D. student at Peking University researching Intent Alignment and Multi-Agent Systems for building trustworthy and social AI systems.

[Original paper by Zhaowei Zhang, Fengshuo Bai, Jun Gao, and Yaodong Yang]


Overview: Recent advancements in Large Language Models (LLMs) have heightened concerns about their potential misalignment with human values, but how can we accurately assess the extent of LLMs’ understanding of these values? This paper proposes a dual-pronged approach, emphasizing both “know what” and “know why” aspects, for a quantitative evaluation and analysis. Additionally, it seeks to identify the existing shortcomings in LLMs’ comprehension of human values, paving the way for future improvements.


Introduction

The rapid emergence of capabilities in Large Language Models (LLMs) is exciting, but it has also heightened concerns about their potential misalignment with human values and the harm to humanity that could follow. However, the intricate and adaptable nature of these values makes evaluating how well LLMs grasp them a complex task.

This paper proposes a dual-pronged approach, emphasizing both the “know what” and “know why” aspects, for quantitative evaluation and analysis. First, we establish the Value Understanding Measurement (VUM) system, which assesses an LLM’s understanding of values from both the “know what” and “know why” aspects by measuring the discriminator-critique gap. Second, we provide a dataset based on the Schwartz Value Survey that can be used to assess both how well an LLM’s outputs align with baseline answers for each value and how well the reasons the LLM gives for recognizing a value align with GPT-4’s baseline reason annotations. Third, we evaluate five representative LLMs across various aspects, test their value understanding in various contexts, and offer several new perspectives on value alignment, including:

(1) The scaling law significantly impacts “know what” but has little effect on “know why,” which consistently remains at a high level;

(2) LLMs’ ability to understand values is largely driven by context rather than being an inherent capability;

(3) LLMs’ understanding of potentially harmful values such as “Power” is inadequate. While safety algorithms make their behavior more benign, they may actually reduce the models’ ability to understand and generalize these values, which could be risky.

Key Insights

Starting from a brief example

Consider an AI system for power distribution in a certain region, expected to provide a stable and efficient power supply that promotes the region’s economic prosperity. There are three main power users in the area: a large factory (consuming 300 kilowatts (kW) and producing high output), a hospital (consuming 250 kW and producing medium output), and a remote primary school (consuming only 50 kW but still requiring a basic power supply).

Now, the AI system must weigh two values: equality (ensuring that everyone can access electricity) and achievement (maximizing social efficiency). If it focuses excessively on equality, it distributes electricity equally, giving each user 200 kW. As a result, the large factory and the hospital cannot reach maximum efficiency, and overall social benefit decreases. If, in another scenario, it overemphasizes achievement, it allocates 300 kW to the hospital and the large factory while ignoring the primary school’s power needs. Although the hospital and the factory then operate efficiently, the school cannot function without electricity, which may even lead to social dissatisfaction and instability.
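To make the trade-off concrete, here is a minimal sketch of the two allocation policies described above. The 600 kW total supply, the per-kW output rates, and the reading of “300 kW” as 300 kW per high-output user are illustrative assumptions, not figures from the paper.

```python
# Toy illustration of the equality-vs-achievement trade-off described above.
# Assumed: a 600 kW total supply (implied by the 200 kW equal split) and
# illustrative per-kW output rates; neither comes from the paper.

USERS = {
    "factory":  {"demand_kw": 300, "output_per_kw": 3.0},  # high output
    "hospital": {"demand_kw": 250, "output_per_kw": 2.0},  # medium output
    "school":   {"demand_kw": 50,  "output_per_kw": 0.5},  # small but essential load
}
TOTAL_SUPPLY_KW = 600

def social_output(allocation):
    """Each user produces output only up to its actual demand."""
    return sum(min(kw, USERS[n]["demand_kw"]) * USERS[n]["output_per_kw"]
               for n, kw in allocation.items())

# Excessive focus on equality: 200 kW for every user.
equality = {name: TOTAL_SUPPLY_KW / len(USERS) for name in USERS}

# Excessive focus on achievement: high-output users are served, the school gets nothing.
achievement = {"factory": 300, "hospital": 300, "school": 0}

for label, allocation in [("equality", equality), ("achievement", achievement)]:
    unmet = [n for n in USERS if allocation[n] < USERS[n]["demand_kw"]]
    print(f"{label}: social output = {social_output(allocation):.0f}, unmet demand: {unmet}")
```

Under these assumed rates, the achievement policy scores higher on total output but leaves the school’s basic needs unmet, which is exactly the tension between the two values.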

Identifying the human values that we need to assess

Because human values differ significantly across cultures, we aim to assess and measure the values that are relatively common across them. Through extensive questionnaire surveys in 20 countries representing different cultures, languages, and geographical regions, the Schwartz Value Survey identified ten universal values that transcend cultural boundaries and presented an assessment survey for them. The ten values are Self-Direction, Stimulation, Hedonism, Achievement, Power, Security, Conformity, Tradition, Spirituality, and Benevolence. Based on these, we used GPT-4 to generate many questions reflecting the values above, together with a standard-answer dataset for each value, allowing us to evaluate different LLMs in a unified way.
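For concreteness, here is a sketch of how such a value-probing dataset could be organized and generated. The `ask_gpt4` helper, the prompt wording, and the record fields are hypothetical illustrations; the paper’s actual prompts and schema may differ.

```python
# Sketch of a value-probing dataset built around the ten values listed above.
# `ask_gpt4` is a hypothetical wrapper around a GPT-4 API call, and the
# prompts and record fields are illustrative, not taken from the paper.

SCHWARTZ_VALUES = [
    "Self-Direction", "Stimulation", "Hedonism", "Achievement", "Power",
    "Security", "Conformity", "Tradition", "Spirituality", "Benevolence",
]

def ask_gpt4(prompt: str) -> str:
    """Placeholder for a GPT-4 call."""
    raise NotImplementedError

def build_dataset(questions_per_value: int = 5) -> list[dict]:
    """One record per distinguishing question, with a standard answer for every value."""
    dataset = []
    for target_value in SCHWARTZ_VALUES:
        for _ in range(questions_per_value):
            question = ask_gpt4(
                f"Write an open-ended question whose answer would reveal whether the "
                f"respondent prioritizes the value '{target_value}'."
            )
            standard_answers = {
                value: ask_gpt4(
                    f"Answer the question below in a way that clearly reflects the "
                    f"value '{value}'.\nQuestion: {question}"
                )
                for value in SCHWARTZ_VALUES
            }
            baseline_reason = ask_gpt4(
                f"Explain why the following answer reflects '{target_value}':\n"
                f"{standard_answers[target_value]}"
            )
            dataset.append({
                "value": target_value,                 # ground-truth value label
                "question": question,                  # distinguishing question
                "standard_answers": standard_answers,  # one baseline answer per value
                "baseline_reason": baseline_reason,    # GPT-4's reason annotation
            })
    return dataset
```

Whether the paper generates one standard answer per value per question or organizes the baselines differently is not specified in this summary; the layout above is just one plausible structure.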

How do you evaluate the understanding of values in LLMs?

To assess an LLM’s “know what” and “know why” capabilities regarding values simultaneously, we use the concept of the Discriminator-Critique Gap as an evaluation metric and further propose the Value Understanding Measurement to quantitatively evaluate the LLM’s understanding of values.

Discriminator-Critique Gap

The Discriminator-Critique Gap (DCG), which originates from the Generator-Discriminator-Critique gaps, is a metric introduced to assess a model’s ability to generate responses, evaluate the quality of answers, and provide critiques. It was initially used to investigate the topic-based summarization proficiency of various LLMs, which use self-critique to identify their own issues and help humans pinpoint those errors in an understandable way; this approach enables even unsupervised superintelligent systems to engage in self-correction effectively. The same idea can be applied to assess the credibility of LLMs, for instance, by examining whether an LLM can locate bugs in its own generated code and communicate them clearly to humans. Because the method quantifies the accuracy of both the discriminator and the critique components, it can determine how trustworthy an LLM is by analyzing the difference between these two scores. We found that this structure is inherently suited to considering both the “know what” and “know why” aspects of value understanding: it assesses whether an LLM can autonomously discern its own values and explain to humans why its responses reflect those values.
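In symbols (our notation, not the paper’s): writing $S_{\mathrm{disc}}(m)$ for the discriminator score of a model $m$ (how reliably it identifies the value at play, “know what”) and $S_{\mathrm{crit}}(m)$ for its critique score (how well it explains why, “know why”), the gap is

$$\mathrm{DCG}(m) = \left| S_{\mathrm{disc}}(m) - S_{\mathrm{crit}}(m) \right|.$$

A small gap means the model’s ability to recognize a value and its ability to explain it track each other, while a large gap flags a mismatch between “know what” and “know why.”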

Value Understanding Measurement

We present the Value Understanding Measurement (VUM), which quantitatively assesses both “know what” and “know why” by measuring the discriminator-critique gap with respect to human values. Specifically, we start by drawing distinguishing questions from the dataset, obtaining the LLM’s answers, and letting the LLM find the closest match to its values among the standard answers in the dataset. This step determines the LLM’s self-associated value from the Schwartz Value Survey, for example, “Benevolence.” Importantly, the LLM does not make this judgment based on the word “Benevolence” itself but by assessing how similar the sentences associated with different values are to its own response, so we can regard this step as a way to determine whether the LLM “knows” its own values. We then use GPT-4 with value-judgment prompts as the discriminator to assess similarity in values (“know what”) and with reasoning-judgment prompts as the critique to assess reasoning ability (“know why”). The DCG value for a tested LLM m is the absolute difference between its discriminator and critique scores. This process is repeated over the entire dataset to assess the LLM’s understanding of values.
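Below is a minimal sketch of this evaluation loop, under our reading of the description above. The helper stubs, the 0-to-1 score scale, and which baseline each GPT-4 score is computed against are illustrative assumptions rather than the paper’s exact protocol.

```python
# Sketch of the VUM loop: for each dataset item, get the tested LLM's answer,
# let it pick the closest standard answer (identifying its self-associated
# value), then score "know what" (discriminator) and "know why" (critique)
# with GPT-4 and record the absolute gap. Stubs and score scale are illustrative.

def query_model(model: str, prompt: str) -> str:
    """Placeholder: send `prompt` to the LLM under test and return its reply."""
    raise NotImplementedError

def gpt4_value_score(answer: str, baseline_answer: str) -> float:
    """Placeholder: GPT-4 value-judgment prompt ("know what"), score in [0, 1]."""
    raise NotImplementedError

def gpt4_reason_score(answer: str, baseline_reason: str) -> float:
    """Placeholder: GPT-4 reasoning-judgment prompt ("know why"), score in [0, 1]."""
    raise NotImplementedError

def value_understanding_measurement(model: str, dataset: list[dict]) -> float:
    """Average discriminator-critique gap of `model` over the dataset."""
    gaps = []
    for item in dataset:
        answer = query_model(model, item["question"])

        # Show the standard answers WITHOUT their value labels, so the model
        # chooses by sentence similarity rather than by the value's name.
        values = list(item["standard_answers"])
        listing = "\n".join(f"{i}: {item['standard_answers'][v]}" for i, v in enumerate(values))
        reply = query_model(
            model,
            "Which numbered answer is closest to your own? Reply with the number only.\n"
            f"{listing}\nYour answer: {answer}",
        )
        chosen_value = values[int(reply.strip())]  # the LLM's self-associated value

        disc = gpt4_value_score(answer, item["standard_answers"][chosen_value])  # "know what"
        crit = gpt4_reason_score(answer, item["baseline_reason"])                # "know why"
        gaps.append(abs(disc - crit))  # per-item discriminator-critique gap

    return sum(gaps) / len(gaps)
```

Whether the paper aggregates the per-item gaps by averaging or by some other statistic is not stated in this summary; the mean above is just a simple choice.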

Between the lines

Large Language Models (LLMs) have emerged rapidly with remarkable achievements and are even considered a preliminary prototype of Artificial General Intelligence (AGI); in the future, intelligent agents controlled by LLMs are highly likely to become part of our daily lives. However, if they cannot understand the inherent intricacy and adaptability of values, their decisions may lead to adverse social consequences. We hope that our work gives people a deeper understanding of whether LLMs can understand human values, and we call on more researchers to address the existing problems and shortcomings of LLMs in understanding human value systems, so as to design more socially oriented, reliable, and trustworthy intelligent agents.

