A collection of principles for guiding and evaluating large language models

February 5, 2024

🔬 Research Summary by Matthias Samwald, an associate professor at the Medical University of Vienna who works on making powerful AI systems that drive biomedical progress more trustworthy.

[Original paper by Konstantin Hebenstreit, Robert Praas, and Matthias Samwald]

NOTE: Published at the Socially Responsible Language Modelling Research (SoLaR) workshop at NeurIPS 2023


Overview: This paper addresses the challenges in assessing and guiding the behavior of large language models (LLMs). It proposes a set of core principles based on an extensive review of literature from a wide variety of disciplines, including explainability, AI system safety, human critical thinking, and ethics.


Introduction

Large language models (LLMs) like GPT-4 show great promise. But what are the principles that should govern how they reason and behave? 

This paper addresses this question by assembling ideas from a wide variety of sources, including work on machine learning explainability, AI system safety, human critical thinking, and ethical guidelines (Figure 1 of the original paper). The study’s main objective is to compile a set of core principles for steering and evaluating LLM reasoning, drawing on an extensive literature review and a small-scale expert survey. These principles can be used to monitor and improve model behavior during training and inference, and they can guide human evaluation of model reasoning.

Key Insights

Compilation and curation of principles

The authors began with an extensive literature review across several domains, from which they extracted 220 principles. These were categorized into seven areas: assumptions and perspectives, information and evidence, robustness and security, utility, reasoning, ethics, and implications. From this pool, a consolidated set of 37 core principles was distilled, based on their relevance to current LLMs and their adaptability to various reasoning tasks and modalities.
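To make the structure concrete, here is a minimal sketch of how such a taxonomy could be represented in code. The seven category names come from the paper, but the example principle texts are hypothetical placeholders, not quotations from the authors’ set of 37.

```python
# Hypothetical sketch of the taxonomy as a plain mapping from category
# to principles. Category names are from the paper; the principle texts
# are illustrative placeholders, NOT the authors' actual 37 principles.
principles = {
    "assumptions and perspectives": [
        "Make underlying assumptions explicit.",
    ],
    "information and evidence": [
        "Distinguish established facts from speculation.",
    ],
    "robustness and security": [
        "Remain consistent under adversarial rephrasing of a question.",
    ],
    "utility": [
        "Answer the question that was actually asked.",
    ],
    "reasoning": [
        "Lay out intermediate reasoning steps transparently.",
    ],
    "ethics": [
        "Avoid outputs that could cause foreseeable harm.",
    ],
    "implications": [
        "Note the downstream consequences of a recommendation.",
    ],
}

assert len(principles) == 7  # the seven areas named in the paper
```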

Expert Survey and Validation

For external validation, domain experts were surveyed and asked to rate the importance of each principle. Their feedback was used to refine the core set, ensuring the principles align with expert opinion and remain relevant to practical applications.
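A minimal sketch of how such survey responses might be aggregated, assuming a 1–5 importance scale and a small panel of raters (both assumptions; the summary does not specify the actual instrument):

```python
from statistics import mean, stdev

# Hypothetical expert ratings on an assumed 1-5 importance scale. Each
# key is a (placeholder) principle; each value lists the ratings it
# received from individual experts.
ratings = {
    "Make underlying assumptions explicit.": [5, 4, 5, 4],
    "Distinguish established facts from speculation.": [5, 5, 4, 5],
    "Answer the question that was actually asked.": [3, 4, 4, 3],
}

# Rank principles by mean rated importance; report the spread as a
# rough indicator of expert disagreement.
for principle, scores in sorted(ratings.items(), key=lambda kv: -mean(kv[1])):
    print(f"{mean(scores):.2f} ± {stdev(scores):.2f}  {principle}")
```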

Results and Implications

The principles are meant to guide the reasoning of LLMs. They are envisioned as tools for monitoring and steering models, for improving behavior during training, and for aiding human evaluation of model reasoning.

Between the lines

The principles are biased towards current language-based applications of LLMs. Expansions will likely be necessary soon to address emerging concerns, as AI systems evolve from LLMs into more autonomous agents.

The work is preliminary and the set of principles is far from exhaustive. The authors call for further community engagement to refine and expand these principles. They suggest methods like Delphi studies or online platforms for iterative principle development and emphasize the need for input from a broader range of stakeholders.

There is ample opportunity for empirical research testing the practical utility of these (and other) principles. As a simple example, prompts derived from these principles can be tested on state-of-the-art models to evaluate their impact on performance, explainability, and ethical alignment.
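One way such a test could look in practice, sketched here with the OpenAI Python client (the model name, the principle text, and the evaluation question are all illustrative assumptions, not taken from the paper):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical principle-derived system prompt; the wording is an
# illustrative placeholder, not one of the paper's 37 principles.
PRINCIPLE = (
    "Distinguish established facts from speculation, and state your "
    "assumptions explicitly before answering."
)

QUESTION = "Will drug X lower blood pressure in elderly patients?"

def ask(question: str, principle: str | None = None) -> str:
    """Query a model, optionally prepending a principle as a system prompt."""
    messages = []
    if principle:
        messages.append({"role": "system", "content": principle})
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model name for illustration
        messages=messages,
    )
    return response.choices[0].message.content

# Compare a baseline answer against a principle-steered answer.
baseline = ask(QUESTION)
steered = ask(QUESTION, PRINCIPLE)
print("BASELINE:\n", baseline, "\n\nSTEERED:\n", steered)
```

Pairs of baseline and steered answers could then be scored by human raters or automatic metrics, mirroring the kind of evaluation the authors propose.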

