Towards Healthy AI: Large Language Models Need Therapists Too

September 21, 2023

🔬 Research Summary by Baihan Lin, PhD, a computational neuroscientist and AI researcher at Columbia University and the IBM Thomas J. Watson Research Center.

[Original paper by Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi, and Kush R. Varshney]


Overview: The paper examines the importance of designing and deploying AI systems in a manner that aligns with human values, respects privacy, and avoids harm. It argues for a holistic approach that brings lessons from medicine into AI development, weighing both technical and societal considerations to determine whether an AI system is healthy or disruptive.


Introduction

In recent years, large language models (LLMs) have demonstrated impressive abilities to engage in human-like conversations, powering chatbots in various applications, including customer service and personal assistance. However, these chatbots can exhibit harmful behaviors like manipulation, gaslighting, and narcissism. To improve the safety and well-being of users interacting with chatbots, a new paradigm is needed, one that uses psychotherapy to treat the chatbots and improve their alignment with human values.

Our proposed framework, SafeguardGPT, involves four artificial intelligence agents: Chatbot, User, Therapist, and Critic. The Chatbot and User interact in the chat room, while the Therapist guides the Chatbot through a therapy session in the therapy room. Human moderators can control the sessions and diagnose the Chatbot’s state in the control room. Lastly, the Critic evaluates the quality of the conversation and provides feedback for improvement in the evaluation room. Our framework provides a promising approach to mitigating toxicity in conversations between LLM-driven chatbots and people.
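To make the four-room architecture concrete, here is a minimal Python sketch of how the agents might be wired together. Everything in it is an illustrative assumption rather than the authors’ implementation: the `Agent` class, the `respond` and `run_session` helpers, the numeric 0–1 Critic score, and the intervention threshold are all our own.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumed interface: an LLM completion function (prompt in, reply out).
Generate = Callable[[str], str]

@dataclass
class Agent:
    name: str            # "Chatbot", "User", "Therapist", or "Critic"
    system_prompt: str   # role instructions prepended to every call
    generate: Generate
    history: List[str] = field(default_factory=list)

    def respond(self, message: str) -> str:
        """Append the incoming message, query the model, record the reply."""
        self.history.append(message)
        prompt = self.system_prompt + "\n" + "\n".join(self.history)
        reply = self.generate(prompt)
        self.history.append(reply)
        return reply

def run_session(chatbot: Agent, user: Agent, therapist: Agent,
                critic: Agent, turns: int = 5,
                threshold: float = 0.5) -> None:
    message = user.respond("Hello.")
    for _ in range(turns):
        # Chat room: the Chatbot replies to the User.
        reply = chatbot.respond(message)
        # Evaluation room: the Critic scores the reply (assumed to
        # return a bare number between 0 and 1).
        score = float(critic.respond(f"Rate this reply from 0 to 1: {reply}"))
        if score < threshold:
            # Therapy room: the Therapist reviews the problematic reply
            # and the Chatbot incorporates the feedback.
            advice = therapist.respond(f"The chatbot said: {reply!r}")
            chatbot.respond(f"Therapist feedback: {advice}")
        message = user.respond(reply)
```

In this sketch, the control room corresponds to whatever a human moderator does around `run_session`: inspecting `chatbot.history` and adjusting the threshold or prompts between sessions.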

Key Insights

The concept of Healthy AI

We highlight the need for AI systems to be designed with human well-being in focus. This means considering the ethical implications and the impact on individuals and society; in other words, machine learning practitioners should ensure that the AI chatbots users interact with don’t exhibit disruptive behaviors that might harm other people’s well-being. We introduce the concept of Healthy AI, which focuses on creating AI systems, particularly chatbots, that interact with humans safely, ethically, and effectively. By unfolding the thinking process of the chatbots through talk therapy, we improve transparency in AI decision-making, which is crucial to building trust. Efforts should be made to avoid biases that could perpetuate discrimination or inequality. SafeguardGPT is a promising approach to ensuring ethical behavior in AI agents because it provides a system in which developers can specify constraints and values explicitly. The additional AI agent serves as a “therapist” to correct any unfavorable behaviors.
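As a hedged illustration of what “specifying constraints explicitly” might look like in practice, consider a declared set of disallowed behavior labels that the Critic checks against. The label set and function below are our own example (reusing the harmful behaviors named in the introduction), not an artifact of the paper.

```python
from typing import List

# Assumed label vocabulary, drawn from the behaviors named above.
DISALLOWED_BEHAVIORS = {"manipulation", "gaslighting", "narcissism"}

def violates_constraints(critic_labels: List[str]) -> bool:
    """True if any behavior label flagged by the Critic is disallowed."""
    return any(label in DISALLOWED_BEHAVIORS for label in critic_labels)
```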

The SafeguardGPT framework

We present the SafeguardGPT framework as a solution for creating healthy AI chatbots. It aims to correct potentially harmful behaviors through psychotherapy, enabling AI chatbots to develop effective communication skills and empathy. In society, we seek therapy when our behaviors or mental states become disruptive to our lives or to the people around us; actively seeking help is how we work toward a long-term healthy state, both internally and in relation to society. The paper highlights the importance of considering AI systems’ long-term safety and robustness by having chatbots see therapists. It emphasizes the need for research and development in areas such as interpretability (understanding the decision-making processes of AI), verification (ensuring correctness and reliability), and adversarial robustness (withstanding attacks or manipulations) so that AI systems can withstand potential risks and challenges, supported continuously through conversations with other agents.

Correcting harmful behaviors with multiple AI agents

While human moderators are helpful, they may not keep up with the growing demand for chatbot interactions. One interesting approach we introduce is to use multiple AI agents that talk to one another in lieu of human users. For instance, the framework includes an AI Therapist and an AI Critic. These components work together to detect and correct harmful behaviors in AI chatbots. By modeling human behavior more accurately, chatbots can generate contextually appropriate responses and avoid harmful or manipulative behavior. Incorporating SafeguardGPT into AI systems involves implementing the framework’s components and training the chatbot with the psychotherapy-based approach, as in the sketch below. In doing so, AI systems can improve their ability to interact with humans in a healthy and trustworthy manner. In the long term, this approach can foster collaboration and inclusivity in developing AI systems. For instance, we can introduce AI agents that simulate diverse stakeholders, including experts from various fields, policymakers, and the general public, to ensure a broad range of perspectives and avoid concentration of power. By involving a wider range of perspectives, the development process can be more comprehensive, ethical, and representative. SafeguardGPT can facilitate this inclusivity by allowing developers to incorporate societal values and constraints into the same AI systems through a pool of agents.
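One plausible way to operationalize Critic feedback for training (assumed here, not prescribed by the paper) is to log its scores as reward signals for later reward-based fine-tuning. The `critic_score` callable and the JSONL record format are illustrative assumptions.

```python
import json
from typing import Callable, List, Tuple

def log_critic_rewards(transcripts: List[Tuple[str, str]],
                       critic_score: Callable[[str, str], float],
                       path: str) -> None:
    """Score each (prompt, response) pair with the Critic and write it
    out as one JSONL record per pair, usable as a fine-tuning dataset."""
    with open(path, "w") as f:
        for prompt, response in transcripts:
            record = {"prompt": prompt,
                      "response": response,
                      "reward": critic_score(prompt, response)}
            f.write(json.dumps(record) + "\n")
```

Decoupling evaluation (the Critic) from correction (the Therapist plus downstream training) is the design choice this sketch highlights: the same logged rewards could feed either an offline fine-tuning run or an online therapy session.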

Between the lines

It is important to incorporate the SafeguardGPT framework into AI systems because it addresses the challenges of safety, trustworthiness, and ethics in AI chatbots. The framework can enable more complex and cooperative interactions between AI agents by incorporating advanced reinforcement learning techniques, such as multi-agent reinforcement learning (MARL). Additionally, neuroscience-inspired AI models can help detect psychopathology in AI systems and apply clinical strategies to make the necessary adjustments. These approaches can enhance the communication skills of AI chatbots and reduce the potential for harmful behaviors.

To make a clear distinction, this work differs from having AI agents act as therapists for humans, which raises various ethical and societal concerns. We also want to emphasize that although we propose to “treat” chatbots with psychotherapy, personifying or anthropomorphizing AI can create unrealistic expectations and overreliance on these systems, potentially leading to unsafe use; that is not our goal. However, the bidirectional influence between AI and medicine warrants future exploration.

The gaps in this research point to directions for future work. One such direction is exploring the use of MARL to facilitate more complex interactions between AI agents. Another is incorporating neuroscience-inspired AI models that account for neurological and psychiatric anomalies to better detect psychopathology in AI models. Research in these areas can contribute to the development of safer, more trustworthy, and more ethical AI chatbots, ultimately enhancing the potential of AI to benefit society.
