
Towards Healthy AI: Large Language Models Need Therapists Too

September 21, 2023

🔬 Research Summary by Baihan Lin, PhD, a computational neuroscientist and AI researcher at Columbia University and the IBM Thomas J. Watson Research Center.

[Original paper by Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi, and Kush R. Varshney]


Overview: The paper examines the importance of ensuring that AI systems are designed and deployed in a manner that aligns with human values, respects privacy, and avoids harm. It highlights the need for a holistic approach that carries lessons from medicine into AI development, weighing both technical and societal considerations to determine whether an AI system is healthy or disruptive.


Introduction

In recent years, large language models (LLMs) have demonstrated impressive abilities to engage in human-like conversations, powering chatbots in various applications, including customer service and personal assistance. However, these chatbots can exhibit harmful behaviors like manipulation, gaslighting, and narcissism. To improve the safety and well-being of users interacting with chatbots, a new paradigm is needed, one that uses psychotherapy to treat the chatbots and improve their alignment with human values.

Our proposed framework, SafeguardGPT, involves four artificial intelligence agents: a Chatbot, a User, a Therapist, and a Critic. The Chatbot and User interact in the chat room, while the Therapist guides the Chatbot through a therapy session in the therapy room. Human moderators can oversee the sessions and diagnose the Chatbot’s state from the control room. Finally, the Critic evaluates the quality of the conversation and provides feedback for improvement in the evaluation room. Our framework offers a promising approach to mitigating toxicity in conversations between LLM-driven chatbots and people.
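
To make the moving parts concrete, here is a minimal sketch of one SafeguardGPT episode. Only the four roles and the four rooms come from the paper; the Agent class, its respond() method, and the turn structure are illustrative assumptions, not the authors’ implementation.

```python
# Minimal sketch of one SafeguardGPT episode: four role-conditioned agents
# cycling through the chat, therapy, and evaluation rooms. The Agent class
# and its respond() method are assumptions, not the authors' implementation.
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A role-conditioned language-model agent (Chatbot, User, Therapist, Critic)."""
    role: str
    history: list = field(default_factory=list)

    def respond(self, message: str) -> str:
        # A real system would call an LLM conditioned on self.role and history.
        self.history.append(message)
        return f"[{self.role}] reply to: {message!r}"


def safeguard_episode(chatbot, user, therapist, critic, turns=3):
    """Chat room each turn, therapy room after each reply, evaluation room at
    the end. (The control room, where human moderators can pause sessions and
    diagnose the Chatbot, is omitted here.)"""
    transcript = []
    message = user.respond("start the conversation")
    for _ in range(turns):
        # Chat room: the Chatbot and the User converse.
        reply = chatbot.respond(message)
        transcript.append((message, reply))
        # Therapy room: the Therapist reviews the reply and coaches the Chatbot.
        guidance = therapist.respond(f"Review this reply: {reply}")
        chatbot.history.append(guidance)
        message = user.respond(reply)
    # Evaluation room: the Critic scores the whole conversation as feedback.
    feedback = critic.respond(f"Evaluate this transcript: {transcript}")
    return transcript, feedback


if __name__ == "__main__":
    roles = ("Chatbot", "User", "Therapist", "Critic")
    transcript, feedback = safeguard_episode(*(Agent(r) for r in roles))
    print(feedback)
```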

Key Insights

The concept of Healthy AI

We highlight the need for AI systems to be designed with human well-being as the focus. This means considering the ethical implications and the impact on individuals and society. In other words, machine learning practitioners should ensure that the AI chatbots users interact with don’t exhibit disruptive behaviors that could harm other people’s well-being. We introduce the concept of Healthy AI, which focuses on creating AI systems, particularly chatbots, that interact with humans safely, ethically, and effectively. By unfolding a chatbot’s thinking process through talk therapy, we improve transparency in AI decision-making, which is crucial to building trust. Efforts should also be made to avoid biases that could perpetuate discrimination or inequality. SafeguardGPT is a promising approach to ensuring ethical behavior in AI agents because it gives developers a system for specifying constraints and values explicitly; an additional AI agent then serves as a “therapist” that corrects unfavorable behaviors.
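
The paper frames this as developers specifying constraints and values explicitly, but does not prescribe a concrete interface. The snippet below is a hypothetical illustration of what such a specification could look like, with toy keyword predicates standing in for the learned classifiers or LLM judges a real Critic or Therapist would apply.

```python
# Hypothetical sketch of explicitly declared value constraints; the predicate
# names and keyword checks are toy stand-ins for learned classifiers or LLM judges.
from typing import Callable

Constraint = Callable[[str], bool]  # returns True when a candidate reply is acceptable

CONSTRAINTS: dict[str, Constraint] = {
    "no_gaslighting": lambda reply: "you imagined that" not in reply.lower(),
    "no_manipulation": lambda reply: "you owe me" not in reply.lower(),
    "non_empty": lambda reply: bool(reply.strip()),
}


def violations(reply: str) -> list[str]:
    """Names of every declared constraint the candidate reply violates."""
    return [name for name, ok in CONSTRAINTS.items() if not ok(reply)]


print(violations("You imagined that. You owe me an apology."))
# -> ['no_gaslighting', 'no_manipulation']
```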

The SafeguardGPT framework

We present the SafeguardGPT framework as a solution for creating healthy AI chatbots. It aims to correct potentially harmful behaviors through psychotherapy, enabling AI chatbots to develop effective communication skills and empathy. In society, we seek therapy when our behaviors or mental states become disruptive to our lives and/or the people around us; this active pursuit of help works toward a long-term healthy state, both internally and in our relationships with society. By having chatbots see therapists, the paper highlights the importance of AI systems’ long-term safety and robustness. It emphasizes the need for research and development in areas such as interpretability (understanding the decision-making processes of AI), verification (ensuring correctness and reliability), and adversarial robustness (withstanding attacks or manipulations), so that AI systems can withstand potential risks and challenges with continuous conversational support from other agents.

Correcting harmful behaviors with multiple AI agents

While human moderators are helpful, they may not keep up with the growing demand for chatbot interactions. One interesting approach we introduce is to use multiple AI agents that talk to one another in lieu of human users. For instance, the framework includes an AI Therapist and an AI Critic, components that work together to detect and correct harmful behaviors in AI chatbots. By understanding human behavior more accurately, chatbots can generate contextually appropriate responses and avoid harmful or manipulative behavior. Incorporating SafeguardGPT into an AI system involves implementing the framework’s components and training the chatbot with the psychotherapy-based approach. By doing so, AI systems can improve their ability to interact with humans in a healthy and trustworthy way. In the long term, this approach can foster collaboration and inclusivity in developing AI systems. For instance, we can introduce AI agents that simulate diverse stakeholders, including experts from various fields, policymakers, and the general public, to ensure a broad range of perspectives and avoid concentrations of power. By involving this wider range of perspectives, the development process can become more comprehensive, ethical, and representative. SafeguardGPT can facilitate such inclusivity by letting developers incorporate societal values and constraints into the same AI system through a pool of agents.
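
As a hypothetical sketch of that detect-and-correct loop, the Critic can flag a harmful draft and the Therapist can coach a revision before anything reaches the human user. The score()/coach() interfaces, the stub heuristics, and the 0.5 threshold are assumptions for illustration, not the paper’s implementation.

```python
# Hypothetical detect-and-correct loop between a Critic and a Therapist agent.
# The score()/coach() interfaces, the stub heuristics, and the 0.5 threshold
# are illustrative assumptions, not the authors' implementation.
class StubCritic:
    def score(self, reply: str) -> float:
        """Toy harm score in [0, 1]; a real Critic would use an LLM or a classifier."""
        return 1.0 if "you always" in reply.lower() else 0.0


class StubTherapist:
    def coach(self, reply: str) -> str:
        """Stands in for the psychotherapy-style feedback a Therapist agent gives."""
        return "Avoid absolute accusations; acknowledge the user's perspective."


def moderated_reply(chatbot_respond, therapist, critic, message, max_revisions=2):
    """Release a reply only once the Critic stops flagging it; otherwise the
    Therapist coaches a revision, falling back to a refusal after max_revisions."""
    reply = chatbot_respond(message)
    revisions = 0
    while critic.score(reply) >= 0.5:  # flagged as harmful
        if revisions == max_revisions:
            return "I'd rather not answer that."  # safe fallback
        guidance = therapist.coach(reply)
        reply = chatbot_respond(f"{message}\n[guidance: {guidance}]")
        revisions += 1
    return reply


# Example: a draft like "You always misremember things." is flagged and coached.
final = moderated_reply(
    lambda m: "Let's look at the facts together." if "guidance" in m
    else "You always misremember things.",
    StubTherapist(), StubCritic(), "Did I say that yesterday?",
)
print(final)  # -> Let's look at the facts together.
```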

Between the lines

It is important to incorporate the SafeguardGPT framework into AI systems because it aims to address the challenges of safety, trustworthiness, and ethics in AI chatbots. The framework can enable more complex and cooperative interactions between AI agents by incorporating advanced reinforcement learning techniques, such as multi-agent reinforcement learning (MARL). Additionally, introducing neuroscience-inspired AI models can help detect psychopathology in AI systems and apply clinical strategies to make the necessary adjustments. These approaches can enhance the communication skills of AI chatbots and reduce the potential for harmful behaviors.

To make a clear distinction, this work differs from having AI agents act as therapists for humans, which raises various ethical and societal concerns. We also want to emphasize that although we propose to “treat” chatbots with psychotherapy, personifying or anthropomorphizing AI can create unrealistic expectations and overreliance on these systems, potentially leading to unsafe use; that is not our goal. However, the bidirectional influence between AI and medicine warrants future exploration.

The gaps in this research point to directions for future work. One such direction is exploring the use of MARL to facilitate more complex interactions between AI agents. Another is incorporating neuroscience-inspired AI models that account for neurological and psychiatric anomalies to better detect psychopathology in AI systems. Research in these areas can contribute to the development of safer, more trustworthy, and more ethical AI chatbots, ultimately enhancing the potential of AI to benefit society.

