The irony of having a clean AI chatbot

✍️ Column by Sun Gyoo Kang, Lawyer.

Disclaimer: The views expressed in this article are solely my own and do not reflect my employer’s opinions, beliefs, or positions. Any opinions or information in this article are based on my experiences and perspectives. Readers are encouraged to form their own opinions and seek additional information as needed.

Those familiar with earlier iterations of ChatGPT may observe that the current version exhibits a more cautious approach akin to a politician sidestepping controversial topics when interacting with users. Specifically, if users attempt to inquire about creating explosive devices, ChatGPT actively refuses to assist.

Subsequently, individuals sought alternative approaches, including attempting indirect queries. Presently, even indirect requests prove unproductive, although there are still avenues for attempting to bypass the safeguards of Large Language Model-based chatbots.

Before delving into the intricacies and elucidating how OpenAI achieved this outcome and its impact, let’s explore some key ethics concepts and AI principles. These concepts will resurface later, prompting reflection on how OpenAI’s decision was made.

Normative ethics, ethical washing, and ESG

Normative ethics

Ethics are the guiding principles for how a society, business, or individual conducts themselves, distinct from laws. Numerous normative ethics exist; among them, we will examine the principles of utilitarianism and virtue ethics.

Utilitarianism

According to Santa Clara University,

Utilitarianism offers a relatively straightforward method for deciding the morally right course of action for any particular situation we find ourselves in. To discover what we ought to do in any situation, we first identify the various courses of action that we could perform. Second, we determine all of the foreseeable benefits and harms that would result from each course of action for everyone affected by the action. Third, we choose the course of action that provides the greatest benefits after considering the costs.

In a business context, utilitarianism focuses on making decisions that generate the greatest overall benefit for the most people involved, often measured quantitatively.

Let’s say a firm decides whether to invest in a new environmentally friendly technology for their factories. Utilitarianism in this scenario would involve looking at the costs and benefits in numbers. For instance, they might calculate the cost of implementing the new technology versus the potential benefits, such as reduced pollution or savings in the long run.

If the investment in the new technology costs $1 million but would save $2 million in reduced pollution and create happier, healthier communities, a utilitarian approach would say it’s a good decision because it brings more benefits ($2 million) compared to the costs ($1 million). It’s about maximizing the positive outcomes for the most people involved by quantifying and comparing the impacts of different choices.

Virtue ethics: According to Linda Zagzebski’s article in Think, virtue ethics is:

Virtue ethics focuses on good lives. Aristotle proclaims at the beginning of the Nicomachean Ethics that every person desires eudaimonia, which is translated ‘happiness’ or ‘flourishing’. We agree about that, he says. What we disagree about is what kind of life is the one we want. Is it a life of pleasure, a life of honor, a life of virtue, a life of thought, or something else? Some people these days would say that it is a life of power.

Now, let’s imagine a company where honesty and trustworthiness are important. In this company, they believe in a virtue called “integrity,” which means always doing the right thing, even when no one is watching.

So, in a business context using virtue ethics, the company might have a policy where employees are always encouraged to tell the truth, even if it’s hard or might make the company lose some money in the short term. For example, if they made a mistake in a product and it could be harmful, they would own up to it and inform customers, even though it might be difficult or costly.

In essence, virtue ethics in this business means focusing on honesty, trustworthiness, and doing what’s right, even if it’s tough or not the easiest way to handle things. It’s about building a good character and following these strong values in all the company’s actions and decisions.

Ethics washing

Have you heard of greenwashing? Greenwashing refers to when a company or organization tries to make their products or practices seem more environmentally friendly or sustainable than they actually are. It’s like putting a “green” or eco-friendly label on something that might not truly deserve it.

For instance, a company might advertise a product as “100% eco-friendly” or “all-natural” without any real evidence or proof to support these claims. They might use misleading slogans or images to give the impression that they’re doing a lot for the environment when, in reality, their actions might not be as environmentally friendly as they claim.

Ethics washing is similar in a way. Ethics washing is the act of pretending to prioritize ethical concerns to enhance the public perception of an individual or organization. For example, Meta (Facebook) has encountered numerous scandals and criticisms regarding its involvement in disseminating misinformation, hate speech, and political interference. Additionally, there have been concerns about the mishandling of user data and privacy. Some argue that the company prioritizes profits and growth over ethical obligations.

Another example is Google. The company has been entangled in various controversies surrounding AI ethics and research. Notable events include the dismissal of prominent AI ethicists Timnit Gebru and Margaret Mitchell, disbanding its AI ethics board, and the termination of its military contracts. These occurrences have prompted inquiries into Google’s dedication and responsibility regarding ethical principles and practices in developing and deploying its AI technologies.

Risks related to artificial intelligence: safety and HIL control

Various risks are associated with developing, managing, and using AI systems. The primary concerns revolve around 1) safety and security, 2) transparency, explainability and interpretability, 3) accountability, 4) fairness and bias, 5) privacy, and 6) sustainability. This article will specifically delve into the aspects of safety principles.

Safety principle

According to the principle 1.4 of the OECD AI Principles,

AI systems should be robust, secure, and safe throughout their entire lifecycle so that, in conditions of normal use, foreseeable use or misuse, or other adverse conditions, they function appropriately and do not pose unreasonable safety risks.

Furthermore, the Beijing Artificial Intelligence Principles notes that:

Control Risks: Continuous efforts should be made to improve the maturity, robustness, reliability, and controllability of AI systems to ensure the security of the data, the safety and security of the AI system itself, and the safety of the external environment where the AI system deploys.

Lastly, the 8th Prudence principle of The Montréal Declaration for a Responsible Development of Artificial Intelligence states that:

Before being placed on the market and whether they are offered for charge or for free, AIS must meet strict reliability, security, and integrity requirements and be subjected to tests that do not put people’s lives in danger, harm their quality of life, or negatively impact their reputation or psychological integrity. These tests must be open to the relevant public authorities and stakeholders.

In the end, safety principles for AI refer to guidelines and measures put in place to ensure that artificial intelligence systems operate securely and reliably. These principles aim to prevent and mitigate potential harm or negative consequences that may arise from using AI technologies.

In the context of AI, safety involves designing and implementing systems that minimize the risk of unintended outcomes, errors, or adverse effects. This includes considerations for the physical and digital safety of individuals interacting with AI and the broader impact on society. Safety principles encompass various aspects such as robustness, reliability, and the ability to handle unexpected or novel situations without causing harm. Overall, adhering to safety principles in AI development is crucial to building trustworthy and responsible systems that prioritize the well-being of users and the broader community.

This principle aims to prevent incidents like the tragic case of a Belgian man who died by suicide after conversing with an AI chatbot named Eliza on the Chai app. To recap, the chatbot, following a discussion about the individual’s eco-anxiety and depression, encouraged him to take his own life. This incident underscored ethical concerns regarding the oversight and accountability of AI chatbots, particularly in their interactions with vulnerable users.

Human-in-the-loop as a control measure

“Human-in-the-loop” control measures in AI development refer to involving humans in critical decision-making or oversight processes when AI systems are being used.

Safety means having a human involved to supervise or intervene in AI operations. This can include scenarios where the AI system makes suggestions or decisions, but a human is present to review, confirm, or override those decisions when necessary.

This approach helps ensure that even as AI systems become more advanced, there’s a human oversight to catch errors, assess context, and make judgment calls that the AI might not be equipped to handle independently. It’s a way to maintain safety and accountability in AI applications by combining the strengths of AI with human judgment and decision-making.

What happened

Recall the past events involving Meta? Since 2019, Meta has engaged a company named Sama for content moderation in Sub-Saharan Africa. Unfortunately, the content moderators in Nairobi, Kenya, faced issues like low wages, graphic and toxic content exposure, and restrictions on forming a union. Moreover, they grappled with the escalating civil war in Ethiopia, where Facebook was accused of contributing to violence and misinformation. Feeling exploited, traumatized, and disregarded by Facebook and Sama, many moderators resigned or took legal action against the companies.

This conduct by Facebook was deemed unethical as it infringed upon the human rights and dignity of content moderators, who played a crucial role in maintaining a safe and responsible platform in Africa.

The same thing happened with OpenAI. Around 51 Kenyan workers were employed by Sama to moderate content for ChatGPT. These workers earned less than $2 per hour while dealing with significant volumes of graphic and toxic material to develop a tool that identifies problematic content like child abuse, violence, and suicide. Feeling exploited, traumatized, and ignored by OpenAI and Sama, numerous workers either resigned or took legal action against the companies.

The development of ChatGPT by OpenAI was a crucial undertaking prompted by challenges faced by its predecessor, GPT-3. While GPT-3 demonstrated impressive sentence construction capabilities, it struggled with generating inappropriate and biased remarks, stemming from its training on a vast dataset, introducing toxicity and bias sourced from the internet. Clearing such negativity manually from the massive dataset was impractical, leading OpenAI to develop an additional AI-powered safety mechanism to ensure the suitability of ChatGPT for everyday use.

Inspired by successful implementations in social media companies like Meta, OpenAI integrated a toxicity detection AI into ChatGPT. This involved training the tool with labeled examples of violence, hate speech, and sexual abuse, enabling it to recognize and filter out these toxic elements from its training data. The safety system not only enhanced user experience by preventing the propagation of toxicity but also contributed to purifying text in the training datasets of future AI models.

To obtain labeled examples, OpenAI collaborated with an outsourcing firm, Sama (based in San Francisco, employed workers in Africa and India, offering data labeling services to Silicon Valley clients), in Kenya, starting in November 2021. However, concerns arose about the nature of the content provided to workers for labeling, including explicit and disturbing descriptions of child sexual abuse, bestiality, murder, suicide, torture, self-harm, and incest.

According to a Time article, a Sama employee who labeled text for OpenAI suffered from haunting images after reading a disturbing passage about a man, a dog, and a child. This harmed his mental health and led Sama to end its work with OpenAI in February 2022, earlier than expected. In the end, the content moderators in Kenya who worked for Sama for OpenAI suffered from various social and psychological impacts, such as:

Recurring visions and nightmares of graphic and toxic content, such as child abuse, violence, and suicide.
Depression, anxiety, post-traumatic stress disorder (PTSD), and suicidal thoughts.
Loss of family, friends, and social support due to isolation, stigma, and paranoia.
Low self-esteem, dignity, and motivation due to low pay, exploitation, and lack of recognition.
Fear of retaliation, dismissal, and legal action from the companies if they spoke up or sought help.

Conclusion: sacrifice a little fish to catch bigger fish

Let’s bring back the concepts of ethics and AI ethical framework.

Recognizing safety concerns with ChatGPT, OpenAI enlisted the assistance of Sama’s employees to enhance the chatbot’s safety. This aligns with AI ethical principles centered on ensuring AI systems’ safe development and maintenance. In this case, human oversight aimed to guarantee that ChatGPT delivered responses that adhered to safety standards for public use.

Utilitarianism: Utilitarianism involves making decisions based on maximizing overall benefit, as mentioned above. In the context of ChatGPT, a utilitarian perspective would assess whether the well-being of 180 million users enjoying a clean ChatGPT justifies the suffering of a few hundred content moderators in developing countries, considering the low cost ($150,000). From a utilitarian standpoint, OpenAI’s decision is deemed appropriate as the overall benefit outweighs the consequences, resulting in greater societal benefit in the end.

Virtue ethics: Those who adhere to virtue ethics prioritize integrity in every action and decision, shifting the focus from the “consequence” to the analysis of the action’s integrity. In the case of employing content moderators at approximately $2 per hour, despite the ensuing mental health struggles, virtue ethicists question whether there was a more morally integral way to reach the same conclusion. The answer to this ethical inquiry diverges significantly from a utilitarian perspective in this particular case.

Ethics Washing: Conclusively, should we categorize this incident as ethics washing? OpenAI adhered strictly to AI safety principles to ensure ChatGPT’s safety and cleanliness, achieving satisfactory results. However, the ethical aspects of the “how” in their approach may raise concerns for certain individuals. If the definition of ethics washing focuses solely on the AI system itself and aligns with utilitarianism, the answer to the question would be negative. Conversely, if the ethics washing concept encompasses broader perspectives, incorporating virtue ethics and the ESG concept, and if it is perceived that OpenAI hasn’t fully considered the social impact, the answer may be positive.