✍️ Founder’s Desk column by Abhishek Gupta, Founder and Principal Researcher at the Montreal AI Ethics Institute.
In a potentially prescient comment a few months ago, I shared thoughts with Kyle Wiggers (TechCrunch) on some of the ethical challenges posed by AI systems like Stable Diffusion as they become more widely available:
“We really need to think about the lifecycle of the AI system which includes post-deployment use and monitoring, and think about how we can envision controls that can minimize harms even in worst-case scenarios,” he said. “This is particularly true when a powerful capability [like Stable Diffusion] gets into the wild that can cause real trauma to those against whom such a system might be used, for example, by creating objectionable content in the victim’s likeness.”
— Abhishek Gupta (TechCrunch)
Now, some of the worst fears have come to pass. The Internet has turned its collective faculties toward surfacing the worst inclinations of those who seek to corrupt the potential of any new piece of technology.
In a short-lived (?) Discord server called Unstable Diffusion, Internet users inclined to use the open-source (OSS) version of Stable Diffusion for nefarious purposes wreaked havoc by generating NSFW images. They fed the system prompts that produced material that would be banned instantly on platforms such as Instagram and Reddit. Such servers become hotbeds where problematic content accumulates in a single place. They both showcase the capability of these systems to generate this type of content and connect malicious users with one another so they can hone their “skills” in generating it.
Users crafted prompts with the explicit goal of triggering NSFW outputs. Some exchanged tips on subreddits on how to (a) get higher-quality images aligned with their malicious goals, (b) use different prompting strategies to circumvent some of the safety controls, and (c) preserve these outputs and model artifacts in case the server got banned.
My examination of this egregious behavior is based on studying Reddit communities whose broader membership is interested in the safe and ethical use of such systems, and not on the Discord server itself. Even so, three key issues stood out to me: consent and intellectual property rights (IPR), trauma, and the broader state of society. At the end of the article, I also offer suggestions for those building or using these systems on preventing and mitigating negative outcomes. These include paying attention to combinatorial outputs, anticipating the subversion of safety controls, and investing in accepting feedback from the community.
Consent and IPR
Consider cases where the model is further fine-tuned on images of a victim or target to produce targeted outputs, for example by taking pictures of an individual and then generating outputs in their likeness. Here there are clear consent issues given the NSFW nature of the outputs. These concerns have been discussed before in the context of deepfakes, and generative systems only exacerbate them. They can produce endless variations that subvert hash-based filtering. They also escape the scaling constraints that, compared to generative models, limit how many outputs deepfake techniques can practically produce.
Another issue is the blatant violation of IPR when such a system is trained on the work of artists whose consent was never explicitly obtained and who are not even offered an opt-out mechanism. When such a system is used to generate NSFW outputs, there is an additional moral hazard. Artists who would never want their work associated with this genre of content can become collateral damage once malicious users realize that including certain artists’ names in their prompts leads to “better” outputs (e.g., “<NSFW prompt text> in the style of <artist name>”).
Trauma
Those targeted by these outputs face trauma with many similarities to that caused by deepfakes. There is an additional dimension here: deploying these systems severely limits the agency of those who are targets or victims. Because a wide variety of outputs can be generated, each highly targeted at the individual, the victim must engage in constant debunking. Some generated images may even appear plausible given the places the victim visits and the activities they engage in. No recourse is available to victims, at least at the moment. There is little they can do to confront the malicious actors or remove the offensive content once it has been generated and begun to spread.
Where such outputs flood community forums and other publicly accessible venues, their proliferation inflicts trauma on those who frequent those places. It also adds to the burden on content moderation teams, who are themselves traumatized as they review and remove offensive content. This content is highly variable and can easily evade existing defenses because of its novelty. One can only imagine the additional influx these teams will need to review before it spreads across the online ecosystem.
State of society
Ultimately, what does this say about society in general, especially now that such new systems give us powerful capabilities? Nefarious uses arise just as quickly as positive ones. These technologies have global reach through the Internet, and access has been democratized via OSS implementations, GUI-based applications, and the like. As a result, the scale and pace of their impact far exceed those of earlier releases of dual-use technologies.
How we interact with these technologies says as much about us as it does about the technology itself and those who created it. There will always be a spectrum of people with good and bad intentions who have access to these technologies. Weighing positive against negative outcomes therefore becomes even more important given the sensitive nature of the outputs and the severity of harm that can arise from unintended and malicious use of these capabilities. Such misuse also sets a bad precedent for upcoming generations, who will inhabit a world where co-creating with AI systems is the norm. The mechanisms we develop during this crucial period will significantly shape the path that future governance of these systems takes.
What can we do?
Of course, it is easy to point out everything wrong with the current state of affairs. But “with great power comes great responsibility,” and I think there are a few actions we can take to improve the state of the ecosystem and how these technologies are used:
We need to pay more attention to combinatorial outputs from these systems. For example, exposing an AI system as a Discord chatbot with a thriving community around it increases the system’s accessibility. It also creates an ecosystem where both positive and negative outputs can be rapidly iterated on through feedback from other community members. Assessing the ethical and safety impacts will therefore require thinking not just about the technology itself. We must also consider how it might be used in combination with other technologies such as forums and APIs.
We also need to think about how safety controls might be subverted. An API-mediated version of a system may carry controls that prevent misuse. But if an OSS version also exists, people can tinker with it to disable those controls and exchange tips on bypassing the mechanisms that stop them from using the system for unintended purposes.
Finally, the companies developing and releasing these systems should invest in accepting feedback from the community and from those who might be differentially affected by the outputs and use of such systems. Doing so can help us create a safer, better ecosystem for all while making the best use of the powerful capabilities this new wave of technology has to offer.
To learn more about my work, you can visit: https://atg-abhishek.github.io