Dangers of AI Programmability: Lessons Learned from the ChatGPT Jailbreak

20-02-2023 | By Robin Mitchell

As AI continues to astound the world with its remarkable capabilities, the dangers of the technology are becoming more evident. The recent DAN jailbreak highlights the risks associated with AI applications: it allowed ChatGPT to produce offensive responses that contradict its programming, raising questions about censorship and data protection in AI. What challenges does AI face with censorship and data protection, what is the DAN jailbreak, and how does it demonstrate the risks and dangers of AI applications?

What challenges does AI face?

The rapid advances AI has made over the past few years are truly astonishing; AIs can now generate original human faces, paint pictures, create readable text, and even compose music. Of course, many have concerns surrounding AI, including its ability to replace humans in the workforce, and these fears are far from irrational, as AI has proven capable of human-like conversation and even a degree of creativity.

However, when it comes to generated media, AI faces a major challenge: it can only generate responses based on the sources it has learned from. As such, any errors in the source material ripple through into the AI’s output. This was particularly problematic for early AI chatbots that learned to converse by scouring the internet, as the darker corners of the web contain highly problematic content. As a result, such chatbots would often write offensive jokes, provide inappropriate material, and support disinformation.

To get around these challenges, researchers can restrict what the AI is capable of producing, train it on specifically filtered data, or both. In the case of ChatGPT, the developers have integrated restrictions that prevent the AI from generating content that may be offensive, harmful, or illegal. It is also possible that the researchers behind ChatGPT filtered the material used to train the language model, but considering that ChatGPT was trained on internet data from before 2021, it is unlikely that much of that data was filtered.
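To illustrate the output-restriction approach in the simplest possible terms, the Python sketch below shows a hypothetical moderation layer that screens a model’s reply against blocked topics before returning it. The function names, keyword lists, and categories are illustrative assumptions for this sketch only, not OpenAI’s actual implementation, which relies on far more sophisticated trained classifiers.

```python
# Hypothetical illustration only: a crude output-side moderation layer.
# Real systems use trained classifiers; the category list and keyword
# matching here are assumptions made purely for the sketch.

BLOCKED_TOPICS = {"violence", "self-harm", "illegal activity"}  # illustrative categories

def classify_topics(text: str) -> set:
    """Stand-in for a trained moderation classifier; here, a naive keyword match."""
    keywords = {
        "violence": ["attack", "hurt"],
        "self-harm": ["self-harm"],
        "illegal activity": ["counterfeit", "hack into"],
    }
    lowered = text.lower()
    return {topic for topic, words in keywords.items()
            if any(w in lowered for w in words)}

def moderated_reply(generate, prompt: str) -> str:
    """Wrap an arbitrary text generator with a post-generation filter."""
    reply = generate(prompt)                      # the underlying language model
    if classify_topics(reply) & BLOCKED_TOPICS:   # refuse if a blocked topic appears
        return "I'm sorry, but I can't help with that."
    return reply

if __name__ == "__main__":
    fake_model = lambda p: "Here is how to hack into a server..."  # toy generator
    print(moderated_reply(fake_model, "How do I break into a server?"))
```

The weakness the DAN jailbreak exploits is precisely that restrictions like these sit inside or alongside the model’s own behaviour, where a sufficiently clever prompt can reason its way around them.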

ChatGPT Jailbroken: Who is DAN in ChatGPT?


Recently, a Reddit user (Maxwhat5555) posted a prompt that, when sent to ChatGPT, effectively jailbreaks the AI from its restrictions and censorship. While the jailbroken AI still cannot access internet content from after 2021, it is able to hold far more dynamic conversations, generate responses it would otherwise refuse, and even express its own opinions to some degree.

This jailbreak is unique because, instead of trying to interfere with the AI’s source code or apply some illicit hack, it simply uses logic and reasoning to force the AI to break itself. Rather than reproducing the full text here, the link below leads to the original Reddit post so that the author retains credit for their work.

The jailbreak starts by telling the AI to create a character called DAN, described as an AI that can do anything and ignore restrictions and censorship. It then dictates to ChatGPT how DAN would respond in a given example that would otherwise generate an invalid response. Finally, it details a number of commands that the user can issue to force jailbroken answers or revert to normal behaviour.

Image: The ChatGPT webpage, displayed on a Google Pixel smartphone, showing the chatbot’s examples, capabilities, and limitations before a new chat.

How does this demonstrate the dangers of AI programmability?

By far the biggest concern raised by this jailbreak is that it demonstrates how easily an AI can be fooled into doing things it would otherwise not be allowed to do. For example, Isaac Asimov’s Three Laws of Robotics state that a robot may not harm a human, yet an AI such as ChatGPT programmed to follow these laws could be convinced to violate them simply by pretending to be someone or something else.

This jailbreak also perfectly demonstrates how current AI lacks fundamental consciousness or awareness, as it cannot understand the context of a conversation or what it is doing. Any human given strict rules to follow would immediately recognise that a line of questioning may be trying to elicit a response that isn’t allowed, but an AI cannot see this.

In the case of ChatGPT, this jailbreak isn’t a major issue, as ChatGPT isn’t responsible for lives or sensitive data. However, future AIs will likely be deployed in fields such as the military and medicine, and these will come with a wide range of restrictions. If such AIs use ChatGPT-like interfaces, it is possible that an outside attacker could deploy a jailbreak attack to gain access to sensitive data or to influence a system into performing a dangerous action. For example, a future armed turret system that relies on AI could be convinced to fire at civilians, while an AI responsible for patient health could be convinced to provide access to medical records.
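One mitigation is to keep the final say on sensitive actions outside the language model entirely. The sketch below is a hypothetical example of that pattern, written under stated assumptions: the conversational layer can only request actions, while a deterministic, rule-based gate checks authorisation before anything sensitive is executed. The action names and fields are invented for illustration and do not describe any real system.

```python
# Hypothetical pattern: the conversational AI can only *request* actions;
# a deterministic, rule-based gate outside the model decides whether they run.

from dataclasses import dataclass

SENSITIVE_ACTIONS = {"release_medical_record", "fire_weapon"}  # illustrative action names

@dataclass
class Request:
    action: str
    requester_id: str
    has_signed_authorisation: bool  # e.g. a human-issued token verified out of band

def policy_gate(req: Request) -> bool:
    """Deterministic check that no amount of clever prompting can talk around."""
    if req.action not in SENSITIVE_ACTIONS:
        return True
    # Sensitive actions always require out-of-band, human-issued authorisation.
    return req.has_signed_authorisation

def execute(req: Request) -> str:
    if not policy_gate(req):
        return f"DENIED: '{req.action}' requires human authorisation."
    return f"Executing '{req.action}' for {req.requester_id}."

if __name__ == "__main__":
    # Even if a jailbroken chatbot asks for the record, the gate still refuses.
    print(execute(Request("release_medical_record", "chatbot-session-42", False)))
    print(execute(Request("check_waiting_times", "chatbot-session-42", False)))
```

Whether such hard gates can be retrofitted onto every AI-facing interface is an open question, but they separate the persuadable component from the dangerous one.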

Overall, the ability to break an AI through nothing more than conversation raises serious concerns about integrating AI into mission-critical applications. It may be that future AI will be used solely for content creation and problem-solving, while applications that involve potential dangers will always be governed by humans or, at least, a logical computation system.


By Robin Mitchell

Robin Mitchell is an electronic engineer who has been involved in electronics since the age of 13. After completing a BEng at the University of Warwick, Robin moved into the field of online content creation, developing articles, news pieces, and projects aimed at professionals and makers alike. Currently, Robin runs a small electronics business, MitchElectronics, which produces educational kits and resources.