Samsung Fab Data Leak: How ChatGPT Exposed Sensitive Information
12-04-2023 | By Robin Mitchell
Recently, employees at Samsung's semiconductor fab were permitted to use ChatGPT, an increasingly popular AI model with numerous capabilities but also clear limitations, to aid in work tasks such as coding, note-taking, and explanations. However, over the course of 20 days, sensitive data ended up stored on ChatGPT's servers on three separate occasions, the result of a lack of due diligence on Samsung's part.
Engineers need to exercise caution when using ChatGPT in the workplace, especially on commercial products: proprietary code submitted to the service may be subject to its terms of service, generated code must be examined carefully to catch mistakes, engineers remain responsible for any errors, and adversarial attacks threaten the security and integrity of AI models. Why has ChatGPT become immensely popular, what exactly happened at Samsung, and why should engineers be cautious of ChatGPT?
Why has ChatGPT become immensely popular?
While the underlying GPT-3 model was released in June 2020, ChatGPT itself only launched in November 2022, and it has been in the past few months that its popularity has skyrocketed. Thousands of articles have been published online exploring the abilities of ChatGPT, the challenges it presents, and how it could change the world, while millions of users interact with the predictive text engine each day to help with daily tasks.
Despite ChatGPT only having knowledge of the world prior to September 2021, it has numerous capabilities, ranging from writing articles to writing code. As ChatGPT has been trained on millions of websites and user interactions, the results it generates are exceedingly impressive and, more often than not, technically accurate.
Of course, ChatGPT has a number of limitations, such as some of its generated responses being somewhat robotic (sentences tend to be short and lack conjunctions), certain responses being blocked (such as those that violate ChatGPT's terms and conditions), and even a degree of scientific and political bias (which undoubtedly arises from the designers' influence over ChatGPT's algorithms). Regardless, this hasn't stopped the rise of ChatGPT in the workplace, and many now turn to it on a daily basis.
Samsung Fab leaks sensitive data over ChatGPT
Recognising the advantages of ChatGPT, Samsung Semiconductor decided to permit workers to use it, as it can be highly beneficial for coding, preparing notes, and even providing simplified explanations. However, it appears that Samsung Semiconductor either failed to perform due diligence on the OpenAI product, failed to inform staff about what can and cannot be submitted to ChatGPT, or both, because a new report shows that Samsung Semiconductor employees have unknowingly submitted sensitive data to the AI language model.
Currently, it is believed that highly sensitive data relating to internal business practices, source code, and even top-secret methods was submitted to ChatGPT on three separate occasions over a period of 20 days. While the employees would not have done this on purpose, the fact that ChatGPT records all conversations and learns from them means this data could leak to other users. For example, if a Samsung Semiconductor employee uploaded a piece of proprietary test code, another user asking how to perform a similar test could be given access to that data.
In another case, a Samsung Semiconductor employee uploaded the transcript of an internal meeting, which included private discussions of internal business operations and plans. Thus, it is also possible for this data to leak to other users who ask questions related to the subject.
In response to these leaks, Samsung has announced that it plans to develop its own internal ChatGPT-like AI service to help employees with daily activities. By doing so, the data used to train the language model will be held internally by Samsung, thereby protecting against potential data breaches. However, until this new AI can be developed, Samsung Semiconductor has limited ChatGPT prompts to 1024 bytes, which prevents long pieces of text and code from being copied and pasted without their contents first being examined.
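To illustrate how such a restriction might be enforced in practice, the sketch below shows a minimal prompt-size guard. This is an assumption-laden illustration, not Samsung's actual mechanism: the `send_to_chatgpt` function is a hypothetical placeholder, and a real deployment would more likely enforce the limit in a network proxy than in client code.

```python
# Hypothetical sketch of a 1024-byte prompt guard, mirroring the limit
# described above. send_to_chatgpt() is a placeholder, not a real API call.

MAX_PROMPT_BYTES = 1024  # limit reportedly imposed by Samsung Semiconductor

def send_to_chatgpt(prompt: str) -> str:
    # Stand-in for a real client; in production, a guard like this would
    # sit in a proxy between employees and the external service.
    return "response"

def guarded_submit(prompt: str) -> str:
    # Measure the UTF-8 encoded size, since the limit is in bytes, not characters.
    size = len(prompt.encode("utf-8"))
    if size > MAX_PROMPT_BYTES:
        raise ValueError(
            f"Prompt is {size} bytes; the {MAX_PROMPT_BYTES}-byte limit "
            "stops long text or code from being pasted unreviewed."
        )
    return send_to_chatgpt(prompt)

# A short prompt passes the guard; a pasted source file would not.
print(guarded_submit("Explain SRAM vs DRAM in one sentence."))
```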
Why should engineers be cautious of ChatGPT?
There is no doubt that ChatGPT offers many benefits, and continued interaction with users helps to improve the quality of its results. However, engineers looking to utilise the power of ChatGPT should exercise extreme caution, especially when working on commercial products.
Firstly, any and all questions sent to ChatGPT are stored on OpenAI servers for the sake of improving future results. This also means that new information presented to ChatGPT, such as proprietary code, could end up being shared with others. While this is purely speculation, it is possible that a clause or condition in the ChatGPT terms of service effectively makes all submitted data available to all users. Thus, it could be difficult to defend the proprietary status of a piece of code in court once it has leaked via ChatGPT.
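One practical mitigation, sketched below under the assumption that prompts can be screened before they leave the company network, is to scan outgoing text for obvious markers of proprietary material. The patterns shown are illustrative examples only, not a complete or recommended policy.

```python
import re

# Hypothetical pre-submission filter: block prompts that appear to contain
# proprietary markers before they reach an external service. These patterns
# are illustrative; a real policy would be far more thorough.
BLOCKED_PATTERNS = [
    re.compile(r"(?i)\bconfidential\b"),
    re.compile(r"(?i)\binternal use only\b"),
    re.compile(r"(?i)\bproprietary\b"),
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),  # leaked credentials
]

def is_safe_to_submit(prompt: str) -> bool:
    # Returns False if any proprietary marker is found in the outgoing text.
    return not any(p.search(prompt) for p in BLOCKED_PATTERNS)

print(is_safe_to_submit("How do I reverse a list in Python?"))  # True
print(is_safe_to_submit("CONFIDENTIAL: fab test sequence v2"))  # False
```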
The second factor that engineers need to keep in mind is that while ChatGPT can write code, this code must be carefully examined before being used in a commercial product. Small mistakes made by ChatGPT can easily go unnoticed (such as boundary conditions in arrays), and these mistakes may not manifest themselves immediately. Even though ChatGPT generated the code, the responsibility for it still falls on the engineers who take that code and use it in a product.
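To show the kind of subtle mistake meant here, the snippet below contains an off-by-one boundary error of the sort that can slip through a quick review of generated code. The example is constructed for illustration and is not taken from any actual ChatGPT output.

```python
# Hypothetical example of a subtle boundary-condition bug that generated
# code can contain. Both functions are meant to return a moving average
# over a window of size n.

def moving_average_buggy(values, n):
    # Off-by-one: range() stops one window short, so the final window
    # (values[-n:]) is silently dropped. Easy to miss in review and in
    # tests that only check the first few outputs.
    return [sum(values[i:i + n]) / n for i in range(len(values) - n)]

def moving_average_fixed(values, n):
    # Correct boundary: there are len(values) - n + 1 valid windows.
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]

data = [1, 2, 3, 4, 5]
print(moving_average_buggy(data, 2))  # [1.5, 2.5, 3.5] -- missing 4.5
print(moving_average_fixed(data, 2))  # [1.5, 2.5, 3.5, 4.5]
```

Both versions run without error on short inputs, which is exactly why such bugs may not manifest until a product is in the field.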
Finally, AI models like ChatGPT can be vulnerable to adversarial attacks and manipulation. Simply put, because the responses they generate are shaped by user interactions, any bias or malicious input in those interactions can skew the resulting output. Engineers should therefore take measures to ensure the security and integrity of the AI models they rely on, preventing malicious actors from exploiting them.