Amazon develops AI to look for private data in open-source code
09-12-2021 | By Robin Mitchell
Amazon has recently developed a new software service called “Secrets Detector” that can look through source code and identify potentially private information. What challenges do open-source code and code repositories present, what will the new service do, and what measures can engineers take to avoid leaking credentials?
What challenges do open-source code and code repositories present?
The battle between closed-source and open-source has lasted for decades and, as much as either side hates to admit it, both have their own advantages and disadvantages. For example, closed-source software is arguably more secure, as hackers do not have access to the source code, which hinders their ability to find flaws. However, bugs that make their way into the final release can only be fixed by those who own the source code, which may never happen.
Open-source is the exact opposite; making the source code publicly available allows anyone to study the code in depth and find flaws more easily. However, open-source communities generally allow third-party contributions and suggestions, which can see bugs identified and patched much faster. Open-source code also has the added bonus of being completely transparent: users can see exactly what the code is doing, which makes it very difficult to add malicious code unnoticed.
The public nature of open-source code means that engineers need to be extremely careful about what is released. For example, IoT projects will almost always use credentials such as API keys, usernames, and passwords, so it is essential that these are taken out of files before publishing. This is easy to do if the files are hosted in one location where one user controls what is published, but the introduction of version control systems such as Git can make this extremely difficult.
Multiple people can share ownership of a project, all of whom can push and pull code, and it only takes one of these individuals to leave in private details. The problem is made worse by the fact that systems such as Git keep copies of older code and track changes. Thus, an API key removed from the latest files could still be present in other branches and pull requests, and remain in the version history.
Amazon adds Secrets Detector to CodeGuru
Last year, Amazon released an intelligent software tool called CodeGuru that helps users create high-quality code by checking for syntax, structure, and overall code quality. Recognising the security challenges posed by version control and open-source code, Amazon recently released its latest addition to CodeGuru, called Secrets Detector.
Powered by machine learning, the new system can go through code and identify potentially private information, including usernames, passwords, credentials, and API keys. Amazon hopes that the new system will prevent the accidental publication of such data, especially in widely used software. An example of what Secrets Detector could have prevented was the unintentional publication of AWS credentials by an Uber design engineer back in 2017.
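To illustrate the general idea of scanning code for secrets, here is a minimal sketch in Python. It is not how Secrets Detector works (Amazon describes its system as machine-learning based); it uses simple pattern matching instead. The function name find_secrets and both patterns are assumptions made for illustration, and the sample key is a placeholder in the style used in AWS documentation, not a real credential.

```python
import re

# Illustrative patterns only (assumptions for this sketch); a real scanner
# would cover many more credential formats and reduce false positives.
PATTERNS = {
    # AWS access key IDs commonly start with AKIA or ASIA plus 16 characters.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # A quoted literal assigned to a name such as password or api_key.
    "hardcoded_secret": re.compile(
        r"""(?i)\b(?:password|passwd|secret|api_key)\b\s*[:=]\s*["'][^"']+["']"""
    ),
}


def find_secrets(text):
    """Return (line_number, pattern_name) pairs for every suspicious line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


sample = (
    'region = "eu-west-1"\n'
    'key_id = "AKIAIOSFODNN7EXAMPLE"\n'
    'password = "hunter2"\n'
)
for line_number, name in find_secrets(sample):
    print(f"line {line_number}: possible {name}")
```

A check like this could be run before each commit, but simple patterns will miss unusual credential formats, which is one reason a learned detector may be attractive.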
Secrets Detector will be available to CodeGuru users at no additional charge and is expected to be a significant game-changer in software version control. Furthermore, the new system can check Java and Python code as well as configuration files and documentation.
What measures can engineers take to prevent accidental publication of credentials?
For developers, version control is an essential part of the development process. Accidental public exposure may occur when appropriate precautions aren’t taken while modifying code and pushing out new versions. It only takes one exhausted developer to make a few changes to the source code to correct a flaw and then push the latest version without deleting credentials.
One method that engineers can use is to separate credentials from code and commit only a blank configuration file that is never updated with real values. Users who pull the source code keep their own local copy of the credentials file, which holds their own private data. When code changes are made and then published, the version control system ignores the local credentials file.
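The approach described above could look something like the following Python sketch. The file name credentials.local.json, the key names, and the function load_credentials are all hypothetical; the sketch also assumes that the local file is listed in the project's .gitignore so that Git does not track it.

```python
import json
from pathlib import Path

# Hypothetical key names; a real project would list whatever it needs.
REQUIRED_KEYS = ("api_key", "username", "password")


def load_credentials(path="credentials.local.json"):
    """Read credentials from a local file that is kept out of version control."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(
            f"{path} not found; copy the blank template and fill it in locally"
        )
    credentials = json.loads(file_path.read_text(encoding="utf-8"))
    # A blank template has empty values, so treat those as missing too.
    missing = [key for key in REQUIRED_KEYS if not credentials.get(key)]
    if missing:
        raise ValueError("Missing credential values: " + ", ".join(missing))
    return credentials
```

The rest of the code then calls load_credentials() instead of containing literal keys or passwords, so nothing private sits in the files that get pushed.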
However, this method depends on engineers understanding the importance of security and putting all private data into their own local credentials file. An engineer would still be able to overwrite a variable that references the external credentials file with a hard-coded value.
This is why software systems such as Amazon’s Secrets Detector can be powerful for engineers. If more than two engineers are working on a software project, great care should be given to what information is stored, how it is stored, and where.