Why Open-Source Projects Need to Address Dependency-related Security Risks
13-02-2023 | By Robin Mitchell
Open-source software and hardware projects are becoming increasingly popular, but their complexity and long supply chains bring new cybersecurity challenges for engineers. With the growing threat of cyber attacks, it is important to understand the security issues posed by dependencies and how future open-source projects can mitigate these risks. PyTorch, a popular open-source machine learning framework for Python, is just one example of the potential benefits and drawbacks of open-source projects at scale. Open-source projects can provide innovative solutions, but they also come with risks that must be carefully considered.
Why is open source becoming increasingly popular?
As technology continues to progress, open-source solutions are becoming increasingly dominant. Agricultural industries that have historically been tied to manufacturer-specific solutions that lock out individual developers are being challenged, software companies are shifting their focus to open-source solutions in an attempt to demonstrate security and privacy, and even large businesses (such as IBM) that built their success on closed-source solutions are now joining the open-source movement.
But why exactly has the open-source movement proven to be a modest success? Many would be quick to suggest that the free nature of open-source solutions makes them popular with those looking to save money, and there is undoubtedly some truth in this. However, the fact that the vast majority of people continue to use paid solutions (such as MS Office over LibreOffice) suggests that cost is not the main motive.
Instead, the rapid growth in open-source projects is more likely to stem from reduced development time, freedom, and the ability to modify existing designs. For example, developing a new IoT product requires a processor platform to be designed, code to be written to drive sensors, and a protocol to be chosen for transmitting data across the internet. While all of this could be entirely custom, the long development time of hardware and software design would make such a project unnecessarily expensive, especially if open-source solutions already exist.
Furthermore, making an IoT product compatible with pre-existing solutions (such as protocols) allows that device to work with products from other manufacturers, thereby providing customers with more flexibility. Thus, a project that would otherwise take three months can be compressed into a matter of days (or even hours), saving time on design, coding, and infrastructure development. In addition, using open-source solutions means that others have often already encountered potential bugs and issues, and the open nature of these projects usually sees such bugs reported. As such, challenges faced during development are more likely to be solvable.
Finally, the use of open-source software can also give customers a great deal of trust. Simply put, making the vast majority of components available for the public to view makes it far harder to hide spyware and other malicious code in a project. Thus, customers can typically trust open-source solutions, especially since security flaws are more likely to be spotted, reported, and fixed.
What challenges do dependencies introduce?
While open-source software and hardware projects may have the benefit of public exposure, the increasingly complex nature of these projects introduces one problem in particular: dependencies.
Generally speaking, open-source software projects are either written from scratch or depend on other libraries, and these libraries will, themselves, usually be open-source. However, an open-source project will typically link to or reference its dependencies instead of including their source code. For example, installing a Python library with pip doesn’t download a single file from a single repository; instead, pip builds a list of dependencies, which are then downloaded and installed individually.
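To illustrate how quickly such a list can grow, the following is a minimal Python sketch that walks the declared requirements of an installed package and collects everything it pulls in, directly or indirectly. It is not how pip itself resolves dependencies; the function names are hypothetical, the name parsing is deliberately simplified, and optional extras are skipped on the assumption that they are not installed by default.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract the bare package name from a requirement string such as 'foo>=1.0'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return match.group(1).lower() if match else None


def installed_requires(package):
    """Return the declared requirements of an installed package (empty if unknown)."""
    try:
        return metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []


def transitive_dependencies(package, get_requires=installed_requires):
    """Return every package name reachable from `package` through its requirements."""
    root = package.lower()
    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        for requirement in get_requires(current):
            # Skip optional extras, which are only installed on request.
            if re.search(r"extra\s*==", requirement):
                continue
            name = requirement_name(requirement)
            if name and name not in seen:
                seen.add(name)
                stack.append(name)
    seen.discard(root)  # a cyclic requirement should not list the root itself
    return seen
```

Calling transitive_dependencies("some-package") on a package installed in the current environment returns a set of names, each of which is a separate download that has to be trusted.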
This introduces a major security risk to modern projects with a long list of dependencies, especially if those dependencies have dependencies of their own. A single dependency that is infected with malware can not only go unnoticed in a project but can also compromise any project that relies on it. One such example of a dependency being compromised was the recent PyTorch attack. PyTorch, a powerful AI tool, requires numerous dependencies, and one of these, called torchtriton, was effectively replaced when a malicious package with the same name was published on the public Python Package Index and installed in place of the genuine one. As such, users who installed nightly builds via pip during that period were exposed to the malware.
While the issue was quickly addressed, the ability for malware to be slipped in unnoticed demonstrates the dangers posed by long dependency chains. This was similar to a case where researchers submitted deliberately flawed patches to the Linux kernel to demonstrate how open-source projects can be abused.
How could future open-source projects deal with dependencies?
Trying to solve this problem is not easy, as dependencies are largely outside the control of the projects that use them. It is possible to pull a specific dependency version and package that with a project, but this may violate licenses that restrict redistribution. Another workaround is to minimise the use of dependencies as much as possible, but this is rarely practical for modern projects.
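Short of bundling dependencies, a project can at least record the exact versions it was tested against and check for drift. The following is a minimal Python sketch of that idea, assuming a hypothetical dictionary of pinned versions; the function names are illustrative. Note that a version check of this kind only detects that something has changed, and does not verify the contents of a package.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_version_drift(pinned, get_version=installed_version):
    """Return {package: (expected, found)} for every package that does not match its pin."""
    drift = {}
    for package, expected in pinned.items():
        found = get_version(package)
        if found != expected:
            drift[package] = (expected, found)
    return drift
```

An empty result means that every pinned package is installed at the recorded version, while any entry in the result is a candidate for manual review before the project is built or shipped.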
Another possible solution is the use of AI-powered tools that can recursively scan code for malware. Detected dependencies would be automatically downloaded and checked, and the recursive nature of such a check would quickly see an entire project scanned for malware. However, developing AI tools to identify malware is not straightforward, especially when new malware may be difficult to recognise.
It is also possible to introduce certification and digital signatures, similar to SSL certificates, whereby a central authority is able to verify the contents of a dependency. Any change to the code would change that source’s checksum, which would immediately invalidate its digital signature. However, the speed at which software is updated could make this challenging to implement. For example, a library that has had malware added and is then published as a new version would, by nature, have a new checksum, and this would simply be registered with the central authority like any other release.
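The checksum half of this idea can be sketched in a few lines of Python using the standard hashlib module. This is only an illustration of how a content change alters a digest: the function names and sample data are hypothetical, and a real scheme would also need the trusted digest to be signed and distributed by a party that users already trust.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def matches_trusted_digest(data: bytes, trusted_hex: str) -> bool:
    """Compare the digest of the data against a digest obtained from a trusted source."""
    return hmac.compare_digest(sha256_hex(data), trusted_hex.lower())


# Hypothetical library source and the digest recorded when it was reviewed.
original = b"def add(a, b):\n    return a + b\n"
trusted = sha256_hex(original)

# The same source with one injected line no longer matches the recorded digest.
tampered = original + b"import os  # injected line\n"

print(matches_trusted_digest(original, trusted))  # True
print(matches_trusted_digest(tampered, trusted))  # False
```

As the paragraph above notes, this only helps if the trusted digest itself is recorded for a clean release; a malicious release registered as a new version would pass the same check.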
Overall, ensuring that all dependencies in a project are free from malware is a challenging task but one that needs to be undertaken by engineers.