We’re living in an open source world. If you’re a developer today, it’s very likely that – no matter where you work or what type of applications you build – you rely at least in part on open source. Indeed, the Linux Foundation reported in 2018 that 72 percent of organizations use open source software in one way or another, and that more than half actively incorporate open source code into their commercial products.
It’s easy to understand why open source is so pervasive. By importing open source libraries, extensions, and other resources into applications, developers save themselves from having to reinvent the wheel. They can reuse code that others have already written, which frees up more time to write innovative features that don’t yet exist.
Yet open source can have its downsides. In order to leverage open source responsibly and avoid the security and compliance challenges that often accompany open source code, developers need full visibility into the open source software they use and the risks associated with it.
To provide guidance on that point, this article walks through the most common risks of incorporating open source code into a larger codebase. It also identifies best practices for working with open source code.
When it comes to security, open source code varies tremendously. Some open source projects, like the Linux kernel, maintain very high security standards (although even they sometimes let vulnerabilities slip by). Others, like random tools on GitHub that were written by CS students for a class assignment, don’t always set the bar so high.
This doesn’t mean that open source software is always insecure. Sometimes, it’s very secure. But sometimes, it’s not very secure at all.
This inconsistency presents a major challenge for developers. It means they have to vet the security of third-party open source code on a case-by-case basis. Although open source advocates like to claim that “given enough eyeballs, all bugs are shallow,” and that vulnerabilities are therefore likely to be discovered and fixed quickly by the open source community, the fact is that the number of eyeballs on a given open source codebase can vary significantly. The less attention an open source project gets, and the less experienced its developers, the more likely it is to have low security standards.
Sometimes, it’s hard to vet the security of open source code because it’s unclear where the code originated in the first place.
For example, you might find a source code tarball on a website that offers little information about who wrote the code. Maybe there is a README inside the tarball that attributes the code to someone, but you have no way of verifying that attribution.
Or, perhaps you clone code from a git repository, assuming that the maintainer of the repo is the code’s author. But the repo could contain code that actually originated somewhere else, and was simply copied into the repo you use. Here again, any documentation files that mention authorship are difficult to verify.
Being unsure where code originated matters because it’s easier to trust code when you know it was written by experienced, well-intentioned developers. Obviously, you should scan the code for vulnerabilities either way, but you can make better decisions about whether or not to use third-party code if you have confidence in where it came from.
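One concrete way to build that confidence is to verify a downloaded tarball against a checksum published on the project’s official site, assuming the project publishes one (many do, often alongside a GPG signature). Here is a minimal sketch in Python; the file name and digest in the usage comment are placeholders, not real values:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 16):
    """Hash the file in chunks so large tarballs don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_checksum(path, expected_hex):
    """Compare a local file's digest to the SHA-256 the project publishes
    alongside its release artifacts."""
    return sha256_of(path) == expected_hex.lower()

# Hypothetical usage -- the path and digest below are placeholders:
# matches_published_checksum("project-1.2.3.tar.gz", "3a7bd3e2...")
```

Note that a checksum only proves the file you downloaded matches the file the project published; it says nothing about the quality of the code itself. For stronger provenance guarantees, prefer projects that sign their releases.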
Developers have a tendency to think they’re experts in open source licenses. But most are not. They often misunderstand licensing terms and hold false beliefs, such as “the GPL says you can’t sell your code for money” or “under the MIT license, you can do whatever you want as long as you attribute the original developers.”
Misconceptions like these create risk in the form of non-compliance with open source licensing terms. If developers don’t understand the specific requirements of the licenses that govern each of the various open source components they use, they may violate licensing agreements. And while it’s easy to assume that no one will actually sue you for breaking an open source license, such lawsuits are more frequent than you may think.
Complicating matters is the fact that open source projects occasionally change their licensing terms – as Elastic famously did in 2021, for example. This means that it’s not enough to determine the licensing requirements of open source code the first time you use it. You need to reevaluate every time the code changes versions.
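Catching a relicensing like this is straightforward if you record each dependency’s declared license per version and diff the record on every upgrade. A minimal sketch, assuming you track that metadata yourself (the version/license pairs in the example are illustrative, not exact package metadata):

```python
def license_changes(history):
    """Given a mapping of version -> declared license, in release order,
    return (version, old_license, new_license) for each point of change."""
    changes, prev = [], None
    for version, lic in history.items():
        if prev is not None and lic != prev:
            changes.append((version, prev, lic))
        prev = lic
    return changes

# Illustrative only -- approximates the Elastic relicensing mentioned above:
history = {
    "7.10.2": "Apache-2.0",
    "7.11.0": "Elastic-2.0 OR SSPL-1.0",
}
for version, old, new in license_changes(history):
    print(f"license changed at {version}: {old} -> {new}")
```

Running a check like this in CI means a license change surfaces as a build-time warning rather than a surprise during a compliance audit.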
The risks identified above are not a reason to avoid open source. When managed responsibly, open source provides developers a range of benefits that outweigh the risks.
To keep the risks in check, consider the following best practices for working with open source code:
- Download code from the project’s website or from GitHub repos that are linked from the project’s site. This is better than pulling code from random GitHub repositories, where there is a risk that a seemingly legitimate repo actually contains vulnerability-ridden code.
- Always scan your code, no matter who wrote it or how certain you are of its origin.
- Continuously validate the licensing terms of all open source components that you use.
- Collaborate with your legal team to verify that your practices surrounding open source code actually comply with licensing terms. Don’t assume your developers “just know” what the licenses require.
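To make the licensing checks above continuous rather than one-off, you can compare each dependency’s declared license against an allowlist your legal team has approved. The following Python sketch reads the `License` field that packages declare in their metadata; the `APPROVED` set is a hypothetical policy placeholder, not a recommendation:

```python
from importlib.metadata import distributions

# Hypothetical allowlist -- which licenses are acceptable is a policy
# decision for your legal team, not a technical one:
APPROVED = {"MIT", "BSD-3-Clause", "Apache-2.0"}

def flag_unapproved(licenses, approved=frozenset(APPROVED)):
    """Return {package: license} for entries whose declared license
    is missing or not on the approved list."""
    return {name: lic for name, lic in licenses.items() if lic not in approved}

def installed_licenses():
    """Read the declared 'License' field from each installed distribution.
    Many packages leave this field blank and rely on classifiers instead,
    so treat None as 'needs manual review', not 'unlicensed'."""
    return {dist.metadata["Name"]: dist.metadata.get("License")
            for dist in distributions()}

if __name__ == "__main__":
    for name, lic in sorted(flag_unapproved(installed_licenses()).items()):
        print(f"review needed: {name} declares {lic!r}")
```

This is only a first-pass filter: declared metadata can be wrong or incomplete, which is exactly why the legal review in the last bullet still matters.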
Again, open source is an excellent tool. The life of the modern developer would probably be considerably more tedious without the ubiquity of freely reusable open source code. But to avoid shooting yourself in the foot when working with open source, it’s crucial to manage the inherent risks.
Chris Tozzi has worked as a journalist and Linux systems administrator. He has particular interests in open source, agile infrastructure, and networking. He is Senior Editor of content and a DevOps Analyst at Fixate IO. His latest book, For Fun and Profit: A History of the Free and Open Source Software Revolution, was published in 2017.