Protecting yourself against malicious open-source packages

An infectious surprise

In a cozy apartment an hour outside of San Francisco, a developer grabs her second cup of tea, opens VSCode, pulls updates from GitHub for her current project, and runs npm install. As she works, malware is infecting her machine; it finds her GitHub credentials and begins to infect every repo she has commit rights to, which is almost all of them. As her colleagues come online and begin to work, infected GitHub Actions begin to steal credentials to cloud providers, sending them who knows where. She pushes her changes and opens a PR, triggering those Actions workflows, which infect one of the CI runners long enough to steal deploy keys and AWS credentials.

She didn’t know that one of the dependencies her app relies on had been infected by Shai-Hulud: not the Great Worm from the universe of Dune, but a particularly clever and nasty bit of malware that steals credentials and self-replicates. The maintainer of that package didn’t know either. Our developer found out only when her company’s SCA tool (one of the few on the market that will report malicious open-source packages) warned her and blocked the merge of her PR. By then, though, a lot of damage had been done.

Why didn’t security tools stop the harm?

Application security tools exist that are designed to understand what open-source components your software relies on. These are typically offered under the category of Software Composition Analysis (SCA). Your typical SCA tool enumerates open-source components and reports when those components have vulnerabilities that could leave your applications open to potential attack.

The key there is potential. A software vulnerability is a potential risk. If you allow it into production, someone might find it, they might know how to attack it, and your operational controls might be inadequate to the task. Because of this, responding to SCA findings is typically an exercise in risk management. The tool finds a risk, gives information about its severity and general likelihood, and the security team applies that information to their environment, threat model, and existing controls. Someone then decides whether to ask developers to fix the issue and what priority the fix should receive.

But malicious packages are different. A great many deliver their harmful payloads the moment they’re installed. That’s not a thing that might happen: at that point, it’s a thing that has happened. An installed malicious package is not a mere vulnerability, but rather an attack in progress!

SCA wasn’t built to defend against this threat. Sure, good SCA tools can detect malicious packages whenever the application is scanned, and that’s a valuable layer of defense. But it occurs far too late to prevent malicious packages from being installed. Which means developer workstations and even CI/CD systems can be infected for hours or days before an SCA scan even has a chance to detect the problem. Something else is needed. Something more proactive.

Three core controls for protecting yourself against malicious packages

Since SCA tends to happen too late, and take too long, what can development teams do to protect themselves against being compromised by malicious open-source packages, and how can security help? A comprehensive posture requires a lot of nuance and careful planning, but it hinges on three main things:

Central management of dependencies using a package manager proxy
Proactive defense of environments where packages get installed, including developer workstations, CI/CD and other build environments, and (depending on your deployment approach) sometimes even production systems.
Continuous monitoring of production and pre-production environments, including the package manager proxy

These three core capabilities all require one common piece of technology: a system that lets you rapidly check whether packages you’re about to install are known to contain malicious content. If you can’t make this check before the package is actually installed, then you cannot defend your organization against the threat of malicious open-source packages.

Checkmarx Zero maintains the largest human-curated database of known-malicious and suspicious open-source packages. And we expose an API (the Malicious Package Identification API, or MPIAPI) that lets you accomplish this goal in a technology-agnostic manner, plugging into whatever systems you use to install dependencies and build and deploy your software. This same database also backs our Malicious Package Protection (MPP) add-on for the Checkmarx SCA product, meaning you can use the same database for both proactive defense and continuous monitoring.

Even if you don’t use Checkmarx, though, your approach remains the same. Let’s take a look at what adding defenses against malicious open-source packages to your application security or product security program looks like.

After you read this article, familiarize yourself in depth with the threat of malicious open-source packages (the Checkmarx field teams have helpfully created an executive summary and a free eBook discussing the issue in depth). This understanding will help you as you decide how to configure the following controls effectively for your organization.

Centrally manage your dependencies with a package manager proxy

The single most important change you can make, if you haven’t already done so, is to create a “choke point” that allows you to centrally control which packages are available to install within your organization. Fortunately, there’s an entire product category that serves this need as well as providing private package registries; it includes well-known products like JFrog’s Artifactory, Sonatype Nexus Repository, and Azure Artifacts.

These products’ primary purpose is to house first-party artifacts: that is, deployable components your organization produces, enabling them to be installed by common package managers (like npm or pip) without requiring you to publish them in the public repositories. But they can also serve as a proxy to upstream package registries, and most products in this class allow you to set policies that filter which public packages can be installed. That means that when a developer or a build system runs a command like npm install <package_name>, the npm command downloads from your proxy’s cache instead of directly from the public npm registry.

Inserting a private registry as a proxy between developers and public registries like npm allows for protective policy enforcement

At minimum, this package manager proxy feature will:

dramatically speed up your response time to supply-chain incidents: moving to a safe version can be as simple as re-building without code changes; and where it isn’t, use of the affected item is automatically brought to developers’ attention on next build, preventing the incident from spreading and driving remediation
prevent supply-chain regressions: once a dangerous package or version is blocked, no one can make a new installation of it
enable proactive control centrally: with identification systems (such as the Checkmarx MPIAPI) in place that can check package safety before fulfilling a request, you can defend your entire organization against malicious open-source packages in one place, reducing the risk that something malicious will slip through the cracks

Remember our developer from the introduction? Her project installed ngx-color@10.0.2, one of the packages infected with the Shai-Hulud malware; she actually requested “any version of ngx-color that’s at least version 10.0.0”, and npm figured out that 10.0.2 was the newest version that matched.

After cleaning up the infection, she had to go through her project and change any reference to that package. Easy enough if her application was the one asking for ngx-color; but if it was a transitive dependency — a package requested by another package our developer asked for — then that can be a significant effort.

But what if her laptop had been set up to use, say, the organization’s Artifactory server instead of the public npm registry? Unless there was a proactive defense plugin in place, she’d still have gotten infected. But her post-cleanup task would be much easier and much more reliable. She can ask the security team to block the malicious version and just re-run the installation. Now Artifactory will reject any request for ngx-color@10.0.2 and provide the newest safe version instead. Even if something in her dependencies requests that version explicitly, the proxy will block the download and she’ll be able to quickly determine where she needs to make a fix.
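To make the effect of a block concrete, here is a minimal sketch of how range resolution changes once the proxy hides a version. It assumes the request was a caret range (^10.0.0), handles only plain MAJOR.MINOR.PATCH strings, and uses an illustrative version list rather than a real registry listing; resolve_caret is a hypothetical helper, not npm’s actual resolver.

```python
from typing import Iterable, Optional, Set, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release tags)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def resolve_caret(minimum: str, available: Iterable[str], blocked: Set[str]) -> Optional[str]:
    """Return the newest non-blocked version that satisfies ^minimum, or None."""
    floor = parse_version(minimum)
    ceiling = (floor[0] + 1, 0, 0)  # caret range stays within the same major (for major >= 1)
    candidates = [
        version
        for version in available
        if version not in blocked and floor <= parse_version(version) < ceiling
    ]
    return max(candidates, key=parse_version, default=None)


# Illustrative version list (an assumption, not a real registry listing).
available = ["9.1.0", "10.0.0", "10.0.2"]

print(resolve_caret("10.0.0", available, blocked=set()))       # 10.0.2
print(resolve_caret("10.0.0", available, blocked={"10.0.2"}))  # 10.0.0
```

The second call shows why a re-run is often all that’s needed: with 10.0.2 hidden by the proxy, the same range quietly resolves to an older, safe version.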

And now that it has been blocked within Artifactory, no one else in the organization will be able to install it, meaning that the spread is stopped and regressions have been prevented. All it takes to make this the experience across the organization is setting up the proxy features, pushing a configuration to build systems to make sure they use it, and blocking direct access to the public repositories. Then simply block any package or package version that poses too much risk to be permitted.
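One way to confirm that a workstation or build agent really is pointed at the proxy is to inspect its .npmrc. The sketch below is a rough illustration that only looks at the default registry key (scoped registries such as @scope:registry are ignored); the host name artifactory.example.com and the repository path are hypothetical.

```python
from urllib.parse import urlparse

PUBLIC_NPM_REGISTRY = "https://registry.npmjs.org/"


def registry_from_npmrc(npmrc_text: str) -> str:
    """Return the default registry URL set in .npmrc text, or npm's public default."""
    registry = PUBLIC_NPM_REGISTRY
    for raw_line in npmrc_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, separator, value = line.partition("=")
        if separator and key.strip() == "registry":
            registry = value.strip()
    return registry


def uses_proxy(npmrc_text: str, proxy_host: str) -> bool:
    """True when the default registry points at the expected proxy host."""
    return urlparse(registry_from_npmrc(npmrc_text)).hostname == proxy_host


# Hypothetical proxy URL, for illustration only.
npmrc = "; managed by IT\nregistry=https://artifactory.example.com/api/npm/npm-remote/\n"

print(uses_proxy(npmrc, "artifactory.example.com"))  # True
print(uses_proxy("", "artifactory.example.com"))     # False
```

A check like this only tells you what the configuration says; blocking direct network access to the public registries is what actually enforces it.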

Of course, we can make this better with proactive control. Remember that malicious packages can infect systems upon installation, so even though a proxy like this speeds the response and prevents regression, you still have an infection to clean up after. If Artifactory had our MPIAPI plugin installed and configured, then when our developer tried to run npm install, the plugin would have noted that ngx-color@10.0.2 was known to be malicious and refused to let the proxy deliver it to her machine in the first place.

Proactively defend the SDLC everywhere you install packages

Developer workstations, CI/CD systems, and any other system that builds and packages your applications for deployment or distribution are potential targets for malicious open-source packages. In some environments, where production systems install open-source dependencies directly, production servers and containers may also be at risk.

Detecting the infection after it happens is an expensive way to operate; proactive defense is essential. This means using “dry run” or “simulation” features of package managers to determine what would be installed, checking to see if those packages or specific versions are malicious or suspicious, and blocking the installation in response. If the target system or container is prevented from installing packages without using your central package manager proxy, then proactive defenses in your proxy may be enough. Otherwise, individual systems become responsible for their own safety.

Most major package managers have a way to generate a list of packages that would be installed; for a couple of examples:

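mvn has no single dry-run switch for dependency resolution (-DdryRun=true belongs to the release plugin), but mvn dependency:tree reports the dependencies a build would resolve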
pip has --dry-run as of pip 22.2; before that, --no-install may be available
npm has --dry-run and --package-lock-only; yarn (which also accesses the npm registry) has comparable options that vary by version, such as --mode=update-lockfile in recent releases
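As a sketch of what consuming a dry run can look like, the example below parses the JSON installation report that pip 22.2 and later can emit with --report (for instance, python -m pip install --dry-run --ignore-installed --quiet --report - -r requirements.txt). The sample report is hand-written and trimmed to the fields the code reads, so treat it as an assumption about shape rather than real pip output.

```python
import json
from typing import List, Tuple


def packages_from_pip_report(report_json: str) -> List[Tuple[str, str]]:
    """Extract (name, version) pairs from a pip installation report."""
    report = json.loads(report_json)
    return [
        (item["metadata"]["name"], item["metadata"]["version"])
        for item in report.get("install", [])
    ]


# Hand-written, trimmed sample (an assumption, not captured pip output).
sample_report = json.dumps({
    "version": "1",
    "install": [
        {"metadata": {"name": "requests", "version": "2.32.3"}},
        {"metadata": {"name": "idna", "version": "3.7"}},
    ],
})

print(packages_from_pip_report(sample_report))
# [('requests', '2.32.3'), ('idna', '3.7')]
```

One caution worth verifying in your own environment: resolving a source distribution can involve running its build backend, so a pip dry run is not necessarily free of code execution. Restricting resolution to wheels with --only-binary :all: is one way to reduce that exposure.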

A build process that performs a dry run to get a list of packages that would be installed, then checks that against a database of malicious packages, can block the installation process before any malicious open-source packages are installed. And this works even if using the public registries, which makes it an excellent safety measure even when using a package manager proxy.
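For npm, one hedged way to sketch that flow is to run npm install --package-lock-only, read the resulting package-lock.json, and compare every resolved package against a list of known-bad versions before the real install. The code below assumes a lockfileVersion 2 or 3 file (which has a top-level "packages" map), and the known_bad set is a stand-in for whatever malicious-package database or API you query.

```python
from typing import Dict, List, Set, Tuple

Package = Tuple[str, str]  # (name, version)


def packages_from_lockfile(lock: Dict) -> List[Package]:
    """List (name, version) pairs from a lockfileVersion 2 or 3 package-lock.json."""
    found = []
    for path, entry in lock.get("packages", {}).items():
        if "node_modules/" not in path or "version" not in entry:
            continue  # skip the root project, workspace folders, and unversioned links
        # Nested installs look like node_modules/a/node_modules/b; keep the last name.
        name = path.split("node_modules/")[-1]
        found.append((name, entry["version"]))
    return found


def find_blocked(lock: Dict, known_bad: Set[Package]) -> List[Package]:
    """Return every locked package that appears in the known-bad set."""
    return [package for package in packages_from_lockfile(lock) if package in known_bad]


# Illustrative lockfile contents; normally loaded with json.load() from package-lock.json.
lock = {
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "my-app", "version": "1.0.0"},
        "node_modules/ngx-color": {"version": "10.0.2"},
        "node_modules/@scope/util/node_modules/tslib": {"version": "2.6.2"},
    },
}
known_bad = {("ngx-color", "10.0.2")}  # stand-in for a malicious-package database lookup

blocked = find_blocked(lock, known_bad)
if blocked:
    print("Refusing to install:", blocked)
```

In a real build step you would exit with a non-zero status instead of printing, so the pipeline stops before npm install ever runs.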

I have an open-source project cx-mpicheck (check it out on GitHub) that serves as an example of how to do this efficiently with pip, poetry, npm, pnpm, and go-mod projects.

Beyond that, a policy that delays the availability of new package versions a bit can be a valuable control. When using a package proxy, you can configure it so that new packages and new versions of existing packages don’t become available to your organization for, say, 48 hours. This gives the security research community time to identify malicious content, package maintainers time to notice that they’ve been compromised, etc.

A policy plugin can check the malicious package database via API call, blocking malicious packages; and enforce other policies, before deciding which packages to fetch and cache from the public registry

None of this can ever be perfect, but it provides a valuable layer of defense. The delay increases the chance that malicious code will be identified and added to databases of malicious packages, and the proactive blocking of things on that database lowers the chance malicious code will appear on developer desktops or in your applications.
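The two policies described in this section (block what is known to be malicious, hold back what is too new) can be combined into a single decision. The sketch below is an illustration only: policy_decision, the known_bad set, and the publish timestamps are hypothetical, and a real proxy plugin would obtain them from its own plugin interface and a malicious-package database.

```python
from datetime import datetime, timedelta, timezone
from typing import Set, Tuple

Package = Tuple[str, str]  # (name, version)
COOL_DOWN = timedelta(hours=48)  # the 48-hour delay used as an example above


def policy_decision(
    package: Package,
    published_at: datetime,
    now: datetime,
    known_bad: Set[Package],
    cool_down: timedelta = COOL_DOWN,
) -> str:
    """Decide whether the proxy should serve a package version."""
    if package in known_bad:
        return "block"  # known malicious: never serve
    if now - published_at < cool_down:
        return "hold"   # too new: wait for researchers and maintainers to react
    return "allow"


# Illustrative timestamps and block list (assumptions, not real publish dates).
now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
known_bad = {("ngx-color", "10.0.2")}

print(policy_decision(("ngx-color", "10.0.2"), now - timedelta(days=30), now, known_bad))   # block
print(policy_decision(("some-lib", "2.0.0"), now - timedelta(hours=12), now, known_bad))    # hold
print(policy_decision(("some-lib", "1.9.0"), now - timedelta(days=9), now, known_bad))      # allow
```

Checking the block list first matters: a version that has aged past the cool-down period should still be refused once it is known to be malicious.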

Continuously monitor production and pre-production applications

Of course, no proactive defense is perfect. And defenses against malicious open-source packages are no different. Proactive defenses we’ve talked about can fail in a few ways:

Malicious packages are installed before they are known to be malicious or suspicious
A change to a build configuration or build system configuration may accidentally or intentionally bypass defenses
Malicious packages may be installed on an unmanaged system that still has access to sensitive data; this is most likely to happen in a containerized environment where use of approved, managed images is not adequately enforced

Because of this, you still need a reactive layer of defense. And the most sensible place to put this layer is within or alongside your SCA scans. The Checkmarx SCA scanner handles this at an enterprise level by enabling the Malicious Package Protection feature across your organization; other SCA systems may require adjustments to scan configurations or the addition of a separate malicious package scanner.

Instrumenting SCA scans where they’re sensible within your SDLC, which is most often on merges to deployable code branches and on a scheduled basis for those same branches, is already an important vulnerability management step. Including malicious package checks at the same time allows you to react to malicious packages that may have slipped through your preventive controls.

If you have successfully deployed a package manager proxy as a sort of centralized “choke point” for open-source dependencies, then you can also create a powerful layer of rapidly-reactive defenses by routinely checking your list of cached package versions to see if any represent malicious open-source packages. This can be done in two basic ways, depending on your selected tools:

Generate an SBOM file containing all your open-source dependencies, and run your malicious package detection tools against this file. Not all proxy products make this easy, unfortunately, but it is generally at least possible; though in some cases it may require some 3rd-party open-source tools to fully complete. The Checkmarx SCA tool with MPP can be used for this: just set up an SCA scan with MPP enabled and select the SBOM file as the source. The resulting report will include known vulnerabilities and malicious packages that are cached by your proxy.
Generate a CSV or similar file containing the package repo name (like ‘pypi’ or ‘npm’), package name, and package version for each package version in your proxy’s cache. Feed this to an API like the Checkmarx MPIAPI to identify any malicious open-source packages in the list.
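The second approach can be sketched in a few lines. The column names (ecosystem, name, version) and the is_malicious callable are assumptions for illustration; in practice the callable would wrap a request to whatever malicious-package API you use, and the CSV would come from your proxy’s export or inventory tooling.

```python
import csv
import io
from typing import Callable, List, Tuple

Package = Tuple[str, str, str]  # (ecosystem, name, version)


def audit_cache(csv_text: str, is_malicious: Callable[[Package], bool]) -> List[Package]:
    """Return every cached package version that the lookup flags as malicious."""
    reader = csv.DictReader(io.StringIO(csv_text))
    flagged = []
    for row in reader:
        package = (row["ecosystem"], row["name"], row["version"])
        if is_malicious(package):
            flagged.append(package)
    return flagged


# Illustrative export of a proxy cache (an assumption, not real inventory data).
cache_export = """ecosystem,name,version
npm,ngx-color,10.0.0
npm,ngx-color,10.0.2
pypi,requests,2.32.3
"""

# Stand-in for a database or API lookup.
known_bad = {("npm", "ngx-color", "10.0.2")}

print(audit_cache(cache_export, lambda package: package in known_bad))
# [('npm', 'ngx-color', '10.0.2')]
```

Running an audit like this on a schedule catches packages that were cached before they were known to be malicious.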

By routinely checking your projects and your centralized package manager proxy, you can quickly find out where you have a risk of infection in your organization and engage your response process.

Putting it all together

To adequately defend your organization against open-source risks, establish proactive detection systems like the Checkmarx MPIAPI, reactive detection systems like Checkmarx SCA with MPP, and set up a package manager proxy like Artifactory (ideally with both proactive and reactive controls monitoring it) to speed up response times and centralize control. Delay the availability of newly-published packages and versions to your organization to provide a “buffer” for security researchers to do their jobs. And familiarize yourself in depth with the threat of malicious open-source packages (start by reading an executive summary and a free eBook discussing the issue in depth, prepared by the experienced Checkmarx field teams).

Let’s look back at our developer story with this all in place. In a cozy apartment an hour outside of San Francisco, a developer grabs her second cup of tea, opens VSCode, pulls updates from GitHub for her current project, and runs npm install. She receives an error from npm letting her know that ngx-color@10.0.2 was requested but isn’t found or was blocked by policy. The Shai-Hulud infection never happens.

She runs something like npm remove ngx-color ; npm install ngx-color@~10.0.0 --save and npm reaches back out to the proxy and gets the newest safe version of ngx-color in the 10.0.x tree (which happens to be 10.0.0 at that moment).

She runs her tests and takes a moment to make a pull request to update the project with the new version of ngx-color, so her colleagues won’t run into the same issue.

She doesn’t know that she was just protected from Shai-Hulud. She doesn’t know that her whole team was saved too. She just fixed a small problem in her project and got on with her day. And that is a real win for the security team that set up the controls that protected against malicious open-source packages.
