After the SolarWinds software supply chain incident was discovered in 2020, regulators and security leaders began paying much more attention to software supply-chain risks. One major result is a growing set of regulatory and contractual requirements for software vendors to maintain and provide Software Bills of Materials (SBOMs). SBOM documents are automation-friendly data files that identify the open-source and other third-party libraries and similar components included in a particular piece of software. There are a few standard formats for these files; the most popular are OWASP’s CycloneDX and the Linux Foundation’s SPDX® (also published as ISO/IEC 5962:2021).
But processes for producing, maintaining, and distributing SBOMs are often inconsistent, and there is unfortunately very little standards work on SBOM inclusion and distribution. The Python maintainers, though, have just made a big stride forward for their own community, and they provide a potential model for how open-source software (OSS) producers can ship SBOMs with their packages. They recently adopted PEP 770 – Improving measurability of Python packages with Software Bill-of-Materials, which standardizes how package maintainers provide SBOM documents in a consistent and automation-friendly way.
And of course, as with any new capability, there are risks: adversaries have potential paths to use SBOM inclusions maliciously, and even legitimate actors can make errors or act unscrupulously.
Why distribute SBOMs?
If you’re thinking about SBOMs, it’s probably for compliance. Many governments are either requiring or planning to require software vendors to provide SBOMs in various cases, and large contract-based compliance programs (like the Payment Card Industry standards) have started to reflect this as well. And in some cases, private buyers are favoring vendors that can provide SBOMs over those that don’t, even when compliance isn’t in play for them.
- In the USA, Executive Order 14028 (2021) directed the National Institute of Standards and Technology (NIST) to develop government-wide software security standards. This EO explicitly requires SBOMs for software purchased by the US Federal Government.
- The US FDA (Food and Drug Administration) extends SBOM requirements to medical device manufacturers, even if they’re not selling to the government.
- The US CISA (Cybersecurity and Infrastructure Security Agency) provides detailed SBOM requirements.
- The Payment Card Industry Data Security Standard (PCI DSS) has begun to require SBOMs for card processing systems.
- The EU’s Cyber Resilience Act (CRA) has provisions that will require many software vendors to provide SBOMs.
And that’s just a few of the major requirements in play that mean just about anyone that sells software — even if that software is provided as a service over the web — is likely to have to collect, generate, and distribute SBOMs for their software. In fact, that’s why Checkmarx has SBOM generation capabilities baked into our SCA (Software Composition Analysis) product in the first place.
But there’s still a need to be able to consume and manage SBOMs provided by other 3rd parties (like your upstream vendors) and pass them on to your customers and share them with auditors. And while SBOM management products can help you with managing your SBOMs once you have them, you still need to reliably get SBOMs from others in the first place. You need to be able to find and ingest 3rd-party SBOMs effectively, so anyone who provides software to you — whether that’s an OSS project or a commercial vendor — needs a systematic way to give them to you.
What is Python PEP-770, and how does it affect me?
A PEP is a Python Enhancement Proposal. PEPs are a Python-specific, RFC-like system for proposing standards for Python and discussing them with stakeholders. When a PEP is Accepted (and eventually becomes Final), it becomes a formal part of the standards that define Python.
PEP-770 (Improving measurability of Python packages with Software Bill-of-Materials) provides a standard way for people who produce Python packages — OSS or commercial, from libraries and frameworks to full application stacks — to include SBOM documents in those packages. This will allow consumers of those packages to automatically find and ingest SBOMs.
Key things for you to know:
- Authors can provide SBOMs in any format by including them in the .dist-info/sboms directory within the package; but UTF-8-encoded, JSON-based variants are strongly encouraged
- The standard doesn’t require an SBOM be provided, just a mechanism for doing so
- The standard doesn’t require any particular SBOM format, but strongly encourages a widely accepted, standard format such as CycloneDX or SPDX
- The standard lists fields that SBOM documents are expected to contain, but stops short of an absolute requirement that they be present (using SHOULD language rather than MUST, as defined in RFC 2119)
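As a rough illustration of what "automatically find" can look like, here is a minimal sketch that lists the SBOM files inside a wheel by looking for members under the .dist-info/sboms directory. The helper names (sbom_paths, sboms_in_wheel) and the demo package are hypothetical, and the sketch assumes the wheel is an ordinary zip archive with a single top-level .dist-info directory; it does not parse or trust the SBOM contents.

```python
import io
import zipfile
from pathlib import PurePosixPath


def sbom_paths(names):
    """Return archive member names that sit under '<name>.dist-info/sboms/'."""
    found = []
    for name in names:
        parts = PurePosixPath(name).parts
        # Expected layout: '<pkg>-<ver>.dist-info' / 'sboms' / '<file>'
        if (
            len(parts) >= 3
            and parts[0].endswith(".dist-info")
            and parts[1] == "sboms"
            and not name.endswith("/")  # skip directory entries
        ):
            found.append(name)
    return found


def sboms_in_wheel(wheel):
    """List SBOM members of a wheel, given a path or a file-like object."""
    with zipfile.ZipFile(wheel) as zf:
        return sbom_paths(zf.namelist())


# Build a tiny in-memory stand-in for a wheel (hypothetical package 'demo').
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("demo/__init__.py", "")
    zf.writestr("demo-1.0.dist-info/METADATA", "Name: demo\nVersion: 1.0\n")
    zf.writestr("demo-1.0.dist-info/sboms/bom.cdx.json", "{}")
buf.seek(0)

print(sboms_in_wheel(buf))  # ['demo-1.0.dist-info/sboms/bom.cdx.json']
```

Listing the files is the easy part; what you do with their contents is where the care described in the rest of this article comes in.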
If distribution of SBOMs in line with PEP-770 becomes reasonably common, it will help with research activities, allowing researchers to inexpensively get preliminary data about the composition of packages to identify trends and speed analysis. But if not consumed carefully — if the SBOM attachments or their contents are treated with too much trust — it could also cause issues for consumers, ranging from a false sense of security to increased vulnerability to supply-chain attacks.
If you consume or redistribute Python packages
You should be aware that packages may begin to contain SBOM data and determine if it is relevant or useful for you to ingest that data. If there are privacy concerns with advertising SBOM data, you may need to consider strategies for removing this data from installed packages that you are redistributing — though that should be a relatively unusual consideration.
And you should also be aware of the potential for misinformation and malice: from errors in SBOM data (accidental or deliberate) to attacks on SBOM ingestion tools. Included SBOMs should be treated with care, and contents should be verified with a level of assurance appropriate to your needs and threat model.
If you produce Python packages
You should consider including SBOM documents relevant to your distribution to support the compliance needs of those downstream. This may not be suitable for all producers! Things to consider:
- Accuracy is essential; if you cannot produce an accurate SBOM, it’s better not to provide one. This is not always easy to do outside of a binary distribution, since source distributions may specify ranges of dependencies that can fundamentally alter the composition of the software on a user’s machine.
- Scope matters a lot. For example, if you distribute an application that includes private Python packages, should the application distribute all SBOMs at its level, or should the individual packages be left to supply their own SBOMs? This is a key consideration in making sure you’re providing useful data.
- SBOMs add maintenance considerations. Each new release should contain an updated SBOM; that probably means maintaining automation to generate them. That automation can become outdated, break down, etc.
Content signing can be a useful addition to increase trust for your consumers. Signing your packages with sigstore or a similar system, and/or including public-key signatures alongside your SBOM documents, can help your consumers increase trust that the SBOM data was produced by you and not an adversary. But signing and verification are hard to get right, and increase overhead for producers as well.
“Buyer Beware”: risks and limitations of distributed SBOMs
Of course, SBOM distribution is far from a panacea. Despite the ways it can be helpful, there are also some risks associated with consuming SBOMs provided by any third party, and PEP-770’s implementation doesn’t mitigate several key ones. Not that it should, mind you. But it’s important to be aware of a few things.
Inaccuracies, misinformation, and attacks are possible within SBOM management, and this is an area of interest for Checkmarx Zero that you can expect to see discussed in the future. For now, consider the following:
SBOM inclusion is optional — PEP-770 doesn’t require package maintainers to provide an SBOM document. Which means your SCA tooling will continue to need to handle analysis of packages to determine their dependency tree regardless. There will always be packages that choose not to provide that information.
Providers (and adversaries) can lie — one big weakness inherent to relying on SBOMs is one of trust. Absolutely nothing stops a producer of an SBOM from simply including flat out lies in the SBOM document. This opens up a potential supply chain attack pathway: if people start to overly trust PEP-770 SBOMs, adversaries can hide malicious or known-vulnerable package versions in the dependencies of a package, then use the SBOM to lie about which version is included.
Mistakes and inaccuracies are possible, even likely — there’s no guarantee that even good-faith processes are producing an accurate SBOM (version ranges of deps, etc.). It can be so easy to make a mistake, fail to update an SBOM when the dependency tree changes, etc. Consider that there are distribution choices that can mean the version of a dependency might be resolved properly only when you actually install the package. The SBOM is static and can’t deal with that case. Will maintainers properly ensure that every time they push a new distribution to PyPI the included SBOMs are properly updated?
The SBOMs themselves are a possible attack vector — a PEP-770 SBOM is still just a file. It’s a data blob that needs deserializing. And that’s an attack vector for SBOM consumers. If a popular SBOM consumption tool has a vulnerability in the way it processes common SBOM formats (which are XML or JSON files), then data-deserialization attacks like XXE Injection (CWE-611) become a real opportunity for adversaries. Imagine installing a package, scanning it for safety, and then having your SBOM ingestion tool get popped by the included SBOM…
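One way to shrink that attack surface is to treat every included SBOM as untrusted input. The sketch below is an illustration under stated assumptions, not a complete defense: it accepts only size-limited, UTF-8, JSON documents (so XML parsers and XXE are never in play) and recognizes the format from the top-level "bomFormat" key used by CycloneDX JSON or the "spdxVersion" key used by SPDX 2.x JSON. The function name, exception class, and 5 MiB cap are hypothetical choices.

```python
import json

MAX_SBOM_BYTES = 5 * 1024 * 1024  # assumed cap; tune to your environment


class SBOMRejected(ValueError):
    """Raised when an SBOM file fails a basic safety or shape check."""


def load_json_sbom(raw: bytes, max_bytes: int = MAX_SBOM_BYTES):
    """Parse untrusted SBOM bytes and return (format_name, document)."""
    if len(raw) > max_bytes:
        raise SBOMRejected("SBOM exceeds size limit")
    try:
        # JSON only: XML input fails here instead of reaching an XML parser.
        doc = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError, RecursionError) as exc:
        raise SBOMRejected(f"not valid UTF-8 JSON: {exc}") from exc
    if not isinstance(doc, dict):
        raise SBOMRejected("top-level JSON value is not an object")
    if doc.get("bomFormat") == "CycloneDX":
        return "CycloneDX", doc
    if isinstance(doc.get("spdxVersion"), str):
        return "SPDX", doc
    raise SBOMRejected("unrecognized SBOM format")


fmt, doc = load_json_sbom(
    b'{"bomFormat": "CycloneDX", "specVersion": "1.6", "components": []}'
)
print(fmt)  # CycloneDX
```

Rejecting anything that is not plain JSON is a deliberately conservative choice; if you must accept XML SBOMs, use a parser configured to disable external entities.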
Sure, some of this can seem a little paranoid. And it certainly seems inappropriate to be fearful of PEP-770 because of these types of risks. But it’s also important to recognize the limits of this approach and take steps to verify the accuracy of your insights into your supply chain.
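One such step is to cross-check what an SBOM claims against an inventory you derived independently, for example from your own SCA scan. The following sketch assumes a CycloneDX JSON document that has already been parsed into a dict, and an "observed" mapping of component name to version that you supply yourself; both function names are hypothetical. It only looks at top-level entries in "components" and compares versions as plain strings, which is cruder than real tooling would be.

```python
def claimed_versions(cyclonedx_doc):
    """Map lower-cased component name -> version claimed by the SBOM."""
    claims = {}
    for comp in cyclonedx_doc.get("components", []):
        if isinstance(comp, dict) and "name" in comp:
            claims[str(comp["name"]).lower()] = str(comp.get("version", ""))
    return claims


def find_discrepancies(cyclonedx_doc, observed):
    """Compare SBOM claims with an independently observed inventory."""
    claims = claimed_versions(cyclonedx_doc)
    observed = {name.lower(): ver for name, ver in observed.items()}
    report = []
    for name in sorted(claims.keys() | observed.keys()):
        claimed, actual = claims.get(name), observed.get(name)
        if claimed is None:
            report.append((name, "missing from SBOM", actual))
        elif actual is None:
            report.append((name, "listed but not observed", claimed))
        elif claimed != actual:
            report.append((name, "version mismatch", f"{claimed} != {actual}"))
    return report


# Illustrative data only: the SBOM claims a newer openssl than was observed
# and omits libpng entirely.
sbom = {
    "bomFormat": "CycloneDX",
    "components": [
        {"type": "library", "name": "openssl", "version": "3.0.13"},
        {"type": "library", "name": "zlib", "version": "1.3.1"},
    ],
}
observed = {"openssl": "3.0.8", "zlib": "1.3.1", "libpng": "1.6.40"}

for row in find_discrepancies(sbom, observed):
    print(row)
```

Any row in the report is a reason to look closer, whether the cause turns out to be an honest mistake, a stale SBOM, or something worse.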
Conclusion (TL;DR)
PEP-770, which defines a standard for including SBOM documents in Python packages, is a great step. Standardizing distribution of SBOM data has some notable advantages for compliance programs and researchers. But SBOM distribution comes with risks that are important to understand and address if you intend to consume that data, or if you’re considering distributing your own Python packages with PEP-770-compliant SBOM documents.