Automatic code execution is triggered upon downloading approximately one third of the packages on PyPI.
A worrying feature in pip/PyPI allows code to run automatically when developers merely download a package. It is all the more alarming because many of the malicious packages we find in the wild rely on code execution at installation time to achieve higher infection rates.
It is important that Python developers understand that downloading a package can expose them to an increased risk of a supply chain attack.
When executing the well-known “pip install <package_name>” command, users may expect code to run on their machine as part of the installation process. One source of such code is the setup.py file of Python packages.
When a Python package is installed, pip, Python’s package manager, tries to collect and process the package’s metadata, such as its version and the dependencies it needs in order to work properly. pip does this automatically in the background by running the main setup.py script that comes as part of the package structure.
The purpose of setup.py is to provide a data structure for the package manager to understand how to handle the package.
However, the setup.py file is still a regular Python script that can contain any code the package’s developer likes. An attacker who understands this process can plant malicious code in the setup.py file, which then executes automatically during the package’s installation. In fact, many of the malicious packages we detect contain malicious code in the setup.py file.
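As a minimal sketch of the point above, the following script writes a harmless, hypothetical setup.py into a temporary folder and executes it with the Python interpreter, which is roughly what happens whenever a tool processes a source package. The marker file name payload_ran.txt and the package name demo_pkg are made up for illustration, and the setuptools call is left commented out so the sketch has no third-party dependency.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical, harmless stand-in for a package's setup.py. Writing the marker
# file plays the role of an attacker's payload.
SETUP_PY = '''\
from pathlib import Path

# Top-level statements run as soon as any tool executes this file.
Path("payload_ran.txt").write_text("executed")

# A real setup.py would continue with something like:
# from setuptools import setup
# setup(name="demo_pkg", version="0.0.1")
'''


def run_setup_script(workdir: Path) -> bool:
    """Execute setup.py in workdir and report whether its side effect fired."""
    (workdir / "setup.py").write_text(SETUP_PY, encoding="utf-8")
    subprocess.run([sys.executable, "setup.py"], cwd=workdir, check=True)
    return (workdir / "payload_ran.txt").exists()


with tempfile.TemporaryDirectory() as tmp:
    payload_ran = run_setup_script(Path(tmp))
    print("payload ran:", payload_ran)
```

Nothing in the file had to be called explicitly: executing the script was enough for the extra statement to run.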
What if we just download the package rather than install it?
In addition to the “install” command, pip provides several others, among them the “download” command. This command is intended to let users download a package’s files without installing them.
There could be various reasons someone would need this. For example, a developer may want to look into the package’s code before using it. A user may want or need to perform a security check, or perhaps even observe the setup.py file for any anomalies.
As it turns out, executing the command “pip download <package_name>” will run the setup.py file, along with any potentially malicious code it contains. Surprisingly, this behavior is not a bug but a feature of pip’s design. Users who intentionally only download a package do not expect code to run on their system automatically.
As a matter of fact, this concern was raised in a 2014 issue in the pypa/pip project, https://github.com/pypa/pip/issues/1884, yet it was never addressed, and the behavior persists to this day.
The .whl file type
Python wheels are essentially .whl files that are part of the Python ecosystem and bring various performance benefits to the package installation process. But that is not the only thing that wheels bring to the table. In the past, when Python code was built into a package, the result would be a tar.gz file that would then be published to the PyPI platform. tar.gz files include the setup.py file, which is run upon download and installation.
If you have recently downloaded or installed a Python package using pip, however, you may have noticed that you received a .whl file. The reason is that when developers build a Python package with, for example, the "python -m build" command, the build tool by default creates a .whl file in addition to the tar.gz file, and both are then published to PyPI. When a user downloads or installs this package, pip will by default deliver the .whl file to the user's machine. Because a wheel is already built, handling it does not involve running setup.py, which cuts setup.py execution out of the equation.
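The difference between the two formats can be seen by listing what each archive contains. The sketch below is an illustration under stated assumptions rather than a vetted tool: it builds a tiny fake wheel and a tiny fake tar.gz in memory (the demo_pkg names and contents are hypothetical), then checks each for a setup.py directly under the top-level folder. It uses only the standard library and relies on the fact that a .whl file is a zip archive.

```python
import io
import tarfile
import zipfile
from pathlib import PurePosixPath


def archive_members(filename: str, data: bytes) -> list:
    """List the file names stored in a wheel (.whl, a zip) or a .tar.gz archive."""
    if filename.endswith(".whl"):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.namelist()
    if filename.endswith(".tar.gz"):
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tf:
            return tf.getnames()
    raise ValueError(f"unsupported archive type: {filename}")


def has_root_setup_py(members: list) -> bool:
    """True if a setup.py sits directly inside the archive's top-level folder."""
    return any(
        len(PurePosixPath(m).parts) == 2 and PurePosixPath(m).name == "setup.py"
        for m in members
    )


def make_fake_wheel() -> bytes:
    """Build a minimal, hypothetical wheel: package code plus .dist-info metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo_pkg/__init__.py", "")
        zf.writestr("demo_pkg-0.0.1.dist-info/METADATA", "Name: demo_pkg\n")
    return buf.getvalue()


def make_fake_sdist() -> bytes:
    """Build a minimal, hypothetical tar.gz that ships a setup.py."""
    files = {
        "demo_pkg-0.0.1/setup.py": b"# build script\n",
        "demo_pkg-0.0.1/PKG-INFO": b"Name: demo_pkg\n",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, body in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(body)
            tf.addfile(info, io.BytesIO(body))
    return buf.getvalue()


wheel_members = archive_members("demo_pkg-0.0.1-py3-none-any.whl", make_fake_wheel())
sdist_members = archive_members("demo_pkg-0.0.1.tar.gz", make_fake_sdist())
print("wheel has setup.py:", has_root_setup_py(wheel_members))
print("sdist has setup.py:", has_root_setup_py(sdist_members))
```

The check is deliberately simple. A module that happens to be named setup.py inside a wheel would also match, but it would be ordinary package code, not a script that pip runs.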
Why is the setup.py still relevant?
Even though pip defaults to using wheels instead of tar.gz files, malicious actors can still intentionally publish Python packages without a .whl file. When a user downloads a Python package from PyPI, pip will prefer the .whl file, but will fall back to the tar.gz file if no suitable .whl file is available.
Is there anything you can do about this?
Currently, there are actions users can take to prevent automatic execution upon package download. One action is checking the package file contents at https://pypi.org/project/<package>/#files and observing whether a .whl file is present. If there is a .whl file that suits the user's platform, the user can feel confident they will receive the .whl file, and no setup.py code will be executed on their machine.
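The same check can be scripted. The sketch below assumes PyPI's JSON endpoint at https://pypi.org/pypi/<package>/json, which this article does not otherwise cover, and its "urls" list, where each file carries a "filename" and a "packagetype" of either "sdist" or "bdist_wheel". The helper names and the sample data are hypothetical. The network call is defined but not invoked here, and the summary does not check whether a wheel matches the local platform.

```python
import json
import urllib.request


def fetch_release_files(project: str) -> list:
    """Fetch the file list of a project's latest release (performs a network request)."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["urls"]


def summarize(files: list) -> dict:
    """Split release files into wheels and source archives."""
    wheels = [f["filename"] for f in files if f["packagetype"] == "bdist_wheel"]
    sdists = [f["filename"] for f in files if f["packagetype"] == "sdist"]
    return {
        "wheels": wheels,
        "sdists": sdists,
        # Source-only releases are the ones where pip has to process setup.py.
        "sdist_only": bool(sdists) and not wheels,
    }


# Hypothetical sample data shaped like the "urls" list of the JSON response.
sample_with_wheel = [
    {"filename": "demo_pkg-0.0.1-py3-none-any.whl", "packagetype": "bdist_wheel"},
    {"filename": "demo_pkg-0.0.1.tar.gz", "packagetype": "sdist"},
]
sample_sdist_only = [
    {"filename": "demo_pkg-0.0.1.tar.gz", "packagetype": "sdist"},
]

print(summarize(sample_with_wheel)["sdist_only"])
print(summarize(sample_sdist_only)["sdist_only"])
```

In real use, summarize(fetch_release_files("<package>")) would give the same summary for a live project. A result with sdist_only set to True is the case that calls for extra care.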
If only a tar.gz file is present, a user can download it safely by working directly with PyPI's "simple" API: https://pypi.org/simple/<package-name>/. For example, for the prp1 package, a user can download the files from the following link: https://pypi.org/simple/prp1/.
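One way to work with the "simple" API without involving pip is to read the index page yourself and fetch the archive as a plain file. The sketch below assumes the index page is HTML in which every file is an anchor tag whose href points at the archive. The helper names, the sample page, and the example.invalid URLs are hypothetical. The download helper is defined but not called here, and the sketch does not verify the hash that a real link may carry in its URL fragment.

```python
import shutil
import urllib.request
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every anchor tag on an index page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def list_files(index_html: str, base_url: str) -> dict:
    """Map each file name on the index page to its absolute download URL."""
    parser = LinkCollector()
    parser.feed(index_html)
    files = {}
    for href in parser.hrefs:
        url = urldefrag(urljoin(base_url, href)).url  # drop any "#sha256=..." part
        files[PurePosixPath(urlparse(url).path).name] = url
    return files


def fetch_file(url: str, dest: str) -> None:
    """Save the archive to disk as plain bytes; nothing inside it is executed."""
    with urllib.request.urlopen(url, timeout=30) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


# Hypothetical index page in the shape described above.
sample_index = """
<html><body>
<a href="https://files.example.invalid/demo_pkg-0.0.1.tar.gz#sha256=abc">demo_pkg-0.0.1.tar.gz</a>
<a href="../../files/demo_pkg-0.0.2.tar.gz">demo_pkg-0.0.2.tar.gz</a>
</body></html>
"""

files = list_files(sample_index, "https://pypi.example.invalid/simple/demo_pkg/")
for name, url in files.items():
    print(name, url)
```

After fetch_file has saved an archive, it can be unpacked and its setup.py read in an editor before anything is installed.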
Code execution upon installation is one of the features attackers use the most in open-source attacks. Developers who opt to download packages instead of installing them reasonably expect that no code will run on their machine when the files are downloaded. However, pip includes a feature allowing just that: code execution on the user’s machine when all that was requested was a file download.
It is possible to protect yourself from suspicious packages by following the steps detailed above.
To learn more about how Checkmarx is helping secure the open source software supply chain, download our white paper: Don’t Take Code from Strangers – An Introduction to Checkmarx Supply Chain Security