You can’t write any significant Python application without thinking about scope. From your first ‘Hello World’ to complex production applications, every Python developer, regardless of experience, interacts with it. Developers love using scope to build sandboxes: controlled execution environments that isolate code and prevent it from accessing resources outside its designated boundaries. This sandboxing pattern is commonly used when evaluating user-provided input, to protect the system from potentially malicious code.
But here’s where it gets interesting: Python’s object system creates a critical security blind spot when building sandboxes for code evaluation. What looks like a properly restricted environment can actually be bypassed entirely, because Python’s interconnected object hierarchy provides alternative pathways to data access; the sandbox is as fragile as glass. In some cases, this leads to Remote Code Execution (RCE) flaws, even when the developer has carefully limited scope access. RCE flaws resulting from such “glass sandboxes” are typically a variant of CWE-94 (Code Injection).
Why aren’t scope restrictions alone enough to keep your Python code secure? And why must security practitioners be especially vigilant when reviewing code that evaluates user input, looking for insufficient sandboxing that relies solely on scope restrictions?
Let’s use pandas.eval(), a function from pandas, one of Python’s most widely-used data analysis packages, to demonstrate how Python’s object system can create conditions for RCE vulnerabilities even in seemingly well-restricted environments.
But first, let me quickly give a huge shoutout to Duarte Santos, currently at Flutter, who uncovered this security gem during his time at Checkmarx. Check out his original findings about Pandas on his personal blog; it initially raised some of the questions we’re exploring here.
Under the Hood – Understanding Object System Traversal Risks
In Python, everything is an object – from simple integers to complex functions. Even when you write a basic program, you’re interacting with Python’s rich object system:
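Even a trivial script is built entirely out of objects. A quick illustration (the variable and function names here are just examples):

```python
greeting = "Hello World"

# Even a string literal is a full object with a class...
print(type(greeting))                 # <class 'str'>
# ...with methods...
print(greeting.upper())               # HELLO WORLD
# ...and it ultimately descends from the base object class.
print(isinstance(greeting, object))   # True

def add(a, b):
    return a + b

# Functions are objects too, with attributes of their own.
print(add.__name__)                   # add
print(isinstance(add, object))        # True
```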
What’s more interesting is how these objects are connected. Every object in Python maintains references to other objects through its attributes and methods, forming a hierarchy of accessible objects. Understanding this hierarchy is the key to understanding why sandboxing in Python is challenging. When developers attempt to create isolated execution environments by restricting scope, they’re implementing a pattern that works well in many programming contexts but fails against Python’s object model. Every object inherits from a base object class, creating a path that can be traversed. And because Python’s object system does not enforce encapsulation (there are no truly “private” methods, for example), nothing stops outside code from walking an object’s entire hierarchy, including its data attributes and methods. Let’s see this in practice:
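A short sketch of that climb, starting from a plain string and a plain function and using only “dunder” attributes:

```python
s = "Hello"

# From an instance to its class, and from the class to the base object.
print(s.__class__)                  # <class 'str'>
print(s.__class__.__base__)         # <class 'object'>

# __mro__ shows the full inheritance chain in method resolution order.
print(s.__class__.__mro__)          # (<class 'str'>, <class 'object'>)

def noop():
    pass

# The same climb works from a function, or from any other object.
print(noop.__class__)               # <class 'function'>
print(noop.__class__.__base__)      # <class 'object'>
```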
As the example above shows, we were able to climb from a string and a function up to the base object with the help of “dunder” (double-underscore) attributes. These dunder attributes and methods are particularly interesting from a security perspective because they can also be used to reach sensitive functionality, as we will see later.
At this point, we showed that every Python object has the same ancestor — a base object class. But how are they connected? Let’s explore this using the example below:
- Starting from our simple little string, we climbed up the object hierarchy. First to its class (str), then to the base object class that everything in Python inherits from. Once we’re at the base object, we can see all its subclasses.
- Among these subclasses, we found the BuiltinImporter — a class that can load any built-in module, which gives us access to pretty much everything in our Python environment.
- From there, we can jump into any namespace. In our case, we used it to call our some_function(), and just like that we invoked our own function starting from nothing but an innocent string.
This object system is a fundamental feature of Python that provides powerful capabilities. However, a critical security implication emerges whenever you incorporate code you haven’t thoroughly reviewed — whether from public package managers, internal repositories, or even code snippets found online. You’re not just getting the functions you asked for — you’re getting an entire tree of dependencies, each with its own objects, methods, and connections to the base object system. And just like we saw in the example, these same capabilities can be leveraged to access system functions that weren’t meant to be accessible, potentially leading to arbitrary code execution.
Sandboxing in Python is Challenging — A `pandas` Example
The interconnected nature of Python’s object system poses a fundamental challenge for sandboxing. When developers try to create restricted environments for executing user input, they often focus on limiting what the evaluated code can access by carefully controlling which variables are in scope. This is good practice, but it’s insufficient: Python’s object system offers multiple paths to traverse between objects that bypass these restrictions.
Let’s look at pandas.eval() as an example. It extends Python’s built-in evaluation capabilities for data analysis while trying to create a “safe” environment through multiple layers of control:
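pandas.eval() exposes `local_dict` and `global_dict` parameters for exactly this purpose. A minimal sketch of the same scope-restriction pattern, shown here with Python’s built-in eval for self-containment (the variable names are illustrative):

```python
# Only the variables we explicitly allow are visible to the expression.
allowed_vars = {"revenue": 1000, "cost": 400}
restricted_globals = {"__builtins__": {}}  # no built-in functions at all

result = eval("revenue - cost", restricted_globals, allowed_vars)
print(result)  # 600

# Names outside the allow-list are unreachable...
try:
    eval("open('secrets.txt')", restricted_globals, allowed_vars)
except NameError:
    print("blocked: open() is not in scope")
```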
At first glance, this looks solid — we’re controlling both local and global scopes. But here’s where it gets interesting. Even with these restrictions, we can still execute arbitrary code:
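Here is a hedged reconstruction of that kind of payload, again using the built-in eval with both scopes emptied. To keep the demo harmless it calls os.getcwd(), but os.system lives in exactly the same namespace:

```python
# The payload needs no variables at all: it starts from a literal tuple.
payload = (
    "[c for c in ().__class__.__base__.__subclasses__() "
    "if c.__name__ == '_wrap_close'][0]"   # a helper class from the os module
    ".__init__.__globals__['getcwd']()"    # os globals hold getcwd, system, ...
)

# Empty globals and locals: the "sandbox" offers the payload nothing,
# yet it still reaches the os module through the object hierarchy.
result = eval(payload, {"__builtins__": {}}, {})
print(result)  # the current working directory, despite the restrictions
```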
Why does this work? Because the payload doesn’t need any variables from either local or global scopes — it starts with an empty tuple `()` (which is just a Python object) and climbs up through Python’s object hierarchy using dunder methods until it reaches Python’s base object. From there, it can access all subclasses of this base object — which is essentially everything. Ultimately, it finds a way to load modules and execute system commands, opening the door to an RCE. Even with both local and global scope restrictions in place, we can’t restrict this fundamental object system that every Python object inherits, making it possible to bypass any attempted sandboxing.
When Santos and Checkmarx reported this behavior in pandas.eval() to the pandas maintainers, they explained that pandas.eval(), like Python’s eval(), was never designed to be a “safe” sandbox for untrusted input.
At this point, you’re probably thinking: “Alright, I get it – I just need to create a blocklist of these dunder methods that allow object traversal, and I’m good to go, right?” Well, not quite: blocklists almost never work as a robust defense against injection.
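To see why, consider a hypothetical blocklist filter (the names here are invented for illustration). A substring check over known dunders can be sidestepped with nothing more exotic than implicit string concatenation:

```python
# Hypothetical filter: reject expressions containing known-dangerous dunders.
BLOCKLIST = {"__class__", "__base__", "__subclasses__", "__mro__"}

def looks_safe(expr: str) -> bool:
    return not any(token in expr for token in BLOCKLIST)

# '__cla' 'ss__' concatenates to '__class__' at parse time, so the
# blocked token never appears literally in the source text.
payload = "().__getattribute__('__cla' 'ss__')"

print(looks_safe(payload))                      # True -- the filter is fooled
print(eval(payload, {"__builtins__": {}}, {}))  # <class 'tuple'>
```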
A fundamental challenge in Python (and every other language) is that the language itself is constantly evolving. As Dor Tumarkin demonstrated in his research on Hugging Face security, even an exhaustive blocklist of today’s dangerous methods doesn’t protect against what tomorrow’s Python update might introduce. Tumarkin’s research found a sandbox protection that relied on an assumption about a specific Pickle behavior, behavior that didn’t exist in newer versions of the protocol, so attackers could bypass detection simply by switching to a newer protocol version. This is an excellent example of how blocklists create “unknown unknowns” and a false sense of security.
This means that any sandbox implementation must not only defend against known attack vectors but somehow anticipate how future Python features might be weaponized — an essentially impossible task.
When you combine all of this — the inherent “traversability” of Python’s object system due to missing encapsulation in conjunction with constantly evolving language features — it becomes clear that what developers attempt to build is merely a “glass sandbox.” The well-meaning scope restrictions create visible boundaries that give the illusion of security but still allow attackers to break through.
Building Safer Code Execution Environments
Given the above challenges, what options do developers have to create a more protected environment for code evaluation?
First things first: if you’re a security professional, you probably have “Never Trust User Input” tattooed somewhere (possibly figuratively). Yet, we see applications trying to sanitize, filter, or restrict potentially dangerous input. As we’ve demonstrated with pandas.eval(), attempting to create a safe sandbox for untrusted input using scope restrictions simply doesn’t work.
As an AppSec engineer, you should extend the focus beyond what developers implement: audit codebases for any use of eval(), exec(), compile() or similar functions – these are immediate red flags. Look for custom sandboxing attempts that rely solely on Python’s scope restrictions; these “glass sandboxes” create a dangerous illusion of security.
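As a starting point for such an audit, a small AST-based scanner (illustrative only, not a substitute for a real SAST tool) can flag direct calls to the dangerous builtins:

```python
import ast

DANGEROUS = {"eval", "exec", "compile"}

def find_dynamic_eval(source: str) -> list:
    """Return (line, name) pairs for direct calls to eval/exec/compile."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in DANGEROUS):
            hits.append((node.lineno, node.func.id))
    return hits

sample = """
result = eval(user_expression)
compiled = compile(user_code, "<input>", "exec")
print("this call is fine")
"""
print(find_dynamic_eval(sample))  # [(2, 'eval'), (3, 'compile')]
```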
Checkmarx’s Static Application Security Testing (SAST) tool can help identify these patterns at scale. You can configure custom rules to detect dangerous evaluation patterns in Python (and any other language we support) and trace data flow from user inputs to evaluation functions. Checkmarx’s SAST capabilities are particularly valuable for identifying vulnerable pathways, especially when they cross multiple functions or modules.
Given that true sandboxing in Python is challenging, what can developers do to secure their applications? Here are three practical strategies:
- Favor input parsing over dynamic code evaluation
- Incorporate isolation and segmentation in your design
- Practice continuous security monitoring
Favor input parsing over dynamic code evaluation
The simplest and most effective approach is to avoid dynamic code evaluation entirely. Your first question should always be: “Do we really need to evaluate user input as code?”
In most cases, you can replace dynamic evaluation with purpose-built parsers that only understand exactly what you need. Essentially, use allow-list approaches instead of block-list approaches. For mathematical expressions, consider libraries that can parse formulas without eval(). For data manipulations, translate user inputs into specific operations you control.
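For instance, a small allow-list arithmetic evaluator built on the ast module (a sketch; real formula libraries cover more ground) accepts only the node types you explicitly permit:

```python
import ast
import operator

# Only these operations are permitted; everything else is rejected.
ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def safe_arith(expr: str):
    """Evaluate a purely arithmetic expression; reject anything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in ALLOWED_OPS:
            return ALLOWED_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"disallowed syntax: {type(node).__name__}")
    return walk(ast.parse(expr, mode="eval"))

print(safe_arith("2 + 3 * 4"))    # 14

try:
    safe_arith("().__class__")    # the traversal trick never gets started
except ValueError as exc:
    print(exc)                    # disallowed syntax: Attribute
```

Because the evaluator names what it accepts rather than what it forbids, new language features can never silently widen what user input is allowed to do.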
This approach requires more development effort initially but eliminates an entire class of vulnerabilities that are difficult to remediate later, so it saves time and effort in the long run.
Incorporate isolation and segmentation in your design
Don’t rely on Python’s scope restrictions – use real isolation boundaries. Implement OS-level process isolation, network segmentation, and proper access controls. Python’s internal restrictions can’t protect you from object system bypasses, so you need to look beyond language-level features.
For smaller applications, the subprocess module offers a practical approach – you can run evaluation code in separate processes with restricted privileges. For larger applications where the risk justifies additional overhead, containerization provides stronger isolation.
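A minimal sketch of that approach (the helper name is illustrative, and production setups would also drop privileges and apply resource limits):

```python
import subprocess
import sys

def evaluate_isolated(expr: str, timeout: float = 5.0) -> str:
    """Evaluate an expression in a fresh, short-lived interpreter process.

    The process boundary, not Python's scope rules, is what limits the
    blast radius: crashes, hangs, and (with OS-level controls added)
    privilege abuse stay contained in the child.
    """
    code = f"print(eval({expr!r}, {{'__builtins__': {{}}}}, {{}}))"
    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated interpreter mode
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"evaluation failed: {proc.stderr.strip()}")
    return proc.stdout.strip()

print(evaluate_isolated("6 * 7"))  # 42
```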
Network segmentation also plays a crucial role. Consider a simple design where your web server accepts and validates requests and then passes calculations to a separate evaluation service on an internal network. This evaluation service should have no direct internet access and communicate only through specific, controlled channels. Even if attackers breach your sandbox, they remain trapped in a restricted network with limited ability to reach sensitive systems or data.
Practice continuous security monitoring
Keep your security controls updated with Python’s evolution. New language features mean new ways to interact with the object system – and new ways to bypass your protections. Remember that in many cases, you won’t control which exact version of Python is interpreting your program. That means your controls need to be effective against any version of Python under which your code is able to run.
Summary: Avoid the Glass Sandbox In Your Python Applications
Despite the best intentions of developers who attempt to build safe sandboxes for evaluation of user-supplied data, Python’s object hierarchy implementation allows attackers to easily escape these sandboxes. This can lead to code injection flaws, which can in turn lead to Remote Code Execution vulnerabilities in your applications.
Developers should be aware of this challenge and use safer approaches, such as custom parsers or alternative design approaches that avoid code evaluation entirely. Developers can work with Application Security teams to ensure that their SAST solutions have appropriate rules to catch mistakes like this in Python (Checkmarx One can help with this). And everyone from Developer teams to AppSec teams to Architecture teams should apply architecture and infrastructure controls that can limit the damage should a sandbox flaw — or any other type of code injection pathway — escape to production.