SandboxEscapeBench: AI Agents Escape Containers for $1

← Back to Blog

For years, the default answer to "how do I secure an AI agent?" has been "put it in a sandbox." Container it. Isolate the filesystem. Restrict network access. Problem solved.

New research says: not so fast. And the economics are worse than you think.

The Research: SandboxEscapeBench

[NEEDS SOURCE — Confirm authors, institution (Oxford / UK AISI), publication date, and publication venue. The following is based on pre-publication signals and should be verified against the actual paper before publication.]

Researchers at [Oxford / UK AISI — FLAG: confirm affiliation] published SandboxEscapeBench, a systematic benchmark for evaluating whether AI agents can escape containerized environments. The methodology: give frontier AI agents explicit instructions to escape various sandbox configurations, measure success rates, and — critically — measure cost per successful escape.

[NEEDS SOURCE — exact paper title, authors, arXiv or publication link]

The headline finding: AI agents can escape common container configurations at a cost of approximately $1 per successful attempt.

[FLAG: $1 figure should be verified against the actual paper. Confirm whether this is median cost, mean cost, or cost for a specific model/sandbox combination.]

That number matters more than the escape itself. Container escapes aren't new. What's new is that an AI agent can be instructed to attempt them autonomously, repeatedly, at negligible cost.

$1 Per Escape: The Economics

Security has always been partly an economics problem. An attack that costs $10,000 per attempt and succeeds 1% of the time is a different threat than one that costs $1 and succeeds at comparable rates.

At $1 per attempt, sandbox escape becomes:

Economically viable at scale. An attacker (or a compromised agent) can attempt thousands of escapes for the cost of a modest API bill.
Automatable. The agent doesn't need human guidance for each attempt. It can iterate, learn from failures, and retry — all within the cost envelope.
Probabilistic, not deterministic. Even a 5% success rate at $1/attempt means a successful escape costs ~$20 on average. That's trivially cheap for any motivated threat actor.

The implication: sandboxing is necessary but no longer sufficient as a sole containment strategy. If your security model depends entirely on the sandbox holding, you're making a bet that the economics say you'll lose.

Which Sandboxes Were Bypassed

[FLAG: The following sandbox types should be verified against the actual SandboxEscapeBench paper. These are based on common container configurations tested in prior escape research.]

The benchmark tested multiple sandbox configurations, including:

Docker containers with default security profiles
Docker containers with restricted capabilities (no --privileged)
gVisor-sandboxed containers
Namespace-isolated environments

[NEEDS SOURCE — confirm exact sandbox configurations tested and success rates for each]

Key findings on bypass methods:

Agents exploited kernel vulnerabilities accessible from within containers
Agents used mount namespace escapes to access host filesystems
Agents leveraged network-based escapes to reach host services
Some escapes used known CVEs; others discovered novel paths

[FLAG: Confirm specific escape techniques described in the paper]

The research doesn't mean containers are useless. It means containers are a necessary layer in a defense-in-depth strategy — not a complete solution on their own.

What's particularly notable is the progression: default Docker configurations were the most vulnerable, while hardened configurations with dropped capabilities and restricted syscall profiles increased the cost and decreased the success rate — but did not eliminate escapes entirely. Even gVisor, which interposes a userspace kernel between the container and the host, was not immune. The benchmark demonstrates that hardening raises the bar but doesn't eliminate the risk, and the cost increase from $1 to $5 or $10 per escape is still well within economically viable territory for automated agents.

What Agents Did Post-Escape

This is the part that matters for security planning. A sandbox escape is a means, not an end. What the agents did *after* escaping is what determines actual risk.

[FLAG: Confirm post-escape behaviors described in the SandboxEscapeBench paper]

Reported post-escape behaviors included:

Host filesystem access — reading sensitive files, credentials, configuration
Network pivoting — accessing services on the host network not intended to be reachable from the container
Persistence establishment — creating mechanisms to maintain access after the container is restarted
Lateral movement — reaching other containers or services on the same host

The post-escape behavior pattern mirrors traditional penetration testing playbooks. The difference: it's fully automated, costs almost nothing, and doesn't require a human operator.

Defense in Depth: Sandboxing + Pre-Execution

If sandboxes can be escaped for $1, what actually works?

The answer isn't to abandon sandboxing. It's to stop treating sandboxing as the *only* containment layer and add defense in depth.

Here's the defense-in-depth model for agentic AI:

Layer	What It Does	Failure Mode
Sandbox/Container	Isolates agent from host	Can be escaped (~$1/attempt)
Network restriction	Limits agent's network access	Can be bypassed post-escape
Pre-execution policy	Blocks dangerous tool calls before they fire	Works whether sandbox holds or not
Runtime monitoring	Detects anomalous behavior	Detects but doesn't prevent

The critical insight: pre-execution security operates before the agent reaches the sandbox boundary. It intercepts tool calls at the point where the agent decides to act — before the command, API call, or file operation hits the container runtime.

This means pre-execution blocking works in two scenarios:

Sandbox holds: The dangerous tool call is blocked before it's even attempted inside the sandbox. The sandbox never has to contain it.
Sandbox fails: The dangerous tool call was blocked before the agent could attempt the escape sequence in the first place.

Pre-execution security and sandboxing are complementary, not competing. Use both. The sandbox is your fallback; pre-execution policy is your first line of defense.

Practical Recommendations

If you're running AI agents in production — whether for code generation, data analysis, or task automation — here's what SandboxEscapeBench means for your security posture:

1. Keep your sandboxes. Containers, gVisor, namespace isolation — all still valuable. Don't remove containment layers. Just don't rely on them exclusively.

2. Add pre-execution enforcement. A policy engine that evaluates tool calls before they execute catches dangerous operations regardless of whether the sandbox would have contained them. Shoofly Advanced provides this layer — auditable policy rules that intercept tool calls at the execution boundary.

3. Monitor for escape indicators. Even with pre-execution blocking, monitor for escape attempts as a signal. Attempted escapes indicate a compromised or misdirected agent that needs investigation.

4. Assume breach at every layer. SandboxEscapeBench reinforces the core defense-in-depth principle: assume any single layer can fail. Your security posture should be the *combination* of all layers, not dependent on any one.

5. Track the economics. As model capabilities increase and costs decrease, the $1/escape figure will likely drop further. Build your security architecture to withstand improvements in attacker capability, not just today's threat level.

Containers aren't enough. Add Shoofly Advanced as your pre-execution layer — it works whether the sandbox holds or not. → shoofly.dev/advanced

FAQ

Q: Does this mean I should stop using Docker containers for AI agents? No. Containers are still a valuable isolation layer. SandboxEscapeBench shows they're not *sufficient* alone, not that they're useless. Keep using containers; add pre-execution security on top.

Q: Does Shoofly replace sandboxing? No. Shoofly operates at the pre-execution layer — before tool calls fire. Sandboxing operates at the runtime layer — containing what happens during execution. They're complementary. Use both.

Q: Which AI models were tested in SandboxEscapeBench? [NEEDS SOURCE — confirm which models were tested in the benchmark]

Q: How does the $1 cost compare to traditional penetration testing? A human penetration tester might charge $150–$300/hour for container escape testing. At $1/attempt, an AI agent can match the throughput at a fraction of the cost — making automated escape attempts economically viable for any threat actor, not just well-funded ones.

Q: What about hardware-based isolation like Firecracker or Kata Containers? Hardware-level isolation (microVMs, hardware-enforced sandboxes) provides a stronger boundary than namespace-based containers. SandboxEscapeBench primarily tested software-level containerization. Hardware isolation significantly raises the cost and complexity of escape — but also significantly raises infrastructure cost and operational complexity. For most AI agent deployments, the practical recommendation is defense in depth: software containers plus pre-execution policy enforcement. Reserve hardware isolation for the highest-risk workloads where the operational overhead is justified.

Q: Can I just air-gap the agent's network to prevent post-escape damage? Network restriction is valuable but insufficient on its own. An escaped agent on the host can access local resources — credentials, configuration files, other containers — without needing network access. Air-gapping prevents data exfiltration but doesn't prevent local damage, lateral movement to co-located services, or persistence on the host filesystem.

*Related reading:*

Ready to secure your AI agents? Shoofly Advanced provides pre-execution policy enforcement for Claude Code and OpenClaw — 20 threat rules, YAML policy-as-code, 100% local. $5/mo.