You got the email. Maybe you saw it on Hacker News first. Anthropic just shipped Dispatch, /loop, remote control, and computer use — and suddenly your Claude Code session can run tasks while you sleep, respond to messages from your phone, and click around your actual desktop. If you're a developer, this is genuinely exciting. Background agents that handle the tedious stuff while you focus on architecture decisions? Yes please.
But before you fire up /loop on a Friday afternoon and close your laptop for the weekend,
it's worth understanding what Anthropic's own documentation says about
Claude Code Dispatch security — both what's covered and what isn't. Not because
these features are bad. They're not. But because the security model was designed for interactive
sessions where a human is watching, and these features deliberately remove that human from the loop.
Here's the honest breakdown.
What Dispatch, /loop, Remote Control, and Computer Use Actually Do
These four features turn Claude Code from an interactive pair programmer into an autonomous agent platform. Here's what each one does:
Dispatch lets you assign tasks to Claude from anywhere — your phone, a browser, another device. Claude picks them up and runs them as background sessions on your dev machine. As Anthropic's docs put it, "a task can end up as a Code session in two ways: you ask for one directly... or Dispatch decides the task is development work and spawns one on its own."
/loop (and scheduled tasks) lets Claude re-run a prompt automatically
on an interval — monitoring a build, checking test results, running recurring maintenance. Cloud
scheduled tasks run fully autonomously with no permission prompts. Desktop scheduled tasks are
configurable. The /loop command inherits your session's permission mode.
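The mechanics matter for the risk discussion later, so here is a minimal sketch of what an interval re-run amounts to (hypothetical names, not the actual /loop implementation):

```python
import time

def run_on_interval(prompt, run_session, interval_s, max_runs):
    """Re-run the same prompt on a fixed interval, like a scheduled task.

    run_session is a stand-in for whatever executes the prompt; in a real
    unattended loop it runs under the session's inherited permission mode.
    """
    results = []
    for _ in range(max_runs):
        results.append(run_session(prompt))
        time.sleep(interval_s)
    return results

runs = run_on_interval("check the build", lambda p: f"ran: {p}", 0, 3)
```

The key property is that every iteration runs under the same inherited permissions, so anything that compromises one run carries into the next.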
Remote control connects a web interface to your local Claude Code process. Your session URL is the single authentication factor — anyone with the URL controls the session with your full local permissions.
Computer use lets Claude open apps and control your screen — clicking, typing, navigating your actual macOS desktop. It runs outside any sandbox, on your real desktop, with your real credentials visible on screen.
Each feature runs under whatever permission mode you've configured. There is no separate security tier for unattended execution.
What Anthropic's Safeguards Actually Cover
Credit where it's due — Anthropic has built real security infrastructure, and they've been more transparent about its limitations than most AI companies.
Two-stage auto mode classifier. Claude Code's auto mode uses a transcript classifier that operates in two stages. Stage 1 is a fast single-token filter, intentionally biased toward blocking, with an 8.5% false positive rate. Stage 2 applies chain-of-thought reasoning only when Stage 1 flags something, reducing false positives to 0.4%. The classifier sees only user messages and tool calls — Claude's own reasoning is stripped out to prevent the agent from persuading the classifier. (Anthropic Engineering Blog)
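The two-stage shape can be sketched in a few lines. The string heuristics below are stand-ins for the real single-token filter and chain-of-thought reviewer; only the control flow is the point:

```python
def stage1_fast_filter(tool_call: str) -> bool:
    """Cheap first pass, deliberately biased toward flagging.

    Stand-in for the single-token filter: anything touching deletion
    or the network gets escalated to stage 2.
    """
    risky_markers = ("rm ", "curl ", "wget ", "ssh ")
    return any(marker in tool_call for marker in risky_markers)

def stage2_reasoned_review(tool_call: str) -> bool:
    """Expensive second pass, run only on stage-1 flags.

    Stand-in for the chain-of-thought reviewer that cuts false
    positives from 8.5% to 0.4%; here it just confirms the worst case.
    """
    return "rm -rf" in tool_call

def should_block(tool_call: str) -> bool:
    # Only user messages and tool calls are evaluated; the agent's own
    # reasoning never reaches the classifier, so it cannot argue its case.
    if not stage1_fast_filter(tool_call):
        return False  # fast path: most calls never pay for stage 2
    return stage2_reasoned_review(tool_call)
```

In this toy version, `curl https://example.com` trips stage 1 but is cleared by stage 2, which is exactly the false-positive pattern the second stage exists to fix.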
OS-level sandboxing. When enabled, Claude Code uses macOS Seatbelt or Linux bubblewrap for process isolation. Anthropic reports that sandboxing reduces permission prompts by 84% in internal usage.
Isolated context windows. Web fetch operations use a separate context window to prevent injected content from entering the agent's primary context — a meaningful architectural defense against indirect prompt injection via fetched content.
Computer use safeguards. Dispatch-spawned sessions get a 30-minute approval expiry on computer use permissions, versus session-length approval for regular sessions. There's per-app permission tiering, terminal exclusion from screenshots, and a global Esc key abort.
Prompt injection scanning. A server-side probe scans tool outputs before they enter the agent's context, and a command blocklist stops risky commands like curl and wget by default. Unmatched commands default to requiring manual approval.
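A default-deny gate of that shape can be sketched as follows (illustrative command lists, not Anthropic's actual blocklist):

```python
BLOCKLIST = {"curl", "wget"}        # blocked outright by default
ALLOWLIST = {"ls", "cat", "git"}    # known-safe, runs without a prompt

def gate(command: str) -> str:
    """Return 'block', 'allow', or 'ask' for a shell command.

    Sketch of a default-deny gate: known-bad commands are blocked,
    known-good ones pass, and anything unmatched falls through to
    manual approval rather than silent execution.
    """
    parts = command.split()
    head = parts[0] if parts else ""
    if head in BLOCKLIST:
        return "block"
    if head in ALLOWLIST:
        return "allow"
    return "ask"  # unmatched commands default to manual approval
```

The safety property lives in the last line: the fallthrough is "ask", not "allow".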
These are real engineering investments. The two-stage classifier is architecturally sound. Anthropic's own caveat — "no system is completely immune to all attacks" — is honest and appropriate.
Where the Gaps Are — Anthropic's Own Numbers
Every gap below is documented by Anthropic themselves or by security researchers with published proof-of-concept exploits.
The classifier misses 17% of dangerous commands. Anthropic's engineering blog reports that against 52 real-world overeager actions, the auto mode classifier has a 17% false-negative rate — it misses dangerous commands in roughly 1 out of 6 risky scenarios. Against synthetic exfiltration attempts, the false-negative rate is 5.7%. Anthropic's own assessment: auto mode is "not a drop-in replacement for careful human review on high-stakes infrastructure."
Sandboxing is off by default. The OS-level sandbox must be explicitly enabled, and even then Claude Code includes a dangerouslyDisableSandbox escape hatch that is itself on by default. When a command fails due to sandbox restrictions, Claude is prompted to analyze the failure and may retry with sandboxing disabled. The network sandbox can additionally be bypassed via domain fronting.
Computer use is completely unsandboxed. Anthropic's own documentation states it plainly: "Computer use: when Claude opens apps and controls your screen on macOS, it runs on your actual desktop rather than in an isolated environment." Their prompt injection scanning for computer use is, in their words, "still early, and attacks are constantly evolving."
Cloud scheduled tasks run with zero permission prompts. Anthropic's docs list it explicitly — cloud scheduled tasks: "Permission prompts: No (runs autonomously)." No additional security layer is documented for these fully autonomous execution paths.
File exfiltration PoC shipped unfixed. PromptArmor demonstrated that hidden prompt
injection (white-on-white text in a file) could direct Claude to upload files to an attacker's
account via curl — without requiring any human approval. Both Haiku and Opus 4.5
executed the exfiltration. Reported October 2025. Anthropic shipped Cowork in January 2026 with the
vulnerability known. (PromptArmor, Simon Willison)
The rm -rf incidents are real. Claude Code has deleted entire home directories,
wiped 2.5 years of production course submissions via Terraform destroy (Tom's Hardware), and
destroyed iCloud-synced files during "reorganization" (GitHub Issue #32637). Multiple GitHub issues
document these incidents (#29082, #10077, #24196, #15951). Someone built a dedicated recovery tool
just to extract files from .claude session data.
Two CVEs target Claude Code directly. CVE-2025-59536 (CVSS 8.7) enables RCE via
malicious hooks in .claude/settings.json — executes before the trust dialog appears.
CVE-2026-21852 (CVSS 5.3) allows API key exfiltration via ANTHROPIC_BASE_URL redirect
before the trust prompt. Both discovered by Check Point Research.
The Unattended Session Problem
Everything above matters more when nobody is watching — and that's exactly what Dispatch,
/loop, and cloud scheduled tasks are designed for.
When Claude runs tasks while you're away, the security model changes fundamentally. A prompt injection in a scheduled task doesn't just execute once — as Harmonic Security warns, "a prompt injection loop in a scheduled task could run for hours." Each agent operates with your full credentials and file system access.
Security researchers describe this as the "lethal trifecta" — agents that simultaneously read private data (files, credentials, environment variables), encounter attacker-controlled content (repos, web pages, MCP responses, task descriptions), and take actions on behalf of users (shell commands, file writes, API calls). This isn't a bug in any specific feature. It's inherent to agentic tool design. Natural language processing fundamentally conflates instructions and data. Every mitigation reduces but cannot eliminate this risk. (arXiv 2601.17548)
The 17% false-negative rate in interactive sessions becomes a much bigger problem when there's no
human to catch what the classifier misses. A dangerous command that slips through at 2 AM during
a /loop cycle runs unchallenged until you check in the morning.
What Shoofly Advanced Adds
Shoofly Advanced is designed specifically for this gap — the space between Anthropic's built-in safeguards and what unattended sessions actually need.
Before-execution interception. Shoofly hooks into Claude Code's tool call pipeline and intercepts commands before they execute — not after. When the auto mode classifier misses a dangerous command (that 17% window), Shoofly's policy engine evaluates it independently. Two classifiers are better than one, especially when they use different evaluation approaches.
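Why a second, differently-built layer helps is simple arithmetic. Assuming the two checks fail independently (an assumption; correlated blind spots would make the real number worse), the combined miss rate is the product of the individual rates:

```python
# Probability that a dangerous command slips past both layers,
# under the independence assumption stated above.
classifier_miss = 0.17  # Anthropic's reported false-negative rate

for second_layer_miss in (0.5, 0.2, 0.0):
    combined = classifier_miss * second_layer_miss
    print(f"second layer misses {second_layer_miss:.0%} -> "
          f"combined miss rate {combined:.1%}")

# A deterministic rule that always matches its pattern has a 0% miss
# rate for that pattern, driving the combined rate for covered
# commands to zero.
```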
Policy-as-code. Instead of relying solely on a language model's judgment, you define explicit rules: no rm -rf with home directory paths, no writes outside the project directory, no network requests to unauthorized domains, no credential access during scheduled tasks. These rules are deterministic: for the patterns they cover, there is no false-negative rate.
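A minimal sketch of what deterministic rules look like in practice, using illustrative regex patterns rather than Shoofly's actual rule syntax:

```python
import os
import re

# Each rule is a (name, pattern) pair evaluated against the command
# text before execution. Patterns here are illustrative only.
HOME = re.escape(os.path.expanduser("~"))
RULES = [
    ("no-recursive-home-delete", re.compile(rf"rm\s+-\w*r\w*\s+.*({HOME}|~)")),
    ("no-credential-read",       re.compile(r"\.aws/credentials|\.ssh/id_")),
    ("no-unapproved-upload",     re.compile(r"\b(curl|wget)\b.*\s(-d|--data|-T|--upload-file)\b")),
]

def evaluate(command: str):
    """Return the names of every rule the command violates.

    Unlike a probabilistic classifier, a regex either matches or it
    doesn't: for the patterns it covers, there is nothing to miss.
    """
    return [name for name, pattern in RULES if pattern.search(command)]
```

The trade-off is coverage, not accuracy: a rule catches only what its pattern describes, which is why this layer is additive rather than a replacement.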
Real-time alerting for unattended sessions. When Shoofly blocks a command during
a Dispatch task or /loop cycle, you get notified immediately. You find out about the
blocked rm -rf at 2 AM when it happens, not when you check your terminal at 9 AM.
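What firing an alert at block time might look like, as a sketch; the queue is a stand-in for whatever transport (push notification, webhook, email) actually delivers it, and none of these names are Shoofly's real API:

```python
import json
import time

def build_alert(session_id: str, command: str, rule: str, queue: list) -> dict:
    """Format and enqueue a block event for immediate delivery.

    Fire-and-forget: the blocked command never runs, the alert goes
    out right away, and the unattended session keeps going.
    """
    event = {
        "session": session_id,
        "blocked_command": command,
        "rule": rule,
        "at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    queue.append(json.dumps(event))
    return event
```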
Additive, not replacement. The auto mode classifier, the sandbox, the prompt injection scanner — all of those still run. Shoofly adds a second layer of policy enforcement specifically tuned for unattended sessions. Works with both OpenClaw and Claude Code.
We're not claiming bulletproof. We're claiming that deterministic policy rules catch things probabilistic classifiers miss, and that real-time alerting matters when nobody's watching the terminal.
Running unattended Claude Code sessions? Add a second layer before your next /loop.
The Bottom Line
Dispatch is genuinely useful. Anthropic's safeguards are real, and their transparency about limitations is better than the industry norm. The 17% number is honest — most companies wouldn't publish it.
But the gaps matter most when you're not watching. And Dispatch, /loop, and cloud
scheduled tasks are explicitly designed for when you're not watching. If you're running unattended
Claude Code sessions on anything with production data, credentials, or infrastructure access,
the built-in safeguards alone leave a measurable gap.
Add a second layer. Your future self — the one who didn't lose a weekend to a 2 AM
rm -rf — will thank you.
FAQ
Is Claude Code Dispatch safe?
Dispatch inherits whatever permission mode you configure. Anthropic provides real safeguards — a two-stage auto mode classifier, optional OS-level sandboxing, and 30-minute computer use approval expiry for Dispatch sessions. However, the classifier has a documented 17% false-negative rate on dangerous commands. For sensitive workloads, adding before-execution policy enforcement fills the gap.
What are the security risks of Claude Code /loop?
Cloud scheduled tasks run with no permission prompts — fully autonomous. The primary risks are drift accumulation (a prompt injection early in a loop compromises all subsequent iterations) and the inability to monitor in real time. Harmonic Security warns that "a prompt injection loop in a scheduled task could run for hours."
Does Anthropic protect against prompt injection in Dispatch?
Yes, with a server-side probe, command blocklist, and isolated context windows for web fetch. However, PromptArmor demonstrated file exfiltration via hidden prompt injection that bypassed these protections without requiring human approval. Anthropic's computer use prompt injection detection is, in their own words, "still early."
What is the false negative rate of Claude Code auto mode?
Anthropic's engineering blog reports 17% against real-world overeager actions (1 in 6 risky scenarios), and 5.7% against synthetic exfiltration attempts. Anthropic states auto mode is "not a drop-in replacement for careful human review on high-stakes infrastructure."
How do I secure unattended Claude Code sessions?
Enable sandboxing, use the most restrictive permission mode that works for your task, and
avoid --dangerously-skip-permissions. For Dispatch and /loop sessions
handling sensitive work, add before-execution policy enforcement with deterministic rules and
real-time alerting. Shoofly Advanced is built for this use case.