Building an agent that works in a demo is straightforward. Building one that runs reliably in production — for weeks, across real user sessions, with secrets handled safely, in an environment you can actually debug when something goes wrong — is a different problem. It takes months, and most of it has nothing to do with the model.
That infrastructure work is what Anthropic is selling with Claude Managed Agents, which launched in public beta on April 8, 2026. Here's what it actually handles, what's still gated, and who should use it now versus wait.
What the infrastructure actually handles
Five things that kill agent projects in production, and how Managed Agents addresses each:
Sandboxed execution. Tool calls run in an isolated environment. A bad shell command or a file write that goes sideways doesn't bring down your application. This is one of those things teams skip in development and regret in production.
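To make the failure mode concrete, here's a toy stand-in for the containment idea: run tool commands in a child process with a timeout, so a bad command returns an error instead of crashing the host application. All names here are illustrative; real isolation needs far more (filesystem, network, resource limits), and that gap is exactly what the managed sandbox is meant to cover.

```python
import subprocess

def run_tool_command(cmd: list[str], timeout_s: float = 5.0) -> dict:
    """Contain a tool's shell command in a child process.

    Failures (missing binary, hang past the timeout) come back as a
    result dict rather than an exception that takes down the caller.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}
    except (subprocess.TimeoutExpired, FileNotFoundError) as exc:
        return {"ok": False, "stdout": "", "stderr": str(exc)}

# A good command succeeds; a bad one is contained, not fatal.
result = run_tool_command(["echo", "hello"])
bad = run_tool_command(["definitely-not-a-real-binary"])
```

Note this is process isolation only, which is the part teams usually do build; the part they skip is everything else the sandbox implies.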
Long-running sessions. Agents can run for hours. Managed Agents handles session continuity, context checkpointing, and reconnection without you writing that logic. If you've ever had an agent die mid-task because a connection timed out or a context window filled, this is the fix.
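Here's a minimal sketch of the checkpoint-and-resume logic the platform absorbs for you. Every name in it is illustrative, not the Managed Agents API: the point is that session state survives a dropped connection or a process restart because it lives somewhere durable.

```python
import json
import tempfile
from pathlib import Path

def checkpoint(session_id: str, messages: list, directory: Path) -> Path:
    """Persist a session's message history so a timeout or restart
    doesn't lose hours of agent work."""
    path = directory / f"{session_id}.json"
    path.write_text(json.dumps({"session_id": session_id, "messages": messages}))
    return path

def resume(session_id: str, directory: Path) -> list:
    """Reload the last checkpoint; return an empty history if none exists."""
    path = directory / f"{session_id}.json"
    if not path.exists():
        return []
    return json.loads(path.read_text())["messages"]

# Simulate surviving a process restart: checkpoint, then reload.
workdir = Path(tempfile.mkdtemp())
checkpoint("sess-1", [{"role": "user", "content": "analyze the repo"}], workdir)
history = resume("sess-1", workdir)
```

Context checkpointing on a managed platform also has to handle compaction when the window fills; this sketch only shows the durability half.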
Credential management. Connect to third-party services through MCP servers. The agent gets access to what it needs without your application managing secrets directly. Scoped at the environment level, not in the system prompt where credentials have a habit of leaking into logs.
Scoped permissions. Define what the agent can and cannot touch when you configure the environment, not as instructions you hope the model follows. There's a difference between telling an agent "don't delete files" and structurally preventing it from doing so.
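The distinction is worth a few lines of code. This is a toy illustration of structural enforcement, an allowlist checked before dispatch rather than an instruction in the prompt; the function names are hypothetical, and Managed Agents applies this at the environment-configuration level rather than in your application code.

```python
# Configured when the environment is set up, not negotiated in the prompt.
ALLOWED_TOOLS = {"read_file", "search_web"}

def dispatch_tool(name: str, args: dict) -> dict:
    """Refuse any tool call outside the allowlist.

    The model never gets the chance to 'decide' to delete files,
    because the dispatcher won't route the call at all.
    """
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {name!r} is not permitted in this environment")
    # ... route to the actual tool implementation here ...
    return {"tool": name, "args": args, "status": "dispatched"}
```

A prompt instruction can be forgotten or argued around; a dispatcher check cannot.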
End-to-end tracing. Every tool call, every decision step, logged and queryable in the console. This is the capability teams feel most acutely when they don't have it. Debugging a broken agent with print statements is not a sustainable workflow.
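For a sense of what "logged and queryable" buys you, here's a toy version of the tracer teams end up hand-rolling; all names are illustrative. Every tool call is recorded with its arguments, result, and timing, and you query the trace afterward instead of reconstructing behavior from print statements.

```python
import time
from functools import wraps

TRACE: list[dict] = []  # in production this would be durable storage

def traced(fn):
    """Record every call to a tool: args, result, duration."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        result = fn(*args, **kwargs)
        TRACE.append({
            "tool": fn.__name__,
            "args": args,
            "kwargs": kwargs,
            "result": result,
            "duration_s": time.monotonic() - start,
        })
        return result
    return wrapper

@traced
def read_file(path: str) -> str:
    return f"<contents of {path}>"  # stand-in for a real tool

read_file("config.yaml")

# Queryable after the fact: e.g. find every call that took over a second.
slow_calls = [t for t in TRACE if t["duration_s"] > 1.0]
```

The managed version adds the decision steps between tool calls, which is the part you can't bolt on from outside the harness.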
Notion, Sentry, and Asana are already running production workloads on Managed Agents. Notion uses it for parallel task execution in coding and presentation workflows. Sentry uses it for bug detection and patch generation integrated into developer review. These aren't demos.
What's not in public beta yet
This is where most coverage glosses over the details. The two capabilities that make Managed Agents sound most powerful in the announcement are not generally available:
Multi-agent coordination — where an orchestrator agent directs specialist agents for parallel work — is available by request only, not to everyone who signs up.
Self-evaluation — where Claude iterates against its own success criteria before returning a result — is in research preview.
These are real features. They're just not what you're getting if you sign up today.
One more thing to calibrate: the announcement cites a 10-point improvement in task success over standard prompting loops. That measurement came from internal testing on structured file generation, which is a specific task type. It's a real finding, not fiction, but don't assume it maps directly to your use case without benchmarking.
The pricing, in plain math
The runtime fee is $0.08 per session-hour of active execution, plus standard Claude Platform token costs on top of that.
"Active" means the agent is running — executing tools, processing results, generating responses. Idle time (waiting for your next message, sitting between sessions) doesn't count.
What that looks like in practice:
- An agent running continuously 24/7 costs about $58/month in runtime overhead, before token costs
- A customer support agent that runs actively for 20 minutes per ticket costs roughly $0.027 per ticket in runtime
- A research agent doing a 2-hour deep-dive task costs $0.16 in runtime for that session
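Those line items are just multiplication. A quick model, runtime only, with token costs excluded because they depend entirely on your workload:

```python
RUNTIME_RATE = 0.08  # dollars per session-hour of active execution

def runtime_cost(active_hours: float) -> float:
    """Runtime overhead only; Claude Platform token costs are separate."""
    return RUNTIME_RATE * active_hours

# An always-on agent, 24 hours a day for a 30-day month
always_on = runtime_cost(24 * 30)   # about $58/month before tokens

# A support agent active for 20 minutes per ticket
per_ticket = runtime_cost(20 / 60)  # about $0.027 per ticket

# A 2-hour research deep-dive
deep_dive = runtime_cost(2)         # $0.16 for the session
```

Extend `runtime_cost` with your own token estimate per session before comparing against the cost of building the harness yourself.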
The token costs on top depend entirely on which model you use and how much context your tasks require. A complex task on Sonnet can cost more in tokens than in runtime. Model it for your specific workload before making decisions.
For teams that would otherwise spend engineering time building and maintaining agent infrastructure, the math is usually easy. For high-volume automation where you're running hundreds of sessions per day, run the numbers before committing.
The vendor lock-in question
It's worth addressing directly because it's the loudest concern in the developer community, and it's legitimate.
Managed Agents runs Claude models only. There's no path to run GPT, Gemini, or any other model inside the harness. Your agent definitions, environment configurations, and session management are all on Anthropic's platform. If you want to migrate later — because Anthropic changes pricing, because a different model gets meaningfully better for your use case, or because you need to self-host — that migration is not a weekend project. We've been on the receiving end of a platform change with zero notice. It's manageable if you planned for it; painful if you didn't.
When this is a real problem:
- You're still evaluating models and haven't committed to Claude as the right choice for your task
- You have data residency requirements that tie your workloads to specific infrastructure
- You're building something where multi-model flexibility is architecturally important
When it probably isn't:
- You've already determined Claude is the right model for what you're building and you're not planning to switch
- You're at an early stage where shipping fast matters more than long-term optionality
- You're building internal tooling where switching models later is a genuine option, not a theoretical one
The honest version: if you're Claude-committed, the lock-in cost is low. If you're not sure yet, don't architect around a managed platform until you are.
Agent architecture patterns: one agent vs. a team
This is a design question that Managed Agents makes more concrete. Two patterns have emerged in production systems:
Single agent with role-switching. One agent, loaded with different contexts or instructions depending on the task at hand. It acts as one role at a time: a researcher, then a writer, then a reviewer — wearing different hats, using a shared context window. The advantages are token efficiency (no coordination overhead, no passing outputs between agents) and context discipline (the full task history lives in one place). The tradeoff is that complex parallelism gets awkward, and you can't isolate tool access cleanly between roles.
Multi-agent teams with named specialists. A coordinator delegates to specialist agents, each with a narrow scope and specific tool access. A research agent, a build agent, a review agent — running concurrently where the work allows, isolated where the tools require it. The advantages are real concurrency on parallel subtasks and genuine tool isolation per specialist. The tradeoff is orchestration overhead, latency from coordination, and a debugging surface that multiplies with each agent you add.
Both patterns work. Which one fits depends on your task structure: how parallel the work actually is, how important tool isolation is, and how much overhead you can absorb.
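The structural difference between the two patterns fits in a few lines. This is illustrative Python, not the Managed Agents API; `run_role` is a hypothetical stand-in for a model call under a role-specific system prompt.

```python
def run_role(role: str, history: list, task: str) -> list:
    """Stand-in for one model call acting under a given role."""
    history.append({"role_hat": role, "task": task, "output": f"{role} result"})
    return history

# Pattern 1: single agent switching hats over one shared history.
# Full task context in one place; no coordination overhead.
shared_history: list = []
for role in ("researcher", "writer", "reviewer"):
    run_role(role, shared_history, "draft the report")

# Pattern 2: specialists with isolated histories, merged by a coordinator.
# Real concurrency is possible here, at the cost of orchestration plumbing
# and a debugging surface that grows with each specialist.
specialist_outputs = {}
for role in ("researcher", "builder", "reviewer"):
    own_history: list = []  # isolated context and tool scope per specialist
    run_role(role, own_history, "draft the report")
    specialist_outputs[role] = own_history[-1]["output"]
```

In pattern 1 every role can read everything that came before; in pattern 2 the coordinator decides what each specialist sees, which is both the isolation benefit and the overhead.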
We've run both patterns in production. What we landed on takes the best of each. That's a longer conversation — but the short version is that the choice isn't binary, and the right answer usually isn't at either extreme.
Who should try it now
Try it now if:
- You're building on Claude and the model choice is settled, not still under evaluation
- You've shipped an agent and discovered the production infrastructure problem firsthand
- Your team doesn't want to maintain an agent harness and would rather pay for someone else to run it
- The observability story matters to you — the tracing console alone is worth it for teams shipping agents to real users
Wait if:
- You need to orchestrate multiple models in the same workflow
- Multi-agent coordination is core to your use case and you can't get access by request quickly enough
- Your use case involves regulated data and you haven't verified the infrastructure meets your requirements
- You're still in the "does this approach even work" phase — start with the standard API, not a managed platform
Getting started
The official quickstart at platform.claude.com walks through the four steps: define your agent (model, system prompt, tools), configure an environment, start a session, stream responses. The one non-obvious requirement is the managed-agents-2026-04-01 beta header on all API requests — easy to miss, breaks everything if you skip it.
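One way to make the header impossible to forget is to build request headers in a single place. The exact header key below is an assumption (Anthropic's other betas travel in an anthropic-beta request header); confirm the details against the quickstart.

```python
BETA_FLAG = "managed-agents-2026-04-01"

def request_headers(api_key: str) -> dict:
    """Headers for every Managed Agents API request.

    Centralizing this means the beta flag can't be dropped
    from an individual call site.
    """
    return {
        "x-api-key": api_key,
        "anthropic-beta": BETA_FLAG,  # easy to miss; breaks every request if omitted
        "content-type": "application/json",
    }

headers = request_headers("sk-placeholder-key")
```

If you use an SDK instead of raw HTTP, look for its per-request or client-level beta option rather than setting headers by hand.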
Claude Managed Agents quickstart →
The infrastructure problem Managed Agents solves is real, and it's the kind of problem that's invisible until you've hit it. If you're past the prototype stage and the agent harness is becoming the thing you spend time on instead of the thing you're actually building, it's worth a serious look. The limitations — model lock-in, gated features, token costs on top of runtime — are real too, and worth knowing before you architect around it.
I build with Claude every day and write about what it's actually like to ship AI-powered products. Subscribe at shoofly.dev/newsletter — building AI products in the real world, not what the press releases say.