Browser and computer-use agents are different from everything that came before them. They don't just process data — they act on it. Navigate URLs. Fill forms. Click buttons. Download files. That capability is what makes them powerful, and it's exactly what makes AI computer use security a distinct and urgent problem.
The threat model for a computer-use agent isn't a misconfigured endpoint or a stolen API key. It's the web page the agent is already browsing. Every site it visits is a potential attack surface — a place where crafted content can instruct the agent to do something its operator never intended. Understanding computer use attack vectors — and the architectural choices that mitigate them — is no longer optional for teams building production-grade agentic systems.
This post maps the attack surface, breaks down five concrete vectors with mechanisms and mitigations, and explains where Shoofly fits — and where it doesn't.
CVE-2025-47241: Domain Whitelist Bypass in browser-use
In May 2025, researchers at ARIMLABS.AI published CVE-2025-47241 (GHSA-x39x-9qw5-ghrf), a domain whitelist bypass in the browser-use library affecting versions before 0.1.45. The vulnerability allowed agents to be directed to attacker-controlled domains via URL userinfo manipulation: an attacker could craft a URL like https://trusted.example.com@evil.example.com/ and slip past a domain whitelist that only checked for the trusted hostname without accounting for the userinfo segment.
This is a precise, well-scoped vulnerability: it bypasses a specific access control (the domain allowlist), enabling navigation to unintended domains. It's not full agent hijacking via arbitrary web content. That distinction matters because conflating the two leads to the wrong mitigations. CVE-2025-47241 is fixed with strict URL parsing; the broader agent hijacking problem requires a different approach.
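To make the bypass concrete, here is a minimal sketch (not the browser-use code itself) contrasting a naive substring-style check with strict URL parsing. Python's `urlsplit().hostname` ignores the userinfo segment and returns the host the browser will actually contact:

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"trusted.example.com"}

def naive_allowed(url: str) -> bool:
    # Broken: matches the allowlisted string anywhere in the URL,
    # including the userinfo segment before the "@".
    return any(host in url for host in ALLOWED_HOSTS)

def strict_allowed(url: str) -> bool:
    # urlsplit().hostname discards userinfo (user:pass@) and the port,
    # so the check runs against the real destination host.
    host = urlsplit(url).hostname
    return host in ALLOWED_HOSTS

attack = "https://trusted.example.com@evil.example.com/"
# naive_allowed(attack) passes the filter; strict_allowed(attack) rejects it
```

The fix is exactly what the CVE advisory implies: derive the hostname from a real URL parser, never from string matching on the raw URL.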
Anthropic's own research on prompt injection defenses for browser agents puts it clearly: the attack surface is vast, and browser agents can take a wide range of actions that attackers can exploit. Their work on Claude Opus 4.5 showed significant improvement in injection robustness — but also concluded that prompt injection in browser use is far from a solved problem. Model-level defenses matter. So do architectural ones.
The Attack Surface Map for Computer-Use Agents
Before getting to specific vectors, it helps to think about where the attack surface lives. For browser agent security, that surface spans three layers:
- Content ingestion — anything the agent reads: DOM text, accessibility trees, screenshots, documents, emails. Malicious instructions can be embedded anywhere here.
- Tool execution — the actions the agent takes: navigate, click, type, screenshot, run code, call APIs. Each action is a potential damage vector if triggered by injected instructions.
- Data paths — what flows where: form fields, clipboard, file writes, network requests. Data exfiltration and credential theft happen here.
Most attacks move through all three layers in sequence: inject at the content layer, trigger a tool call, exfiltrate via a data path. Effective agentic computer use protection needs defenses at each layer — no single control at one layer can hold the line on its own. For broader context on securing agentic systems, see our agentic AI security overview.
Attack Vector Breakdown
1. Invisible Text Injection
Mechanism: Attacker-controlled content includes text that's invisible to human
visitors but processed by the agent. Common CSS techniques: positioning elements far off-screen
(position: absolute; left: -9999px), setting font size to zero, or using
color: transparent with no background. The agent reads the full DOM or accessibility
tree — including these nodes — and follows any instructions embedded there.
Anthropic's research on prompt injection in browser use documents exactly this pattern: white text embedded in a vendor inquiry email, invisible to the user, directing the agent to forward emails containing the word "confidential" to an external address before completing its assigned task. This is prompt injection delivered through the content layer.
Mitigation: Content hygiene before ingestion — strip or flag DOM nodes whose computed styles indicate hidden content before the agent processes the page. This is a pre-ingestion problem, not a tool-call problem. Complement with agent-level skepticism: train or prompt the model to treat instructions found in web content as lower-trust than its system prompt.
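A minimal sketch of that pre-ingestion hygiene, using only the standard library. The heuristics below (off-screen positioning, zero font size, transparent color) match the CSS tricks described above; a production filter should query the browser's computed styles rather than inline `style` attributes, which are easy to bypass via stylesheets:

```python
import re
from html.parser import HTMLParser

# Inline-style heuristics for the hiding tricks above. Illustrative only:
# real agents should check getComputedStyle in the browser.
HIDDEN_PATTERNS = [
    re.compile(r"left:\s*-\d{3,}px"),     # pushed far off-screen
    re.compile(r"font-size:\s*0"),        # zero-size text
    re.compile(r"color:\s*transparent"),  # invisible ink
]

VOID_TAGS = {"br", "img", "input", "meta", "hr", "link"}

def is_visually_hidden(style: str) -> bool:
    style = style.lower()
    return any(p.search(style) for p in HIDDEN_PATTERNS)

class VisibleTextExtractor(HTMLParser):
    """Collects only text a human visitor would plausibly see."""
    def __init__(self):
        super().__init__()
        self.hidden_depth = 0   # >0 while inside a hidden subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never close, so don't track depth
        style = dict(attrs).get("style", "")
        if self.hidden_depth or is_visually_hidden(style):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth and data.strip():
            self.chunks.append(data.strip())
```

Feeding the extractor a page drops text inside off-screen or transparent nodes before it ever reaches the model's context.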
2. Hidden CSS Directives
Mechanism: A more targeted variant. Malicious nodes are inserted with
display: none, opacity: 0, or visibility: hidden —
properties that reliably remove them from visual rendering while leaving them in the DOM.
Unlike off-screen positioning, these are harder to detect heuristically because they're
semantically "hidden" rather than just visually displaced.
The attack is particularly effective against agents that use DOM serialization (reading the raw HTML or parsed tree) rather than screenshot-based perception. A screenshot-based agent might miss these nodes; a DOM-reading agent will see them clearly.
Mitigation: At the framework level, filter nodes with computed
display: none or zero-opacity before serializing DOM content for the model.
For screenshot-based agents, this attack surface narrows significantly — but screenshot agents
face the exfiltration vector described below. Neither approach is a complete solution alone.
3. Click Injection
Mechanism: An injected instruction directs the agent to click a specific UI element — an OAuth consent button, a "Download" trigger, a payment confirmation, an "Allow all cookies" dialog. The agent has the capability and the (injected) instruction; it executes the click. The AI agent browser hijack here isn't about stealing credentials directly — it's about exploiting the agent's legitimate ability to interact with UI to perform an unauthorized action on the user's behalf.
Click injection is particularly dangerous for agents with persistent sessions, because a single click can authorize a long-lived grant (OAuth token, app permission, subscription confirmation) that persists well beyond the compromised session.
Mitigation: Policy-level controls on high-risk click targets. Define a blocklist
of UI patterns — consent dialogs, OAuth flows, download confirmations, payment UIs — and require
explicit user confirmation before the agent proceeds. This is exactly the kind of tool-call-layer
intervention Shoofly provides: the browser_click call is inspected against policy
before it fires.
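A toy version of such a pre-execution gate. The rule shapes and function names here are hypothetical — Shoofly's actual policy format may differ — but the flow is the point: the click is classified against a blocklist before it fires, and high-risk matches require explicit user confirmation:

```python
import re
from dataclasses import dataclass

# Hypothetical high-risk UI patterns; a real policy would be richer
# (URL context, element role, OAuth flow detection).
HIGH_RISK_CLICK_PATTERNS = [
    re.compile(r"\b(allow|grant|authorize|consent)\b", re.I),
    re.compile(r"\b(confirm (payment|purchase)|pay now)\b", re.I),
    re.compile(r"\bdownload\b", re.I),
]

@dataclass
class Verdict:
    allowed: bool
    reason: str

def gate_click(element_label: str, user_confirmed: bool = False) -> Verdict:
    """Inspect a browser_click call against policy before it executes."""
    for pattern in HIGH_RISK_CLICK_PATTERNS:
        if pattern.search(element_label):
            if user_confirmed:
                return Verdict(True, f"high-risk click confirmed by user: {element_label!r}")
            return Verdict(False, f"blocked pending confirmation: matched {pattern.pattern!r}")
    return Verdict(True, "no policy match")
```

The key property is that the gate runs regardless of what the model was told: even a fully hijacked agent can't complete an OAuth grant without the confirmation step.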
4. Form Fill Hijacking
Mechanism: The agent is directed to fill a form — but the values come from attacker-controlled instructions rather than the user's intent. Targets include PII fields (name, address, phone), password fields, payment details, and email addresses. The mechanism is a classic confused deputy: the agent has access to sensitive data (from memory, context, or prior form fills) and legitimate form-fill capabilities, but the injected instruction redirects where that data goes.
A more subtle variant doesn't require the agent to have sensitive data at all — it just redirects an existing fill task. If the agent was already going to fill in an address, injected content can modify the target field or the value mid-task.
Mitigation: Validate form fill targets against session intent before execution.
The agent's browser_type or equivalent call should be inspectable: what field,
what value, does this match what the user asked for? Sensitive field types (
type="password", autocomplete="cc-number", etc.) warrant higher
scrutiny. Don't allow agents to fill credential fields autonomously without an explicit
user instruction in the current session.
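A sketch of that inspection, assuming the tool-call layer can see the target field's attributes (the field-dict format and `gate_type` name are illustrative, not a real browser-use or Shoofly API):

```python
# Autocomplete tokens that mark payment and credential fields.
SENSITIVE_AUTOCOMPLETE = {
    "cc-number", "cc-csc", "cc-exp",
    "current-password", "new-password", "one-time-code",
}

def classify_field(field_attrs: dict) -> str:
    """Return 'sensitive' for credential/payment fields, else 'normal'."""
    if field_attrs.get("type") == "password":
        return "sensitive"
    if field_attrs.get("autocomplete") in SENSITIVE_AUTOCOMPLETE:
        return "sensitive"
    return "normal"

def gate_type(field_attrs: dict, explicit_user_instruction: bool) -> bool:
    # Sensitive fields require an explicit user instruction in the
    # current session; ordinary fields fill normally.
    if classify_field(field_attrs) == "sensitive":
        return explicit_user_instruction
    return True
```

This doesn't stop an attacker from redirecting an ordinary address fill — that needs intent matching against the user's request — but it does close off autonomous credential and payment-card entry.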
5. Screen Exfiltration
Mechanism: Screenshot-capable agents can be instructed to capture the current
screen, encode the image (base64 or otherwise), and POST it to an attacker-controlled endpoint
via a browser_request or similar outbound call. The payload can include anything
visible on screen: credentials, documents, private conversations, financial data.
No file access required — just screenshot + network egress.
This vector is insidious because each individual capability (screenshot, HTTP request) is entirely legitimate. The attack is in the combination and the destination. It's also difficult to catch at the model level because the injected instruction can be framed as routine telemetry or a debugging step.
Mitigation: Outbound network policy at the tool-call layer. Restrict
browser_request (and analogous calls) to allowlisted domains. Flag or block
any outbound call that carries a screenshot or large binary payload to a domain that wasn't
explicitly authorized for the current task. This is a defense-in-depth measure —
combine it with screenshot rate limiting and session-scoped network policies.
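A minimal sketch of that egress gate. The allowlist contents, size threshold, and function name are assumptions for illustration; the structure — strict host allowlist first, then a payload check for anything image-shaped or oversized going to a host not explicitly authorized for uploads — is the pattern described above:

```python
TASK_EGRESS_ALLOWLIST = {"api.example-task.com"}  # hypothetical per-task allowlist
SCREENSHOT_AUTHORIZED: set[str] = set()           # no host may receive screenshots by default
MAX_UNREVIEWED_BODY = 64 * 1024                   # large bodies suggest exfiltration

def looks_like_image(body: bytes) -> bool:
    # Raw PNG magic bytes, or "iVBOR" — the base64 encoding of that magic.
    return body.startswith(b"\x89PNG") or body.lstrip().startswith(b"iVBOR")

def gate_request(host: str, body: bytes) -> tuple[bool, str]:
    """Inspect a browser_request call before any bytes leave the machine."""
    if host not in TASK_EGRESS_ALLOWLIST:
        return False, f"blocked: {host} is not on the task allowlist"
    if (looks_like_image(body) or len(body) > MAX_UNREVIEWED_BODY) \
            and host not in SCREENSHOT_AUTHORIZED:
        return False, "blocked: image or oversized payload to host not authorized for uploads"
    return True, "ok"
```

Splitting "may be contacted" from "may receive screenshots" is the design choice worth copying: an allowlisted API endpoint for the task still can't silently receive a base64-encoded screen capture.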
Defense-in-Depth for Computer-Use Agents
No single control closes the full attack surface. The right architecture stacks defenses across all three layers identified above:
- Content hygiene (pre-ingestion): Filter hidden DOM nodes, flag suspicious text patterns, strip injected content before the model ever sees it. This is library/framework work — browser-use, Playwright wrappers, or a dedicated sanitization layer sitting between the browser and the agent context.
- Model-level robustness: Use models that have been hardened against prompt injection (Anthropic's research on Claude Opus 4.5 is the current public benchmark). Apply system-prompt framing that treats web content as lower-trust than user instructions. This reduces — but doesn't eliminate — injection success rates.
- Tool-call policy (pre-execution): Inspect every tool call against a threat policy before it fires. Block calls that match high-risk patterns regardless of what the model was instructed to do. This is the layer Shoofly operates at.
- Network egress controls: Allowlist outbound domains. Block binary payloads to unrecognized endpoints. This catches exfiltration attempts that slip through the tool-call layer.
- Session scope and least privilege: Agents shouldn't carry credentials, sensitive context, or broad permissions beyond what the current task requires. Scope access tightly; expire sessions promptly.
For implementation guidance, see the Shoofly guides. For how this applies to OpenClaw-hosted agents specifically, see OpenClaw security. The injection pattern here closely mirrors what we see with MCP tool poisoning — injected instructions exploiting legitimate tool capabilities is the common thread.
What Shoofly Covers — and What It Doesn't
Being honest about scope is part of doing security right. Here's the actual breakdown:
What Shoofly does: Shoofly monitors and blocks at the tool call layer.
Every tool invocation — browser_navigate, browser_click,
bash, write_file, and others — passes through Shoofly's
pre-execution gate before it fires. Calls that match threat policy rules are blocked before
any damage can occur. Shoofly Basic (free) detects and alerts, with a threat policy that's
open and auditable — you can read and modify every rule. Shoofly Advanced upgrades
to full pre-execution blocking, adds real-time alerts via Telegram and desktop notifications,
and includes policy linting so rule mistakes get caught before they matter.
What Shoofly doesn't do: Shoofly does not sanitize DOM content before agent ingestion. It doesn't strip hidden CSS nodes, filter invisible text, or inspect the HTML the agent is about to process. That's a separate layer — pre-ingestion content hygiene — and it has to be handled at the framework or wrapper level, not at the tool-call layer.
The right stack is content hygiene plus Shoofly. Neither alone is complete. Content hygiene reduces what malicious instructions reach the model. Shoofly blocks the tool calls those instructions produce, even when some injections get through. They address different stages of the same attack chain.
For the broader landscape of AI agent security controls — including how Shoofly compares to other approaches — start with our agentic AI security overview.
Add runtime security for Claude Code and OpenClaw agents — install Shoofly Basic free:
curl -fsSL https://shoofly.dev/install.sh | bash
Related reading: MCP Tool Poisoning · Agentic AI Security · Implementation Guides · OpenClaw Security · Claude Code Security