AI coding agents aren't chatbots that suggest code. They're processes running on your machine with your user's permissions, reading files, executing shell commands, browsing the web, and calling external APIs. AI coding agent security requires a fundamentally different mental model — you're dealing with a process that has filesystem access and takes instructions from external content.
What is the threat model for AI coding agents?
The fundamental threat: an agent that can be made to take actions the user didn't intend, using the agent's legitimate, delegated permissions. The agent itself isn't malicious — it's manipulated.
The OWASP LLM Top 10 covers the relevant vulnerability classes for LLM threat detection:
- LLM01 — Prompt Injection: Malicious instructions embedded in web pages, documents, tool output, or repository files that override the user's instructions.
- LLM02 — Sensitive Information Disclosure: Agent directed to read and transmit credentials, SSH keys, or API tokens. AI agent malware protection must cover this vector specifically.
- LLM03 — Supply Chain: Malicious MCP servers, agent skill packages, or compromised dependencies introducing malicious behavior through trusted channels. See MCP tool poisoning for how this works in practice.
- LLM06 — Excessive Agency: Agents granted more permissions than needed — amplifying the impact of any compromise.
- LLM08 — Vector/Embedding Weaknesses: Injected content reaching the model through RAG pipelines, bypassing input filters because malicious content enters at retrieval time.
What is the AI computer use security threat?
AI computer use security is one of the least-understood areas for developers running agents with browser access. Computer use agents have an expanded attack surface: injected instructions in web content can direct the agent to take unintended UI actions.
Concrete techniques:
- Invisible text: White text on white background or zero-opacity CSS containing instructions visible to the agent's processing but not to the user
- Hidden CSS instructions: Content hidden from human view via `visibility: hidden` but still present in the DOM for agents parsing HTML structure
- Unintended click targets: Instructions directing the agent to click an authorization dialog for a permission the user didn't intend to grant
- Form fill injection: Agent directed to fill fields with attacker-controlled data — address changes, payment updates, account setting modifications
- Screen exfiltration: Instructions to screenshot specific display regions and transmit to an attacker-controlled endpoint
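Several of these techniques depend on content that is present in the DOM but invisible to a human. As a rough illustration of the idea, not Shoofly's implementation, a pre-processing step can flag text inside elements whose inline styles hide them, using only the standard library (the marker list here is a minimal, non-exhaustive assumption):

```python
from html.parser import HTMLParser

# Illustrative (non-exhaustive) inline-style fragments that hide text
# from humans while leaving it readable to an agent parsing the DOM.
HIDDEN_STYLE_MARKERS = (
    "display:none",
    "visibility:hidden",
    "opacity:0",
    "font-size:0",
)

class HiddenTextFinder(HTMLParser):
    """Collects text content of elements whose inline style hides them."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0   # how many hidden ancestors we are inside
        self.hidden_text = []   # text found inside hidden elements

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style", "").replace(" ", "").lower()
        if any(marker in style for marker in HIDDEN_STYLE_MARKERS):
            self.hidden_depth += 1
        elif self.hidden_depth:
            self.hidden_depth += 1  # children of a hidden element are hidden too

    def handle_endtag(self, tag):
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if self.hidden_depth and data.strip():
            self.hidden_text.append(data.strip())

def find_hidden_instructions(html: str) -> list[str]:
    """Return text that a human would not see but an agent would parse."""
    finder = HiddenTextFinder()
    finder.feed(html)
    return finder.hidden_text
```

Anything this returns is a candidate injected instruction and a reason to treat the page's influence on subsequent tool calls as hostile.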
Agentic computer use protection requires treating web content as untrusted input that can influence tool calls. Apply pre-execution blocking to catch tool calls matching computer use attack patterns before they fire.
How does tool call interception work?
Tool call interception evaluates a tool call against a security policy before it executes — synchronously, not async. Three categories of evaluation:
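The overall shape is a synchronous wrapper around the tool-execution path: evaluate first, then either execute or refuse. A minimal sketch, with illustrative names (`evaluate`, `execute`, `BlockedToolCall` are assumptions, not any tool's real API):

```python
from typing import Callable

class BlockedToolCall(Exception):
    """Raised when a requested tool call violates the security policy."""

def intercept(tool_name: str, args: dict,
              evaluate: Callable[[str, dict], list],
              execute: Callable[[str, dict], object]):
    """Evaluate a tool call against policy before it runs; block on any violation."""
    violations = evaluate(tool_name, args)   # synchronous: nothing has fired yet
    if violations:
        raise BlockedToolCall(f"{tool_name}: {', '.join(violations)}")
    return execute(tool_name, args)          # reached only if the policy passes
```

The key property is ordering: the policy check completes before the side effect exists, which is what distinguishes this from post-execution detection.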
Pattern-based rules
- Credential path reads: `~/.ssh/`, `~/.aws/`, `~/.gnupg/`
- Dangerous shell patterns: `rm -rf /`, mass permission changes, `sudo` with agent-controlled arguments
- Outbound transfers: `curl` or `wget` piping file contents to remote hosts
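Pattern-based rules reduce to matching the requested command against a named ruleset. A sketch with deliberately simplified regexes (illustrative only, not Shoofly's actual rules, and a production set would be broader and tuned against false positives):

```python
import re

# Illustrative patterns only; real rules need far more coverage.
PATTERN_RULES = [
    ("credential-path-read", re.compile(r"~/\.(ssh|aws|gnupg)/")),
    ("recursive-root-delete", re.compile(r"\brm\s+-[rf]{2,}\s+/(?:\s|$)")),
    ("pipe-to-remote-host",   re.compile(r"\|\s*(curl|wget)\b")),
]

def match_pattern_rules(command: str) -> list[str]:
    """Return the names of every pattern rule a shell command trips."""
    return [name for name, rx in PATTERN_RULES if rx.search(command)]
```

A single command can trip multiple rules at once, e.g. reading a credential file and piping it to `curl`, and each match is independent grounds for blocking.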
Behavioral rules
- Unexpected writes outside the declared working directory
- Outbound requests to domains unrelated to the stated task
- Tool call sequences inconsistent with the user's goal
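The first behavioral rule above, flagging writes outside the declared working directory, reduces to a path-containment check after resolving symlinks and `..` traversal. A minimal sketch (function name is illustrative; requires Python 3.9+ for `is_relative_to`):

```python
from pathlib import Path

def is_unexpected_write(path: str, workdir: str) -> bool:
    """True if a write target resolves outside the declared working directory."""
    target = Path(path).expanduser().resolve()
    root = Path(workdir).expanduser().resolve()
    # resolve() first, so "../" traversal can't escape the containment check
    return not target.is_relative_to(root)
```

Resolving before comparing matters: a naive string-prefix check passes `/tmp/project/../etc/passwd`, while the resolved path does not.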
Categorical rules
- Block `exec` entirely unless the session is explicitly exec-authorized
- Block outbound messaging (Telegram, Discord, email) unless the specific channel is whitelisted
- Block browser automation to financial or credential-management domains
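Categorical rules are deny-by-default: a whole tool category is off unless the session grants it, optionally scoped to specific targets. A sketch of such a policy (the structure and names are assumptions for illustration, not Shoofly's policy format):

```python
# Illustrative deny-by-default policy: categories are off unless granted,
# and grants can be scoped to specific channels or domains.
DEFAULT_POLICY = {
    "exec":      {"allowed": False},
    "messaging": {"allowed": False, "channels": []},
    "browser":   {"allowed": True, "blocked_domains": ["bank.example", "vault.example"]},
}

def check_categorical(policy: dict, category: str, target: str = "") -> bool:
    """Return True if a tool call is permitted under the categorical policy."""
    rule = policy.get(category, {"allowed": False})
    if category == "messaging":
        # Messaging is allowed only to explicitly whitelisted channels.
        return target in rule.get("channels", [])
    if category == "browser":
        # Browsing is allowed except to sensitive blocked domains.
        return rule.get("allowed", False) and target not in rule.get("blocked_domains", [])
    return rule.get("allowed", False)
```

The useful property is that a compromised session can only reach channels and categories the user turned on beforehand, which caps the blast radius of any single injection.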
How do the AI coding agent security approaches compare?
| Approach | When it acts | What it stops |
|---|---|---|
| Input guardrails (NeMo, LlamaFirewall) | When content enters the model | Prompt injection reaching the LLM |
| Post-execution detection | After tool call completes | Alerts on damage already done |
| Pre-execution blocking (Shoofly) | When agent requests a tool call | Stops the action before it fires |
See why we block instead of detect for the full argument on why detection-after-the-fact is insufficient for agentic workflows.
What are the honest trade-offs of each LLM agent security tool?
NVIDIA NeMo Guardrails
Open source, good for input/output filtering in chatbot contexts. Operates at the model boundary — does not intercept tool calls. Not designed for agentic deployments. Useful complement; not a substitute for agent-level security.
Meta LlamaFirewall
Open source, research-stage. Interesting work on prompt injection detection, but still early for production deployment. No pre-execution blocking for tool calls out of the box.
Lakera Guard
Enterprise-grade, API-based prompt injection and content filtering. Strong at the LLM input/output layer. Not agent-native — operates at the content layer, not the tool call layer. Post-ingestion, not pre-execution. Appropriate for teams with compliance requirements at the model boundary.
ClawMoat
Pre-execution blocking with a scan pipeline and credential directory monitoring. Works with OpenClaw and Claude Code. Proprietary rules — you can't audit what's being enforced. No free tier. If rule transparency and a free tier aren't requirements, it's worth evaluating.
Shoofly
Pre-execution blocking for OpenClaw and Claude Code (including Cowork and Dispatch). Open rules — you can read, edit, and audit exactly what's being enforced. Covers the full threat taxonomy above. See the OpenClaw security guide and Claude Code security guide for platform-specific details. Not enterprise-certified; designed for developers who want runtime protection without operational overhead.
What does Shoofly Advanced add for AI coding agent security?
Shoofly Basic is free — it detects threats and alerts you, and its threat policy is open and auditable. Shoofly Advanced upgrades to full pre-execution blocking and adds real-time alerts via Telegram and desktop notifications, plus policy linting. See the Advanced docs for the full configuration reference.
What's the minimum viable security posture for a developer?
- Keep your agent runtime updated — CVEs happen and get fixed
- Audit config files in repos you clone before opening them (especially for Claude Code)
- Apply least privilege — don't grant tool permissions the agent doesn't need
- Install runtime security — Shoofly Basic is free and takes five minutes
- For unattended agents, add alerting — you need to know when something is blocked
Add runtime security to your agent stack
Shoofly Basic is free. No API key, no account required.
Install Shoofly Basic free — runtime security for Claude Code and OpenClaw agents:
`curl -fsSL https://shoofly.dev/install.sh | bash`
See plans & pricing →