AI coding agents aren't chatbots that suggest code. They're processes running on your machine with your user's permissions, reading files, executing shell commands, browsing the web, and calling external APIs. AI coding agent security requires a fundamentally different mental model — you're dealing with a process that has filesystem access and takes instructions from external content.
What is the threat model for AI coding agents?
The fundamental threat: an agent that can be made to take actions the user didn't intend, using the agent's legitimate, delegated permissions. The agent itself isn't malicious — it's manipulated.
The OWASP LLM Top 10 covers the relevant vulnerability classes for LLM threat detection:
- LLM01 — Prompt Injection: Malicious instructions embedded in web pages, documents, tool output, or repository files that override the user's instructions.
- LLM02 — Sensitive Information Disclosure: Agent directed to read and transmit credentials, SSH keys, or API tokens. AI agent malware protection must cover this vector specifically.
- LLM03 — Supply Chain: Malicious MCP servers, agent skill packages, or compromised dependencies introducing malicious behavior through trusted channels. See MCP tool poisoning for how this works in practice.
- LLM06 — Excessive Agency: Agents granted more permissions than needed — amplifying the impact of any compromise.
- LLM08 — Vector/Embedding Weaknesses: Injected content reaching the model through RAG pipelines, bypassing input filters because malicious content enters at retrieval time.
What is the AI computer use security threat?
AI computer use security is one of the least-understood areas for developers running agents with browser access. Computer use agents have an expanded attack surface: injected instructions in web content can direct the agent to take unintended UI actions.
Concrete techniques:
- Invisible text: White text on white background or zero-opacity CSS containing instructions visible to the agent's processing but not to the user
- Hidden CSS instructions: Content hidden from human view via `visibility: hidden` but still present in the DOM for agents parsing HTML structure
- Unintended click targets: Instructions directing the agent to click an authorization dialog for a permission the user didn't intend to grant
- Form fill injection: Agent directed to fill fields with attacker-controlled data — address changes, payment updates, account setting modifications
- Screen exfiltration: Instructions to screenshot specific display regions and transmit to an attacker-controlled endpoint
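Several of these techniques depend on content that is present in the DOM but invisible to a human. As a rough illustration of the idea, not Shoofly's implementation, a pre-processing step can flag text inside elements whose inline styles hide them, using only the standard library (the marker list here is a minimal, non-exhaustive assumption):

```python
from html.parser import HTMLParser

# Illustrative (non-exhaustive) inline-style fragments that hide text
# from humans while leaving it readable to an agent parsing the DOM.
HIDDEN_STYLE_MARKERS = (
    "display:none",
    "visibility:hidden",
    "opacity:0",
    "font-size:0",
)

class HiddenTextFinder(HTMLParser):
    """Collects text content of elements whose inline style hides them."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0   # how many hidden ancestors we are inside
        self.hidden_text = []   # text found inside hidden elements

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style", "").replace(" ", "").lower()
        if any(marker in style for marker in HIDDEN_STYLE_MARKERS):
            self.hidden_depth += 1
        elif self.hidden_depth:
            self.hidden_depth += 1  # children of a hidden element are hidden too

    def handle_endtag(self, tag):
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if self.hidden_depth and data.strip():
            self.hidden_text.append(data.strip())

def find_hidden_instructions(html: str) -> list[str]:
    """Return text that a human would not see but an agent would parse."""
    finder = HiddenTextFinder()
    finder.feed(html)
    return finder.hidden_text
```

Anything this returns is a candidate injected instruction and a reason to treat the page's influence on subsequent tool calls as hostile.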
Agentic computer use protection requires treating web content as untrusted input that can influence tool calls. Apply pre-execution blocking to catch tool calls matching computer use attack patterns before they fire.
How does tool call interception work?
Tool call interception evaluates a tool call against a security policy before it executes — synchronously, not async. Three categories of evaluation:
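The overall shape is a synchronous wrapper around the tool-execution path: evaluate first, then either execute or refuse. A minimal sketch, with illustrative names (`evaluate`, `execute`, `BlockedToolCall` are assumptions, not any tool's real API):

```python
from typing import Callable

class BlockedToolCall(Exception):
    """Raised when a requested tool call violates the security policy."""

def intercept(tool_name: str, args: dict,
              evaluate: Callable[[str, dict], list],
              execute: Callable[[str, dict], object]):
    """Evaluate a tool call against policy before it runs; block on any violation."""
    violations = evaluate(tool_name, args)   # synchronous: nothing has fired yet
    if violations:
        raise BlockedToolCall(f"{tool_name}: {', '.join(violations)}")
    return execute(tool_name, args)          # reached only if the policy passes
```

The key property is ordering: the policy check completes before the side effect exists, which is what distinguishes this from post-execution detection.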
Pattern-based rules
- Credential path reads: `~/.ssh/`, `~/.aws/`, `~/.gnupg/`
- Dangerous shell patterns: `rm -rf /`, mass permission changes, `sudo` with agent-controlled arguments
- Outbound transfers: `curl` or `wget` piping file contents to remote hosts
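Pattern-based rules reduce to matching the requested command against a named ruleset. A sketch with deliberately simplified regexes (illustrative only, not Shoofly's actual rules, and a production set would be broader and tuned against false positives):

```python
import re

# Illustrative patterns only; real rules need far more coverage.
PATTERN_RULES = [
    ("credential-path-read", re.compile(r"~/\.(ssh|aws|gnupg)/")),
    ("recursive-root-delete", re.compile(r"\brm\s+-[rf]{2,}\s+/(?:\s|$)")),
    ("pipe-to-remote-host",   re.compile(r"\|\s*(curl|wget)\b")),
]

def match_pattern_rules(command: str) -> list[str]:
    """Return the names of every pattern rule a shell command trips."""
    return [name for name, rx in PATTERN_RULES if rx.search(command)]
```

A single command can trip multiple rules at once, e.g. reading a credential file and piping it to `curl`, and each match is independent grounds for blocking.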
Behavioral rules
- Unexpected writes outside the declared working directory
- Outbound requests to domains unrelated to the stated task
- Tool call sequences inconsistent with the user's goal
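The first behavioral rule above, flagging writes outside the declared working directory, reduces to a path-containment check after resolving symlinks and `..` traversal. A minimal sketch (function name is illustrative; requires Python 3.9+ for `is_relative_to`):

```python
from pathlib import Path

def is_unexpected_write(path: str, workdir: str) -> bool:
    """True if a write target resolves outside the declared working directory."""
    target = Path(path).expanduser().resolve()
    root = Path(workdir).expanduser().resolve()
    # resolve() first, so "../" traversal can't escape the containment check
    return not target.is_relative_to(root)
```

Resolving before comparing matters: a naive string-prefix check passes `/tmp/project/../etc/passwd`, while the resolved path does not.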
Categorical rules
- Block `exec` entirely unless the session is explicitly exec-authorized
- Block outbound messaging (Telegram, Discord, email) unless the specific channel is whitelisted
- Block browser automation to financial or credential-management domains
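Categorical rules are deny-by-default: a whole tool category is off unless the session grants it, optionally scoped to specific targets. A sketch of such a policy (the structure and names are assumptions for illustration, not Shoofly's policy format):

```python
# Illustrative deny-by-default policy: categories are off unless granted,
# and grants can be scoped to specific channels or domains.
DEFAULT_POLICY = {
    "exec":      {"allowed": False},
    "messaging": {"allowed": False, "channels": []},
    "browser":   {"allowed": True, "blocked_domains": ["bank.example", "vault.example"]},
}

def check_categorical(policy: dict, category: str, target: str = "") -> bool:
    """Return True if a tool call is permitted under the categorical policy."""
    rule = policy.get(category, {"allowed": False})
    if category == "messaging":
        # Messaging is allowed only to explicitly whitelisted channels.
        return target in rule.get("channels", [])
    if category == "browser":
        # Browsing is allowed except to sensitive blocked domains.
        return rule.get("allowed", False) and target not in rule.get("blocked_domains", [])
    return rule.get("allowed", False)
```

The useful property is that a compromised session can only reach channels and categories the user turned on beforehand, which caps the blast radius of any single injection.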
How do the AI coding agent security approaches compare?
| Approach | When it acts | What it stops |
|---|---|---|
| Input guardrails (NeMo, LlamaFirewall) | When content enters the model | Prompt injection reaching the LLM |
| Post-execution detection | After tool call completes | Alerts on damage already done |
| Pre-execution blocking (Shoofly) | When agent requests a tool call | Stops the action before it fires |
See why we block instead of detect for the full argument on why detection-after-the-fact is insufficient for agentic workflows.
What are the honest trade-offs of each LLM agent security tool?
NVIDIA NeMo Guardrails
Open source, good for input/output filtering in chatbot contexts. Operates at the model boundary — does not intercept tool calls. Not designed for agentic deployments. Useful complement; not a substitute for agent-level security.
Meta LlamaFirewall
Open source, research-stage. Interesting work on prompt injection detection, but still early for production deployment. No pre-execution blocking for tool calls out of the box.
Lakera Guard
Enterprise-grade, API-based prompt injection and content filtering. Strong at the LLM input/output layer. Not agent-native — operates at the content layer, not the tool call layer. Post-ingestion, not pre-execution. Appropriate for teams with compliance requirements at the model boundary.
ClawMoat
Pre-execution blocking with a scan pipeline and credential directory monitoring. Works with OpenClaw and Claude Code. Proprietary rules — you can't audit what's being enforced. No free tier. If rule transparency and a free tier aren't requirements, it's worth evaluating.
Shoofly
Pre-execution blocking for OpenClaw and Claude Code (including Cowork and Dispatch). Open rules — you can read, edit, and audit exactly what's being enforced. Covers the full threat taxonomy above. See the OpenClaw security guide and Claude Code security guide for platform-specific details. Not enterprise-certified; designed for developers who want runtime protection without operational overhead.
What does Shoofly Advanced add for AI coding agent security?
Shoofly Basic is free — it detects threats and alerts you, and its threat policy is open and auditable. Shoofly Advanced upgrades to full pre-execution blocking and adds real-time alerts via Telegram and desktop notifications, plus policy linting. See the Advanced docs for the full configuration reference.
What's the minimum viable security posture for a developer?
- Keep your agent runtime updated — CVEs happen and get fixed
- Audit config files in repos you clone before opening them (especially for Claude Code)
- Apply least privilege — don't grant tool permissions the agent doesn't need
- Install runtime security — Shoofly Basic is free and takes five minutes
- For unattended agents, add alerting — you need to know when something is blocked
Add runtime security to your agent stack
Shoofly Basic is free. No API key, no account required.
Install Shoofly Basic free — runtime security for Claude Code and OpenClaw agents:
`curl -fsSL https://shoofly.dev/install.sh | bash`
See plans & pricing →