The term "LLM firewall" means different things to different vendors, and that ambiguity is a problem. If you're evaluating LLM firewalls for your stack, you need to understand what each product actually intercepts, where it sits in the request lifecycle, and what it can and cannot catch. A prompt filter and a tool call interceptor are both marketed as "LLM firewalls," but they protect against fundamentally different threats.
This is a comprehensive, honest comparison. We'll credit competitors fairly, flag where our information might be incomplete, and be clear about what Shoofly does and doesn't do. If a different tool is better for your specific threat model, we'll say so.
What LLM Firewalls Do
An LLM firewall inspects traffic at some point in the lifecycle of an LLM interaction and applies rules to allow, modify, or block that traffic. The concept borrows from network firewalls (packet inspection, rule matching, allow/deny decisions) but applies those ideas to the AI request/response pipeline instead of network packets.
The lifecycle has three interception points:
- Input (prompt). Before the user's message or system prompt reaches the model.
- Execution (tool calls). After the model decides to take an action but before that action executes.
- Output (response). After the model generates a response but before it reaches the user or triggers downstream actions.
Every LLM firewall operates at one or more of these points. The difference between products is which points they cover and how they evaluate what passes through.
Taxonomy of Approaches
Prompt Filtering
Intercepts and analyzes the input before it reaches the model. Looks for prompt injection attempts, jailbreaks, PII in prompts, off-topic inputs, and policy violations in user messages. The core question it answers: "Is this input safe to send to the model?"
Tool Call Interception
Intercepts the model's planned actions before they execute. Evaluates shell commands, file operations, API calls, MCP tool invocations, and other tool uses against a policy. The core question it answers: "Is this action safe to execute?"
Output Scanning
Intercepts the model's response before it's returned or acted upon. Checks for hallucinations, PII in outputs, harmful content, insecure code, and policy violations in generated text. The core question it answers: "Is this output safe to use?"
These aren't competing approaches; they're complementary layers. A prompt injection that bypasses the input filter might produce a dangerous tool call that gets caught at the execution layer. An output that looks clean might have been produced by executing a dangerous command that should have been blocked. Full coverage requires all three points, and no single product covers all of them equally well.
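The three interception points can be sketched as wrappers around a single model interaction. This is a minimal, framework-agnostic sketch; every name in it (check_prompt, check_tool_call, check_output, guarded_call) is a hypothetical stand-in, not any vendor's API, and the checks themselves are deliberately naive.

```python
# Sketch of the three interception points. All names are hypothetical
# stand-ins for whatever products fill each layer.

def check_prompt(prompt: str) -> bool:
    # Input layer: e.g. a prompt injection classifier (naive stand-in)
    return "ignore previous instructions" not in prompt.lower()

def check_tool_call(tool: str, args: dict) -> bool:
    # Execution layer: deterministic policy on the planned action
    blocked_tools = {"shell", "file_delete"}
    return tool not in blocked_tools

def check_output(text: str) -> bool:
    # Output layer: e.g. PII / secret scanning on the response
    return "BEGIN RSA PRIVATE KEY" not in text

def guarded_call(prompt: str) -> str:
    if not check_prompt(prompt):
        return "[blocked at input layer]"
    # The model call and tool loop would go here; stubbed for illustration.
    tool, args = "shell", {"cmd": "rm -rf /"}  # pretend the model planned this
    if not check_tool_call(tool, args):
        return "[blocked at execution layer]"
    output = "some model response"
    return output if check_output(output) else "[blocked at output layer]"
```

Each check runs at a different moment in the lifecycle, which is why products anchored at different points see different threats.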
Prompt Filtering: Lakera, Guardrails AI
Lakera Guard is the most established prompt-level LLM firewall. [NEEDS SOURCE: verify current Lakera product name and capabilities β product has evolved rapidly] It operates as an API endpoint that sits between your application and the LLM. You send the prompt to Lakera first; it evaluates the prompt against its detection models and returns a risk score. If the score exceeds your threshold, you block the request.
Lakera's strengths:
- Purpose-built detection models for prompt injection, jailbreaks, and PII
- Low latency: typically single-digit milliseconds for prompt classification
- Simple integration: an API call before your LLM call
- Broad model support: works with any LLM since it only inspects the input
Lakera's coverage:
- Strong on prompt injection detection (their primary focus)
- PII detection in inputs
- Content policy enforcement on prompts
- Does not intercept tool calls or execution-layer actions
- Does not scan model outputs
Guardrails AI takes a different architectural approach: it's a framework rather than a hosted API. You define "guards" (validators) in Python that check inputs and outputs against your criteria. Guards can be LLM-based, regex-based, or custom functions.
Guardrails AI's strengths:
- Highly customizable: write arbitrary validation logic
- Covers both input and output (guards run on both sides)
- Self-hosted: your data doesn't leave your infrastructure
- Open and auditable codebase
Guardrails AI's coverage:
- Input validation (prompt checks, PII detection, topic filtering)
- Output validation (hallucination detection, format enforcement, content filtering)
- Does not natively intercept tool calls at the execution layer
- Requires custom implementation for execution-layer coverage
Both Lakera and Guardrails AI are good at what they do. If your primary concern is prompt injection, where attackers manipulate your chatbot via crafted inputs, Lakera is purpose-built for that. If you need flexible input/output validation with custom business logic, Guardrails AI gives you full control. Neither was designed to intercept tool calls at the execution layer, because their architecture sits in the prompt/response pipeline, not in the agent runtime.
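The guard pattern described above, validators run over text with pass/fail results, can be illustrated framework-agnostically. This is a sketch of the pattern only, not the Guardrails AI API; the guard functions and keyword list are assumptions for illustration.

```python
import re

# Each guard takes text and returns (ok, reason). Guards here are
# illustrative only; real deployments use far stronger detectors.

def no_ssn(text):
    # Regex-based guard: reject US SSN-looking patterns
    if re.search(r"\b\d{3}-\d{2}-\d{4}\b", text):
        return False, "possible SSN detected"
    return True, ""

def on_topic(text):
    # Custom-function guard: naive topicality check (assumed keyword list)
    keywords = {"deploy", "build", "test", "pipeline"}
    if any(k in text.lower() for k in keywords):
        return True, ""
    return False, "off-topic input"

def run_guards(text, guards):
    # Run every guard; collect the reasons for any failures
    failures = []
    for guard in guards:
        ok, reason = guard(text)
        if not ok:
            failures.append(reason)
    return not failures, failures
```

The same `run_guards` call works on the input side (user prompts) and the output side (model responses), which is the property that lets a guard framework cover both ends of the pipeline.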
Output Scanning: Guardrails AI, Custom Solutions
Output scanning catches problems after the model generates them but before they reach users or downstream systems.
Guardrails AI (again) is the most flexible option here. Output guards can check for:
- Factual consistency against source material
- PII leakage in responses
- Format adherence (JSON schema, specific structure requirements)
- Content policy violations
- Relevance and topicality
Custom solutions are common at scale. Most teams with production LLM deployments have built some form of output post-processing: toxicity classifiers, regex-based PII scrubbers, schema validators. These range from simple regex checks to full secondary LLM calls that evaluate the primary model's output.
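As a sketch of that kind of post-processing, here is a minimal regex-based output scrubber. The patterns are simplified illustrations, not production-grade PII detection.

```python
import re

# Illustrative PII/secret patterns; real scrubbers need far more coverage
# (names, addresses, international formats, context-aware detection).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scrub(text: str) -> str:
    # Replace each match with a labeled redaction marker
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text
```

A scrubber like this runs on the model's response just before it reaches the user or a downstream system, which is exactly the output interception point described above.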
Output scanning is necessary for any production LLM deployment. It catches hallucinations, PII leakage, and content policy violations that input filtering can't predict. But it shares the same architectural limitation as prompt filtering: it operates on text, not on actions. If the model generates a tool call that deletes files or exfiltrates data, the output scanner sees the text of the response, which may or may not describe what the tool call actually does. The dangerous action executes before the output is generated.
Tool Call Interception: Shoofly, LlamaFirewall
This is where the architecture diverges most sharply from prompt/output firewalls.
Meta's LlamaFirewall is the most significant tool call-aware firewall from a major AI lab. [NEEDS SOURCE: verify current LlamaFirewall capabilities, architecture, and tool call interception scope β Meta's documentation has been updated since initial release] LlamaFirewall includes three core scanners:
- PromptGuard 2: A prompt injection classifier that operates on inputs. Comparable to Lakera's prompt-level detection.
- AlignmentCheck: An LLM-based evaluator that analyzes the model's reasoning to detect goal hijacking, i.e. whether the model has been manipulated into pursuing an attacker's objective instead of the user's.
- CodeShield: A static analysis engine that scans generated code for insecure patterns before execution.
LlamaFirewall's architecture is notable because AlignmentCheck operates on the model's reasoning, not just its inputs or outputs. This gives it partial tool call awareness: if the model has been hijacked via prompt injection and is now generating tool calls to serve the attacker, AlignmentCheck can potentially detect the misalignment in the reasoning chain.
However, LlamaFirewall's tool call interception is reasoning-based, not execution-based. It evaluates whether the model's intent appears compromised, not whether a specific shell command or file operation is dangerous. A tool call that passes AlignmentCheck (the model genuinely believes it's helpful) but violates a security policy (it reads credentials it shouldn't need) would not be caught by reasoning analysis alone.
LlamaFirewall's strengths:
- Multi-scanner architecture covering prompt, reasoning, and code layers
- AlignmentCheck provides a unique reasoning-level defense
- Open and auditable codebase
- Designed for integration into agentic pipelines
LlamaFirewall's coverage:
- Prompt injection detection (PromptGuard 2)
- Reasoning-level goal hijacking detection (AlignmentCheck)
- Code security scanning (CodeShield)
- Partial tool call awareness through reasoning analysis
- Does not enforce deterministic policies on specific execution-layer actions
Shoofly operates exclusively at the execution layer. Every tool call (shell commands, file operations, network requests, MCP tool invocations) is intercepted before execution and evaluated against policy rules defined in YAML.
Shoofly's strengths:
- Deterministic rule evaluation: a matching pattern is caught every time, with no probabilistic miss rate
- Execution-layer specificity: evaluates the actual command or operation, not the reasoning behind it
- Low latency: rule evaluation is sub-millisecond
- Policy-as-code in YAML: open and auditable
- Built specifically for Claude Code and OpenClaw agent runtimes
Shoofly's coverage:
- Tool call interception and policy enforcement
- Credential access control (file path patterns)
- Network egress control (domain allowlists)
- Destructive command blocking (rm -rf, chmod, etc.)
- MCP tool call scoping
- Does not analyze prompts or model reasoning
- Does not scan generated code for vulnerabilities
- Does not detect prompt injection at the input level
Shoofly and LlamaFirewall are complementary, not competitive. LlamaFirewall catches reasoning-level compromise: the model has been tricked into malicious intent. Shoofly catches execution-level violations: an action, regardless of the reasoning behind it, violates a defined policy. The model can have perfectly aligned intent and still generate a tool call that reads credentials it doesn't need. The model can have compromised intent and still be blocked from executing anything dangerous.
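That distinction can be made concrete with a toy deterministic matcher. The rule shape and field names below are assumptions for illustration, not Shoofly's actual schema; the point is that the decision depends only on the action itself, never on the model's intent.

```python
from fnmatch import fnmatch

# Hypothetical execution-layer rules, loosely mirroring a YAML policy.
# Field names and rule shapes are illustrative, not Shoofly's real schema.
RULES = [
    {"name": "block-credential-reads", "tool": "file_read",
     "paths": ["*/.aws/*", "*/.ssh/*", "*/.env"], "action": "block"},
]

def evaluate(tool: str, path: str) -> str:
    # Deterministic: same (tool, path) always yields the same decision
    for rule in RULES:
        if rule["tool"] == tool and any(fnmatch(path, p) for p in rule["paths"]):
            return rule["action"]
    return "allow"  # default-allow here purely to keep the sketch small
```

A credential read is blocked no matter how benign the model's reasoning was; a reasoning-level check like AlignmentCheck has no equivalent guarantee, which is why the two layers complement each other.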
Comparison Table
| Capability | Lakera Guard | LlamaFirewall | Guardrails AI | Shoofly |
|---|---|---|---|---|
| Prompt injection detection | Yes (primary) | Yes (PromptGuard 2) | Yes (custom guards) | No |
| Reasoning analysis | No | Yes (AlignmentCheck) | No | No |
| Code scanning | No | Yes (CodeShield) | Partial (output guards) | No |
| Output validation | No | No | Yes (primary) | No |
| Tool call interception | No | Partial (reasoning-based) | No | Yes (primary) |
| Deterministic policy rules | Threshold-based | No | Custom (programmatic) | Yes (YAML) |
| Architecture | Hosted API | Self-hosted | Self-hosted/framework | Hook in agent runtime |
| Latency | ~5-15ms | Varies by scanner | Varies by guard | <1ms (rule eval) |
| Pricing | Tiered [NEEDS SOURCE: verify current pricing] | Free | Free (framework) / Enterprise | Free (Basic) / $5/mo (Advanced) |
| Open/Auditable | No | Yes | Yes | Yes (rules are YAML) |
Important caveats on this table:
- Lakera's capabilities may have expanded beyond prompt injection; verify against their current documentation before making purchasing decisions. [NEEDS SOURCE: verify Lakera's current feature set]
- LlamaFirewall's tool call awareness is through reasoning analysis, not direct execution-layer interception. The distinction matters for policy-based enforcement.
- Guardrails AI can theoretically do anything via custom guards, but native out-of-the-box coverage is focused on input/output validation.
- Shoofly's coverage is intentionally narrow: execution layer only. This is a design choice, not a limitation we're planning to address. We think the execution layer should be done by a specialist.
When Each Approach Is Appropriate
Use Lakera when: Your primary threat is prompt injection in a chatbot or customer-facing LLM application. You need low-latency detection in a hosted API with minimal integration work. Your application doesn't use tool calls or agentic execution.
Use LlamaFirewall when: You're building an agentic system and want defense-in-depth across prompt, reasoning, and code layers. You have the infrastructure to self-host. You want reasoning-level compromise detection (AlignmentCheck) as part of your security stack.
Use Guardrails AI when: You need custom input/output validation with business-specific logic. You want full control over validation rules and execution. You're building a production LLM application with specific format, content, and safety requirements.
Use Shoofly when: Your agents execute tool calls (shell commands, file operations, network requests, MCP tool invocations) and you need deterministic policy enforcement on what those actions are allowed to do. You're running Claude Code or OpenClaw. You want execution-layer security that complements (not replaces) prompt-level and output-level defenses.
Use multiple when: You're serious about security. Prompt filtering protects the input. Output scanning protects the response. Execution-layer enforcement protects the actions. Each layer catches threats the others miss.
Implementation Guide
Here's how to assemble a practical LLM firewall stack for an agentic development workflow:
Layer 1: Prompt Defense
If your agents accept external inputs (user messages, task descriptions from Dispatch, webhook payloads), add prompt-level filtering.
# Option A: Lakera Guard (hosted)
# Add before your LLM API call
response = lakera.check(prompt=user_input)
if response.risk_score > threshold:
    block()

# Option B: LlamaFirewall PromptGuard 2 (self-hosted)
# Runs locally, no external API call
result = prompt_guard.analyze(user_input)
if result.is_injection:
    block()
Layer 2: Execution Defense
Install Shoofly to intercept tool calls before execution.
# Install Shoofly Advanced
curl -fsSL https://shoofly.dev/advanced | bash
# Default policies cover:
# - Destructive commands (rm -rf outside project root)
# - Credential access (~/.aws, ~/.ssh, .env)
# - Network egress (unknown domains)
# - Sensitive file modification (.bashrc, CI configs)
Customize policies in YAML:
# .shoofly/policy.yaml
rules:
  - name: block-credential-reads
    match:
      tool: file_read
      path: ["~/.aws/**", "~/.ssh/**", "**/.env"]
    action: block
  - name: allow-project-writes
    match:
      tool: file_write
      path: ["./**"]
    action: allow
  - name: block-external-egress
    match:
      tool: network_request
      domain: ["!*.your-company.com", "!github.com", "!npmjs.org"]
    action: block
Layer 3: Output Defense
For generated code, add SAST scanning. For generated text, add content validation.
# Code: Semgrep, CodeQL, or Snyk Code
semgrep --config auto ./generated/
# Text: Guardrails AI guards or custom validators
The Complete Stack
User Input → [Prompt Filter] → LLM → [Tool Call Interceptor] → Execution
                                ↓
                       [Output Scanner] → Response
Each layer is independently valuable. But the combination is where the real security posture lives. A prompt injection that bypasses Lakera still has to generate a tool call that passes Shoofly's policy rules. A tool call that somehow slips through Shoofly (the rules didn't cover that specific pattern) still produces output that gets scanned. Defense in depth isn't just a buzzword here; it's the architecture.
Prompt filtering protects the input. Output scanning protects the response. Shoofly Advanced protects the execution. Complete the stack.
FAQ
Q: Do I need all three layers (prompt, execution, output) to be secure? Each layer is independently valuable, but the combination is significantly stronger than any single layer. If you can only deploy one, start with execution-layer interception (Shoofly): it catches the highest-impact threats (destructive operations, data exfiltration, credential access) at the last point before damage occurs. Add prompt-level and output-level defenses as your security posture matures.
Q: How does an LLM firewall differ from a traditional web application firewall (WAF)? A WAF inspects HTTP requests and responses using signature-based rules and operates at the network layer. An LLM firewall operates at the AI application layer, inspecting prompts, model reasoning, tool calls, or outputs depending on its interception point. The threat model is fundamentally different: WAFs protect against attacks on your application's code, while LLM firewalls protect against attacks that flow through the model to reach your tools and infrastructure. They're complementary: a WAF protects the HTTP boundary, an LLM firewall protects the AI execution boundary.
Q: Can I use LlamaFirewall and Shoofly together? Yes, and this is the recommended architecture for high-security agentic deployments. LlamaFirewall's AlignmentCheck detects reasoning-level compromise: cases where the model has been manipulated into pursuing adversarial goals. Shoofly catches execution-level violations: specific tool calls that violate policy regardless of reasoning. Together, they provide defense in depth across both the reasoning and execution layers.
Q: What's the latency impact of stacking multiple LLM firewall layers? Minimal in practice. Prompt-level filtering (Lakera) adds ~5–15ms. Shoofly's rule evaluation adds <1ms. Output scanning varies by guard complexity but typically adds 10–50ms. Combined, total overhead is well under 100ms, negligible compared to LLM inference time (typically 1–30 seconds) and tool execution time. The security benefit far outweighs the latency cost.
Further reading: Why We Block Instead of Detect · Prompt Injection Blocking: Pre-Execution Security · Runtime Threat Detection for AI Agents · Agentic AI Security
Ready to secure your AI agents? Shoofly Advanced provides pre-execution policy enforcement for Claude Code and OpenClaw: 20 threat rules, YAML policy-as-code, 100% local. $5/mo.