LLM Firewall: Architecture, Comparison, and the Case for Pre-Execution

The term "LLM firewall" means different things to different vendors, and that ambiguity is a problem. If you're evaluating LLM firewalls for your stack, you need to understand what each product actually intercepts, where it sits in the request lifecycle, and what it can and cannot catch. A prompt filter and a tool call interceptor are both marketed as "LLM firewalls," but they protect against fundamentally different threats.

This is a comprehensive, honest comparison. We'll credit competitors fairly, flag where our information might be incomplete, and be clear about what Shoofly does and doesn't do. If a different tool is better for your specific threat model, we'll say so.

What LLM Firewalls Do

An LLM firewall inspects traffic at some point in the lifecycle of an LLM interaction and applies rules to allow, modify, or block that traffic. The concept borrows from network firewalls (packet inspection, rule matching, allow/deny decisions) but applies them to the AI request/response pipeline instead of network packets.

The lifecycle has three interception points:

  1. Input (prompt). Before the user's message or system prompt reaches the model.
  2. Execution (tool calls). After the model decides to take an action but before that action executes.
  3. Output (response). After the model generates a response but before it reaches the user or triggers downstream actions.

Every LLM firewall operates at one or more of these points. The difference between products is which points they cover and how they evaluate what passes through.
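
The allow/modify/block decision model described above can be sketched in a few lines. This is an illustration only: `inspect`, the `Decision` type, and the toy rules are all hypothetical, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    action: str                    # "allow", "modify", or "block"
    payload: Optional[str] = None  # possibly rewritten traffic

def inspect(text: str) -> Decision:
    lowered = text.lower()
    # Toy deny rule: an obvious injection phrase blocks the request.
    if "ignore previous instructions" in lowered:
        return Decision("block")
    # Toy redaction rule: redact anything after an SSN label.
    if "ssn:" in lowered:
        idx = lowered.index("ssn:")
        return Decision("modify", text[:idx] + "ssn: [REDACTED]")
    return Decision("allow", text)
```

The same decision shape applies at all three interception points; what changes is the traffic being inspected (a prompt, a planned tool call, or a generated response) and the rules applied to it.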

Taxonomy of Approaches

Prompt Filtering

Intercepts and analyzes the input before it reaches the model. Looks for prompt injection attempts, jailbreaks, PII in prompts, off-topic inputs, and policy violations in user messages. The core question it answers: "Is this input safe to send to the model?"

Tool Call Interception

Intercepts the model's planned actions before they execute. Evaluates shell commands, file operations, API calls, MCP tool invocations, and other tool uses against a policy. The core question it answers: "Is this action safe to execute?"

Output Scanning

Intercepts the model's response before it's returned or acted upon. Checks for hallucinations, PII in outputs, harmful content, insecure code, and policy violations in generated text. The core question it answers: "Is this output safe to use?"

These aren't competing approaches; they're complementary layers. A prompt injection that bypasses the input filter might produce a dangerous tool call that gets caught at the execution layer. An output that looks clean might have been produced by executing a dangerous command that should have been blocked. Full coverage requires all three points, and no single product covers all of them equally well.

Prompt Filtering: Lakera, Guardrails AI

Lakera Guard is the most established prompt-level LLM firewall. [NEEDS SOURCE: verify current Lakera product name and capabilities; the product has evolved rapidly] It operates as an API endpoint that sits between your application and the LLM. You send the prompt to Lakera first; it evaluates the prompt against its detection models and returns a risk score. If the score exceeds your threshold, you block the request.

Lakera's strengths:

  - Purpose-built prompt injection detection, delivered as a hosted API with minimal integration work
  - Low latency (~5-15ms per check)

Lakera's coverage:

  - Input layer only. It does not analyze reasoning, scan generated code, validate outputs, or intercept tool calls.

Guardrails AI takes a different architectural approach: it's a framework rather than a hosted API. You define "guards" (validators) in Python that check inputs and outputs against your criteria. Guards can be LLM-based, regex-based, or custom functions.
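
The guard pattern can be illustrated with plain Python. Note this is a sketch of the pattern, not the actual Guardrails AI API; all function names here are made up.

```python
import json
import re

# Two toy validators: each takes generated text and returns True if it
# passes. Real guards could be LLM-based, regex-based, or custom logic.
def no_ssn(text: str) -> bool:
    """Fail if the text contains a US-SSN-shaped string."""
    return re.search(r"\b\d{3}-\d{2}-\d{4}\b", text) is None

def valid_json_shape(text: str) -> bool:
    """Fail unless the text parses as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except ValueError:
        return False

def run_guards(text: str, guards) -> list:
    """Return the names of guards that failed (empty list == pass)."""
    return [g.__name__ for g in guards if not g(text)]
```

For example, `run_guards('{"ok": true}', [no_ssn, valid_json_shape])` returns an empty list, while text containing an SSN fails the `no_ssn` guard.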

Guardrails AI's strengths:

  - Full programmatic control: guards are Python code, so validation can encode business-specific logic
  - Covers both inputs and outputs in a single open-source framework

Guardrails AI's coverage:

  - Input and output layers. It does not intercept tool calls or analyze model reasoning.

Both Lakera and Guardrails AI are good at what they do. If your primary concern is prompt injection (attackers manipulating your chatbot via crafted inputs), Lakera is purpose-built for that. If you need flexible input/output validation with custom business logic, Guardrails AI gives you full control. Neither was designed to intercept tool calls at the execution layer, because their architecture sits in the prompt/response pipeline, not in the agent runtime.

Output Scanning: Guardrails AI, Custom Solutions

Output scanning catches problems after the model generates them but before they reach users or downstream systems.

Guardrails AI (again) is the most flexible option here. Output guards can check for:

  - Hallucinations and factual inconsistencies
  - PII in generated text
  - Harmful or policy-violating content
  - Insecure generated code
  - Format and schema violations

Custom solutions are common at scale. Most teams with production LLM deployments have built some form of output post-processing: toxicity classifiers, regex-based PII scrubbers, schema validators. These range from simple regex checks to full secondary LLM calls that evaluate the primary model's output.

Output scanning is necessary for any production LLM deployment. It catches hallucinations, PII leakage, and content policy violations that input filtering can't predict. But it shares the same architectural limitation as prompt filtering: it operates on text, not on actions. If the model generates a tool call that deletes files or exfiltrates data, the output scanner sees the text of the response, which may or may not describe what the tool call actually does. The dangerous action executes before the output is generated.
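
A regex-based PII scrubber of the kind described above can be quite small. A minimal sketch, with illustrative patterns that are far from exhaustive (production scrubbers need much broader coverage):

```python
import re

# Toy output-scanning pass: redact email addresses and SSN-shaped
# strings from generated text before it reaches the user.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text
```

For example, `scrub("Contact alice@example.com")` returns `"Contact [EMAIL REDACTED]"`. Note the limitation the text describes: this operates on the response text only, after any tool calls have already run.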

Tool Call Interception: Shoofly, LlamaFirewall

This is where the architecture diverges most sharply from prompt/output firewalls.

Meta's LlamaFirewall is the most significant tool call–aware firewall from a major AI lab. [NEEDS SOURCE: verify current LlamaFirewall capabilities, architecture, and tool call interception scope; Meta's documentation has been updated since initial release] LlamaFirewall includes three core scanners:

  - PromptGuard 2, which detects prompt injection in inputs
  - AlignmentCheck, which audits the model's chain of reasoning for signs of hijacking
  - CodeShield, which scans generated code for insecure patterns

LlamaFirewall's architecture is notable because AlignmentCheck operates on the model's reasoning, not just its inputs or outputs. This gives it partial tool call awareness: if the model has been hijacked via prompt injection and is now generating tool calls to serve the attacker, AlignmentCheck can potentially detect the misalignment in the reasoning chain.

However, LlamaFirewall's tool call interception is reasoning-based, not execution-based. It evaluates whether the model's intent appears compromised, not whether a specific shell command or file operation is dangerous. A tool call that passes AlignmentCheck (the model genuinely believes it's helpful) but violates a security policy (it reads credentials it shouldn't need) would not be caught by reasoning analysis alone.

LlamaFirewall's strengths:

  - Defense across the prompt, reasoning, and code layers in a single framework
  - Open source and free to self-host

LlamaFirewall's coverage:

  - Input, reasoning, and generated-code layers. Tool call interception is partial and reasoning-based; it does not enforce execution-level policy or validate general outputs.

Shoofly operates exclusively at the execution layer. Every tool call (shell commands, file operations, network requests, MCP tool invocations) is intercepted before execution and evaluated against policy rules defined in YAML.

Shoofly's strengths:

  - Deterministic, auditable policy rules written in YAML, evaluated in under 1ms
  - Runs entirely locally as a hook in the agent runtime; no traffic leaves your machine

Shoofly's coverage:

  - Execution layer only. It intercepts tool calls; it does not detect prompt injection in inputs, analyze reasoning, or validate generated text.

Shoofly and LlamaFirewall are complementary, not competitive. LlamaFirewall catches reasoning-level compromise: the model has been tricked into malicious intent. Shoofly catches execution-level violations: an action, regardless of the reasoning behind it, violates a defined policy. The model can have perfectly aligned intent and still generate a tool call that reads credentials it doesn't need. The model can have compromised intent and still be blocked from executing anything dangerous.

Comparison Table

| Capability | Lakera Guard | LlamaFirewall | Guardrails AI | Shoofly |
|---|---|---|---|---|
| Prompt injection detection | Yes (primary) | Yes (PromptGuard 2) | Yes (custom guards) | No |
| Reasoning analysis | No | Yes (AlignmentCheck) | No | No |
| Code scanning | No | Yes (CodeShield) | Partial (output guards) | No |
| Output validation | No | No | Yes (primary) | No |
| Tool call interception | No | Partial (reasoning-based) | No | Yes (primary) |
| Deterministic policy rules | Threshold-based | No | Custom (programmatic) | Yes (YAML) |
| Architecture | Hosted API | Self-hosted | Self-hosted/framework | Hook in agent runtime |
| Latency | ~5-15ms | Varies by scanner | Varies by guard | <1ms (rule eval) |
| Pricing | Tiered [NEEDS SOURCE: verify current pricing] | Free | Free (framework) / Enterprise | Free (Basic) / $5/mo (Advanced) |
| Open/Auditable | No | Yes | Yes | Yes (rules are YAML) |

Important caveats on this table:

  - Competitor capabilities and pricing are based on public documentation and may be incomplete or out of date; see the [NEEDS SOURCE] flags.
  - Latency figures are approximate and vary with deployment, model size, and configuration.

When Each Approach Is Appropriate

Use Lakera when: Your primary threat is prompt injection in a chatbot or customer-facing LLM application. You need low-latency detection in a hosted API with minimal integration work. Your application doesn't use tool calls or agentic execution.

Use LlamaFirewall when: You're building an agentic system and want defense-in-depth across prompt, reasoning, and code layers. You have the infrastructure to self-host. You want reasoning-level compromise detection (AlignmentCheck) as part of your security stack.

Use Guardrails AI when: You need custom input/output validation with business-specific logic. You want full control over validation rules and execution. You're building a production LLM application with specific format, content, and safety requirements.

Use Shoofly when: Your agents execute tool calls (shell commands, file operations, network requests, MCP tool invocations) and you need deterministic policy enforcement on what those actions are allowed to do. You're running Claude Code or OpenClaw. You want execution-layer security that complements (not replaces) prompt-level and output-level defenses.

Use multiple when: You're serious about security. Prompt filtering protects the input. Output scanning protects the response. Execution-layer enforcement protects the actions. Each layer catches threats the others miss.

Implementation Guide

Here's how to assemble a practical LLM firewall stack for an agentic development workflow:

Layer 1: Prompt Defense

If your agents accept external inputs (user messages, task descriptions from Dispatch, webhook payloads), add prompt-level filtering.

# Option A: Lakera Guard (hosted)
# Add before your LLM API call. Illustrative pseudocode; see Lakera's
# documentation for the actual client API and response fields.
response = lakera.check(prompt=user_input)
if response.risk_score > threshold:
    block()

# Option B: LlamaFirewall PromptGuard 2 (self-hosted)
# Runs locally, no external API call. Also illustrative pseudocode.
result = prompt_guard.analyze(user_input)
if result.is_injection:
    block()

Layer 2: Execution Defense

Install Shoofly to intercept tool calls before execution.

# Install Shoofly Advanced
curl -fsSL https://shoofly.dev/advanced | bash

# Default policies cover:
# - Destructive commands (rm -rf outside project root)
# - Credential access (~/.aws, ~/.ssh, .env)
# - Network egress (unknown domains)
# - Sensitive file modification (.bashrc, CI configs)

Customize policies in YAML:

# .shoofly/policy.yaml
rules:
  - name: block-credential-reads
    match:
      tool: file_read
      path: ["~/.aws/**", "~/.ssh/**", "**/.env"]
    action: block

  - name: allow-project-writes
    match:
      tool: file_write
      path: ["./**"]
    action: allow

  - name: block-external-egress
    match:
      tool: network_request
      domain: ["!*.your-company.com", "!github.com", "!npmjs.org"]
    action: block
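
To make the matching semantics concrete, here is how an engine might evaluate rules like these. This is a sketch under stated assumptions, not Shoofly's actual implementation: it assumes first-match-wins evaluation, glob-style path patterns, and a default-deny fallback, and it omits the negation patterns.

```python
import fnmatch
import os

# Simplified rules mirroring the YAML above.
RULES = [
    {"name": "block-credential-reads", "tool": "file_read",
     "paths": ["~/.aws/*", "~/.ssh/*", "*/.env"], "action": "block"},
    {"name": "allow-project-writes", "tool": "file_write",
     "paths": ["./*"], "action": "allow"},
]

def evaluate(tool: str, path: str) -> str:
    """Return the action for a tool call: first matching rule wins."""
    expanded = os.path.expanduser(path)
    for rule in RULES:
        if rule["tool"] != tool:
            continue
        patterns = (os.path.expanduser(p) for p in rule["paths"])
        if any(fnmatch.fnmatch(expanded, p) for p in patterns):
            return rule["action"]
    return "block"  # default deny: unmatched calls don't execute
```

For example, `evaluate("file_read", "~/.aws/credentials")` returns `"block"`, while `evaluate("file_write", "./src/app.py")` returns `"allow"`. The key property is determinism: the same tool call always produces the same decision, with no model in the loop.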

Layer 3: Output Defense

For generated code, add SAST scanning. For generated text, add content validation.

# Code: Semgrep, CodeQL, or Snyk Code
semgrep --config auto ./generated/

# Text: Guardrails AI guards or custom validators

The Complete Stack

User Input → [Prompt Filter] → LLM → [Tool Call Interceptor] → Execution
                                ↓
                        [Output Scanner] → Response

Each layer is independently valuable. But the combination is where the real security posture lives. A prompt injection that bypasses Lakera still has to generate a tool call that passes Shoofly's policy rules. A tool call that somehow slips through Shoofly (the rules didn't cover that specific pattern) still produces output that gets scanned. Defense in depth isn't just a buzzword here; it's the architecture.
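
The stacking argument above can be condensed into a few lines. Everything here is illustrative (toy rules, made-up names); the point is the shape: independent layers, each of which must pass before the request proceeds.

```python
# Each layer is an independent list of predicate checks; a request must
# clear every layer it reaches. Rules are toy examples.
PROMPT_CHECKS = [lambda p: "ignore previous instructions" not in p.lower()]
TOOL_CHECKS = [lambda cmd: not cmd.strip().startswith("rm -rf /")]
OUTPUT_CHECKS = [lambda o: "BEGIN PRIVATE KEY" not in o]

def first_failing_layer(prompt: str, tool_call: str, output: str):
    """Return the name of the first layer that blocks, or None."""
    layers = [
        ("prompt", PROMPT_CHECKS, prompt),
        ("execution", TOOL_CHECKS, tool_call),
        ("output", OUTPUT_CHECKS, output),
    ]
    for name, checks, value in layers:
        if not all(check(value) for check in checks):
            return name
    return None
```

An injection phrase is caught at the prompt layer; a destructive command that arrives via a clean-looking prompt is still caught at the execution layer. Removing any one layer widens the set of requests that reach the next.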

Prompt filtering protects the input. Output scanning protects the code. Shoofly Advanced protects the execution. Complete the stack.

→ Get Shoofly Advanced


FAQ

Q: Do I need all three layers (prompt, execution, output) to be secure? Each layer is independently valuable, but the combination is significantly stronger than any single layer. If you can only deploy one, start with execution-layer interception (Shoofly): it catches the highest-impact threats (destructive operations, data exfiltration, credential access) at the last point before damage occurs. Add prompt-level and output-level defenses as your security posture matures.

Q: How does an LLM firewall differ from a traditional web application firewall (WAF)? A WAF inspects HTTP requests and responses using signature-based rules and operates at the network layer. An LLM firewall operates at the AI application layer, inspecting prompts, model reasoning, tool calls, or outputs depending on its interception point. The threat model is fundamentally different: WAFs protect against attacks on your application's code, while LLM firewalls protect against attacks that flow through the model to reach your tools and infrastructure. They're complementary: a WAF protects the HTTP boundary, an LLM firewall protects the AI execution boundary.

Q: Can I use LlamaFirewall and Shoofly together? Yes, and this is the recommended architecture for high-security agentic deployments. LlamaFirewall's AlignmentCheck detects reasoning-level compromise: cases where the model has been manipulated into pursuing adversarial goals. Shoofly catches execution-level violations: specific tool calls that violate policy regardless of reasoning. Together, they provide defense in depth across both the reasoning and execution layers.

Q: What's the latency impact of stacking multiple LLM firewall layers? Minimal in practice. Prompt-level filtering (Lakera) adds ~5–15ms. Shoofly's rule evaluation adds <1ms. Output scanning varies by guard complexity but typically adds 10–50ms. Combined, total overhead is well under 100ms, which is negligible compared to LLM inference time (typically 1–30 seconds) and tool execution time. The security benefit far outweighs the latency cost.


Further reading: Why We Block Instead of Detect · Prompt Injection Blocking: Pre-Execution Security · Runtime Threat Detection for AI Agents · Agentic AI Security


Ready to secure your AI agents? Shoofly Advanced provides pre-execution policy enforcement for Claude Code and OpenClaw: 20 threat rules, YAML policy-as-code, 100% local. $5/mo.