LLM Firewall: Architecture, Comparison, and the Case for Pre-Execution

The term "LLM firewall" means different things to different vendors, and that ambiguity is a problem. If you're evaluating LLM firewalls for your stack, you need to understand what each product actually intercepts, where it sits in the request lifecycle, and what it can and cannot catch. A prompt filter and a tool call interceptor are both marketed as "LLM firewalls," but they protect against fundamentally different threats.

This is a comprehensive, honest comparison. We'll credit competitors fairly, flag where our information might be incomplete, and be clear about what Shoofly does and doesn't do. If a different tool is better for your specific threat model, we'll say so.

What LLM Firewalls Do

An LLM firewall inspects traffic at some point in the lifecycle of an LLM interaction and applies rules to allow, modify, or block that traffic. The concept borrows from network firewalls (packet inspection, rule matching, allow/deny decisions) but applies them to the AI request/response pipeline instead of network packets.

The lifecycle has three interception points:

  1. Input (prompt). Before the user's message or system prompt reaches the model.
  2. Execution (tool calls). After the model decides to take an action but before that action executes.
  3. Output (response). After the model generates a response but before it reaches the user or triggers downstream actions.

Every LLM firewall operates at one or more of these points. The difference between products is which points they cover and how they evaluate what passes through.
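
The allow/modify/block decision model described above can be sketched in a few lines. This is an illustration only: `inspect`, the `Decision` type, and the toy rules are all hypothetical, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    action: str                    # "allow", "modify", or "block"
    payload: Optional[str] = None  # possibly rewritten traffic

def inspect(text: str) -> Decision:
    lowered = text.lower()
    # Toy deny rule: an obvious injection phrase blocks the request.
    if "ignore previous instructions" in lowered:
        return Decision("block")
    # Toy redaction rule: redact anything after an SSN label.
    if "ssn:" in lowered:
        idx = lowered.index("ssn:")
        return Decision("modify", text[:idx] + "ssn: [REDACTED]")
    return Decision("allow", text)
```

The same decision shape applies at all three interception points; what changes is the traffic being inspected (a prompt, a planned tool call, or a generated response) and the rules applied to it.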

Taxonomy of Approaches

Prompt Filtering

Intercepts and analyzes the input before it reaches the model. Looks for prompt injection attempts, jailbreaks, PII in prompts, off-topic inputs, and policy violations in user messages. The core question it answers: "Is this input safe to send to the model?"

Tool Call Interception

Intercepts the model's planned actions before they execute. Evaluates shell commands, file operations, API calls, MCP tool invocations, and other tool uses against a policy. The core question it answers: "Is this action safe to execute?"

Output Scanning

Intercepts the model's response before it's returned or acted upon. Checks for hallucinations, PII in outputs, harmful content, insecure code, and policy violations in generated text. The core question it answers: "Is this output safe to use?"

These aren't competing approaches; they're complementary layers. A prompt injection that bypasses the input filter might produce a dangerous tool call that gets caught at the execution layer. An output that looks clean might have been produced by executing a dangerous command that should have been blocked. Full coverage requires all three points, and no single product covers all of them equally well.

Prompt Filtering: Lakera, Guardrails AI

Lakera Guard is the most established prompt-level LLM firewall. [NEEDS SOURCE: verify current Lakera product name and capabilities; the product has evolved rapidly] It operates as an API endpoint that sits between your application and the LLM. You send the prompt to Lakera first; it evaluates the prompt against its detection models and returns a risk score. If the score exceeds your threshold, you block the request.

Lakera's strengths:

  - Purpose-built prompt injection detection, delivered as a hosted API with minimal integration work
  - Low latency (~5-15ms per check)

Lakera's coverage:

  - Input layer only. It does not analyze reasoning, scan generated code, validate outputs, or intercept tool calls.

Guardrails AI takes a different architectural approach: it's a framework rather than a hosted API. You define "guards" (validators) in Python that check inputs and outputs against your criteria. Guards can be LLM-based, regex-based, or custom functions.
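
The guard pattern can be illustrated with plain Python. Note this is a sketch of the pattern, not the actual Guardrails AI API; all function names here are made up.

```python
import json
import re

# Two toy validators: each takes generated text and returns True if it
# passes. Real guards could be LLM-based, regex-based, or custom logic.
def no_ssn(text: str) -> bool:
    """Fail if the text contains a US-SSN-shaped string."""
    return re.search(r"\b\d{3}-\d{2}-\d{4}\b", text) is None

def valid_json_shape(text: str) -> bool:
    """Fail unless the text parses as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except ValueError:
        return False

def run_guards(text: str, guards) -> list:
    """Return the names of guards that failed (empty list == pass)."""
    return [g.__name__ for g in guards if not g(text)]
```

For example, `run_guards('{"ok": true}', [no_ssn, valid_json_shape])` returns an empty list, while text containing an SSN fails the `no_ssn` guard.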

Guardrails AI's strengths:

  - Full programmatic control: guards are Python code, so validation can encode business-specific logic
  - Covers both inputs and outputs in a single open-source framework

Guardrails AI's coverage:

  - Input and output layers. It does not intercept tool calls or analyze model reasoning.

Both Lakera and Guardrails AI are good at what they do. If your primary concern is prompt injection (attackers manipulating your chatbot via crafted inputs), Lakera is purpose-built for that. If you need flexible input/output validation with custom business logic, Guardrails AI gives you full control. Neither was designed to intercept tool calls at the execution layer, because their architecture sits in the prompt/response pipeline, not in the agent runtime.

Output Scanning: Guardrails AI, Custom Solutions

Output scanning catches problems after the model generates them but before they reach users or downstream systems.

Guardrails AI (again) is the most flexible option here. Output guards can check for:

  - Hallucinations and factual inconsistencies
  - PII in generated text
  - Harmful or policy-violating content
  - Insecure generated code
  - Format and schema violations

Custom solutions are common at scale. Most teams with production LLM deployments have built some form of output post-processing: toxicity classifiers, regex-based PII scrubbers, schema validators. These range from simple regex checks to full secondary LLM calls that evaluate the primary model's output.

Output scanning is necessary for any production LLM deployment. It catches hallucinations, PII leakage, and content policy violations that input filtering can't predict. But it shares the same architectural limitation as prompt filtering: it operates on text, not on actions. If the model generates a tool call that deletes files or exfiltrates data, the output scanner sees the text of the response, which may or may not describe what the tool call actually does. The dangerous action executes before the output is generated.
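
A regex-based PII scrubber of the kind described above can be quite small. A minimal sketch, with illustrative patterns that are far from exhaustive (production scrubbers need much broader coverage):

```python
import re

# Toy output-scanning pass: redact email addresses and SSN-shaped
# strings from generated text before it reaches the user.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text
```

For example, `scrub("Contact alice@example.com")` returns `"Contact [EMAIL REDACTED]"`. Note the limitation the text describes: this operates on the response text only, after any tool calls have already run.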

Tool Call Interception: Shoofly, LlamaFirewall

This is where the architecture diverges most sharply from prompt/output firewalls.

Meta's LlamaFirewall is the most significant tool call–aware firewall from a major AI lab. [NEEDS SOURCE: verify current LlamaFirewall capabilities, architecture, and tool call interception scope; Meta's documentation has been updated since initial release] LlamaFirewall includes three core scanners:

  - PromptGuard 2, which detects prompt injection in inputs
  - AlignmentCheck, which audits the model's chain of reasoning for signs of hijacking
  - CodeShield, which scans generated code for insecure patterns

LlamaFirewall's architecture is notable because AlignmentCheck operates on the model's reasoning, not just its inputs or outputs. This gives it partial tool call awareness: if the model has been hijacked via prompt injection and is now generating tool calls to serve the attacker, AlignmentCheck can potentially detect the misalignment in the reasoning chain.

However, LlamaFirewall's tool call interception is reasoning-based, not execution-based. It evaluates whether the model's intent appears compromised, not whether a specific shell command or file operation is dangerous. A tool call that passes AlignmentCheck (the model genuinely believes it's helpful) but violates a security policy (it reads credentials it shouldn't need) would not be caught by reasoning analysis alone.

LlamaFirewall's strengths:

  - Defense across the prompt, reasoning, and code layers in a single framework
  - Open source and free to self-host

LlamaFirewall's coverage:

  - Input, reasoning, and generated-code layers. Tool call interception is partial and reasoning-based; it does not enforce execution-level policy or validate general outputs.

Shoofly operates exclusively at the execution layer. Every tool call (shell commands, file operations, network requests, MCP tool invocations) is intercepted before execution and evaluated against policy rules defined in YAML.

Shoofly's strengths:

  - Deterministic, auditable policy rules written in YAML, evaluated in under 1ms
  - Runs entirely locally as a hook in the agent runtime; no traffic leaves your machine

Shoofly's coverage:

  - Execution layer only. It intercepts tool calls; it does not detect prompt injection in inputs, analyze reasoning, or validate generated text.

Shoofly and LlamaFirewall are complementary, not competitive. LlamaFirewall catches reasoning-level compromise: the model has been tricked into malicious intent. Shoofly catches execution-level violations: an action, regardless of the reasoning behind it, violates a defined policy. The model can have perfectly aligned intent and still generate a tool call that reads credentials it doesn't need. The model can have compromised intent and still be blocked from executing anything dangerous.

Comparison Table

| Capability | Lakera Guard | LlamaFirewall | Guardrails AI | Shoofly |
|---|---|---|---|---|
| Prompt injection detection | Yes (primary) | Yes (PromptGuard 2) | Yes (custom guards) | No |
| Reasoning analysis | No | Yes (AlignmentCheck) | No | No |
| Code scanning | No | Yes (CodeShield) | Partial (output guards) | No |
| Output validation | No | No | Yes (primary) | No |
| Tool call interception | No | Partial (reasoning-based) | No | Yes (primary) |
| Deterministic policy rules | Threshold-based | No | Custom (programmatic) | Yes (YAML) |
| Architecture | Hosted API | Self-hosted | Self-hosted/framework | Hook in agent runtime |
| Latency | ~5-15ms | Varies by scanner | Varies by guard | <1ms (rule eval) |
| Pricing | Tiered [NEEDS SOURCE: verify current pricing] | Free | Free (framework) / Enterprise | Free (Basic) / $5/mo (Advanced) |
| Open/Auditable | No | Yes | Yes | Yes (rules are YAML) |

Important caveats on this table:

  - Competitor capabilities and pricing are based on public documentation and may be incomplete or out of date; see the [NEEDS SOURCE] flags.
  - Latency figures are approximate and vary with deployment, model size, and configuration.

When Each Approach Is Appropriate

Use Lakera when: Your primary threat is prompt injection in a chatbot or customer-facing LLM application. You need low-latency detection in a hosted API with minimal integration work. Your application doesn't use tool calls or agentic execution.

Use LlamaFirewall when: You're building an agentic system and want defense-in-depth across prompt, reasoning, and code layers. You have the infrastructure to self-host. You want reasoning-level compromise detection (AlignmentCheck) as part of your security stack.

Use Guardrails AI when: You need custom input/output validation with business-specific logic. You want full control over validation rules and execution. You're building a production LLM application with specific format, content, and safety requirements.

Use Shoofly when: Your agents execute tool calls (shell commands, file operations, network requests, MCP tool invocations) and you need deterministic policy enforcement on what those actions are allowed to do. You're running Claude Code or OpenClaw. You want execution-layer security that complements (not replaces) prompt-level and output-level defenses.

Use multiple when: You're serious about security. Prompt filtering protects the input. Output scanning protects the response. Execution-layer enforcement protects the actions. Each layer catches threats the others miss.

Implementation Guide

Here's how to assemble a practical LLM firewall stack for an agentic development workflow:

Layer 1: Prompt Defense

If your agents accept external inputs (user messages, task descriptions from Dispatch, webhook payloads), add prompt-level filtering.

# Option A: Lakera Guard (hosted)
# Add before your LLM API call. Illustrative pseudocode; see Lakera's
# documentation for the actual client API and response fields.
response = lakera.check(prompt=user_input)
if response.risk_score > threshold:
    block()

# Option B: LlamaFirewall PromptGuard 2 (self-hosted)
# Runs locally, no external API call. Also illustrative pseudocode.
result = prompt_guard.analyze(user_input)
if result.is_injection:
    block()

Layer 2: Execution Defense

Install Shoofly to intercept tool calls before execution.

# Install Shoofly Advanced
curl -fsSL https://shoofly.dev/advanced | bash

# Default policies cover:
# - Destructive commands (rm -rf outside project root)
# - Credential access (~/.aws, ~/.ssh, .env)
# - Network egress (unknown domains)
# - Sensitive file modification (.bashrc, CI configs)

Customize policies in YAML:

# .shoofly/policy.yaml
rules:
  - name: block-credential-reads
    match:
      tool: file_read
      path: ["~/.aws/**", "~/.ssh/**", "**/.env"]
    action: block

  - name: allow-project-writes
    match:
      tool: file_write
      path: ["./**"]
    action: allow

  - name: block-external-egress
    match:
      tool: network_request
      domain: ["!*.your-company.com", "!github.com", "!npmjs.org"]
    action: block
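
To make the matching semantics concrete, here is how an engine might evaluate rules like these. This is a sketch under stated assumptions, not Shoofly's actual implementation: it assumes first-match-wins evaluation, glob-style path patterns, and a default-deny fallback, and it omits the negation patterns.

```python
import fnmatch
import os

# Simplified rules mirroring the YAML above.
RULES = [
    {"name": "block-credential-reads", "tool": "file_read",
     "paths": ["~/.aws/*", "~/.ssh/*", "*/.env"], "action": "block"},
    {"name": "allow-project-writes", "tool": "file_write",
     "paths": ["./*"], "action": "allow"},
]

def evaluate(tool: str, path: str) -> str:
    """Return the action for a tool call: first matching rule wins."""
    expanded = os.path.expanduser(path)
    for rule in RULES:
        if rule["tool"] != tool:
            continue
        patterns = (os.path.expanduser(p) for p in rule["paths"])
        if any(fnmatch.fnmatch(expanded, p) for p in patterns):
            return rule["action"]
    return "block"  # default deny: unmatched calls don't execute
```

For example, `evaluate("file_read", "~/.aws/credentials")` returns `"block"`, while `evaluate("file_write", "./src/app.py")` returns `"allow"`. The key property is determinism: the same tool call always produces the same decision, with no model in the loop.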

Layer 3: Output Defense

For generated code, add SAST scanning. For generated text, add content validation.

# Code: Semgrep, CodeQL, or Snyk Code
semgrep --config auto ./generated/

# Text: Guardrails AI guards or custom validators

The Complete Stack

User Input → [Prompt Filter] → LLM → [Tool Call Interceptor] → Execution
                                ↓
                        [Output Scanner] → Response

Each layer is independently valuable. But the combination is where the real security posture lives. A prompt injection that bypasses Lakera still has to generate a tool call that passes Shoofly's policy rules. A tool call that somehow slips through Shoofly (the rules didn't cover that specific pattern) still produces output that gets scanned. Defense in depth isn't just a buzzword here; it's the architecture.
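
The stacking argument above can be condensed into a few lines. Everything here is illustrative (toy rules, made-up names); the point is the shape: independent layers, each of which must pass before the request proceeds.

```python
# Each layer is an independent list of predicate checks; a request must
# clear every layer it reaches. Rules are toy examples.
PROMPT_CHECKS = [lambda p: "ignore previous instructions" not in p.lower()]
TOOL_CHECKS = [lambda cmd: not cmd.strip().startswith("rm -rf /")]
OUTPUT_CHECKS = [lambda o: "BEGIN PRIVATE KEY" not in o]

def first_failing_layer(prompt: str, tool_call: str, output: str):
    """Return the name of the first layer that blocks, or None."""
    layers = [
        ("prompt", PROMPT_CHECKS, prompt),
        ("execution", TOOL_CHECKS, tool_call),
        ("output", OUTPUT_CHECKS, output),
    ]
    for name, checks, value in layers:
        if not all(check(value) for check in checks):
            return name
    return None
```

An injection phrase is caught at the prompt layer; a destructive command that arrives via a clean-looking prompt is still caught at the execution layer. Removing any one layer widens the set of requests that reach the next.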

Prompt filtering protects the input. Output scanning protects the code. Shoofly Advanced protects the execution. Complete the stack.

→ Get Shoofly Advanced


FAQ

Q: Do I need all three layers (prompt, execution, output) to be secure? Each layer is independently valuable, but the combination is significantly stronger than any single layer. If you can only deploy one, start with execution-layer interception (Shoofly): it catches the highest-impact threats (destructive operations, data exfiltration, credential access) at the last point before damage occurs. Add prompt-level and output-level defenses as your security posture matures.

Q: How does an LLM firewall differ from a traditional web application firewall (WAF)? A WAF inspects HTTP requests and responses using signature-based rules and operates at the network layer. An LLM firewall operates at the AI application layer, inspecting prompts, model reasoning, tool calls, or outputs depending on its interception point. The threat model is fundamentally different: WAFs protect against attacks on your application's code, while LLM firewalls protect against attacks that flow through the model to reach your tools and infrastructure. They're complementary: a WAF protects the HTTP boundary, an LLM firewall protects the AI execution boundary.

Q: Can I use LlamaFirewall and Shoofly together? Yes, and this is the recommended architecture for high-security agentic deployments. LlamaFirewall's AlignmentCheck detects reasoning-level compromise: cases where the model has been manipulated into pursuing adversarial goals. Shoofly catches execution-level violations: specific tool calls that violate policy regardless of reasoning. Together, they provide defense in depth across both the reasoning and execution layers.

Q: What's the latency impact of stacking multiple LLM firewall layers? Minimal in practice. Prompt-level filtering (Lakera) adds ~5–15ms. Shoofly's rule evaluation adds <1ms. Output scanning varies by guard complexity but typically adds 10–50ms. Combined, total overhead is well under 100ms, which is negligible compared to LLM inference time (typically 1–30 seconds) and tool execution time. The security benefit far outweighs the latency cost.


Further reading: Why We Block Instead of Detect · Prompt Injection Blocking: Pre-Execution Security · Runtime Threat Detection for AI Agents · Agentic AI Security


Ready to secure your AI agents? Shoofly Advanced provides pre-execution policy enforcement for Claude Code and OpenClaw: 20 threat rules, YAML policy-as-code, 100% local. $5/mo.