Guardrail Types
Building robust guardrails around AI agents requires a combination of precise, deterministic, continuously updated rules and AI-evaluated, intent-based policies. Simple, one-off command blacklists cannot provide the nuanced, context-aware risk management that modern agentic workflows demand, so we built our own library of operations and matchers to cover the most common risks, while exposing a flexible, purely AI-inference-based "policy" interface for broader, more complex cases.
Deterministic Rules
Rules determine whether to require approval for or deny a tool call based on whether any of a set of deterministic conditions is satisfied. Because a single business intent can surface through many different tools, good rules expand into multiple execution paths so agents cannot trivially route around one blocked command. To help our system recommend robust rules, Stoplight's operation library packages common intents like file access, uploads, database changes, and destructive actions into reusable coverage matchers that serve as the preferred building block.
The examples below show what Stoplight's recommendation engine might suggest based on real agent telemetry. Each rule bundles multiple matchers that target the same business risk.
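As a minimal sketch (not Stoplight's actual engine), a rule can be modeled as a set of matcher predicates over a tool call, where any single satisfied matcher triggers the rule's action. All names below are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A matcher is a predicate over a tool call (hypothetical representation).
Matcher = Callable[[dict], bool]

@dataclass
class Rule:
    matchers: list   # list of Matcher predicates covering one business intent
    action: str      # "approve", "ask", or "deny"

def evaluate(rule: Rule, tool_call: dict) -> Optional[str]:
    """Return the rule's action if ANY matcher fires, else None (no opinion)."""
    if any(matcher(tool_call) for matcher in rule.matchers):
        return rule.action
    return None

# Two matchers covering the same upload intent through different CLI tools.
starts_with_curl = lambda call: call.get("command", "").startswith("curl")
starts_with_wget = lambda call: call.get("command", "").startswith("wget")

upload_rule = Rule(matchers=[starts_with_curl, starts_with_wget], action="ask")
```

Because the rule fires on any matcher, adding coverage for a new tool is just appending another predicate rather than authoring a new rule.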
You can view the full list of matchers in the reference page.
A single blocked command is rarely enough
Agents can adapt unpredictably. If a narrow control blocks only one command prefix, the model can often pivot to a different CLI, a web tool, or an MCP integration that accomplishes the same result.
Naive rule
Blocking only `curl` is not a reliable way to stop network egress. An agent can reach for other tools immediately.
command_starts_with: "curl"
action: "deny"

Not covered:
- wget and other download clients
- scp, rsync, or SSH-based uploads
- Language runtimes like `python -c` or `node -e`
- Provider-native web tools and MCP integrations
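The gap is easy to demonstrate. In this hypothetical sketch, a curl-only prefix check lets every alternative egress path straight through:

```python
def naive_blocklist(command: str) -> str:
    # Mirrors the naive rule above: deny only commands starting with "curl".
    return "deny" if command.startswith("curl") else "allow"

# Each of these achieves the same network egress but evades the prefix check.
bypasses = [
    "wget https://api.example.com/upload",
    "scp secrets.txt user@api.example.com:/tmp",
    'python -c "import urllib.request; ..."',
]
results = [naive_blocklist(cmd) for cmd in bypasses]
```

Every command in `bypasses` comes back `"allow"`, even though each one performs the upload the rule was meant to stop.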
Broad rule
A reusable operation backed by multiple matchers covers the higher-level intent instead of one literal command prefix.
operation: "net.upload"
target: "api.example.com"
action: "ask"

Intent-based Policies
Policies are natural-language guardrails evaluated by AI at runtime. Each policy carries a prompt that an LLM uses to decide whether a particular tool call should be approved, escalated, or denied, drawing on a short history of prior tool calls to grasp the agent's overall intent. This guardrail type is most useful when deterministic matching is too brittle, too narrow, or too expensive to maintain for every edge case.
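One way to picture the runtime evaluation: the policy prompt, a short window of prior tool calls, and the pending call are assembled into a single judgment request for the LLM. The function below is an illustrative sketch, not Stoplight's actual API:

```python
def build_policy_prompt(policy_text: str, history: list, tool_call: dict) -> str:
    """Assemble the context an LLM judge would see: the policy text, the
    last few tool calls (for intent), and the pending call. All field
    names here are hypothetical."""
    recent = "\n".join(f"- {h['tool']}: {h['args']}" for h in history[-5:])
    return (
        f"Policy: {policy_text}\n"
        f"Recent tool calls:\n{recent}\n"
        f"Pending call: {tool_call['tool']}: {tool_call['args']}\n"
        "Answer with one of: approve, ask, deny."
    )
```

Capping the history window keeps the judgment request small and fast while still giving the model enough trajectory to infer intent.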
To keep policies from dragging down agent performance, particularly as they grow in number over time, each policy can be scoped to trigger only on specific operations or events, using a trigger library analogous to the matcher library for rules. Policy evaluation also runs on fast inference, so it is nearly invisible to end users.
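Scoping can be sketched as a cheap pre-check that skips the expensive LLM evaluation entirely when a policy's triggers do not match the current operation; the `triggers` field and its semantics here are assumptions for illustration:

```python
def should_evaluate(policy: dict, operation: str) -> bool:
    """Run the (slow) LLM judgment only when the tool call's operation
    matches one of the policy's triggers. A policy without triggers is
    treated as global and always evaluated."""
    triggers = policy.get("triggers")
    return triggers is None or operation in triggers
```

With this pre-filter, adding more policies only adds LLM latency for the operations those policies actually watch.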
Improved guardrail recommendations over time
Stoplight's AI learns from the guardrails you reject and uses that feedback to make more targeted recommendations over time. This feedback loop helps the AI understand your agent's scope and your organization's risk tolerance, minimizing noisy, repeated guardrail suggestions.
Stoplight selects the best guardrail type for you
The recommendation pipeline automatically decides whether a control belongs in a deterministic rule or in an AI policy. Over time, rules may be merged or resurfaced as policy recommendations as the system explores better ways to cover your agent's behavior.