Most teams build sandboxes by intuition, adding controls reactively as something goes wrong. That works until it does not, and the gaps it leaves are the ones nobody thought to check. A framework replaces intuition with a deliberate map, so you reason about every dimension of risk on purpose rather than discovering the missed one in production.
This article introduces CAGE, a reusable model for designing AI sandboxes across four dimensions: Containment, Access, Governance, and Ephemerality. The name is a useful mnemonic, since a sandbox is, in a sense, a cage for autonomous behavior, one built to let the agent move freely inside known limits.
CAGE is not a product or a checklist. It is a way of thinking that ensures you have made a conscious decision about each axis of risk. You can apply it to a five-minute experiment or an enterprise platform; only the stringency changes. For the practical step-by-step, pair this with our how-to guide.
C: Containment
Containment is the strength and completeness of your walls. It answers: if everything inside misbehaves, what can it reach?
The framework forces you to consider containment along three sub-axes simultaneously, because a gap in any one undoes the others.
- Execution containment. Where does the agent's code run? The options are a disposable container, a microVM, or, dangerously, the host itself.
- Network containment. What can the agent reach outward? Default-deny with an allowlist is the strong posture; open networking is no containment at all.
- Filesystem containment. What can the agent read and write? Scoped to a working directory, ideally with the original mounted read-only.
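As a rough illustration of the three sub-axes together, the sketch below builds a `docker run` argument list in Python without executing it. The `ContainmentConfig` class, the `/src` and `/work` mount points, and the image name are assumptions made for this example; the Docker flags themselves are standard ones.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ContainmentConfig:
    """Hypothetical settings object; field names are illustrative only."""
    image: str                   # container image the agent runs in
    source_dir: str              # original project, mounted read-only
    work_dir: str                # scratch directory the agent may write to
    allow_network: bool = False  # default-deny unless explicitly opened


def docker_run_args(cfg: ContainmentConfig, command: List[str]) -> List[str]:
    """Build a `docker run` argument list. Nothing is executed here."""
    args = [
        "docker", "run", "--rm",             # execution: disposable container
        "--read-only",                       # filesystem: read-only root
        "--cap-drop", "ALL",                 # drop all Linux capabilities
        "-v", f"{cfg.source_dir}:/src:ro",   # filesystem: original is read-only
        "-v", f"{cfg.work_dir}:/work",       # filesystem: the only writable mount
        "-w", "/work",
    ]
    if not cfg.allow_network:
        args += ["--network", "none"]        # network: no connectivity at all
    return args + [cfg.image] + command


cfg = ContainmentConfig(
    image="agent-sandbox:latest",  # assumed image name
    source_dir="/repo",
    work_dir="/tmp/run1",
)
args = docker_run_args(cfg, ["python", "task.py"])
```

Note that `--network none` denies everything. An allowlist of specific destinations would need an egress proxy or firewall rules in addition, which this sketch does not attempt.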
When to dial containment up
Increase containment stringency when the code running inside is less trusted. AI-generated code from open-ended prompts demands the strongest boundary; reviewed internal code demands less. Match the wall to the trust rather than reflexively maximizing it.
A: Access
Access governs what capabilities and data the agent is granted. Where containment limits what the agent can reach by force, access limits what it is allowed to use by design.
Two questions structure this dimension:
- Capability access. Which tools does the agent have? The principle is least privilege: grant only what the task requires, and scope each grant as narrowly as possible.
- Data access. What data does the agent see? Synthetic by default, masked when realism is needed, raw production data essentially never.
A subtle point CAGE surfaces: dangerous capabilities should often be mocked rather than granted. An agent that needs to "make a purchase" can be given a simulated payment tool that logs intent without executing it. You get to observe behavior without permitting consequence, a pattern our examples article illustrates in detail.
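One way to picture least privilege and mocking together is a small tool registry that refuses anything not explicitly granted, with the purchase capability backed by a simulator. This is a minimal sketch; `ToolRegistry`, `MockPaymentTool`, and the `make_purchase` tool name are hypothetical and not taken from any particular agent framework.

```python
from typing import Any, Callable, Dict, List


class ToolNotGranted(Exception):
    """Raised when the agent asks for a tool outside its grant."""


class MockPaymentTool:
    """Simulated payment: records the intent and charges nothing."""

    def __init__(self) -> None:
        self.intents: List[Dict[str, Any]] = []

    def __call__(self, item: str, amount_cents: int) -> Dict[str, Any]:
        intent = {"item": item, "amount_cents": amount_cents}
        self.intents.append(intent)          # observable record of behavior
        return {"status": "simulated", **intent}


class ToolRegistry:
    """Least privilege: only explicitly granted tools can be called."""

    def __init__(self, granted: Dict[str, Callable[..., Any]]) -> None:
        self._granted = dict(granted)

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._granted:
            raise ToolNotGranted(name)
        return self._granted[name](**kwargs)


payments = MockPaymentTool()
tools = ToolRegistry({"make_purchase": payments})
result = tools.call("make_purchase", item="widget", amount_cents=1999)
```

After the run, `payments.intents` holds what the agent tried to buy, and any call to a tool that was never granted raises `ToolNotGranted` instead of silently succeeding.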
G: Governance
Governance is what makes the sandbox observable and accountable. A contained, access-limited sandbox that you cannot see into is still a black box, and black boxes cannot be trusted or improved.
Governance has two components.
Observability
Log every prompt, tool call, command, and output, starting from the first run. This is simultaneously your debugger, your audit trail, and your evidence that the sandbox behaved correctly. Logging cannot be added retroactively once an incident has happened, so it must be designed in.
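A minimal sketch of such a log, assuming a JSON Lines format with one event per line. The `RunLog` class, the event field names, and the `run-001` identifier are illustrative choices, not a prescribed schema.

```python
import io
import json
import time
from typing import Any, Dict, TextIO


class RunLog:
    """Append-only JSON Lines log of everything a run does."""

    KINDS = {"prompt", "tool_call", "command", "output"}

    def __init__(self, run_id: str, sink: TextIO) -> None:
        self.run_id = run_id
        self._sink = sink
        self._seq = 0

    def record(self, kind: str, payload: Dict[str, Any]) -> None:
        if kind not in self.KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        self._seq += 1                       # sequence number orders events
        event = {
            "run_id": self.run_id,
            "seq": self._seq,
            "ts": time.time(),
            "kind": kind,
            "payload": payload,
        }
        self._sink.write(json.dumps(event) + "\n")
        self._sink.flush()                   # do not lose events if the run dies


sink = io.StringIO()                         # stand-in for a file or log shipper
log = RunLog("run-001", sink)
log.record("prompt", {"text": "list the files"})
log.record("command", {"argv": ["ls", "-la"]})
```

In a real setup the sink would live outside the sandbox, so that destroying the environment does not destroy the record of what happened in it.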
Limits
Spend caps and rate limits act as circuit breakers against the runaway loops that autonomous agents fall into. The reasoning is asymmetric: the downside of an uncapped agent is unbounded, while the cost of capping it is essentially nothing. Govern by default.
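The circuit-breaker idea can be sketched as a small guard that is charged before every model or tool invocation and raises once a cap is hit. The `CircuitBreaker` class, its parameter names, and the 60-second sliding window are assumptions made for this example.

```python
import time
from collections import deque
from typing import Callable, Deque


class LimitExceeded(Exception):
    """Raised to halt a run that has hit a cap."""


class CircuitBreaker:
    def __init__(self, max_spend_usd: float, max_calls_per_minute: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_spend_usd = max_spend_usd
        self.max_calls_per_minute = max_calls_per_minute
        self.spent_usd = 0.0
        self._calls: Deque[float] = deque()  # timestamps of recent calls
        self._clock = clock

    def charge(self, cost_usd: float) -> None:
        """Call before each invocation; raises LimitExceeded to stop the run."""
        now = self._clock()
        # Forget calls that fell out of the 60-second window.
        while self._calls and now - self._calls[0] >= 60.0:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls_per_minute:
            raise LimitExceeded("rate limit reached")
        if self.spent_usd + cost_usd > self.max_spend_usd:
            raise LimitExceeded("spend cap reached")
        self._calls.append(now)
        self.spent_usd += cost_usd


breaker = CircuitBreaker(max_spend_usd=1.00, max_calls_per_minute=100)
breaker.charge(0.40)
breaker.charge(0.40)
# A third 0.40 charge would exceed the 1.00 cap and raise LimitExceeded.
```

The check happens before the cost is incurred, so a runaway loop stops at the cap rather than one call past it.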
When to dial governance up
Tighten governance when runs are unattended, when the agent has any real-world capability, and when you operate under compliance obligations. A supervised five-minute experiment needs light governance; an overnight autonomous run needs heavy governance.
E: Ephemerality
Ephemerality is the discipline of disposability. The sandbox should be cheap to create, used briefly, destroyed, and recreated clean.
This dimension is easy to underrate because its benefits are quiet. Ephemeral environments deliver three things:
- Reproducibility, because every run starts from an identical clean state.
- Hygiene, because leftover files, cached credentials, and modified configs never accumulate to contaminate later runs.
- Safety, because a compromised sandbox is solved by deletion rather than cleanup.
The practical key is speed. If creating a clean sandbox is slow, people reuse dirty ones and ephemerality dies in practice no matter what the policy says. Investing in fast provisioning is what makes this dimension real rather than aspirational.
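At the smallest scale, the create, use, destroy cycle can be sketched with a context manager that copies a clean template into a throwaway directory and deletes it on exit. The `fresh_workspace` helper and the idea of a template directory are assumptions for illustration; a real sandbox would apply the same pattern to containers or microVMs rather than plain directories.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def fresh_workspace(template: Path) -> Iterator[Path]:
    """Copy a clean template into a new temp dir and delete it on exit."""
    with tempfile.TemporaryDirectory(prefix="sandbox-") as tmp:
        work = Path(tmp) / "work"
        shutil.copytree(template, work)      # every run starts identical
        yield work
    # TemporaryDirectory has removed everything by this point,
    # including when the body of the `with` block raised.
```

Each run then wraps its work in `with fresh_workspace(template) as work:`, so files left behind by one run can never be seen by the next.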
Applying CAGE end to end
The power of the framework is in running all four dimensions for a given use case before you build, then setting each dial to match the risk.
Consider an autonomous overnight agent handling untrusted inputs. CAGE tells you to maximize Containment (microVM, default-deny, scoped filesystem), tighten Access (least privilege, mocked dangerous tools, masked data), apply heavy Governance (full logging, hard caps), and enforce strict Ephemerality (fresh per run). Each setting follows from the dimension, so nothing gets forgotten.
Now consider a supervised five-minute prompt experiment. The same framework relaxes every dial: a container suffices, broader tool access is fine under supervision, light logging is acceptable, and reuse is tolerable for the session. Same four questions, different answers, no gaps.
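The two scenarios can be written down as data. In the sketch below, each dimension is reduced to a single required field, which is a deliberate simplification; the `CageProfile` class and its field names are hypothetical. Because the fields have no defaults, a profile cannot be constructed without answering all four questions.

```python
from dataclasses import dataclass
from enum import Enum


class Runtime(Enum):
    CONTAINER = "container"
    MICROVM = "microvm"


@dataclass(frozen=True)
class CageProfile:
    """One required field per dimension; none has a default."""
    runtime: Runtime               # Containment
    least_privilege_tools: bool    # Access
    full_logging: bool             # Governance
    fresh_per_run: bool            # Ephemerality


# Autonomous overnight agent on untrusted inputs: every dial tightened.
OVERNIGHT_UNTRUSTED = CageProfile(
    runtime=Runtime.MICROVM,
    least_privilege_tools=True,
    full_logging=True,
    fresh_per_run=True,
)

# Supervised five-minute experiment: every dial relaxed.
SUPERVISED_EXPERIMENT = CageProfile(
    runtime=Runtime.CONTAINER,
    least_privilege_tools=False,   # broader tool access under supervision
    full_logging=False,            # light logging
    fresh_per_run=False,           # reuse tolerated for the session
)
```

Omitting a field, for example `CageProfile(runtime=Runtime.CONTAINER)`, raises a `TypeError`, which mirrors the framework's role: it does not choose the values, it only refuses to let one go unset.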
That is the whole point. CAGE does not tell you the answers; it guarantees you asked all four questions. For the failures that happen when one dimension is skipped, see our common mistakes guide, and for the broader picture, the complete guide.
Why the four dimensions are independent
A natural objection to any framework is that its categories overlap, that you are really describing one thing four ways. CAGE survives this test because each dimension can fail while the others hold, which is precisely what makes covering all four necessary.
Consider the failure cases. You can have perfect Containment and still expose real data through a lax Access decision: the walls hold, but the wrong thing is inside them. You can have tight Access and weak Governance: the agent is well-scoped, but you have no record of what it did. You can have strong Governance and poor Ephemerality: you log everything, but stale state contaminates every run. Each dimension guards against a failure the others cannot catch.
This independence is the framework's justification. If the dimensions truly overlapped, you could cover the risk by maxing one of them. Because they are independent, the only way to be safe is to consciously address each. CAGE earns its place by mapping to four genuinely distinct ways a sandbox fails.
Using CAGE as a shared language
Beyond design, the framework's quiet benefit is communication. When a sandbox conversation has a shared vocabulary, reviews get faster and gaps get named instead of missed.
- In design reviews, a reviewer can ask "what's your governance story?" and everyone knows exactly what is being questioned. Vague worries become specific ones.
- In incident retrospectives, you can categorize a failure by dimension ("this was an access failure; the agent had a tool it shouldn't have"), which points directly at the fix.
- Across teams, CAGE lets a sandbox built by one group be evaluated by another against a common standard, rather than re-litigated from scratch.
A framework that only structures the builder's thinking is useful. One that also structures the conversation around the build is far more valuable, because most sandbox failures are failures of communication and assumption as much as of engineering.
Frequently Asked Questions
How is CAGE different from just following a checklist?
A checklist tells you what to verify; CAGE tells you how to reason. The framework's value is forcing a conscious decision on all four risk dimensions before you build, so you do not discover the skipped one in production. A checklist is the verification; CAGE is the design thinking that precedes it.
Do I have to maximize all four dimensions every time?
No, and that is the point. CAGE asks you to set each dial to match the risk, not to crank everything to maximum. A supervised experiment relaxes every dimension; an unattended agent on untrusted inputs tightens them all. The framework ensures you decide deliberately rather than by default.
Which dimension do teams most often neglect?
Governance, specifically observability, because its absence causes no visible problem until the first incident you cannot diagnose. Teams build strong containment and access, run successfully, and never notice they have no logs until something goes wrong and there is nothing to read. Design governance in from the start.
Can CAGE apply to a no-code sandbox for non-engineers?
Yes. The dimensions are conceptual, not technical. A no-code playground still has containment (the platform's isolation), access (what data and actions are exposed), governance (whether usage is logged and capped), and ephemerality (whether sessions reset). The framework applies regardless of who is at the keyboard.
How often should I re-run the CAGE analysis?
Re-run it whenever the use case changes: when an agent gains a new capability, when inputs become less trusted, or when a run moves from supervised to unattended. Each of those shifts the appropriate dial settings. The framework is cheap to re-apply and expensive to skip when conditions change.
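If the use case is recorded as data, those triggers can be checked mechanically. The sketch below compares two descriptions of a use case and lists the reasons a fresh analysis is due; the `UseCase` fields and the `reanalysis_reasons` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class UseCase:
    capabilities: FrozenSet[str]   # tools the agent can call
    inputs_trusted: bool
    supervised: bool


def reanalysis_reasons(old: UseCase, new: UseCase) -> List[str]:
    """Return the triggers that fired between two versions of a use case."""
    reasons: List[str] = []
    gained = new.capabilities - old.capabilities
    if gained:
        reasons.append(f"new capability: {', '.join(sorted(gained))}")
    if old.inputs_trusted and not new.inputs_trusted:
        reasons.append("inputs became less trusted")
    if old.supervised and not new.supervised:
        reasons.append("run moved from supervised to unattended")
    return reasons


before = UseCase(frozenset({"read_file"}), inputs_trusted=True, supervised=True)
after = UseCase(frozenset({"read_file", "send_email"}),
                inputs_trusted=True, supervised=False)
reasons = reanalysis_reasons(before, after)
```

An empty list means none of the three triggers fired; anything else is a prompt to revisit the dial settings.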
Key Takeaways
- CAGE is a reusable model spanning four dimensions: Containment, Access, Governance, and Ephemerality.
- Containment is the strength of your walls across execution, network, and filesystem; match it to how much you trust the code inside.
- Access governs capabilities and data via least privilege, masked data, and mocking dangerous tools rather than granting them.
- Governance makes the sandbox observable and bounded through full logging and circuit-breaker limits, designed in from the start.
- Ephemerality keeps the sandbox disposable: cheap to create, used briefly, destroyed, and recreated clean, which only holds in practice when provisioning is fast.
- The framework's purpose is to guarantee you consciously set each dial to match the risk, so no dimension gets silently skipped.