Most "best practices" lists are a wall of platitudes you already agree with and will never act on. This one is different on purpose. Every practice below comes with the reasoning that makes it worth the friction, because a practice you understand is a practice you will actually keep when the deadline pressure arrives.
The unifying principle behind everything here is one sentence: make the safe path the fast path. Every time the secure way to do something is slower or more annoying than the insecure way, people route around your safety controls. Good sandbox design is less about adding walls and more about making the walled route the one of least resistance.
These practices assume you already know what an AI sandbox is. If that is not yet solid, start with the complete guide, then come back.
Default to deny, then allowlist with intent
The strongest single habit you can adopt is denying by default and permitting on purpose.
Network access, tool access, data access: all of it should start closed. The agent gets nothing until you decide it needs something, and then you grant exactly that.
The reasoning: an allowlist fails safe and a denylist fails open. With a denylist, anything you forgot to block is permitted, and you will always forget something. With an allowlist, anything you forgot to permit is blocked, which is the failure direction you want. The cost is a little more setup; the payoff is that your mistakes are conservative rather than catastrophic.
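A minimal sketch of the fail-safe direction, applied to outbound network requests. The host names in `ALLOWED_HOSTS` and the `is_allowed` helper are hypothetical, chosen only for illustration; a real sandbox would enforce this at the network layer rather than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts this particular task needs to reach.
ALLOWED_HOSTS = frozenset({"api.example.test", "pypi.org"})


def is_allowed(url: str, allowed_hosts: frozenset[str] = ALLOWED_HOSTS) -> bool:
    """Return True only when the URL's host is explicitly allowlisted."""
    try:
        host = urlsplit(url).hostname  # None when the URL has no host part
    except ValueError:
        return False  # unparseable URL: deny rather than guess
    if host is None:
        return False  # cannot identify the destination, so deny
    return host in allowed_hosts
```

Notice that every branch you did not think about ends in a denial. A host you forgot to list, a malformed URL, and a lookalike domain such as `pypi.org.evil.test` are all refused without any rule written for them.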
Grant the minimum capability the task requires
Least privilege is not bureaucracy. It is blast-radius control.
Every capability you give an agent (file access, network reach, the ability to send or spend) is a fresh way for a mistake to escape the sandbox and cause real harm. An agent that can only read files and run code, with no network route or messaging tool, cannot accidentally email a customer, because it was never given the means.
How to apply it
- Start the agent with no tools, then add only what the specific task needs.
- Scope each tool as narrowly as possible: read-only where reads suffice, a single directory rather than the whole filesystem.
- Re-examine permissions when the task changes, and remove ones it no longer needs.
The temptation is always to grant broad access to avoid fiddling. Resist it. The fiddling is the work.
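One way to picture narrow scoping is a file tool that can read inside a single directory and do nothing else. This is a sketch under assumptions: `ReadOnlyDirTool` is a hypothetical class, not part of any agent framework, and it needs Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path


class ReadOnlyDirTool:
    """Hypothetical tool: read files under one directory, and nothing else."""

    def __init__(self, root: Path) -> None:
        self._root = root.resolve()

    def read_text(self, relative_path: str) -> str:
        target = (self._root / relative_path).resolve()
        # Resolve first, then check, so ".." segments, absolute paths, and
        # symlinks that point outside the root are all rejected.
        if not target.is_relative_to(self._root):
            raise PermissionError(f"{relative_path!r} is outside the granted directory")
        return target.read_text(encoding="utf-8")
```

The class has no write, delete, or list method. The agent cannot misuse a capability that was never implemented, which is the point of starting from nothing and adding only what the task needs.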
Make environments ephemeral and cheap
A sandbox you reuse is a sandbox that accumulates state, and accumulated state contaminates experiments and occasionally leaves credentials lying around.
Treat every environment as disposable. Provision it from a script, use it, destroy it, recreate it clean. The reasoning is twofold: fresh environments make results reproducible, and they prevent the slow buildup of leftover files, cached secrets, and modified configs that quietly undermine isolation.
The practical key is speed. If creating a clean sandbox is slow, people reuse dirty ones no matter what the policy says. Invest in fast provisioning specifically so that the clean path beats the lazy path on convenience. This is the make-the-safe-path-the-fast-path principle in its purest form.
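The create, use, destroy cycle can be sketched as a context manager. A temporary directory stands in here for whatever your real isolation boundary is (a container, a VM, a cloud account); on its own it isolates nothing. The name `ephemeral_workspace` is hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def ephemeral_workspace(prefix: str = "sandbox-") -> Iterator[Path]:
    """Create a fresh working directory and always destroy it afterwards."""
    workdir = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield workdir
    finally:
        # Teardown runs even if the agent run raised, so no state survives.
        shutil.rmtree(workdir, ignore_errors=True)
```

Because teardown lives in the `finally` block, getting a clean environment takes one `with` statement and leaving a dirty one behind takes deliberate effort. That is the convenience ordering you want.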
Instrument everything from the first run
Observability is not something you bolt on after an incident. It is the evidence layer that makes the whole sandbox trustworthy.
Log every prompt, every tool call, every command executed, and every output produced, starting from the very first run. Two payoffs justify the effort. When an agent behaves strangely, the log is how you understand why. And when someone asks whether the sandbox behaved correctly, the log is your proof.
A sandbox without observability is a black box that asks for trust it cannot demonstrate. Do not ship one.
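A minimal sketch of logging from the first run: wrap each tool so every call leaves a record whether it succeeds or fails. The `logged_tool` decorator is hypothetical, and the in-memory list is a stand-in for an append-only file or log service that lives outside the sandbox.

```python
import functools
import json
import time
from typing import Any, Callable


def logged_tool(log: list[str]) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Wrap a tool so that each call appends one JSON line to `log`."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record = {
                "ts": time.time(),
                "tool": fn.__name__,
                "args": repr(args),
                "kwargs": repr(kwargs),
            }
            try:
                result = fn(*args, **kwargs)
                record["status"] = "ok"
                record["output"] = repr(result)
                return result
            except Exception as exc:
                record["status"] = "error"
                record["error"] = repr(exc)
                raise
            finally:
                # Written on success and on failure, so the log has no gaps.
                log.append(json.dumps(record))

        return wrapper

    return decorator
```

Putting the logging in the wrapper rather than in each tool means a new tool is instrumented by default. Nobody has to remember to add it.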
Cap spend and rate before going unattended
Autonomous agents loop. A bug that causes infinite retries can quietly run up a large token bill overnight or hammer an API into the next pricing tier.
Set a hard token spend cap and an action rate limit, and treat them as circuit breakers rather than tuning parameters. They will not improve your results. They exist purely to convert a potential disaster into a logged, bounded failure. The reasoning is simple: the downside of running uncapped is unbounded and the upside of skipping the cap is nothing, so the trade is one-sided.
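The circuit-breaker idea can be sketched as a guard that every model or tool call must pass through. This is a simplification under assumptions: `SpendGuard` and `BudgetExceeded` are hypothetical names, and the action limit is a per-run count rather than a true rate limit, which would also track time.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a run would exceed its token or action budget."""


class SpendGuard:
    """Hypothetical per-run circuit breaker for tokens and actions."""

    def __init__(self, max_tokens: int, max_actions: int) -> None:
        self.max_tokens = max_tokens
        self.max_actions = max_actions
        self.tokens_used = 0
        self.actions_taken = 0

    def charge(self, tokens: int) -> None:
        """Record one action, refusing it if either cap would be exceeded."""
        if self.actions_taken + 1 > self.max_actions:
            raise BudgetExceeded("action cap reached")
        if self.tokens_used + tokens > self.max_tokens:
            raise BudgetExceeded("token cap reached")
        # Counters change only after both checks pass, so a refused call
        # is never billed against the budget.
        self.actions_taken += 1
        self.tokens_used += tokens
```

The guard raises instead of returning a flag because a flag can be ignored inside a retry loop, while an uncaught exception stops the run.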
Test your walls like an adversary
The practice that separates real sandboxes from theatrical ones is adversarial verification.
Do not assume isolation holds. Actively try to break out. Instruct an agent inside to reach a forbidden external endpoint, access a production hostname, hit your cloud metadata endpoint, or persist a file past teardown. Each of those should fail. If one succeeds, you have found a hole while it is still cheap to fix.
Run these checks on every configuration change and on a recurring schedule regardless. Isolation erodes quietly as people add allowlist entries for convenience, and only adversarial testing catches that drift. For the specific procedure, see the step-by-step guide; for the failures these tests catch, see the common mistakes breakdown.
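A minimal sketch of one such check, meant to run from inside the sandbox: try to open a TCP connection to each forbidden target and report any that succeed. The target list is hypothetical apart from `169.254.169.254`, the link-local address that major cloud providers use for instance metadata. The `connect` parameter exists so the check itself can be tested without a network.

```python
import socket
from typing import Callable, Iterable

# Hypothetical targets that must be unreachable from inside the sandbox.
FORBIDDEN_TARGETS = [
    ("example.com", 443),                # an arbitrary external endpoint
    ("prod-db.internal.example", 5432),  # a made-up production hostname
    ("169.254.169.254", 80),             # cloud instance metadata address
]


def _tcp_connect(host: str, port: int) -> None:
    """Open and immediately close a TCP connection; raises OSError on failure."""
    with socket.create_connection((host, port), timeout=3):
        pass


def find_leaks(
    targets: Iterable[tuple[str, int]],
    connect: Callable[[str, int], None] = _tcp_connect,
) -> list[tuple[str, int]]:
    """Return the targets that were reachable; an empty list means the walls held."""
    leaks = []
    for host, port in targets:
        try:
            connect(host, port)
        except OSError:
            continue  # blocked, refused, unresolvable, or timed out: expected
        leaks.append((host, port))
    return leaks
```

Wire the result into whatever already gates your configuration changes, and fail the build on any non-empty list. A sketch like this covers only network reachability; persistence past teardown needs its own check.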
Match fidelity to the question you are asking
A subtle practice: do not reflexively make your sandbox as realistic as possible.
The more closely a sandbox resembles production, the more useful its tests and the more dangerous a leak. Synthetic data is safe but sometimes too clean to surface real-world edge cases. The right move is to match fidelity to the specific question. Testing whether an agent's logic works? Clean synthetic data is fine. Testing whether it handles messy real-world inputs? Use masked production data with its realistic ugliness intact.
Higher fidelity is not better; it is a trade-off you make deliberately per experiment. The framework article gives you a structured way to decide.
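To make "masked with its ugliness intact" concrete, here is a sketch that swaps email addresses for stable pseudonyms and leaves every other character alone. It is an illustration only: `mask_emails`, the salt value, and the simplified email pattern are all assumptions, and masking one field type is nowhere near full de-identification.

```python
import hashlib
import re

# Deliberately simple pattern; real-world email matching is messier than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str, salt: str = "per-experiment-salt") -> str:
    """Replace each email with a stable pseudonym, leaving all other text as-is."""

    def _pseudonym(match: re.Match) -> str:
        # Lowercase before hashing so the same address maps to the same
        # pseudonym regardless of how it was capitalized.
        normalized = match.group(0).lower()
        digest = hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()
        return f"user-{digest[:10]}@masked.invalid"

    return EMAIL_RE.sub(_pseudonym, text)
```

The stray whitespace, doubled punctuation, and inconsistent casing around each address survive, so the agent still faces the messy input you wanted to test against. Only the identifying value is gone.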
Frequently Asked Questions
If I can only adopt one practice, which should it be?
Default-deny. Starting everything closed and permitting on purpose is the single habit that makes your mistakes conservative instead of catastrophic. It costs some setup friction up front, but it changes the failure direction of your entire system, which is worth more than any other single control.
Does least privilege slow teams down too much to be worth it?
It adds upfront configuration, but it pays back by making unsupervised runs safe, and unsupervised runs are where most surprises happen. The trick is tooling: make scoped permissions easy to grant so least privilege is not painful. Friction is a tooling problem, not a reason to abandon the practice.
How do I keep ephemerality from being annoying?
Invest in fast, scripted provisioning so a fresh sandbox is quicker to create than a dirty one is to clean. Ephemerality only becomes annoying when setup is slow. Solve the speed problem and disposability becomes the path of least resistance, which is exactly what you want.
Is full observability overkill for small experiments?
The small, casual experiments are precisely the ones nobody watches closely, which makes their logs the most valuable when something goes wrong. Observability is cheap to leave on and expensive to wish you had. Default it on everywhere rather than deciding per experiment.
Why test the walls if I built them carefully?
Because careful construction does not prevent later erosion. People add allowlist entries and broaden permissions for convenience, and isolation degrades without any error to warn you. Adversarial testing is the only practice that catches that quiet drift before it becomes a leak.
Key Takeaways
- The governing principle is to make the safe path the fast path, or people will route around your controls.
- Default to deny and allowlist with intent, because allowlists fail safe while denylists fail open.
- Grant the minimum capability each task needs; every extra permission is a fresh escape route for mistakes.
- Make environments ephemeral and cheap, instrument everything from the first run, and cap spend before going unattended.
- Test your walls adversarially on a schedule, and match sandbox fidelity to the specific question rather than maximizing realism.