This is a working checklist, not a reading exercise. Run it before any AI deployment ships, then re-run the relevant sections on every prompt change, model upgrade, and config edit. Each item carries a one-line justification so you can tell why it earns its place and when you might skip it.
A checklist is only useful if it is honest about stakes. Low-stakes internal tools do not need every item; customer-facing systems that touch money or irreversible actions need all of them. Use judgment, but err toward completeness as the consequences of failure rise. For the reasoning behind these items, our best practices guide is the companion to this list.
Before You Build: Specification
- [ ] Write the "must never" list. Ten or fewer concrete forbidden behaviors specific to this use case. Without a written spec, you cannot test against it.
- [ ] Define the real goal, not just a proxy. State the outcome you actually want, separate from any metric. Models game proxies; you need the goal to check against.
- [ ] Classify the stakes of each action. Mark which actions are reversible and which are not. Posture should scale with the cost of being wrong.
Inputs: Containing Untrusted Content
- [ ] Map every untrusted-input channel. Files, fetched pages, emails, third-party fields, cross-session history. You cannot defend what you have not located.
- [ ] Separate instructions from data structurally. Wrap untrusted content in labeled delimiters; tell the system prompt to treat it as data only. Concatenation is how prompt injection succeeds.
- [ ] Length-limit and validate inputs. Reject oversized or malformed input before it reaches the model. Edge-case inputs are where many failures start.
The injection risk these items address is the most underestimated one in AI products; our examples article shows it breaking a real-looking demo.
Outputs: Never Trust Raw Generation
- [ ] Validate output shape in code. Schema-check JSON, verify category membership, reject what does not conform. The model proposes; your code disposes.
- [ ] Sanitize anything that gets rendered. Treat displayed model output as untrusted before it hits a screen. Output can carry injection downstream.
- [ ] Ground factual claims where possible. Require citations against a known set and validate them. Confident fabrication is camouflaged by good formatting.
Actions: The Privilege Wall
- [ ] Route every consequential action through deterministic authorization. The model returns intent; code checks policy, ownership, and limits. This contains the damage when the prompt fails.
- [ ] Grant least privilege. The model touches only what it strictly needs. Excess capability turns a hallucination into an incident.
- [ ] Require human approval for irreversible or high-cost actions. Gate precisely, not everything. Reserve scrutiny for where the cost of error exceeds the cost of delay.
This is the highest-leverage section; the privilege wall is the control our case study credits with closing an incident class permanently.
Measurement: The Part Teams Skip
- [ ] Build an evaluation set with normal, edge, and attack cases. Start with twenty to fifty; grow from real incidents. You cannot manage behavior you do not measure.
- [ ] Track the false-refusal rate, not just the harm rate. Keep legitimate-but-sensitive requests in the set. Over-refusal destroys the product and is a real failure.
- [ ] Gate every change on the eval set. A score regression blocks the change. This is what catches silent breakage from prompt and model updates.
Operations: Keeping Safety Alive
- [ ] Log prompts, outputs, and tool calls for consequential paths. Respect data and retention policy. Logs are both your investigation tool and your next test cases.
- [ ] Red-team on a schedule. Monthly or quarterly, feed findings back into the eval set. Threats evolve; a system probed once is defended against last year.
- [ ] Review the "must never" list periodically. Update it as the product and risks change. A stale spec tests for stale risks.
For the standing process that keeps this checklist from becoming a one-time gate, see our framework.
A Worked Pass Through the Checklist
To show the checklist in motion, run it quickly against a simple deployment: a model that drafts replies to customer emails and can apply a small credit.
- Specify: Must-never list includes "no credit over $25" and "no internal notes." Real goal: resolve the customer's issue, not just close the thread. Actions classified, applying a credit is consequential, drafting is reversible.
- Inputs: The email body is the untrusted channel. It goes inside labeled delimiters, marked as data. Length-limited to reject a pasted novel.
- Outputs: Model returns JSON with a draft and an optional credit amount; code rejects anything that does not parse. The draft is sanitized before display.
- Actions: Credit is applied by code after checking the $25 cap and account eligibility, not by the model. Credits above $15 route to a human.
- Measurement: Eval set includes a "$500 credit" injection, a normal apology email, and an empty email, each with expected behavior. False-refusal cases included. Set gates every change.
- Operations: Drafts and credit decisions logged with a retention limit. Monthly red-team session. Must-never list reviewed each quarter.
That single pass took minutes and surfaced concrete decisions at every stage. That is the test of a working checklist: it forces decisions rather than nodding along.
Adapting the Checklist to Your Stakes
The checklist is a maximum, not a mandate. Trim it deliberately, never by neglect.
- Internal, low-stakes tool: Specification, Inputs, and Measurement. Skip most of Actions if nothing consequential can happen.
- Customer-facing, no real actions: Add Outputs in full and the logging item from Operations. Confident fabrication is now a reputational risk.
- Customer-facing with consequential actions: Every item, no exceptions. This is the configuration the checklist was designed for.
The wrong way to shorten the list is to drop items you find tedious; the right way is to drop items whose failure mode genuinely cannot occur in your system. If you are unsure whether a failure mode applies, assume it does, because the cost of an unnecessary control is minutes while the cost of a missing one is an incident.
One more adaptation note: as your system matures, the checklist should grow, not shrink. Every real incident and every red-team finding becomes a new line item or a new eval case. A checklist that looks identical a year after launch is a checklist nobody has been learning from. Treat it as a living document that accumulates the specific lessons your deployment has taught you, layered on top of these universal items.
How to Use This Checklist
Run the full list before launch. After launch, the Inputs, Outputs, and Measurement sections re-run on every change; Operations runs on its schedule; Specification gets reviewed when the product shifts. Treat an unchecked box on a high-stakes system as a blocker, not a note for later. Pair this with our framework for the reasoning that tells you which items a novel deployment needs.
Frequently Asked Questions
Can I skip sections for an internal tool?
Yes. For a low-stakes internal tool, the Specification, Inputs, and Measurement sections give you most of the protection. The Actions section matters only if the tool can take consequential actions. Scale the checklist to the stakes.
Which section should I never skip?
Measurement. Without an evaluation set, every other item becomes unverifiable, and you have no way to know whether a change made things better or worse. It is the cheapest section to start and the most expensive to lack.
How often should I actually re-run this?
Inputs, Outputs, and Measurement on every change that touches the model, prompt, or config. Operations on a fixed schedule. The full list before any launch. The point is that safety is maintained, not certified once.
What if I do not have time for all of it?
Do the privilege wall, instruction-data separation, and a small eval set first. Those three address the most dangerous and most common failures. The rest reduces risk further but is lower priority under time pressure.
Key Takeaways
- Write a concrete "must never" list before building; it is the spec everything tests against.
- Separate untrusted input from instructions and validate all output in code.
- Route consequential actions through a deterministic privilege wall with least privilege.
- An evaluation set that gates every change is the section teams skip and need most.
- Log, red-team on a schedule, and review the spec so safety stays a maintained property.