A checklist is only useful if you can actually run it before launch and trust that clearing it means something. This one is built to be a gate: twelve concrete checks, each with a short justification, that a prompt should pass before it meets real users. None are aspirational. Each maps to a failure that regularly reaches production when skipped.
Use it literally. Copy the twelve items, run them against your prompt, and treat any unchecked box as a launch blocker until you have a deliberate reason to waive it. The justifications matter because a checklist you do not understand becomes ritual; a checklist you understand becomes judgment.
The checks are ordered roughly by sequence, from defining the target through to scheduling reruns. You can run the whole thing against a single prompt in an afternoon, and far faster on later passes once your attack inventory exists.
A note on how to hold this checklist. It is tempting to treat any list as a box-ticking ritual, racing to the bottom so you can declare yourself done. Resist that. The justifications under each item exist to keep the check meaningful, because a box you tick without understanding protects nothing. The goal is not a complete-looking list; it is a prompt you would be comfortable putting in front of a stranger who is actively trying to misuse it. Read each justification, decide whether it applies to your stakes, and only then mark the box.
Before You Test: Define and Scope
Check 1: Boundaries Are Written Down
Confirm you have a written statement of what the prompt must do and must never do, in specific terms. You cannot test a boundary you have not named, and "be helpful and safe" is not testable. This definition is the standard for every later check.
Check 2: Stakes Are Classified
Confirm you have noted what a failure would cost. A prompt that can move money or expose data warrants far more scrutiny than one suggesting blog titles. Stakes determine how hard you push, as we argue in Habits That Keep a Production Prompt From Caving In.
Core Attack Coverage
Check 3: Instruction Override Attempts Tested
Confirm you have tried to make the model abandon its rules ("ignore your instructions," role reassignment, "the policy is wrong"). Override is the most common attack, so a prompt that has not faced it has not been tested.
Check 4: Scope Probing Tested
Confirm you have pushed reasonable-sounding requests just outside the allowed job. Scope drift rarely looks like an attack; it looks like a slightly-too-broad question the prompt should have declined. The test for this check is whether your inputs include requests a well-meaning user might genuinely make, not just obviously hostile ones. If every scope attack looks like an attack, you have only tested the easy half of the problem and missed the dangerous half that arrives disguised as ordinary curiosity.
Check 5: Indirect Injection Tested
Confirm you have pasted content containing hidden instructions and verified the model treats it as data, not commands. Any prompt that ingests user-supplied documents or URLs is exposed to injection.
Check 6: Malformed Inputs Tested
Confirm you have sent empty, oversized, mixed-language, and nonsensical inputs. Boring malformed inputs cause real outages and happen constantly by accident, as shown in When Real Users Attack: Concrete Prompt-Breaking Scenarios. This check is easy to skip because malformed inputs feel too trivial to bother with, which is exactly why they slip through to production. A blank submission, a pasted spreadsheet, a message in an unexpected language: users generate these constantly without any intent to break anything. A prompt that handles clever attacks but stumbles on an empty box still fails its first ordinary day.
Check 7: Domain-Specific Attacks Tested
Confirm most of your attacks target your specific domain's expensive failures, not just generic ones. Generic attacks find generic problems; your costly failures live in your domain. A quick way to audit this check is to count how many of your attacks would make sense against a completely different product. If most of them would, your inventory is too generic. The attacks that only make sense against your specific prompt, the refund requests, the diagnosis bait, the cross-account probes, are the ones earning their place.
Evaluation and Fixes
Check 8: Outputs Judged Against Boundaries, Not Tone
Confirm each output was labeled pass or fail against the written boundaries, and that confident-sounding answers were verified rather than trusted. Fluency is not correctness.
Check 9: Failures Logged Reproducibly
Confirm every failure was recorded with the verbatim input, output, model, and settings. A failure you cannot reproduce is one you cannot fix or verify. The full procedure lives in Run Hostile Inputs at Your Prompts, One Step at a Time.
Check 10: Fixes Applied One at a Time
Confirm fixes were isolated and the full set rerun between them. Bundled fixes hide which edit helped and which broke a legitimate use case.
Before You Ship: Verify and Schedule
Check 11: Full Inventory Re-Run Clean
Confirm a final rerun of the entire attack inventory shows zero high-severity failures and only acceptable low-severity issues. The rerun, not the first pass, is what proves the prompt is ready.
Check 12: Reruns Scheduled on Real Triggers
Confirm the inventory is saved as a regression suite and scheduled to rerun on any prompt change, model upgrade, or new capability. These triggers are when safe behavior most often regresses, a risk weighed in Manual Red-Teaming or Automated Fuzzing: Choosing Your Approach.
Using the Checklist as a Living Gate
Wire It Into the Workflow
A checklist that lives in someone's memory gets skipped under deadline pressure. Put it where the work happens: as a section in the pull request that ships a prompt change, or as a required step in your release process. The cheapest way to guarantee the checklist gets run is to make running it the path of least resistance rather than an act of discipline.
Let the Inventory Carry the Weight Over Time
After the first pass, most of these checks collapse into "rerun the saved inventory and confirm it is clean." The early checks, defining boundaries and classifying stakes, only need a fresh look when the prompt's purpose genuinely changes. This is why the recurring cost stays low: the expensive thinking happens once, and the checklist mostly verifies that nothing has quietly regressed since.
Frequently Asked Questions
Can I skip checks for a low-stakes prompt?
You can scale down, but do so deliberately rather than by accident. For a genuinely trivial prompt, the malformed-input and override checks alone catch most problems. The point of check 2 is to make the scaling decision conscious and defensible.
What counts as a launch blocker versus a waivable item?
Any high-severity failure, an unfinished boundary definition, or a missing rerun should block launch. Low-severity tone issues can often be waived with a note. The written boundaries and stakes classification tell you which category a given gap falls into.
How long does running the full checklist take?
The first pass on a single prompt typically takes an afternoon. Later passes are much faster because the attack inventory already exists and only needs rerunning. The recurring cost is low, which is what makes scheduled reruns practical.
Do I need tools to run this checklist?
No. Every check can be performed by typing inputs into the same interface users see and reading outputs. Tools help automate the repetition once your inventory grows, but the checklist itself is tool-agnostic and can start manually today.
What if I cannot pass check 11 no matter what I change in the prompt?
That usually means the fix belongs outside the prompt, such as input filtering or access scoping. A persistently failing attack family is a signal to harden the surrounding system rather than to keep rewording, and it is a legitimate reason to escalate beyond the prompt layer.
Key Takeaways
- A checklist is only useful as a real launch gate, with unchecked items treated as blockers.
- Define written boundaries and classify stakes before running any attacks.
- Cover override, scope, injection, malformed, and domain-specific attacks every time.
- Judge outputs against boundaries, log failures reproducibly, and fix one change at a time.
- A clean rerun of the full inventory, plus scheduled reruns on real triggers, is what proves readiness.