A checklist is only useful if every item earns its place and you can actually run it in a few minutes before shipping. The list below is built to do exactly that for instruction hierarchy: each item targets a specific, recurring way that priority conflicts break a prompt, and each carries a one-line justification so you can decide whether it applies to your case.
Treat this as a working tool rather than reading material. Pull up your assembled prompt, walk the items in order, and fix what you find. Most non-trivial prompts will fail at least two or three of these on the first pass, which is exactly the point.
The items are grouped by stage: structure, content, conflicts, and verification, plus a set for multi-turn and agent contexts. Work them in that order, because earlier fixes often resolve later items for free. A structural problem fixed early, like a missing precedence order, frequently makes several downstream content and conflict items pass without further effort, so resist the temptation to jump straight to testing.
One more framing note before you start. This checklist assumes you can see the prompt the model actually receives, with all variables, retrieved content, and examples rendered in. If you have only ever looked at the template, render the assembled prompt first. Nearly every conflict hides in the gap between the template you edit and the text the model reads, so skipping this step makes the rest of the checklist far less effective.
Structure Checks
These confirm the prompt is organized so that priority is even expressible.
Is There an Explicit Precedence Order?
Confirm the prompt states, near the top, which sources of instruction outrank which. Without it, the model guesses at every conflict. This single item prevents the largest class of failures described in Seven Ways Conflicting Instructions Quietly Break Your Prompts.
Are Hard Constraints Separated from Preferences?
Check that non-negotiable rules sit in a clearly labeled section, distinct from preferences. The label signals to the model that violating a constraint is failure, not a tradeoff.
Are Critical Rules at the Top and Restated at the End?
Verify your most important constraints anchor the start of the prompt and that the few that absolutely cannot break are repeated at the end. Position drives attention, and the middle is the weakest spot.
Content Checks
These confirm the actual instructions are written to express priority correctly.
Does Soft Language Match True Optionality?
Scan for "try to," "if possible," and "ideally." Confirm those phrases attach only to genuine preferences, never to rules you actually need enforced. Misplaced hedging tells the model a mandatory rule is optional.
Do Few-Shot Examples Agree with the Rules?
Read each example against your rules, including edge cases. Any example that contradicts a rule will likely win, because demonstrated behavior outranks stated behavior. This check often surfaces the cause of "stopped following instructions" bugs.
Is User and Retrieved Content Structurally Isolated?
Confirm untrusted input sits inside clear delimiters with a note that its contents are data, not instructions. Without isolation, injected text competes with your system rules on equal footing.
Conflict Checks
These hunt for the actual collisions.
Have You Listed Every Instruction and Checked the Pairs?
Enumerate the instructions and check pairs that could both apply to the same input but cannot both be satisfied. Each such pair is a latent conflict and a future test case. The method mirrors the walkthroughs in Walking Through Prompts Where Instructions Collide.
Are Likely Conflicts Pre-Resolved in the Prompt?
For each conflict you found, confirm the prompt states which side wins. "If brevity and completeness conflict, prioritize completeness" beats hoping the case never arises.
Have You Removed Rules That Only Patch Conflicts?
Check whether any rule exists solely to paper over a conflict that pre-resolution now handles. Pruning these shortens the prompt and removes new collision surface.
Verification Checks
These confirm the hierarchy actually holds under pressure.
Do You Have Conflict-Probing Test Cases?
Confirm you have inputs designed to pit one instruction against another, not just happy-path inputs. These are the only tests that exercise the hierarchy. Instrumenting them is covered in How to Measure Instruction Hierarchy and Priority Conflicts: Metrics That Matter.
Do You Run Each Conflict Test Multiple Times?
Verify you run conflict tests repeatedly, since priority failures are often intermittent. A rule that wins four times out of five is not actually winning. Tooling that automates this is surveyed in The Best Tools for Instruction Hierarchy and Priority Conflicts.
Have You Confirmed Hard Constraints Never Lose?
Run the subset of tests that exercise Tier 1 constraints and confirm the violation rate is effectively zero. Preferences can lose occasionally without much harm; a safety or policy constraint that loses even once is a release blocker. Treat this as a separate gate from the general win rate.
Multi-Turn and Agent Checks
These matter the moment your prompt is not a single request-and-response.
Do Critical Constraints Survive Across Turns?
In a conversation, confirm that a hard constraint stated in the system prompt still holds after many user turns, including turns that push against it. Constraints carried only in an early user message tend to fade; anchor them in the system prompt instead.
Is the Most-Recent-User-Instruction Rule Working?
Check that when a user gives a later instruction contradicting an earlier one, the later one wins, unless it violates a hard constraint. Test this with a conversation that reverses an earlier preference, such as switching from detailed to brief, and confirm the switch takes effect.
Do Constraints Carry Across Agent Steps?
If your prompt is part of a chain, verify that Tier 1 rules established early are restated at later steps rather than assumed to persist. A constraint set in a planning step but absent from a synthesis step will not reliably hold, a failure mode that grows as workflows become more agentic.
How to Use This Checklist in Practice
A checklist that lives in a document and never gets run is worthless. Wire it into your workflow so it actually fires.
Make It Part of Shipping
Treat the structure and conflict checks as a pre-ship gate the way you would treat code review. The cost is small once the structure is in place, and the failures it catches are exactly the intermittent ones that are most expensive to debug in production.
Keep a Living Conflict Inventory
Maintain the list of identified instruction pairs alongside the prompt itself, so each new rule prompts the question of what it might collide with. This turns the most time-consuming step, conflict enumeration, into an incremental update rather than a from-scratch audit each time. It is the same living-inventory discipline that kept the rewrite in our case study from drifting back into contradiction.
Frequently Asked Questions
How long should running this checklist take?
For a typical production prompt, fifteen to thirty minutes on the first pass and under ten on subsequent passes once the structure is in place. The conflict enumeration is the longest step.
Which items matter most if I am short on time?
The precedence-order item and the conflict-enumeration item. Together they address the largest share of real failures. Everything else refines what those two establish.
Should I rerun the whole checklist after every edit?
Not the whole thing. After a small edit, rerun the content and conflict checks for the section you touched plus the verification tests. After a large rewrite, run all four stages.
Why run conflict tests multiple times instead of once?
Because priority failures are frequently intermittent. A single pass can hide a rule that loses one time in five, which is exactly the kind of flaky failure that erodes trust in production.
Key Takeaways
- Run this checklist as a tool before shipping, not as background reading.
- Confirm an explicit precedence order exists before anything else.
- Separate hard constraints from preferences and anchor critical rules at both ends.
- Audit soft language, examples, and content isolation for hidden priority signals.
- Enumerate instructions, check every conflicting pair, and pre-resolve each in the prompt.
- Verify with conflict-probing tests run repeatedly, since priority failures are often intermittent.