A checklist earns its place only if you run it. That means it has to be short enough to use under time pressure and pointed enough that each item prevents a real failure rather than gesturing at good intentions. The list below is built for comparison prompts specifically, because they have a distinct failure surface that generic prompt checklists miss entirely.
Each item carries a one-line justification, so you know what it is defending against and can drop items that do not apply to a given comparison. Treat it as a pre-flight pass: before you send the prompt, walk the "before" and "writing" sections; before you act on the answer, walk the "after" section.
For the reasoning that motivates each item, this pairs naturally with Habits That Make AI Comparisons Hold Up Under Pressure. Here we keep it operational.
A note on how to read the list: the items are grouped by phase, not by importance, so do not assume the first item in each group is the most critical. Some items prevent catastrophic failures; others prevent slow erosion of quality. The justifications are there so you can make that judgment for your own situation rather than treating every line as equally mandatory.
Before You Write the Prompt
These items shape the request so the model has what it needs to reason instead of guess.
Setup checks
- State the decision the comparison serves. A comparison with no decision behind it has no standard for "better."
- List your criteria explicitly. Unstated criteria get invented, usually the popular ones rather than yours.
- Rank the criteria. Priority order tells the model how to resolve trade-offs the way you would.
- Confirm the options are actually comparable. Comparing a service to a library is a category error in disguise.
- Note your real constraints. Budget, timeline, team size, and horizon change which option wins.
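One way to make the setup checks hard to skip is to capture them in a small structure that reports what is still missing. The sketch below is illustrative only: `ComparisonSetup` and its field names are hypothetical, not part of any library, and the checks are a loose translation of the five items above.

```python
from dataclasses import dataclass, field


@dataclass
class ComparisonSetup:
    """Hypothetical container for the five setup checks."""
    decision: str                      # the decision the comparison serves
    criteria: list[str]                # ordered from most to least important
    options: list[str]                 # the things being compared
    comparable: bool = False           # set True once you have confirmed it
    constraints: list[str] = field(default_factory=list)

    def missing(self) -> list[str]:
        """Return the setup checks that are not yet satisfied."""
        gaps = []
        if not self.decision.strip():
            gaps.append("state the decision")
        if not self.criteria:
            gaps.append("list and rank criteria")
        if len(self.options) < 2:
            gaps.append("name at least two options")
        if not self.comparable:
            gaps.append("confirm the options are comparable")
        if not self.constraints:
            gaps.append("note real constraints")
        return gaps
```

An empty list from `missing()` means the setup pass is complete. Note that `comparable` defaults to `False` on purpose, so comparability has to be confirmed explicitly rather than assumed.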
Writing the Prompt Itself
These items control how the model produces the comparison.
Structure checks
- Supply symmetric information for each option. Uneven input tilts the verdict toward whichever option you described more.
- Forbid a recommendation in the analysis pass. An early verdict anchors and biases the reasoning that follows.
- Require evidence or an assumption per cell. A claim you cannot trace is a claim you cannot trust.
- Instruct the model to leave unknowns blank. A visible gap is safer than an invisible fabrication.
- Ask it to flag uncertain figures. Marked uncertainty tells you exactly what to verify.
- Request the conditions under which each option wins. Most real comparisons are conditional, not absolute.
The "leave blanks, flag uncertainty" pairing is the single best defense against the fabricated-specifics failure described in Seven Ways Comparison Prompts Quietly Go Wrong.
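The structure checks translate fairly directly into a prompt template. The following is a minimal sketch, assuming you assemble the prompt as plain text; the function names, the rule wording, the `[uncertain]` marker, and the word-count ratio used as a symmetry proxy are all illustrative assumptions rather than a recommended standard.

```python
def lopsided(option_notes, max_ratio=2.0):
    """Crude symmetry proxy: True if one option's notes are far longer than another's.

    The 2.0 threshold is an arbitrary assumption; tune it for your own inputs.
    """
    lengths = [len(notes.split()) for notes in option_notes.values()]
    if not lengths:
        return False
    return min(lengths) == 0 or max(lengths) / min(lengths) > max_ratio


def build_analysis_prompt(decision, ranked_criteria, option_notes, constraints):
    """Assemble an analysis-pass prompt that contains no request for a verdict."""
    if len(option_notes) < 2:
        raise ValueError("need at least two options to compare")
    if lopsided(option_notes):
        raise ValueError("option notes are uneven; balance them before sending")
    criteria_lines = [f"{i}. {c}" for i, c in enumerate(ranked_criteria, start=1)]
    option_lines = [f"- {name}: {notes}" for name, notes in option_notes.items()]
    rules = [
        "Do not recommend an option in this pass.",
        "For every cell, cite a source or state the assumption behind it.",
        "Leave a cell blank if you do not know; do not estimate.",
        "Mark any figure you are unsure of with [uncertain].",
        "End with the conditions under which each option wins.",
    ]
    sections = [
        f"Decision: {decision}",
        "Criteria, most important first:\n" + "\n".join(criteria_lines),
        "Options:\n" + "\n".join(option_lines),
        "Constraints: " + "; ".join(constraints),
        "Rules:\n" + "\n".join(f"- {rule}" for rule in rules),
    ]
    return "\n\n".join(sections)
```

Keeping the rules in one list makes it easy to see at a glance whether the blank-cell and uncertainty instructions are both present, since they are the pair that guards against fabricated specifics.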
Why these items belong together
The structure checks reinforce each other. Symmetric information without an evidence requirement still lets the model bluff equally on both sides. An evidence requirement without permission to leave blanks pressures the model to fabricate a source for every cell. Forbidding a recommendation without asking for conditions can produce a flat table that lists attributes but never says when each option wins. The items work as a system; dropping one often undermines another, which is why the list is worth running as a set rather than cherry-picking.
A Worked Pass Through the List
It helps to see the checklist applied once, quickly, to a concrete choice.
Example: choosing a logging library
Suppose you are choosing between two logging libraries. The "before" pass produces: the decision is which library to adopt for a long-lived backend service; the criteria, ranked, are performance overhead, structured-output support, and maintenance activity; the options are genuinely comparable; the constraint is high request volume. The "writing" pass tells the model to fill those three criteria for each library, attach a source or assumption per cell, leave blanks where it lacks data, flag uncertain numbers, and report the conditions under which each wins, all without making a recommendation. The "after" pass verifies the overhead figures against benchmarks, confirms all three criteria were addressed, notes that the verdict is genuinely conditional on volume, and runs a separate recommendation scoped to high throughput.
What the pass caught
Run honestly, this pass would have caught a fabricated benchmark number and a substituted criterion the model tried to introduce. Neither would have been visible in the output of a single unstructured prompt. The checklist's value is precisely that it brings those silent defects to the surface, where you can see them.
After the Comparison Comes Back
These items catch problems before they reach a decision.
Review checks
- Verify every load-bearing number against a primary source. The model structures; you confirm the facts that move the decision.
- Check that the criteria actually got addressed. Models sometimes substitute their own axes mid-comparison.
- Look for suppressed nuance. If a verdict feels too clean, ask where the trade-offs went.
- Separate inference from fact in the output. Label which conclusions rest on evidence and which are the model's guesses.
- Run the recommendation pass separately. Feed the verified table back and scope the verdict to your conditions.
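Parts of the review pass can be mechanized once the model's table is in a structured form. The sketch below assumes you have already parsed the output into a mapping of option to criterion to cell; the `Cell` type and the `review` function are hypothetical names for illustration, and the code only flags what to look at. Verifying numbers against a primary source remains manual work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: Optional[str] = None      # None means the model left the cell blank
    evidence: Optional[str] = None   # cited source or stated assumption
    uncertain: bool = False          # the model flagged the figure as uncertain


def review(table, ranked_criteria):
    """Return follow-up actions for a parsed comparison table.

    `table` maps option name -> {criterion name -> Cell}.
    """
    actions = []
    for option, row in table.items():
        for criterion in ranked_criteria:
            cell = row.get(criterion)
            if cell is None:
                actions.append(f"{option}: criterion '{criterion}' was not addressed")
            elif cell.value is None:
                actions.append(f"{option}/{criterion}: blank, research it yourself")
            elif not cell.evidence:
                actions.append(f"{option}/{criterion}: no evidence or assumption given")
            elif cell.uncertain:
                actions.append(f"{option}/{criterion}: flagged uncertain, verify")
        # Axes the model added on its own are a sign of substituted criteria.
        for extra in sorted(row.keys() - set(ranked_criteria)):
            actions.append(f"{option}: unrequested criterion '{extra}' appeared")
    return actions
```

An empty result does not mean the table is correct, only that it is structurally complete. Load-bearing figures still need checking even when they arrive with a source attached.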
Adapting the List to Stakes
Not every comparison needs the full pass.
Match rigor to consequence
For a quick, reversible choice, the "state the decision," "rank criteria," and "ask for conditions" items carry most of the value. For a decision a committee will act on, run the whole list, including verification and the separate recommendation pass. The instinct to calibrate effort to stakes is the same one that governs Judging Comparison Quality With the Right Signals.
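If the checklist lives in a script or template, the calibration can be written down rather than left to memory. This is a minimal sketch under stated assumptions: the two stakes labels ("low" and "high") and the item wording are illustrative, and the quick pass contains exactly the three items named above.

```python
CHECKLIST = {
    "before": [
        "state the decision",
        "list the criteria",
        "rank the criteria",
        "confirm the options are comparable",
        "note real constraints",
    ],
    "writing": [
        "supply symmetric information",
        "forbid a recommendation in the analysis pass",
        "require evidence or an assumption per cell",
        "leave unknowns blank",
        "flag uncertain figures",
        "request winning conditions",
    ],
    "after": [
        "verify load-bearing numbers",
        "check the criteria were addressed",
        "look for suppressed nuance",
        "separate inference from fact",
        "run the recommendation pass separately",
    ],
}

# The three items that carry most of the value for quick, reversible choices.
QUICK_PASS = {"state the decision", "rank the criteria", "request winning conditions"}


def items_for(stakes):
    """Return checklist items, in phase order, for a given stakes label."""
    all_items = [item for phase in CHECKLIST.values() for item in phase]
    if stakes == "low":
        return [item for item in all_items if item in QUICK_PASS]
    if stakes == "high":
        return all_items
    raise ValueError(f"unknown stakes label: {stakes!r}")
```

Raising on an unknown label is deliberate: a silent fallback to the short list would be the wrong default for a decision whose stakes nobody classified.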
Keep it visible
A checklist filed away is a checklist unused. Paste the relevant section into your working doc next to the prompt so the items are in front of you when it counts.
Turn it into a team default
A personal checklist drifts; a team checklist compounds. When a comparison is something colleagues will read and act on, encode the relevant items into a shared prompt template so everyone runs the same pass. This is how the procurement team in How a Procurement Team Rebuilt Its Vendor Comparisons turned a personal habit into a process the whole buying committee trusted. The checklist stops being something one careful person does and becomes how the organization compares anything that matters.
When to Add Items of Your Own
The list is a floor, not a ceiling.
Domain-specific checks
Some fields carry risks this general list does not name. A regulatory comparison should add a "check each option against current compliance requirements" item; a security comparison should add an explicit threat-model axis. Extend the list where your domain has a characteristic way of going wrong, and write a one-line justification for each addition so it stays a tool rather than ritual.
Prune what never fires
If an item has never caught anything across many comparisons in your context, consider whether it applies to your work at all. A checklist that includes dead items trains people to skim past it. Keep it lean enough that every line still earns attention, and revisit it as your comparisons and tooling change.
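Deciding what "never fires" means is easier with a record of how often each item was run and how often it caught something. The sketch below is one hypothetical way to keep that record; `ChecklistLog` is an illustrative name, and the default threshold of 20 runs is an arbitrary assumption, not a figure from any study.

```python
from collections import Counter


class ChecklistLog:
    """Hypothetical tally of how often each checklist item catches a problem."""

    def __init__(self):
        self.runs = Counter()
        self.catches = Counter()

    def record(self, item, caught):
        """Log one run of `item`; `caught` is True if it surfaced a real problem."""
        self.runs[item] += 1
        if caught:
            self.catches[item] += 1

    def prune_candidates(self, min_runs=20):
        """Items run at least `min_runs` times that have never caught anything."""
        return sorted(
            item
            for item, count in self.runs.items()
            if count >= min_runs and self.catches[item] == 0
        )
```

The output is a list of candidates to reconsider, not an automatic deletion list. An item that guards against a rare but catastrophic failure may be worth keeping even with zero catches.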
Frequently Asked Questions
Which items matter most if I only do three?
State the decision, list your criteria in ranked order, and require evidence or an assumption per cell. Those three address the largest sources of silent error: undefined "better," invented criteria, and unverifiable claims.
Why forbid a recommendation in the first pass?
Because an early verdict anchors the reasoning, turning analysis into advocacy. Separating the recommendation into its own pass keeps the evidence honest before you ask the model to conclude.
Is leaving cells blank really better than an estimate?
Yes. A blank tells you exactly where to verify; a plausible estimate hides the gap and invites you to trust a guess. Blanks convert uncertainty into a visible action item.
How do I check that the model used my criteria?
Read the output against your ranked list and confirm each axis appears and is weighted as you intended. Models sometimes drift into their own criteria mid-table, which silently changes the comparison.
Can I skip verification for internal, low-stakes comparisons?
Often, yes. Verification scales with consequence. For reversible internal choices, the structural items carry the load; reserve full verification for decisions that are expensive to undo.
Does this checklist work for comparisons without source documents?
Largely. The structural items (decision, criteria, conditions, blanks) apply regardless of input. Verification simply shifts to confirming the model's claims through your own research rather than against supplied sources.
Key Takeaways
- A comparison checklist works only if it is short, justified, and actually run, both before you send the prompt and before you act on the answer.
- Before writing: state the decision, list and rank criteria, confirm comparability, note constraints.
- While writing: demand symmetric inputs, forbid an early verdict, require evidence, allow blanks, ask for conditions.
- After: verify load-bearing numbers, confirm your criteria were used, and run the recommendation as a separate pass.
- Calibrate the depth of the checklist to how consequential and reversible the decision is.
- Keep the list visible beside your prompt; a filed checklist is an unused one.