A Working Pre-Flight List for AI Comparison Prompts

Agency Script Editorial

Editorial Team

December 19, 2021 · 7 min read

A checklist earns its place only if you run it. That means it has to be short enough to use under time pressure and pointed enough that each item prevents a real failure rather than gesturing at good intentions. The list below is built for comparison prompts specifically, because they have a distinct failure surface that generic prompt checklists miss entirely.

Each item carries a one-line justification, so you know what it is defending against and can drop items that do not apply to a given comparison. Treat it as a pre-flight pass: before you send the prompt, walk the "before" and "writing" sections; before you act on the answer, walk the "after" section.

For the reasoning that motivates each item, this pairs naturally with Habits That Make AI Comparisons Hold Up Under Pressure. Here we keep it operational.

A note on how to read the list: the items are grouped by phase, not by importance, so do not assume the first item in each group is the most critical. Some items prevent catastrophic failures; others prevent slow erosion of quality. The justifications are there so you can make that judgment for your own situation rather than treating every line as equally mandatory.

Before You Write the Prompt

These items shape the request so the model has what it needs to reason instead of guess.

Setup checks

  • State the decision the comparison serves. A comparison with no decision behind it has no standard for "better."
  • List your criteria explicitly. Unstated criteria get invented, usually the popular ones rather than yours.
  • Rank the criteria. Priority order tells the model how to resolve trade-offs the way you would.
  • Confirm the options are actually comparable. Comparing a service to a library is a category error in disguise.
  • Note your real constraints. Budget, timeline, team size, and horizon change which option wins.
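
One way to make the setup checks hard to skip is to write them down as data before drafting any prompt text. The sketch below is illustrative only: `ComparisonBrief`, its field names, and the placeholder option names are assumptions, not part of any tool. The comparability flag is set by a person, since that check is a judgment call rather than something the code can decide.

```python
from dataclasses import dataclass, field


@dataclass
class ComparisonBrief:
    """Hypothetical container for the setup checks; all names are illustrative."""
    decision: str = ""
    options: list[str] = field(default_factory=list)
    ranked_criteria: list[str] = field(default_factory=list)  # highest priority first
    constraints: list[str] = field(default_factory=list)
    options_comparable: bool = False  # a human sets this after checking categories

    def missing_setup_items(self) -> list[str]:
        """Return the setup checks that are still unmet."""
        problems = []
        if not self.decision.strip():
            problems.append("state the decision")
        if len(self.options) < 2:
            problems.append("name at least two options")
        if not self.ranked_criteria:
            problems.append("list and rank criteria")
        if not self.options_comparable:
            problems.append("confirm the options are comparable")
        if not self.constraints:
            problems.append("note real constraints")
        return problems


brief = ComparisonBrief(
    decision="Pick a logging library for a long-lived backend service",
    options=["Library A", "Library B"],  # placeholder names
    ranked_criteria=[
        "performance overhead",
        "structured-output support",
        "maintenance activity",
    ],
    constraints=["high request volume"],
)
print(brief.missing_setup_items())  # ['confirm the options are comparable']
```

An empty result means the "before" pass is complete; anything else is the list of items still to resolve.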

Writing the Prompt Itself

These items control how the model produces the comparison.

Structure checks

  • Supply symmetric information for each option. Uneven input tilts the verdict toward whichever option you described more.
  • Forbid a recommendation in the analysis pass. An early verdict anchors and biases the reasoning that follows.
  • Require evidence or an assumption per cell. A claim you cannot trace is a claim you cannot trust.
  • Instruct the model to leave unknowns blank. A visible gap is safer than an invisible fabrication.
  • Ask it to flag uncertain figures. Marked uncertainty tells you exactly what to verify.
  • Request the conditions under which each option wins. Most real comparisons are conditional, not absolute.

The "leave blanks, flag uncertainty" pairing is the single best defense against the fabricated-specifics failure described in Seven Ways Comparison Prompts Quietly Go Wrong.

Why these items belong together

The structure checks reinforce each other. Symmetric information without an evidence requirement still lets the model bluff equally on both sides. An evidence requirement without permission to leave blanks pressures the model to fabricate a source for every cell. Forbidding a recommendation without asking for conditions can produce a flat, lifeless table. The items work as a system; dropping one often undermines another, which is why the list is worth running as a set rather than cherry-picking.
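
Because the structure checks work as a set, it can help to assemble them in one place. The following is a minimal sketch, assuming a hypothetical `build_analysis_prompt` helper; the instruction wording is illustrative and has not been tested against any particular model. The empty-notes check is only a crude stand-in for symmetric information, since it cannot tell whether the notes are equally detailed.

```python
def build_analysis_prompt(decision, ranked_criteria, option_notes, constraints):
    """Assemble an analysis-pass prompt that encodes the structure checks.

    option_notes maps each option name to the background text supplied for it.
    """
    # Crude symmetry guard: no option may be left without background notes.
    if any(not notes.strip() for notes in option_notes.values()):
        raise ValueError("every option needs background notes")

    criteria_lines = "\n".join(
        f"{rank}. {name}" for rank, name in enumerate(ranked_criteria, start=1)
    )
    option_blocks = "\n\n".join(
        f"Option: {name}\n{notes.strip()}" for name, notes in option_notes.items()
    )
    rules = [
        "Fill one table cell per option per criterion, using only the criteria above.",
        "For each cell, cite a source or label the claim as an assumption.",
        "If you lack data for a cell, leave it blank. Do not estimate.",
        "Mark any figure you are unsure of as UNCERTAIN.",
        "State the conditions under which each option wins.",
        "Do not give a recommendation or overall verdict in this pass.",
    ]
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return (
        f"Decision: {decision}\n\n"
        f"Criteria, in priority order:\n{criteria_lines}\n\n"
        f"Constraints: {'; '.join(constraints)}\n\n"
        f"{option_blocks}\n\n"
        f"Rules:\n{rule_lines}\n"
    )


prompt = build_analysis_prompt(
    decision="Choose between two options for a reversible internal tool",
    ranked_criteria=["cost", "setup time"],
    option_notes={"Option A": "Notes for A.", "Option B": "Notes for B."},
    constraints=["small team"],
)
print(prompt)
```

Keeping the rules in a single list makes it obvious when one has been dropped, which is the failure the paragraph above warns about.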

A Worked Pass Through the List

It helps to see the checklist applied once, quickly, to a concrete choice.

Example: choosing a logging library

Suppose you are choosing between two logging libraries.

  • The "before" pass establishes the brief: the decision is which library to adopt for a long-lived backend service; the criteria, ranked, are performance overhead, structured-output support, and maintenance activity; the options are genuinely comparable; the constraint is high request volume.
  • The "writing" pass tells the model to fill those three criteria for each library, attach a source or assumption per cell, leave blanks where it lacks data, flag uncertain numbers, and report the conditions under which each wins, with no recommendation.
  • The "after" pass verifies the overhead figures against benchmarks, confirms all three criteria were addressed, notices the verdict is genuinely conditional on volume, and runs a separate recommendation scoped to high throughput.

What the pass caught

Run honestly, this pass would have caught a fabricated benchmark number and a substituted criterion the model tried to introduce. Neither would have been visible in the output of a single unstructured prompt. The checklist's value is precisely that it makes those silent defects surface where you can see them.

After the Comparison Comes Back

These items catch problems before they reach a decision.

Review checks

  • Verify every load-bearing number against a primary source. The model structures; you confirm the facts that move the decision.
  • Check that the criteria actually got addressed. Models sometimes substitute their own axes mid-comparison.
  • Look for suppressed nuance. If a verdict feels too clean, ask where the trade-offs went.
  • Separate inference from fact in the output. Label which conclusions rest on evidence and which are the model's guesses.
  • Run the recommendation pass separately. Feed the verified table back and scope the verdict to your conditions.
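
Some of the review checks are mechanical enough to script once the model's table is in structured form. The sketch below assumes a hypothetical `Cell` record and `review_table` function, and the sample cells are invented for illustration. It only points a reviewer at what to look at: verifying the numbers and spotting suppressed nuance remain manual work.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    option: str
    criterion: str
    value: str = ""          # empty string means the model left the cell blank
    basis: str = ""          # cited source or stated assumption
    uncertain: bool = False  # the model flagged the figure as uncertain


def review_table(cells, options, expected_criteria):
    """Return findings for a human reviewer; this does not verify any facts."""
    expected = set(expected_criteria)
    seen = {(c.option, c.criterion) for c in cells}
    return {
        # Criteria the model introduced that were never requested.
        "substituted_criteria": sorted({c.criterion for c in cells} - expected),
        # Requested option/criterion pairs with no cell at all.
        "unaddressed": sorted(
            (o, k) for o in options for k in expected_criteria if (o, k) not in seen
        ),
        # Filled cells with neither a source nor a stated assumption.
        "untraced": [(c.option, c.criterion) for c in cells if c.value and not c.basis],
        # Blanks and flagged figures: the items to research by hand.
        "to_verify": [
            (c.option, c.criterion) for c in cells if not c.value or c.uncertain
        ],
    }


criteria = ["performance overhead", "structured-output support", "maintenance activity"]
cells = [  # made-up sample output, not real benchmark data
    Cell("Library A", "performance overhead", "2% per request", uncertain=True),
    Cell("Library A", "structured-output support", "native JSON", basis="assumption"),
    Cell("Library B", "performance overhead"),
    Cell("Library B", "community size", "large", basis="assumption"),
]
findings = review_table(cells, ["Library A", "Library B"], criteria)
print(findings["substituted_criteria"])  # ['community size']
```

Here the script surfaces a criterion the model added on its own, three requested cells that never appeared, one claim with no basis, and two cells to research by hand.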

Adapting the List to Stakes

Not every comparison needs the full pass.

Match rigor to consequence

For a quick, reversible choice, the "state the decision," "rank criteria," and "ask for conditions" items carry most of the value. For a decision a committee will act on, run the whole list, including verification and the separate recommendation pass. The instinct to calibrate effort to stakes is the same one that governs Judging Comparison Quality With the Right Signals.
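
The two tiers described above can be written down so the choice of depth is explicit rather than improvised. This is a sketch under stated assumptions: the "low" and "high" labels and the `items_for` function are hypothetical, and the item names paraphrase the checklist in this article.

```python
# Item names paraphrase the checklist; the two tiers mirror the paragraph above.
CHECKLIST = {
    "before": [
        "state the decision",
        "list the criteria",
        "rank the criteria",
        "confirm the options are comparable",
        "note real constraints",
    ],
    "writing": [
        "supply symmetric information",
        "forbid a recommendation in the analysis pass",
        "require evidence or an assumption per cell",
        "leave unknowns blank",
        "flag uncertain figures",
        "ask for winning conditions",
    ],
    "after": [
        "verify load-bearing numbers",
        "check the criteria were addressed",
        "look for suppressed nuance",
        "separate inference from fact",
        "run the recommendation pass separately",
    ],
}

# The three items the text singles out for quick, reversible choices.
QUICK_PASS = ["state the decision", "rank the criteria", "ask for winning conditions"]


def items_for(stakes: str) -> list[str]:
    """Return the checklist items to run for 'low' or 'high' stakes."""
    if stakes == "low":
        return list(QUICK_PASS)
    if stakes == "high":
        return [item for phase in CHECKLIST.values() for item in phase]
    raise ValueError(f"unknown stakes level: {stakes!r}")


print(items_for("low"))
print(len(items_for("high")))  # 16
```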

Keep it visible

A checklist filed away is a checklist unused. Paste the relevant section into your working doc next to the prompt so the items are in front of you when it counts.

Turn it into a team default

A personal checklist drifts; a team checklist compounds. When a comparison is something colleagues will read and act on, encode the relevant items into a shared prompt template so everyone runs the same pass. This is how the procurement team in How a Procurement Team Rebuilt Its Vendor Comparisons turned a personal habit into a process the whole buying committee trusted. The checklist stops being something one careful person does and becomes how the organization compares anything that matters.

When to Add Items of Your Own

The list is a floor, not a ceiling.

Domain-specific checks

Some fields carry risks this general list does not name. A regulatory comparison should add a "check each option against current compliance requirements" item; a security comparison should add an explicit threat-model axis. Extend the list where your domain has a characteristic way of going wrong, and write a one-line justification for each addition so it stays a tool rather than a ritual.

Prune what never fires

If an item has never caught anything across many comparisons in your context, consider whether it applies to your work at all. A checklist that includes dead items trains people to skim past it. Keep it lean enough that every line still earns attention, and revisit it as your comparisons and tooling change.
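
Pruning is easier to do honestly with a record of which items actually caught something. A minimal sketch, assuming you keep a log with one set of fired items per comparison; the `never_fired` function and the `min_runs` threshold of 20 are hypothetical choices, not a recommended number. The result is a list of candidates to reconsider, not items to delete automatically.

```python
from collections import Counter


def never_fired(items, run_log, min_runs=20):
    """Return checklist items that caught nothing across the logged comparisons.

    run_log holds one set per comparison, naming the items that caught a problem.
    """
    if len(run_log) < min_runs:
        return []  # too little history to judge any item
    fired = Counter(item for run in run_log for item in run)
    return [item for item in items if fired[item] == 0]


items = ["rank the criteria", "confirm comparability", "flag uncertain figures"]
# Invented history: 30 comparisons in which "confirm comparability" never fired.
run_log = [{"rank the criteria"}, {"flag uncertain figures"}, set()] * 10
print(never_fired(items, run_log))  # ['confirm comparability']
```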

Frequently Asked Questions

Which items matter most if I only do three?

State the decision, rank the criteria, and require evidence per cell. Those three address the largest sources of silent error: undefined "better," invented criteria, and unverifiable claims.

Why forbid a recommendation in the first pass?

Because an early verdict anchors the reasoning, turning analysis into advocacy. Separating the recommendation into its own pass keeps the evidence honest before you ask the model to conclude.

Is leaving cells blank really better than an estimate?

Yes. A blank tells you exactly where to verify; a plausible estimate hides the gap and invites you to trust a guess. Blanks convert uncertainty into a visible action item.

How do I check that the model used my criteria?

Read the output against your ranked list and confirm each axis appears and is weighted as you intended. Models sometimes drift into their own criteria mid-table, which silently changes the comparison.

Can I skip verification for internal, low-stakes comparisons?

Often, yes. Verification scales with consequence. For reversible internal choices, the structural items carry the load; reserve full verification for decisions that are expensive to undo.

Does this checklist work for comparisons without source documents?

Largely. The structural items (decision, criteria, conditions, blanks) apply regardless of input. Verification simply shifts to confirming the model's claims through your own research rather than against supplied sources.

Key Takeaways

  • A comparison checklist works only if it is short, justified, and actually run: once before you send, once before you act.
  • Before writing: state the decision, list and rank criteria, confirm comparability, note constraints.
  • While writing: supply symmetric inputs, forbid an early verdict, require evidence, allow blanks, flag uncertain figures, ask for conditions.
  • After: verify load-bearing numbers, confirm your criteria were used, and run the recommendation as a separate pass.
  • Calibrate the depth of the checklist to how consequential and reversible the decision is.
  • Keep the list visible beside your prompt; a filed checklist is an unused one.
