A checklist earns its place when it prevents the mistakes you reliably make under pressure. Hypothesis generation with AI has a handful of those: thin context, premature focus, overlooking the boring cause, and having no test path. This checklist is built to catch each one before it costs you a session.
Use it as a working tool rather than reading material. Keep it open while you run a session, and tick items as you go. Each entry includes a brief justification so you understand why it matters and can adapt it to your own context rather than following it blindly.
Before You Prompt
Preparation determines most of your output quality. Do not skip to prompting.
Setup Checklist
- Write a one-paragraph problem statement. A precise statement is the foundation; everything downstream depends on it.
- Include real numbers and dates. Specifics let the model tailor hypotheses to your situation instead of producing generic ideas.
- List recent changes on your side. Launches, deploys, pricing moves, and campaigns are prime suspects and should be in front of the model.
- Note what you have already ruled out. This stops the model from spending candidates on dead ends.
- Define what a solved problem looks like. Knowing the goal keeps the session aimed at actionable hypotheses.
- State what you would do with each answer. If a confirmed hypothesis would not change any decision, it may not be worth generating; this keeps the session tied to action.
This preparation mirrors the framing step in A Sequential Process for Drafting Testable Ideas With AI. Spending a few minutes assembling these items consistently beats a faster start, because every weakness in the setup compounds through the rest of the session.
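One way to make the setup items hard to skip is to hold them in a small structure that reports what is still empty. The sketch below is only an illustration: the `SessionBrief` class, its field names, and the example values are all hypothetical, not part of any library or of the process referenced above.

```python
from dataclasses import dataclass


@dataclass
class SessionBrief:
    """Hypothetical container for the six setup items in the checklist."""
    problem_statement: str
    numbers_and_dates: list[str]
    recent_changes: list[str]
    ruled_out: list[str]
    solved_looks_like: str
    decision_per_answer: str

    def missing_items(self) -> list[str]:
        """Names of setup items that are still empty."""
        return [name for name, value in vars(self).items() if not value]

    def to_context(self) -> str:
        """Render the brief as plain text to paste ahead of a prompt."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        sections = [
            ("Problem", self.problem_statement),
            ("Numbers and dates", bullets(self.numbers_and_dates)),
            ("Recent changes on our side", bullets(self.recent_changes)),
            ("Already ruled out", bullets(self.ruled_out)),
            ("What solved looks like", self.solved_looks_like),
            ("What we would do with each answer", self.decision_per_answer),
        ]
        return "\n\n".join(f"{title}:\n{body}" for title, body in sections)


# Invented example values, for illustration only.
brief = SessionBrief(
    problem_statement="Trial-to-paid conversion dropped after the last release.",
    numbers_and_dates=[],  # left empty on purpose to show the check
    recent_changes=["New onboarding flow shipped", "Pricing page redesigned"],
    ruled_out=["Payment provider outage"],
    solved_looks_like="Conversion back at its earlier baseline.",
    decision_per_answer="Roll back, fix forward, or adjust messaging.",
)
print(brief.missing_items())  # ['numbers_and_dates']
```

Checking `missing_items()` before prompting is a cheap guard against starting with thin context; it cannot tell you whether a filled-in item is precise enough, which remains a judgment call.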
While Generating
The generation pass is about breadth and diversity, not judgment.
Generation Checklist
- Ask for at least fifteen hypotheses. The non-obvious, useful ideas tend to sit deep in the list, past the predictable first few.
- Request explanations across named categories. Categories like measurement, behavior, technical, and external prevent the model from repeating one theme.
- Explicitly invite uncomfortable hypotheses. Asking for bad-news explanations counters the bias that hides true causes when they implicate your own decisions.
- Include a null hypothesis. Considering that the effect is noise or an artifact guards against chasing a pattern that is not real.
- Do not evaluate yet. Judging during generation kills promising lines and biases toward the obvious.
The reasoning behind these items is laid out in Opinionated Habits That Make Hypothesis Prompts Pay Off.
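The generation items translate fairly directly into prompt text. The following is a minimal sketch, assuming you assemble prompts as plain strings; the function name `build_generation_prompt` and the exact wording of each instruction are illustrative choices, not a tested or recommended phrasing.

```python
# Category names are the examples given in the checklist.
CATEGORIES = ["measurement", "behavior", "technical", "external"]


def build_generation_prompt(context: str, minimum: int = 15,
                            categories: list[str] = CATEGORIES) -> str:
    """Assemble a breadth-first prompt that covers the generation items."""
    lines = [
        context.strip(),
        "",
        f"Generate at least {minimum} distinct hypotheses that could explain this.",
        "Cover every one of these categories: " + ", ".join(categories) + ".",
        "Include uncomfortable explanations, including ones that implicate "
        "our own decisions.",
        "Include at least one null hypothesis: the effect is noise or a "
        "measurement artifact.",
        "Do not rank, score, or evaluate the hypotheses yet. "
        "Return a numbered list only.",
    ]
    return "\n".join(lines)


prompt = build_generation_prompt("Context: <paste your problem statement here>")
print(prompt)
```

Keeping the evaluation ban inside the prompt covers only the model's side of "do not evaluate yet"; holding back your own judgment while reading the list is still up to you.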
While Refining
Once you have a wide list, sharpen it into something testable.
Refinement Checklist
- Rewrite each kept hypothesis with its mechanism. A causal chain, not a bare claim, is what makes a hypothesis testable.
- Attach a test method to every hypothesis. If you cannot name a way to check it, it is not yet actionable.
- Flag any hypothesis you cannot test. Either reframe it into something measurable or set it aside honestly.
- Strip duplicates. Near-identical hypotheses inflate the list without adding options.
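Two of the refinement checks are mechanical enough to sketch in code: flagging hypotheses that lack a mechanism or a test method, and dropping near-identical claims. The example below assumes a hypothetical `Hypothesis` record and uses string similarity from Python's standard `difflib` as a rough stand-in for "near-identical"; the 0.85 threshold is an arbitrary assumption, and hypotheses that mean the same thing in different words still need a human eye.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Hypothesis:
    """Hypothetical record for one kept hypothesis."""
    claim: str
    mechanism: str = ""
    test_method: str = ""


def is_actionable(h: Hypothesis) -> bool:
    """A hypothesis needs both a causal mechanism and a way to check it."""
    return bool(h.mechanism.strip() and h.test_method.strip())


def strip_duplicates(hypotheses: list[Hypothesis],
                     threshold: float = 0.85) -> list[Hypothesis]:
    """Keep the first of any group of textually near-identical claims."""
    kept: list[Hypothesis] = []
    for h in hypotheses:
        is_new = all(
            SequenceMatcher(None, h.claim.lower(), k.claim.lower()).ratio()
            < threshold
            for k in kept
        )
        if is_new:
            kept.append(h)
    return kept


# Invented example hypotheses, for illustration only.
candidates = [
    Hypothesis("Tracking broke after the deploy",
               mechanism="Event name changed, so conversions are undercounted",
               test_method="Compare raw event counts before and after"),
    Hypothesis("tracking broke after the deploy"),
    Hypothesis("A competitor cut prices"),
]
unique = strip_duplicates(candidates)
needs_work = [h.claim for h in unique if not is_actionable(h)]
print(len(unique), needs_work)
```

Anything left in `needs_work` is a candidate for reframing into something measurable or for setting aside.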
While Prioritizing
You cannot test everything. Prioritization is where your judgment leads.
Prioritization Checklist
- Score each hypothesis on impact if true. High-impact hypotheses deserve attention even if they are less likely.
- Score each on cost to test. Cheap, fast checks let you learn quickly and eliminate options.
- Test cheap, decisive hypotheses first. Removing candidates for almost no cost narrows the field efficiently.
- Pick three to start. A short list keeps the investigation focused; you can always return to the rest.
This scoring approach is detailed in Weighing the Competing Ways to Prompt for Hypotheses.
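As a rough illustration of the scoring items, the sketch below ranks hypotheses by impact divided by cost and keeps the top three. The 1 to 5 scales, the ratio itself, and the `Scored` and `shortlist` names are assumptions made for this example, not the method described in the article referenced above. The scores are meant to be entered by you, since prioritization is where your judgment leads.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    """Hypothetical record: scores are assigned by a person, not the model."""
    claim: str
    impact: int  # 1 (minor) to 5 (major) if the hypothesis is true
    cost: int    # 1 (cheap, fast) to 5 (expensive, slow) to test


def shortlist(items: list[Scored], size: int = 3) -> list[Scored]:
    """Rank by impact per unit of test cost; break ties by cheaper test."""
    ranked = sorted(items, key=lambda s: (-s.impact / s.cost, s.cost))
    return ranked[:size]


# Invented scores, for illustration only.
scored = [
    Scored("Tracking broke after the deploy", impact=5, cost=1),
    Scored("Onboarding change confuses new users", impact=4, cost=4),
    Scored("Seasonal dip", impact=3, cost=1),
    Scored("Competitor launched a free tier", impact=5, cost=5),
    Scored("Result is noise (null hypothesis)", impact=2, cost=1),
]
for item in shortlist(scored):
    print(item.claim)
```

A single ratio can bury an expensive but high-impact hypothesis, so it is worth scanning what fell off the shortlist before committing to it.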
After the Session
The work does not end when you have a shortlist. Capture it.
Closeout Checklist
- Log every hypothesis and its status. A record prevents you from regenerating ideas you already resolved.
- Note the evidence that moved each one. Preserving reasoning turns scattered sessions into institutional memory.
- Schedule the first test. A hypothesis with no test date tends to drift; commit to a check.
- Plan to regenerate after results. New evidence reshapes the hypothesis space, so the next session starts from what you learned.
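The closeout items amount to a small record per hypothesis. Here is a minimal sketch of such a log, assuming a flat CSV file is enough for your team; the `LogEntry` fields and status labels are hypothetical and can be replaced with whatever your tracker already uses.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class LogEntry:
    """Hypothetical log row covering the closeout items."""
    hypothesis: str
    status: str          # e.g. "open", "confirmed", "rejected"
    evidence: str        # what moved the status, in a sentence
    next_test_date: str  # ISO date string, or "" if nothing is scheduled


def unscheduled(entries: list[LogEntry]) -> list[LogEntry]:
    """Open hypotheses with no test date: the ones likely to drift."""
    return [e for e in entries if e.status == "open" and not e.next_test_date]


def to_csv(entries: list[LogEntry]) -> str:
    """Serialize the log as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(LogEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Invented entries, for illustration only.
log = [
    LogEntry("Tracking broke after the deploy", "rejected",
             "Raw event counts match across the release", ""),
    LogEntry("Onboarding change confuses new users", "open", "", ""),
]
print([e.hypothesis for e in unscheduled(log)])
print(to_csv(log))
```

Running `unscheduled()` at the end of a session gives a direct answer to whether the first test has actually been scheduled.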
Adapting the Checklist to Your Context
A checklist is only useful if it fits the work in front of you. The version above is the full, high-stakes form, and you should expect to trim it for everyday use rather than treat every item as mandatory.
Scaling by Stakes
The right amount of rigor scales with how costly a wrong conclusion would be. Use these rough tiers as a guide:
- Quick, low-stakes questions: Run the setup and generation items only. A precise problem statement and a breadth prompt with a null hypothesis are usually enough. Skip formal scoring and logging.
- Recurring operational problems: Add the refinement and prioritization items so you produce testable, ranked hypotheses. Keep a lightweight log because the same problems tend to recur.
- High-stakes investigations: Run every item deliberately and document each stage. When a wrong conclusion is expensive, the few minutes each item costs are cheap insurance.
The skill is not memorizing the list; it is knowing which items to keep when time is short. Over a few sessions you will develop an instinct for which checks catch your particular mistakes most often, and you can promote those to non-negotiable.
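If you keep the checklist in a script or shared template, the tiers above can be written down as a simple lookup. This is a sketch under stated assumptions: the tier keys, the stage labels, and the `checklist_for` function are hypothetical names chosen for this example, and `non_negotiable` stands for whichever items you have personally promoted.

```python
# Stage labels follow the checklist sections; tier keys are hypothetical.
TIERS = {
    "low": ["setup", "generation"],
    "recurring": ["setup", "generation", "refinement", "prioritization",
                  "lightweight log"],
    "high": ["setup", "generation", "refinement", "prioritization",
             "closeout"],
}


def checklist_for(stakes: str, non_negotiable: tuple[str, ...] = ()) -> list[str]:
    """Return the stages to run for a tier, plus any promoted items."""
    if stakes not in TIERS:
        raise ValueError(f"unknown tier: {stakes!r}")
    stages = list(TIERS[stakes])
    for item in non_negotiable:
        if item not in stages:
            stages.append(item)
    return stages


print(checklist_for("low", non_negotiable=("lightweight log",)))
# ['setup', 'generation', 'lightweight log']
```

Writing the tiers down this way makes the trimming explicit, so a short session is a deliberate choice and not an accidental skip.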
Turning the Checklist Into Team Habit
When more than one person runs hypothesis sessions, an informal checklist drifts. Different people skip different items, and the quality of sessions becomes uneven. Codifying the list, even as a shared document or a prompt template, makes the standard explicit. A team that agrees on the same setup and generation items produces comparable hypotheses and avoids the situation where one person's session is rigorous and another's is a single vague prompt. This shared standard is what makes the turnaround described in the case study How a Stalled Trial Funnel Got Diagnosed by AI Prompts repeatable rather than lucky.
Common Ways the Checklist Gets Misused
A checklist can fail even when followed, usually because it is treated as a box-ticking ritual rather than a thinking aid. Watch for a few patterns that drain its value.
Pitfalls to Avoid
- Ticking without engaging. Marking an item done because you technically did it, while producing a vague problem statement, defeats the purpose. The items are prompts to think, not formalities.
- Treating every item as mandatory. Forcing the full high-stakes list onto a trivial question wastes time and breeds resentment for the checklist itself. Scale it to the stakes.
- Never updating it. Your most common mistakes are personal. If you keep skipping the null hypothesis, promote it to a bold, non-negotiable item. A static checklist that ignores your actual failure pattern is less useful than one you tune.
- Using it only at the start. The closeout items, logging and scheduling the first test, are where many sessions quietly fail. Run the checklist through to the end, not just the setup.
The goal is a living tool that catches your real mistakes, not a compliance document. When it stops catching anything, revise it. The mistakes it is meant to prevent are catalogued in Seven Ways Hypothesis Prompts Quietly Go Wrong.
Frequently Asked Questions
Do I need to use every item every time?
No. For a quick, low-stakes question you might use only the setup and generation items. The full checklist is for problems that matter enough to investigate carefully. Adapt the depth to the stakes.
Why is logging hypotheses on the checklist?
Because without a log, teams regenerate and re-debate the same ideas across sessions, wasting effort and losing the reasoning behind past decisions. A simple log turns isolated sessions into a growing, searchable knowledge base.
What is the single most skipped item?
Including a null hypothesis. People are eager to explain a surprising result and forget to ask whether the result is even real. Skipping it leads to investigations built on noise or measurement artifacts.
How is this different from a generic brainstorming checklist?
It is built around the specific failure modes of AI-assisted hypothesis work: model overconfidence, repetition, bias toward obvious causes, and untestable ideas. Generic brainstorming checklists do not address those, because they assume a human source of ideas.
Can I turn this into a prompt template?
Yes, and many people do. You can encode the setup, generation, and refinement items into a reusable prompt structure. Just keep prioritization and logging as human steps, since those depend on your judgment and your records.
Key Takeaways
- Preparation, especially a specific problem statement with real numbers, drives most of the output quality.
- During generation, aim for breadth, force categories, invite uncomfortable ideas, and always include a null hypothesis.
- Refine by attaching a mechanism and a test method to every kept hypothesis.
- Prioritize by impact and cost to test, then start with three cheap, decisive checks.
- Close every session by logging hypotheses, recording evidence, and scheduling the first test.