Knowing that AI can help you generate hypotheses is one thing. Having a repeatable process you can run on a Tuesday afternoon is another. Most people who try this once get a mediocre result, conclude the technique does not work, and move on. The difference between that experience and a genuinely useful one is almost entirely about process.
This article gives you a sequential workflow. Each step builds on the last, and you can follow it right now on a problem you actually have. We will move from framing the problem, through prompting, to filtering the output into a short list of hypotheses worth testing.
Treat this as a recipe the first few times. Once the sequence feels natural, you will adapt it to your own style. But start by following it in order, because the order is what produces quality.
Step One: Frame the Problem Precisely
The single biggest determinant of output quality is how you state the problem. Vague problems produce vague hypotheses.
Write a Problem Statement
Before you open the model, write one or two sentences that capture:
- The observation: What happened that needs explaining. Include numbers and timeframes if you have them.
- What you already know: Factors you have ruled out or confirmed, so the model does not waste suggestions on them.
- What "solved" looks like: What you would do with a correct hypothesis once you had one.
Spending three minutes here saves you from a generic answer. If you skip this step, you will spend longer restating the problem to the model anyway.
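If it helps to keep the three elements from drifting apart, you can hold them in a small structure and render them as text to paste into the model. The sketch below is one possible way to do that; the class name, field names, and example values are all invented for illustration and are not part of any tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProblemStatement:
    """Hypothetical container for the three framing elements."""
    observation: str                                        # what happened, with numbers and timeframes
    already_known: List[str] = field(default_factory=list)  # factors ruled out or confirmed
    solved_looks_like: str = ""                             # what you would do with a correct hypothesis

    def render(self) -> str:
        # Turn the fields into plain text you can paste into a prompt.
        known = "\n".join(f"- {item}" for item in self.already_known) or "- (nothing yet)"
        return (
            f"Observation: {self.observation}\n"
            f"Already known:\n{known}\n"
            f"What solved looks like: {self.solved_looks_like}"
        )


# Example values are made up for illustration only.
statement = ProblemStatement(
    observation="Trial-to-paid conversion fell from 12% to 8% over the last 6 weeks.",
    already_known=["Pricing did not change", "Traffic volume is flat"],
    solved_looks_like="We know which part of onboarding to fix first.",
)
print(statement.render())
```

An empty field is a visible prompt to go back and finish the framing before you open the model.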
Step Two: Prompt for Breadth First
Your first prompt should aim for quantity and range, not polish. You want the model to cast a wide net.
The Breadth Prompt
Give the model your problem statement, then ask explicitly for many distinct hypotheses. Tell it to include both obvious and non-obvious explanations, and to avoid repeating itself. Request a numbered list with one clear sentence per item.
Ask for more than you think you need, around fifteen. The early items will be predictable; the value often hides in items eight through fifteen, where the model has exhausted the obvious and starts reaching. This breadth-first habit pairs well with the structure described in The DIVET Model for Generating Hypotheses With AI.
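A reusable template keeps the breadth prompt consistent from session to session. The function below is a minimal sketch of the instructions described above; its name, its wording, and the default of fifteen are assumptions you should adjust to your own style.

```python
def breadth_prompt(problem_statement: str, count: int = 15) -> str:
    """Build a breadth-first prompt (hypothetical helper, wording is illustrative)."""
    return (
        f"{problem_statement}\n\n"
        f"Generate {count} distinct hypotheses that could explain this observation. "
        "Include both obvious and non-obvious explanations, and do not repeat "
        "the same idea in different words. "
        "Return a numbered list with one clear sentence per item."
    )


# Placeholder problem statement, invented for the example.
print(breadth_prompt("Observation: weekly sign-ups dropped 20% in March."))
```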
Step Three: Force Diversity
A common failure is that the model produces fifteen variations of the same idea. Your third step is to break that pattern deliberately.
Prompt for Different Categories
Follow up with a prompt that asks the model to generate hypotheses across distinct categories. For a business problem, you might ask for explanations grouped by:
- User behavior and psychology
- Technical or product issues
- External market factors
- Internal process or operational causes
- Measurement or data artifacts
By naming categories, you force the model out of a single track. The "measurement artifact" category is especially valuable, since the simplest explanation for a surprising number is often that the number itself is wrong.
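The category list above can be turned into a follow-up prompt mechanically. This is a sketch under stated assumptions: the function name, the per-category count of three, and the exact wording are hypothetical, and the categories are the business examples listed above.

```python
CATEGORIES = [
    "User behavior and psychology",
    "Technical or product issues",
    "External market factors",
    "Internal process or operational causes",
    "Measurement or data artifacts",
]


def diversity_prompt(categories, per_category=3):
    """Ask for hypotheses grouped by named category (illustrative wording)."""
    bullet_list = "\n".join(f"- {name}" for name in categories)
    return (
        f"Generate {per_category} hypotheses for each of the following categories. "
        "Each hypothesis must describe a different mechanism from the others.\n"
        f"{bullet_list}\n"
        "Label every hypothesis with its category."
    )


print(diversity_prompt(CATEGORIES))
```

For a non-business problem, swap in categories that fit the domain; the point is only that they are named explicitly.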
Step Four: Sharpen Each Candidate
Now you have a long, diverse list. Most items will still be too vague to test. Step four sharpens them.
Demand Specificity
Pick the eight to ten most promising hypotheses and ask the model to rewrite each as a precise, testable statement that names a clear cause and effect. A good follow-up prompt is to ask it to convert each into the form "X happens because Y, which I could test by Z."
This transformation is where loose ideas become real experiments. A hypothesis that comes with its own test method is far more useful than one that just asserts a cause. The discipline of demanding specificity also appears in Seven Ways Hypothesis Prompts Quietly Go Wrong.
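One way to hold yourself to the "X happens because Y, which I could test by Z" form is to store the three parts separately and refuse to call a hypothesis sharpened until all three are filled in. The structure below is a hypothetical sketch; the names and the example hypothesis are invented.

```python
from dataclasses import dataclass


@dataclass
class SharpenedHypothesis:
    effect: str  # X: what happens
    cause: str   # Y: why it happens
    test: str    # Z: how you could check it

    def is_testable(self) -> bool:
        # Only checks that all three parts are present, not that the test is a good one.
        return all(part.strip() for part in (self.effect, self.cause, self.test))

    def sentence(self) -> str:
        return f"{self.effect} happens because {self.cause}, which I could test by {self.test}."


# Example content is made up for illustration.
h = SharpenedHypothesis(
    effect="The conversion drop",
    cause="the new signup form added a required phone field",
    test="comparing completion rates before and after the form change",
)
print(h.sentence())
```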
Step Five: Rank and Select
You cannot test everything. The final step is prioritization, and here your judgment leads while the model assists.
Score Against Two Axes
Evaluate each hypothesis on two dimensions:
- Impact if true: How much would confirming this change your decisions or outcomes?
- Cost to test: How quickly and cheaply can you check it?
Favor hypotheses that are high impact and cheap to test. You can ask the model to help you estimate these, but make the final call yourself, because you know your context. Pick the top three to investigate first. The full prioritization logic is laid out in Weighing the Competing Ways to Prompt for Hypotheses.
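The two-axis scoring can be sketched in a few lines. The article does not prescribe a formula, so the ratio of impact to cost used here is an assumption, as are the 1 to 5 scales, the names, and the example candidates. Treat the ordering as a starting point for your own judgment, not a verdict.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    impact: int  # 1 (low) to 5 (high), your own estimate
    cost: int    # 1 (cheap) to 5 (expensive), your own estimate


def top_candidates(candidates, n=3):
    # Assumption: rank by impact divided by cost, highest first.
    return sorted(candidates, key=lambda c: c.impact / c.cost, reverse=True)[:n]


# Invented examples with made-up scores.
candidates = [
    Candidate("Tracking event stopped firing after the last release", impact=4, cost=1),
    Candidate("New competitor launched a cheaper plan", impact=5, cost=4),
    Candidate("Onboarding email lands in spam", impact=3, cost=2),
    Candidate("Seasonal dip in demand", impact=2, cost=3),
]

for c in top_candidates(candidates):
    print(f"{c.impact / c.cost:.2f}  {c.text}")
```

With these scores the cheap, high-impact tracking check ranks first, ahead of the competitor hypothesis that has the highest raw impact but is costly to test.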
Putting the Sequence Together
The whole flow takes fifteen to thirty minutes. Frame the problem, prompt for breadth, force diversity, sharpen candidates, then rank and select. Each step has a distinct job, and skipping one degrades the result in a predictable way.
The most common temptation is to jump straight to step five and evaluate the first idea the model offers. Resist it. The discipline of generating widely before judging is what separates a useful session from a frustrating one.
Adapting the Sequence to Your Problem
The five steps are a default, not a rigid script. Once you have run them a few times, you will adjust the emphasis to match the problem in front of you.
When to Lean Into Each Step
- When the problem is poorly understood, spend more time on step one. A murky problem statement poisons everything downstream, so getting it precise is worth extra minutes.
- When you keep getting the same obvious ideas, lean hard into steps two and three. The repetition means you are not pushing the model far enough into non-obvious territory.
- When you have many plausible ideas but cannot act, the bottleneck is step four. Your hypotheses are not yet testable, and sharpening them is the priority.
- When everything feels equally important, step five is where you are stuck. Scoring by impact and cost forces the prioritization you have been avoiding.
Knowing which step to emphasize comes from noticing where your sessions stall. The sequence is the same; the weighting shifts. This adaptive use of a fixed structure is the same principle behind The DIVET Model for Generating Hypotheses With AI.
Closing the Loop After You Test
The sequence does not truly end at step five. Once you test your top hypotheses, the results reshape the problem, and you start again with a sharper view.
This loop is what makes the process powerful over time. Suppose you test your three priorities and all three come back negative. That is not a failure; it is information. The negative results often point toward a category you underexplored, so you return to step one with an updated problem statement and generate a fresh list of hypotheses. The new list is better because it is informed by what you ruled out. Most real problems take one or two loops to resolve. Treating each loop as a refinement rather than a restart keeps the momentum, and the running record of what you tested, emphasized in Pre-Flight Items to Run Before a Hypothesis Session, is what makes each loop build on the last.
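The running record can be as simple as a list that feeds the "what you already know" part of your next problem statement. The helper below is a minimal sketch; the function name and the dictionary format for test results are assumptions made for this example.

```python
def update_known(already_known, test_results):
    """Fold test outcomes into the list of known factors.

    test_results maps hypothesis text to True (confirmed) or False (ruled out).
    Returns a new list and leaves the input list unchanged.
    """
    updated = list(already_known)
    for hypothesis, confirmed in test_results.items():
        label = "Confirmed" if confirmed else "Ruled out"
        entry = f"{label}: {hypothesis}"
        if entry not in updated:  # avoid duplicate entries across loops
            updated.append(entry)
    return updated


# Invented example: all three tested hypotheses came back negative.
known = update_known(
    ["Pricing did not change"],
    {
        "Tracking event stopped firing": False,
        "Onboarding email lands in spam": False,
        "New competitor launched a cheaper plan": False,
    },
)
print("\n".join(known))
```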
Frequently Asked Questions
How long should this whole process take?
For a focused problem, fifteen to thirty minutes. The framing step takes a few minutes, the prompting steps are quick back-and-forth exchanges, and the ranking step takes the most thought. As you get practiced, you will move faster.
Can I combine these steps into one big prompt?
You can, but you usually get better results by keeping them separate. Breaking the work into stages lets you steer at each point and prevents the model from collapsing breadth and evaluation into a rushed list.
What if the model keeps repeating similar hypotheses?
That is what step three solves. Explicitly ask for hypotheses across named categories. If repetition continues, point it out directly: tell the model the last list was too similar and ask for genuinely different mechanisms.
Do I need a paid AI model for this?
No. The workflow is about how you prompt, not which model you use. More capable models tend to produce sharper hypotheses, but the sequence works on free tiers too. Focus on your process first.
How do I know when to stop generating and start testing?
Stop when you have three to five specific, testable hypotheses that you could not easily rank further without real data. At that point, more generation has diminishing returns. The next learning comes from running the cheapest test, not from more brainstorming.
Key Takeaways
- Frame the problem precisely before prompting; vague input produces vague output.
- Prompt for breadth first, asking for more hypotheses than you think you need.
- Force diversity by requesting explanations across distinct named categories.
- Sharpen promising candidates into specific statements that include a test method.
- Rank by impact and cost to test, then investigate the top few yourself.