When people start working seriously with iterative refinement, the same questions surface again and again. How many times should I iterate? Can the model grade its own work? How do I keep a loop from making things worse? These are not beginner confusions; they are the practical questions that decide whether someone's loops actually pay off.
This piece is organized around those high-frequency questions and answers each one directly, grouped by theme. It is meant as a reference you can scan to find the specific thing you are wondering about, rather than a narrative you read front to back. Where a question deserves deeper treatment, it points to a companion piece that goes further.
The answers reflect how loops behave in practice, not how they are imagined to. The recurring theme across all of them is that good refinement depends less on the model and more on the standard you bring to it.
Questions About How Many Passes to Run
The first cluster of questions is about quantity: how much iteration is right.
How many passes does a typical task need?
Most tasks converge within two to four passes. A simple factual rewrite often needs one or two; a nuanced output with competing constraints might need four or five. If you routinely exceed five, the problem is usually a vague initial specification, not a difficult task. Tightening the spec cuts the pass count.
How do I know when to stop?
Define done before you start, as a written set of acceptance criteria. Stop when they all pass, even if you can imagine further tweaks. A practical signal: when a pass changes wording but resolves nothing on your criteria, the loop has converged and further iteration is noise. The metrics piece details convergence signals.
Questions About the Model's Role
A second cluster concerns what the model can and cannot be trusted to do inside the loop.
Can the model critique its own output?
It can assist, but do not trust it as the final judge. A model in the same context that produced a draft defends that draft. Give critique a fresh context and an explicit rubric, and treat its findings as input to your judgment, not a verdict. The advanced piece explains separating critic from author.
Should I use the same model for drafting and critiquing?
You can, as long as you separate the contexts. Some practitioners use a stronger model to critique and a faster one to draft, which helps when critique quality is the bottleneck. The key is that the critic should see the draft and the standard, not the conversation that produced the draft.
Questions About Keeping the Loop on Track
A third cluster is about loops that misbehave: drifting, oscillating, or degrading.
Why does my output get worse with more passes?
Almost always because your target is moving. You react to each draft with a new preference, so the loop never has a stable goal to converge on. Fix the specification, measure every draft against it, and resist redefining good mid-loop. The common mistakes piece covers this failure in detail.
Why does my loop flip between two versions?
Oscillation means you are issuing contradictory critiques on alternating passes. Freeze one of the two competing properties as a hard constraint, iterate the other until it is right, then revisit the frozen one. Removing the degree of freedom ends the flip.
What do I do when the model ignores a correction?
Repeating the instruction louder rarely works. Usually a competing instruction or a model default is overriding yours. Find the conflict and resolve it explicitly, or convert the stubborn requirement into a constraint the model checks against rather than a preference buried in prose.
Questions About When to Use Loops at All
A final cluster is about scope: when refinement is worth it.
Is iteration worth it for simple tasks?
Often not. A one-line answer or a simple reformat rarely benefits from a loop. Reserve iteration for outputs where quality is multidimensional and the cost of being mediocre is real. For routine work, one well-specified prompt is faster and good enough.
Does refinement only apply to writing?
No. The generate-critique-revise loop works for code, analysis, plans, designs, and structured data, anywhere output has multiple quality dimensions. Treating it as a writing-only trick hides most of its value. The framework generalizes the loop across work types.
Questions About Critique Quality
A cluster of questions concerns how to make the critique step actually find real problems.
Why does my critique keep saying everything is fine?
Usually because the critique has no rubric to check against, so it falls back on a vague overall impression and the model is generous by default. Give it explicit, named criteria and require a pass-or-fail with a reason for each. Specific criteria turn a generous shrug into an honest assessment that surfaces the gaps worth fixing.
How do I get more specific critiques?
Ask for them in terms of your criteria, not in general. A critique that must report on "does paragraph two include a concrete example" produces an actionable answer; one asked "is this good" produces a feeling. The specificity of the critique is downstream of the specificity of the rubric you give it.
Should the critique propose fixes or just find faults?
Both are useful, but keep them as distinct steps. First identify and rank the faults against the rubric, then decide which to address, then instruct the fix. Letting the model jump straight to sweeping fixes tends to introduce new problems; separating finding from fixing keeps each revision surgical and focused.
Frequently Asked Questions
What is the single most useful habit for good refinement?
Defining done before you draft. A written set of acceptance criteria turns refinement from open-ended polishing into a bounded process with a clear finish line. It tells you what to critique against, what to fix, and exactly when to stop, which prevents both premature shipping and endless tweaking.
How long should a refinement loop take?
It should be bounded by a pass budget you set against your deadline, not open-ended. Most tasks converge in two to four passes, so the loop should be minutes for short work and not more than a focused session for complex deliverables. If it sprawls, your specification is probably too vague.
Can I automate the whole loop?
Partly. You can automate generate-critique-revise with a fixed rubric and a stopping rule for high-volume, well-bounded tasks. But the automated loop inherits any weakness in the rubric, so the human work moves upstream into defining criteria precisely. Automation handles the mechanics; it does not replace the judgment.
What is the most common reason loops fail?
A moving target. Practitioners react to each draft with a new preference, so the loop never has a stable goal and never converges. The fix is to fix the specification, measure every draft against it, and refuse to redefine good in the middle of the loop.
Should beginners start with refinement or basic prompting?
Learn a basic single prompt first, then add the loop quickly. Refinement is where most of the quality comes from, so do not delay it long. A practical path is to write one prompt, see the draft, then practice critiquing and revising it against a written standard.
How do I refine without losing my own voice?
Refine against criteria for substance and structure, not for phrasing, and apply corrections in your own words. The loop should fix what is wrong, not homogenize how you sound. If output starts feeling generic, your rubric is over-specifying style; pull it back to substance.
Why does fixing one problem create another?
Because you are revising too broadly. Wholesale rewrites tend to undo passing work while fixing the failing part. The cure is surgical revision: instruct the model to correct only the specific item, leaving everything that already passes untouched, then re-run the critique to confirm nothing regressed. Narrow changes keep the loop converging instead of trading one fault for another.
Is it worth keeping a record of my refinement passes?
For high-stakes or recurring work, yes. A lightweight record of the rubric used and the major change per pass lets you explain how an output was reached and spot recurring faults worth codifying. For low-stakes one-off work, it is overhead you can skip. Match the record-keeping to the consequence of the output.
Key Takeaways
- Most tasks converge in two to four passes; needing more usually means the initial specification was too vague.
- Define done before you draft, and stop when a pass resolves nothing on your written criteria.
- The model is a useful critic but not a trustworthy judge; give critique a fresh context and keep the final call human.
- Worsening output and oscillation both trace to a moving target or contradictory critiques; stabilize the goal to fix them.
- Refinement is worth it for multidimensional, high-stakes outputs across writing, code, analysis, and design, not for trivial tasks.