Opinionated Rules for Refinement Loops That Converge

Generic advice about refinement loops is easy to find and useless in practice. "Iterate until it is good" tells you nothing about how to make the loop actually converge. The practices in this article are the opposite: specific, opinionated, and each paired with the reasoning that justifies it. They come from watching loops succeed and fail, and noticing what the successful ones consistently did.

You will not agree with every position here, and that is fine; opinionated practice invites argument. But each one earns its place by solving a real problem that vague advice ignores. Where a practice connects to a deeper treatment, the link is there, but the practices stand on their own.

If you want the failure-mode framing of these same ideas, Seven Ways Refinement Loops Quietly Go Off the Rails covers what goes wrong when you ignore them. This piece is about what to do.

Define the Standard Before You Touch the Model

The first and least negotiable practice: write down what good looks like before generating anything.

The reasoning

A loop converges toward a target. With no written target, every draft redefines the goal, and the loop circles forever. The standard is your finish line, and you cannot finish a race with no finish line.

How to do it well

Write three to five qualities you can check by pointing at the output.
Tie them to the specific task and reader, not to generic quality.
Keep them visible throughout the loop.

The same discipline underpins audience work, where the standard is reader-fitness; see Tailoring Prompts to Readers: Direct Answers to Real Questions.

Treat the First Draft as a Probe

Do not invest in a perfect first prompt. Generate something to react to.

The reasoning

You cannot predict everything good about an output in advance. The draft reveals what you could not anticipate. Reacting to a concrete draft is faster and more accurate than predicting, so getting a draft quickly beats perfecting the prompt slowly.

How to do it well

Write a plain, simple first request.
Generate and read without judgment of the prompt itself.
Save your specificity for the critique, where it has a target.

This probe-first mindset is the on-ramp taught in Refining Model Output by Looping: A Plain Introduction.

Critique to a Location and a Direction

Every critique should name where the problem is and which way to fix it.

The reasoning

The model can only act on what you give it. "Make it better" gives nothing; "tighten the second paragraph to two sentences" gives a location and a direction. Specific critique produces specific revision, which is the only kind that converges.

How to do it well

Point at a sentence or section.
Name which standard item it misses.
State the direction the fix should go.

Vague critique is one of the most common loop-killers, which is why precision here pays off so reliably.

Change Exactly One Thing Per Iteration

This is the practice people resist most and benefit from most.

The reasoning

Changing one thing keeps cause and effect legible. When the output improves, you know what helped; when it regresses, you can cleanly revert. Batching changes hides interactions and destroys your ability to learn from the loop.

How to do it well

Pick the single highest-value failure each iteration.
Instruct the change and explicitly preserve the rest.
Verify before moving to the next failure.

It feels slower and converges faster, a trade the step-by-step procedure builds in deliberately.

Use the Model as a Critic, Not a Judge

Let the model critique, but never hand it the verdict.

The reasoning

Model self-critique surfaces real issues you might miss, which is valuable. But models rationalize their own drafts and approve work a reader would reject. If the model both writes and judges, the loop converges on the model's taste, not the reader's need.

How to do it well

Ask for critique to generate candidate issues.
Filter those candidates against your human-held standard.
Keep the final pass/fail decision yours.

Steer, Never Re-Roll

When a draft disappoints, react to it rather than discarding it.

The reasoning

Re-rolling is random; it does not converge because it ignores what was wrong. Steering uses the specific gap to push the next draft in a known direction. A hundred re-rolls can leave you no closer; three steered revisions usually arrive.

How to do it well

Critique the disappointing draft instead of regenerating blindly.
Feed the specific gap back as a targeted instruction.
Reserve full regeneration for drafts too far off to react to.

Stop on the Standard, Not on Fatigue

End the loop deliberately, the moment the standard passes.

The reasoning

Past the standard, additional loops trade real time for gains nobody notices. The polish trap is real and expensive. A defined standard is also a defined stopping point, which is one more reason it has to come first.

How to do it well

Stop when every standard item passes.
Stop when revisions stop producing noticeable improvement.
Resist the urge to keep tweaking once you have arrived.

Keep a Running Pass/Fail Picture

Across a multi-step loop, track state explicitly.

The reasoning

Working memory leaks over many iterations. Without an explicit record, you re-introduce solved problems and lose track of what still fails. A visible pass/fail picture keeps the loop moving forward instead of circling.

How to do it well

After each revision, update which standard items pass.
Note any regression the change introduced.
Use the picture to choose the next target and to recognize when you are done.

Separate the Critic From the Author in Your Own Head

A subtler practice: when you critique your own loop, deliberately switch roles rather than staying in author mode.

The reasoning

The mindset that produces a draft is invested in it. That same mindset, asked to critique, tends to defend rather than scrutinize. Authors rationalize; critics interrogate. If you critique while still wearing the author hat, you will excuse problems a fresh reader would flag.

How to do it well

Before critiquing, consciously step out of the author role and read as the target reader would.
Critique against the standard, not against the effort you put into the draft.
Treat your own sunk cost as irrelevant to whether the output passes.

This is the human-side mirror of using the model as a critic but not a judge. Both are about keeping the evaluation honest by separating the thing that makes from the thing that judges.

Match the Loop's Intensity to the Stakes

Not every output deserves the same number of loops, and a good practitioner calibrates.

The reasoning

A throwaway internal note and a flagship published piece have different fitness bars. Running a flagship-grade loop on a throwaway wastes time; running a throwaway loop on a flagship ships something embarrassing. The practice is to set the standard's height to the stakes, then loop to that height.

How to do it well

For low-stakes work, write a short standard and accept the first draft that clears it.
For high-stakes work, write a fuller standard and expect more iterations.
Let the standard, not your mood or available time, decide how much polish is warranted.

Calibrating this way protects you from both under-investing in what matters and over-polishing what does not. It is the same stop-on-the-standard discipline applied at the level of how high to set the bar in the first place.

Frequently Asked Questions

If I only adopt one of these, which should it be?

Define the standard before you generate. It is the foundation the rest depend on: it makes critique specific, tells you when to stop, and gives you something to track. Adopting this one practice prevents the worst failures even if you ignore the others initially.

Is changing one thing at a time really worth the slowdown?

Yes, because the slowdown is an illusion. One change at a time converges faster overall because every step is legible and reversible. Batching changes feels fast but routinely sends loops circling when a regression appears and you cannot tell what caused it.

How do I balance the model's critique against my own judgment?

Use the model's critique to widen your list of candidate issues, then judge those candidates against your own standard. The model is good at spotting things you missed and bad at being impartial about its own work. Keep it as a source, not a judge.

What if my standard turns out to be wrong mid-loop?

Update it deliberately and note that you did. A standard can be refined as you learn, and that is healthy. The failure mode is letting it drift silently so every draft redefines the goal. Changing it on purpose is fine; letting it dissolve is not.

Do these practices work for non-writing tasks?

Yes. Any task where you can define checkable quality, react to a draft, and revise benefits from the same loop: standard first, probe, specific critique, one change, stop on the standard. Code, plans, and analyses all fit. The medium changes; the discipline does not.

How do I get a team to follow these consistently?

Standardize the standard format and the one-change rule, and make the running pass/fail picture a shared artifact. When the loop has the same shape for everyone, the practices become the default rather than something each person remembers individually. Consistency comes from shared structure, not willpower.

Key Takeaways

Define a concrete, checkable standard before generating; it is the finish line and the foundation.
Treat the first draft as a probe to react to, not a prompt to perfect.
Critique to a location and a direction, and change exactly one thing per iteration.
Use the model as a critic for candidate issues but keep the verdict and standard human.
Steer instead of re-rolling, stop on the standard, and track a running pass/fail picture.

If you want the failure-mode framing of these same ideas, Seven Ways Refinement Loops Quietly Go Off the Rails covers what goes wrong when you ignore them. This piece is about what to do.

Define the Standard Before You Touch the Model

The first and least negotiable practice: write down what good looks like before generating anything.

The reasoning

How to do it well

Write three to five qualities you can check by pointing at the output.
Tie them to the specific task and reader, not to generic quality.
Keep them visible throughout the loop.

The same discipline underpins audience work, where the standard is reader-fitness; see Tailoring Prompts to Readers: Direct Answers to Real Questions.

Treat the First Draft as a Probe

Do not invest in a perfect first prompt. Generate something to react to.

The reasoning

How to do it well

Write a plain, simple first request.
Generate and read without judgment of the prompt itself.
Save your specificity for the critique, where it has a target.

This probe-first mindset is the on-ramp taught in Refining Model Output by Looping: A Plain Introduction.

Critique to a Location and a Direction

Every critique should name where the problem is and which way to fix it.

The reasoning

How to do it well

Point at a sentence or section.
Name which standard item it misses.
State the direction the fix should go.

Vague critique is one of the most common loop-killers, which is why precision here pays off so reliably.

Change Exactly One Thing Per Iteration

This is the practice people resist most and benefit from most.

The reasoning

How to do it well

Pick the single highest-value failure each iteration.
Instruct the change and explicitly preserve the rest.
Verify before moving to the next failure.

It feels slower and converges faster, a trade the step-by-step procedure builds in deliberately.

Use the Model as a Critic, Not a Judge

Let the model critique, but never hand it the verdict.

The reasoning

How to do it well

Ask for critique to generate candidate issues.
Filter those candidates against your human-held standard.
Keep the final pass/fail decision yours.

Steer, Never Re-Roll

When a draft disappoints, react to it rather than discarding it.

The reasoning

How to do it well

Critique the disappointing draft instead of regenerating blindly.
Feed the specific gap back as a targeted instruction.
Reserve full regeneration for drafts too far off to react to.

Stop on the Standard, Not on Fatigue

End the loop deliberately, the moment the standard passes.

The reasoning

How to do it well

Stop when every standard item passes.
Stop when revisions stop producing noticeable improvement.
Resist the urge to keep tweaking once you have arrived.

Keep a Running Pass/Fail Picture

Across a multi-step loop, track state explicitly.

The reasoning

How to do it well

After each revision, update which standard items pass.
Note any regression the change introduced.
Use the picture to choose the next target and to recognize when you are done.

Separate the Critic From the Author in Your Own Head

A subtler practice: when you critique your own loop, deliberately switch roles rather than staying in author mode.

The reasoning

How to do it well

Before critiquing, consciously step out of the author role and read as the target reader would.
Critique against the standard, not against the effort you put into the draft.
Treat your own sunk cost as irrelevant to whether the output passes.

This is the human-side mirror of using the model as a critic but not a judge. Both are about keeping the evaluation honest by separating the thing that makes from the thing that judges.

Match the Loop's Intensity to the Stakes

Not every output deserves the same number of loops, and a good practitioner calibrates.

The reasoning

How to do it well

For low-stakes work, write a short standard and accept the first draft that clears it.
For high-stakes work, write a fuller standard and expect more iterations.
Let the standard, not your mood or available time, decide how much polish is warranted.

Frequently Asked Questions

If I only adopt one of these, which should it be?

Is changing one thing at a time really worth the slowdown?

How do I balance the model's critique against my own judgment?

What if my standard turns out to be wrong mid-loop?

Do these practices work for non-writing tasks?

How do I get a team to follow these consistently?

Key Takeaways

Define a concrete, checkable standard before generating; it is the finish line and the foundation.
Treat the first draft as a probe to react to, not a prompt to perfect.
Critique to a location and a direction, and change exactly one thing per iteration.
Use the model as a critic for candidate issues but keep the verdict and standard human.
Steer instead of re-rolling, stop on the standard, and track a running pass/fail picture.

Opinionated Rules for Refinement Loops That Converge

Define the Standard Before You Touch the Model

The reasoning

How to do it well

Treat the First Draft as a Probe

The reasoning

How to do it well

Critique to a Location and a Direction

The reasoning

How to do it well

Change Exactly One Thing Per Iteration

The reasoning

How to do it well

Use the Model as a Critic, Not a Judge

The reasoning

How to do it well

Steer, Never Re-Roll

The reasoning

How to do it well

Stop on the Standard, Not on Fatigue

The reasoning

How to do it well

Keep a Running Pass/Fail Picture

The reasoning

How to do it well

Separate the Critic From the Author in Your Own Head

The reasoning

How to do it well

Match the Loop's Intensity to the Stakes

The reasoning

How to do it well

Frequently Asked Questions

If I only adopt one of these, which should it be?

Is changing one thing at a time really worth the slowdown?

How do I balance the model's critique against my own judgment?

What if my standard turns out to be wrong mid-loop?

Do these practices work for non-writing tasks?

How do I get a team to follow these consistently?

Key Takeaways

Agency Script Editorial

Related Articles

Prompt Quality Decides Whether AI Earns Its Keep

Counting the Real Cost of Every Token You Send

Rolling Out AI Hallucinations Across a Team

Ready to certify your AI capability?

Opinionated Rules for Refinement Loops That Converge

Define the Standard Before You Touch the Model

The reasoning

How to do it well

Treat the First Draft as a Probe

The reasoning

How to do it well

Critique to a Location and a Direction

The reasoning

How to do it well

Change Exactly One Thing Per Iteration

The reasoning

How to do it well

Use the Model as a Critic, Not a Judge

The reasoning

How to do it well

Steer, Never Re-Roll

The reasoning

How to do it well

Stop on the Standard, Not on Fatigue

The reasoning

How to do it well

Keep a Running Pass/Fail Picture

The reasoning

How to do it well

Separate the Critic From the Author in Your Own Head

The reasoning

How to do it well

Match the Loop's Intensity to the Stakes

The reasoning

How to do it well

Frequently Asked Questions

If I only adopt one of these, which should it be?

Is changing one thing at a time really worth the slowdown?

How do I balance the model's critique against my own judgment?