
Seven Ways a Good Contrast Teaches the Wrong Lesson

Agency Script Editorial · Editorial Team · May 3, 2020 · 8 min read

Contrastive prompting is powerful precisely because it can teach a model a sharp distinction from just one or two examples. That same leverage is what makes it dangerous: a poorly constructed contrast teaches the wrong distinction just as efficiently as a good one teaches the right one. The model does exactly what your examples imply, and if your examples imply something you did not mean, you get a confidently wrong behavior that is hard to trace back to its cause.

This article catalogs the seven failure modes we see most often, in roughly the order people stumble into them. For each, it names why the mistake happens, what it costs you in practice, and the specific corrective practice that prevents it. These are not abstract warnings; each one corresponds to a real way contrastive prompts quietly degrade.

The unifying lesson is that a contrast is a precise instrument, and precision instruments punish carelessness. The fixes are not complicated, but they require discipline about what your examples actually communicate versus what you intended them to communicate.

It helps to read these failure modes as a progression. The first few are mistakes of construction, where the pair itself is built wrong. The middle ones are mistakes of validation, where the pair might be fine but you never confirmed it. The last ones are mistakes of communication, where the pair works but fails to convey the principle you meant. Recognizing which category a problem falls into tells you which fix to reach for.

Mistake One: Using Strawman Negatives

The most common error is choosing a negative that no reasonable interpretation of the task would produce.

Why it happens

An obviously terrible negative is easy to write because it feels clearly wrong. But the model was never going to produce that output, so contrasting against it teaches nothing about the real boundary.

The cost and the fix

  • Cost: the contrast occupies space and attention while changing nothing.
  • Fix: source negatives from the model's actual wrong outputs, so each negative targets a real tendency.
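One way to make this fix routine is to mine negatives from logged outputs instead of writing them from scratch. The sketch below assumes you keep a log of model outputs per input and can express "wrong" as a pass/fail check; the `Attempt` record, the `real_negatives` helper, and the 25-word rule are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    """One logged model output for a given input (hypothetical log format)."""
    input_text: str
    output_text: str


def real_negatives(attempts: list[Attempt],
                   is_acceptable: Callable[[str], bool]) -> list[Attempt]:
    """Keep only outputs the model actually produced that fail the check.

    Each survivor is a candidate for the negative half of a contrastive
    pair, because it reflects a real tendency rather than an invented one.
    """
    return [a for a in attempts if not is_acceptable(a.output_text)]


def within_word_limit(text: str, limit: int = 25) -> bool:
    """Hypothetical acceptance rule: a summary of at most `limit` words."""
    return len(text.split()) <= limit


# Illustrative log entries, not real model output.
log = [
    Attempt("Q3 report", "Revenue rose four percent on higher renewals."),
    Attempt("Q3 report", " ".join(["filler"] * 40)),  # stand-in for a 40-word answer
]
negatives = real_negatives(log, within_word_limit)
```

Only the over-long answer survives, so the negative you contrast against is one the model really wrote.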

Mistake Two: Pairs That Differ in Too Many Ways

A muddled pair teaches a muddled lesson.

Why it happens

When you write the positive and negative independently, they naturally diverge in length, tone, structure, and content all at once. The model cannot tell which difference you care about.

The cost and the fix

  • Cost: the model learns an arbitrary blend of differences, often the wrong one.
  • Fix: build both examples from the same input and strip every difference except the single dimension you are teaching, the same one-change-at-a-time discipline behind "Running a Complex Task Through One Sub-Prompt at a Time."
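If you label each example's output with a few features (length, tone, format), the "one dimension only" rule becomes mechanically checkable. This is a minimal sketch under that assumption; `Example`, its `features` labels, and both helpers are hypothetical, and the output strings are placeholders rather than real completions.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    input_text: str
    output_text: str
    # Hypothetical hand-labelled features of the output (length, tone, ...).
    features: dict[str, str] = field(default_factory=dict)


def differing_dimensions(pos: Example, neg: Example) -> set[str]:
    """Return every labelled feature on which the two outputs disagree."""
    keys = pos.features.keys() | neg.features.keys()
    return {k for k in keys if pos.features.get(k) != neg.features.get(k)}


def is_isolated(pos: Example, neg: Example, target: str) -> bool:
    """True only if the pair shares one input and differs on `target` alone."""
    return (pos.input_text == neg.input_text
            and differing_dimensions(pos, neg) == {target})


task = "Summarize the incident memo."
positive = Example(task, "Two-line neutral summary.",
                   {"length": "short", "tone": "neutral", "format": "prose"})
muddled = Example(task, "Long, chatty bullet list.",
                  {"length": "long", "tone": "casual", "format": "bullets"})
isolated = Example(task, "Long neutral paragraph.",
                   {"length": "long", "tone": "neutral", "format": "prose"})
```

Here `is_isolated(positive, muddled, "length")` is false because tone and format also changed, while `is_isolated(positive, isolated, "length")` is true. The check is only as good as the labels, so it catches the obvious muddles rather than subtle ones.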

Mistake Three: Overcorrection

Avoiding the negative becomes its own distortion.

Why it happens

A strong contrast pushes the model hard away from the negative. If the negative was "too verbose," the model may swing to "uselessly terse," overshooting the target.

The cost and the fix

  • Cost: you trade one wrong output for the opposite wrong output.
  • Fix: choose a negative that is wrong but not extreme, and test for overshoot on held-out inputs.
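Overshoot is easiest to see when the target is a band rather than a single bound. The sketch below assumes the negative targeted verbosity and that word count is an acceptable proxy for length; the 5 to 25 word band and the helper names are hypothetical. It counts how many held-out outputs land in range, stay too long, or swing too short.

```python
def length_verdict(text: str, min_words: int, max_words: int) -> str:
    """Classify one output against a hypothetical target length band."""
    n = len(text.split())
    if n > max_words:
        return "too_long"   # the original failure the negative targeted
    if n < min_words:
        return "too_short"  # overshoot: the opposite failure
    return "ok"


def overshoot_report(outputs: list[str], min_words: int,
                     max_words: int) -> dict[str, int]:
    """Tally verdicts across outputs from held-out inputs."""
    report = {"ok": 0, "too_long": 0, "too_short": 0}
    for text in outputs:
        report[length_verdict(text, min_words, max_words)] += 1
    return report


# Illustrative outputs, not measured results.
held_out_outputs = [
    "Yes.",
    "Revenue rose four percent, driven mainly by renewals.",
    "No.",
]
report = overshoot_report(held_out_outputs, min_words=5, max_words=25)
```

A report dominated by `too_short` after a "too verbose" contrast is the overcorrection signature; soften the negative and rerun.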

Mistake Four: Validating Only on the Examples

Testing on the examples you provided proves nothing.

Why it happens

It feels like success when the model reproduces your desired example. But the answer was already in front of it; reproducing it shows only that the model can copy, not that it has learned the distinction.

The cost and the fix

  • Cost: a prompt that works in the demo and fails on real inputs.
  • Fix: always validate on fresh inputs near the boundary, the same mindset behind "Chain of Thought Is Powerful and Constantly Misused."
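The essential move is to exclude the prompt's own examples from whatever you score. This minimal harness assumes labelled cases and a `predict` callable standing in for a model call (both hypothetical); the stub model only knows the one example it was shown, which is exactly the failure a demo hides. In practice the fresh cases should also sit near the boundary you are teaching.

```python
from typing import Callable

Case = tuple[str, str]  # (input, expected label)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial edits still count as seen."""
    return " ".join(text.lower().split())


def fresh_cases(cases: list[Case], prompt_examples: list[Case]) -> list[Case]:
    """Drop any case whose input already appears among the prompt's examples."""
    shown = {normalize(inp) for inp, _ in prompt_examples}
    return [(inp, exp) for inp, exp in cases if normalize(inp) not in shown]


def accuracy(predict: Callable[[str], str], cases: list[Case]) -> float:
    if not cases:
        raise ValueError("no fresh cases left to validate on")
    hits = sum(predict(inp) == expected for inp, expected in cases)
    return hits / len(cases)


# Hypothetical stand-in for a model that only memorized the prompt's example.
prompt_examples = [("refund for order 17", "billing")]
memory = dict(prompt_examples)


def memorizing_model(text: str) -> str:
    return memory.get(text, "unknown")


cases = prompt_examples + [("charged twice this month", "billing"),
                           ("app crashes on login", "technical")]
demo_score = accuracy(memorizing_model, prompt_examples)  # looks perfect
fresh_score = accuracy(memorizing_model, fresh_cases(cases, prompt_examples))
```

The demo score is perfect and the fresh score is zero, which is the gap this mistake conceals.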

Mistake Five: Resolving One Ambiguity While Creating Another

Fixes can have side effects.

Why it happens

Drawing a sharp boundary on one dimension can inadvertently constrain another. Teaching brevity might accidentally teach a particular tone you did not intend.

The cost and the fix

  • Cost: you fix the reported problem and introduce a new, unreported one.
  • Fix: after validating the target distinction, check that outputs are still correct on dimensions you did not mean to touch.
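A cheap guard is to score the same named checks before and after the change and flag any non-target check that drops. The sketch assumes each dimension can be approximated by a pass/fail function; the two checks here (a 12-word limit for brevity, no exclamation marks as a crude tone proxy) and the 0.05 tolerance are hypothetical, and the sample outputs are invented to mirror the brevity-versus-tone example above.

```python
from typing import Callable

Check = Callable[[str], bool]


def pass_rates(outputs: list[str], checks: dict[str, Check]) -> dict[str, float]:
    """Fraction of outputs passing each named check."""
    return {name: sum(check(o) for o in outputs) / len(outputs)
            for name, check in checks.items()}


def collateral_regressions(before: dict[str, float], after: dict[str, float],
                           target: str, tolerance: float = 0.05) -> list[str]:
    """Non-target dimensions whose pass rate fell by more than `tolerance`."""
    return sorted(dim for dim in before
                  if dim != target and before[dim] - after.get(dim, 0.0) > tolerance)


# Hypothetical proxy checks; real ones would be task-specific.
checks = {
    "brevity": lambda text: len(text.split()) <= 12,
    "tone": lambda text: "!" not in text,  # crude stand-in for "stays neutral"
}
before_outputs = [
    "The quarterly report shows revenue rose four percent, mostly from renewals across existing accounts.",
    "Churn held steady this quarter.",
]
after_outputs = ["Revenue up 4%!", "Churn steady!"]

flagged = collateral_regressions(pass_rates(before_outputs, checks),
                                 pass_rates(after_outputs, checks),
                                 target="brevity")
```

Brevity improved as intended, but `flagged` names `tone`: the fix quietly changed a dimension nobody asked it to touch.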

Mistake Six: Piling On Too Many Contrasts

More pairs are not more clarity.

Why it happens

When one pair does not fully work, the instinct is to add more. But multiple contrasts can imply conflicting distinctions, and the model has to reconcile them, often badly.

The cost and the fix

  • Cost: contradictory pairs produce erratic, hard-to-debug behavior.
  • Fix: prefer one well-isolated pair; add a second only for a genuinely distinct ambiguity, and confirm the two do not conflict.
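Conflicts are easier to spot if each pair is written down with the dimension it teaches and the value its positive example exhibits. This sketch assumes that bookkeeping exists; `PairSpec`, `conflicts`, and the sample pairs are hypothetical. It only catches conflicts you have labelled, so it complements reading the pairs side by side rather than replacing it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PairSpec:
    name: str
    dimension: str  # what the pair is meant to teach, e.g. "length"
    preferred: str  # the value its positive example exhibits, e.g. "short"


def conflicts(pairs: list[PairSpec]) -> dict[str, set[str]]:
    """Map each dimension to its preferred values wherever pairs disagree."""
    by_dim: dict[str, set[str]] = defaultdict(set)
    for p in pairs:
        by_dim[p.dimension].add(p.preferred)
    return {dim: vals for dim, vals in by_dim.items() if len(vals) > 1}


pairs = [
    PairSpec("summary", "length", "short"),
    PairSpec("incident", "length", "detailed"),
    PairSpec("greeting", "tone", "formal"),
]
found = conflicts(pairs)
```

The two length pairs pull in opposite directions, so `found` reports `length`; the tone pair teaches a distinct ambiguity and passes.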

Mistake Seven: Skipping the Reason for the Distinction

A labeled pair without rationale teaches less than it could.

Why it happens

It feels sufficient to mark one example desired and one undesired. But without a brief reason, the model may infer a superficial difference rather than the principle you intended.

The cost and the fix

  • Cost: the model latches onto an incidental feature of the examples.
  • Fix: add a one-line reason naming the principle, so the model generalizes the right rule rather than a surface pattern.
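One possible layout puts the reason directly under the pair so it travels with the examples. The `Input`/`Desired`/`Undesired`/`Why` labels, the `ContrastivePair` type, and the single-line rule are assumptions for illustration, not a required format; the point is that `render` refuses a pair with no stated principle.

```python
from dataclasses import dataclass


@dataclass
class ContrastivePair:
    input_text: str
    desired: str
    undesired: str
    reason: str  # one line naming the principle, not the surface difference


def render(pair: ContrastivePair) -> str:
    """Format a pair for a prompt, requiring a single-line rationale."""
    reason = pair.reason.strip()
    if not reason or "\n" in reason:
        raise ValueError("reason must be a single non-empty line")
    return "\n".join([
        f"Input: {pair.input_text}",
        f"Desired: {pair.desired}",
        f"Undesired: {pair.undesired}",
        f"Why: {reason}",
    ])


pair = ContrastivePair(
    input_text="Summarize the outage report for the client.",
    desired="Service was unavailable for 40 minutes; the cause is fixed and we are monitoring.",
    undesired="At 02:14 UTC the primary database node exhausted its disk quota, "
              "which triggered cascading failover retries across the cluster.",
    reason="The client needs impact and current status, not internal diagnostic detail.",
)
rendered = render(pair)
```

Without the `Why` line, the model could just as easily conclude the rule is "avoid timestamps" or "be shorter," both incidental features of this particular pair.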

A Pattern Behind All Seven

These mistakes share a single root, worth naming on its own.

The common thread

Every failure above comes from a gap between what your examples imply and what you intended. The model is a faithful reader of implication; it does not infer your intent, only the pattern your examples present. Strawman negatives imply nothing useful, muddled pairs imply the wrong dimension, and missing rationale implies a surface feature.

The general guard

  • Before shipping, ask what a careful but literal reader would conclude from your examples alone.
  • If that conclusion differs from your intent, the gap is your defect to close.
  • Treat the pair as communication to a literal-minded reader, because that is exactly what the model is.

How to Catch These Before They Ship

A short review routine prevents most of them.

A pre-ship checklist

  • Is the negative a real failure the model produced, not an invention?
  • Do the positive and negative share an input and differ in only the target dimension?
  • Did you validate on fresh boundary inputs rather than the examples themselves?
  • Did you check dimensions you did not intend to change for collateral effects?
  • Is there a one-line reason naming the principle?
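Because each checklist item maps to one failure mode, the routine can be encoded so that an unanswered item names the risk it leaves open. This is a hypothetical sketch: the field names and the `open_risks` helper are illustrative, and the answers still come from a human reviewer.

```python
from dataclasses import dataclass, fields


@dataclass
class PreShipReview:
    # One field per checklist item, in checklist order.
    negative_is_real_output: bool
    shares_input_single_dimension: bool
    validated_on_fresh_inputs: bool
    checked_collateral_dimensions: bool
    has_one_line_reason: bool


# Which mistake each unchecked item leaves open.
FAILURE_MODE = {
    "negative_is_real_output": "Mistake One: strawman negative",
    "shares_input_single_dimension": "Mistake Two: pair differs in too many ways",
    "validated_on_fresh_inputs": "Mistake Four: validating only on the examples",
    "checked_collateral_dimensions": "Mistake Five: new ambiguity from the fix",
    "has_one_line_reason": "Mistake Seven: no reason for the distinction",
}


def open_risks(review: PreShipReview) -> list[str]:
    """List the failure modes not yet ruled out, in checklist order."""
    return [FAILURE_MODE[f.name] for f in fields(review)
            if not getattr(review, f.name)]


review = PreShipReview(
    negative_is_real_output=True,
    shares_input_single_dimension=True,
    validated_on_fresh_inputs=False,
    checked_collateral_dimensions=True,
    has_one_line_reason=False,
)
risks = open_risks(review)
```

An empty list means every item on the checklist passed; anything else names exactly which mistake the prompt is still exposed to.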

Why a routine beats vigilance

  • Vigilance fades under deadline pressure; a checklist does not.
  • Each item maps to a specific failure mode, so passing it rules that mode out.
  • The routine is cheap relative to debugging an erratic prompt in production.

Frequently Asked Questions

What is the single most damaging mistake?

Strawman negatives, because they make the whole contrast inert. If the model would never produce the negative, contrasting against it teaches nothing, and the prompt only looks like it should work.

How do I know if I have overcorrected?

The output swings to the opposite extreme of the negative. If you contrasted against verbosity and now get uselessly terse answers, that is overcorrection. Soften the negative and retest.

Why does validating on my own examples not count?

Because the model can simply reproduce an example it was shown without learning the underlying distinction. Only fresh inputs near the boundary reveal whether it generalized.

Should I really limit myself to one contrastive pair?

Start with one well-isolated pair and add more only for distinct ambiguities. Extra pairs raise the risk of teaching conflicting lessons, which produces erratic behavior that is hard to debug.

How can a fix create a new ambiguity?

A sharp boundary on one dimension can incidentally constrain another, like teaching brevity that accidentally fixes tone. Check dimensions you did not intend to change after validating the one you did.

Is adding a reason for the distinction really necessary?

It is not strictly required, but it meaningfully helps the model generalize the right principle instead of an incidental surface difference. A single line of rationale is cheap insurance.

Key Takeaways

  • Strawman negatives teach nothing; use the model's real wrong outputs instead.
  • Pairs must differ in only the target dimension, or the lesson is muddled.
  • A strong contrast can overcorrect; choose negatives that are wrong but not extreme.
  • Validate on fresh inputs, never on the examples you provided.
  • Watch for fixes that resolve one ambiguity while quietly creating another.
  • Prefer one isolated pair with a stated reason over many conflicting contrasts.
