If you have run error-detection prompts for a while, you already know the easy wins: typos, broken cross-references, totals that do not add up, a claim with no support. A single well-structured prompt catches most of these reliably. The interesting problems begin where that simple approach plateaus.
Naive detection has predictable blind spots. It catches errors that are visible inside a single document but misses errors that only exist relative to outside context. It over-flags unusual-but-correct content and under-flags plausible-but-wrong content. It treats every issue with the same posture, when high-stakes work needs adversarial scrutiny and low-stakes work needs a light touch. Moving past the basics means engineering around these limits deliberately.
This article assumes you know the fundamentals and want depth. We will cover adversarial prompting, structured comparison against a source of truth, ensemble and self-critique techniques, calibration of confidence, and the specific edge cases that catch experienced practitioners off guard.
Adversarial Framing for Higher Recall
A reviewer asked to "find errors" looks for problems passively. A reviewer told to actively break something looks harder.
Make the model argue against the work
Instead of "identify any errors," try "assume this document contains at least one serious flaw that a careful critic would catch. Find it. If you cannot find a serious flaw, identify the weakest claim and explain why a skeptic would challenge it." This adversarial stance raises recall on subtle issues because it counteracts the model's tendency to conclude that everything looks fine.
Role-stack for domain rigor
- Assign a specific critical persona: a compliance auditor, a hostile reviewer, a domain expert who disagrees.
- Ask the model to predict how that persona would attack the work, then evaluate whether the attack lands.
- Combine personas in separate passes rather than one prompt, so each lens stays sharp.
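As a rough illustration of running personas as separate passes, the sketch below builds one standalone adversarial prompt per persona. The template wording reuses the framing suggested above; the function name `build_persona_prompts` and the template constant are hypothetical, and sending the prompts to a model is left out on purpose.

```python
# Hypothetical template: the adversarial framing from this section,
# prefixed with a single critical persona.
ADVERSARIAL_TEMPLATE = (
    "You are acting as {persona}. Assume this document contains at least one "
    "serious flaw that a careful critic would catch. Find it. If you cannot "
    "find a serious flaw, identify the weakest claim and explain why a "
    "skeptic would challenge it.\n\n"
    "Document:\n{document}"
)


def build_persona_prompts(document: str, personas: list[str]) -> list[str]:
    """Return one prompt per persona so each pass keeps a single lens."""
    return [
        ADVERSARIAL_TEMPLATE.format(persona=persona, document=document)
        for persona in personas
    ]


prompts = build_persona_prompts(
    "Revenue rose 10% while costs were flat.",
    ["a compliance auditor", "a hostile reviewer"],
)
for prompt in prompts:
    print(prompt.splitlines()[0])
```

Each prompt would then be sent in its own fresh context, so one persona's findings do not colour the next.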
The trade-off is more false positives, which is why adversarial passes pair best with strong triage. The cost of that triage is part of the economics covered in What Error-Detection Prompting Actually Saves You.
Structured Comparison Against Ground Truth
The largest class of errors a single-document pass misses is contradiction with external reality. The fix is to give the model the truth to compare against.
Provide the source, then demand alignment
When a deliverable should match a spec, a brief, a data source, or a prior version, paste both and ask the model to check the work against the reference line by line. Frame it as: "Here is the source of truth and here is the derived document. List every place where the derived document contradicts, omits, or misrepresents the source."
Diff-style review for revisions
- Provide the previous version and the new version.
- Ask the model to summarize what changed and flag any change that introduced an inconsistency or removed required content.
- This catches regressions that a fresh read would never notice because nothing looks wrong in isolation.
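One way to set up a diff-style review is to compute the changes yourself and hand the model both the diff and the new version. The sketch below uses Python's standard `difflib.unified_diff`; the function name `build_revision_prompt` and the prompt wording are illustrative assumptions, not a fixed recipe.

```python
import difflib


def build_revision_prompt(previous: str, revised: str) -> str:
    """Build a review prompt that shows the model exactly what changed."""
    diff_lines = difflib.unified_diff(
        previous.splitlines(),
        revised.splitlines(),
        fromfile="previous",
        tofile="revised",
        lineterm="",
    )
    diff_text = "\n".join(diff_lines)
    return (
        "Below is a diff between the previous and the revised version, "
        "followed by the full revised version.\n"
        "Summarize what changed, then flag any change that introduced an "
        "inconsistency or removed required content.\n\n"
        f"Diff:\n{diff_text}\n\n"
        f"Revised version:\n{revised}"
    )


old = "Scope: API only.\nDelivery within 30 days."
new = "Scope: API only.\nDelivery within 45 days."
print(build_revision_prompt(old, new))
```

Computing the diff outside the model means the list of changes is exact, so the model's effort goes into judging the changes rather than finding them.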
Ensembles and Self-Critique
A single pass reflects a single perspective. Combining passes raises reliability, at the cost of more compute and orchestration.
Run independent passes and reconcile
Run the same content through detection several times with different framings or different models, then have a final pass reconcile the findings. Issues flagged by multiple independent passes deserve more trust; issues flagged by only one warrant closer human inspection.
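The reconciliation step can be partly mechanical. The sketch below counts how many passes raised each issue and separates corroborated findings from single-pass ones. It assumes findings have already been normalized to comparable keys (for example, the quoted text each flag refers to), which is the hard part in practice; the `min_votes` threshold is an illustrative parameter.

```python
from collections import Counter


def reconcile(
    passes: list[set[str]], min_votes: int = 2
) -> tuple[list[str], list[str]]:
    """Split issues into corroborated ones and ones raised by too few passes.

    Each element of `passes` is the set of issue keys one pass produced.
    """
    votes = Counter(issue for findings in passes for issue in findings)
    corroborated = sorted(i for i, n in votes.items() if n >= min_votes)
    needs_inspection = sorted(i for i, n in votes.items() if n < min_votes)
    return corroborated, needs_inspection


passes = [
    {"total mismatch", "missing citation"},
    {"total mismatch"},
    {"date conflict", "total mismatch"},
]
corroborated, needs_inspection = reconcile(passes)
print(corroborated)      # issues raised by at least two passes
print(needs_inspection)  # issues raised by only one pass
```

Single-pass issues are not discarded here; they are routed to closer human inspection, matching the guidance above.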
Force the model to critique its own findings
After an initial detection pass, feed the findings back with the instruction: "For each issue you raised, argue the opposite: explain why this might actually be correct. Then give a final verdict." This self-critique step prunes weak flags and exposes overconfident ones. It is one of the highest-leverage techniques for reducing the false positives that erode trust, a theme also examined in Sorting Truth From Hype in AI Error Checking.
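A minimal sketch of the feedback step, assuming the first pass's findings are available as a list of strings. The function name `build_self_critique_prompt` is hypothetical; the instruction text is the one quoted above.

```python
def build_self_critique_prompt(findings: list[str]) -> str:
    """Number the earlier findings and append the devil's-advocate instruction."""
    numbered = "\n".join(
        f"{index}. {finding}" for index, finding in enumerate(findings, start=1)
    )
    return (
        "Here are the issues you raised:\n"
        f"{numbered}\n\n"
        "For each issue you raised, argue the opposite: explain why this "
        "might actually be correct. Then give a final verdict."
    )


print(
    build_self_critique_prompt(
        ["Section 2 total does not match the table", "Launch date is unsupported"]
    )
)
```

Numbering the findings makes it easy to match each verdict back to its original flag when you parse the response.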
Calibrating Confidence and Severity
Experienced practitioners stop treating all flags equally and start engineering for triage.
Separate confidence from severity
- Confidence is how sure the model is that something is wrong.
- Severity is how much it matters if it is wrong.
- Ask for both on every flag, because a low-confidence flag on a high-severity item still demands a human look, while a high-confidence flag on a trivial item may not.
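To make the two axes concrete, here is a small triage sketch. The three-level scale, the `Flag` type, and the rule in `needs_human_look` are assumptions chosen for illustration; the only part taken from the text is that high severity gets a human look even at low confidence, while a trivial item may not.

```python
from dataclasses import dataclass

# Assumed three-level scale shared by both axes.
LEVELS = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Flag:
    description: str
    confidence: str  # how sure the model is that something is wrong
    severity: str    # how much it matters if it is wrong


def triage(flags: list[Flag]) -> list[Flag]:
    """Order flags by severity first, then confidence, most urgent first."""
    return sorted(
        flags,
        key=lambda f: (LEVELS[f.severity], LEVELS[f.confidence]),
        reverse=True,
    )


def needs_human_look(flag: Flag) -> bool:
    """Illustrative rule: severity decides first, confidence breaks ties."""
    if flag.severity == "high":
        return True  # reviewed even when confidence is low
    if flag.severity == "low":
        return False  # trivial items skipped even when confidence is high
    return flag.confidence != "low"


flags = [
    Flag("typo in footer", confidence="high", severity="low"),
    Flag("possible wrong dosage", confidence="low", severity="high"),
]
for flag in triage(flags):
    print(flag.description, needs_human_look(flag))
```

The exact thresholds should come from your own cost of a miss versus a false alarm; the point is only that the two ratings are stored and used separately.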
Anchor the ratings
Vague confidence labels drift. Define them in the prompt: "High confidence means you would stake your reputation on it. Medium means you would raise it in review. Low means it is worth a glance." Concrete anchors make the ratings comparable across runs and reviewers, which matters enormously when several people share the practice, as discussed in Spreading AI Error Review Beyond One Power User.
Edge Cases That Catch Experts
The failure modes that bite experienced users are rarely the obvious ones.
The subtle traps
- Plausible fabrication. The model invents a confident correction that sounds authoritative and is wrong. Always require it to cite the exact text it is reacting to.
- Anchoring on its own output. In multi-turn review, the model defends earlier statements. Reset context or run fresh passes for genuinely independent checks.
- Silent truncation. Long documents get partially ignored. Chunk deliberately and confirm coverage rather than assuming the whole thing was read.
- Domain drift. A general model misjudges specialized conventions as errors. Provide the relevant standard so it judges against the right rules.
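The citation requirement for plausible fabrication can be checked mechanically. The sketch below assumes each flag arrives as a dictionary with a `quote` field (a hypothetical shape, not a standard format) and separates flags whose quote really appears in the document from those that cannot be grounded. Matching ignores case and whitespace differences, which is a design choice you may want to tighten.

```python
import re


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wraps do not break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def split_by_evidence(
    document: str, flags: list[dict]
) -> tuple[list[dict], list[dict]]:
    """Return (grounded, ungrounded) flags based on whether the quote exists."""
    haystack = _normalize(document)
    grounded, ungrounded = [], []
    for flag in flags:
        quote = _normalize(flag.get("quote", ""))
        if quote and quote in haystack:
            grounded.append(flag)
        else:
            # Missing or unmatched quote: treat as a possible fabrication.
            ungrounded.append(flag)
    return grounded, ungrounded


document = "The contract runs for 12 months.\nPayment is due within 30 days."
flags = [
    {"quote": "Payment is due  within 30 days", "issue": "conflicts with invoice terms"},
    {"quote": "Payment is due within 60 days", "issue": "term too long"},
]
grounded, ungrounded = split_by_evidence(document, flags)
print(len(grounded), len(ungrounded))
```

A grounded quote only proves the model was reacting to real text; whether the flag itself is right still needs review.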
Engineering around them
Build these mitigations into your standard prompts rather than rediscovering them each time. Codifying hard-won techniques into reusable assets is the subject of Turning Ad Hoc Error Checking Into a Documented Routine.
Measuring a Detection Prompt Like an Instrument
Advanced practice means you stop trusting your impression of a prompt and start measuring it. A detection prompt is an instrument, and instruments are characterized, not assumed.
Build an evaluation set
- Assemble a collection of work samples where you already know every error, including some clean samples with no errors at all.
- Run your prompt against the set and score it on two axes: how many real errors it caught (recall) and how many of its flags were genuine (precision).
- Include the clean samples specifically to measure how often the prompt invents problems where none exist.
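The scoring described above reduces to a few counts. In this sketch each sample is a pair of sets: the known errors and the errors the prompt flagged. It assumes flags have already been matched to known error identifiers, which in practice takes a manual or semi-automated mapping step. The data and the names `score_prompt` and `_ratio` are made up for illustration.

```python
from typing import Optional


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return the ratio, or None when it is undefined."""
    return numerator / denominator if denominator else None


def score_prompt(samples: list[tuple[set[str], set[str]]]) -> dict:
    """Score (known_errors, flagged) pairs for precision, recall, and clean-sample noise."""
    true_pos = false_pos = false_neg = 0
    clean_total = clean_with_flags = 0
    for known, flagged in samples:
        true_pos += len(known & flagged)    # real errors that were caught
        false_pos += len(flagged - known)   # flags that were not real errors
        false_neg += len(known - flagged)   # real errors that were missed
        if not known:
            clean_total += 1
            clean_with_flags += bool(flagged)
    return {
        "precision": _ratio(true_pos, true_pos + false_pos),
        "recall": _ratio(true_pos, true_pos + false_neg),
        "clean_false_alarm_rate": _ratio(clean_with_flags, clean_total),
    }


samples = [
    ({"e1", "e2"}, {"e1", "x1"}),  # caught one, missed one, one false flag
    ({"e3"}, {"e3"}),              # caught the only error
    (set(), {"x2"}),               # clean sample, invented a problem
    (set(), set()),                # clean sample, correctly silent
]
print(score_prompt(samples))
```

The clean-sample rate is reported separately because it answers a different question from precision: how often the prompt raises an alarm when there is nothing to find.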
Tune against the numbers, not vibes
Once you can measure precision and recall, prompt changes become experiments rather than guesses. Tighten the framing and watch precision rise while recall may fall; loosen it and watch the reverse. The right operating point depends on whether a miss or a false alarm is more costly for the work in question. This measured approach is what lets you set the aggressiveness deliberately rather than by feel, and it feeds directly into the economics described in What Error-Detection Prompting Actually Saves You.
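One simple way to choose an operating point is to put a rough cost on each kind of mistake and compare prompt variants on total cost. The linear cost model, the variant names, and all numbers below are assumptions for illustration only.

```python
def expected_cost(
    misses: int,
    false_alarms: int,
    cost_per_miss: float,
    cost_per_false_alarm: float,
) -> float:
    """Total cost under a simple linear model (an assumption, not a rule)."""
    return misses * cost_per_miss + false_alarms * cost_per_false_alarm


def pick_variant(
    results: dict[str, tuple[int, int]],
    cost_per_miss: float,
    cost_per_false_alarm: float,
) -> str:
    """Return the variant name with the lowest expected cost.

    `results` maps a variant name to (misses, false_alarms) measured
    on the evaluation set.
    """
    return min(
        results,
        key=lambda name: expected_cost(
            *results[name], cost_per_miss, cost_per_false_alarm
        ),
    )


# Hypothetical measurements: the tight framing misses more, the loose one over-flags.
results = {"tight": (5, 2), "loose": (1, 12)}
print(pick_variant(results, cost_per_miss=50, cost_per_false_alarm=1))
print(pick_variant(results, cost_per_miss=1, cost_per_false_alarm=1))
```

When a miss is far more expensive than a false alarm, the looser framing wins despite its noise; when the costs are similar, the tighter one does.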
Re-measure when anything changes
A prompt characterized six months ago may have drifted as the work, the model, or the error patterns changed. Re-run your evaluation set periodically so you catch silent degradation before it costs you a missed defect in production.
Frequently Asked Questions
When is adversarial framing worth the extra false positives?
Use it on high-stakes work where a missed error is expensive and you have the review capacity to triage the extra flags. For routine, low-risk content, a gentler detection prompt keeps the signal-to-noise ratio better. Match the aggressiveness of the framing to the cost of being wrong.
How many independent passes are worth running?
For most work, two or three differently framed passes plus a reconciliation step capture the benefit without runaway cost. Beyond that, returns diminish quickly and orchestration complexity grows. Reserve large ensembles for the rare deliverables where a missed error carries serious consequences.
Does self-critique actually reduce errors or just add steps?
When implemented as a genuine devil's-advocate pass, it reliably prunes weak and overconfident flags, which improves precision. It works because it forces the model to consider the opposite case rather than reinforcing its first answer. It adds a step, but that step is where much of the false-positive reduction happens.
How do I stop the model from inventing confident corrections?
Require it to quote the exact source text behind every flag and forbid it from proposing a fix without pointing to the specific problem. Fabrications usually appear when the model is allowed to speak in generalities. Tying every claim to quoted evidence makes invention much harder.
What is the right way to handle very long documents?
Chunk the content into sections small enough to be read fully, run detection on each, and then run a final pass focused only on cross-section consistency. Do not trust a single prompt to carefully review a very long document end to end; confirm coverage rather than assuming it.
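A minimal chunking sketch, assuming paragraphs are separated by blank lines and that a character budget is a good enough stand-in for the model's real limit. The check at the end confirms only that the chunking itself dropped nothing; whether the model actually attended to each chunk still needs its own check, such as asking for a per-chunk summary.

```python
def _paragraphs(document: str) -> list[str]:
    """Split on blank lines and drop empty pieces."""
    return [p for p in document.split("\n\n") if p.strip()]


def chunk_paragraphs(document: str, max_chars: int) -> list[str]:
    """Group whole paragraphs into chunks of at most max_chars where possible.

    A single paragraph longer than max_chars becomes its own oversize chunk.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in _paragraphs(document):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def chunks_cover_document(document: str, chunks: list[str]) -> bool:
    """True when the chunks, rejoined in order, reproduce every paragraph."""
    return "\n\n".join(chunks) == "\n\n".join(_paragraphs(document))


document = "\n\n".join(["A" * 40, "B" * 40, "C" * 40])
chunks = chunk_paragraphs(document, max_chars=90)
print(len(chunks), chunks_cover_document(document, chunks))
```

After the per-chunk passes, the cross-section consistency pass would receive the findings or short summaries from every chunk rather than the full text.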
Should I use a different model for the critique pass?
Using a second, independent model for reconciliation or self-critique can surface issues the first model is blind to, because different models have different blind spots. It is not mandatory, but for high-stakes review it adds a meaningful layer of independence at modest extra cost.
Key Takeaways
- Adversarial framing raises recall on subtle issues but demands strong triage to handle the extra false positives.
- The errors a single-document pass misses most are contradictions with external truth; supply the source and demand line-by-line alignment.
- Independent passes plus a reconciliation step, and a self-critique devil's-advocate pass, both improve reliability.
- Separate confidence from severity and anchor both with concrete definitions so ratings stay comparable.
- Guard against plausible fabrication, self-anchoring, silent truncation, and domain drift by building mitigations into your standard prompts.