AGENCYSCRIPT
CoursesEnterpriseBlog
đź‘‘FoundersSign inJoin Waitlist
AGENCYSCRIPT

Governed Certification Framework

The operating system for AI-enabled agency building. Certify judgment under constraint. Standards over scale. Governance over shortcuts.

Stay informed

Governance updates, certification insights, and industry standards.

Products

  • Platform
  • Certification
  • Launch Program
  • Vault
  • The Book

Certification

  • Foundation (AS-F)
  • Operator (AS-O)
  • Architect (AS-A)
  • Principal (AS-P)

Resources

  • Blog
  • Verify Credential
  • Enterprise
  • Partners
  • Pricing

Company

  • About
  • Contact
  • Careers
  • Press
© 2026 Agency Script, Inc.·
Privacy PolicyTerms of ServiceCertification AgreementSecurity

Standards over scale. Judgment over volume. Governance over shortcuts.

On This Page

Default to Less ContextWhy Restraint WinsWhen to Add MoreMake Instructions Concrete and TestableReplace Abstractions With RulesOne Rule Per LineResolve Contradictions DeliberatelyEngineer Position DeliberatelyAnchor the EdgesSeparate Evidence From InstructionTreat Retrieval as a First-Class ConcernInspect Before You Tune PromptsPrefer Precision Over RecallManage Context Across TimeSummarize Aging HistorySet Freshness Expectations Per SourceBuild Evaluation Into the WorkflowKeep a Living Regression SetTrace Failures to Context FirstKnowing When to Break These DefaultsMatch the Practice to the TaskJustify Every OverrideFrequently Asked QuestionsIs defaulting to less context ever wrong?Why separate instructions from evidence if the model reads it all?How concrete should an instruction be?Should I always favor precision over recall in retrieval?How often should I run my regression set?Key Takeaways
Home/Blog/Production Habits That Keep AI Context From Rotting
General

Production Habits That Keep AI Context From Rotting

A

Agency Script Editorial

Editorial Team

·October 14, 2023·7 min read
context engineeringcontext engineering best practicescontext engineering guideprompt engineering

Best-practice lists for AI usually collapse into platitudes: be clear, be specific, test your work. True, but useless, because they tell you nothing about the actual decisions you face when assembling context. This article takes a different stance. Each practice below is opinionated, comes with the reasoning that justifies it, and tells you when it applies and when it does not.

These habits come from watching context-driven systems succeed and fail under real load. They are not universal laws; they are defaults that work more often than not, and knowing why each one works lets you break it intelligently when your situation demands. Treat them as a starting position you can argue with, not a checklist to obey blindly.

The through-line is restraint. Most context problems come from including too much, trusting too easily, or measuring too little. Good practice pushes in the opposite direction on all three.

Default to Less Context

The strongest single habit is including less than feels comfortable.

Why Restraint Wins

Every token of marginally relevant text competes with the tokens that matter. Models do not perfectly ignore noise; they weight it. A lean context with only decision-changing information consistently outperforms a comprehensive one, and it costs less on every call.

When to Add More

Add context only when a specific failure shows the model lacked a fact. Let evidence pull material in, rather than including material defensively. This inverts the common instinct and produces tighter, faster systems. The foundations are in Master Context Engineering Without Guesswork.

Make Instructions Concrete and Testable

An instruction the model cannot verifiably follow is decoration.

Replace Abstractions With Rules

Be professional means nothing actionable. Respond in two sentences, cite only provided sources, and never speculate beyond the given text means something the model can obey and you can check.

One Rule Per Line

Bundle rules together and the model may honor some and drop others. Stating each constraint plainly and separately raises the odds all of them stick. A dense paragraph of mixed instructions invites the model to satisfy the most prominent ones and quietly skip the rest. Plain, separated rules also make your own review easier, since you can check each constraint against the output individually.

Resolve Contradictions Deliberately

As instruction sets grow, rules start to conflict. One says be exhaustive, another says be concise, and the model picks unpredictably. Review the full set as a whole and resolve contradictions before they surface as inconsistent behavior you cannot trace. Two clear rules that cannot both be satisfied are worse than one.

Engineer Position Deliberately

Treat the context as ordered, because the model does.

Anchor the Edges

Place the most important rules near the start of the system block and restate the immediate task right before generation. These high-attention positions protect critical instructions from being diluted.

Separate Evidence From Instruction

Keep retrieved facts in a clearly labeled block, distinct from the rules. Mixing the two makes both harder to follow. The mechanics behind ordering are detailed in Build Reliable Context One Step at a Time.

Treat Retrieval as a First-Class Concern

Retrieval quality is the ceiling on everything downstream.

Inspect Before You Tune Prompts

When an answer is wrong, read the exact passages retrieval returned before changing wording. If the right facts are absent, the prompt is irrelevant. This single habit redirects effort to where it pays off.

Prefer Precision Over Recall

Returning a few highly relevant passages beats returning many loosely related ones. Excess retrieved text becomes noise that the model must fight through. The instinct to widen retrieval—to include more passages just in case—usually backfires, because the cost of a buried answer is higher than the cost of a missed one in most grounded tasks. Tighten retrieval until the passages that remain are the ones that genuinely change the answer, then stop.

Manage Context Across Time

Static context is easy; living context needs maintenance.

Summarize Aging History

In multi-turn systems, replace old verbatim turns with running summaries. This keeps intent alive while reclaiming token budget and prevents the window from overflowing.

Set Freshness Expectations Per Source

Retrieved facts age at different rates. Decide explicitly how stale each source may be before it must refresh, rather than caching everything indefinitely and silently serving outdated data. A pricing table and a published policy may both be cached, but one might tolerate a day of staleness while the other must be current to the minute. Assigning a freshness window per source turns a hidden risk into an explicit decision you can defend.

Build Evaluation Into the Workflow

Without measurement, improvement is guesswork.

Keep a Living Regression Set

Collect real failing cases and require every change to pass them. This turns each fix into a permanent guarantee and stops new changes from quietly breaking old wins. To see the common failures this guards against, review 7 Common Mistakes with Context Engineering.

Trace Failures to Context First

Before rewriting prompts or swapping models, inspect the exact context that produced the bad output. Most failures resolve into a missing fact, a misordered rule, or noise. This habit is worth more than any single technique, because it directs your effort to the actual cause instead of the most visible one. The most common waste in AI development is energetic prompt rewriting aimed at a problem that lived in retrieval all along.

Knowing When to Break These Defaults

Every practice here is a default, and defaults exist to be overridden with reason.

Match the Practice to the Task

Defaulting to less context is right for focused answering but wrong for tasks that genuinely demand broad synthesis across many sources. Precision over recall is right for grounded answers but wrong for discovery, where missing a document costs more than including an extra one. The practices encode a typical risk profile; when yours differs, adjust deliberately.

Justify Every Override

Breaking a default is fine when you can name why your situation differs and what you expect to gain. Breaking it because you did not understand the reasoning is how systems drift back into the failure modes the defaults were meant to prevent. The reasoning attached to each practice is what makes intelligent deviation possible.

Frequently Asked Questions

Is defaulting to less context ever wrong?

Yes. Tasks that genuinely require broad synthesis—reconciling many sources, reasoning over a large body of evidence—need more material. The practice is a default, not a law. The discipline is letting demonstrated need pull context in, rather than adding it defensively just in case.

Why separate instructions from evidence if the model reads it all?

Because the model weights structure. When rules and facts are interleaved, the model can confuse which text is a command and which is information to act on. Clear separation, with labeled blocks, reduces that confusion and makes both the rules and the evidence easier to follow.

How concrete should an instruction be?

Concrete enough that you could write a test for it. Respond in under three sentences is checkable; be concise is not. If you cannot imagine an automated or manual check that confirms the rule was followed, the instruction is too vague to enforce reliably.

Should I always favor precision over recall in retrieval?

For most grounded answering, yes—a few accurate passages beat many loosely related ones. The exception is discovery tasks where missing a relevant document is costlier than including an extra one. Match the bias to whether your risk is wrong answers or missed information.

How often should I run my regression set?

Every time you change context assembly, instructions, retrieval, or the model. The set exists to catch regressions, and regressions happen precisely when you change something. Running it only occasionally defeats its purpose and lets silent breakage accumulate between checks.

Key Takeaways

  • Default to less context and let demonstrated failures pull material in
  • Write concrete, testable instructions, one rule per line
  • Order deliberately: anchor critical rules at the edges, separate evidence from instructions
  • Treat retrieval as the ceiling on quality and inspect it before tuning prompts
  • Maintain living context with summaries and per-source freshness rules
  • Keep a regression set, run it on every change, and trace failures to context first

Search Articles

Categories

OperationsSalesDeliveryGovernance

Popular Tags

prompt engineeringai fundamentalsai toolsthe difference between AIMLagency operationsagency growthenterprise sales

Share Article

A

Agency Script Editorial

Editorial Team

The Agency Script editorial team delivers operational insights on AI delivery, certification, and governance for modern agency operators.

Related Articles

General

Rolling Out AI Hallucinations Across a Team

Most teams discover AI hallucinations the hard way — a confident-sounding wrong answer makes it into a client deliverable, a legal brief, or a published report. The damage isn't just to the output; it

A
Agency Script Editorial
June 1, 2026·11 min read
General

Case Study: Large Language Models in Practice

Most teams that fail with large language models don't fail because the technology doesn't work. They fail because they treat deployment as a one-time event rather than a discipline — pick a model, wri

A
Agency Script Editorial
June 1, 2026·11 min read
General

Thirty-Second Wins Breed False Confidence With LLMs

Working with large language models is deceptively easy to start and surprisingly hard to do well. You can get a useful output in thirty seconds, which creates a false confidence that compounds over ti

A
Agency Script Editorial
June 1, 2026·10 min read

Ready to certify your AI capability?

Join the professionals building governed, repeatable AI delivery systems.

Explore Certification