Budget, Decide, Degrade: One Model for Any Context Limit

Most teams handle context limits reactively. They build a prompt, hit a wall, patch it, and move on. That works until the system grows complex enough that ad hoc patches contradict each other. What you need is a reusable model you can apply to any context-bound feature and get a consistent, defensible answer.

This article lays out one such framework, the Budget-Decide-Degrade model. Its three stages turn context management from a series of one-off fixes into a repeatable discipline. The first stage establishes what you have to spend. The second chooses how to spend it. The third defines what happens when you would otherwise overspend. Each stage has clear inputs, outputs, and a rule for when to apply it.

The framework is deliberately small. It is meant to be remembered and used, not filed away. For the foundational concepts it builds on, start with the complete guide.

Stage One: Budget

The first stage answers a single question: how many tokens do I actually have for content?

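You cannot make any further decision honestly without this number. The headline window is a trap: it covers everything in the request, including the system prompt, the tool schemas, and the response itself. The budget is what remains after the fixed cost, the reserved output, and a safety margin are removed.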

Components of the budget

Window: the model's hard limit, in tokens, for this exact model and version.
Fixed cost: the system prompt plus tool schemas, measured with the real tokenizer.
Reserved output: the maximum response length you will request.
Safety margin: 10 to 15 percent of the window, to absorb variance.

The output is your working budget: window minus fixed cost minus reserved output minus margin. The step-by-step approach computes this with a worked example. Everything downstream is measured against the working budget, never the window.
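The subtraction above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function name and parameters are assumptions introduced here, and the sample numbers are the ones used in the worked application later in this article. The margin is taken as a whole-number percentage of the full window and rounded up to whole tokens.

```python
def working_budget(window: int, fixed_cost: int, reserved_output: int,
                   margin_percent: int) -> int:
    """Tokens left for content after fixed cost, reserved output, and margin."""
    # The margin is a percentage of the full window, rounded up to whole tokens.
    margin = (window * margin_percent + 99) // 100
    budget = window - fixed_cost - reserved_output - margin
    if budget <= 0:
        # Nothing left for content: the task does not fit this model as configured.
        raise ValueError("no working budget left on this model")
    return budget


budget = working_budget(window=128_000, fixed_cost=2_000,
                        reserved_output=2_000, margin_percent=12)
print(budget)  # 108640
```

In practice the fixed cost should come from counting the real system prompt and tool schemas with the model's own tokenizer rather than from an estimate.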

When to apply: Once per model per feature, and again whenever you change models. This is the foundation; redo it on any change to the inputs.

Stage Two: Decide

The second stage chooses a strategy based on the shape of your content relative to the budget. There are three strategies, and the decision rule is simple.

The decision rule

If content fits the working budget with margin, Fit it. Send it whole. This is the most reliable option and the right default for coherent documents that must be considered as a unit.
If content is a growing stream you want to remember, Summarize it. Compress older material into a running synopsis. This is the strategy for conversation history.
If content is a large, mostly static corpus, Retrieve it. Index it and pull only relevant slices per query. This is the only strategy that scales to corpora far larger than any window.
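The rule above can be sketched as a small function. Treat it as an illustration under stated assumptions: the `ContentSource` fields, the `allowance` parameter (the share of the working budget a single source may use), and the sample sizes are all hypothetical and introduced here for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    FIT = "fit"
    SUMMARIZE = "summarize"
    RETRIEVE = "retrieve"


@dataclass
class ContentSource:
    """Hypothetical description of one content source."""
    name: str
    tokens: int            # current size, measured with the real tokenizer
    grows_over_time: bool  # True for streams such as conversation history


def decide(source: ContentSource, allowance: int) -> Strategy:
    """Apply the three-way rule to a single source.

    `allowance` is an assumed per-source share of the working budget.
    """
    if source.tokens <= allowance:
        return Strategy.FIT        # fits whole: send it as a unit
    if source.grows_over_time:
        return Strategy.SUMMARIZE  # growing stream: compress older material
    return Strategy.RETRIEVE       # large static corpus: index and pull slices


sources = [
    (ContentSource("uploaded report", 40_000, False), 60_000),
    (ContentSource("conversation", 70_000, True), 50_000),
    (ContentSource("reference library", 5_000_000, False), 8_000),
]
for source, allowance in sources:
    print(source.name, "->", decide(source, allowance).value)
```

Because the function takes one source at a time, a system that blends strategies calls it once per source rather than once per system.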

The mistake the framework prevents is forcing the wrong strategy onto a problem: stuffing a corpus you should retrieve, or chunking a coherent document you should fit. The real-world examples show each strategy matched correctly to content, and the common mistakes guide shows what mismatches cost.

Combining strategies

Real systems blend them. A document assistant might Fit a single uploaded file, Summarize the conversation about it, and Retrieve from a reference corpus, all in one request. The framework applies per content source, not per system. Decide each source independently.

Stage Three: Degrade

The third stage answers what happens when, despite good budgeting and strategy, the assembled prompt would still exceed the window. This is not a rare edge case; it is the normal condition under load, and it must be designed, not improvised.

The degradation principle

When over budget, shed the lowest-priority content first; never truncate arbitrarily. Graceful degradation requires that the system know the priority of everything in the prompt.

A workable priority ordering, from highest priority to lowest:

System instructions and must-follow constraints, never dropped.
The user's current question, never dropped.
Recent conversation turns, dropped or summarized only after lower tiers.
Retrieved passages, ranked by relevance; lowest-ranked dropped first.
Older history and optional context, shed first.

The mechanism

Degradation is enforced by a single pre-send guard: count the tokens in the assembled prompt, and if the count exceeds the window minus reserved output, remove content from the bottom of the priority order until it fits. This converts overflow from a hard failure into a quality trade-off you control. The best practices article argues why this guard is non-negotiable.
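A minimal sketch of such a guard follows. It is one possible shape, not a definitive implementation: the `Block` fields, the numeric tiers, the `rank` score, and the sample sizes are assumptions made for illustration, and token counts are passed in as plain integers where a real system would measure them with the model's tokenizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical unit of prompt content."""
    label: str
    tokens: int              # measured size of this block
    tier: int                # 0 = highest priority; larger tiers are shed earlier
    rank: float = 0.0        # relevance within a tier; lower is shed earlier
    protected: bool = False  # never dropped (system instructions, current question)


def degrade(blocks: List[Block], window: int, reserved_output: int) -> List[Block]:
    """Drop lowest-priority blocks until the prompt fits under the ceiling."""
    ceiling = window - reserved_output
    total = sum(b.tokens for b in blocks)
    # Shed order: lowest-priority tier first, then lowest relevance within a tier.
    shed_order = sorted((b for b in blocks if not b.protected),
                        key=lambda b: (-b.tier, b.rank))
    dropped = set()
    for victim in shed_order:
        if total <= ceiling:
            break
        dropped.add(id(victim))
        total -= victim.tokens
    if total > ceiling:
        raise ValueError("protected content alone exceeds the ceiling")
    # Keep the survivors in their original prompt order.
    return [b for b in blocks if id(b) not in dropped]


blocks = [
    Block("system instructions", 2_000, tier=0, protected=True),
    Block("current question", 500, tier=1, protected=True),
    Block("recent turns", 3_000, tier=2),
    Block("passage A", 4_000, tier=3, rank=0.9),
    Block("passage B", 4_000, tier=3, rank=0.2),
    Block("older history", 5_000, tier=4),
]
kept = degrade(blocks, window=12_000, reserved_output=2_000)
print([b.label for b in kept])
# ['system instructions', 'current question', 'recent turns', 'passage A']
```

The sketch drops whole blocks. Summarizing a block instead of dropping it is a natural refinement, but it is left out here to keep the guard easy to review.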

When to apply: On every single request. Budget and Decide are design-time; Degrade is runtime, and it runs constantly.

Putting the Three Stages Together

The framework reads as a pipeline. Budget once, to establish the working number. Decide per content source, to choose a strategy. Degrade on every request, to enforce the budget gracefully. Each stage has a different cadence, and conflating them is the root of most ad hoc messes. Teams that budget but never degrade get runtime rejections. Teams that degrade but never budget shed content they did not need to. Teams that decide wrong fight their own architecture.

Applied consistently, the model turns context management into something you can reason about and review, which is exactly what the checklist operationalizes.

A Worked Application of the Framework

Consider a document assistant on a model with a 128,000-token window. Run the framework end to end.

Budget: The system prompt and tool schemas measure 2,000 tokens. Maximum output is reserved at 2,000. A 12 percent safety margin removes about 15,400. The working budget lands near 108,600 tokens. That is the number every later step uses.

Decide: The assistant has three content sources. An uploaded report of 40,000 tokens is coherent and fits, so Fit it. The conversation about the report grows over time, so Summarize it once it crosses 60 percent of the budget. A reference library of millions of tokens cannot fit, so Retrieve from it per query, pulling perhaps 8,000 tokens of relevant passages. Each source gets the strategy its shape demands.
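The summarization trigger for the conversation can be sketched as follows. This is an illustration under assumptions, not the method of any particular library: `count_tokens` and `summarize` are hypothetical callables you would supply (a real tokenizer and a model call, respectively), and `keep_recent`, the number of latest turns left verbatim, is a parameter introduced here.

```python
from typing import Callable, List


def compact_history(
    turns: List[str],
    count_tokens: Callable[[str], int],
    summarize: Callable[[List[str]], str],
    working_budget: int,
    threshold: float = 0.60,
    keep_recent: int = 4,
) -> List[str]:
    """Replace older turns with one synopsis once history crosses the threshold."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    total = sum(count_tokens(turn) for turn in turns)
    # Below the threshold, or nothing old enough to compress: leave history alone.
    if total <= threshold * working_budget or len(turns) <= keep_recent:
        return list(turns)
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [summarize(older)] + recent


# Stand-ins for demonstration only: a word-count "tokenizer" and a fake summarizer.
word_count = lambda text: len(text.split())
fake_summary = lambda older: "summary of %d turns" % len(older)

history = ["one two three"] * 10  # 30 "tokens" by the stand-in counter
print(compact_history(history, word_count, fake_summary, working_budget=40))
```

With the stand-ins above, 30 tokens exceeds 60 percent of a 40-token budget, so the six oldest turns collapse into one synopsis and the four most recent stay verbatim.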

Degrade: On every request, the assembly function counts the tokens in the assembled prompt. If the report, summarized history, and retrieved passages together would exceed the ceiling, it follows the priority ordering above: it trims older summarized history first, then sheds the lowest-ranked retrieved passages, never touching the system instructions or the user's question. The result is a system that always sends something coherent and is not rejected for length.

Notice how cleanly the three stages separate concerns. The budget calculation never worries about strategy; the strategy decision never worries about runtime overflow; the runtime guard never worries about which model to use. That separation is the whole point.

When the Framework Says Change Models

Sometimes the honest output of Stage One is that no reasonable budget fits the task on the current model. That is a signal, not a failure. If a coherent document genuinely cannot fit and cannot be retrieved without losing essential cross-references, a larger window is the right move. The framework makes this a deliberate, evidenced decision rather than a reflexive upgrade, because you reach it only after Budget and Decide have shown that Fit is required and impossible on the current model.

Frequently Asked Questions

What are the three stages of the framework?

Budget, Decide, and Degrade. Budget establishes your true working token allowance after reservations. Decide chooses among fitting, summarizing, or retrieving based on the content's shape. Degrade defines how the system sheds low-priority content gracefully when a prompt would otherwise exceed the window.

How is the working budget different from the window size?

The window is the model's total hard limit for everything in a request. The working budget is what remains for your content after subtracting the system prompt, tool schemas, reserved output, and a safety margin. Decisions should be made against the working budget, never the headline window.

Can I use more than one strategy at once?

Yes, and complex systems usually do. The Decide stage applies per content source, so a single request might fit an uploaded document, summarize the conversation, and retrieve from a corpus simultaneously. Each source gets the strategy that matches its shape.

How often does each stage run?

Budget runs once per model per feature and again on any model change. Decide runs at design time per content source. Degrade runs at runtime on every single request, because overflow under load is the normal condition, not an exception.

When does the framework justify a bigger model?

Only when Stage One shows no reasonable budget fits, and Stage Two shows the content must be fitted whole rather than retrieved, such as a coherent document with essential cross-references. Reaching that conclusion through the framework makes the upgrade evidenced rather than reflexive.

Key Takeaways

The Budget-Decide-Degrade framework turns reactive context patches into a repeatable discipline.
Budget computes your working token allowance as the window minus fixed costs, reserved output, and margin.
Decide chooses Fit, Summarize, or Retrieve based on the content's shape, applied per content source.
Degrade sheds the lowest-priority content first via a pre-send guard, converting overflow into a controlled trade-off.
The stages run on different cadences: budget once, decide at design time, degrade on every request.
The framework justifies a larger model only after showing the content must be fitted whole and cannot be on the current one.

Agency Script Editorial

