Per-Token Prices Collapse While Total Spend Climbs

Predicting AI's future is a good way to look foolish in eighteen months. So this article doesn't predict capabilities. It reads the pricing signals that are already visible and reasons forward from them. The thesis is simple: the per-token price of a given capability is collapsing, but total AI spend per company is rising, and that apparent contradiction is the most important thing to understand about where cost structures are headed.

If you build a cost model assuming today's prices and architectures are stable, you'll be wrong in a predictable direction. The teams that win the next few years won't be the ones who guessed the exact prices. They'll be the ones who built cost discipline flexible enough to absorb change without a rewrite. Here's the case, grounded in what's observable now.

Signal one: per-token prices keep falling

The clearest trend is that the cost of a fixed capability tier drops steadily. A task that required an expensive frontier model last year is often handled this year by a cheaper model at a fraction of the price. Competition, better training efficiency, and hardware improvements all push in the same direction.

What this means for your decisions

Don't over-optimize for today's prices. A heroic effort to shave tokens may be made irrelevant by a price cut you couldn't have predicted. Instead, keep your model choice swappable so that adopting a cheaper option takes a config change, not a rewrite. The teams that treat model selection as a fixed architectural decision will keep overpaying. The best practices guide covers how to keep model choice loosely coupled.
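As a minimal sketch of what "swappable" can look like, the snippet below keeps the task-to-model mapping in a config table rather than in application code. The model names, prices, and the `resolve_model` and `estimate_cost` helpers are all hypothetical, invented for illustration; they do not correspond to any real provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    usd_per_million_input: float
    usd_per_million_output: float


# Hypothetical models and prices. In practice this table would live in a
# config file or service so it can change without a code deploy.
MODEL_CONFIG = {
    "summarize": ModelSpec("small-model-v1", 0.50, 1.50),
    "legal_review": ModelSpec("frontier-model-v1", 10.00, 30.00),
}


def resolve_model(task: str, config: dict = MODEL_CONFIG) -> ModelSpec:
    """Look up the model for a task so callers never hard-code a model name."""
    try:
        return config[task]
    except KeyError:
        raise ValueError(f"no model configured for task {task!r}") from None


def estimate_cost(task: str, input_tokens: int, output_tokens: int,
                  config: dict = MODEL_CONFIG) -> float:
    """Estimate the cost in USD of one call under the given config."""
    spec = resolve_model(task, config)
    total = (input_tokens * spec.usd_per_million_input
             + output_tokens * spec.usd_per_million_output)
    return total / 1_000_000


# Adopting a cheaper model is a one-line config change; call sites are untouched.
cheaper_config = {**MODEL_CONFIG,
                  "summarize": ModelSpec("cheaper-model-v2", 0.25, 0.75)}

before = estimate_cost("summarize", 1_000_000, 1_000_000)
after = estimate_cost("summarize", 1_000_000, 1_000_000, config=cheaper_config)
print(f"before: ${before:.2f}, after: ${after:.2f}")
```

The point of the design is that the only thing a price cut should force you to edit is the table.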

Signal two: but total spend is rising

Cheaper tokens have not produced smaller bills. They've produced more usage. As the per-unit cost drops, applications that were uneconomical become viable, so teams run more requests, longer contexts, and more ambitious workloads.

This is the central tension. Cheaper does not mean cheaper for you. It means you'll do more, and the question becomes whether the value scales with the spend. The discipline that matters is unit economics: cost per unit of value delivered, not cost per token. A falling token price with rising total spend is fine if value per dollar is climbing, and a problem if it isn't.
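To make the distinction concrete, here is a small worked example with entirely hypothetical figures: a support product where each resolved ticket is assumed to be worth a fixed amount. It compares two quarters on cost per outcome and value per dollar rather than on token price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    label: str
    spend_usd: float              # total AI spend in the period
    outcomes: int                 # e.g. tickets resolved
    value_per_outcome_usd: float  # assumed value of one outcome

    @property
    def cost_per_outcome(self) -> float:
        return self.spend_usd / self.outcomes

    @property
    def value_per_dollar(self) -> float:
        return self.outcomes * self.value_per_outcome_usd / self.spend_usd


# Hypothetical numbers: token prices fell between quarters, but usage grew faster.
last_quarter = Period("last quarter", spend_usd=10_000, outcomes=2_000,
                      value_per_outcome_usd=12.0)
this_quarter = Period("this quarter", spend_usd=18_000, outcomes=3_000,
                      value_per_outcome_usd=12.0)

for p in (last_quarter, this_quarter):
    print(f"{p.label}: cost per outcome ${p.cost_per_outcome:.2f}, "
          f"value per dollar {p.value_per_dollar:.2f}")
```

In this made-up case, spend rose 80 percent while outcomes rose 50 percent, so cost per outcome went from $5 to $6 and value per dollar fell from 2.4 to 2.0. That is the "problem" branch of the tension, and no per-token price chart would have shown it.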

There's a behavioral trap here worth naming. When a resource gets cheaper, teams stop scrutinizing how they use it, at exactly the moment when expanding usage keeps the absolute numbers growing. The companies that get surprised are usually the ones that read a price-drop headline, relaxed, and let context windows balloon and retrieval get sloppy. Falling prices are not a reason to stop measuring. They're a reason to measure value per dollar more carefully, because the temptation to waste rises in lockstep with affordability.

Signal three: agents change the cost shape entirely

The shift toward agentic systems, models that take many steps, call tools, and loop until a task is done, is the biggest coming change to cost structure. A single user request that once meant one API call may now mean dozens of internal calls as the agent reasons, retrieves, and acts.
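One way to see this in your own data is to roll call-level costs up to the task that triggered them. The sketch below assumes a hypothetical call log of `(task_id, cost_usd)` pairs; the log format, task names, and costs are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log: one row per internal model or tool call made by an agent.
CALL_LOG = [
    ("task-a", 0.004), ("task-a", 0.006), ("task-a", 0.010),
    ("task-b", 0.005),
    ("task-c", 0.004), ("task-c", 0.004), ("task-c", 0.004),
    ("task-c", 0.004), ("task-c", 0.004),
]


def summarize_by_task(call_log):
    """Aggregate call count and total cost for each task id."""
    calls = defaultdict(int)
    cost = defaultdict(float)
    for task_id, cost_usd in call_log:
        calls[task_id] += 1
        cost[task_id] += cost_usd
    return {t: {"calls": calls[t], "cost_usd": cost[t]} for t in calls}


summary = summarize_by_task(CALL_LOG)
task_costs = [s["cost_usd"] for s in summary.values()]

for task_id, s in summary.items():
    print(f"{task_id}: {s['calls']} calls, ${s['cost_usd']:.3f}")
print(f"mean cost per task: ${mean(task_costs):.4f}")
print(f"max cost per task:  ${max(task_costs):.4f}")
```

Even in this toy log, the number of calls per task ranges from one to five, which is the variability a per-call budget hides.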

Why this matters

Cost per user action becomes variable and harder to predict, because the agent decides how many steps it takes.
A poorly bounded agent can loop expensively, making guardrails essential rather than optional.
The unit you price and budget around shifts from "a request" to "a task," which may involve an unknown number of requests.

Teams that still think in single-call terms will mis-budget agentic features badly. The common mistakes article covers the runaway-loop risk that agents amplify.
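A guardrail for a poorly bounded agent can be as simple as a per-task budget with a step limit and a spend cap. The following is a sketch under stated assumptions: `TaskBudget`, `run_task`, and the `step` callable (which stands in for one model or tool call and returns whether the task is done plus what the step cost) are hypothetical names, not part of any agent framework.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TaskBudget:
    max_steps: int
    max_spend_usd: float
    steps: int = 0
    spend_usd: float = 0.0

    def can_continue(self) -> bool:
        """True while both the step limit and the spend cap have headroom."""
        return self.steps < self.max_steps and self.spend_usd < self.max_spend_usd

    def record(self, step_cost_usd: float) -> None:
        self.steps += 1
        self.spend_usd += step_cost_usd


def run_task(step: Callable[[], Tuple[bool, float]], budget: TaskBudget) -> str:
    """Run agent steps until the task finishes or the budget is exhausted.

    The check happens before each step, so the final step can overshoot the
    spend cap by at most one step's cost.
    """
    while budget.can_continue():
        done, cost_usd = step()
        budget.record(cost_usd)
        if done:
            return "completed"
    return "stopped_by_guardrail"


def looping_step() -> Tuple[bool, float]:
    """Simulates an agent that never decides it is finished."""
    return (False, 0.01)


budget = TaskBudget(max_steps=5, max_spend_usd=1.00)
status = run_task(looping_step, budget)
print(status, budget.steps)  # prints: stopped_by_guardrail 5
```

Whether a stopped task should fail, fall back to a cheaper path, or escalate to a human is a product decision; the sketch only shows that the loop is bounded.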

Signal four: pricing models are diversifying

Per-token billing won't disappear, but it's being joined by alternatives. We're seeing more prompt caching tiers, batch discounts, committed-use pricing for predictable volume, and the early signs of outcome- or seat-based pricing for higher-level products.
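To see how much these alternatives can move an effective price, here is a rough calculation. The list price, the cached-token multiplier, and the batch multiplier are hypothetical placeholders, and the assumption that the two discounts simply multiply is mine; real providers define their own rates and stacking rules.

```python
def blended_input_cost(input_tokens: int,
                       usd_per_million: float,
                       cached_fraction: float = 0.0,
                       cached_price_multiplier: float = 1.0,
                       batch_price_multiplier: float = 1.0) -> float:
    """Estimate input cost in USD when some tokens are billed at a cached rate.

    cached_fraction: share of input tokens served from a prompt cache (0 to 1).
    cached_price_multiplier: cached-token price as a fraction of list price.
    batch_price_multiplier: batch price as a fraction of list price.
    Assumes the discounts stack multiplicatively, which may not hold for a
    given provider.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    billable = uncached + cached * cached_price_multiplier
    return billable * usd_per_million / 1_000_000 * batch_price_multiplier


# Hypothetical workload: 10M input tokens at a $2.00 per million list price.
list_cost = blended_input_cost(10_000_000, 2.00)
cached_cost = blended_input_cost(10_000_000, 2.00,
                                 cached_fraction=0.8,
                                 cached_price_multiplier=0.1)
batch_cost = blended_input_cost(10_000_000, 2.00, batch_price_multiplier=0.5)

print(f"list: ${list_cost:.2f}, cached: ${cached_cost:.2f}, batch: ${batch_cost:.2f}")
```

With these made-up rates, the same workload costs $20.00 at list price, $5.60 with heavy cache reuse, and $10.00 in batch mode, which is why the billing mode can matter as much as the headline per-token price.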

The implication for your own pricing is that "tokens in, markup, tokens out" gets harder to sustain as a customer-facing model. Customers increasingly want predictable, value-aligned pricing, not a metered bill they can't forecast. Expect to move toward hybrid and outcome-based structures, which means you'll need even tighter knowledge of your underlying cost per outcome to protect margin. The examples article shows how teams are already experimenting with these structures.

Signal five: open models reshape the build-versus-buy line

Capable open-weight models keep improving, which lowers the threshold at which self-hosting beats paying API fees. As open models close the quality gap, more workloads become candidates for self-hosting at steady volume.

But self-hosting trades a clean per-token cost for fixed infrastructure plus operational complexity, and that calculus only favors building at genuine scale with stable demand. The future isn't "everyone self-hosts." It's a more granular split: hosted APIs for spiky and frontier-quality work, self-hosted open models for high-volume commodity tasks. Knowing which workloads sit on which side of that line becomes a core cost-management skill.

Expect the dividing line itself to keep moving. As open models improve and serving stacks get more efficient, workloads that were clearly "buy" this year drift toward "build" next year. That means the build-versus-buy decision is not a one-time architecture call but a recurring review, and the teams that revisit it deliberately will quietly accumulate margin while everyone else defaults to whatever they chose first.
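The build-versus-buy line reduces to break-even arithmetic that is worth rerunning whenever an input changes. The sketch below uses hypothetical figures throughout: a fixed monthly cost for infrastructure and operations, a hosted API price, and a marginal self-hosted cost per million tokens.

```python
import math


def breakeven_tokens_per_month(fixed_monthly_usd: float,
                               api_usd_per_million: float,
                               selfhost_marginal_usd_per_million: float = 0.0) -> float:
    """Monthly token volume above which self-hosting is cheaper than the API.

    Returns math.inf when self-hosting saves nothing per token, since no
    volume would ever pay back the fixed cost.
    """
    saving_per_million = api_usd_per_million - selfhost_marginal_usd_per_million
    if saving_per_million <= 0:
        return math.inf
    return fixed_monthly_usd / saving_per_million * 1_000_000


# Hypothetical inputs: $8,000 per month fixed, API at $2.00 per million tokens,
# self-hosted marginal cost of $0.40 per million tokens.
today = breakeven_tokens_per_month(8_000, 2.00, 0.40)

# If the API price later halves, the break-even volume moves sharply upward.
after_price_cut = breakeven_tokens_per_month(8_000, 1.00, 0.40)

print(f"break-even today: {today / 1e9:.1f}B tokens per month")
print(f"break-even after an API price cut: {after_price_cut / 1e9:.1f}B tokens per month")
```

The same function shows why the line moves in both directions: more efficient serving lowers the fixed and marginal terms and pulls workloads toward "build," while API price cuts push the break-even volume back up.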

What to build now to be ready

The throughline of every signal is the same: change is constant and your job is to be ready for it rather than to predict it. Concretely:

Make model choice swappable. Route through an abstraction so adopting a cheaper or better model is a config change, not a rewrite.
Measure cost per outcome, not per token. As agents and new pricing arrive, the token becomes a poor unit; value-per-dollar is what survives the transition.
Build guardrails before you need them. Agentic and variable-cost workloads make runaway spend a question of when, not if.
Revisit pricing assumptions quarterly. Both your input costs and viable pricing models are moving; a static assumption guarantees drift.

The teams that thrive will hold their cost discipline as a flexible system, not a frozen spreadsheet. The framework article lays out a structure built to absorb exactly this kind of change.

Frequently Asked Questions

Will AI just become too cheap to bother managing?

No. Per-token prices fall, but usage rises to fill the gap and then some, so total spend keeps climbing. Cheaper unit costs unlock more ambitious applications, which means cost management becomes more important, not less, even as each token gets cheaper.

Should I wait for prices to drop before building?

No. Build now with swappable model choices so you capture price drops automatically as they happen. Waiting forfeits the learning and market position you'd gain, and falling prices benefit you whenever they arrive, whether or not you waited.

How will agents change my budgeting?

You'll budget around tasks rather than individual requests, since one task may trigger many internal calls. Expect higher variance in cost per action and a stronger need for step limits and spend guardrails. The unit of analysis shifts upward from the API call to the completed job.

Is outcome-based pricing realistic for AI products?

It's emerging but demands tight cost control to offer safely. You can only price by outcome if you know your cost per outcome well enough to protect margin under heavy use. Expect hybrid models, a base plus metered overage, to dominate before pure outcome pricing becomes common.
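A quick way to test whether a hybrid plan protects margin under heavy use is to compute the invoice and gross margin at several usage levels. All plan terms and the cost per outcome below are hypothetical numbers chosen for illustration.

```python
def hybrid_invoice(base_fee_usd: float,
                   included_outcomes: int,
                   overage_usd_per_outcome: float,
                   outcomes_used: int) -> float:
    """Monthly invoice: a base fee plus metered overage beyond the included quota."""
    overage_units = max(0, outcomes_used - included_outcomes)
    return base_fee_usd + overage_units * overage_usd_per_outcome


def gross_margin(revenue_usd: float,
                 cost_per_outcome_usd: float,
                 outcomes_used: int) -> float:
    """Gross margin as a fraction of revenue."""
    cost = cost_per_outcome_usd * outcomes_used
    return (revenue_usd - cost) / revenue_usd


# Hypothetical plan: $500 base including 1,000 outcomes, $0.40 per extra outcome,
# with an underlying cost of $0.25 per outcome.
COST_PER_OUTCOME = 0.25

for used in (1_000, 3_000):
    hybrid = hybrid_invoice(500, 1_000, 0.40, used)
    flat = 500.0  # same base fee with no overage charge
    print(f"{used} outcomes: "
          f"hybrid margin {gross_margin(hybrid, COST_PER_OUTCOME, used):.0%}, "
          f"flat margin {gross_margin(flat, COST_PER_OUTCOME, used):.0%}")
```

In this example the flat plan goes from a 50 percent margin at 1,000 outcomes to a loss at 3,000, while the hybrid plan stays positive. The overage rate only works because it sits above the cost per outcome, which is exactly the number you need to know well.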

How often will I need to revisit my cost model?

Plan on a quarterly review at minimum, plus an ad hoc check whenever a major provider announces price or model changes. The landscape moves fast enough that a model choice that was optimal six months ago can quietly end up costing twice what is necessary.

Key Takeaways

Per-token prices keep falling while total AI spend rises, so cheaper tokens mean more usage, not smaller bills; manage cost per unit of value, not per token.
Agentic systems turn one user action into many internal calls, shifting the budgeting unit from request to task and making guardrails essential.
Pricing models are diversifying toward caching, batch, committed-use, and outcome-based structures; expect to move beyond simple per-token markup for customers.
Improving open models lower the self-hosting threshold, splitting workloads between hosted APIs and self-hosted commodity tasks.
Build for change: keep model choice swappable, measure cost per outcome, set guardrails early, and revisit assumptions quarterly.

Agency Script Editorial
