Named Plays for When the Surprise Invoice Lands

A playbook is not a list of tips. It's a set of named plays, each with a trigger that tells you when to run it, an owner who's accountable, and a clear place in the sequence. The difference matters because AI cost problems rarely announce themselves politely. They show up as a surprise invoice three weeks after someone shipped a feature nobody instrumented.

This playbook organizes the work of managing AI model costs and pricing into plays you can assign and sequence. Run them roughly in order for a new product, or jump to the play whose trigger just fired. Each one names what kicks it off, who runs it, and what "done" looks like. The goal is to make cost a managed variable, not a quarterly fire drill.

Play 1: Establish the unit economics baseline

Trigger: Before you ship any AI feature to real users. Owner: Whoever owns the feature's P&L, usually a product lead working with engineering.

You cannot manage what you haven't measured. The first play is to calculate cost per primary action: what does one "generate," one "summarize," one "chat turn" actually cost in tokens at current prices?

How to run it

Capture a representative sample of real requests and count input and output tokens for each.
Multiply by current per-token rates for your chosen model.
Express the result as cost per action and cost per active user per month.

If you can't state these two numbers, every downstream pricing and budgeting decision is a guess. The framework article gives a structured way to lay these numbers out.
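The arithmetic behind Play 1 is simple enough to keep in a script next to the feature. The sketch below assumes hypothetical prices of $3.00 per million input tokens and $15.00 per million output tokens, an illustrative three-request sample, and an assumed 120 actions per active user per month. Replace all of these with your provider's current rates and your own sampled traffic.

```python
from dataclasses import dataclass

# Hypothetical rates in USD per million tokens; replace with your provider's current prices.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


@dataclass
class SampledRequest:
    input_tokens: int
    output_tokens: int


def request_cost(req: SampledRequest) -> float:
    """USD cost of one request at the configured per-token rates."""
    return (req.input_tokens * INPUT_PRICE_PER_M
            + req.output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def cost_per_action(sample: list[SampledRequest]) -> float:
    """Mean cost of one primary action across a representative sample."""
    return sum(request_cost(r) for r in sample) / len(sample)


def cost_per_active_user_month(per_action: float, actions_per_user_month: float) -> float:
    """Scale per-action cost by how often an active user performs the action."""
    return per_action * actions_per_user_month


# Illustrative sample only; in practice, pull token counts from real request logs.
sample = [
    SampledRequest(input_tokens=1_200, output_tokens=300),
    SampledRequest(input_tokens=2_000, output_tokens=500),
    SampledRequest(input_tokens=800, output_tokens=200),
]
per_action = cost_per_action(sample)
per_user = cost_per_active_user_month(per_action, actions_per_user_month=120)
print(f"cost per action: ${per_action:.4f}")               # prints: cost per action: $0.0090
print(f"cost per active user per month: ${per_user:.2f}")  # prints: ... $1.08
```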

Play 2: Instrument before you optimize

Trigger: The moment a feature handles production traffic. Owner: Engineering.

Every request should log token counts (input and output separately), the model used, the feature, and a user or tenant identifier. Without this, you're blind to where spend concentrates and which customer is generating it.

This play deliberately comes before any optimization play. Optimizing without measurement means guessing at the bottleneck, and teams routinely guess wrong, hand-tuning a prompt that accounts for two percent of spend while a retrieval step burns ninety.
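Here is a minimal sketch of the logging and the "where does spend concentrate" question, assuming one structured log line per model call. The field names, the model names `small-model` and `frontier-model`, and their prices are hypothetical, and the records are illustrative, not measurements.

```python
import json
import time
from collections import defaultdict
from dataclasses import asdict, dataclass

# Hypothetical (input, output) prices in USD per million tokens, keyed by model name.
PRICES = {
    "small-model": (0.50, 1.50),
    "frontier-model": (3.00, 15.00),
}


@dataclass
class UsageRecord:
    timestamp: float
    model: str
    feature: str
    tenant_id: str
    input_tokens: int   # logged separately from output tokens because they are priced differently
    output_tokens: int


def log_usage(record: UsageRecord) -> str:
    """Emit one JSON line per call; in production, send it to your log pipeline."""
    line = json.dumps(asdict(record))
    print(line)
    return line


def record_cost(r: UsageRecord) -> float:
    price_in, price_out = PRICES[r.model]
    return (r.input_tokens * price_in + r.output_tokens * price_out) / 1_000_000


def spend_share(records: list[UsageRecord], key: str) -> list[tuple[str, float]]:
    """Each group's fraction of total spend, largest first; key is 'feature', 'tenant_id', or 'model'."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        totals[getattr(r, key)] += record_cost(r)
    grand_total = sum(totals.values())
    return sorted(((k, v / grand_total) for k, v in totals.items()),
                  key=lambda kv: kv[1], reverse=True)


now = time.time()
records = [
    UsageRecord(now, "frontier-model", "retrieval_qa", "tenant-a", 20_000, 500),
    UsageRecord(now, "frontier-model", "retrieval_qa", "tenant-b", 18_000, 400),
    UsageRecord(now, "small-model", "tagging", "tenant-a", 400, 20),
]
for r in records:
    log_usage(r)
for feature, share in spend_share(records, "feature"):
    print(f"{feature}: {share:.1%} of spend")
```

Grouping by `tenant_id` instead of `feature` answers the other question the play raises: which customer is generating the spend.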

Play 3: Tier your models by task difficulty

Trigger: Baseline shows a single premium model handling everything. Owner: Engineering, with product sign-off on quality bars.

Most production traffic is mundane and doesn't need a frontier model. Route it accordingly.

The routing rules

Cheap small model: classification, extraction, short summaries, intent detection, formatting.
Mid-tier model: standard generation, moderate reasoning, most chat.
Frontier model: complex multi-step reasoning, long-form creative work, anything where quality is the visible product.

Define the routing in one place so it's auditable and changeable. This single play often delivers the largest savings of anything here. See the step-by-step guide for implementation patterns.
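Defining the routing in one place can be as simple as a lookup table. In this sketch the task names and model IDs are hypothetical placeholders; the tier assignments mirror the routing rules above.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"
    MID = "mid"
    FRONTIER = "frontier"


# Single source of truth: task type -> tier. Task names are hypothetical labels.
ROUTES: dict[str, Tier] = {
    "classification": Tier.SMALL,
    "extraction": Tier.SMALL,
    "short_summary": Tier.SMALL,
    "intent_detection": Tier.SMALL,
    "formatting": Tier.SMALL,
    "generation": Tier.MID,
    "moderate_reasoning": Tier.MID,
    "chat": Tier.MID,
    "multi_step_reasoning": Tier.FRONTIER,
    "long_form_creative": Tier.FRONTIER,
}

# Hypothetical model IDs; swap in the models you have validated for each tier.
MODEL_FOR_TIER: dict[Tier, str] = {
    Tier.SMALL: "small-model-id",
    Tier.MID: "mid-model-id",
    Tier.FRONTIER: "frontier-model-id",
}


def pick_model(task: str, default: Tier = Tier.MID) -> str:
    """Route a task to a model; unrouted tasks fall back to `default`, not the frontier tier."""
    return MODEL_FOR_TIER[ROUTES.get(task, default)]


print(pick_model("intent_detection"))   # small-model-id
print(pick_model("new_unrouted_task"))  # mid-model-id
```

Falling back to the mid tier means a newly added, unrouted task does not silently land on the most expensive model. Whether mid is the right default depends on your quality bar.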

Play 4: Cut the prompt down to what's load-bearing

Trigger: Input tokens dominate your cost breakdown. Owner: Engineering.

Every token in the prompt is billed on every request. The tactics here are concrete:

Trim system prompts to essential instructions; remove redundant examples.
For retrieval, return the most relevant chunks, not the whole document. Tighten the number of chunks and re-measure quality.
Truncate or summarize conversation history instead of resending the entire transcript every turn.

Conversation history is the silent killer. A 40-turn chat that resends everything each time pays for early messages dozens of times over.
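To see why history dominates, here is a sketch under a simplified model: each turn adds a fixed number of tokens to the transcript, every request is billed for the system prompt plus whatever history it resends, and output tokens are ignored. The 200-token turns and the 500-token system prompt are hypothetical. `truncate_history` shows the simplest fix, a sliding window, and assumes messages are a list of `role`/`content` dicts. Summarizing older turns is the other option and is not shown.

```python
def truncate_history(messages: list[dict], keep_last: int) -> list[dict]:
    """Keep all system messages plus the last `keep_last` non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-keep_last:] if keep_last > 0 else []  # rest[-0:] would keep everything
    return system + recent


def billed_input_full(turn_tokens: list[int], system_tokens: int) -> int:
    """Total input tokens when every request resends the system prompt and the full transcript."""
    total, transcript = 0, 0
    for t in turn_tokens:
        transcript += t
        total += system_tokens + transcript
    return total


def billed_input_windowed(turn_tokens: list[int], system_tokens: int, window: int) -> int:
    """Total input tokens when each request resends only the last `window` turns."""
    total = 0
    for i in range(len(turn_tokens)):
        start = max(0, i + 1 - window)
        total += system_tokens + sum(turn_tokens[start:i + 1])
    return total


turns = [200] * 40  # hypothetical: 40 turns, each adding ~200 tokens to the transcript
print(billed_input_full(turns, system_tokens=500))                # 184000
print(billed_input_windowed(turns, system_tokens=500, window=6))  # 65000
```

Under these assumptions the first turn is billed in all 40 requests with full history and in only 6 with the window. The trade-off is that the model no longer sees older turns, so re-check quality after tightening the window.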

Play 5: Turn on caching and batch where they fit

Trigger: You see repeated prefixes (caching) or non-interactive workloads (batch). Owner: Engineering.

Prompt caching discounts the input tokens of a fixed, reused prefix, by as much as 75 to 90 percent depending on the provider and model; use it when the same system prompt or knowledge base goes out on most calls. Batch tiers cut price roughly in half for work that can wait hours instead of seconds.

The discipline: for every workload, ask "is the prefix stable?" and "does a human wait on this?" If the prefix is stable, cache it. If no human waits, batch it. These two questions catch most of the easy savings.
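The two questions translate directly into code. This sketch assumes a hypothetical 90 percent discount on cached prefix tokens and ignores provider-specific details such as cache-write surcharges, cache misses, and minimum cacheable prefix lengths. Check your provider's actual terms before relying on the numbers.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    prefix_is_stable: bool  # does the same system prompt / knowledge base go out on most calls?
    human_waits: bool       # is a person blocked waiting for the response?


def savings_plan(w: Workload) -> list[str]:
    """Apply the two questions: stable prefix -> cache it; nobody waiting -> batch it."""
    actions = []
    if w.prefix_is_stable:
        actions.append("cache")
    if not w.human_waits:
        actions.append("batch")
    return actions


def input_cost_usd(prefix_tokens: int, variable_tokens: int,
                   price_per_m: float, cache_discount: float = 0.0) -> float:
    """Input cost of one call when the prefix is served from cache at `cache_discount` off."""
    prefix_cost = prefix_tokens * price_per_m * (1 - cache_discount)
    return (prefix_cost + variable_tokens * price_per_m) / 1_000_000


for w in [Workload("support-chat", True, True),
          Workload("nightly-report", True, False),
          Workload("ad-hoc-query", False, True)]:
    print(w.name, savings_plan(w))

# Hypothetical: 10k-token stable prefix, 500 variable tokens, $3.00 per million input tokens.
print(input_cost_usd(10_000, 500, 3.00))                      # uncached
print(input_cost_usd(10_000, 500, 3.00, cache_discount=0.9))  # cached prefix
```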

Play 6: Set guardrails and alerts

Trigger: Once spend is meaningful enough that a runaway day would hurt. Owner: Engineering plus finance.

Optimizing the average case doesn't protect you from the catastrophic case: a retry loop, a malicious user, a bug that resends history indefinitely.

The guardrails to put in place

Per-request max output token limits so a single call can't run away.
Per-user and per-tenant rate limits to contain abuse.
Daily spend alerts that page someone when spend breaks from its normal trend.
A kill switch to disable an expensive feature without a deploy.

The common mistakes article details the failure modes these guardrails exist to catch.
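Here is a minimal in-process sketch of these guardrails, with the output cap shown as a constant. The limits, the three-standard-deviation alert threshold, the 5 percent spread floor, and the 7-day warm-up are hypothetical choices. A real deployment would keep rate-limit state and kill-switch flags in shared storage, such as a cache or config service, rather than in process memory, and the name of the output-token parameter varies by provider.

```python
import statistics
import time
from collections import defaultdict, deque
from typing import Optional

MAX_OUTPUT_TOKENS = 1_024  # hypothetical cap; pass it as the output-token limit on every model call


class SlidingWindowLimiter:
    """Allow at most `limit` requests per tenant within any `window_s`-second window."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, tenant_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[tenant_id]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()  # drop requests that have aged out of the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def spend_is_anomalous(daily_history: list[float], today: float,
                       k: float = 3.0, min_days: int = 7) -> bool:
    """Flag today's spend if it exceeds the trailing mean by k standard deviations."""
    if len(daily_history) < min_days:
        return False  # not enough history to define "normal" yet
    mean = statistics.fmean(daily_history)
    spread = max(statistics.pstdev(daily_history), 0.05 * mean)  # floor avoids paging on tiny wiggles
    return today > mean + k * spread


class KillSwitch:
    """Runtime feature flags; back this with a config store so flipping one needs no deploy."""

    def __init__(self) -> None:
        self._disabled: set[str] = set()

    def disable(self, feature: str) -> None:
        self._disabled.add(feature)

    def enable(self, feature: str) -> None:
        self._disabled.discard(feature)

    def is_enabled(self, feature: str) -> bool:
        return feature not in self._disabled


def check_request(feature: str, tenant_id: str, limiter: SlidingWindowLimiter,
                  switch: KillSwitch, now: Optional[float] = None) -> tuple[bool, str]:
    """Run the cheap checks before any tokens are spent."""
    if not switch.is_enabled(feature):
        return False, "feature disabled"  # checked first so blocked calls don't consume quota
    if not limiter.allow(tenant_id, now):
        return False, "rate limited"
    return True, "ok"


limiter = SlidingWindowLimiter(limit=60, window_s=60.0)
switch = KillSwitch()
print(check_request("summarize", "tenant-a", limiter, switch))  # (True, 'ok')
```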

Play 7: Align customer pricing with your cost

Trigger: You're packaging the feature for sale. Owner: Product and finance.

Your pricing model must hold margin even under heavy use. Decide between flat, usage-based, and hybrid, then stress-test it against your power users.

Model the cost of your top one percent of users; flat pricing must survive them or include a cap.
For usage-based, give customers visibility into their spend to prevent bill shock.
Hybrid (base plus overage) is the common landing spot for a reason: predictable for most, protected against outliers.

If your cost per action from Play 1 ever exceeds your effective price per action, you're losing money on every use. Revisit when prices or usage shift.
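A stress test of flat versus hybrid pricing against the heaviest users can be sketched as follows. The plan prices, the $0.01 cost per action, and the usage distribution (98 light users, one moderate, one very heavy) are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    base_fee: float         # USD per user per month
    included_actions: int   # actions covered by the base fee
    overage_price: float    # USD per action beyond the allowance; 0.0 means pure flat pricing

    def revenue(self, actions: int) -> float:
        return self.base_fee + max(0, actions - self.included_actions) * self.overage_price


def monthly_margin(plan: Plan, actions: int, cost_per_action: float) -> float:
    return plan.revenue(actions) - actions * cost_per_action


def top_percent(usage: list[int], pct: float) -> list[int]:
    """The heaviest `pct` percent of users by monthly actions (at least one user)."""
    k = max(1, math.ceil(len(usage) * pct / 100))
    return sorted(usage, reverse=True)[:k]


def worst_margin(plan: Plan, usage: list[int], cost_per_action: float, pct: float = 1.0) -> float:
    """Lowest monthly margin among the top `pct` percent of users."""
    return min(monthly_margin(plan, a, cost_per_action) for a in top_percent(usage, pct))


usage = [50] * 98 + [400, 5_000]  # hypothetical monthly actions per user
cost_per_action = 0.01             # hypothetical, taken from your Play 1 baseline
flat = Plan("flat", base_fee=20.0, included_actions=0, overage_price=0.0)
hybrid = Plan("hybrid", base_fee=20.0, included_actions=1_000, overage_price=0.02)

for plan in (flat, hybrid):
    print(plan.name, round(worst_margin(plan, usage, cost_per_action), 2))
# flat -30.0
# hybrid 50.0
```

With these made-up numbers, flat pricing loses money on the heaviest user while the hybrid plan holds margin. The flat plan's effective price per action for that user ($20 over 5,000 actions) sits below the cost per action, which is the warning sign described above.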

Play 8: Review on a cadence

Trigger: Quarterly, or whenever a provider announces price changes. Owner: Product lead.

AI prices fall and new models ship constantly. A model choice that was optimal six months ago may now be twice the necessary cost. Each review: re-run the baseline, check whether a newer cheaper model meets your quality bar, and confirm caching and batch are still applied where they should be. The best practices guide covers how to make this review lightweight enough that it actually happens.
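The "does a newer, cheaper model meet the bar" step of the review can be sketched as a small selection function. It assumes you maintain your own eval set and re-measure cost per action at current prices. The model names, scores, and costs here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    model: str
    eval_score: float       # score on your own eval set, 0.0 to 1.0
    cost_per_action: float  # USD, re-measured at current prices


def cheapest_passing(candidates: list[Candidate], quality_bar: float) -> Optional[Candidate]:
    """Cheapest candidate whose eval score meets the bar, or None if none do."""
    passing = [c for c in candidates if c.eval_score >= quality_bar]
    return min(passing, key=lambda c: c.cost_per_action) if passing else None


def review(current: Candidate, newer: list[Candidate], quality_bar: float) -> str:
    best = cheapest_passing([current, *newer], quality_bar)
    if best is None:
        return f"no candidate meets the bar; keep {current.model} and investigate"
    if best.model == current.model:
        return f"keep {current.model}"
    savings = 1 - best.cost_per_action / current.cost_per_action
    return f"switch to {best.model}: about {savings:.0%} lower cost per action"


current = Candidate("current-model", eval_score=0.91, cost_per_action=0.010)
newer = [
    Candidate("newer-small-model", eval_score=0.90, cost_per_action=0.004),
    Candidate("newer-tiny-model", eval_score=0.80, cost_per_action=0.001),
]
print(review(current, newer, quality_bar=0.88))
```

The cheapest model is rejected here because it misses the quality bar; the review only recommends a switch when a cheaper model clears the same bar the current one does.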

Frequently Asked Questions

What order should I run these plays in?

For a new feature, run them in sequence: baseline, instrument, tier, trim, cache and batch, guardrails, pricing, review. For an existing system in trouble, jump to the play whose trigger fired, usually instrumentation if you're flying blind, then guardrails if a spike just hit.

Who should own the cost playbook overall?

A single product or engineering lead should own the playbook as a whole, even though individual plays have different owners. Diffuse ownership is why cost problems fester. One person should be able to answer "what does this feature cost and who's reducing it?"

How much can a full playbook run save?

It varies by starting point, but teams running from a single-premium-model baseline commonly cut spend by half or more once tiering, caching, batch, and prompt trimming are all applied. The largest single lever is usually model tiering.

Do small teams need all eight plays?

Small teams need the baseline, instrumentation, and guardrails immediately; those three prevent disasters. Tiering and caching matter once volume grows. Pricing alignment matters the moment you charge customers. Scale the depth of each play to your stage.

When is optimization premature?

Before you have real traffic to optimize against. Ship with a reasonable default, instrument it, and let actual usage tell you where the spend concentrates. Optimizing imagined workloads wastes engineering time on bottlenecks that may never appear.

Key Takeaways

Treat cost as a set of named plays with triggers and owners, not a one-time cleanup; this makes spend a managed variable.
Baseline unit economics and instrument token usage before any optimization, so you cut the bottleneck that actually exists.
Model tiering is usually the single largest saving; route mundane traffic to cheap models and reserve frontier models for hard work.
Guardrails (output caps, rate limits, alerts, kill switch) protect against the catastrophic runaway case that averages won't catch.
Align customer pricing to a known cost per action and review the whole playbook quarterly as prices fall and new models ship.

Agency Script Editorial

