AGENCYSCRIPT

General

Named Plays for When the Surprise Invoice Lands

Agency Script Editorial

Editorial Team

October 5, 2024 · 6 min read
Tags: ai model cost and pricing structures, ai model cost and pricing structures playbook, ai model cost and pricing structures guide, ai fundamentals

A playbook is not a list of tips. It's a set of named plays, each with a trigger that tells you when to run it, an owner who's accountable, and a clear place in the sequence. The difference matters because AI cost problems rarely announce themselves politely. They show up as a surprise invoice three weeks after someone shipped a feature nobody instrumented.

This playbook organizes AI model cost and pricing structures into plays you can assign and sequence. Run them roughly in order for a new product, or jump to the play whose trigger just fired. Each one names what kicks it off, who runs it, and what "done" looks like. The goal is that cost becomes a managed variable, not a quarterly fire drill.

Play 1: Establish the unit economics baseline

Trigger: Before you ship any AI feature to real users. Owner: Whoever owns the feature's P&L, usually a product lead working with engineering.

You cannot manage what you haven't measured. The first play is to calculate cost per primary action: what does one "generate," one "summarize," one "chat turn" actually cost in tokens at current prices?

How to run it

  • Capture a representative sample of real requests and count input and output tokens for each.
  • Multiply by current per-token rates for your chosen model.
  • Express the result as cost per action and cost per active user per month.

If you can't state these two numbers, every downstream pricing and budgeting decision is a guess. The framework article gives a structured way to lay these numbers out.
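As a rough illustration, the baseline arithmetic can be sketched in a few lines. The per-token prices, the sample requests, and the figure of 120 actions per user per month are all hypothetical placeholders; substitute your provider's current rates and your own measured traffic.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per one million tokens; use your provider's rates.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def cost_per_action(sample: list[Request]) -> float:
    """Average USD cost of one action across a sample of real requests."""
    return mean(
        r.input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
        + r.output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
        for r in sample
    )


def cost_per_user_month(action_cost: float, actions_per_user_month: float) -> float:
    """Monthly USD cost of one active user at a given usage rate."""
    return action_cost * actions_per_user_month


# Made-up sample; in practice this comes from captured production requests.
sample = [Request(2_000, 500), Request(4_000, 300), Request(1_000, 700)]
per_action = cost_per_action(sample)
per_user = cost_per_user_month(per_action, actions_per_user_month=120)
print(f"cost per action: ${per_action:.4f}")
print(f"cost per active user per month: ${per_user:.2f}")
```

Input and output tokens are priced separately because providers typically charge more for output, so a sample with unusually long responses can shift the baseline noticeably.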

Play 2: Instrument before you optimize

Trigger: The moment a feature handles production traffic. Owner: Engineering.

Every request should log token counts (input and output separately), the model used, the feature, and a user or tenant identifier. Without this, you're blind to where spend concentrates and which customer is generating it.

This play comes before any optimization play deliberately. Optimizing without measurement means guessing at the bottleneck, and teams routinely guess wrong, hand-tuning a prompt that accounts for two percent of spend while a retrieval step burns ninety.
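One possible shape for the per-request log record is sketched below. The field names, the model labels, and the `tokens_by` helper are illustrative assumptions, not a prescribed schema. The helper ranks by raw tokens, which is only a proxy for spend until you multiply by each model's rates.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass
class UsageRecord:
    feature: str
    model: str
    tenant_id: str
    input_tokens: int   # logged separately from output tokens
    output_tokens: int


def to_log_line(record: UsageRecord) -> str:
    """Serialize one request's usage as a JSON line for a log pipeline."""
    return json.dumps(asdict(record), sort_keys=True)


def tokens_by(records: list[UsageRecord], field: str) -> Counter:
    """Total tokens grouped by a field such as 'feature' or 'tenant_id'."""
    totals: Counter = Counter()
    for r in records:
        totals[getattr(r, field)] += r.input_tokens + r.output_tokens
    return totals


# Made-up records standing in for real production logs.
records = [
    UsageRecord("summarize", "small-model", "tenant-a", 1_200, 150),
    UsageRecord("chat", "mid-model", "tenant-b", 9_000, 400),
    UsageRecord("chat", "mid-model", "tenant-a", 7_500, 350),
]
print(to_log_line(records[0]))
print(tokens_by(records, "feature").most_common())
```

Grouping the same records by `tenant_id` instead of `feature` answers the second question: which customer is generating the spend.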

Play 3: Tier your models by task difficulty

Trigger: Baseline shows a single premium model handling everything. Owner: Engineering, with product sign-off on quality bars.

Most production traffic is mundane and doesn't need a frontier model. Route it accordingly.

The routing rules

  • Cheap small model: classification, extraction, short summaries, intent detection, formatting.
  • Mid-tier model: standard generation, moderate reasoning, most chat.
  • Frontier model: complex multi-step reasoning, long-form creative work, anything where quality is the visible product.

Define the routing in one place so it's auditable and changeable. This single play often delivers the largest savings of anything here. See the step-by-step guide for implementation patterns.
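A minimal sketch of "routing in one place" might be a single lookup table. The task labels and model names here are hypothetical, and defaulting unknown tasks to the mid tier is an assumed design choice rather than something the playbook specifies.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"
    MID = "mid"
    FRONTIER = "frontier"


# Hypothetical task labels mapped to tiers, following the routing rules above.
ROUTES: dict[str, Tier] = {
    "classification": Tier.SMALL,
    "extraction": Tier.SMALL,
    "short_summary": Tier.SMALL,
    "intent_detection": Tier.SMALL,
    "formatting": Tier.SMALL,
    "standard_generation": Tier.MID,
    "chat": Tier.MID,
    "multi_step_reasoning": Tier.FRONTIER,
    "long_form_creative": Tier.FRONTIER,
}

# Placeholder model names; substitute the models you actually use.
MODELS: dict[Tier, str] = {
    Tier.SMALL: "example-small",
    Tier.MID: "example-mid",
    Tier.FRONTIER: "example-frontier",
}


def pick_model(task: str) -> str:
    """Return the model for a task; unknown tasks default to the mid tier."""
    return MODELS[ROUTES.get(task, Tier.MID)]


print(pick_model("extraction"))
print(pick_model("multi_step_reasoning"))
```

Because every call site goes through `pick_model`, moving a task to a cheaper tier is a one-line change that shows up clearly in review.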

Play 4: Cut the prompt down to what's load-bearing

Trigger: Input tokens dominate your cost breakdown. Owner: Engineering.

Every token in the prompt is billed on every request. The moves here are concrete:

  • Trim system prompts to essential instructions; remove redundant examples.
  • For retrieval, return the most relevant chunks, not the whole document. Tighten the number of chunks and re-measure quality.
  • Truncate or summarize conversation history instead of resending the entire transcript every turn.

Conversation history is the silent killer. A 40-turn chat that resends everything each time pays for early messages dozens of times over.
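To make the arithmetic concrete, here is a small sketch that assumes every turn adds a fixed, hypothetical 200 tokens and that each request carries either the full history or only the most recent turns. Real conversations vary in length per turn, so treat the output as an order-of-magnitude illustration.

```python
from typing import Optional


def input_tokens_billed(turns: int, tokens_per_turn: int,
                        window: Optional[int] = None) -> int:
    """Total input tokens billed over a chat.

    On turn k the request carries the last `window` turns (or all k turns
    when window is None), each assumed to be `tokens_per_turn` long.
    """
    total = 0
    for k in range(1, turns + 1):
        carried = k if window is None else min(k, window)
        total += carried * tokens_per_turn
    return total


full = input_tokens_billed(turns=40, tokens_per_turn=200)
windowed = input_tokens_billed(turns=40, tokens_per_turn=200, window=6)
print(full, windowed)  # 164000 45000
```

Under these assumptions the chat contains only 8,000 tokens of actual content, but full resending bills 164,000 input tokens because the first turn alone is sent 40 times. Keeping a six-turn window brings that down to 45,000.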

Play 5: Turn on caching and batch where they fit

Trigger: You see repeated prefixes (caching) or non-interactive workloads (batch). Owner: Engineering.

Prompt caching discounts a fixed, reused prefix steeply, by as much as 75 to 90 percent depending on the provider; use it when the same system prompt or knowledge base goes out on most calls. Batch tiers cut price roughly in half for work that can wait hours instead of seconds.

The discipline: for every workload, ask "is the prefix stable?" and "does a human wait on this?" If the prefix is stable, cache it. If no human waits, batch it. These two questions catch most of the easy savings.
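The two questions reduce to a very small decision function, sketched here alongside a simplified cost formula. The 90 percent cache discount and the $3 per million token rate are hypothetical, and the formula ignores any surcharge a provider may apply when the cache is first written.

```python
def savings_levers(prefix_is_stable: bool, human_is_waiting: bool) -> list[str]:
    """Apply the two questions: stable prefix -> cache, nobody waiting -> batch."""
    levers = []
    if prefix_is_stable:
        levers.append("cache")
    if not human_is_waiting:
        levers.append("batch")
    return levers


def input_cost(prefix_tokens: int, fresh_tokens: int, price_per_mtok: float,
               cache_discount: float = 0.0) -> float:
    """USD input cost of one request when the prefix is billed at a discount."""
    billable = prefix_tokens * (1 - cache_discount) + fresh_tokens
    return billable / 1_000_000 * price_per_mtok


print(savings_levers(prefix_is_stable=True, human_is_waiting=False))

# An 8,000-token shared prefix plus 500 fresh tokens, at a hypothetical rate.
uncached = input_cost(8_000, 500, price_per_mtok=3.00)
cached = input_cost(8_000, 500, price_per_mtok=3.00, cache_discount=0.90)
print(f"uncached: ${uncached:.4f}  cached: ${cached:.4f}")
```

The larger the stable prefix is relative to the fresh part of each request, the closer the overall saving gets to the headline discount.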

Play 6: Set guardrails and alerts

Trigger: Once spend is meaningful enough that a runaway day would hurt. Owner: Engineering plus finance.

Optimizing the average case doesn't protect you from the catastrophic case: a retry loop, a malicious user, a bug that resends history infinitely.

The guardrails to put in place

  • Per-request max output token limits so a single call can't run away.
  • Per-user and per-tenant rate limits to contain abuse.
  • Daily spend alerts that page someone when the trend breaks from normal.
  • A kill switch to disable an expensive feature without a deploy.
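The four guardrails can be sketched together in one small in-memory class. This is an illustration under simplifying assumptions: the class and method names are hypothetical, state lives in process memory rather than shared storage, and the alert is a fixed daily threshold rather than a comparison against a normal trend.

```python
from collections import defaultdict, deque


class Guardrails:
    """In-memory sketch; a real deployment would back this with shared storage."""

    def __init__(self, max_output_tokens: int, requests_per_minute: int,
                 daily_alert_usd: float) -> None:
        self.max_output_tokens = max_output_tokens
        self.requests_per_minute = requests_per_minute
        self.daily_alert_usd = daily_alert_usd
        self.feature_enabled = True        # kill switch, flipped at runtime
        self.spend_today_usd = 0.0
        self._recent = defaultdict(deque)  # tenant_id -> request timestamps

    def admit(self, tenant_id: str, now: float) -> bool:
        """Allow a request unless the feature is off or the tenant is rate limited."""
        if not self.feature_enabled:
            return False
        window = self._recent[tenant_id]
        while window and now - window[0] >= 60:  # drop entries older than a minute
            window.popleft()
        if len(window) >= self.requests_per_minute:
            return False
        window.append(now)
        return True

    def clamp_output(self, requested_tokens: int) -> int:
        """Cap the output token limit passed to the model for a single call."""
        return min(requested_tokens, self.max_output_tokens)

    def record_spend(self, usd: float) -> bool:
        """Add spend; return True once the daily alert threshold is reached."""
        self.spend_today_usd += usd
        return self.spend_today_usd >= self.daily_alert_usd


guard = Guardrails(max_output_tokens=1_000, requests_per_minute=2, daily_alert_usd=50.0)
admitted = [guard.admit("tenant-a", now=t) for t in (0.0, 1.0, 2.0, 61.0)]
print(admitted)  # [True, True, False, True]
print(guard.clamp_output(4_000))  # 1000
```

Timestamps are passed in as `now` rather than read from the clock so the rate limit can be tested deterministically. Setting `guard.feature_enabled = False` from a config flag is what makes the kill switch work without a deploy.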

The common mistakes article details the failure modes these guardrails exist to catch.

Play 7: Align customer pricing with your cost

Trigger: You're packaging the feature for sale. Owner: Product and finance.

Your pricing model must hold margin even under heavy use. Decide between flat, usage-based, and hybrid, then stress-test it against your power users.

  • Model the cost of your top one percent of users; flat pricing must survive them or include a cap.
  • For usage-based, give customers visibility into their spend to prevent bill shock.
  • Hybrid (base plus overage) is the common landing spot for a reason: predictable for most, protected against outliers.

If your cost per action from Play 1 ever exceeds your effective price per action, you're losing money on every use. Revisit when prices or usage shift.
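A stress test of this kind can be sketched with one margin function. Every number below is hypothetical: a cost per action of $0.0145, a $20 monthly plan, a typical user at 120 actions, and a power user at 5,000.

```python
def monthly_margin(actions: int, cost_per_action: float, base_price: float,
                   included_actions: int = 0, overage_price: float = 0.0) -> float:
    """Revenue minus model cost for one user in one month.

    Flat pricing is the case overage_price == 0; hybrid adds a per-action
    charge beyond included_actions.
    """
    overage = max(0, actions - included_actions)
    revenue = base_price + overage * overage_price
    return revenue - actions * cost_per_action


COST = 0.0145  # hypothetical cost per action in USD

typical_flat = monthly_margin(120, COST, base_price=20.0)
power_flat = monthly_margin(5_000, COST, base_price=20.0)
power_hybrid = monthly_margin(5_000, COST, base_price=20.0,
                              included_actions=1_000, overage_price=0.03)
print(f"typical user, flat:  {typical_flat:+.2f}")
print(f"power user, flat:    {power_flat:+.2f}")
print(f"power user, hybrid:  {power_hybrid:+.2f}")
```

With these made-up figures the flat plan is comfortably profitable on the typical user and loses money on the power user, while the hybrid plan stays positive for both because the overage price exceeds the cost per action.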

Play 8: Review on a cadence

Trigger: Quarterly, or whenever a provider announces price changes. Owner: Product lead.

AI prices fall and new models ship constantly. A model choice that was optimal six months ago may now cost twice what is necessary. Each review: re-run the baseline, check whether a newer cheaper model meets your quality bar, and confirm caching and batch are still applied where they should be. The best practices guide covers how to make this review lightweight enough that it actually happens.
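The "newer cheaper model" check can be reduced to a small selection step, sketched below. The candidate names, costs, and quality scores are invented for illustration; the quality score is assumed to come from your own evaluation set, not from a provider benchmark.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    cost_per_action: float  # USD, from re-running the baseline
    quality_score: float    # from your own evaluation set, 0.0 to 1.0


def cheapest_passing(candidates: list[Candidate],
                     quality_bar: float) -> Optional[Candidate]:
    """Cheapest candidate that meets the quality bar, or None if none do."""
    passing = [c for c in candidates if c.quality_score >= quality_bar]
    return min(passing, key=lambda c: c.cost_per_action) if passing else None


# Invented candidates for illustration only.
candidates = [
    Candidate("current-model", 0.0145, 0.92),
    Candidate("newer-cheaper-model", 0.0060, 0.91),
    Candidate("tiny-model", 0.0010, 0.78),
]
choice = cheapest_passing(candidates, quality_bar=0.90)
print(choice.name if choice else "no candidate meets the bar")
```

Fixing the quality bar before looking at prices keeps the review honest: the cheapest model wins only if it clears a bar that product has already signed off on.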

Frequently Asked Questions

What order should I run these plays in?

For a new feature, run them in sequence: baseline, instrument, tier, trim, cache and batch, guardrails, pricing, review. For an existing system in trouble, jump to the play whose trigger fired, usually instrumentation if you're flying blind, then guardrails if a spike just hit.

Who should own the cost playbook overall?

A single product or engineering lead should own the playbook as a whole, even though individual plays have different owners. Diffuse ownership is why cost problems fester. One person should be able to answer "what does this feature cost and who's reducing it?"

How much can a full playbook run save?

It varies by starting point, but teams running from a single-premium-model baseline commonly cut spend by half or more once tiering, caching, batch, and prompt trimming are all applied. The largest single lever is usually model tiering.

Do small teams need all eight plays?

Small teams need the baseline, instrumentation, and guardrails immediately; those three prevent disasters. Tiering and caching matter once volume grows. Pricing alignment matters the moment you charge customers. Scale the depth of each play to your stage.

When is optimization premature?

Before you have real traffic to optimize against. Ship with a reasonable default, instrument it, and let actual usage tell you where the spend concentrates. Optimizing imagined workloads wastes engineering time on bottlenecks that may never appear.

Key Takeaways

  • Treat cost as a set of named plays with triggers and owners, not a one-time cleanup; this makes spend a managed variable.
  • Baseline unit economics and instrument token usage before any optimization, so you cut the bottleneck that actually exists.
  • Model tiering is usually the single largest saving; route mundane traffic to cheap models and reserve frontier models for hard work.
  • Guardrails (output caps, rate limits, alerts, kill switch) protect against the catastrophic runaway case that averages won't catch.
  • Align customer pricing to a known cost per action and review the whole playbook quarterly as prices fall and new models ship.
