Triggered Plays for When a Workload or Bill Shifts

A guide explains concepts. A playbook tells you what to do when a specific thing happens, who does it, and what comes next. Compute planning fails most often not because people don't understand GPUs but because there's no agreed sequence of moves when a workload changes, a bill spikes, or a model upgrade lands. Everyone improvises, and improvisation at infrastructure scale is expensive.

This playbook organizes AI compute and GPU requirements into a set of named plays. Each one has a trigger that tells you when to run it, an owner who's accountable, and a clear handoff to the next play. Treat it as a runbook you can hand to a team, not an essay you read once.

The plays are sequenced roughly in the order a project encounters them, from first sizing through ongoing optimization. If your organization is earlier in the journey, AI Compute and GPU Requirements: A Beginner's Guide covers the foundations these plays assume.

Play 1: The Sizing Play

Trigger: A new model or workload is proposed. Owner: The engineer who will run the workload.

Before anyone discusses hardware, establish the demand. The sizing play produces a single artifact: a requirements sheet stating model size in parameters, target precision, expected concurrency, and whether the workload is inference, fine-tuning, or full training.

The sizing math

Inference memory: roughly 2 GB per billion parameters at 16-bit, plus 25 to 40 percent overhead.
Fine-tuning: 4x to 6x the inference footprint for full tuning; far less for parameter-efficient methods.
Concurrency: multiply the per-request key-value cache footprint by the expected number of simultaneous requests.

The output is a memory floor and a throughput target. Without these two numbers, every later play is guesswork.
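The sizing math can be sketched in a few lines of Python. This is a minimal illustration, not a definitive sizing tool: the `RequirementsSheet` fields, the per-request key-value cache figure, and the choice to add the cache on top of the overhead are all assumptions made for the example.

```python
from dataclasses import dataclass

# Rule of thumb from the sizing math: about 2 GB per billion parameters at 16-bit.
GB_PER_BILLION_PARAMS_16BIT = 2.0


@dataclass
class RequirementsSheet:
    """Hypothetical requirements sheet; field names are illustrative."""
    params_billions: float          # model size in billions of parameters
    overhead_fraction: float        # 0.25 to 0.40 per the rule of thumb
    kv_cache_gb_per_request: float  # assumed to be measured or estimated elsewhere
    concurrent_requests: int        # expected simultaneous requests


def inference_footprint_gb(sheet: RequirementsSheet) -> float:
    """Weights at 16-bit plus the overhead allowance."""
    weights_gb = sheet.params_billions * GB_PER_BILLION_PARAMS_16BIT
    return weights_gb * (1.0 + sheet.overhead_fraction)


def memory_floor_gb(sheet: RequirementsSheet) -> float:
    """Inference footprint plus key-value cache for all concurrent requests.

    Assumption: the cache is counted separately from the overhead allowance.
    """
    kv_total = sheet.kv_cache_gb_per_request * sheet.concurrent_requests
    return inference_footprint_gb(sheet) + kv_total


def full_finetune_range_gb(sheet: RequirementsSheet) -> tuple[float, float]:
    """4x to 6x the inference footprint for full fine-tuning."""
    base = inference_footprint_gb(sheet)
    return (4.0 * base, 6.0 * base)


# Example: a 7B model, 30 percent overhead, 0.5 GB of cache per request, 8 requests.
sheet = RequirementsSheet(7.0, 0.30, 0.5, 8)
print(round(memory_floor_gb(sheet), 1))  # 22.2
```

The throughput target is not computed here because it comes from product demand rather than from the model itself.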

Play 2: The Rent-or-Buy Play

Trigger: The sizing sheet is approved and exceeds existing capacity. Owner: Whoever holds the budget.

This is a financial play disguised as a technical one. Run it as a comparison over the asset's useful life, not a per-hour glance.

Estimate monthly GPU-hours at realistic, not peak, utilization.
Price the cloud path at those hours, including egress and storage.
Price the owned path including power, cooling, rack space, and the engineering time to operate it.
Find the break-even month.

Rent if utilization is below roughly 40 to 50 percent of capacity or if needs are still changing. Buy if utilization is high and stable for many months. The full decision tree lives in A Framework for AI Compute and GPU Requirements.
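The four steps of the rent-or-buy comparison reduce to a cumulative cost comparison. The sketch below assumes flat monthly costs and a single upfront purchase; every parameter name and every figure in the example is hypothetical, and real pricing will include discounts, depreciation, and resale value that this ignores.

```python
from typing import Optional


def break_even_month(
    upfront_cost: float,          # purchase price of the owned hardware
    owned_monthly_cost: float,    # power, cooling, rack space, engineering time
    cloud_hourly_rate: float,     # price per GPU-hour on the cloud path
    gpu_hours_per_month: float,   # at realistic, not peak, utilization
    cloud_monthly_extras: float = 0.0,  # egress and storage
    horizon_months: int = 60,     # assumed useful life of the asset
) -> Optional[int]:
    """Return the first month where cumulative cloud cost reaches owned cost.

    Returns None if renting stays cheaper for the whole horizon.
    """
    cloud_monthly = cloud_hourly_rate * gpu_hours_per_month + cloud_monthly_extras
    for month in range(1, horizon_months + 1):
        owned_total = upfront_cost + owned_monthly_cost * month
        cloud_total = cloud_monthly * month
        if cloud_total >= owned_total:
            return month
    return None


# Heavy, steady usage: owning pays off inside the horizon.
print(break_even_month(120_000, 2_000, 2.5, 2_000))  # 40

# Light usage: renting stays cheaper, so there is no break-even month.
print(break_even_month(120_000, 2_000, 2.5, 400))    # None
```

A break-even month near or beyond the end of the asset's useful life is a signal to rent.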

Play 3: The Provisioning Play

Trigger: The rent-or-buy decision is made. Owner: Infrastructure or platform engineer.

Provisioning is where good plans die from small omissions. The play is a checklist, not a vibe.

The provisioning checklist

Match GPU memory to the requirements sheet, with headroom for growth.
Confirm interconnect bandwidth if the workload spans multiple cards.
Verify storage throughput can feed the GPU; slow disks waste fast cards.
Set up monitoring for utilization, memory, temperature, and cost from day one.
Establish quotas or budget alerts before anything runs.

The detailed version of this play is in The AI Compute and GPU Requirements Checklist for 2026.
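One way to keep the checklist from becoming a vibe is to encode it so that gaps are listed explicitly. The following is a rough sketch: the function name, the 20 percent headroom default, and the checklist keys are assumptions chosen for illustration, not values from the detailed checklist.

```python
def provisioning_gaps(
    gpu_memory_gb: float,
    memory_floor_gb: float,
    checks: dict[str, bool],
    headroom_fraction: float = 0.20,  # assumed growth headroom; pick your own
) -> list[str]:
    """Return the checklist items that are still open."""
    gaps = []
    needed_gb = memory_floor_gb * (1.0 + headroom_fraction)
    if gpu_memory_gb < needed_gb:
        gaps.append(
            f"GPU memory {gpu_memory_gb} GB is below {needed_gb:.1f} GB needed with headroom"
        )
    # Remaining items are simple yes/no confirmations.
    for item, done in checks.items():
        if not done:
            gaps.append(item)
    return gaps


checks = {
    "interconnect bandwidth confirmed": True,
    "storage throughput verified": True,
    "monitoring set up": True,
    "quotas or budget alerts established": False,
}
for gap in provisioning_gaps(24.0, 22.2, checks):
    print(gap)
```

An empty list means the play can hand off; anything else blocks the first workload from running.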

Play 4: The Utilization Play

Trigger: Hardware is live and running workloads. Owner: The workload engineer, reviewed weekly.

A GPU at 30 percent utilization is a 70 percent refund nobody is collecting. This play runs continuously and is the highest-ROI activity in the entire playbook.

Profile the workload to find the bottleneck: data loading, preprocessing, batch size, or synchronization.
Tune batch size upward until memory is the binding constraint.
Move preprocessing off the critical path with prefetching and more loader workers.
For serving, enable request batching and tune the key-value cache.

The goal is sustained utilization above 70 to 80 percent. Anything chronically lower means you bought capacity you aren't using.
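The weekly review can be reduced to a small calculation over utilization samples. This sketch assumes the samples are percentages already exported from whatever monitoring was set up during provisioning; the function name and the 70 percent default target are illustrative choices taken from the lower end of the goal above.

```python
from statistics import mean


def utilization_review(samples_pct: list[float], target_pct: float = 70.0) -> dict:
    """Summarize sustained utilization and flag it against the target."""
    average = mean(samples_pct)
    return {
        "average_pct": average,
        "idle_pct": 100.0 - average,       # the share of paid capacity left unused
        "below_target": average < target_pct,
    }


# Hypothetical samples from one week of monitoring.
report = utilization_review([30.0, 35.0, 28.0, 40.0])
print(report["average_pct"])   # 33.25
print(report["below_target"])  # True
```

A chronically true `below_target` is the cue to go back through the profiling and batching steps before buying anything.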

Play 5: The Scaling Play

Trigger: A single GPU or node can no longer meet the throughput target. Owner: Platform engineer plus the workload owner.

Scaling has two flavors, and confusing them wastes money. Scale up by moving to a bigger card before you scale out to more cards, because cross-GPU communication adds overhead.

When to scale out

The model exceeds the largest single card you can get.
Throughput demand exceeds one card even at full utilization.
You need redundancy for availability.

When you do scale out, prioritize fast interconnect within a node before spreading across nodes. The interconnect penalty is the difference between near-linear scaling and disappointing returns.
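The three scale-out conditions can be written as explicit checks, so the decision leaves a record of why it was made. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and throughput is in whatever unit the sizing sheet uses.

```python
def scale_out_reasons(
    model_memory_gb: float,
    largest_card_gb: float,
    throughput_demand: float,
    one_card_throughput: float,   # measured at full utilization
    needs_redundancy: bool,
) -> list[str]:
    """Return the conditions that justify scaling out; empty means scale up first."""
    reasons = []
    if model_memory_gb > largest_card_gb:
        reasons.append("model exceeds the largest single card available")
    if throughput_demand > one_card_throughput:
        reasons.append("throughput demand exceeds one card at full utilization")
    if needs_redundancy:
        reasons.append("redundancy is required for availability")
    return reasons


# A model that fits on one card and meets demand: no reason to scale out yet.
print(scale_out_reasons(22.2, 80.0, 900.0, 1200.0, False))  # []
```

An empty result points back to scaling up or to the utilization play, since cross-GPU communication overhead only pays off when one of these conditions holds.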

Play 6: The Cost-Review Play

Trigger: Monthly, on a fixed calendar date. Owner: Budget holder, with engineering input.

Compute costs creep. Idle instances, oversized cards, and forgotten experiments accumulate silently. The cost-review play is a recurring audit.

Reconcile actual spend against the forecast from the rent-or-buy play.
Flag any instance below 40 percent utilization for downsizing or shutdown.
Check for orphaned resources from completed experiments.
Reassess reserved-versus-on-demand mix based on observed patterns.

This play feeds back into Play 2 when patterns shift enough to change the rent-or-buy calculus.
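The first three audit steps lend themselves to a short script run on the review date. The sketch below assumes a billing export already joined with utilization data; the `Instance` fields and the sample figures are hypothetical, and the reserved-versus-on-demand reassessment is left out because it depends on provider pricing.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical row from a billing and monitoring export."""
    name: str
    avg_utilization_pct: float
    monthly_cost: float
    experiment_active: bool   # False once the owning experiment has finished


def cost_review(
    instances: list[Instance],
    forecast_spend: float,
    utilization_floor_pct: float = 40.0,
) -> dict:
    """Reconcile spend against forecast and flag downsizing candidates."""
    actual = sum(i.monthly_cost for i in instances)
    return {
        "actual_spend": actual,
        "variance_vs_forecast": actual - forecast_spend,
        "underutilized": [
            i.name for i in instances if i.avg_utilization_pct < utilization_floor_pct
        ],
        "orphaned": [i.name for i in instances if not i.experiment_active],
    }


fleet = [
    Instance("serving-a", 82.0, 3000.0, True),
    Instance("train-b", 35.0, 4500.0, True),
    Instance("old-sweep", 2.0, 1200.0, False),
]
review = cost_review(fleet, forecast_spend=8000.0)
print(review["variance_vs_forecast"])  # 700.0
print(review["underutilized"])         # ['train-b', 'old-sweep']
print(review["orphaned"])              # ['old-sweep']
```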

Play 7: The Upgrade Play

Trigger: A new model generation, a new card, or a workload change makes current hardware suboptimal. Owner: Platform lead.

Don't chase every release. Run this play only when the math justifies it: a new card delivers meaningfully better throughput per dollar, or a new model your product depends on no longer fits. Re-run the sizing play with the new model, then the rent-or-buy play with current prices. Most upgrades fail the test and shouldn't happen.
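The upgrade test comes down to a throughput-per-dollar comparison plus a fit check. The sketch below is illustrative only: the text does not define "meaningfully better", so the 25 percent `min_gain` threshold is an assumption, as are the function names and the example prices.

```python
def throughput_per_dollar(throughput: float, hourly_cost: float) -> float:
    """Throughput units delivered per dollar of hourly cost."""
    if hourly_cost <= 0:
        raise ValueError("hourly_cost must be positive")
    return throughput / hourly_cost


def upgrade_justified(
    current_tpd: float,
    candidate_tpd: float,
    model_fits_current: bool,
    min_gain: float = 0.25,   # assumed threshold for "meaningfully better"
) -> bool:
    """True if the new model no longer fits or the candidate clears the gain bar."""
    if not model_fits_current:
        return True
    return candidate_tpd >= current_tpd * (1.0 + min_gain)


current = throughput_per_dollar(1000.0, 2.0)    # 500.0
candidate = throughput_per_dollar(1800.0, 3.0)  # 600.0
print(upgrade_justified(current, candidate, model_fits_current=True))  # False
```

In this example the candidate is faster in absolute terms but only 20 percent better per dollar, so it fails the test, which matches the expectation that most upgrades should not happen.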

Sequencing the plays

The plays aren't independent; they hand off to each other in a deliberate order. Sizing feeds rent-or-buy, which feeds provisioning, which kicks off the continuous utilization play. Scaling and upgrade plays both loop back to sizing because any structural change re-opens the original questions. The cost-review play sits across all of them as a recurring audit that can re-trigger rent-or-buy when patterns shift.

The reason to draw these handoffs explicitly is that skipped handoffs are where money leaks. Provisioning without finishing sizing produces oversized hardware. Scaling without re-running the rent-or-buy play produces owned clusters that should have been rented. The arrows between plays matter as much as the plays themselves.
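The handoffs described above can be written down as a small directed graph, which makes it easy to answer "if this play fires, what else must run?". This is a sketch of one reading of the sequencing; the play keys and the `downstream_plays` helper are hypothetical names introduced for the example.

```python
from collections import deque

# Each play maps to the plays it hands off to, per the sequencing described above.
HANDOFFS = {
    "sizing": ["rent-or-buy"],
    "rent-or-buy": ["provisioning"],
    "provisioning": ["utilization"],
    "utilization": [],
    "scaling": ["sizing"],
    "upgrade": ["sizing"],
    "cost-review": ["rent-or-buy"],
}


def downstream_plays(start: str) -> list[str]:
    """Return every play reachable from start, in breadth-first handoff order."""
    seen: list[str] = []
    queue = deque(HANDOFFS[start])
    while queue:
        play = queue.popleft()
        if play not in seen:
            seen.append(play)
            queue.extend(HANDOFFS[play])
    return seen


print(downstream_plays("scaling"))
# ['sizing', 'rent-or-buy', 'provisioning', 'utilization']
```

Listing the downstream plays before acting is a cheap guard against the skipped handoffs that leak money.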

Frequently Asked Questions

How is a playbook different from a checklist?

A checklist is a static list of items to verify. A playbook is a set of plays, each tied to a trigger, an owner, and a handoff, so the right action fires automatically when a situation arises. The checklist lives inside the provisioning play; the playbook is the larger structure that decides which checklist to run and when.

Who should own the compute playbook?

Ownership is split deliberately: engineers own sizing, utilization, and scaling because they're closest to the workload, while the budget holder owns rent-or-buy and the cost review because those are financial decisions. The failure mode is letting one side own both, which produces either reckless spending or starved infrastructure.

How often should the plays run?

Sizing, rent-or-buy, provisioning, and scaling are event-driven, firing on their triggers. The utilization play runs continuously with weekly review. The cost-review play runs on a fixed monthly date. Hard-coding the cost review to a calendar date is what keeps creep from going unnoticed for a quarter.

What's the most commonly skipped play?

The utilization play. Teams provision hardware, get something working, and never circle back to check whether they're using what they paid for. It's also the play with the highest return, because reclaiming idle capacity is cheaper than buying more.

Can a small team run this playbook?

Yes, with the roles collapsed onto fewer people. A two-person team might have one engineer owning sizing through scaling and the founder owning the financial plays. The structure matters more than the headcount; even one person benefits from naming the plays and their triggers rather than improvising.

Key Takeaways

Treat compute planning as named plays with triggers, owners, and handoffs, not as ad hoc decisions made under pressure.
The sizing play produces two numbers, a memory floor and a throughput target, that every later play depends on.
Rent-or-buy is a financial play decided over an asset's full life, not a per-hour price comparison.
The utilization play is continuous and the highest-ROI activity; idle GPUs are uncollected refunds.
The monthly cost-review play is the safeguard against silent cost creep and feeds decisions back into rent-or-buy.

Agency Script Editorial
