AGENCYSCRIPT

Governed Certification Framework

The operating system for AI-enabled agency building. Certify judgment under constraint. Standards over scale. Governance over shortcuts.

© 2026 Agency Script, Inc.

Standards over scale. Judgment over volume. Governance over shortcuts.

General

Sizing GPUs the Fifth Time Should Be Routine, Not Research


Agency Script Editorial

Editorial Team

May 31, 2025 · 7 min read
Tags: ai compute and gpu requirements, ai compute and gpu requirements workflow, ai compute and gpu requirements guide, ai fundamentals

The first time a team sizes GPUs, it's research. The fifth time, it should be a workflow. If every new model still triggers a panicked round of VRAM math and vendor comparisons, you haven't built a process; you've just survived the same problem five times. The difference between those two states is documentation and repeatability.

This article is about converting compute planning into a workflow that survives handoffs. The test is simple: could a new engineer, handed your documentation, size and provision a workload correctly without tapping the person who did it last time? If not, you have tribal knowledge, and tribal knowledge evaporates when people leave.

We'll build the workflow in stages: capture the inputs, standardize the decisions, document the handoff, and close the loop with review. For the underlying concepts each stage relies on, see the companion reference, A Step-by-Step Approach to AI Compute and GPU Requirements.

Stage 1: Standardize the intake

A repeatable workflow starts with a repeatable input. Every compute request should arrive in the same shape, captured in the same template, so no one is guessing what's missing.

The intake template

Require these fields for every request:

  • Model name and parameter count.
  • Target precision (16-bit, 8-bit, 4-bit).
  • Workload type: inference, fine-tuning, or training.
  • Expected concurrency and latency target.
  • Duration: one-off experiment or ongoing service.
  • Budget ceiling.
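
As a minimal sketch, the template can be enforced in code so that an incomplete request is rejected at intake rather than discovered during sizing. The Python below assumes a hypothetical `ComputeRequest` record whose field names and units (latency in seconds, budget in US dollars) are illustrative; adapt them to whatever form or ticket system actually carries your requests.

```python
from dataclasses import dataclass

VALID_PRECISIONS = {16, 8, 4}
VALID_WORKLOADS = {"inference", "fine-tuning", "training"}
VALID_DURATIONS = {"one-off", "ongoing"}


@dataclass(frozen=True)
class ComputeRequest:
    """One compute request (hypothetical schema); every intake field is mandatory."""

    model_name: str
    params_billions: float     # parameter count, in billions
    precision_bits: int        # 16, 8, or 4
    workload: str              # "inference", "fine-tuning", or "training"
    concurrency: int           # expected simultaneous requests
    latency_target_s: float    # latency target, in seconds
    duration: str              # "one-off" experiment or "ongoing" service
    budget_ceiling_usd: float  # budget ceiling (unit is an assumption)

    def __post_init__(self) -> None:
        # Collect every problem at once so the requester can fix them in one pass.
        problems = []
        if not self.model_name.strip():
            problems.append("model_name is required")
        if self.params_billions <= 0:
            problems.append("params_billions must be positive")
        if self.precision_bits not in VALID_PRECISIONS:
            problems.append(f"precision_bits must be one of {sorted(VALID_PRECISIONS)}")
        if self.workload not in VALID_WORKLOADS:
            problems.append(f"workload must be one of {sorted(VALID_WORKLOADS)}")
        if self.concurrency < 1:
            problems.append("concurrency must be at least 1")
        if self.latency_target_s <= 0:
            problems.append("latency_target_s must be positive")
        if self.duration not in VALID_DURATIONS:
            problems.append(f"duration must be one of {sorted(VALID_DURATIONS)}")
        if self.budget_ceiling_usd <= 0:
            problems.append("budget_ceiling_usd must be positive")
        if problems:
            raise ValueError("; ".join(problems))


# A complete request is accepted; an incomplete one raises with every gap listed.
request = ComputeRequest("example-7b", 7.0, 16, "inference", 8, 2.0, "ongoing", 2000.0)
```

Because the check runs when the record is created, the requester gets the full list of gaps in one message instead of one round-trip per missing field.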

When intake is standardized, the person sizing the workload never has to chase down basics. The template does the chasing. This single change eliminates most of the back-and-forth that makes ad hoc planning slow.

Stage 2: Encode the sizing logic

The math for memory and throughput shouldn't live in someone's head. Encode it as a documented procedure or, better, a small calculator that takes the intake fields and outputs a memory floor and a throughput target.

  • Memory floor: roughly 2 GB per billion parameters at 16-bit (about 1 GB at 8-bit and 0.5 GB at 4-bit), plus 25 to 40 percent overhead.
  • Throughput target: derived from concurrency and latency requirements.
  • Training multiplier: 4x to 6x for full tuning, far less for parameter-efficient methods.
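
The three bullets translate into a small calculator. This sketch applies the rules of thumb exactly as stated above and returns ranges rather than single numbers. The function names, the `tokens_per_response` input, and the throughput formula (every concurrent request must finish a full response within the latency target) are illustrative assumptions, and parameter-efficient tuning is left out because no figure is given for it.

```python
GB_PER_BILLION_AT_16_BIT = 2.0        # 2 bytes per parameter
OVERHEAD_RANGE = (1.25, 1.40)         # +25% to +40% on top of the weights
FULL_TUNING_MULTIPLIER = (4.0, 6.0)   # 4x to 6x for full tuning


def memory_floor_gb(params_billions: float, precision_bits: int,
                    workload: str) -> tuple[float, float]:
    """Return a (low, high) memory floor in GB for one copy of the model."""
    # Scale the 16-bit figure linearly: 8-bit halves it, 4-bit quarters it.
    weights_gb = params_billions * GB_PER_BILLION_AT_16_BIT * precision_bits / 16
    low = weights_gb * OVERHEAD_RANGE[0]
    high = weights_gb * OVERHEAD_RANGE[1]
    if workload in ("fine-tuning", "training"):
        # Assumes full tuning; parameter-efficient methods need far less.
        low *= FULL_TUNING_MULTIPLIER[0]
        high *= FULL_TUNING_MULTIPLIER[1]
    return low, high


def throughput_target(concurrency: int, latency_target_s: float,
                      tokens_per_response: int) -> float:
    """Tokens per second needed if every concurrent request must finish in time."""
    # Hypothetical model: all concurrent requests generate their full response
    # within the latency target, so the aggregate rate must cover all of them.
    return concurrency * tokens_per_response / latency_target_s


low, high = memory_floor_gb(7, 16, "inference")
print(f"memory floor: {low:.1f}-{high:.1f} GB")
print(f"throughput target: {throughput_target(8, 2.0, 200):.0f} tokens/s")
```

For example, a 7-billion-parameter model served at 16-bit starts from 14 GB of weights, so the memory floor comes out at roughly 17.5 to 19.6 GB; eight concurrent users each expecting a 200-token answer within 2 seconds imply a target of about 800 tokens per second.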

The point isn't precision to the gigabyte. The point is that two different engineers, given the same intake, produce the same sizing. That consistency is what makes the workflow trustworthy. The common errors this stage prevents are catalogued in 7 Common Mistakes with AI Compute and GPU Requirements (and How to Avoid Them).

Stage 3: Make the hardware decision a lookup

Once you have a memory floor and throughput target, choosing hardware should be a lookup, not a debate. Maintain a current table mapping requirement ranges to recommended options, with the rent-or-buy guidance attached.

The decision table

  • Small models, low concurrency: consumer-class card or CPU, rent first.
  • Mid-size models, moderate load: high-memory consumer or entry data-center card.
  • Large models or high concurrency: data-center card, evaluate owning if utilization is sustained.
  • Training at scale: multi-GPU node with fast interconnect.
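
Encoded as data, the table becomes a first-match lookup. The numeric bands below are placeholders invented for illustration, since the tiers above are described only qualitatively; the point of the sketch is the shape (ordered rows, first match wins, an explicit training-at-scale branch), not the thresholds.

```python
from typing import NamedTuple


class Row(NamedTuple):
    max_memory_gb: float       # largest memory floor this row covers
    max_concurrency: float     # largest expected concurrency this row covers
    recommendation: str


# Placeholder bands for illustration only; revise them at the monthly review.
DECISION_TABLE = [
    Row(16, 4, "Consumer-class card or CPU; rent first"),
    Row(48, 32, "High-memory consumer or entry data-center card"),
    Row(float("inf"), float("inf"),
        "Data-center card; evaluate owning if utilization is sustained"),
]
TRAINING_AT_SCALE = "Multi-GPU node with fast interconnect"


def recommend(memory_floor_gb: float, concurrency: int,
              training_at_scale: bool = False) -> str:
    """Return the recommendation from the first row whose bands cover the request."""
    if training_at_scale:
        return TRAINING_AT_SCALE
    for row in DECISION_TABLE:
        if memory_floor_gb <= row.max_memory_gb and concurrency <= row.max_concurrency:
            return row.recommendation
    # Only reachable if the open-ended last row is removed.
    raise LookupError("no row covers this request; add a band to the table")


print(recommend(19.6, 8))
```

Because rows are checked in order, a small model with high concurrency falls through to the data-center tier, which matches the "large models or high concurrency" line of the table.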

Update this table when prices or cards shift, ideally during the monthly cost review. The table turns a recurring research task into a five-minute reference check.

Stage 4: Document the provisioning steps

Provisioning is where undocumented workflows leak. The person who set up the last environment knows the quirks; nobody else does. Write the steps down as an executable runbook.

  • The exact provisioning commands or console steps.
  • Monitoring setup for utilization, memory, temperature, and cost.
  • Budget alerts and quotas.
  • Validation: a quick test confirming the GPU is reachable, the model loads, and utilization climbs under load.
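
The validation bullet can be partly scripted. This sketch assumes NVIDIA GPUs with `nvidia-smi` on the path and uses its `--query-gpu` and `--format=csv,noheader,nounits` options to read utilization, memory, and temperature. The pass thresholds (`min_util_pct`, `min_mem_used_mib`) are hypothetical, and the script does not generate load itself; run it while your own load test is in progress.

```python
import subprocess

QUERY_FIELDS = "utilization.gpu,memory.used,memory.total,temperature.gpu"


def read_gpu_stats() -> str:
    """Query every visible GPU; needs an NVIDIA driver, so it is not run below."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_gpu_stats(csv_text: str) -> list[dict[str, int]]:
    """Turn one CSV line per GPU into a dict; '[N/A]' fields raise ValueError."""
    gpus = []
    for line in csv_text.strip().splitlines():
        util, mem_used, mem_total, temp = (int(v) for v in line.split(","))
        gpus.append({"util_pct": util, "mem_used_mib": mem_used,
                     "mem_total_mib": mem_total, "temp_c": temp})
    return gpus


def validate(gpus: list[dict[str, int]], min_util_pct: int = 50,
             min_mem_used_mib: int = 1024) -> list[str]:
    """Return failure messages; an empty list means the environment passed."""
    if not gpus:
        return ["no GPU reachable"]
    failures = []
    for i, gpu in enumerate(gpus):
        if gpu["mem_used_mib"] < min_mem_used_mib:
            failures.append(f"GPU {i}: memory barely used, model may not have loaded")
        if gpu["util_pct"] < min_util_pct:
            failures.append(f"GPU {i}: utilization did not climb under load")
    return failures


# Illustrative input in the CSV shape nvidia-smi emits; not captured from a real host.
sample = "87, 20123, 24576, 65\n3, 0, 24576, 40\n"
print(validate(parse_gpu_stats(sample)))
```

Keeping the parser separate from the `nvidia-smi` call means the pass criteria can be reviewed and tested without access to a GPU host.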

Better still, capture as much of this as code. Infrastructure defined in configuration files is self-documenting and reproducible in a way that a wiki page never is.

Stage 5: Build the optimization loop

A workflow that stops at provisioning is incomplete. The most expensive recurring failure is idle hardware, and catching it requires a standing loop, not a one-time check.

The recurring loop

  • Weekly: review utilization dashboards; flag anything chronically below 70 percent.
  • On each flag: profile to find the bottleneck (data loading, batch size, preprocessing, synchronization) and tune.
  • Monthly: reconcile cost against forecast; downsize or shut down underused resources.
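
The weekly and monthly checks reduce to two small functions over numbers your dashboards already export. In this sketch, "chronically" is interpreted as three consecutive weekly averages below 70 percent, and the 10 percent cost tolerance is likewise an assumption; both are parameters so the owner of the loop can tune them.

```python
UTILIZATION_FLOOR_PCT = 70.0
CHRONIC_WEEKS = 3          # assumption: what "chronically" means
COST_TOLERANCE = 0.10      # assumption: acceptable deviation from forecast


def weekly_flags(history: dict[str, list[float]],
                 floor_pct: float = UTILIZATION_FLOOR_PCT,
                 weeks: int = CHRONIC_WEEKS) -> list[str]:
    """Resources whose last `weeks` weekly averages were all below the floor."""
    flagged = []
    for name, weekly_avgs in history.items():
        recent = weekly_avgs[-weeks:]
        # Too little history is not yet "chronic", so it is not flagged.
        if len(recent) == weeks and all(pct < floor_pct for pct in recent):
            flagged.append(name)
    return sorted(flagged)


def monthly_variances(actual: dict[str, float], forecast: dict[str, float],
                      tolerance: float = COST_TOLERANCE) -> dict[str, float]:
    """Relative cost variance for each resource outside the tolerance."""
    out = {}
    for name, cost in actual.items():
        planned = forecast.get(name)
        if not planned:
            out[name] = float("inf")   # unforecast spend always needs a look
            continue
        variance = (cost - planned) / planned
        if abs(variance) > tolerance:
            out[name] = variance
    return out


history = {"gpu-a": [85, 90, 88], "gpu-b": [40, 55, 60], "gpu-c": [65, 72, 50]}
print(weekly_flags(history))
print(monthly_variances({"gpu-a": 1200, "gpu-b": 950}, {"gpu-a": 1000, "gpu-b": 1000}))
```

Variances are flagged in both directions; deciding whether a flagged resource should be tuned, downsized, or shut down stays with the loop's owner.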

Document who runs this loop and on what cadence. An optimization step with no owner is an optimization step that doesn't happen.

Stage 6: Make it hand-off-able

The final stage is the test of whether you actually have a workflow. Bundle the intake template, the sizing calculator, the decision table, the provisioning runbook, and the optimization loop into one place a new person can find.

  • Store everything in a single, discoverable location, not scattered across chats and notebooks.
  • Include a worked example: one real request taken end to end.
  • Note the failure modes and how the workflow caught them.

A worked example does more teaching than any abstract description. When you can hand someone the bundle plus one example and they can run the next request unaided, the workflow is real. For inspiration on what good documentation of real cases looks like, see Case Study: AI Compute and GPU Requirements in Practice.

Common failure modes the workflow prevents

It helps to be explicit about what goes wrong without a workflow, because those failures are what the structure is buying you protection against.

  • The repeated research tax. Every new model triggers the same VRAM math from scratch because nobody wrote it down. The encoded sizing logic in Stage 2 eliminates this entirely.
  • The handoff cliff. The one engineer who knows how to provision leaves, and provisioning becomes a multi-day archaeology project. The runbook in Stage 4 turns that into a checklist anyone can follow.
  • The silent idle GPU. Hardware runs at a fraction of capacity for months because no one is watching utilization. The optimization loop in Stage 5 catches it within a week.
  • The inconsistent estimate. Two engineers size the same workload differently and one is badly wrong. Standardized intake plus encoded math forces convergence.

Naming these failures in your documentation does double duty: it justifies the workflow to skeptics and tells the next person what each stage is actually for.

Frequently Asked Questions

How much should I document versus automate?

Automate the parts that are mechanical and repetitive: the sizing math, the provisioning steps, the monitoring setup. Document the parts that require judgment: the rent-or-buy reasoning, the trade-offs behind the decision table. The goal is that mechanical steps run themselves and judgment steps are at least guided by written reasoning rather than improvised each time.

What's the minimum viable version of this workflow?

An intake template and a sizing procedure. Those two artifacts alone eliminate most of the chaos, because they standardize what goes in and how it's evaluated. You can add the decision table, provisioning runbook, and optimization loop as the workflow matures, but the intake-plus-sizing pair is the irreducible core.

How do I keep the workflow from going stale?

Tie its maintenance to a recurring event you already run, like the monthly cost review. During that review, update the decision table with current prices and cards, and check whether any provisioning steps have changed. A workflow maintained on a schedule stays current; one maintained only when it breaks is always slightly wrong.
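
A simple guard against staleness is to record when each shared artifact was last reviewed and list the ones that have missed a cycle. This sketch assumes a 31-day window to match a monthly review; the artifact names and dates in the usage lines are hypothetical.

```python
from datetime import date

MAX_AGE_DAYS = 31  # assumption: one monthly review cycle


def stale_artifacts(last_reviewed: dict[str, date], today: date,
                    max_age_days: int = MAX_AGE_DAYS) -> list[str]:
    """Names of artifacts whose last review is older than the window."""
    return sorted(
        name for name, reviewed in last_reviewed.items()
        if (today - reviewed).days > max_age_days
    )


reviews = {
    "decision table": date(2025, 4, 1),
    "provisioning runbook": date(2025, 5, 20),
}
print(stale_artifacts(reviews, today=date(2025, 5, 31)))
```

Running this as the first item of the monthly cost review makes the review itself the reminder to update whatever it lists.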

Who owns the workflow?

Someone has to, or it decays. Usually a platform or infrastructure lead owns the documentation and the decision table, while individual engineers own running the workflow for their own requests. The owner's job is keeping the shared artifacts accurate, not personally sizing every workload.

Does a small team really need this?

A small team needs it more, because a small team can't afford the time lost re-researching the same questions. The workflow doesn't have to be elaborate; even a single shared document with the intake template, sizing math, and a decision table saves hours and prevents the worst sizing mistakes.

Key Takeaways

  • A repeatable workflow starts with a standardized intake template so no request arrives missing the basics.
  • Encode the sizing math so any engineer produces the same memory floor and throughput target from the same inputs.
  • Turn hardware selection into a lookup table maintained on a schedule, not a recurring debate.
  • Document provisioning as an executable runbook, ideally as infrastructure-as-code that is self-documenting.
  • Close the loop with a weekly utilization review and monthly cost reconciliation, each with a named owner.

