Opinionated Rules for Tuning Model Randomness

Agency Script Editorial

Editorial Team

June 16, 2023 · 7 min read
Tags: temperature and creativity control, temperature and creativity control best practices, temperature and creativity control guide, prompt engineering

Best-practice lists for model settings tend to dissolve into platitudes: "use the right temperature for your task." True, useless. The practices below are opinionated on purpose. Each one takes a position, and each one explains the reasoning so you can decide when to follow it and when your situation justifies an exception.

These come from watching teams tune sampling controls across very different workloads — extraction pipelines, customer-facing assistants, content generation, code tools. The patterns that hold up across all of them are the ones worth codifying. The patterns that only worked once are not here.

Read these as defaults with rationale, not commandments. The reasoning matters more than the rule, because the reasoning is what survives when your context differs from ours.

Treat the Prompt and the Setting as One System

The single most important practice is to stop thinking of temperature as separate from the prompt. They jointly determine the output.

Why

Sampling operates on the probability distribution your prompt creates. A precise prompt with explicit constraints narrows that distribution, which means temperature has less room to cause trouble. A vague prompt leaves a wide distribution where even a moderate temperature can wander.
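To make the interaction concrete, here is a minimal sketch of temperature-scaled softmax. The logit values are hypothetical stand-ins for what a tight prompt and a vague prompt might produce; they are not measurements from any model. Entropy is used only as a rough gauge of how widely the sampler can wander.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits; higher means a wider spread of likely tokens."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical logits (assumed for illustration only):
# a tight prompt leaves one clear winner, a vague prompt leaves near-ties.
tight_prompt_logits = [8.0, 2.0, 1.0, 0.5]
vague_prompt_logits = [2.0, 1.8, 1.6, 1.5]

for t in (0.2, 0.7, 1.2):
    tight = entropy_bits(softmax_with_temperature(tight_prompt_logits, t))
    vague = entropy_bits(softmax_with_temperature(vague_prompt_logits, t))
    print(f"T={t}: tight={tight:.3f} bits, vague={vague:.3f} bits")
```

With these assumed numbers, the tight distribution stays narrow even at the highest temperature in the loop, while the vague one is already wide at the lowest.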

The Practice

  • Tighten the prompt before reaching for the dial.
  • Re-tune the setting whenever you substantially rewrite the prompt.
  • Treat a prompt-plus-setting pair as the unit you version and document. The step-by-step process is built around this pairing.

Bias Toward the Lower Setting When in Doubt

When two settings produce comparable quality, choose the lower temperature.

Why

Lower temperature means fewer surprises in production. The cost of slightly less variety is almost always smaller than the cost of an occasional output that goes off the rails in front of a user or downstream system. Reliability compounds; novelty rarely does.

The Practice

Default to the conservative side and only push higher when the task genuinely rewards range and a human is curating the results. Make every upward move a conscious decision, not a habit. This is the opposite of the common mistake of cranking temperature reflexively.

Separate Generation From Curation

For creative work, do not try to get one perfect output. Generate several at a higher setting and select.

Why

Creativity is a numbers game. A higher temperature gives you a wider spread of candidates, and the value comes from picking the best, not from any single draw being great. Forcing one shot to be perfect pushes you toward settings that are too cautious for ideation.

The Practice

  • Generate three to five candidates for creative tasks.
  • Let a human or a downstream filter do the selecting.
  • Reserve single-shot, low-temperature generation for tasks with a correct answer. The examples guide shows this split across real scenarios.
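The generate-then-select split can be sketched as a small helper. Everything named here is hypothetical: `generate` stands in for whatever model call you use, `score` stands in for a human pick or a downstream filter, and the stub generator exists only so the sketch runs without a model.

```python
import random
from typing import Callable, List

def generate_and_select(
    generate: Callable[[str, float], str],
    score: Callable[[str], float],
    prompt: str,
    temperature: float = 1.0,
    n_candidates: int = 5,
) -> str:
    """Draw several candidates at one setting, then return the best-scoring one."""
    candidates: List[str] = [
        generate(prompt, temperature) for _ in range(n_candidates)
    ]
    return max(candidates, key=score)

def make_stub_generator(seed: int) -> Callable[[str, float], str]:
    """Stand-in for a model call; a real one would pass temperature to the API."""
    rng = random.Random(seed)
    names = ["Northwind", "Brightline", "Kestrel", "Juniper Lane", "Fieldnote"]

    def generate(prompt: str, temperature: float) -> str:
        return rng.choice(names)  # the stub ignores prompt and temperature

    return generate

def prefer_short(candidate: str) -> float:
    """Hypothetical curation rule: shorter names score higher."""
    return -len(candidate)

best = generate_and_select(
    make_stub_generator(0), prefer_short, "Name a notebook app", 1.0, 5
)
print(best)
```

Keeping the scorer separate from the generator is the point: the generation setting can stay loose because selection, not any single draw, carries the quality bar.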

Tune One Control and Leave the Other Neutral

Adjust temperature or top-p, never both in the same experiment.

Why

The controls compound, so moving both makes any result uninterpretable. You learn nothing reusable when you cannot attribute a change to a cause. Interpretability is what lets a one-time experiment become a durable default.

The Practice

Default to tuning temperature with top-p near 1.0. Reach for top-p only when you specifically need to cut off the low-probability tail of candidate tokens while keeping some variety among the rest, a narrower need than most people assume.
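For readers who want to see what top-p actually does, here is a simplified sketch of nucleus filtering over a toy distribution. The token probabilities are invented, and real implementations differ in details such as tie handling, so treat this as an illustration rather than any provider's exact behavior.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of top tokens whose mass reaches top_p, then renormalize."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break  # everything after this point is the discarded tail
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}

# Toy next-token distribution (assumed values for illustration).
toy_probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}

print(top_p_filter(toy_probs, 1.0))  # neutral: nothing is cut
print(top_p_filter(toy_probs, 0.9))  # the unlikely tail is removed
```

At 1.0 the filter is a no-op, which is why leaving top-p there keeps a temperature experiment interpretable.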

Make Settings Explicit and Shared

Never let settings live only in someone's head or buried in a script.

Why

Invisible settings drift. Two people run the same task with different values and get different quality, and nobody can explain why because the difference is unrecorded. Explicit, shared settings turn an individual's tuning into a team asset.

The Practice

  • Record task, temperature, top-p, prompt version, and date together.
  • Keep them in a shared working checklist.
  • Review the list when onboarding anyone new to the workload.
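One possible shape for such a record, sketched as a small dataclass serialized to JSON. The field names, the example values, and the prompt hash are all assumptions made for illustration; the hash is an optional extra that makes silent prompt edits visible.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class SamplingRecord:
    task: str
    temperature: float
    top_p: float
    prompt_version: str
    prompt_sha256: str  # optional extra (assumption): detects unrecorded prompt edits
    recorded_on: str    # ISO date string

def make_record(task, prompt_text, prompt_version, temperature, top_p, recorded_on):
    """Bundle a prompt and its settings into one documented, versionable unit."""
    digest = hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()
    return SamplingRecord(
        task, temperature, top_p, prompt_version, digest, recorded_on
    )

# Hypothetical example values.
record = make_record(
    task="invoice-field-extraction",
    prompt_text="Extract the invoice number and total as JSON.",
    prompt_version="v3",
    temperature=0.0,
    top_p=1.0,
    recorded_on="2023-06-16",
)
print(json.dumps(asdict(record), indent=2))
```

A JSON file in the shared repository is enough to start; the format matters less than the fact that the values are written down next to the prompt version they belong to.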

Re-Tune on Model Changes, Not on a Calendar

Trigger re-tuning by events, not by the passage of time.

Why

A pinned model version with a stable prompt does not drift, so calendar-based re-tuning wastes effort. But a model upgrade can change how sensitive the outputs are to temperature, quietly invalidating your old default. Event triggers catch the real risks without busywork.

The Practice

Treat any model version change or substantial prompt rewrite as a standing trigger to run a quick sweep. Between those events, leave working settings alone. The foundational guide frames why model behavior, not time, is the variable that matters.
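An event trigger of this kind can be as simple as comparing what you tuned against with what you are running now. The sketch below assumes you bump the prompt version only for substantial rewrites; the type name, function name, and version strings are hypothetical.

```python
from typing import List, NamedTuple

class TunedContext(NamedTuple):
    model_version: str
    prompt_version: str

def retune_triggers(tuned: TunedContext, current: TunedContext) -> List[str]:
    """Return the reasons a sweep is due; an empty list means leave settings alone."""
    reasons = []
    if tuned.model_version != current.model_version:
        reasons.append(
            f"model changed: {tuned.model_version} -> {current.model_version}"
        )
    if tuned.prompt_version != current.prompt_version:
        reasons.append(
            f"prompt changed: {tuned.prompt_version} -> {current.prompt_version}"
        )
    return reasons

# Hypothetical version labels.
tuned_against = TunedContext(model_version="model-2023-03", prompt_version="v3")
running_now = TunedContext(model_version="model-2023-06", prompt_version="v3")

for reason in retune_triggers(tuned_against, running_now):
    print("re-tune:", reason)
```

Note that nothing in the check looks at a date, which is the whole argument of this section in code form.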

Match the Number of Samples to the Stakes

How many outputs you generate is itself a sampling decision, and it interacts with temperature more than most people realize.

Why

At a low temperature, generating multiple samples buys you little because the outputs barely differ. At a high temperature, a single sample is a gamble — you might draw the brilliant option or the weak one. The right number of samples is a function of how much variety the setting produces and how costly a miss is.

The Practice

  • For deterministic tasks, one sample is enough; more is waste.
  • For creative tasks at high temperature, generate three to five and curate.
  • For high-stakes single answers, prefer a lower temperature with one sample over a high temperature with selection, because selection still leaves room for a confidently wrong pick to slip through.

This is the operational side of separating generation from curation, applied to the question of how many draws to take.
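A rough way to reason about sample counts is the chance that at least one draw is acceptable. The sketch below assumes every draw is independent and has the same per-draw hit rate, which is a simplification: at low temperature the draws are near-duplicates, so extra samples help far less than this formula suggests. The hit rate used is hypothetical.

```python
def chance_of_at_least_one_hit(p_single: float, n_samples: int) -> float:
    """P(at least one acceptable draw) = 1 - (1 - p)^n, assuming independent draws."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return 1.0 - (1.0 - p_single) ** n_samples

# Hypothetical per-draw hit rate for a creative task at high temperature.
for n in (1, 3, 5, 10):
    print(n, round(chance_of_at_least_one_hit(0.3, n), 3))
```

Under that assumed 30% hit rate, five draws lift the chance of getting at least one keeper to roughly 83%, and the gains flatten after that, which is consistent with the three-to-five range above.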

Prefer Reversible Settings During Exploration

When you are still learning a task, choose settings you can change cheaply over settings baked into hard-to-touch infrastructure.

Why

Early tuning is iterative by nature. If your setting is buried in a gateway policy or a deployed config that takes a release to change, you slow your own learning loop. Keeping settings adjustable during exploration lets you run the sweeps that actually teach you the task.

The Practice

Tune in a place where you can change the number in seconds, lock in the result, and only then promote it to a more permanent home. Keeping the feedback loop fast is what makes the rest of these practices feasible to apply in the first place.
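One cheap way to keep the number adjustable during exploration is to read it from an environment variable with a conservative fallback. This is a sketch under assumptions: the variable name and default are invented, and the only validation is that the value is non-negative, since the allowed upper bound differs between providers.

```python
import os

DEFAULT_TEMPERATURE = 0.2  # hypothetical conservative default

def load_temperature(env=None, name="SAMPLING_TEMPERATURE"):
    """Read the temperature from the environment, falling back to the default."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return DEFAULT_TEMPERATURE
    value = float(raw)  # raises ValueError on non-numeric input
    if value < 0:
        raise ValueError(f"{name} must be non-negative, got {value}")
    return value

# Passing a plain dict keeps the example independent of the real environment.
print(load_temperature({}))
print(load_temperature({"SAMPLING_TEMPERATURE": "0.9"}))
```

Once a sweep settles on a value, the same number can be promoted into the deployed config, and the environment override can be removed or left in place for future sweeps.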

Distinguish Stylistic Variety From Substantive Variety

Not all variety is the same, and conflating the two leads to the wrong setting.

Why

Some tasks want variety in phrasing while keeping the substance fixed, such as three ways to word the same correct answer. Others want variety in substance: genuinely different ideas. Temperature produces both kinds at once, which means a setting high enough to vary the phrasing often introduces unwanted substantive drift in a task that only wanted stylistic variety.

The Practice

When you only need rephrasing of a fixed answer, prefer a low temperature with an explicit instruction to vary the wording, rather than a high temperature that risks changing the meaning. Reserve high temperature for tasks where you genuinely want the substance to range. This distinction is one of the quieter reasons two reasonable people pick very different settings for tasks that sound similar, a point the examples guide illustrates across concrete cases.

Frequently Asked Questions

What single practice matters most?

Treating the prompt and the setting as one system. Most sampling problems are actually prompt problems wearing a temperature costume. Fix the instruction first and many tuning headaches disappear.

Why bias toward lower temperature specifically?

Because the downside of less variety is usually mild, while the downside of an unpredictable output can be severe — a broken integration, a wrong answer to a user, an off-brand message. Conservative defaults protect you where it counts.

When should I generate multiple candidates instead of one?

Whenever the task is creative and a human or filter will curate. Brainstorming, naming, and copy variations all benefit from a spread of candidates. Tasks with a correct answer should stay single-shot and low-temperature.

Is calendar-based re-tuning ever worth it?

Rarely. Stable models and prompts do not drift on their own. Tie re-tuning to model upgrades and prompt rewrites instead, which is where the actual risk of a stale setting lives.

How detailed should my documentation be?

Enough to reproduce the decision: task, temperature, top-p, prompt version, and date. That is sufficient for someone else to understand and rerun your tuning without guessing.

Key Takeaways

  • The prompt and the setting are one system; tighten the instruction before adjusting the dial.
  • When quality is comparable, bias toward the lower temperature for production reliability.
  • Separate generation from curation: produce several candidates for creative work and select.
  • Tune one control at a time, document settings in a shared place, and make every upward move deliberate.
  • Re-tune on model upgrades and prompt rewrites, not on a calendar.
