
Opinionated Habits for Structured Output That Holds Up

Agency Script Editorial

Editorial Team

February 6, 2024 · 8 min read

There is no shortage of structured-output advice that amounts to "use a schema and validate the output." True, but useless—it tells you what without why, and the why is where the judgment lives. This piece takes positions. Each practice below comes with the reasoning that justifies it, so you can adapt it rather than cargo-cult it.

These are habits drawn from running structured output at volume, where the difference between a good practice and a generic one shows up in your error logs. Where a practice trades something off, we say so. The goal is not a tidy checklist of platitudes but a set of defensible defaults you can argue with.

If you want the ordered build sequence rather than the principles behind it, the step-by-step approach is the companion piece. This one is about judgment.

Treat the Schema as Code, Not Configuration

The strongest single practice is to define your schema as a typed object in your codebase—Pydantic, Zod, or equivalent—and derive everything else from it.

Why It Wins

When the schema is code, your validator and your model instruction come from the same place and cannot drift apart. You get autocomplete, type checking, and a single file to change when requirements shift. Treating the schema as a string pasted into a prompt forfeits all of that and guarantees eventual divergence between what you ask for and what you accept.

The trade-off is a little upfront setup. It pays back the first time a requirement changes and you update one definition instead of hunting for three.
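
As a minimal sketch of the single-source idea, the example below uses only the Python standard library rather than Pydantic or Zod, so it runs without extra installs. The Ticket class, its fields, and the helper names prompt_instruction and validate are hypothetical, chosen for illustration. A real project would more likely lean on its validation library's own schema export.

```python
from dataclasses import dataclass, fields
from typing import Literal, get_args, get_origin, get_type_hints


@dataclass
class Ticket:
    """Hypothetical schema; the prompt text and the validator both derive from it."""
    summary: str
    priority: Literal["low", "normal", "urgent"]
    estimated_hours: float


def describe_type(tp) -> str:
    """Render an annotation as text a prompt can carry."""
    if get_origin(tp) is Literal:
        return "one of " + ", ".join(repr(v) for v in get_args(tp))
    return tp.__name__


def prompt_instruction(cls) -> str:
    """Build the model instruction from the dataclass fields."""
    hints = get_type_hints(cls)
    lines = [f"- {f.name}: {describe_type(hints[f.name])}" for f in fields(cls)]
    return "Return a JSON object with exactly these fields:\n" + "\n".join(lines)


def validate(cls, data: dict):
    """Check a parsed response against the same definition, then construct it."""
    hints = get_type_hints(cls)
    if set(data) != set(hints):
        raise ValueError(f"expected fields {sorted(hints)}, got {sorted(data)}")
    for name, tp in hints.items():
        value = data[name]
        if get_origin(tp) is Literal:
            ok = value in get_args(tp)
        elif tp is float:
            # JSON numbers may arrive as int; bool is excluded on purpose.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            ok = isinstance(value, tp)
        if not ok:
            raise ValueError(f"{name}: {value!r} does not match {describe_type(tp)}")
    return cls(**data)
```

Adding a field to Ticket changes both the instruction and the validator in one edit, which is the property the practice is after.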

Scope Every Schema as Tightly as Possible

Ask the model for the minimum number of fields your application actually consumes. Not the fields you might want someday—the ones you use today.

Why It Wins

Every field is a surface for error and a draw on the model's attention. Smaller schemas produce higher per-field accuracy and cost fewer tokens. A speculative field you do not yet consume adds risk and cost for zero benefit. When you genuinely need many fields, splitting into focused calls often beats one sprawling request, because each call lets the model concentrate.

The trade-off is more calls. Usually worth it; measure if you are unsure.
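
One way to do that measuring, sketched under the assumption that you have a small set of hand-labeled examples: score each field separately, once for the single large call and once for the split calls. The function name per_field_accuracy and the sample rows are invented to exercise the code. They are not measurements.

```python
from collections import defaultdict


def per_field_accuracy(predictions: list, expected: list) -> dict:
    """Return the fraction of labeled examples each field got exactly right."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, gold in zip(predictions, expected):
        for field_name, gold_value in gold.items():
            total[field_name] += 1
            if pred.get(field_name) == gold_value:
                correct[field_name] += 1
    return {name: correct[name] / total[name] for name in total}


# Toy rows, invented for illustration only.
labeled = [
    {"intent": "refund", "priority": "urgent"},
    {"intent": "question", "priority": "low"},
]
single_call_output = [
    {"intent": "refund", "priority": "low"},
    {"intent": "question", "priority": "low"},
]

print(per_field_accuracy(single_call_output, labeled))
```

Run the same function over the outputs of the split calls and compare field by field. If the weak fields do not improve, the extra calls are not earning their cost.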

Validate Semantics, Always, Separately

Structural validation and semantic validation are different jobs and you need both. Run the schema validator to confirm shape, then run your own checks for meaning.

Why It Wins

Schema enforcement and structural validation cannot encode your business rules. A discount can be a valid number and still exceed your company's maximum. A date can be well-formed and still fall in the past when it must be in the future. Only domain-specific validation catches these, and they are exactly the errors that look clean and slip through. The common mistakes piece shows how often skipped semantic checks become production incidents.
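
The two jobs can be kept as two functions, as in this sketch. The field names discount_percent and start_date, the 30 percent cap, and the function names are assumptions made up for the example. Your own rules will differ.

```python
from datetime import date

MAX_DISCOUNT_PERCENT = 30.0  # hypothetical business rule


def check_structure(data: dict) -> list:
    """Shape only: right types, parseable date. No business meaning here."""
    errors = []
    discount = data.get("discount_percent")
    if isinstance(discount, bool) or not isinstance(discount, (int, float)):
        errors.append("discount_percent: must be a number")
    try:
        date.fromisoformat(data.get("start_date"))
    except (TypeError, ValueError):
        errors.append("start_date: must be an ISO date string (YYYY-MM-DD)")
    return errors


def check_semantics(data: dict, today: date) -> list:
    """Meaning only: assumes check_structure already passed."""
    errors = []
    if not 0 <= data["discount_percent"] <= MAX_DISCOUNT_PERCENT:
        errors.append(
            f"discount_percent: {data['discount_percent']} is outside 0 to {MAX_DISCOUNT_PERCENT}"
        )
    if date.fromisoformat(data["start_date"]) <= today:
        errors.append("start_date: must be in the future")
    return errors


def validate_offer(data: dict, today: date) -> list:
    """Run structure first; run semantics only on well-shaped data."""
    errors = check_structure(data)
    return errors if errors else check_semantics(data, today)
```

Passing today in as an argument, rather than calling date.today() inside, keeps the semantic check deterministic and easy to test.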

Put the Reasoning in Field Descriptions

Use the description field of your schema to explain, in plain language, what belongs in each field and how to resolve edge cases.

Why It Wins

Descriptions are prompt instructions that travel with the schema, so they stay attached to the exact field they govern. A good description on an enum field—"use 'urgent' only when the customer mentions a deadline within 24 hours"—does more for accuracy than a paragraph of general prompt instructions, because it is anchored to the decision it affects.

The trade-off is a slightly larger schema and a few more tokens. Negligible against the accuracy gain.
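
Here is roughly what that looks like in a JSON Schema written as a Python dict, using the standard description keyword. The schema itself, the wording of the descriptions, and the missing_descriptions helper are illustrative assumptions. The helper is one possible way to catch fields that shipped without guidance.

```python
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {
            "type": "string",
            "description": "One sentence stating what the customer wants.",
        },
        "priority": {
            "type": "string",
            "enum": ["low", "normal", "urgent"],
            # The edge-case rule lives on the field it governs.
            "description": (
                "Use 'urgent' only when the customer mentions a deadline "
                "within 24 hours. If no deadline is mentioned, use 'normal'."
            ),
        },
        "order_id": {"type": "string"},  # deliberately left undescribed
    },
    "required": ["summary", "priority"],
}


def missing_descriptions(schema: dict) -> list:
    """List property names that have no non-empty description."""
    return [
        name
        for name, spec in schema.get("properties", {}).items()
        if not spec.get("description", "").strip()
    ]


print(missing_descriptions(TICKET_SCHEMA))
```

A check like this can run in a unit test, so an undescribed field fails the build instead of quietly lowering accuracy.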

Retry With Context, Not Blindly

When validation fails, do not simply re-run the identical request. Feed the specific error back to the model.

Why It Wins

A blind retry asks the model to make the same mistake again with the same information, so it often does. A retry that says "your previous answer set status to 'pending', but the allowed values are 'open' and 'closed'" gives the model what it needs to correct course. Most fixable failures resolve on the first informed retry, which keeps your fallback path rare and cheap.

The trade-off is slightly more complex retry code. It is the difference between a pipeline that recovers and one that just fails twice.
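
A sketch of an informed retry loop follows. The call_model parameter stands in for whatever client you use and is an assumption, as are the status field, the function names, and fake_model, which exists only so the example runs without a provider.

```python
import json
from typing import Callable, Optional

ALLOWED_STATUS = ("open", "closed")


def find_error(raw: str) -> Optional[str]:
    """Return a message the model can act on, or None if the output is valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"your previous answer was not valid JSON ({exc.msg})"
    if not isinstance(data, dict):
        return "your previous answer was not a JSON object"
    status = data.get("status")
    if status not in ALLOWED_STATUS:
        return (
            f"your previous answer set status to {status!r}, "
            "but the allowed values are 'open' and 'closed'"
        )
    return None


def call_with_informed_retry(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Retry with the specific validation error appended to the original prompt."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        error = find_error(raw)
        if error is None:
            return json.loads(raw)
        current_prompt = f"{prompt}\n\nCorrection: {error}. Return corrected JSON only."
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


def fake_model(prompt: str) -> str:
    """Stand-in for a real model: wrong at first, right once told why."""
    return '{"status": "open"}' if "Correction:" in prompt else '{"status": "pending"}'


print(call_with_informed_retry(fake_model, "Classify this ticket."))
```

The cap on attempts matters as much as the feedback: when the informed retry also fails, the request should fall through to your fallback path rather than loop.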

Measure Failures Over Time

Log every validation failure and every retry, then review the logs weekly.

Why It Wins

Structured-output quality is tunable, but only if you can see where it breaks. The logs reveal which fields the model struggles with, which descriptions need rewriting, and whether a cheaper model would suffice for the easy cases. Teams that skip measurement keep paying for the same recurring failures because they never see the pattern. The framework for structured output builds this feedback loop in as a stage rather than an afterthought.
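
For the logging side, one possible record shape is the failing field, the value the model produced, and whether the retry recovered. The FailureRecord class, the function names, and the sample records below are hypothetical. Where the lines get written (file, log service, database) is left open.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass
class FailureRecord:
    field_name: str        # which field failed validation
    value: object          # what the model actually produced
    retry_succeeded: bool  # did the informed retry fix it?


def to_log_line(record: FailureRecord) -> str:
    """Serialize one failure as a JSON line for later review."""
    return json.dumps(asdict(record), default=str)


def summarize_failures(records: list) -> dict:
    """Group failures by field, worst first, with the retry recovery rate."""
    failures = Counter(r.field_name for r in records)
    recovered = Counter(r.field_name for r in records if r.retry_succeeded)
    return {
        name: {"failures": count, "retry_success_rate": recovered[name] / count}
        for name, count in failures.most_common()
    }


# Invented sample records, for illustration only.
week = [
    FailureRecord("priority", "pending", True),
    FailureRecord("priority", "asap", False),
    FailureRecord("due_date", "tomorrow", True),
]
print(summarize_failures(week))
```

A field that fails often and recovers rarely is the first candidate for a rewritten description or a narrower schema.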

Prefer the Strongest Enforcement, Then Verify Anyway

Use the strictest enforcement your provider offers—but never treat it as a license to skip validation.

Why It Wins

Strong enforcement dramatically reduces malformed output, which makes your pipeline faster and cheaper because retries become rare. But enforcement is not infallible across every edge case and provider, and it says nothing about semantics. The belt-and-suspenders posture—strong enforcement plus full validation—costs little and removes a whole category of late-night surprises. The tooling survey covers which providers offer the strongest guarantees.
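
To make "verify anyway" concrete, the sketch below parses the raw response defensively even when enforcement is switched on. One case it covers is a response cut short before the JSON closes, which can happen, for example, when an output length limit is reached. The function name parse_untrusted and the required-field set are assumptions for illustration.

```python
import json
from typing import Optional, Tuple


def parse_untrusted(raw: str, required: frozenset) -> Tuple[Optional[dict], Optional[str]]:
    """Return (data, None) on success or (None, reason) on any structural problem."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"malformed JSON at character {exc.pos}: {exc.msg}"
    if not isinstance(data, dict):
        return None, "expected a JSON object"
    missing = sorted(required - set(data))
    if missing:
        return None, f"missing fields: {missing}"
    return data, None


REQUIRED = frozenset({"status"})

# A complete response and a truncated one, both treated the same way.
print(parse_untrusted('{"status": "open"}', REQUIRED))
print(parse_untrusted('{"status": "open", "note": "cust', REQUIRED))
```

Returning a reason instead of raising keeps the caller's control flow simple: a non-empty reason goes straight into the failure log or the retry prompt.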

The Underlying Principle

Every practice here flows from one stance: the model's output is untrusted input until your code has verified it. That stance dictates the single schema source, the tight scoping, the mandatory semantic validation, the informed retries, and the measurement. Adopt the stance and the practices follow naturally; adopt the practices without the stance and you will skip the inconvenient ones.

Frequently Asked Questions

Is it really worth splitting a schema into multiple calls?

Often, yes, when accuracy on a large schema is suffering. Each call lets the model concentrate on fewer fields, which raises per-field accuracy. The trade-off is more requests and a bit more orchestration. Measure your per-field accuracy before and after splitting; if it improves meaningfully, the extra calls are justified.

Why not just trust strict schema enforcement and skip validation?

Because enforcement guarantees structure, not meaning, and it is not infallible across every provider and edge case. Validation is cheap and catches the semantic errors that enforcement cannot see by construction. The combined cost is low and the downside it prevents, bad data flowing silently downstream, is high, so the belt-and-suspenders approach wins on expected value.

How detailed should field descriptions be?

Detailed enough to resolve the edge cases that actually arise, no more. For a simple unambiguous field, a short phrase suffices. For an enum or a field with tricky boundaries, spell out exactly when to choose each option. The test is whether someone unfamiliar with the task could fill the field correctly using only the description.

What should I actually log for measurement?

Log each validation failure with the field that failed, the value the model produced, and the retry outcome. Over a week this reveals which fields are weak spots and whether retries are succeeding. You do not need to log every successful response in full—the failures and retries carry the signal you tune against.

Does tight scoping conflict with wanting rich output?

Not really. Tight scoping means asking only for what you consume, not making output thin for its own sake. If you genuinely consume many fields, ask for them—just split the request when one large schema hurts accuracy. The discipline is against speculative fields you do not yet use, not against richness you actually need.

Key Takeaways

  • Define the schema as typed code and derive the validator and instruction from it so they never drift.
  • Scope schemas to the fields you actually consume; smaller schemas raise accuracy and cut cost.
  • Run structural and semantic validation separately, every time—enforcement cannot encode your business rules.
  • Retry with the specific error fed back to the model rather than blindly re-running.
  • Log and review failures weekly; structured-output quality is tunable only when you can see where it breaks.
