AGENCYSCRIPT
CoursesEnterpriseBlog
πŸ‘‘FoundersSign inJoin Waitlist
AGENCYSCRIPT

Governed Certification Framework

The operating system for AI-enabled agency building. Certify judgment under constraint. Standards over scale. Governance over shortcuts.

Stay informed

Governance updates, certification insights, and industry standards.

Products

  • Platform
  • Certification
  • Launch Program
  • Vault
  • The Book

Certification

  • Foundation (AS-F)
  • Operator (AS-O)
  • Architect (AS-A)
  • Principal (AS-P)

Resources

  • Blog
  • Verify Credential
  • Enterprise
  • Partners
  • Pricing

Company

  • About
  • Contact
  • Careers
  • Press
Β© 2026 Agency Script, Inc.Β·
Privacy PolicyTerms of ServiceCertification AgreementSecurity

Standards over scale. Judgment over volume. Governance over shortcuts.

On This Page

Example One: A Billing Support AssistantExample Two: A Code Review AssistantThe output contract matteredExample Three: A Support Ticket ClassifierExample Four: A Creative Brainstorming PartnerExample Five: A Prompt That FailedWhat the Examples Have in CommonFrequently Asked QuestionsCan I reuse these example prompts directly?Why did the classifier need an "other" category?How do I decide how much freedom to give the model?What was the real problem with the failed everything-assistant?Should output structure be in the system prompt or requested per message?Key Takeaways
Home/Blog/Five Annotated System Prompts and Why They Behave
General

Five Annotated System Prompts and Why They Behave

A

Agency Script Editorial

Editorial Team

Β·July 16, 2024Β·7 min read
system promptssystem prompts examplessystem prompts guideprompt engineering

Principles are easier to nod along to than to apply. You can agree that a system prompt should be specific and still write a vague one, because the gap between knowing the rule and seeing it in action is wide. This article closes that gap with concrete examples.

We will walk through five distinct system prompt scenarios, each from a different domain, and dissect what made the prompt work or fail. These are not copy-paste templates. They are case material to learn from, annotated so you can see the reasoning behind each choice.

By the end you should recognize the moves: how a good prompt scopes itself, handles edges, and shapes output, and how a weak one quietly invites trouble.

A note on how to read these. Resist the urge to skim for a prompt you can lift wholesale. The value is in the reasoning, not the wording, because the wording was tuned to a context that is not yours. As you go through each example, pause on the choices: why this constraint and not another, why this output shape, why this edge got explicit handling. Those choices are the transferable part. Borrow the thinking and you can write a prompt for any domain; borrow the text and you get a prompt that fits someone else's problem.

Example One: A Billing Support Assistant

Scenario: a SaaS company wants an assistant that answers billing questions but never makes promises it cannot keep.

The effective prompt opened with a tight role, "You are a billing support assistant for Acme SaaS," then immediately stated the hard constraints: never promise refunds, never quote prices not in the provided pricing data, and escalate any dispute to a human.

What made it work was the refusal logic. By telling the model exactly what it could not commit to, the prompt prevented the most expensive failure mode, an AI promising a refund the company never authorized. The closing line restated the single most critical rule, a technique from System Prompts: Best Practices That Actually Work.

Example Two: A Code Review Assistant

Scenario: a development team wants quick, focused feedback on pull requests without nitpicking.

The prompt defined scope sharply: comment only on correctness, security, and clear readability problems; ignore formatting and style preferences. This scoping is what kept the assistant useful. An earlier version without it buried real bugs under dozens of trivial style comments, and developers stopped reading the output.

The output contract mattered

The prompt required each comment to name the file, the issue, and a concrete suggested fix, in that order. Structured output made the feedback actionable instead of a wall of prose. When the contract was loose, developers had to hunt for what the assistant actually meant.

Example Three: A Support Ticket Classifier

Scenario: incoming tickets need to be sorted into categories for routing.

This is a classification task, and the winning prompt treated it like one. It listed the exact allowed categories, instructed the model to choose exactly one, and specified what to do when a ticket fit none, return "other" rather than invent a category.

The failure mode it avoided was category drift. Without the closed list and the explicit "other" fallback, the model would coin new categories on the fly, breaking the downstream routing system. Constraining the output to a fixed set is a small change with outsized reliability gains, the kind of edge handling explored in Case Study: System Prompts in Practice.

Example Four: A Creative Brainstorming Partner

Scenario: a marketing team wants an idea generator that is genuinely divergent, not safe and generic.

Here the usual instinct toward tight constraints would backfire. The effective prompt deliberately widened the lane: generate ten distinctly different angles, avoid repeating the same structure twice, and do not self-censor toward the obvious.

The lesson is that the right amount of constraint depends on the job. A classifier wants a closed box; a brainstorming partner wants room. Applying support-bot rigidity to creative work produces dull output, while applying creative looseness to a classifier produces chaos. Matching the prompt to the task is the core idea behind A Framework for System Prompts.

Example Five: A Prompt That Failed

Scenario: a general "company assistant" meant to do everything for everyone.

This one is instructive because it failed. The prompt tried to cover support, sales, HR questions, and technical help in one sprawling instruction set. It had no clear scope, contradictory tone guidance ("be formal" and "be casual and fun" both appeared), and no edge handling.

The result was an assistant that was mediocre at everything and unpredictable in tone. The fix was not a better prompt; it was three focused assistants, each with a tight mandate. The takeaway: scope is a feature, and trying to do everything is the most common way a system prompt fails, a pattern catalogued in 7 Common Mistakes with System Prompts (and How to Avoid Them).

What the Examples Have in Common

Across the successes, the same threads recur. Each effective prompt had a single clear mandate, stated its hard constraints early, defined a structured output, and handled the case where the model did not have a clean answer. The amount of freedom varied by task, but the discipline did not.

The failure shared the opposite traits: no scope, internal contradictions, and no plan for the edges. Studying both is more instructive than studying either alone, because the contrast makes the principles concrete.

There is also a lesson about the amount of constraint that none of the examples states outright but all of them embody. Constraint is a dial, not a switch. The classifier and the support assistant wanted it turned high; the brainstorming partner wanted it turned low. The mistake is not picking the wrong absolute level but failing to ask the question at all and applying whatever level you defaulted to. Before writing constraints, decide deliberately how much freedom the task rewards, then set the dial to match. That single deliberate decision prevents both the dull over-constrained output and the chaotic under-constrained kind.

Frequently Asked Questions

Can I reuse these example prompts directly?

You can borrow their structure and reasoning, but not paste them in unchanged. Each was tuned to a specific company, dataset, and tone. Adapt the role, constraints, and output contract to your own situation, then test the result against realistic inputs.

Why did the classifier need an "other" category?

Because real inputs do not always fit your predefined buckets. Without an explicit escape hatch, the model invents new categories to force a fit, which breaks any system that expects a fixed set. The "other" bucket gives unclassifiable inputs a safe, predictable home.

How do I decide how much freedom to give the model?

Match freedom to the task's nature. Deterministic tasks like classification or extraction want tight constraints; generative tasks like brainstorming want room. Ask whether variety in the output is a feature or a bug, and constrain accordingly.

What was the real problem with the failed everything-assistant?

Lack of scope. By trying to serve every function at once, it had no coherent identity, contradictory guidance, and no clear boundaries. Splitting it into focused single-purpose assistants solved the problem because each could be scoped, tested, and tuned independently.

Should output structure be in the system prompt or requested per message?

Put stable output structure that should govern every response in the system prompt. Request one-off formatting in the user message. The dividing question is whether the structure is a permanent property of the assistant or a per-request preference.

Key Takeaways

  • Effective system prompts share a pattern: one clear mandate, early hard constraints, structured output, and explicit handling of the unknown.
  • Refusal logic in a support assistant prevents the costliest failures, like promising refunds that were never authorized.
  • Closed category lists with an "other" fallback keep classifiers from inventing buckets and breaking downstream systems.
  • The right amount of constraint depends on the task: tight for classification, loose for creative work.
  • The most common failure is trying to do everything in one prompt; scope is a feature, and focused assistants beat sprawling ones.

Search Articles

Categories

OperationsSalesDeliveryGovernance

Popular Tags

prompt engineeringai fundamentalsai toolsthe difference between AIMLagency operationsagency growthenterprise sales

Share Article

A

Agency Script Editorial

Editorial Team

The Agency Script editorial team delivers operational insights on AI delivery, certification, and governance for modern agency operators.

Related Articles

General

Rolling Out AI Hallucinations Across a Team

Most teams discover AI hallucinations the hard way β€” a confident-sounding wrong answer makes it into a client deliverable, a legal brief, or a published report. The damage isn't just to the output; it

A
Agency Script Editorial
June 1, 2026Β·11 min read
General

Case Study: Large Language Models in Practice

Most teams that fail with large language models don't fail because the technology doesn't work. They fail because they treat deployment as a one-time event rather than a discipline β€” pick a model, wri

A
Agency Script Editorial
June 1, 2026Β·11 min read
General

Thirty-Second Wins Breed False Confidence With LLMs

Working with large language models is deceptively easy to start and surprisingly hard to do well. You can get a useful output in thirty seconds, which creates a false confidence that compounds over ti

A
Agency Script Editorial
June 1, 2026Β·10 min read

Ready to certify your AI capability?

Join the professionals building governed, repeatable AI delivery systems.

Explore Certification