Five Annotated System Prompts and Why They Behave

Principles are easier to nod along to than to apply. You can agree that a system prompt should be specific and still write a vague one, because the gap between knowing the rule and seeing it in action is wide. This article closes that gap with concrete examples.

We will walk through five distinct system prompt scenarios, each from a different domain, and dissect what made the prompt work or fail. These are not copy-paste templates. They are case material to learn from, annotated so you can see the reasoning behind each choice.

By the end you should recognize the moves: how a good prompt scopes itself, handles edges, and shapes output, and how a weak one quietly invites trouble.

A note on how to read these. Resist the urge to skim for a prompt you can lift wholesale. The value is in the reasoning, not the wording, because the wording was tuned to a context that is not yours. As you go through each example, pause on the choices: why this constraint and not another, why this output shape, why this edge got explicit handling. Those choices are the transferable part. Borrow the thinking and you can write a prompt for any domain; borrow the text and you get a prompt that fits someone else's problem.

Example One: A Billing Support Assistant

Scenario: a SaaS company wants an assistant that answers billing questions but never makes promises it cannot keep.

The effective prompt opened with a tight role, "You are a billing support assistant for Acme SaaS," then immediately stated the hard constraints: never promise refunds, never quote prices not in the provided pricing data, and escalate any dispute to a human.

What made it work was the refusal logic. By telling the model exactly what it could not commit to, the prompt prevented the most expensive failure mode, an AI promising a refund the company never authorized. The closing line restated the single most critical rule, a technique from System Prompts: Best Practices That Actually Work.

Example Two: A Code Review Assistant

Scenario: a development team wants quick, focused feedback on pull requests without nitpicking.

The prompt defined scope sharply: comment only on correctness, security, and clear readability problems; ignore formatting and style preferences. This scoping is what kept the assistant useful. An earlier version without it buried real bugs under dozens of trivial style comments, and developers stopped reading the output.

The output contract mattered

The prompt required each comment to name the file, the issue, and a concrete suggested fix, in that order. Structured output made the feedback actionable instead of a wall of prose. When the contract was loose, developers had to hunt for what the assistant actually meant.

Example Three: A Support Ticket Classifier

Scenario: incoming tickets need to be sorted into categories for routing.

This is a classification task, and the winning prompt treated it like one. It listed the exact allowed categories, instructed the model to choose exactly one, and specified what to do when a ticket fit none, return "other" rather than invent a category.

The failure mode it avoided was category drift. Without the closed list and the explicit "other" fallback, the model would coin new categories on the fly, breaking the downstream routing system. Constraining the output to a fixed set is a small change with outsized reliability gains, the kind of edge handling explored in Case Study: System Prompts in Practice.

Example Four: A Creative Brainstorming Partner

Scenario: a marketing team wants an idea generator that is genuinely divergent, not safe and generic.

Here the usual instinct toward tight constraints would backfire. The effective prompt deliberately widened the lane: generate ten distinctly different angles, avoid repeating the same structure twice, and do not self-censor toward the obvious.

The lesson is that the right amount of constraint depends on the job. A classifier wants a closed box; a brainstorming partner wants room. Applying support-bot rigidity to creative work produces dull output, while applying creative looseness to a classifier produces chaos. Matching the prompt to the task is the core idea behind A Framework for System Prompts.

Example Five: A Prompt That Failed

Scenario: a general "company assistant" meant to do everything for everyone.

This one is instructive because it failed. The prompt tried to cover support, sales, HR questions, and technical help in one sprawling instruction set. It had no clear scope, contradictory tone guidance ("be formal" and "be casual and fun" both appeared), and no edge handling.

The result was an assistant that was mediocre at everything and unpredictable in tone. The fix was not a better prompt; it was three focused assistants, each with a tight mandate. The takeaway: scope is a feature, and trying to do everything is the most common way a system prompt fails, a pattern catalogued in 7 Common Mistakes with System Prompts (and How to Avoid Them).

What the Examples Have in Common

Across the successes, the same threads recur. Each effective prompt had a single clear mandate, stated its hard constraints early, defined a structured output, and handled the case where the model did not have a clean answer. The amount of freedom varied by task, but the discipline did not.

The failure shared the opposite traits: no scope, internal contradictions, and no plan for the edges. Studying both is more instructive than studying either alone, because the contrast makes the principles concrete.

There is also a lesson about the amount of constraint that none of the examples states outright but all of them embody. Constraint is a dial, not a switch. The classifier and the support assistant wanted it turned high; the brainstorming partner wanted it turned low. The mistake is not picking the wrong absolute level but failing to ask the question at all and applying whatever level you defaulted to. Before writing constraints, decide deliberately how much freedom the task rewards, then set the dial to match. That single deliberate decision prevents both the dull over-constrained output and the chaotic under-constrained kind.

Frequently Asked Questions

Can I reuse these example prompts directly?

You can borrow their structure and reasoning, but not paste them in unchanged. Each was tuned to a specific company, dataset, and tone. Adapt the role, constraints, and output contract to your own situation, then test the result against realistic inputs.

Why did the classifier need an "other" category?

Because real inputs do not always fit your predefined buckets. Without an explicit escape hatch, the model invents new categories to force a fit, which breaks any system that expects a fixed set. The "other" bucket gives unclassifiable inputs a safe, predictable home.

How do I decide how much freedom to give the model?

Match freedom to the task's nature. Deterministic tasks like classification or extraction want tight constraints; generative tasks like brainstorming want room. Ask whether variety in the output is a feature or a bug, and constrain accordingly.

What was the real problem with the failed everything-assistant?

Lack of scope. By trying to serve every function at once, it had no coherent identity, contradictory guidance, and no clear boundaries. Splitting it into focused single-purpose assistants solved the problem because each could be scoped, tested, and tuned independently.

Should output structure be in the system prompt or requested per message?

Put stable output structure that should govern every response in the system prompt. Request one-off formatting in the user message. The dividing question is whether the structure is a permanent property of the assistant or a per-request preference.

Key Takeaways

Effective system prompts share a pattern: one clear mandate, early hard constraints, structured output, and explicit handling of the unknown.
Refusal logic in a support assistant prevents the costliest failures, like promising refunds that were never authorized.
Closed category lists with an "other" fallback keep classifiers from inventing buckets and breaking downstream systems.
The right amount of constraint depends on the task: tight for classification, loose for creative work.
The most common failure is trying to do everything in one prompt; scope is a feature, and focused assistants beat sprawling ones.

By the end you should recognize the moves: how a good prompt scopes itself, handles edges, and shapes output, and how a weak one quietly invites trouble.

Example One: A Billing Support Assistant

Scenario: a SaaS company wants an assistant that answers billing questions but never makes promises it cannot keep.

Example Two: A Code Review Assistant

Scenario: a development team wants quick, focused feedback on pull requests without nitpicking.

The output contract mattered

Example Three: A Support Ticket Classifier

Scenario: incoming tickets need to be sorted into categories for routing.

Example Four: A Creative Brainstorming Partner

Scenario: a marketing team wants an idea generator that is genuinely divergent, not safe and generic.

Example Five: A Prompt That Failed

Scenario: a general "company assistant" meant to do everything for everyone.

What the Examples Have in Common

Frequently Asked Questions

Can I reuse these example prompts directly?

Why did the classifier need an "other" category?

How do I decide how much freedom to give the model?

What was the real problem with the failed everything-assistant?

Should output structure be in the system prompt or requested per message?

Key Takeaways

Effective system prompts share a pattern: one clear mandate, early hard constraints, structured output, and explicit handling of the unknown.
Refusal logic in a support assistant prevents the costliest failures, like promising refunds that were never authorized.
Closed category lists with an "other" fallback keep classifiers from inventing buckets and breaking downstream systems.
The right amount of constraint depends on the task: tight for classification, loose for creative work.
The most common failure is trying to do everything in one prompt; scope is a feature, and focused assistants beat sprawling ones.

Five Annotated System Prompts and Why They Behave

Example One: A Billing Support Assistant

Example Two: A Code Review Assistant

The output contract mattered

Example Three: A Support Ticket Classifier

Example Four: A Creative Brainstorming Partner

Example Five: A Prompt That Failed

What the Examples Have in Common

Frequently Asked Questions

Can I reuse these example prompts directly?

Why did the classifier need an "other" category?

How do I decide how much freedom to give the model?

What was the real problem with the failed everything-assistant?

Should output structure be in the system prompt or requested per message?

Key Takeaways

Agency Script Editorial

Related Articles

Rolling Out AI Hallucinations Across a Team

Case Study: Large Language Models in Practice

Thirty-Second Wins Breed False Confidence With LLMs

Ready to certify your AI capability?

Five Annotated System Prompts and Why They Behave

Example One: A Billing Support Assistant

Example Two: A Code Review Assistant

The output contract mattered

Example Three: A Support Ticket Classifier

Example Four: A Creative Brainstorming Partner

Example Five: A Prompt That Failed

What the Examples Have in Common

Frequently Asked Questions

Can I reuse these example prompts directly?

Why did the classifier need an "other" category?

How do I decide how much freedom to give the model?

What was the real problem with the failed everything-assistant?

Should output structure be in the system prompt or requested per message?

Key Takeaways

Agency Script Editorial

Related Articles

Rolling Out AI Hallucinations Across a Team

Case Study: Large Language Models in Practice

Thirty-Second Wins Breed False Confidence With LLMs

Ready to certify your AI capability?