AGENCYSCRIPT

Draft, Test, Tighten: Writing a System Prompt in Order

Agency Script Editorial

Editorial Team

July 28, 2024 · 7 min read
Tags: system prompts, system prompts how to, system prompts guide, prompt engineering

Most advice about system prompts tells you what a good one contains. Far less of it tells you the order to build one in. Staring at a blank page knowing that a prompt should have a role, constraints, and an output format does not tell you what to type first.

This article is a sequence. Follow the steps in order and you will end the day with a system prompt that has been drafted, tested against real inputs, and tightened where it leaked. The process is deliberately small at each step so you never have to hold the whole thing in your head at once.

We will build a single example as we go, a system prompt for an assistant that helps a marketing team turn rough notes into polished email drafts, so each step has something concrete to act on.

Before the first step, gather two things: a blank document where the prompt will live, and a rough sense of who will use the assistant and what they need from it. You do not need a formal spec. You need enough understanding of the job that you can describe it in a sentence, which is exactly where step one begins. If you cannot yet describe the job in a sentence, spend a few minutes talking to whoever will use the tool before you write anything. That conversation prevents you from building the wrong assistant carefully.

Step 1: Write the One-Sentence Mandate

Before any rules or formatting, write a single sentence describing what the assistant is for. Resist the urge to add detail. The discipline of one sentence forces clarity.

For our example: "You are an assistant that turns a marketer's rough bullet-point notes into a clear, on-brand email draft." That is the mandate. Everything you add later must serve it. If a future clause does not, you will know to cut it.

Step 2: List the Hard Rules

Now write the non-negotiables. These are the behaviors that, if violated, make the output unacceptable. Write them as commands, not suggestions.

For the email assistant:

  • Never invent statistics, prices, or quotes that are not in the notes
  • Always keep the email under 200 words unless the notes ask for more
  • Never use more than one exclamation point per email

Notice each rule is checkable. You can read an output and decide objectively whether it followed the rule. If a rule cannot be checked, rewrite it until it can. This testability principle is the backbone of System Prompts: Best Practices That Actually Work.
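
To make "checkable" concrete, here is a minimal sketch of the three rules above as automated checks. The function names are hypothetical, and the "invented statistics" check is only a crude proxy: it assumes that any number in the draft must also appear in the notes, which will not catch invented quotes or reworded prices.

```python
import re

NUMBER_PATTERN = r"\d+(?:\.\d+)?"


def count_words(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def check_hard_rules(email: str, notes: str, max_words: int = 200) -> list[str]:
    """Return the violated rules; an empty list means the draft passed."""
    violations = []
    # Rule: keep the email under the word limit (raise max_words if the notes ask for more).
    if count_words(email) > max_words:
        violations.append(f"over {max_words} words")
    # Rule: at most one exclamation point per email.
    if email.count("!") > 1:
        violations.append("more than one exclamation point")
    # Crude proxy for "never invent statistics": every number in the
    # email must also appear somewhere in the notes.
    numbers_in_notes = set(re.findall(NUMBER_PATTERN, notes))
    for number in re.findall(NUMBER_PATTERN, email):
        if number not in numbers_in_notes:
            violations.append(f"number not in notes: {number}")
    return violations
```

A rule that cannot be expressed as a check like this, even a rough one, is usually a rule that needs rewording.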

Step 3: Define the Output Shape

Decide what a finished response looks like and describe it precisely. The model should not have to guess at format.

For our example, the output shape is: a subject line on the first line, a blank line, then the email body, with no preamble like "Here is your draft." Spelling this out prevents the chatty wrapper text that makes responses harder to use downstream.
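
The shape described above is precise enough to verify mechanically. The sketch below is one way to do it, not a definitive validator; the function name and the list of preamble prefixes are assumptions chosen for illustration.

```python
# Hypothetical list of openers that signal chatty wrapper text.
PREAMBLE_PREFIXES = ("here is", "here's", "sure,")


def check_output_shape(response: str) -> list[str]:
    """Check for: subject line, blank line, then body, with no preamble."""
    problems = []
    lines = response.split("\n")
    first = lines[0].strip()
    if not first:
        problems.append("first line is empty; expected a subject line")
    if first.lower().startswith(PREAMBLE_PREFIXES):
        problems.append("response starts with preamble")
    if len(lines) < 3:
        problems.append("expected subject, blank line, then body")
    else:
        if lines[1].strip():
            problems.append("second line should be blank")
        if not lines[2].strip():
            problems.append("body should start on the third line")
    return problems
```

An empty list means the response matched the shape. Anything else tells you which part of the format description the model ignored.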

Show one example

If the shape is at all specific, include a single example of an ideal output inside the prompt. One concrete example teaches structure faster than any amount of description, and it gives the model a target to imitate.

Step 4: Add Edge-Case Handling

Your prompt so far handles the normal case. Now think about what breaks it. What if the notes are empty? What if they contradict each other? What if they request something against a hard rule?

Add explicit handling:

  • If the notes are too sparse to write a useful email, ask one clarifying question instead of guessing
  • If the notes conflict with a hard rule, follow the rule and note the conflict briefly

Edge handling is what separates a demo from a product. Skipping it is a common cause of production failures, a pattern explored in Case Study: System Prompts in Practice.

Step 5: Assemble in the Right Order

Now stitch the pieces together. Order matters because models tend to follow instructions at the start and end of a long prompt more reliably than instructions buried in the middle. A reliable arrangement:

  • Mandate (the one sentence) at the top
  • Hard rules immediately after, while attention is high
  • Output shape and example in the middle
  • Edge-case handling next
  • A one-line restatement of the single most critical rule at the very end

That closing restatement is cheap insurance. For the email assistant, it might be: "Remember: never invent facts that are not in the notes."
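
If the pieces live as separate strings, the arrangement above can be enforced in code rather than by memory. This is a minimal sketch; the section names and the `assemble_prompt` helper are hypothetical, and the section text is taken from the email assistant example.

```python
# The order from the list above, top of the prompt to bottom.
SECTION_ORDER = ["mandate", "hard_rules", "output_shape", "edge_cases", "closing_reminder"]


def assemble_prompt(sections: dict[str, str]) -> str:
    """Join the sections in the fixed order, separated by blank lines."""
    missing = [name for name in SECTION_ORDER if name not in sections]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return "\n\n".join(sections[name].strip() for name in SECTION_ORDER)


sections = {
    "closing_reminder": "Remember: never invent facts that are not in the notes.",
    "mandate": (
        "You are an assistant that turns a marketer's rough bullet-point "
        "notes into a clear, on-brand email draft."
    ),
    "hard_rules": (
        "Rules:\n"
        "- Never invent statistics, prices, or quotes that are not in the notes.\n"
        "- Always keep the email under 200 words unless the notes ask for more.\n"
        "- Never use more than one exclamation point per email."
    ),
    "output_shape": (
        "Format: a subject line on the first line, a blank line, "
        "then the email body. No preamble."
    ),
    "edge_cases": (
        "If the notes are too sparse to write a useful email, ask one "
        "clarifying question instead of guessing.\n"
        "If the notes conflict with a rule, follow the rule and note the conflict briefly."
    ),
}

prompt = assemble_prompt(sections)
```

Because the order lives in `SECTION_ORDER`, it does not matter how the dictionary is written; the mandate always lands first and the reminder last.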

Step 6: Test Against Real Inputs

Do not ship on faith. Gather five to ten realistic sets of notes, including a deliberately messy one and an empty one, and run them through. Read each output against your hard rules.

When something fails, change exactly one thing and re-run the whole set. Changing one variable at a time is the only way to know what actually fixed the problem. Keep the test inputs in a file so you can re-run them every time you touch the prompt.
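
A small harness keeps this repeatable. The sketch below assumes the test inputs are stored as a JSON list of objects with `name` and `notes` keys, which is a made-up layout, and it takes the model call as a parameter so that no particular API is assumed. `fake_generate` is a stand-in that lets the example run offline.

```python
import json
from pathlib import Path
from typing import Callable

Case = dict[str, str]  # assumed shape: {"name": ..., "notes": ...}


def load_cases(path: Path) -> list[Case]:
    """Read the saved test inputs from a JSON file."""
    return json.loads(path.read_text(encoding="utf-8"))


def run_test_set(
    cases: list[Case],
    generate: Callable[[str], str],
    check: Callable[[str, str], list[str]],
) -> dict[str, list[str]]:
    """Run every case and map its name to the list of rule violations."""
    results = {}
    for case in cases:
        draft = generate(case["notes"])
        results[case["name"]] = check(draft, case["notes"])
    return results


def failing(results: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep only the cases that violated at least one rule."""
    return {name: found for name, found in results.items() if found}


def fake_generate(notes: str) -> str:
    # Stand-in for a real model call; replace with your own client.
    return "Update\n\n" + (notes or "Could you share a few details?")


def too_many_exclamations(draft: str, notes: str) -> list[str]:
    return ["more than one exclamation point"] if draft.count("!") > 1 else []


cases = [
    {"name": "normal", "notes": "Sale starts Monday"},
    {"name": "messy", "notes": "SALE!!! monday?? idk"},
    {"name": "empty", "notes": ""},
]
results = run_test_set(cases, fake_generate, too_many_exclamations)
print(failing(results))
```

In practice `cases` would come from `load_cases`, and `generate` would wrap whatever model call you use with the current version of the prompt.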

Watch for regressions

A fix that solves one case can quietly break another. That is why you re-run the full set, not just the case you were fixing. This habit of regression testing is what makes a prompt trustworthy over time, and it scales into the discipline described in A Framework for System Prompts.
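
Regressions are easiest to spot by comparing two full runs. This sketch assumes each run is summarized as a mapping from test case name to a list of violated rules, a hypothetical shape chosen for illustration; `find_regressions` is likewise an illustrative name.

```python
def find_regressions(
    before: dict[str, list[str]],
    after: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Return cases that passed before the edit but fail after it."""
    regressions = {}
    for name, violations in after.items():
        passed_before = name in before and not before[name]
        if passed_before and violations:
            regressions[name] = violations
    return regressions


# Example: the edit fixed the "messy" case but broke the "empty" one.
before = {"normal": [], "messy": ["over 200 words"], "empty": []}
after = {"normal": [], "messy": [], "empty": ["number not in notes: 50"]}
print(find_regressions(before, after))
```

Looking only at the case you were fixing would report success here. The comparison across the full set is what surfaces the newly broken case.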

Step 7: Lock It Down and Version It

Once the prompt passes your test set, freeze it as version one. Save it somewhere with a note on what it does and when you last changed it. The next time behavior drifts, the version history is the fastest path to the cause.

From here, every change follows the same loop: edit one thing, re-run the test set, confirm no regressions, increment the version. That loop never really ends, but after the first pass it gets fast.
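
Version history does not need special tooling. The sketch below is one minimal way to record it in memory, assuming a hypothetical `PromptVersion` record and `freeze_version` helper; a file under version control does the same job.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PromptVersion:
    version: int
    text: str
    note: str        # what the prompt does or what changed
    changed_on: str  # ISO date of the last change
    sha256: str      # fingerprint of the exact prompt text


def freeze_version(
    history: list[PromptVersion], text: str, note: str, changed_on: date
) -> PromptVersion:
    """Append a new version unless the text matches the latest one."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if history and history[-1].sha256 == digest:
        return history[-1]  # nothing changed, so no new version
    entry = PromptVersion(
        version=len(history) + 1,
        text=text,
        note=note,
        changed_on=changed_on.isoformat(),
        sha256=digest,
    )
    history.append(entry)
    return entry
```

The hash makes it cheap to confirm that the prompt running in production is byte-for-byte the version that passed the test set.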

One note on discipline as the prompt ages: the temptation, months in, is to make quick edits directly in production because a change "obviously" works. That is exactly when regressions slip in. The loop exists to protect you from your own confidence. Even a thirty-second test run before shipping a "trivial" change can catch quiet breakage. The cost of the loop is tiny; the cost of skipping it is a degradation you will not notice until a user does. Make the loop a reflex and the prompt stays trustworthy as long as you maintain it.

Frequently Asked Questions

How long does this whole process take the first time?

For a focused single-purpose assistant, a careful first pass through all seven steps takes a few hours, most of it in testing. Subsequent revisions take minutes because the test set already exists and you are changing one thing at a time.

What if I do not have real inputs to test with yet?

Write plausible ones yourself, and make a few of them deliberately awkward. Synthetic test cases are far better than no test cases. Once the tool sees real traffic, replace your invented inputs with real ones, especially those that exposed problems.

Should I write the prompt in plain English or use special formatting?

Plain English with light structure works well. Use short headers or labels to separate sections so the model can tell rules from format from examples. Avoid heavy markup that adds noise without adding clarity.

What do I do when two of my rules conflict?

You resolve the conflict in the prompt, not at runtime. Decide which rule wins, state that priority explicitly, and remove the ambiguity. Leaving a conflict in place means the model picks for you, often inconsistently.

How do I know when the prompt is good enough to ship?

When it passes your full test set, including the messy and empty cases, without violating any hard rule. Good enough does not mean perfect; it means reliable on the inputs you can foresee, with safe fallback behavior for the ones you cannot.

Key Takeaways

  • Build a system prompt in order: mandate first, then hard rules, output shape, edge handling, assembly, testing, and versioning.
  • Compress the purpose into a single sentence before adding anything else, so every later clause has to justify itself.
  • Write rules as checkable commands and include one concrete example of an ideal output.
  • Test against five to ten realistic inputs, change one thing at a time, and re-run the full set to catch regressions.
  • Freeze the working prompt as a version, then repeat the edit-test-confirm loop for every future change.
