AGENCYSCRIPT
CoursesEnterpriseBlog
πŸ‘‘FoundersSign inJoin Waitlist
AGENCYSCRIPT

Governed Certification Framework

The operating system for AI-enabled agency building. Certify judgment under constraint. Standards over scale. Governance over shortcuts.

Stay informed

Governance updates, certification insights, and industry standards.

Products

  • Platform
  • Certification
  • Launch Program
  • Vault
  • The Book

Certification

  • Foundation (AS-F)
  • Operator (AS-O)
  • Architect (AS-A)
  • Principal (AS-P)

Resources

  • Blog
  • Verify Credential
  • Enterprise
  • Partners
  • Pricing

Company

  • About
  • Contact
  • Careers
  • Press
Β© 2026 Agency Script, Inc.Β·
Privacy PolicyTerms of ServiceCertification AgreementSecurity

Standards over scale. Judgment over volume. Governance over shortcuts.

On This Page

The SituationHow the trouble startedThe DecisionWhy they kept it smallThe ExecutionWeek one: inventory and consolidationWeek two: format, evaluation, and rollbackThe OutcomeWhat changed measurablyWhat changed culturallyThe First Real TestA bad change reaches productionA second, quieter benefitWhat They Would Do DifferentlyEarlier evaluationEarlier ownershipFrequently Asked QuestionsDid the agency need special software to fix this?How long did the turnaround take?What single change had the biggest impact?Why start with practices instead of tooling?Were clients aware of the change?Key Takeaways
Home/Blog/How One Agency Tamed Its Runaway Prompts
General

How One Agency Tamed Its Runaway Prompts

A

Agency Script Editorial

Editorial Team

Β·August 27, 2023Β·6 min read
prompt versioningprompt versioning case studyprompt versioning guideprompt engineering

This is the story of a mid-sized digital agency that built AI features for a roster of clients and nearly let unmanaged prompts sink one of its biggest accounts. It is composited from common patterns rather than a single named company, but every decision and consequence below mirrors what teams in this position actually face. The arc is familiar: things work, then they quietly stop working, then someone has to fix the underlying discipline.

What makes the story worth telling is not the crisis but the response. The agency did not buy an expensive platform or hire a specialist. It adopted a handful of versioning practices in sequence and measured the difference. By the end, prompt-related incidents had dropped sharply and the team could finally explain its own AI behavior.

Follow the situation, the decisions, the execution, and the outcome. Then borrow what fits.

The Situation

The agency ran AI-powered features for roughly fifteen clients: content drafting, support triage, lead classification. Each feature was driven by prompts that lived wherever the original engineer had put them, inline in code, in config files, in one case in a comment.

How the trouble started

A flagship client reported that their support assistant had started misrouting tickets. The agency investigated and hit a wall.

  • The prompt had been edited by three people over two months
  • None of the edits were documented
  • The original, working prompt no longer existed anywhere

The team spent two days reconstructing a prompt that approximately worked. The client noticed the downtime. Trust took a hit. Leadership decided the ad hoc approach had to end.

The Decision

Rather than over-engineer a solution, the team's lead set a deliberately modest goal: every prompt should have a recorded history, a reason for each change, and a way to roll back. Nothing more, at first.

Why they kept it small

The lead had seen ambitious tooling projects stall. A heavy system nobody adopted would be worse than a light one everybody used. The plan was to start with practices, prove the value, and only then consider tooling.

The sequence they chose mirrored the steps in A Step-by-Step Approach to Prompt Versioning: inventory first, consolidate storage, define a version format, then add evaluation and rollback.

The Execution

Over two weeks, the team worked through the plan one stage at a time, fitting it around client work rather than pausing everything.

Week one: inventory and consolidation

  • They searched the codebase and found 47 distinct prompts, several duplicated
  • Every prompt moved into the repository as a versioned file
  • Each got an initial version 1.0.0 and a named owner

Week two: format, evaluation, and rollback

  • They adopted a major-minor-patch scheme and made versions immutable
  • For the five highest-traffic prompts, they built small evaluation sets
  • They wired the application to select prompts by version number

That last change was the quiet hero. By referencing prompts by version rather than inline text, rollback went from an hour-long deploy to a one-line configuration switch, exactly the property that pays off in Prompt Versioning: Real-World Examples and Use Cases.

The Outcome

Three months after adoption, the team compared the period before and after.

What changed measurably

  • Prompt-related incidents dropped from roughly one per week to one in the entire quarter
  • The two incidents that did occur were resolved in minutes via rollback, not days
  • A model migration that would once have been terrifying went smoothly because the model was treated as part of the version

What changed culturally

  • Engineers stopped fearing prompt edits because they could always revert
  • Client conversations about AI behavior became factual, backed by version history
  • The team could finally answer "why did it say that" with a specific version

The biggest surprise was how cheap the win had been. No platform purchase, no new hire, just a fortnight of disciplined cleanup and a commitment to a few non-negotiable habits.

The First Real Test

The system was barely two weeks old when it faced a genuine emergency, and how it performed shaped the team's confidence in the new discipline.

A bad change reaches production

An engineer adjusted the content-drafting prompt to encourage a more formal tone for a particular client. The edit was reasonable, but it interacted badly with an existing instruction and started producing stilted, repetitive copy across every client using that prompt, not just the intended one.

  • The problem surfaced within an hour through a client's feedback
  • The engineer checked the version history and saw exactly what had changed and why
  • A single configuration switch reverted production to the previous version

What would once have been an afternoon of frantic reconstruction was instead a five-minute fix. The engineer then made the tone change correctly in a fresh version, evaluated it against the test inputs, and shipped it the next day without drama. The team later pointed to this incident as the moment the investment proved itself, because the contrast with the original misrouting crisis was so stark.

A second, quieter benefit

Beyond the fast recovery, the incident produced something subtler: a clear record that the team could share internally. New engineers could read the version history and understand not just what the prompts said but how they had evolved and why. Onboarding to a feature stopped requiring a conversation with whoever happened to remember the history, because the history now documented itself.

What They Would Do Differently

In a retrospective, the team named two regrets, both instructive.

Earlier evaluation

They built evaluation sets only for the top five prompts and wished they had covered more sooner. The prompts without evaluation gates were the ones that produced the quarter's two incidents. The case for measurement-gated promotion is laid out in Prompt Versioning: Best Practices That Actually Work.

Earlier ownership

Assigning owners during inventory, rather than after the first incident, would have prevented the original misrouting crisis entirely. The lesson echoes the warning in 7 Common Mistakes with Prompt Versioning (and How to Avoid Them) about ownerless prompts.

Frequently Asked Questions

Did the agency need special software to fix this?

No. They used their existing code repository for storage and versioning, built small evaluation sets by hand, and added a configuration value to select prompt versions at runtime. The transformation came from disciplined practices, not from purchasing a platform.

How long did the turnaround take?

The core system was in place within two weeks, worked around ongoing client commitments. The measurable payoff, a sharp drop in incidents, was visible within the following quarter. The early steps delivered most of the benefit before the system was fully built out.

What single change had the biggest impact?

Referencing prompts by version number instead of pasting text inline. That decoupling turned rollback from a slow deploy into an instant configuration switch, which is what converted potential outages into minor blips. It was also the prerequisite for clean model migrations.

Why start with practices instead of tooling?

The team had seen heavy tooling projects stall from low adoption. By proving the value of versioning with lightweight practices first, they earned the buy-in that would justify tooling later. A light system everyone uses beats a sophisticated one nobody adopts.

Were clients aware of the change?

Clients did not see the internal system, but they felt its effects: fewer incidents and faster recovery when problems did occur. The team also found that version history made client conversations about AI behavior concrete and credible rather than defensive.

Key Takeaways

  • Undocumented, ownerless prompts edited by multiple people quietly degraded a flagship feature until the original working version was unrecoverable.
  • The agency chose a deliberately modest goal, history plus change reasons plus rollback, and proved value before considering any tooling.
  • Inventory, consolidation, an immutable version format, and version-based references were built in two weeks around ongoing client work.
  • Referencing prompts by version turned rollback from an hour-long deploy into a one-line switch, the change with the largest impact.
  • Incidents fell from weekly to one per quarter, and the team's two regrets were not adding evaluation and ownership sooner.

Search Articles

Categories

OperationsSalesDeliveryGovernance

Popular Tags

prompt engineeringai fundamentalsai toolsthe difference between AIMLagency operationsagency growthenterprise sales

Share Article

A

Agency Script Editorial

Editorial Team

The Agency Script editorial team delivers operational insights on AI delivery, certification, and governance for modern agency operators.

Related Articles

General

Rolling Out AI Hallucinations Across a Team

Most teams discover AI hallucinations the hard way β€” a confident-sounding wrong answer makes it into a client deliverable, a legal brief, or a published report. The damage isn't just to the output; it

A
Agency Script Editorial
June 1, 2026Β·11 min read
General

Case Study: Large Language Models in Practice

Most teams that fail with large language models don't fail because the technology doesn't work. They fail because they treat deployment as a one-time event rather than a discipline β€” pick a model, wri

A
Agency Script Editorial
June 1, 2026Β·11 min read
General

Thirty-Second Wins Breed False Confidence With LLMs

Working with large language models is deceptively easy to start and surprisingly hard to do well. You can get a useful output in thirty seconds, which creates a false confidence that compounds over ti

A
Agency Script Editorial
June 1, 2026Β·10 min read

Ready to certify your AI capability?

Join the professionals building governed, repeatable AI delivery systems.

Explore Certification