Knowing the techniques to defend against prompt injection is not the same as running them reliably under pressure. When a new feature ships at midnight, when a model upgrade lands, when someone reports odd behavior in production, you need to know what to do, in what order, and who is responsible. That is what a playbook provides: not theory, but a set of named plays with clear triggers and owners.
This article lays out an end-to-end operating playbook. Each play is a concrete, repeatable response. Each has a trigger that tells you when to run it and an owner who is accountable for it. The plays are sequenced so a team can adopt them in order, building from foundation to maturity rather than trying to do everything at once.
If you want the conceptual grounding behind these plays, pair this with The Complete Guide to Prompt Injection Defense. Here, the focus is operational.
Foundation Plays
These run once and then anchor everything else. Owner: the defense lead.
Play 1: Inventory the attack surface
Trigger: starting the program, or quarterly thereafter.
List every feature that sends untrusted content to a model. For each, record the untrusted sources it reads, the tools it can call, and its owner. This inventory is the map the rest of the playbook depends on. You cannot sequence plays against systems you have not catalogued.
Play 2: Classify by blast radius
Trigger: after every inventory update.
Rank each feature by what damage a successful injection could cause. A read-only summarizer is low. An agent that sends client emails or modifies records is high. Blast radius, not popularity, sets the priority order for the plays that follow.
Keep the ranking coarse. Three tiers, low, medium, high, are enough to drive decisions, and a more granular scale invites unproductive debate about whether something is a six or a seven. The point of the ranking is not precision; it is making sure your scarce defensive attention lands on the features where a breach would actually hurt, rather than being spread evenly across systems that do not deserve equal treatment.
Build Plays
These harden individual features. Owner: the feature's engineering owner, reviewed by the defense lead.
Play 3: Separate instructions from data
Trigger: any new or modified AI feature.
Structure every prompt so trusted instructions and untrusted content are unmistakably distinct. This single structural choice does more than any amount of pleading inside the prompt, because it gives the model and your validation logic a clear boundary to reason about.
Play 4: Minimize capability
Trigger: any feature that grants the model tools or permissions.
Give each feature the least capability it needs and no more. Remove tools that are not essential. The fewer damaging actions a model can take, the fewer an injected instruction can trigger. This play stops entire attack classes at once.
Play 5: Gate high-impact actions
Trigger: any feature where the model can take irreversible or sensitive action.
Insert a human confirmation or a hard check before the model can send, delete, pay, or publish. Automation is fine until the consequences are hard to reverse, at which point a deliberate gate is worth the friction.
Play 6: Validate outputs
Trigger: any feature whose output triggers downstream effects.
Check the model's output against expected shape and constraints before acting on it. If a model is supposed to return a category from a fixed list, reject anything outside that list. Validation catches both injected misbehavior and ordinary model errors.
Detection Plays
These find attacks the build plays missed. Owner: the defense lead.
Play 7: Screen untrusted input
Trigger: high blast-radius features identified in Play 2.
Add an input-screening layer for the riskiest features. Treat it as a supporting layer, not a foundation, because the screener reads untrusted input too. The limits of this approach are covered in Prompt Injection Defense: Myths vs Reality.
Play 8: Instrument and log carefully
Trigger: any feature in production.
Record when defenses fire, when legitimate input is blocked, and how much each layer costs. Scrub and secure these logs, since they contain untrusted content and sometimes sensitive data. Observability is what turns a guess into a managed risk.
Response Plays
These run when something goes wrong. Owner: the defense lead, with engineering support.
Play 9: Run the adversarial drill
Trigger: quarterly, and after any major change.
Actively try to break your own systems with a maintained library of injection techniques against a production-like environment. Treat findings as learning. A team that expects to be tested writes more defensively in the first place. The repeatable structure for this lives in Building a Repeatable Workflow for Prompt Injection Defense.
Play 10: Contain and learn from incidents
Trigger: a confirmed or suspected injection in production.
Have a predefined response: isolate the affected feature, revoke or narrow its permissions, review logs to understand scope, and document what happened and why the existing plays missed it. Every incident should produce a change to the playbook, not just a fix to the code.
The instinct after an incident is to patch the specific hole and move on. Resist it. Ask why the feature had the permission that made the attack damaging, why the gate did not catch it, and whether other features in the registry share the same weakness. An incident that only produces a one-line fix has been half-wasted. An incident that produces a sharper play and a sweep of similar features has paid for the pain it caused.
Sequencing the Rollout
Do not attempt all ten plays in week one. The order matters.
- Weeks 1 to 2: Plays 1 and 2, so you know what you are protecting and why.
- Weeks 3 to 6: Plays 3 through 6 on your highest blast-radius features first.
- Weeks 7 onward: Plays 7 and 8 for detection and observability.
- Ongoing: Plays 9 and 10 as a permanent operating rhythm.
Spreading the work this way builds durable habits instead of a burst of effort that decays. For the change-management side of getting a team to actually follow this, see Rolling Out Prompt Injection Defense Across a Team.
Frequently Asked Questions
Where should a small team start with this playbook?
Plays 1 and 2: inventory your AI surfaces and rank them by blast radius. Without that map, every other play is guesswork. Then apply the build plays to your single highest-risk feature before broadening.
Who should own the playbook overall?
A designated defense lead, even part-time. They keep the inventory current, coordinate reviews, and run the adversarial drills. Individual feature owners handle the build plays for their own features under that coordination.
How is blast radius different from likelihood?
Blast radius measures consequence: how bad it is if an attack succeeds. Likelihood measures how easy the attack is. The playbook prioritizes blast radius because a low-probability attack on a high-consequence feature can still be catastrophic.
What triggers a response play versus a build play?
Build plays trigger on change: a new feature, a new tool, a model upgrade. Response plays trigger on schedule or event: a quarterly drill, or a suspected incident in production. Both run continuously, just on different clocks.
How often should we rerun the foundation plays?
At least quarterly, and immediately whenever a significant new feature or data source appears. The inventory and blast-radius ranking are only useful if they reflect the system as it is now, not as it was last quarter.
Key Takeaways
- A playbook turns known techniques into reliable action under pressure.
- Foundation plays map and rank your AI surfaces before any hardening begins.
- Build plays harden features: separate instructions from data, minimize capability, gate impact, validate outputs.
- Detection plays add screening and observability for the highest-risk features.
- Response plays cover adversarial drills and incident containment, feeding lessons back into the playbook.
- Sequence the rollout over weeks so the practice becomes habit rather than a one-time push.