The difference between a team that defends against prompt injection well and one that does it occasionally is rarely knowledge. It is process. The first team has a documented sequence of steps that runs every time an AI feature is built or changed. The second team relies on whoever happens to remember, which means it sometimes happens and sometimes does not.
A repeatable workflow takes the techniques out of individual heads and puts them into a process that a new engineer, a contractor, or a future version of yourself can follow without supervision. It is the difference between defense as a personal skill and defense as a property of how your team works. This article walks through building that workflow stage by stage.
For the strategic context behind these steps, The Complete Guide to Prompt Injection Defense gives the foundations. Here, the goal is a process you can hand off.
Why a Workflow Beats Good Intentions
Talented engineers still skip steps under deadline pressure, especially steps that are not part of the normal flow. A workflow solves three problems at once: consistency, so the same checks run every time; transferability, so the practice does not depend on one person; and auditability, so you can show what was done and when.
The handoff test
A good workflow passes a simple test: could someone who has never seen this system pick up the document and apply it correctly? If the answer requires "ask the person who built it," the workflow is incomplete. Everything below is designed to pass that test.
There is a fourth benefit that teams discover only after they build the workflow: speed. Engineers often resist process because they assume it slows them down. In practice, a clear workflow removes the cognitive load of deciding what to do each time, which makes the work faster, not slower. The slow part of defense is figuring out what is needed from a blank page. A workflow turns that open question into a checklist, and checklists are fast.
Stage One: Classify the Feature
Every workflow run starts by understanding what you are dealing with.
Identify untrusted inputs
List every source of content the feature reads that it did not author: user messages, retrieved documents, tool outputs, file contents, external pages. If a source is untrusted, it can carry an injection. Naming them explicitly prevents the common mistake of treating retrieved content as trusted.
Determine the blast radius
Decide what a successful injection could cause. A feature that only displays text to a human is low stakes. A feature that can take actions, send, modify, pay, delete, is high stakes and gets more rigorous treatment in the stages that follow. This mirrors the prioritization logic in The Prompt Injection Defense Playbook.
Stage Two: Apply the Controls
With classification done, apply controls proportional to the stakes.
The standard control set
- Structure the prompt so trusted instructions and untrusted data are clearly separated
- Restrict the feature's tools and permissions to the minimum it needs
- Gate any irreversible or sensitive action behind a human confirmation or hard check
- Validate the model's output against expected shape before acting on it
For low blast-radius features, structure and validation may be enough. For high blast-radius features, every control applies, plus the screening added in the next stage.
The workflow should make this proportionality explicit rather than leaving it to judgment. Spell out which controls are mandatory at each blast-radius tier so an engineer following the document does not have to decide how much is enough. Left to individual judgment, the answer drifts toward whatever fits the schedule. Encoded in the workflow, the answer stays consistent across people and across deadlines, which is the entire point of having a workflow.
Record what you applied and why
Note which controls were applied and, importantly, which were deliberately skipped and why. The reasoning is what lets the next person extend or correct the work. Undocumented omissions look identical to oversights.
This record does double duty during incident reviews. If something goes wrong, the first question is always whether the control that would have stopped it was considered and rejected, or simply forgotten. A documented decision answers that instantly and turns a blame conversation into a learning one. The cost is a sentence or two per feature. The payoff is a clear trail every time someone asks why the system behaves the way it does.
Stage Three: Test Adversarially
A control you have not tried to break is a guess.
Run the attack library
Maintain a shared set of injection techniques and run them against the feature in a production-like environment. Include direct attempts in user input and indirect attempts hidden in retrieved content. Add any new technique you encounter to the shared library so every future run benefits. Ready-made scenarios for this live in Prompt Injection Defense: Real-World Examples and Use Cases.
Record results
Document what was tested, what passed, and what failed. A clean test you cannot prove is the same as no test when an incident review asks what coverage existed.
Stage Four: Review and Sign Off
Bolt the final check onto the review you already run.
Make it a review-gate item
A pull request that touches a prompt or adds an untrusted source should require a reviewer to confirm the workflow ran: feature classified, controls applied, adversarial test passed, decisions recorded. This costs a reviewer a few minutes and prevents the silent gaps that accumulate otherwise.
Stage Five: Maintain the Workflow
The workflow itself is a living artifact.
Trigger re-runs on change
Re-run the workflow whenever the feature changes: a new tool, a new data source, a model upgrade. Defenses decay as the system around them shifts, so the workflow must fire on change, not just at first build.
Keep the documentation current
When you learn a new attack or retire an obsolete control, update the workflow document so the next run reflects current reality. A stale workflow gives false confidence, which is its own risk, as covered in The Hidden Risks of Prompt Injection Defense.
Frequently Asked Questions
How detailed should the workflow document be?
Detailed enough that a competent newcomer could follow it without asking the original author questions. That usually means concrete steps, named control options, and recorded decisions rather than high-level principles. If it needs a verbal explanation to use, it is not finished.
Does every feature need the full workflow?
The classification stage applies to every feature, but the depth of controls and testing scales with blast radius. A read-only summarizer gets a lighter pass than an agent that can take irreversible actions. The workflow tells you how much rigor each feature earns.
How do we keep the workflow from being skipped under deadline pressure?
Attach it to an existing gate, usually code review, so it is not optional extra work but part of shipping. When the check lives inside the normal flow, it survives busy weeks far better than a separate process people must remember.
What goes in the adversarial attack library?
A growing collection of injection techniques: direct overrides in user input, indirect instructions hidden in retrieved content, encoded or multilingual attempts, and any novel attack your team encounters. Sharing it across the team means every feature benefits from every lesson.
Who maintains the workflow over time?
A designated defense lead keeps the document and the attack library current, while individual engineers run the workflow on their own features. Maintenance is light but must have a named owner, or the document silently goes stale.
Key Takeaways
- Process, not knowledge, separates consistent defense from occasional defense.
- A good workflow passes the handoff test: a newcomer can apply it without asking the author.
- Start every run by classifying untrusted inputs and blast radius.
- Apply controls proportional to stakes, and record what you skipped and why.
- Test adversarially with a shared attack library and bolt sign-off onto existing reviews.
- Re-run the workflow on every change and keep the document current as you learn.