Most teams approach multilingual generation as a series of one-off prompts. Someone needs Spanish copy, so they write a Spanish prompt. Someone else needs Japanese support replies, so they improvise a different prompt. Six months later there is no shared standard, no glossary, and no way to tell whether the French output is any good. The work happens, but it does not compound.
A playbook fixes this by turning ad hoc prompting into a set of named plays, each with a clear trigger, a defined owner, and a known sequence of steps. When a new language or content type comes up, you reach for the matching play instead of starting from a blank prompt. This article lays out that operating manual: the plays themselves, who runs them, and how they fit together.
Treat what follows as a starting structure to adapt, not a rigid mandate. The value is in having named, repeatable responses to predictable situations.
Play 1: Standing Up a New Target Language
This play runs whenever you commit to supporting a language you have not shipped before.
Trigger and Owner
The trigger is a product or commercial decision to launch in a new market. The owner is whoever holds localization quality (often a content lead or a localization specialist), not the engineer wiring up the call.
Sequence
Start by defining the locale precisely: language, region, and register. Build a starter glossary of brand terms, product names, and do-not-translate items. Source at least three native-language few-shot examples that reflect your house voice. Then run a calibration batch of representative prompts and have a native speaker grade the output before any of it reaches users. Only after that gate do you turn the language on. The underlying mechanics of each step are detailed in Building a Repeatable Workflow for Prompting for Multilingual Output.
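One way to keep the launch gate honest is to record the outputs of this play in a small data structure and refuse to switch the language on until every step is accounted for. The following is a minimal sketch, not a prescribed implementation: the class name LanguageSetup, its field names, and the step labels are all illustrative assumptions, and the native-review flag is meant to be set by a human reviewer, not computed.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageSetup:
    """Hypothetical record of what Play 1 produces for one locale."""
    language: str                                   # e.g. "es"
    region: str                                     # e.g. "MX"
    register: str                                   # e.g. "informal"
    glossary: dict = field(default_factory=dict)    # source term -> approved rendering
    do_not_translate: set = field(default_factory=set)
    few_shot_examples: list = field(default_factory=list)
    native_review_passed: bool = False              # set by the native reviewer

    def missing_steps(self):
        """Return the Play 1 steps that are still incomplete, in sequence order."""
        missing = []
        if not (self.language and self.region and self.register):
            missing.append("locale definition")
        if not self.glossary and not self.do_not_translate:
            missing.append("starter glossary")
        if len(self.few_shot_examples) < 3:
            missing.append("three native few-shot examples")
        if not self.native_review_passed:
            missing.append("native-reviewed calibration batch")
        return missing

    def ready_to_launch(self):
        """The language is turned on only when nothing is missing."""
        return not self.missing_steps()
```

Calling missing_steps() on a half-finished setup gives the owner a concrete list of what still blocks launch, which is easier to act on than a bare yes or no.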
Why This Play Runs Once and Pays Forever
The temptation is to skip the calibration batch and ship the moment the model produces something readable. Resist it. The setup play is where you discover whether the target language needs a formal or informal default, which brand terms the model mangles, and what house-voice cues the examples must carry. Every shortcut here surfaces later as a recurring correction in review. An hour of upfront calibration can save weeks of repeated cleanup, which is why this is the one play worth over-investing in.
Play 2: Generating Customer-Facing Copy
This is the highest-volume play for most marketing and support teams.
Trigger and Owner
The trigger is any request for localized content: emails, landing pages, in-app messages. The owner is the content producer, working from the shared template rather than a fresh prompt each time.
Sequence
Inject the target locale and register into the standard template. Attach the glossary for that language. Provide the source intent in English so the model generates natively rather than translating word-for-word. Generate, then run automated gates before human review. The point of the play is that the producer never re-derives the prompt structure; they fill in variables. For the specific phrasing patterns that make this reliable, see Prompting for Multilingual Output: Best Practices That Actually Work.
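The "fill in variables" idea can be sketched with nothing more than the standard library. The template wording, the function name build_copy_prompt, and the glossary line format below are assumptions made for illustration; your shared template will have its own phrasing.

```python
from string import Template

# Illustrative shared template; the wording is an assumption, not a recommended prompt.
COPY_TEMPLATE = Template(
    "Write the following content natively in $language as used in $region.\n"
    "Register: $register.\n"
    "Use these approved terms exactly:\n"
    "$glossary\n"
    "Intent (in English; convey the meaning, do not translate word-for-word):\n"
    "$intent"
)


def build_copy_prompt(language, region, register, glossary, intent):
    """Fill the shared template; the producer supplies variables only."""
    glossary_lines = "\n".join(
        f"- {source} -> {approved}" for source, approved in sorted(glossary.items())
    )
    # substitute() raises KeyError if a placeholder is left unfilled,
    # so an incomplete prompt cannot be sent by accident.
    return COPY_TEMPLATE.substitute(
        language=language,
        region=region,
        register=register,
        glossary=glossary_lines,
        intent=intent,
    )
```

Because the structure lives in one place, a change to the template reaches every producer at once, and the prompt a reviewer sees is the prompt everyone used.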
Play 3: Structured Output With Localized Fields
Many systems need not prose but structured data: JSON objects where some fields are localized and others are not.
Trigger and Owner
The trigger is any feature that stores or transmits multilingual data: product catalogs, notification templates, form labels. The owner is typically an engineer, because the play intersects with schema and validation.
Sequence
Define the schema explicitly and mark which fields are translatable. Instruct the model to keep keys, enums, and identifiers in their canonical form while localizing only the marked values. Validate the output against the schema programmatically, and reject any response where a non-translatable field changed. This separation of "translate this, never touch that" is the core discipline. Worked examples appear in Prompting for Multilingual Output: Real-World Examples and Use Cases.
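The rejection rule can be sketched as a comparison between the source object and the model's output. This is a simplified illustration that assumes a flat object (no nesting) and a caller-supplied set of translatable field names; the function name validate_localized and the error strings are hypothetical.

```python
def validate_localized(source, output, translatable):
    """Return a list of problems; an empty list means the output is accepted.

    source and output are flat dicts; translatable is the set of field
    names the model was allowed to change.
    """
    errors = []
    if set(source) != set(output):
        # Keys were added, dropped, or renamed (for example, translated).
        errors.append(f"key mismatch: {sorted(set(source) ^ set(output))}")
    for key, value in source.items():
        if key not in output:
            continue  # already reported as a key mismatch
        if key in translatable:
            # Localized values must still be non-empty strings.
            if not isinstance(output[key], str) or not output[key].strip():
                errors.append(f"translatable field empty or not a string: {key}")
        elif output[key] != value:
            # Identifiers, enums, and other canonical values must be untouched.
            errors.append(f"non-translatable field changed: {key}")
    return errors
```

For example, with a source of {"sku": "A-100", "status": "IN_STOCK", "title": "Blue mug"} and only title marked translatable, an output that turns IN_STOCK into EN_STOCK is rejected while one that changes only the title passes.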
Play 4: Handling Mixed-Language Conversations
Real users do not stay in one language. They code-switch, paste English error messages into a Spanish chat, and expect the assistant to keep up.
Trigger and Owner
The trigger is any interactive surface where users type freely. The owner is the conversation designer who sets the response-language policy.
Sequence
Decide the policy up front: mirror the user, default to a fixed language, or follow the language of the majority of the message. Encode the policy in the system prompt as an explicit rule. Add a language-detection step so the system knows what it is dealing with. Then handle the awkward cases: a one-word English reply inside a long Spanish thread should not flip the whole conversation to English. Document the chosen behavior so it is consistent across the product. A useful refinement is to weight the detected language by recency and length, so a short interjection does not override an established conversation language. Test the policy against real transcripts, not invented examples, because actual users code-switch in ways you will not predict from a whiteboard.
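The recency-and-length weighting can be sketched as a small scoring function. It assumes language detection has already run upstream and produced a language code and a word count for each user turn; the function name conversation_language and the decay value of 0.5 are illustrative assumptions that you would tune against real transcripts.

```python
from typing import List, Optional, Tuple


def conversation_language(
    turns: List[Tuple[str, int]], decay: float = 0.5
) -> Optional[str]:
    """Pick the response language from (detected_language, word_count) turns.

    turns are ordered oldest first. Each turn contributes its word count,
    scaled down by `decay` for every step it lies further in the past.
    """
    scores = {}
    weight = 1.0
    for language, word_count in reversed(turns):  # newest turn first
        scores[language] = scores.get(language, 0.0) + weight * word_count
        weight *= decay
    if not scores:
        return None  # no user turns yet; fall back to the product default
    return max(scores, key=scores.get)
```

With turns of [("es", 40), ("es", 35), ("en", 1)], the one-word English reply scores 1 against 27.5 for Spanish, so the conversation stays in Spanish. A sustained switch, such as two long English turns after a short Spanish one, does move the response language.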
Play 5: Quality Review and Escalation
Every play above feeds into this one. Generation without review is hope, not a process.
Trigger and Owner
The trigger is any batch of generated multilingual content awaiting release. The owner is the review coordinator, who routes content to the right native reviewer.
Sequence
Run automated gates first: language detection, schema validation, glossary compliance, and round-trip semantic checks. Route anything that fails to regeneration or human fix. Sample the passing content for native review on a rotating basis so quality does not silently drift. Escalate systematic problems, such as a language that repeatedly fails the same way, back to the language-setup play for re-calibration.
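The routing logic can be sketched as a list of named gates applied to each item, followed by a sample of whatever passes. Only a naive glossary gate is shown here; language detection, schema validation, and round-trip checks would plug in as further gates. The item layout (a dict with source, output, and glossary keys), the function names, and the substring-based glossary check are all assumptions for illustration.

```python
import random


def glossary_gate(item):
    """Pass if every glossary term found in the source has its approved
    rendering in the output. Naive, case-insensitive substring matching."""
    source = item["source"].lower()
    output = item["output"].lower()
    return all(
        approved.lower() in output
        for term, approved in item["glossary"].items()
        if term.lower() in source
    )


def run_gates(items, gates):
    """Split items into those that pass every gate and those that fail any.

    gates maps a gate name to a function item -> bool. Failed items are
    returned with the names of the gates they failed, for routing to
    regeneration or a human fix.
    """
    passed, failed = [], []
    for item in items:
        failures = [name for name, gate in gates.items() if not gate(item)]
        if failures:
            failed.append((item, failures))
        else:
            passed.append(item)
    return passed, failed


def sample_for_review(passed, rate, seed=None):
    """Pick a random share of passing items for native review (at least one)."""
    if not passed:
        return []
    count = min(len(passed), max(1, round(len(passed) * rate)))
    return random.Random(seed).sample(passed, count)
```

Keeping the failed gate names alongside each item makes the escalation step easier: if one language keeps failing the same gate, that pattern is visible in the data instead of buried in individual fixes.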
Sequencing the Plays Together
The plays are not independent; they form a pipeline.
From Setup to Steady State
Play 1 runs once per language, with its calibration step re-run when Play 5 escalates a systematic problem, and produces the glossary and examples that Plays 2 and 3 depend on. Plays 2 and 3 are the daily workhorses. Play 4 governs interactive surfaces. Play 5 sits underneath all of them as the quality backstop.
Ownership Map at a Glance
- New language setup: localization lead
- Customer copy: content producers
- Structured output: engineers
- Conversations: conversation designers
- Review and escalation: review coordinator
Keeping ownership explicit prevents the common failure where everyone assumes someone else owns quality. For the forward-looking implications of running this kind of operation, see The Future of Prompting for Multilingual Output.
Frequently Asked Questions
How many plays should a small team actually run?
Start with two: new-language setup and customer-facing copy. Those cover the majority of value. Add structured-output and conversation plays only when you have features that genuinely need them. A playbook is useful only if people actually follow it, so keep it lean at first.
Who should own the glossary?
A single named owner, usually the localization lead, with contributions from anyone who spots a term that needs protecting. The failure mode is a glossary that no one maintains. Make updating it part of the new-language and review plays so it stays current.
How often should we re-calibrate a language?
Re-run the calibration batch whenever you change the base model, materially change the prompt template, or see review scores trend down. Many teams also schedule a quarterly check. The signal to act is a rising rate of native-reviewer corrections.
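These triggers can be sketched as a single check. The version below assumes you log a native-reviewer correction rate per review batch (a fraction between 0 and 1, oldest first); the function name needs_recalibration, the three-batch window, and the 0.02 tolerance are illustrative assumptions, not recommended thresholds.

```python
def needs_recalibration(
    model_changed, template_changed, correction_rates, window=3, tolerance=0.02
):
    """Decide whether to re-run the calibration batch for a language.

    correction_rates holds one native-reviewer correction rate per review
    batch, oldest first.
    """
    if model_changed or template_changed:
        return True
    if len(correction_rates) < 2 * window:
        return False  # not enough history to call a trend
    recent = sum(correction_rates[-window:]) / window
    previous = sum(correction_rates[-2 * window:-window]) / window
    # Act when the recent average has risen noticeably above the prior one.
    return recent - previous > tolerance
```

Comparing two windowed averages is a deliberately crude trend test; it avoids reacting to a single bad batch while still catching a sustained rise.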
Can these plays work with a translation vendor in the loop?
Yes. The plays describe responsibilities, not who performs them. A vendor can own native review and glossary curation while your team owns generation. The key is that the handoffs and quality gates stay defined regardless of who staffs each role.
What is the minimum tooling to run this?
A prompt template store, a glossary file, a language-detection check, and schema validation for structured output. None of these require heavy infrastructure. The discipline matters more than the tooling sophistication.
Key Takeaways
- Convert ad hoc multilingual prompting into named plays with explicit triggers and owners.
- Run the new-language setup play once per locale to produce the glossary and examples everything else depends on.
- Separate translatable from non-translatable content rigorously, especially in structured output.
- Set an explicit response-language policy for any interactive, mixed-language surface.
- Make quality review a standing play with automated gates plus sampled native review.
- Keep the playbook lean; start with two plays and add more only when features demand them.