Ad hoc multilingual prompting works until it does not. The first language goes fine, the second reveals a tone problem, the third breaks your structured output, and suddenly you are maintaining a pile of one-off prompts with no shared logic. A framework solves this by giving you named stages to design against, so every new language slots into the same structure instead of becoming a fresh improvisation.
This article introduces the DETECT model, a six-stage way to structure any multilingual prompt: Define, Establish, Tone, Enforce, Check, and Tune. The stages run in order during design and map to the decisions you have to make anyway, just organized so none gets skipped. Each stage includes guidance on when it carries the most weight, because not every task needs every stage at full intensity.
Treat DETECT as scaffolding for thinking, not bureaucracy. It is most valuable when you are supporting several languages and want consistency you can reason about.
Stage 1: Define the Target
The first stage fixes exactly what you are producing: which languages, which regional variants, which markets.
Why it leads
Every later stage depends on this one. Tone, localization, and evaluation all reference the market you define here, so a vague definition undermines everything downstream. Name the variant explicitly, Brazilian versus European Portuguese, and the market, since one language can span many markets.
When it matters most
Always, but especially when supporting widely spoken languages with many regional forms. Our Getting Models to Speak Every Language Your Users Do explains why the market is the real unit.
Stage 2: Establish the Language Instruction
Here you state the output language directly, independent of input, and decide where to place it.
The core move
Name the output language and variant in plain terms, and pin the instruction near the end of the prompt where it carries the most influence on generation. This is the stage that fights English drift at its source.
When it matters most
Every task, and critically for long prompts and any input that may arrive in a different language than the desired output.
Stage 3: Tone and Register
This stage sets formality, address form, and cultural adaptation.
What it covers
Specify how to address the reader, the overall tone, and instructions to adapt idioms by meaning rather than translate them literally. In languages that grammaticalize politeness, this stage prevents a social error.
When it matters most
Highest stakes for customer-facing content and for languages with strong formality distinctions. Lower stakes for internal or purely informational output. Our Hard-Won Habits for Multilingual AI That Holds Up goes deeper on register.
Stage 4: Enforce Structure and Persistence
This stage handles structured output and multi-turn persistence.
Structured output
If output follows a schema, state which parts translate and which stay fixed, keys in English, values localized, so downstream parsing survives.
Persistence
For conversational features, move language and tone into the system message so they hold across turns rather than fading after the first reply.
When it matters most
Whenever output is parsed by other systems or whenever the interaction spans multiple turns.
Stage 5: Check Quality
This stage builds the evaluation path: how you will know the output is good.
The three layers
Automated language detection to confirm the language, back-translation to verify meaning, and native speaker review against a rubric for accuracy, fluency, tone, and cultural fit. Stand these up before scaling languages, not after.
When it matters most
Always, and most acutely for languages no one on the team reads and for low-resource languages where fluent output may hide real errors. Our Seven Ways Multilingual Prompts Quietly Go Wrong explains why skipping this is the costliest mistake.
Stage 6: Tune and Operate
The final stage covers parameterization, cost, and fallback decisions.
Parameterize
Turn the prompt into one template with language, market, and formality as variables and an identical structure across languages, so fixes propagate and behavior stays consistent.
Operate
Budget for higher token cost on non-Latin scripts, and decide in advance which low-resource languages route to professional translation if generation quality falls short.
When it matters most
As soon as you support more than two or three languages, where consistency and cost begin to dominate. Below that threshold you can often operate informally, but the moment a second person maintains the prompts or a fourth language joins, the lack of a shared template and a fallback plan starts producing inconsistency that is hard to trace.
Applying DETECT to a Real Task
To see the model in motion, walk it through a single task: a chat assistant that answers product questions in French and Japanese.
Define and Establish
You define French for France and Japanese for Japan, both reasonably high-resource. You establish the language instruction by naming the output language explicitly, tied to the user's account setting rather than the language of their question, and you pin that instruction at the end of the prompt.
Tone and Enforce
For Tone, you specify a polite, helpful register, which matters especially in Japanese where politeness is grammatical. For Enforce, because this is a multi-turn chat, you move the language and tone settings into the system message so they persist across the conversation, and if the assistant returns any structured data you separate fixed keys from translated values.
Check and Tune
For Check, you confirm each reply with automated language detection, back-translate a sample, and route a weekly sample to native reviewers. For Tune, you parameterize the prompt so French and Japanese share one template, and you note that Japanese consumes more tokens per reply, budgeting accordingly. The whole task slots cleanly into the six stages, and adding a third language later means running it through the same path.
Why the model holds up across tasks
The benefit of a named model is that it externalizes the decisions you would otherwise make implicitly and inconsistently. Two prompt authors using DETECT will arrive at comparable, reviewable prompts, which matters enormously once more than one person maintains your multilingual output.
Frequently Asked Questions
Do I have to run all six stages every time?
You should consider all six, but their intensity varies by task. A quick internal summary in a strong language might lean heavily on Define and Establish and treat Tone and Enforce lightly. A customer-facing chat in a low-resource language needs every stage at full strength. The framework ensures you make a conscious choice rather than forgetting a stage.
How does DETECT relate to a simple checklist?
A checklist tells you what to verify; DETECT tells you how to think about designing the prompt in the first place. The two complement each other: use DETECT to build the prompt and a checklist to confirm nothing slipped before launch. They cover design and verification respectively.
Where do most prompts fail within the model?
Most failures cluster in Stage 2 (language drift from weak instructions) and Stage 5 (no evaluation path, so errors stay invisible). If you strengthen those two stages, you eliminate the majority of real-world multilingual problems.
Can the framework handle a brand-new language addition?
Yes, that is its main benefit. Adding a language means running it through the same six stages and slotting it into your existing parameterized template from Stage 6. Because the structure is shared, the new language inherits all the controls you already built rather than starting from scratch.
Key Takeaways
- DETECT structures multilingual prompting into six ordered stages: Define, Establish, Tone, Enforce, Check, Tune.
- Define fixes language, variant, and market; Establish states and pins the output-language instruction to fight drift.
- Tone sets formality and idiom handling; Enforce protects structured output and persists settings across turns.
- Check builds the three-layer evaluation path before scaling, and Tune parameterizes, budgets cost, and plans fallbacks.
- Apply every stage consciously, scaling intensity to the task, and add new languages by running them through the same model.