Teams that choose an AI stack badly rarely do so for lack of options. They fail because they make the decision in the wrong order, letting an exciting model dictate an architecture that the workload never justified. A framework fixes the order. It turns a tangle of simultaneous choices into a sequence where each layer constrains the next.
The framework in this article is deliberately simple: four layers, each answering one question, evaluated from the bottom up. It is reusable across very different projects because it separates what is stable from what is volatile. The bottom layers change rarely; the top layer changes constantly. Putting your most durable decisions at the foundation is what keeps the whole stack from collapsing every time a new model ships.
This is not the only way to organize the problem, but it is a way that holds up under pressure. Read it as scaffolding you can adapt, not a recipe you must follow line by line.
Layer One: The Workload Foundation
Everything rests on an honest description of the job. This is the layer most teams rush, and rushing it poisons every decision above.
What this layer decides
- The shape of the task: classification, generation, extraction, reasoning, or some blend. Each shape favors different tools.
- The volume and latency envelope: the realistic ceiling on requests and the patience your users actually have.
- The quality bar and how you will know it is met: without a definition of good, you cannot evaluate anything above this layer.
When this layer is solid, the upper layers become much easier because you are choosing against a fixed target. When it is vague, every higher decision turns into an argument.
A useful test of whether this layer is done: could a stranger read your three answers and reproduce roughly the same shortlist you would build? If not, the description is still carrying assumptions that live only in your head, and those assumptions will surface as disagreements later, usually at the worst moment. The discipline of writing the foundation down, plainly, is what makes the rest of the framework mechanical rather than contentious.
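One way to make the foundation reproducible by a stranger is to record the three answers as a typed spec instead of loose prose. The sketch below is a minimal illustration, assuming Python 3.9+. `WorkloadSpec`, `TaskShape`, and every field name are hypothetical, and expressing the latency envelope as a p95 figure and the quality bar as a single rate between 0 and 1 are assumptions that will not fit every workload.

```python
from dataclasses import dataclass
from enum import Enum


class TaskShape(Enum):
    CLASSIFICATION = "classification"
    GENERATION = "generation"
    EXTRACTION = "extraction"
    REASONING = "reasoning"


@dataclass(frozen=True)
class WorkloadSpec:
    """The foundation answers, written down. Hypothetical schema, not a standard."""

    shapes: frozenset[TaskShape]     # one shape, or several for a blended task
    peak_requests_per_second: float  # the realistic ceiling, not a hopeful one
    p95_latency_ms: int              # assumed proxy for the patience users have
    quality_metric: str              # how you will know the bar is met
    quality_threshold: float         # assumed to be a rate in (0, 1]

    def gaps(self) -> list[str]:
        """List the answers that are still missing or malformed."""
        problems = []
        if not self.shapes:
            problems.append("task shape is unstated")
        if self.peak_requests_per_second <= 0:
            problems.append("volume ceiling is unstated")
        if self.p95_latency_ms <= 0:
            problems.append("latency budget is unstated")
        if not self.quality_metric.strip():
            problems.append("no definition of good")
        if not 0.0 < self.quality_threshold <= 1.0:
            problems.append("quality threshold is not a rate in (0, 1]")
        return problems


spec = WorkloadSpec(
    shapes=frozenset({TaskShape.EXTRACTION}),
    peak_requests_per_second=20.0,
    p95_latency_ms=1500,
    quality_metric="",  # left vague on purpose: the spec flags it
    quality_threshold=0.9,
)
print(spec.gaps())  # ['no definition of good']
```

An empty `gaps()` list is not proof the foundation is right, only that every question has a written answer someone else can challenge.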
Layer Two: The Data and Trust Boundary
The second layer asks where your data is allowed to go and who you are willing to trust with it. It sits below the model deliberately, because data constraints frequently eliminate models before capability is even discussed.
What this layer decides
- Sensitivity classification: what kind of data the system touches and what obligations come with it.
- The hosting posture: hosted API, private cloud, or self-hosted, driven by residency and confidentiality rules.
- The trust contract: what you require providers to commit to in writing before they see anything.
Settling this layer early prevents the painful pattern of falling in love with a model you are not legally allowed to use. For the closely related decision of which numbers prove the stack is performing, How to Measure Choosing an AI Tech Stack: Metrics That Matter picks up where this layer leaves off.
The reason this layer sits below capability, rather than beside it, is that its constraints are categorical. A latency budget might rule out a model by a margin you could close with optimization, but a residency requirement rules out an entire deployment posture outright. Categorical constraints belong lower in the stack than negotiable ones, because they shrink the search space before you spend effort exploring it.
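Because these constraints are categorical, they can be written as pass/fail predicates and applied before any capability testing begins. A minimal sketch, assuming each candidate's hosting posture, processing regions, and written commitments are already known. `Candidate`, `TrustBoundary`, the engine names, and the commitment strings are hypothetical, not any provider's real schema.

```python
from dataclasses import dataclass
from enum import Enum


class Hosting(Enum):
    HOSTED_API = "hosted_api"
    PRIVATE_CLOUD = "private_cloud"
    SELF_HOSTED = "self_hosted"


@dataclass(frozen=True)
class Candidate:
    name: str                           # hypothetical engine name
    hosting: Hosting
    regions: frozenset[str]             # where data would be processed
    signed_commitments: frozenset[str]  # promises the provider has put in writing


@dataclass(frozen=True)
class TrustBoundary:
    allowed_hosting: frozenset[Hosting]
    required_region: str
    required_commitments: frozenset[str]

    def admits(self, c: Candidate) -> bool:
        # Every check is pass/fail; no amount of tuning turns a "no" into a "yes".
        return (
            c.hosting in self.allowed_hosting
            and self.required_region in c.regions
            and self.required_commitments <= c.signed_commitments
        )


def apply_trust_boundary(candidates: list[Candidate], boundary: TrustBoundary) -> list[Candidate]:
    """Shrink the search space before any capability evaluation is spent."""
    return [c for c in candidates if boundary.admits(c)]


boundary = TrustBoundary(
    allowed_hosting=frozenset({Hosting.PRIVATE_CLOUD, Hosting.SELF_HOSTED}),
    required_region="eu",
    required_commitments=frozenset({"no-training-on-inputs"}),
)
candidates = [
    Candidate("engine-a", Hosting.HOSTED_API, frozenset({"eu"}), frozenset({"no-training-on-inputs"})),
    Candidate("engine-b", Hosting.PRIVATE_CLOUD, frozenset({"us"}), frozenset({"no-training-on-inputs"})),
    Candidate("engine-c", Hosting.PRIVATE_CLOUD, frozenset({"eu", "us"}), frozenset({"no-training-on-inputs"})),
]
print([c.name for c in apply_trust_boundary(candidates, boundary)])  # ['engine-c']
```

A negotiable constraint such as a latency budget would not belong in `admits`; it is better measured at the capability layer, where optimization can still change the answer.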
Layer Three: The Capability Engine
Only now do models enter. By this point the foundation has narrowed the field enough that you are choosing among a handful of viable engines rather than the entire market.
What this layer decides
- The capability tier you need: the cheapest model that clears your quality bar, not the most powerful one available.
- The abstraction at the model boundary: the seam that lets you swap engines without rewriting the application.
- The fallback strategy: what runs when the primary engine is slow, down, or deprecated.
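The model boundary and the fallback strategy can share one seam. A minimal sketch, assuming each provider is wrapped in a small adapter of your own that converts its timeouts, outages, and deprecation errors into a single `EngineUnavailable` exception. `TextEngine`, `FallbackEngine`, and that exception are hypothetical names, not any SDK's API.

```python
from typing import Protocol, Sequence


class EngineUnavailable(Exception):
    """Hypothetical: raised by your adapter when its engine is slow, down, or retired."""


class TextEngine(Protocol):
    """The seam at the model boundary; application code depends only on this."""

    name: str

    def complete(self, prompt: str) -> str: ...


class FallbackEngine:
    """Tries engines in order and returns the first successful completion."""

    def __init__(self, engines: Sequence[TextEngine]) -> None:
        if not engines:
            raise ValueError("at least one engine is required")
        self.engines = list(engines)
        self.name = "fallback(" + ", ".join(e.name for e in self.engines) + ")"

    def complete(self, prompt: str) -> str:
        errors = []
        for engine in self.engines:
            try:
                return engine.complete(prompt)
            except EngineUnavailable as exc:
                # Record why this engine failed, then move on to the next one.
                errors.append(f"{engine.name}: {exc}")
        raise EngineUnavailable("; ".join(errors))
```

Application code calls `complete` on the wrapper and never names a provider, so swapping or reordering engines changes the list passed to `FallbackEngine`, not its callers. Because the wrapper exposes the same `name` and `complete`, it also satisfies `TextEngine` itself.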
The discipline here is restraint. The temptation is to buy the most capable model and feel safe; the framework pushes you toward the cheapest one that works and a clean way to upgrade later. The trade-offs involved are worth studying on their own in Choosing an AI Tech Stack: Trade-offs, Options, and How to Decide.
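"The cheapest model that clears your quality bar" translates directly into a search in ascending cost order. A minimal sketch, assuming you have a small labeled set, a per-tier cost figure you measured or were quoted, and that exact-match accuracy is your quality metric. `Tier`, `pass_rate`, and `cheapest_that_clears` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_1k_requests: float    # assumed: a figure you measured or were quoted
    predict: Callable[[str], str]  # calls the engine behind the model boundary


def pass_rate(tier: Tier, labeled: Sequence[tuple[str, str]]) -> float:
    """Fraction of labeled examples the tier answers exactly right."""
    hits = sum(1 for text, expected in labeled if tier.predict(text) == expected)
    return hits / len(labeled)


def cheapest_that_clears(
    tiers: Sequence[Tier], labeled: Sequence[tuple[str, str]], threshold: float
) -> Optional[Tier]:
    # Walk from cheapest to most expensive and stop at the first tier that
    # clears the bar; pricier tiers are evaluated only if cheaper ones fail.
    for tier in sorted(tiers, key=lambda t: t.cost_per_1k_requests):
        if pass_rate(tier, labeled) >= threshold:
            return tier
    return None
```

Returning `None` rather than the best-scoring tier is deliberate: if nothing clears the bar, that is a conflict to resolve at a lower layer, not a reason to quietly accept a weaker result.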
Layer Four: The Orchestration Surface
The top layer is the machinery around the model: retrieval, prompt management, tool calling, observability, and the application glue. It changes most often and matters most to daily engineering life.
What this layer decides
- Whether retrieval is warranted: many teams add a vector database before proving they need one.
- How prompts and chains are versioned and tested: prompts are code and deserve the same rigor.
- What you can observe: the ability to inspect a full run is the difference between debugging and guessing.
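Treating prompts as code mostly means giving each one a name and a version, rendering it through something that fails loudly, and recording which version produced each output. A minimal sketch using only the standard library (`string.Template`, `dataclasses`, `json`). `PromptVersion`, `RunTrace`, and the sample prompt are hypothetical, and a dedicated observability tool would capture far more per run.

```python
import json
from dataclasses import asdict, dataclass, field
from string import Template


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str
    template: str  # $placeholders, rendered with string.Template

    def render(self, **values: str) -> str:
        # substitute() raises KeyError for a missing placeholder, so a broken
        # prompt fails in tests rather than quietly in production.
        return Template(self.template).substitute(**values)


@dataclass
class RunTrace:
    """What you need to inspect one run after the fact, instead of guessing."""

    prompt_name: str
    prompt_version: str
    rendered_prompt: str
    output: str = ""
    steps: list = field(default_factory=list)  # e.g. tool calls, in order

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical prompt, versioned like any other source artifact.
SUMMARIZE_V2 = PromptVersion("summarize", "v2", "Summarize for a $audience reader:\n$text")


def test_summarize_v2_renders_audience():
    # A regression test that runs alongside the rest of the code's tests.
    rendered = SUMMARIZE_V2.render(audience="non-technical", text="Revenue rose.")
    assert rendered == "Summarize for a non-technical reader:\nRevenue rose."
```

Storing the version string in every trace is what lets you answer "which prompt produced this output?" after the prompt has changed.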
Because this layer is volatile, the framework treats its components as replaceable by design. You expect to swap orchestration tools as the ecosystem matures, and you build so that swapping one does not disturb the layers below.
This is also the layer where teams accumulate the most accidental complexity. Because the tools are exciting and the ecosystem is loud, it is easy to adopt a framework, a vector database, and a prompt platform before any of them has earned its place. The framework's instruction here is restraint: add a component to this layer only when a concrete problem at one of the lower layers demands it, and keep each addition isolated enough that removing it later is a local change rather than a rewrite.
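Keeping an orchestration component removable is easiest when the application depends on a narrow interface with a do-nothing default. A minimal sketch for retrieval, assuming the rest of the application only needs a list of context strings. `Retriever`, `NoRetrieval`, `KeywordRetriever`, and `build_prompt` are hypothetical, and the naive word-overlap ranking stands in for whatever store you eventually justify.

```python
from typing import Protocol, Sequence


class Retriever(Protocol):
    def fetch(self, query: str, k: int) -> list[str]: ...


class NoRetrieval:
    """Default: no vector database until a concrete problem demands one."""

    def fetch(self, query: str, k: int) -> list[str]:
        return []


class KeywordRetriever:
    """Hypothetical stand-in for a real store, ranked by shared words."""

    def __init__(self, documents: Sequence[str]) -> None:
        self.documents = list(documents)

    def fetch(self, query: str, k: int) -> list[str]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(d.lower().split())), d) for d in self.documents]
        # Keep documents with at least one shared word, best matches first.
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [d for _, d in hits[:k]]


def build_prompt(question: str, retriever: Retriever, k: int = 3) -> str:
    """The only place that knows retrieval exists; removing it is a local change."""
    context = retriever.fetch(question, k)
    if not context:
        return question
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
```

Starting with `NoRetrieval` means retrieval enters only once evaluations show the model is missing context, and taking it out again is a one-argument change at the call site.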
Applying the Framework in Order
The power of the framework is in its sequence. Each layer hands a constraint to the next.
How a real decision flows
- Bottom up, always. Define the workload, then the trust boundary, then the engine, then the surface. Reversing the order is how teams end up with stacks that violate their own data rules.
- Stop when a layer disqualifies an option. If the data boundary rules out hosted APIs, you do not need to evaluate hosted models at all.
- Revisit from the top when things change. When a new model ships, you reopen only the capability layer; the foundation stays put.
This ordering is what separates a framework from a checklist. A checklist tells you what to verify; the framework tells you what to decide first.
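The bottom-up flow can be expressed as an ordered list of layer filters. A minimal sketch, assuming each layer's decision can be written as a function that keeps the options it accepts; `evaluate_bottom_up`, the layer names, and the candidate fields are illustrative, not a prescribed interface.

```python
from typing import Callable, Sequence, Tuple

# A layer is a name plus a function that keeps the options it accepts.
Layer = Tuple[str, Callable[[list], list]]


def evaluate_bottom_up(candidates: Sequence, layers: Sequence[Layer]) -> Tuple[list, str]:
    """Apply layers in order, lowest first.

    Options a layer removes are never seen by the layers above it, and
    evaluation stops entirely once no options survive. Returns the
    survivors and the name of the last layer that actually ran.
    """
    remaining = list(candidates)
    last_run = ""
    for name, keep in layers:
        last_run = name
        remaining = keep(remaining)
        if not remaining:
            break  # a lower layer disqualified everything; higher ones are moot
    return remaining, last_run
```

When a new model ships, you add it to the candidate list and rerun the same lower-layer filters unchanged; only the capability layer's verdict is genuinely new information.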
Knowing When to Bend the Model
A framework you apply mechanically becomes its own trap. The four layers are a default, not a law.
When to adapt it
- Throwaway experiments can collapse the lower layers; if nothing sensitive is involved and nothing ships, jump straight to capability.
- Highly regulated workloads may invert the emphasis, with the trust boundary dominating every other consideration.
- Mature platforms sometimes freeze the lower three layers entirely and only ever touch orchestration.
The framework earns its keep by making your shortcuts deliberate. When you skip a layer, you should be able to say why. For practitioners ready to push past the defaults entirely, Advanced Choosing an AI Tech Stack: Going Beyond the Basics explores the edge cases this framework only gestures at.
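One lightweight way to keep shortcuts deliberate is to refuse any skipped layer that lacks a written reason. A minimal sketch; `Skip`, `plan`, and the layer labels are hypothetical, and in practice the reasons would more likely live in a design document than in code.

```python
from dataclasses import dataclass

# Hypothetical labels for the four layers, lowest first.
LAYERS = ("workload", "trust", "capability", "orchestration")


@dataclass(frozen=True)
class Skip:
    layer: str
    reason: str


def plan(skips: list[Skip]) -> list[str]:
    """Return the layers still to evaluate; a skip without a reason is refused."""
    skipped = set()
    for s in skips:
        if s.layer not in LAYERS:
            raise ValueError(f"unknown layer: {s.layer!r}")
        if not s.reason.strip():
            raise ValueError(f"skipping {s.layer!r} needs a stated reason")
        skipped.add(s.layer)
    return [layer for layer in LAYERS if layer not in skipped]


# A throwaway experiment on public data that never ships.
print(plan([Skip("workload", "throwaway experiment"), Skip("trust", "public data only")]))
# ['capability', 'orchestration']
```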
Frequently Asked Questions
Why evaluate from the bottom up instead of starting with the model?
Because the lower layers eliminate options the model layer cannot see. If you start with a model and discover later that it violates your data rules, you have wasted the evaluation. Working upward means every layer hands a smaller, valid set of choices to the next.
Is four layers always the right number?
Four is a useful default, not a magic count. The point is separating durable decisions from volatile ones. If your context demands splitting the orchestration layer in two, do it. The structure matters more than the exact tally.
How does this framework handle new model releases?
It localizes the disruption. A new release only reopens the capability layer, leaving the workload, trust boundary, and orchestration surface untouched if you built the model boundary cleanly. That containment is the main payoff of ordering decisions this way.
Can a small team actually use all four layers?
Yes, and faster than they expect. For a small project, each layer might take an hour of honest conversation. The framework is not heavyweight process; it is a sequence of questions that prevents expensive backtracking later.
What if two layers seem to conflict?
A conflict usually means a constraint you have not made explicit. If the trust boundary rules out the only model that meets your quality bar, the real decision is whether to relax the quality bar or change the hosting posture. The framework surfaces that trade rather than hiding it.
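Making the trade visible can be as simple as listing, for each candidate, the constraints it fails. A minimal sketch, assuming every constraint is a pass/fail predicate; candidates that fail exactly one constraint show which single relaxation would admit them. `near_misses`, the constraint labels, and the candidate fields are hypothetical.

```python
from typing import Callable, Dict, List

Constraint = Callable[[dict], bool]


def near_misses(candidates: List[dict], constraints: Dict[str, Constraint]) -> Dict[str, str]:
    """Map each candidate that fails exactly one constraint to that constraint.

    When no candidate passes everything, these entries are the trades on the
    table: relaxing the named constraint is what would admit that candidate.
    """
    result = {}
    for c in candidates:
        failed = [label for label, ok in constraints.items() if not ok(c)]
        if len(failed) == 1:
            result[c["name"]] = failed[0]
    return result


constraints = {
    "hosting: self-hosted only": lambda c: c["hosting"] == "self",
    "quality: pass rate >= 0.9": lambda c: c["score"] >= 0.9,
}
candidates = [
    {"name": "big-hosted", "hosting": "hosted", "score": 0.94},
    {"name": "small-local", "hosting": "self", "score": 0.82},
    {"name": "weak-hosted", "hosting": "hosted", "score": 0.70},
]
print(near_misses(candidates, constraints))
# {'big-hosted': 'hosting: self-hosted only', 'small-local': 'quality: pass rate >= 0.9'}
```

Here the output states the decision plainly: change the hosting posture to admit `big-hosted`, or lower the quality bar to admit `small-local`.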
Where should I go after adopting this framework?
Pair it with a concrete verification pass. Vetting an AI Stack Before You Sign the Contract turns these four layers into specific items you can run against a real shortlist before committing budget.
Key Takeaways
- Choose your stack from the bottom up: workload, then trust boundary, then capability, then orchestration.
- Put durable decisions at the foundation and volatile ones at the top so new models disrupt only one layer.
- Let each layer hand a constraint to the next; stop evaluating as soon as a layer disqualifies an option.
- Resolve data and trust before models, because they routinely eliminate engines before capability matters.
- Treat the framework as an adaptable default and make every shortcut a deliberate, explainable choice.