Most best-practice lists for AI stacks are safe to the point of being useless. "Choose the right model for your needs" is true and tells you nothing. Real practices are opinionated, which means a reasonable person could disagree, and they come with an argument you can evaluate rather than a platitude you nod along to. The value is in the reasoning, because the reasoning is what lets you decide whether the practice applies to your situation.
This article lays out a set of positions on choosing an AI tech stack that have earned their keep, each paired with the case behind it. Some you will adopt, some you may reject for good reasons specific to your context. That is the point. A practice you have actually reasoned about is worth more than a dozen you absorbed without thinking.
Default to Boring
The strongest position to hold is a bias toward boring, well-understood components over novel, exciting ones. Boring tools have known failure modes, broad support, and predictable behavior.
The argument
Novelty carries hidden costs: undocumented edge cases, thin community knowledge, and the chance the tool is abandoned. For most stacks, the marginal capability of the newest thing does not outweigh the reliability of the established thing. Save your novelty budget for the one place it genuinely differentiates you, and be boring everywhere else.
Novelty is a budget, not a virtue
The frame that makes this practice click is treating novelty as a scarce budget you spend deliberately rather than a quality you accumulate. Every novel component you add increases the surface area of things that can surprise you in ways no one has documented. A stack built entirely from the newest tools is not advanced; it is fragile, because you are simultaneously betting on multiple unproven pieces. The disciplined move is to identify the single place where a cutting-edge capability actually creates an advantage your competitors lack, spend your novelty there, and choose the most boring, battle-tested option for everything else. That way, when something breaks, you are debugging one new thing against a backdrop of well-understood ones.
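One way to keep the novelty budget honest is to write the stack down and count the unproven pieces. The sketch below is only an illustration: the layer names, the choices, and the budget of one are hypothetical, and whether a component counts as novel is a judgment your team has to make.

```python
# Hypothetical stack description; every entry here is an assumption for illustration.
NOVELTY_BUDGET = 1

STACK = {
    "model": {"choice": "hosted LLM API", "novel": False},
    "retrieval": {"choice": "experimental reranker", "novel": True},
    "storage": {"choice": "relational database", "novel": False},
    "orchestration": {"choice": "plain application code", "novel": False},
}


def novel_layers(stack):
    """Return the names of layers flagged as novel, in alphabetical order."""
    return sorted(layer for layer, info in stack.items() if info["novel"])


def within_budget(stack, budget=NOVELTY_BUDGET):
    """True when the stack spends no more novelty than the budget allows."""
    return len(novel_layers(stack)) <= budget


print("novel layers:", novel_layers(STACK))
print("within budget:", within_budget(STACK))
```

If a second layer gets flagged as novel, the check fails, which is the prompt to ask which of the two actually differentiates you.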
Let the Problem Pick the Model
A practice worth holding firmly: never choose a model in the abstract. Keep refining the problem definition until it selects the model for you. This is the discipline at the center of "Step by Step Through an AI Tech Stack Decision."
Why this beats intuition
Model intuition is mostly hype absorption. The model that gets attention is rarely the model your specific problem needs. A precise problem statement, with its error tolerance, latency, and budget, narrows the field on facts rather than vibes. Hold this practice and you will routinely pick cheaper, faster models than your instinct suggested.
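A minimal sketch of what "the problem picks the model" can look like, assuming you have measured each candidate on your own task. The candidate names and all numbers are made up for illustration, and `ProblemSpec`, `Candidate`, and `pick_model` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemSpec:
    max_error_rate: float        # highest error rate the task tolerates
    max_latency_ms: int          # slowest acceptable response time
    max_cost_per_request: float  # budget per request, in dollars


@dataclass(frozen=True)
class Candidate:
    name: str
    error_rate: float            # measured on your own task, not a leaderboard
    latency_ms: int
    cost_per_request: float


def pick_model(spec, candidates):
    """Return the cheapest candidate that meets every constraint, or None."""
    viable = [
        c for c in candidates
        if c.error_rate <= spec.max_error_rate
        and c.latency_ms <= spec.max_latency_ms
        and c.cost_per_request <= spec.max_cost_per_request
    ]
    return min(viable, key=lambda c: c.cost_per_request, default=None)


# Illustrative, made-up measurements.
CANDIDATES = [
    Candidate("large-frontier", 0.02, 2400, 0.0300),
    Candidate("mid-tier", 0.04, 900, 0.0060),
    Candidate("small-fast", 0.09, 250, 0.0008),
]

spec = ProblemSpec(max_error_rate=0.05, max_latency_ms=1000, max_cost_per_request=0.01)
choice = pick_model(spec, CANDIDATES)
print(choice.name if choice else "no candidate fits; revisit the constraints")
```

With these numbers the most capable candidate is ruled out on latency and cost, and the cheapest one on error rate, so the constraints rather than reputation make the selection.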
Make Cost a First-Class Citizen
Treat cost per request as a design constraint from the first line of code, not an accounting concern you address later. The economics of an AI stack are part of its architecture.
The reasoning
A stack that is technically excellent and economically unviable at scale is a failed stack. Knowing your unit cost shapes which model you pick, how much context you send, and whether a given feature is even worth building. Teams that defer cost thinking build things they later cannot afford to run.
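The arithmetic behind unit cost is short enough to keep next to the prototype. The sketch below assumes token-based pricing quoted per million tokens; the prices, token counts, and request volume are invented for illustration, so substitute your provider's actual rates.

```python
def cost_per_request(input_tokens, output_tokens,
                     price_in_per_million, price_out_per_million):
    """Dollar cost of one request under per-million-token pricing."""
    return (input_tokens * price_in_per_million
            + output_tokens * price_out_per_million) / 1_000_000


def monthly_cost(per_request, requests_per_day, days=30):
    """Project a per-request cost against expected volume."""
    return per_request * requests_per_day * days


# Hypothetical prices (dollars per million tokens) and traffic.
PRICE_IN, PRICE_OUT = 1.00, 4.00
REQUESTS_PER_DAY = 50_000

full_context = cost_per_request(3000, 500, PRICE_IN, PRICE_OUT)
trimmed_context = cost_per_request(1000, 500, PRICE_IN, PRICE_OUT)

for label, unit in [("full context", full_context), ("trimmed context", trimmed_context)]:
    print(f"{label}: ${unit:.4f} per request, "
          f"${monthly_cost(unit, REQUESTS_PER_DAY):,.2f} per month")
```

Under these assumed numbers, trimming the context from 3,000 to 1,000 input tokens takes the projected monthly bill from about $7,500 to about $4,500, which is the kind of difference that should inform how much context you send.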
Prefer Explicit Over Magic
Hold a strong preference for systems where you can see what is happening over systems that hide behavior behind convenient abstractions. Explicit code, explicit prompts, explicit data flow.
The case for legibility
When an AI system misbehaves, and it will, the speed of your fix depends entirely on how clearly you can see what it did. Magic is comfortable until the moment it breaks, at which point it becomes a black box you cannot reason about. This is also why "Seven Stack Choices That Quietly Sink AI Projects" warns against heavy frameworks adopted too early.
Legibility pays off most when things go wrong
The case for explicit systems is easy to undervalue when everything works, because magic and legibility look identical on a good day. The difference shows up only under failure, which is precisely the moment you cannot afford a black box. With an explicit system, a bad output is a thread you can pull: you see the exact prompt that was sent, the exact context that was retrieved, the exact response that came back, and the problem usually reveals itself. With a magical one, you are reduced to guessing which hidden layer misbehaved. Since AI systems fail in subtle, plausible ways far more often than conventional software, the moments where legibility pays off are not rare edge cases; they are a routine part of operating the thing.
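As a rough illustration of the "thread you can pull," the sketch below wraps a model call so that the exact prompt, retrieved context, and response are recorded together. `traced_call` and `fake_model` are hypothetical names; `fake_model` stands in for whatever client your stack actually uses, and a real system would write to durable storage rather than an in-memory list.

```python
import json
import time


def traced_call(model_fn, prompt, context, log):
    """Call model_fn and record exactly what went in and what came out."""
    # Assemble the final prompt in plain sight: context first, then the question.
    full_prompt = "\n\n".join(context + [prompt])
    started = time.time()
    response = model_fn(full_prompt)
    log.append({
        "timestamp": started,
        "prompt": prompt,
        "context": context,
        "full_prompt": full_prompt,
        "response": response,
    })
    return response


def fake_model(text):
    """Stand-in for a real model client; returns a deterministic string."""
    return f"received {len(text)} characters"


trace_log = []
answer = traced_call(
    fake_model,
    prompt="What is the refund window?",
    context=["Policy: refunds are accepted within 30 days of purchase."],
    log=trace_log,
)
print(json.dumps(trace_log[-1], indent=2))
```

When an output looks wrong, the record shows whether the fault lies in the retrieved context, the assembled prompt, or the model's response, without guessing at hidden layers.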
Build Evaluation Before You Build Features
A position many teams resist: write the way you will measure quality before you write the thing whose quality you are measuring. Evaluation is not a finishing step; it is a foundation.
Why order matters here
If you build first and evaluate later, you have no baseline and no way to know whether a change helped or hurt. Worse, AI quality decays silently, so without evaluation you find out from a user. Building the eval first means every subsequent change is measurable, and silent decay becomes visible decay.
An evaluation set is the smallest version of this
You do not need an elaborate testing harness to honor this practice. The smallest viable version is a handful of representative inputs paired with the outputs you would consider correct, kept somewhere you can rerun them. With even a dozen such examples, you can answer the question that otherwise haunts every change: did this make things better or worse? Without them, you are tuning prompts and swapping models on faith, hoping the latest change helped while having no way to confirm it did not quietly break something else. Teams that ship reliable AI tend to build this small evaluation set early, because it cheaply converts guesswork into evidence.
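A minimal sketch of such an evaluation set, assuming a task where exact-match scoring is meaningful (here, routing support messages to a category). The examples and both keyword-based "systems" are toy stand-ins invented for illustration; in practice `system_fn` would wrap your real prompt and model, and free-text outputs would need a looser comparison than equality.

```python
# Representative inputs paired with the outputs considered correct.
EVAL_SET = [
    ("I was charged twice this month", "billing"),
    ("The app crashes when I log in", "technical"),
    ("How do I update my card?", "billing"),
    ("Please cancel my subscription", "account"),
]


def run_eval(system_fn, eval_set):
    """Return (score, failures), where score is the fraction of exact matches."""
    failures = []
    for text, expected in eval_set:
        actual = system_fn(text)
        if actual != expected:
            failures.append({"input": text, "expected": expected, "actual": actual})
    score = (len(eval_set) - len(failures)) / len(eval_set)
    return score, failures


def baseline(text):
    """Toy stand-in for the current system."""
    lowered = text.lower()
    if "charged" in lowered or "card" in lowered:
        return "billing"
    if "crash" in lowered:
        return "technical"
    return "account"


def candidate(text):
    """Toy stand-in for a proposed change."""
    lowered = text.lower()
    if "crash" in lowered or "log in" in lowered:
        return "technical"
    if "charged" in lowered:
        return "billing"
    return "account"


for name, fn in [("baseline", baseline), ("candidate", candidate)]:
    score, failures = run_eval(fn, EVAL_SET)
    print(f"{name}: {score:.0%} correct, failures: {failures}")
```

Here the proposed change quietly breaks the card-update example, and the rerun surfaces that regression before a user does.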
Optimize the System, Not the Parts
Resist the urge to make each component best in class. Optimize for a coherent whole where each layer makes the next easier.
The systems argument
A frontier model feeding a weak retrieval layer underperforms a modest model with excellent context. Local optimization at each layer produces a stack of mismatched excellence. Holding this practice means asking, for every choice, how it affects the layers around it rather than how good it is alone. The overview "Everything That Goes Into an AI Tech Stack Decision" treats this coherence as the central skill.
Know Which Decisions to Sweat
A final practice: spend your deliberation where it pays. Reversible decisions deserve speed; irreversible ones deserve care.
The reasoning
Deliberation is a finite resource. Agonizing over a hosted model you can swap in an afternoon, while rushing the infrastructure decision you cannot unwind for months, is exactly backward. Hold this practice and you will move fast where mistakes are cheap and slow where they are expensive.
Frequently Asked Questions
Is defaulting to boring just risk aversion?
It is calibrated risk aversion. Novelty has real costs, so you spend it deliberately on the one component that differentiates you and stay conservative elsewhere. That is not fear; it is allocating risk where it earns a return.
Why build evaluation before features?
Because without a baseline you cannot tell whether a change helped, and AI quality decays silently. Building evaluation first turns every later change into a measurable experiment rather than a hopeful guess.
Does preferring explicit over magic mean never using frameworks?
No. It means defaulting to legibility and adopting abstractions only when they clearly earn their opacity. A framework that genuinely simplifies a complex flow can be worth the reduced visibility; one adopted out of habit usually is not.
How can I make cost a first-class citizen early?
Compute your cost per request as soon as you have a working prototype and project it against expected volume. Let that number influence model choice, context size, and which features you build.
What does optimizing the system rather than the parts look like in practice?
For every component choice, you ask how it affects the layers around it, not just how capable it is alone. You accept a slightly less capable part when it makes the whole stack more coherent.
How do I know which decisions are irreversible?
Ask how much work reversing the choice would take. Swapping a hosted model is an afternoon; replacing self-hosted infrastructure is months. Spend your care on the months, not the afternoons.
Key Takeaways
- Default to boring, well-understood components and spend your novelty budget where it truly differentiates.
- Define the problem precisely and let it pick the model rather than choosing on intuition.
- Treat cost per request as a design constraint from the start, not a later accounting concern.
- Prefer explicit, legible systems over convenient magic, because legibility determines how fast you can fix things.
- Build evaluation before features so every change is measurable and silent decay becomes visible.
- Optimize the stack as a coherent system and spend deliberation on the decisions you cannot easily reverse.