Zero shot and few shot learning are two ways to steer a large language model without ever touching its weights. You change behavior by changing the prompt: in zero shot you describe the task and ask for the result, in few shot you also paste in a handful of worked examples. That single difference (examples or no examples) quietly decides cost, latency, accuracy, and how brittle your system feels in production.
Most people learn these terms as trivia and never internalize the trade-offs. That is a mistake. The choice between them is one you make dozens of times a week once you are building anything real, and the wrong default wastes tokens or ships garbage. This guide gives you a working mental model, concrete decision rules, and the failure modes that bite teams who pick wrong.
The short version: start zero shot, measure, and only add examples when the model is getting the shape of the output wrong. Everything below explains why that order matters and when to break the rule.
What zero shot learning actually means
Zero shot means the model performs a task it was never explicitly shown an example of, using only an instruction. "Classify this support ticket as billing, technical, or account" with no sample tickets is zero shot. The model leans entirely on patterns absorbed during pretraining plus whatever you describe in the prompt.
Why it works at all
Modern models have seen billions of documents, so most common tasks (summarization, sentiment, extraction, translation) already live somewhere in their learned distribution. A clear instruction is often enough to surface that latent ability. When people say a model "just knows" how to write a polite email, they are describing zero shot capability.
Where it breaks
Zero shot fails when the task is ambiguous, idiosyncratic to your business, or has a specific output format the model can't infer. Ask for "the priority" and you'll get prose; ask for one of three labels and define them, and you'll get a label. Vague instructions are the number one cause of bad zero shot results, not the technique itself.
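To make the "define the labels" point concrete, here is a minimal sketch of an instruction-only prompt builder for the support-ticket example. The function name, the label definitions, and the exact wording of the instruction are illustrative assumptions, not a prescribed template.

```python
def build_zero_shot_prompt(ticket: str, labels: dict[str, str]) -> str:
    """Build an instruction-only prompt. No example tickets are included."""
    # Spell out each allowed label and what it means, so nothing is left to guesswork.
    label_lines = "\n".join(
        f"- {name}: {definition}" for name, definition in labels.items()
    )
    return (
        "Classify the support ticket into exactly one of these labels.\n"
        f"{label_lines}\n"
        "Reply with the label only, no explanation.\n\n"
        f"Ticket: {ticket}\n"
        "Label:"
    )


# Hypothetical label definitions; replace with your own.
LABELS = {
    "billing": "charges, refunds, invoices, payment methods",
    "technical": "bugs, errors, outages, performance problems",
    "account": "login, password, profile, or permission changes",
}

zero_shot_prompt = build_zero_shot_prompt("I was charged twice this month.", LABELS)
print(zero_shot_prompt)
```

The prompt contains the task, the closed label set, and the output constraint, but no sample tickets, which is what keeps it zero shot.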
What few shot learning actually means
Few shot means you include a small number of input-output examples in the prompt before the real input. The model infers the pattern from the demonstrations; this is also called in-context learning. Two to roughly eight examples counts as "few shot"; one is technically "one shot."
You are not training anything. The examples vanish the moment the API call returns. Each request re-sends them, which is why few shot costs more per call. What you buy with those tokens is a sharper specification: examples communicate format, edge-case handling, and tone faster than a paragraph of instructions ever could.
A concrete contrast
Suppose you want product names extracted as JSON arrays. Zero shot, you'd write a careful spec and hope. Few shot, you paste three messy reviews each followed by the exact array you want. The model now mirrors your format almost perfectly, including how you handle reviews with no product mentioned. That mirroring is the entire value proposition.
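The product-extraction contrast can be sketched as a small prompt builder. This is an illustration under assumptions: the function name, the "Review:" / "Products:" layout, and the sample reviews (with made-up product names) are all hypothetical.

```python
import json


def build_few_shot_prompt(
    instruction: str,
    examples: list[tuple[str, list[str]]],
    new_review: str,
) -> str:
    """Place worked input-output pairs before the real input."""
    parts = [instruction, ""]
    for review, products in examples:
        parts.append(f"Review: {review}")
        # json.dumps guarantees each demonstration shows the exact array format.
        parts.append(f"Products: {json.dumps(products)}")
        parts.append("")
    # The real input uses the same layout, leaving the answer slot open.
    parts.append(f"Review: {new_review}")
    parts.append("Products:")
    return "\n".join(parts)


# Hypothetical demonstrations, including the "no product mentioned" case.
EXAMPLES = [
    ("Loved the Acme kettle, but the Acme grinder jammed twice.",
     ["Acme kettle", "Acme grinder"]),
    ("shipping took forever!!! the Nimbus lamp arrived scratched",
     ["Nimbus lamp"]),
    ("Terrible customer service, never ordering again.", []),
]

few_shot_prompt = build_few_shot_prompt(
    "Extract the product names mentioned in each review as a JSON array.",
    EXAMPLES,
    "The Orbit headphones sound great.",
)
print(few_shot_prompt)
```

Note the third demonstration: showing an empty array for a review with no product is how the edge case gets communicated without writing a rule for it.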
How to choose between them
Use this decision order. It is the practical core of the whole topic.
- Try zero shot first. It's cheaper, faster, and easier to maintain. If a clear instruction gets you 90%+ of the way, stop.
- Add few shot when format is the problem. If the model understands the task but keeps formatting output wrong, two or three examples fix it faster than more instructions.
- Add few shot for edge cases. When there are tricky inputs (sarcasm, mixed languages, null cases), demonstrate them directly.
- Reach for fine-tuning, not more examples, when you need consistency at scale. Past roughly five to eight examples you hit diminishing returns and rising cost.
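The decision order above can be written down as a small function. Treat it as a rough sketch: the parameter names are hypothetical, and the 0.90 and 8 thresholds simply mirror the approximate figures in the list rather than any hard rule.

```python
def choose_approach(
    zero_shot_score: float,
    format_is_failing: bool,
    edge_cases_failing: bool,
    examples_already_used: int = 0,
    target_score: float = 0.90,  # "90%+ of the way" from the first rule
    max_examples: int = 8,       # rough upper end of the diminishing-returns range
) -> str:
    """Return a suggested next step, following the decision order in the list."""
    if zero_shot_score >= target_score:
        return "zero_shot"
    if examples_already_used >= max_examples:
        # More demonstrations are unlikely to help; consider fine-tuning.
        return "fine_tune"
    if format_is_failing or edge_cases_failing:
        return "few_shot"
    # The model misunderstands the task itself: fix the wording first.
    return "rewrite_instruction"


print(choose_approach(0.93, False, False))  # zero_shot
print(choose_approach(0.70, True, False))   # few_shot
```

The last branch is a judgment call: when neither format nor edge cases is the failure, the cheaper fix is usually a clearer instruction rather than examples.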
For a deeper treatment of when each shines, see our real-world examples and use cases. If you want a repeatable selection process, the framework article turns these rules into a flowchart.
The trade-offs nobody tells you upfront
Cost and latency
Few shot inflates every single prompt with example tokens. At one example that's negligible; at six long examples you might double or triple input cost and add measurable latency. For high-volume endpoints, this compounds into a real bill. Zero shot keeps prompts lean.
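A back-of-the-envelope calculation shows how example tokens compound. Every number here is an assumption chosen for illustration (token counts, request volume, and the price per million input tokens); substitute your own provider's figures.

```python
def monthly_input_cost(
    instruction_tokens: int,
    input_tokens: int,
    tokens_per_example: int,
    num_examples: int,
    requests_per_month: int,
    usd_per_million_input_tokens: float,
) -> float:
    """Estimate monthly input-token spend; output tokens are ignored."""
    tokens_per_request = (
        instruction_tokens + input_tokens + tokens_per_example * num_examples
    )
    total_tokens = tokens_per_request * requests_per_month
    return total_tokens * usd_per_million_input_tokens / 1_000_000


# Hypothetical workload: 1M requests per month at $0.50 per million input tokens.
common = dict(
    instruction_tokens=100,
    input_tokens=200,
    tokens_per_example=100,
    requests_per_month=1_000_000,
    usd_per_million_input_tokens=0.50,
)
zero_shot_cost = monthly_input_cost(num_examples=0, **common)
few_shot_cost = monthly_input_cost(num_examples=6, **common)

print(zero_shot_cost)                  # 150.0
print(few_shot_cost)                   # 450.0
print(few_shot_cost / zero_shot_cost)  # 3.0
```

With these assumed numbers, six examples triple the input bill. The ratio depends only on how large the examples are relative to the rest of the prompt, not on the price.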
Accuracy is not monotonic
More examples do not mean better results. Poorly chosen, contradictory, or unrepresentative examples can lower accuracy, because the model overfits to your sample's quirks. One mislabeled demonstration can poison an entire batch. Curating examples is real work, not copy-paste.
Maintenance burden
Zero shot prompts are a single instruction to update. Few shot prompts are an instruction plus a curated example set you must keep in sync as your task evolves. Teams underestimate how quickly stale examples drift from current requirements.
Common pitfalls and how to avoid them
The mistakes are predictable. Picking few shot before you've written a clear instruction is the most common: you're paying for examples to compensate for a vague spec. Ordering bias is another: models can weight later examples more heavily, so a lopsided example set skews outputs. And label imbalance in your demonstrations quietly teaches the model a default it shouldn't have.
We cover these in depth in 7 common mistakes with zero shot vs few shot learning. The single best habit: change one variable at a time and measure, so you know whether examples actually helped.
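Two of these pitfalls, label imbalance and ordering bias, can be checked mechanically before a prompt ships. The sketch below is one possible approach; the helper names and the 0.6 threshold are assumptions, not established cutoffs.

```python
import random
from collections import Counter

Example = tuple[str, str]  # (input text, label)


def label_shares(examples: list[Example]) -> dict[str, float]:
    """Fraction of demonstrations carrying each label."""
    counts = Counter(label for _, label in examples)
    return {label: n / len(examples) for label, n in counts.items()}


def is_lopsided(examples: list[Example], max_share: float = 0.6) -> bool:
    """Flag a demonstration set where one label dominates (threshold is arbitrary)."""
    return max(label_shares(examples).values()) > max_share


def shuffled_orders(
    examples: list[Example], n_orders: int = 3, seed: int = 0
) -> list[list[Example]]:
    """Produce several orderings so you can test whether order changes the output."""
    rng = random.Random(seed)
    orders = []
    for _ in range(n_orders):
        order = list(examples)
        rng.shuffle(order)
        orders.append(order)
    return orders


# Hypothetical demonstration set: three billing tickets, one technical.
DEMOS = [
    ("I was charged twice", "billing"),
    ("Refund my last invoice", "billing"),
    ("Wrong amount on my card", "billing"),
    ("The app crashes on launch", "technical"),
]

print(label_shares(DEMOS))  # {'billing': 0.75, 'technical': 0.25}
print(is_lopsided(DEMOS))   # True
```

Running your eval set once per ordering from `shuffled_orders` is a cheap way to see whether results depend on which example happens to come last.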
Which tasks favor which approach
Patterns emerge once you've run enough tasks. Zero shot tends to win on broad, well-represented work: summarization, translation, drafting, open-ended reasoning, and general question answering. The model has seen oceans of these during pretraining, so an instruction unlocks them cleanly.
Few shot tends to win on narrow, format-heavy, or company-specific work: structured extraction into a fixed schema, classification with bespoke labels, matching an internal writing style, or any task where "right" means "shaped exactly like this." The examples carry information that's painful to write as rules.
A simple heuristic
Ask whether the difficulty is in understanding the task or in matching a precise output. Understanding problems are instruction problems: solve them with clearer words, and zero shot will often hold. Matching problems are demonstration problems: solve them with examples. Keeping those two failure modes separate in your head prevents most wasted effort. The tools roundup covers software that helps you test both quickly.
Putting it into practice
Build a tiny evaluation set first, even 20 hand-labeled inputs. Run zero shot, score it, then run few shot with two and four examples, scoring each. You'll usually see a clear winner within an hour, and you'll have evidence instead of vibes. This measure-then-decide loop is the habit that separates people who guess from people who ship.
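The measure-then-decide loop needs very little code. In the sketch below, each prompt variant is represented by a plain function from input text to predicted label. The two `fake_*` predictors are keyword-matching stand-ins so the example runs without an API; they are not real model calls and their scores mean nothing. In practice each would wrap a call to your model with the corresponding prompt.

```python
from typing import Callable

Predictor = Callable[[str], str]


def accuracy(predict: Predictor, eval_set: list[tuple[str, str]]) -> float:
    """Share of eval inputs where the prediction exactly matches the label."""
    correct = sum(
        1 for text, expected in eval_set
        if predict(text).strip().lower() == expected
    )
    return correct / len(eval_set)


def compare(variants: dict[str, Predictor], eval_set: list[tuple[str, str]]) -> dict[str, float]:
    """Score every prompt variant on the same eval set."""
    return {name: accuracy(predict, eval_set) for name, predict in variants.items()}


# Tiny hand-labeled set for illustration; the article suggests about 20 items.
EVAL_SET = [
    ("I was charged twice", "billing"),
    ("The app crashes on launch", "technical"),
    ("Please reset my password", "account"),
    ("Refund my last invoice", "billing"),
]


def fake_zero_shot(text: str) -> str:
    # Stand-in for a model call using the zero shot prompt.
    lowered = text.lower()
    return "billing" if "charged" in lowered or "refund" in lowered else "technical"


def fake_few_shot(text: str) -> str:
    # Stand-in for a model call using the few shot prompt.
    return "account" if "password" in text.lower() else fake_zero_shot(text)


scores = compare({"zero_shot": fake_zero_shot, "few_shot": fake_few_shot}, EVAL_SET)
print(scores)  # {'zero_shot': 0.75, 'few_shot': 1.0}
```

Keeping the eval set and the scoring function fixed while swapping only the predictor is what makes the comparison a one-variable-at-a-time test.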
If you're just getting oriented, start with the beginner's guide. When you're ready to wire it into a workflow, the step-by-step approach walks through it command by command.
Frequently Asked Questions
Is few shot learning always more accurate than zero shot?
No. Few shot helps most when output format or edge cases are the issue, but good examples are required; bad ones can reduce accuracy below zero shot. For tasks the model already handles well, examples add cost without benefit.
How many examples make it "few shot"?
Generally two to about eight input-output pairs. One example is "one shot." Beyond eight you usually see diminishing returns and should consider fine-tuning instead of stuffing more examples into the prompt.
Does few shot learning train or change the model?
No. The examples live only in the prompt for that one request and are discarded after the response. Nothing about the model's weights changes. This is why it's also called in-context learning.
When should I skip both and fine-tune instead?
When you need consistent behavior across very high volume, have hundreds of labeled examples, or your task is too specialized for in-context learning to capture reliably. Fine-tuning moves the knowledge into the weights so prompts stay short.
Can I mix zero shot instructions with few shot examples?
Yes, and you usually should. A clear instruction plus a couple of examples is often the strongest combination: the instruction sets intent, the examples nail format.
Key Takeaways
- Zero shot uses an instruction only; few shot adds a handful of in-context examples. Neither changes the model's weights.
- Default to zero shot for cost, speed, and maintainability; add examples only when format or edge cases are the failure point.
- More examples are not always better; bad or imbalanced examples can lower accuracy.
- Few shot inflates every prompt with token cost and latency, which compounds at high volume.
- Build a small eval set and measure both before committing; decide with evidence, not intuition.