A lot of confident advice about zero-shot and few-shot prompting is wrong, or was true two model generations ago and quietly stopped being true. People repeat rules of thumb like "always add examples" or "few-shot is more accurate" as if they were laws of nature. They're not. They're heuristics with conditions, and the conditions matter enormously.
This article takes the most common claims and tests each against reality. Some are flatly false. Some are true only sometimes, and knowing when is the whole skill. The goal is to replace folklore with an accurate mental model, so you stop making decisions based on advice that no longer holds. We'll go myth by myth.
If you want a straightforward foundational explanation rather than the myth-busting angle, The Complete Guide to Zero Shot vs Few Shot Learning is the place to start.
Myth: More Examples Always Improve Accuracy
This is the most expensive myth because it costs you tokens on every call.
Reality: Accuracy gains from added examples plateau quickly, usually somewhere between two and five examples, and accuracy can decline beyond that. More examples can overfit the model to your specific samples, dilute the strongest examples, and introduce label imbalance that biases predictions. Past the plateau, you're paying more per call for the same or worse accuracy.
The right move is to test example count, not assume monotonic improvement. Start at two or three, measure, and only add more if the data clearly justifies it. The instinct to keep adding examples is exactly backwards.
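A minimal Python sketch of that measurement, under stated assumptions: `model` is a hypothetical callable standing in for your client (prompt string in, completion string out), the `Input:`/`Label:` layout is illustrative, and scoring is exact match on a held-out set. Including `k = 0` puts the zero-shot baseline in the same run.

```python
from typing import Callable, Sequence

# A labeled example: (input text, expected label).
Example = tuple[str, str]


def build_prompt(instruction: str, shots: Sequence[Example], query: str) -> str:
    """Assemble a prompt; with an empty `shots` list it is a zero-shot prompt."""
    parts = [instruction]
    for text, label in shots:
        parts.append(f"Input: {text}\nLabel: {label}")
    parts.append(f"Input: {query}\nLabel:")
    return "\n\n".join(parts)


def sweep_example_counts(
    model: Callable[[str], str],
    instruction: str,
    pool: Sequence[Example],
    eval_set: Sequence[Example],
    counts: Sequence[int] = (0, 2, 3, 5, 8),
) -> dict[int, float]:
    """Exact-match accuracy on a held-out eval set for each example count k.

    Uses the first k items of `pool`, so keep the pool order fixed between runs.
    `eval_set` must not overlap with `pool`.
    """
    results: dict[int, float] = {}
    for k in counts:
        shots = pool[:k]
        correct = sum(
            model(build_prompt(instruction, shots, text)).strip() == label
            for text, label in eval_set
        )
        results[k] = correct / len(eval_set)
    return results


def smallest_adequate_k(results: dict[int, float], tolerance: float = 0.01) -> int:
    """Cheapest example count whose accuracy is within `tolerance` of the best."""
    best = max(results.values())
    return min(k for k, acc in results.items() if acc >= best - tolerance)


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real client; it reads only the final query."""
    query = prompt.rsplit("Input:", 1)[1]
    return "positive" if "great" in query else "negative"
```

`smallest_adequate_k` encodes the decision rule above: take the cheapest count within a tolerance of the best score rather than the largest count. The tolerance is a judgment call; on small eval sets, differences of a point or two may be noise.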
Myth: Few-Shot Always Beats Zero-Shot on Hard Tasks
This was closer to true on weaker models. It's increasingly shaky.
Reality: On strong, instruction-following models, a precise zero-shot prompt frequently matches or beats few-shot, without the token cost or the order sensitivity. The marginal value of examples drops as models get better at following detailed instructions.
The genuine exception is tasks you can demonstrate but can't fully describe, like a specific brand voice or a subtle stylistic judgment. There, examples remain irreplaceable. But "hard task" is not the right trigger for few-shot; "hard to articulate" is. Many hard tasks are easy to describe, and those do fine zero-shot. The Advanced guide digs into this crossover.
Myth: Zero-Shot Means No Instructions
People hear "zero-shot" and picture throwing a bare question at the model.
Reality: Zero-shot means zero examples, not zero guidance. A good zero-shot prompt can be long and detailed: it can specify the output format, enumerate edge-case rules, list negative cases, and define the task precisely. It just doesn't include worked examples. The "zero" refers only to demonstrations.
This matters because the most common reason zero-shot "fails" is a vague instruction, not the absence of examples. Before reaching for few-shot, tighten the instruction. Often that closes the gap entirely.
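To make "zero examples, not zero guidance" concrete, here is a sketch of a structured zero-shot prompt builder. The ticket-triage task, its categories, and its rules are invented for illustration; the point is that the output format, edge-case rules, and negative cases all fit in the instruction without a single worked example.

```python
from dataclasses import dataclass, field


@dataclass
class ZeroShotSpec:
    """A detailed zero-shot instruction: lots of guidance, zero worked examples."""
    task: str
    output_format: str
    rules: list[str] = field(default_factory=list)           # edge-case handling
    negative_cases: list[str] = field(default_factory=list)  # things to avoid


def render_zero_shot(spec: ZeroShotSpec, query: str) -> str:
    lines = [f"Task: {spec.task}", f"Output format: {spec.output_format}"]
    if spec.rules:
        lines.append("Rules:")
        lines.extend(f"- {rule}" for rule in spec.rules)
    if spec.negative_cases:
        lines.append("Do not:")
        lines.extend(f"- {case}" for case in spec.negative_cases)
    lines.append(f"Input: {query}")
    return "\n".join(lines)


# Hypothetical example spec; every field is illustrative.
triage = ZeroShotSpec(
    task="Classify the support ticket as billing, account, or bug.",
    output_format="Exactly one lowercase word: billing, account, or bug.",
    rules=[
        "If a ticket mentions both a charge and a login problem, choose account.",
        "If the category is unclear, choose bug.",
    ],
    negative_cases=[
        "explain your answer.",
        "invent a new category.",
    ],
)
```

Tightening a zero-shot prompt then means editing these fields, for instance adding a rule for an edge case that keeps failing, before reaching for examples.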
Myth: Few-Shot Is the Same as Fine-Tuning
These get conflated constantly, and the difference is fundamental.
Reality: Few-shot learning happens entirely at inference time. You're showing examples in the prompt; the model's weights never change. Fine-tuning actually updates the model's parameters using a training dataset, which is a separate, heavier process with its own cost and infrastructure.
The practical implication: few-shot is instant and reversible, and you can change examples between calls. Fine-tuning is a commitment. Confusing the two leads people to think few-shot "trains" the model, which it does not. It conditions a single response.
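One way to see that few-shot is per-call input rather than training: select the examples fresh for every request. The sketch below uses crude word-overlap similarity, an illustrative stand-in for whatever retrieval you might use (the article doesn't prescribe one); the pool, labels, and helper names are hypothetical. Each call builds its own prompt, and nothing persists between calls.

```python
# A labeled example: (input text, label).
Example = tuple[str, str]


def word_overlap(a: str, b: str) -> int:
    """Count shared lowercase words; a deliberately crude similarity score."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def select_shots(pool: list[Example], query: str, k: int = 2) -> list[Example]:
    """Pick the k pool examples most similar to this query.

    sorted() is stable, so ties keep their original pool order.
    """
    ranked = sorted(pool, key=lambda ex: word_overlap(ex[0], query), reverse=True)
    return ranked[:k]


def render(shots: list[Example], query: str) -> str:
    """The examples are just text in this one request; nothing is stored."""
    blocks = [f"Input: {text}\nLabel: {label}" for text, label in shots]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


pool = [
    ("refund for damaged item", "billing"),
    ("password reset link broken", "account"),
    ("charged twice for refund", "billing"),
    ("cannot reset password", "account"),
]

# Two consecutive calls get different demonstrations; the model itself is unchanged.
query_a = "how do I reset my password"
query_b = "refund was charged twice"
prompt_a = render(select_shots(pool, query_a), query_a)
prompt_b = render(select_shots(pool, query_b), query_b)
```

Swapping the pool, or the selection rule, takes effect on the very next call, which is exactly what fine-tuning cannot offer.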
Myth: Examples Just Need to Be Correct
The belief that any correct examples will do underrates how much selection matters.
Reality: Correct isn't enough. Example selection dominates: which examples, in what order, balanced across which labels, drawn from what distribution. Pristine, easy, correct examples can make a model worse on the hard tail because they teach the easy distribution. Order affects output. Label imbalance biases predictions. Correct-but-unrepresentative examples are a known failure mode.
Good few-shot examples mirror the real distribution of your inputs, including the ugly cases, and are balanced and ordered deliberately. This is why Best Practices That Actually Work spends so much time on selection rather than count.
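A sketch of the balance-and-order part of selection, assuming a labeled pool of `(text, label)` pairs: it samples the same number of examples per label and interleaves them round-robin so no label sits in one contiguous block. It does not, by itself, make the pool representative; the ugly cases still have to be in the pool for this to draw them.

```python
import random
from collections import defaultdict
from itertools import zip_longest

Example = tuple[str, str]  # (input text, label)


def balanced_shots(pool: list[Example], per_label: int, seed: int = 0) -> list[Example]:
    """Sample up to `per_label` examples for each label, then interleave them."""
    rng = random.Random(seed)  # fixed seed so the chosen set is reproducible
    by_label: dict[str, list[Example]] = defaultdict(list)
    for example in pool:
        by_label[example[1]].append(example)

    # One bucket per label, in a deterministic (alphabetical) label order.
    buckets = [
        rng.sample(by_label[label], min(per_label, len(by_label[label])))
        for label in sorted(by_label)
    ]

    # Round-robin interleave so labels alternate instead of appearing in blocks
    # (for example, all of one label at the end, closest to the query).
    ordered: list[Example] = []
    for group in zip_longest(*buckets):
        ordered.extend(ex for ex in group if ex is not None)
    return ordered
```

When a label has fewer examples than `per_label`, the sketch takes what exists rather than padding, so a short bucket is a visible signal that the pool itself needs curation.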
Myth: This Is a One-Time Decision
People decide zero-shot or few-shot once and treat it as settled.
Reality: The right choice moves. It shifts with your volume (token cost compounds), with model upgrades (stronger models need fewer examples), and with data drift (examples become unrepresentative over time). A decision that was correct six months ago can be wasting money or quietly degrading today. The accurate stance is to re-evaluate periodically, not to lock in a permanent answer.
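The volume point is simple arithmetic. Here is a sketch of the monthly cost of the examples alone; the token counts and the $1.00-per-million-input-tokens price in the usage line are hypothetical, not any provider's actual rate.

```python
def monthly_example_cost(
    examples_per_call: int,
    tokens_per_example: int,
    calls_per_day: int,
    usd_per_million_input_tokens: float,
    days: int = 30,
) -> float:
    """Cost of the few-shot examples alone, excluding the instruction and query.

    Ignores any provider-specific discounts; plug in your own measured numbers.
    """
    extra_tokens = examples_per_call * tokens_per_example * calls_per_day * days
    return extra_tokens * usd_per_million_input_tokens / 1_000_000
```

With four 150-token examples at 100,000 calls a day, `monthly_example_cost(4, 150, 100_000, 1.00)` comes to 1.8 billion extra input tokens, or $1,800 a month at that assumed price; at ten times the volume it is $18,000. The same prompt design costs very different amounts at different scales, which is why the decision needs revisiting as volume grows.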
Myth: Zero-Shot Is for Beginners, Few-Shot Is Advanced
There's a status hierarchy implied here that's simply false.
Reality: Zero-shot is often the more sophisticated choice. Reaching for a clean, well-instructed zero-shot prompt on a strong model is frequently the expert move, because it avoids the example tax, the order sensitivity, the maintenance burden, and the data-leakage risk. Piling on examples is often the less considered choice. Neither approach is intrinsically more advanced; what's advanced is the judgment about which one fits the task.
Myth: Few-Shot Examples Just Need to Be Recent
People assume that grabbing a handful of fresh examples is enough, as if recency were the quality bar.
Reality: Recency is one dimension, and not the most important. An example set drawn entirely from the last week of data can be just as unrepresentative as a stale one if that week happened to be unusual. What matters is whether the examples mirror the full distribution you actually face: the common cases, the edge cases, the label balance, and the formats. A recent but skewed set teaches the model the skew.
The practical consequence is that "refresh the examples" is not a mechanical pull of the newest records. It's a re-sampling against the real distribution, which means someone has to understand that distribution. Treat example refreshes as a curation task, not a copy-paste. This is also why ownerless example sets degrade even when someone occasionally swaps in new samples without thinking about balance.
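A sketch of a refresh done as re-sampling rather than a pull of the newest records. It assumes you can label a random reference sample of production inputs (the `reference` counts in the usage note are hypothetical). It measures how far a candidate set's label mix is from that reference using total variation distance, then draws a replacement set from the whole pool in matching proportions. Label mix is only one dimension; edge cases and formats still need human review.

```python
import random
from collections import Counter, defaultdict

Example = tuple[str, str]  # (input text, label)


def label_distribution(labels: list[str]) -> dict[str, float]:
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """0.0 means identical label mixes; 1.0 means no overlap at all."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def allocate_slots(reference: Counter, k: int) -> dict[str, int]:
    """Split k example slots in proportion to reference label counts.

    Largest-remainder rounding in integer arithmetic; ties go alphabetically.
    """
    total = sum(reference.values())
    slots = {label: n * k // total for label, n in reference.items()}
    leftover = k - sum(slots.values())
    order = sorted(reference, key=lambda lbl: (-(reference[lbl] * k % total), lbl))
    for label in order[:leftover]:
        slots[label] += 1
    return slots


def resample(pool: list[Example], reference: Counter, k: int, seed: int = 0) -> list[Example]:
    """Draw a fresh example set from the whole pool, matching the reference mix.

    The result is grouped by label; order it deliberately before use.
    """
    rng = random.Random(seed)
    by_label: dict[str, list[Example]] = defaultdict(list)
    for example in pool:
        by_label[example[1]].append(example)
    chosen: list[Example] = []
    for label, n in allocate_slots(reference, k).items():
        available = by_label.get(label, [])
        chosen.extend(rng.sample(available, min(n, len(available))))
    return chosen
```

If the reference mix is 70/20/10 across questions, complaints, and bugs, and an unusual week produced only complaints, the distance between the two mixes is 0.8: a clear signal that the "fresh" set would teach the skew.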
Myth: If Output Looks Right, the Approach Is Working
This myth is dangerous precisely because it feels like common sense.
Reality: Few-shot prompting optimizes the surface of the output: the format, the tone, the structure. That makes outputs look polished and confident even when the substance is wrong. A reviewer skimming for obvious problems sees clean formatting and a plausible answer and approves it. "Looks right" is exactly the signal that few-shot is best at faking, which makes it the worst quality bar to rely on.
The accurate stance is to measure correctness against ground truth on a sample, not to eyeball whether outputs look professional. On high-stakes tasks, fluent output should be treated as a hypothesis to verify, not an answer to trust. The Hidden Risks of Zero Shot vs Few Shot Learning covers this confidence trap in full.
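A minimal sketch of measuring against ground truth instead of eyeballing: draw a reproducible random sample of logged outputs, have reviewers supply the correct answers, and report accuracy with a Wilson score interval. Exact-match scoring is an assumption that suits classification-style outputs; free-text tasks need a rubric instead.

```python
import math
import random


def audit_sample(records: list, k: int, seed: int = 0) -> list:
    """Reproducible random sample of logged outputs to send for ground-truth review."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


def accuracy_with_interval(outputs: list[str], truths: list[str], z: float = 1.96):
    """Exact-match accuracy plus a Wilson score interval (z=1.96 is roughly 95%)."""
    if not outputs or len(outputs) != len(truths):
        raise ValueError("outputs and truths must be non-empty and the same length")
    n = len(outputs)
    p = sum(o == t for o, t in zip(outputs, truths)) / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)
```

With 8 of 10 sampled outputs correct, the interval runs from roughly 0.49 to 0.94, a reminder that a handful of right-looking outputs proves little and that the sample needs to be large enough for the stakes.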
Frequently Asked Questions
Is it true that more examples always help few-shot accuracy?
No. Accuracy typically plateaus between two and five examples and can decline beyond that due to overfitting, dilution, and label imbalance. Past the plateau you pay more tokens per call for equal or worse results. Test the example count rather than assuming more is better.
Does few-shot always beat zero-shot on difficult tasks?
Not on strong models. A precise zero-shot prompt often matches or beats few-shot without the token cost, especially as models get better at following instructions. The real trigger for few-shot is "hard to describe," like a specific voice, not "hard" in general; many hard tasks are easy to articulate and do fine zero-shot.
Is few-shot learning a form of training the model?
No. Few-shot conditions a single response at inference time; the model's weights never change. Fine-tuning is different: it actually updates parameters using a training dataset. Few-shot is instant and reversible, and you can change examples on every call; fine-tuning is a heavier commitment.
Does zero-shot mean giving the model no instructions?
No. Zero-shot means zero examples, not zero guidance. A strong zero-shot prompt can be long and detailed, specifying format, edge cases, and negative cases; it just contains no worked examples. Most zero-shot "failures" come from vague instructions, so tightening the instruction often beats adding examples.
Is choosing between them a one-time decision?
No. The right choice shifts with volume, model capability, and data drift. A decision that was correct months ago can be wasting money or silently degrading now. Re-evaluate periodically rather than locking in a permanent answer.
Key Takeaways
- More examples don't reliably help; accuracy plateaus around two to five and can decline beyond that.
- On strong models, precise zero-shot often matches few-shot; the real trigger for examples is "hard to describe," not "hard."
- Zero-shot means zero examples, not zero instructions, and few-shot is inference-time conditioning, not training.
- Example selection, order, and balance matter as much as correctness; unrepresentative examples are a real failure mode.
- The choice isn't permanent; re-evaluate as volume, models, and data change.