Once you have the basics down β clean tables, clear questions, a verification habit β the interesting work begins. Real client data does not arrive in tidy form. It arrives as three spreadsheets that need to be joined, charts with truncated axes that exaggerate trends, and exports where the same column mixes currencies and formats. These are the cases where naive prompting produces an answer that looks fine and is quietly wrong.
The skill at this level is less about clever wording and more about anticipating where interpretation breaks down and structuring the work to prevent it. An expert does not just prompt better; they decompose the problem, constrain the model's assumptions, and verify at the points most likely to fail. This is craft built from knowing the edge cases.
This guide covers the hard cases that separate competent practitioners from experts: combining multiple sources, handling deceptive or ambiguous visuals, taming messy real-world exports, and pushing the model to reason rather than pattern-match.
A theme runs through all of it: the model will rarely volunteer that it is in trouble. It does not pause to say the join was ambiguous or the axis was truncated or half the rows were silently dropped. It produces a fluent answer regardless. So expertise at this level is largely about anticipating the failure before it happens and structuring the prompt to force the model to surface its assumptions, report what it could not do, and flag the places a human needs to look. The craft is defensive by design.
Combining Multiple Sources
Joins the Model Gets Wrong
When an analysis spans several tables, the model has to align them on a common key. It frequently guesses the join incorrectly β matching on the wrong column or silently dropping unmatched rows. Spell out the join key and the join type explicitly, and ask the model to report how many rows matched so you can catch silent drops.
Reconciling Conflicting Figures
Two sources often disagree on the same number because of different definitions or time windows. The model will happily average them or pick one without flagging the conflict. Instruct it to surface discrepancies rather than resolve them silently, and decide the resolution yourself.
Deceptive and Ambiguous Visuals
Truncated and Dual Axes
A chart with a y-axis that does not start at zero exaggerates differences, and a dual-axis chart invites false correlation. A naive vision prompt reports the visual impression, which is exactly the distortion the chart designer baked in. Ask the model to read the actual axis bounds and qualify its trend statements accordingly.
Missing Context
A chart without clear labels or units is genuinely ambiguous. An expert prompt makes the model state its assumptions explicitly rather than fill the gaps confidently. When the model has to assume, you want to see the assumption so you can correct it, a theme the risk guide develops.
Aggregated Data That Hides the Story
A chart that shows a smooth aggregate often conceals wild variation underneath β a stable average masking two diverging segments, or a monthly total hiding a single anomalous day. A naive prompt reports the calm surface. The expert move is to ask whether the aggregation could be hiding something and, where the underlying data exists, to drill into it rather than trusting the rolled-up view. Knowing that aggregation can deceive is itself a piece of hard-won judgment.
Taming Messy Exports
Mixed Formats in One Column
Real exports mix date formats, embed currency symbols, and stuff footnotes into data cells. A direct text prompt stumbles on these. The robust approach is a code-execution path that normalizes the column programmatically before any analysis, which the trade-offs guide recommends for messy data.
Wide Tables and Context Limits
When a table exceeds the context window, the model sees only part of it and reasons over a fragment without telling you. Chunk deliberately or use code to load and filter the file, surfacing only the relevant slice. Never let the model silently truncate the input.
Pushing the Model to Reason
Decompose Before You Ask
For a multi-step analysis, break the request into stages β extract, compute, conclude β and verify each before proceeding. A single mega-prompt invites compounding errors that are hard to trace; staged prompts isolate failures, as the metrics guide notes when separating failure stages.
Constrain the Conclusion
Models over-generalize from limited data, asserting causation or projecting trends a small sample cannot support. Explicitly bound what the model is allowed to conclude: report what the data shows, flag what it cannot support, and avoid causal language unless the design justifies it.
Have the Model Argue Against Itself
A powerful technique for hard cases is to ask the model, after it produces an analysis, to make the strongest case that its own conclusion is wrong. This surfaces the confounders, the alternative explanations, and the data limitations it glossed over the first time. The self-critique often catches problems no single forward pass would, and it costs only one extra prompt. Treat the model as both the analyst and the skeptical reviewer rather than trusting its first confident answer.
Demand Traceability
At the expert level, every figure in the output should trace to a specific cell, row, or axis reading. Require this traceability so a reviewer can audit the conclusion fast. It also discourages the model from inventing convenient numbers.
Edge Cases Worth Rehearsing
Time Zones and Date Boundaries
When data spans regions, the same event can land on different calendar days depending on time zone, quietly shifting daily and weekly aggregates. A model will not flag this unless prompted. For any time-sensitive analysis, confirm how dates are bucketed before trusting period-over-period comparisons.
Units That Silently Change
Exports sometimes switch units partway through β thousands in one section, millions in another, or a currency that changes after a column. The model treats the numbers as comparable and produces nonsense. Ask it to verify unit consistency across the data before computing anything that spans the boundary.
Survivorship and Missing Rows
A dataset that only contains records meeting some condition β active customers, completed orders β invites conclusions that ignore everything filtered out. An expert checks what the data does not contain, not just what it does, and constrains the conclusion to the population actually represented. Models rarely raise this on their own, so the prompt has to.
A Mental Checklist for Hard Files
When a genuinely difficult file lands, running through a short mental checklist catches most of the traps before they bite:
- Does this analysis span multiple sources, and if so, have I specified the join key and type?
- Could any chart here be distorting the story through a truncated or dual axis?
- Are units and date buckets consistent across the whole dataset?
- Does the table exceed the context window, risking silent truncation?
- What population does this data actually represent, and what has been filtered out?
- Have I bounded the conclusion and asked the model to argue against its own answer?
The checklist is not bureaucracy; it is the externalized form of the judgment experts apply automatically. Newer practitioners benefit from running it explicitly until the questions become reflexive, at which point the hard cases stop producing confident wrong answers.
Frequently Asked Questions
Why does the model join tables incorrectly?
It guesses the common key and join type when you do not specify them, sometimes matching the wrong column or dropping unmatched rows silently. State the key and join type explicitly and ask for the matched-row count.
How do I stop the model from being fooled by a truncated axis?
Ask it to read the actual axis bounds rather than describe the visual impression, then qualify the trend statement. A chart designed to exaggerate will mislead a naive vision prompt that only reports what it sees.
What is the best way to handle a column with mixed formats?
Use a code-execution path that normalizes the column programmatically before analysis. Direct text reasoning stumbles on mixed dates, embedded symbols, and inline footnotes.
Why decompose a complex analysis into stages?
Staged prompts isolate failures so you can verify extraction, computation, and conclusion separately. A single large prompt compounds errors and makes them hard to trace.
How do I keep the model from over-generalizing?
Explicitly bound what it may conclude: report what the data shows, flag what it cannot support, and avoid causal claims unless the data design justifies them. Models default to overreach without these constraints.
Key Takeaways
- For multi-table work, specify the join key and type and require a matched-row count to catch silent drops.
- Make the model read actual axis bounds so truncated or dual-axis charts do not mislead.
- Normalize messy columns with code before analysis, and never let the model silently truncate wide tables.
- Decompose complex analyses into extract, compute, and conclude stages and verify each.
- Constrain conclusions and demand traceability so every figure can be audited quickly.