AGENCYSCRIPT
Starting Out With Numbers and Language Models

Agency Script Editorial

Editorial Team

April 19, 2020 · 8 min read

If you have asked a chatbot to do a calculation and gotten a wrong answer delivered with total confidence, you are not doing anything wrong. Language models are genuinely unreliable at math, and that surprises people because the same models can write essays, explain science, and hold a conversation. The mismatch is jarring: how can something so articulate get a tip calculation wrong?

This piece assumes you know nothing about how these models work and starts from the beginning. We will define the terms you need, explain in plain language why models trip over numbers, and walk through the first few techniques that make a real difference. By the end you will understand enough to get dependable numerical answers from a model, and to know when you should not trust one without checking.

Numerical reasoning just means any task that involves working with numbers — adding, finding percentages, converting units, solving word problems, or doing calculations buried inside a larger question. It sounds simple, and for a calculator it is. For a language model, it is one of the trickier things you can ask, and knowing why is the first step to handling it well.

Why a Smart-Sounding Model Gets Math Wrong

The key idea is that a language model does not calculate. It predicts.

Predicting Text, Not Computing Answers

A language model works by guessing the next word, or piece of a word (called a token), over and over. It learned to do this by reading enormous amounts of text. When it writes "2 plus 2 equals 4," it is not adding; it is recalling that those words tend to go together because it saw them constantly. That works fine for common, simple sums.

The Trouble With Unfamiliar Numbers

Ask it to add 4,817 and 2,932 and the model has rarely seen that exact sum written out. So it does what it always does — guesses based on similar patterns — and the guess is often close but wrong. The bigger or less common the numbers, the worse it gets. This is the single most important thing to understand: the model is approximating, not calculating.

Why It Sounds So Sure

A model has no built-in sense of doubt. It states a wrong number in the same confident tone as a right one. There is no wobble in its voice to warn you, which is exactly why beginners get caught out.

Your First Technique: Ask It to Show Its Work

The easiest improvement requires no special tools, just a change in how you ask.

What to Say

Instead of "What is 23 percent of 1,840?" try "What is 23 percent of 1,840? Show each step of your calculation before giving the final answer." That small addition often turns a wrong answer into a right one.

Why It Works

When the model writes out the steps — "10 percent is 184, so 20 percent is 368, and 3 percent is 55.2, giving 423.2" — each step is a smaller, easier guess. Breaking one hard problem into several easy ones is the whole trick. As a bonus, you can read the steps and see where things went wrong if they did. The fuller version of this approach lives in A Step-by-Step Approach to Prompting for Numerical Reasoning Tasks.
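The steps quoted above can be written out and checked one at a time. This is a minimal Python sketch of that same breakdown; the variable names are illustrative, and Decimal is used only so the figures stay exact.

```python
from decimal import Decimal

base = Decimal("1840")

# Each line is one small, easy step, mirroring the model's written work.
ten_percent = base / 10                  # 184
twenty_percent = ten_percent * 2         # 368
one_percent = base / 100                 # 18.4
three_percent = one_percent * 3          # 55.2
answer = twenty_percent + three_percent  # 423.2

print(answer)
```

If a model's written steps disagree with any of these intermediate values, that is the step to question.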

Your Second Technique: Break the Problem Apart

For anything with more than one step, do not ask for everything at once.

One Thing at a Time

Suppose you want to know the monthly payment on a loan after a discount and a fee. Ask for the discounted amount first. Check it. Then ask about the fee. Then the monthly payment. Each answer is simpler and easier to verify than one big tangled request.

Why Smaller Is Safer

When a model does four operations in one go, a mistake in the first one ruins everything after it, and you cannot tell where it broke. Splitting the work means a mistake stays contained in one step, where you can spot and fix it.
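Here is a rough sketch of the same idea in Python, with a check after every step. The figures (a 12,000 price, a 10 percent discount, a 150 fee, 24 months) are made up for illustration, and the payment is a plain division with no interest, which is a simplifying assumption.

```python
from decimal import Decimal

# Hypothetical inputs, chosen only for illustration.
price = Decimal("12000")
discount_percent = Decimal("10")
fee = Decimal("150")
months = 24

# Step 1: apply the discount, then confirm the price went down.
discounted = price - price * discount_percent / 100   # 10800
assert discounted < price

# Step 2: add the fee, then confirm the amount went up.
with_fee = discounted + fee                            # 10950
assert with_fee > discounted

# Step 3: spread the amount over the months (no interest assumed),
# then confirm the payments add back up to the total.
monthly = with_fee / months                            # 456.25
assert monthly * months == with_fee

print(discounted, with_fee, monthly)
```

Because each step has its own check, a failure points at one step instead of the whole chain.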

Your Third Technique: Let It Use a Calculator

The most reliable fix is to stop asking the model to do the arithmetic itself.

Tools and Code

Many modern AI tools can run actual code or use a built-in calculator. When that option exists, the model writes out the math and a real calculator computes it, applying the rules of arithmetic instead of guessing. The model is good at figuring out what to calculate; it is bad at doing the calculation. Letting a tool handle the arithmetic plays to each one's strength.
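To make that division of labor concrete, here is a minimal Python sketch of the calculator half. It is not how any particular AI product works; the function name evaluate and the short list of allowed operators are assumptions made for illustration. The model would supply the expression as text, and ordinary code would compute the result.

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else is rejected.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(expression):
    """Compute a plain arithmetic expression such as '1840 * 23 / 100'."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval"))

print(evaluate("4817 + 2932"))       # 7749
print(evaluate("1840 * 23 / 100"))   # 423.2
```

The sketch walks the parsed expression instead of calling eval, so text that is not simple arithmetic raises an error rather than running.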

When You Cannot Use Tools

If your tool cannot run code, do the final arithmetic yourself or in a spreadsheet. Let the model set up the problem and explain the approach, then trust a real calculator for the numbers.

Building the Habit of Checking

The last beginner skill is not a prompt at all. It is a mindset.

Quick Sanity Checks

Glance at every number for plausibility. Did a discount make the price go up? Is a share of a whole larger than 100 percent? Is the answer roughly the size you expected? These five-second checks catch the embarrassing errors. The common traps are laid out in 7 Mistakes That Wreck Numerical Reasoning Prompts.
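The "roughly the size you expected" check can be sketched in a few lines of Python. The helper name roughly_matches and the 25 percent tolerance are assumptions for illustration, not a standard; pick whatever margin suits the task.

```python
def roughly_matches(answer, estimate, tolerance=0.25):
    """Return True when answer is within the given fraction of a rough estimate."""
    return abs(answer - estimate) <= tolerance * abs(estimate)

# Rough estimate: 23 percent is a little under a quarter, so about 1840 / 4.
estimate = 1840 / 4

print(roughly_matches(423.2, estimate))  # True: close to the estimate
print(roughly_matches(4232, estimate))   # False: off by a factor of ten
```

A check like this will not confirm an answer is exact, but it flags a misplaced decimal point or an answer that is the wrong order of magnitude.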

Match Checking to Stakes

A rough estimate for your own curiosity needs little checking. A number you are putting in front of a client or a boss needs real verification. Decide how much a wrong answer would cost, and check accordingly. Seeing these ideas applied helps, and Where Numerical Reasoning Prompts Earn Their Keep walks through concrete cases.

A Few Common Situations and What to Do

It helps to connect the techniques to the everyday moments where you will actually use them.

Working Out a Tip or a Split

For a restaurant tip or splitting a bill, the numbers are small and familiar, so the model usually gets them right. Even so, asking it to show the steps takes no effort and lets you glance at the work. This is the lowest-stakes case, where a quick look is all the checking you need.

Calculating a Discount or Markup

Discounts and markups involve percentages, which is where small errors creep in. Ask for the steps explicitly:

  • State the starting price and the percentage clearly. Say whether the percentage comes off or gets added on.
  • Have the model show the percentage amount, then the new price. Two visible steps instead of one hidden jump.
  • Glance at the direction. A discount should make the price smaller; a markup should make it bigger. If it goes the wrong way, something broke.
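The three points above can be sketched in Python as follows. The function names apply_percentage and direction_ok, and the example price of 80 with a 15 percent change, are hypothetical and chosen only to illustrate the two visible steps and the direction check.

```python
from decimal import Decimal

def apply_percentage(price, percent, direction):
    """Return (percentage amount, new price). direction is 'off' or 'on'."""
    if direction not in ("off", "on"):
        raise ValueError("direction must be 'off' or 'on'")
    price = Decimal(str(price))
    percent = Decimal(str(percent))
    amount = price * percent / 100          # step 1: the percentage amount
    if direction == "off":
        new_price = price - amount          # step 2: discount lowers the price
    else:
        new_price = price + amount          # step 2: markup raises the price
    return amount, new_price

def direction_ok(original, new_price, direction):
    """Check that a reported price moved the way the direction says it should."""
    original = Decimal(str(original))
    new_price = Decimal(str(new_price))
    if direction == "off":
        return new_price <= original
    return new_price >= original

amount, new_price = apply_percentage(80, 15, "off")
print(amount, new_price)             # amount is 12, new price is 68
print(direction_ok(80, 92, "off"))   # False: a discount cannot raise the price
```

The second function is the one to run against a number a model hands you: it does not recompute anything, it only asks whether the price moved the right way.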

Anything With Money on the Line

The moment a number is going to someone else — a quote, an invoice, a figure in a report — treat it as high-stakes. Use a calculator or a tool for the actual arithmetic and check the result a second way. The cost of a wrong number in front of a client is far higher than the minute it takes to verify. The same instinct, applied throughout a real workflow, appears in Where Numerical Reasoning Prompts Earn Their Keep.

Frequently Asked Questions

Are language models just bad at math, then?

They are unreliable at arithmetic specifically, because they approximate rather than calculate. They can be quite good at the reasoning around math — setting up a word problem, choosing the right formula, explaining a concept. The weakness is in the exact computation, which is why the fixes focus on improving reasoning and offloading the actual arithmetic.

Do I need to know how to code to use these techniques?

No. Asking the model to show its work and breaking problems into steps require no coding at all. Using a tool to run a calculation usually means clicking a button or the tool doing it automatically, not writing code yourself. You can get most of the benefit with plain prompting.

Which technique should I learn first?

Start with asking the model to show its work step by step. It is the easiest to do, works in any tool, and produces the biggest single improvement for the least effort. Once that is a habit, add breaking problems apart and checking the results.

Can I just tell the model to be more accurate?

Not really. Telling a model to "be accurate" or "double-check your math" helps a little because it nudges the model toward showing work, but it does not change the underlying limitation. The reliable gains come from structure — showing steps, splitting the task, using tools — not from instructions to try harder.

How do I know if an answer is actually right?

For simple cases, a sanity check on plausibility and a rough estimate of the expected size will catch most errors. For anything important, recompute the number a different way or use a real calculator and compare. If two independent methods agree, you can be confident; if they disagree, you have found a mistake worth catching.
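As one way to picture "recompute the number a different way," here is a small Python sketch that works out a per-person bill with tip by two routes and compares them. The bill of 126.50, the 18 percent tip, and the four-way split are invented for illustration, and both function names are hypothetical.

```python
from fractions import Fraction

def per_person_total_first(bill, tip_percent, people):
    """Add the tip to the whole bill, then split the total."""
    total = bill + bill * tip_percent / 100
    return total / people

def per_person_share_first(bill, tip_percent, people):
    """Split the bill first, then add the tip to one share."""
    share = bill / people
    return share + share * tip_percent / 100

bill = Fraction("126.50")  # Fraction keeps the arithmetic exact
first = per_person_total_first(bill, 18, 4)
second = per_person_share_first(bill, 18, 4)

print(float(first), float(second), first == second)
```

Both routes start from the same inputs, so agreement rules out an arithmetic slip but not a wrong starting figure; the inputs still need their own glance.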

Key Takeaways

  • Language models predict text rather than calculate, so they approximate arithmetic and get unfamiliar numbers wrong while sounding confident.
  • Asking the model to show its work step by step is the easiest and most effective first technique.
  • Breaking a multi-step problem into separate, checkable parts keeps any single error from ruining the whole answer.
  • Letting the model use a calculator or run code is the most reliable fix for exact arithmetic.
  • Build a habit of quick sanity checks, and verify carefully whenever a wrong number would actually cost you something.
