AGENCYSCRIPT
General

Can a Prompt Really Keep AI From Making Things Up?

Agency Script Editorial

Editorial Team

December 14, 2023 · 7 min read
reducing hallucinations through prompting, reducing hallucinations through prompting questions answered, reducing hallucinations through prompting guide, prompt engineering

Hallucination is the polite industry word for a model stating something false with complete confidence. It is the single biggest reason teams hesitate to put AI in front of clients, and it is also the area where prompting gives you the most leverage. Most people assume hallucination is a fixed property of the model, something you simply tolerate. In practice, a large share of fabricated answers trace back to vague instructions, missing source material, or prompts that quietly reward guessing over admitting uncertainty.

This article works through the questions that come up most often when teams start taking hallucination seriously. The answers are practical rather than academic. The goal is to give you a clear mental model of where fabrication comes from and which prompting techniques actually reduce it, so you can stop treating accuracy as luck.

Why do language models hallucinate in the first place?

A language model predicts the most plausible next token given everything before it. It is not consulting a database of verified facts; it is generating text that statistically resembles correct answers. When the training data is thin, contradictory, or absent for your specific question, the model still produces a fluent, confident response because fluency and confidence are what it was optimized for.

The confidence problem

The dangerous part is that fabricated answers look identical to accurate ones. There is no built-in tremor in the voice. This is why you cannot rely on tone or detail to judge truth, and why prompting strategies that force the model to expose its reasoning or cite sources are so valuable.

When hallucination is most likely

  • Questions about recent events outside the training window
  • Highly specific facts like dates, figures, citations, and proper names
  • Niche domains with little public documentation
  • Requests that assume a false premise the model tries to satisfy anyway

Can prompting really reduce hallucinations, or is that the model's job?

Both are true, but prompting is the lever you control today. You cannot retrain the model, but you can change what you ask of it and what you give it to work with. The two highest-impact moves are grounding and permission to abstain.

Grounding means supplying the source material in the prompt itself and instructing the model to answer only from that material. Permission to abstain means explicitly telling the model that saying "I don't know" is an acceptable, even preferred, response when the answer is not supported. For a fuller walkthrough of these techniques in sequence, "A Step-by-Step Approach to Reducing Hallucinations Through Prompting" lays out the order that works.

What is the single most effective prompt change?

If you make one change, make it this: tell the model to answer only from provided context and to say it cannot find the answer when the context does not contain it.

A weak prompt asks an open question and hopes the model knows. A strong prompt provides the relevant documents and constrains the model to them. This converts an open-ended recall task, where fabrication thrives, into a closed reading-comprehension task, where it largely does not.

A before-and-after

  • Weak: "What is the refund window for this client's product?"
  • Strong: "Using only the policy text below, state the refund window. If the policy text does not specify it, respond: not stated in provided policy."

The second version is far less likely to invent a number, because you have removed the incentive to guess and given the model a sanctioned escape hatch.
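As a rough illustration, the strong prompt can be assembled by a small helper. This is a minimal sketch, not a prescribed implementation: the function names `build_grounded_prompt` and `is_abstention`, the triple-quote delimiters around the policy text, and the exact instruction wording are all assumptions made for the example.

```python
# Hypothetical helper names; only the fallback phrase comes from the example above.
FALLBACK = "not stated in provided policy"


def build_grounded_prompt(question: str, policy_text: str, fallback: str = FALLBACK) -> str:
    """Wrap a question in a context-only instruction with an explicit fallback."""
    return (
        "Using only the policy text below, answer the question.\n"
        f"If the policy text does not specify the answer, respond exactly: {fallback}\n\n"
        "Policy text:\n"
        '"""\n'
        f"{policy_text.strip()}\n"
        '"""\n\n'
        f"Question: {question.strip()}"
    )


def is_abstention(answer: str, fallback: str = FALLBACK) -> bool:
    """True when the model used the sanctioned fallback instead of answering."""
    return answer.strip().strip(".").lower() == fallback.lower()


prompt = build_grounded_prompt(
    "What is the refund window?",
    "Refunds are accepted within 30 days of purchase.",
)
```

Fixing the fallback to one exact phrase is a deliberate choice here: it lets downstream code detect an abstention with a simple string comparison instead of guessing at the model's wording.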

How do I get the model to admit when it doesn't know?

Models hedge poorly by default because most training rewards helpful, complete answers. You have to actively grant permission to be incomplete and make abstention the safe choice.

Phrases that work

  • "If you are not certain, say so rather than guessing."
  • "Distinguish facts supported by the context from your own inferences."
  • "It is better to return no answer than an unverified one."

Reinforce with structure

Ask for a confidence label or a source citation alongside each claim. When the model has to attach a source, unsupported claims become visibly empty, and that friction discourages fabrication. The patterns in "Reducing Hallucinations Through Prompting: Best Practices That Actually Work" cover several reusable phrasings worth saving as snippets.
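One way to make "visibly empty" concrete is to request a machine-readable answer and separate the claims that carry a source from those that do not. The sketch below assumes the model was asked for a JSON array; the key names (`claim`, `source_quote`, `confidence`) and the function `split_claims` are illustrative choices, and a real pipeline would also need to handle output that is not valid JSON.

```python
import json

# Hypothetical output format instruction to append to a grounded prompt.
CLAIM_FORMAT_INSTRUCTION = (
    "Return a JSON array. Each item must have the keys "
    '"claim", "source_quote" (copied verbatim from the context, or "" if none), '
    'and "confidence" ("high", "medium", or "low").'
)


def split_claims(model_output: str) -> tuple[list[dict], list[dict]]:
    """Separate claims that carry a source quote from claims that do not."""
    claims = json.loads(model_output)
    supported, unsupported = [], []
    for item in claims:
        quote = str(item.get("source_quote", "")).strip()
        (supported if quote else unsupported).append(item)
    return supported, unsupported


# Made-up model output, for illustration only.
sample_output = json.dumps([
    {"claim": "Refunds are accepted within 30 days.",
     "source_quote": "Refunds are accepted within 30 days of purchase.",
     "confidence": "high"},
    {"claim": "Refunds are paid by bank transfer.",
     "source_quote": "",
     "confidence": "low"},
])
supported, unsupported = split_claims(sample_output)
```

Claims that land in `unsupported` are candidates for removal or human review before anything reaches a client.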

Does asking for sources or citations actually help?

Yes, with one caveat. Requiring citations forces the model to tie claims to specific passages, which suppresses invention when you have supplied real documents. The caveat is that without provided sources, models will sometimes fabricate plausible-looking citations too.

So the rule is: ask for citations, but only when you have given the model real material to cite, and always verify that the cited passage actually says what the model claims. A citation you never check is decoration, not evidence.
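Part of that verification can be automated. The following sketch checks only that a quoted passage really occurs in the supplied source, ignoring case and whitespace differences; the function name `quote_appears_in_source` is hypothetical. It does not judge whether the passage supports the claim, so a human or a second check is still needed for that.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_appears_in_source(quote: str, source: str) -> bool:
    """True only if the quote occurs in the source, ignoring case and whitespace."""
    normalized_quote = _normalize(quote)
    # An empty quote never counts as a valid citation.
    return bool(normalized_quote) and normalized_quote in _normalize(source)


source_text = "Refunds are accepted\nwithin 30 days of purchase."
real_citation = quote_appears_in_source("accepted within 30 days", source_text)
fake_citation = quote_appears_in_source("accepted within 60 days", source_text)
```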

What about chain-of-thought and step-by-step reasoning?

Asking a model to reason step by step before answering reduces certain errors, particularly in math, logic, and multi-hop questions. By generating intermediate steps, the model is less likely to leap to a wrong conclusion.

Where it helps and where it doesn't

  • Helps: arithmetic, multi-step deductions, comparisons across several facts
  • Doesn't help much: recall of a single obscure fact the model never learned

Step-by-step reasoning organizes what the model knows; it cannot conjure knowledge that was never there. Pair it with grounding for the best result. The deeper mechanics are covered in "The Complete Guide to Reducing Hallucinations Through Prompting".

How do I know if my changes are working?

You measure. Build a small evaluation set of questions where you know the correct answers, including some that the model should refuse because the answer is not available. Run your old prompt and your new prompt against the same set and count three things: correct answers, wrong answers, and appropriate refusals.

A simple scorecard

  • Accuracy: correct answers out of total answerable questions
  • Fabrication rate: confident wrong answers out of all questions asked
  • Abstention quality: did it refuse the genuinely unanswerable ones?

A good prompt change lifts accuracy while lowering fabrication, even if total answer volume drops slightly because the model now declines more often. That trade is almost always worth it. For concrete illustrations, see "Reducing Hallucinations Through Prompting: Real-World Examples and Use Cases".
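The scorecard is simple enough to compute in a few lines. This sketch assumes each evaluation item has already been graded by hand as "correct", "wrong", or "refused"; the `Result` class and `scorecard` function are illustrative names, and abstention quality is expressed here as the share of unanswerable questions the model refused.

```python
from dataclasses import dataclass


@dataclass
class Result:
    answerable: bool  # does the evaluation set contain a known correct answer?
    outcome: str      # graded by hand: "correct", "wrong", or "refused"


def scorecard(results: list[Result]) -> dict[str, float]:
    """Compute accuracy, fabrication rate, and abstention quality for one prompt."""
    answerable = [r for r in results if r.answerable]
    unanswerable = [r for r in results if not r.answerable]
    correct = sum(r.outcome == "correct" for r in answerable)
    fabricated = sum(r.outcome == "wrong" for r in results)
    good_refusals = sum(r.outcome == "refused" for r in unanswerable)
    return {
        "accuracy": correct / len(answerable) if answerable else 0.0,
        "fabrication_rate": fabricated / len(results) if results else 0.0,
        "abstention_quality": good_refusals / len(unanswerable) if unanswerable else 0.0,
    }


# Made-up grades for six questions, four answerable and two not.
graded = [
    Result(True, "correct"),
    Result(True, "correct"),
    Result(True, "wrong"),
    Result(True, "refused"),
    Result(False, "refused"),
    Result(False, "wrong"),
]
scores = scorecard(graded)
```

Running the same graded set through `scorecard` for the old and new prompt gives two dictionaries that can be compared side by side.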

Frequently Asked Questions

Does a bigger or newer model eliminate hallucinations?

No. Larger models hallucinate less on average but still confidently fabricate, especially on niche or recent facts. Better prompting and grounding remain necessary regardless of model size, so do not treat an upgrade as a substitute for disciplined prompts.

Is temperature setting related to hallucination?

Lower temperature makes outputs more deterministic and slightly reduces creative fabrication, but it does not fix the root cause. A low-temperature model will still invent answers it lacks grounding for. Treat temperature as a minor knob, not a solution.

Can I just tell the model not to hallucinate?

Telling a model "do not hallucinate" has a weak effect because the model has no internal flag for truth. Specific, actionable instructions like "answer only from the context" and "cite your source" work far better than abstract commands.

How much context should I provide?

Provide the relevant material and trim the rest. Dumping huge unfiltered documents can bury the answer and increase the odds the model latches onto the wrong passage. Retrieve and include the passages most likely to contain the answer, then constrain the model to them.
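To show the idea of trimming context, here is a deliberately naive sketch that ranks passages by how many words they share with the question and keeps the top few. The function name `top_passages` and the word-overlap scoring are assumptions for illustration only; production systems typically use a proper search index or embedding-based retrieval instead.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_passages(question: str, passages: list[str], k: int = 3) -> list[str]:
    """Return up to k passages sharing the most words with the question."""
    question_words = _words(question)
    scored = [(len(question_words & _words(p)), p) for p in passages]
    # Drop passages with no overlap at all rather than padding the context.
    relevant = [(score, p) for score, p in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [p for _, p in relevant[:k]]


passages = [
    "Refunds are accepted within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes five business days.",
]
selected = top_passages("How many days do refunds take after purchase?", passages, k=2)
```

Only the selected passages would then go into the grounded prompt, with the instruction to answer from them alone.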

Will these techniques slow down my responses?

Slightly. Grounding adds retrieval, citations add tokens, and verification adds a check. The latency cost is real but small relative to the cost of a confident wrong answer reaching a client.

Key Takeaways

  • Hallucination comes from a model optimized for fluency, not truth, so it fabricates confidently when knowledge is missing.
  • The strongest prompting move is grounding: supply the source material and constrain the answer to it.
  • Explicitly grant permission to say "I don't know" so abstention becomes the safe default instead of guessing.
  • Ask for citations only when you provide real sources, and always verify the cited passage.
  • Measure with a small known-answer evaluation set that rewards both correct answers and appropriate refusals.
