After reading this, you'll be able to classify the hallucination you're facing, estimate its risk, choose the right mitigation, and run a verification workflow in three minutes rather than thirty.
TL;DR
- Hallucination is not one phenomenon — there are four distinct types, each with different causes and different fixes.
- The root cause is the same in every case: text prediction optimizes for plausibility, not truth.
- Risk is not uniform — hallucination rates vary sharply by task type, topic density, and prompt structure.
- The highest-yield mitigation is task design, not post-hoc verification: ask the right kind of question and you get far fewer errors.
- A three-step verification protocol handles most real-world cases without becoming a full-time job.
What hallucination actually is (the mechanism)
A large language model (LLM) generates text by predicting the next token — roughly, a chunk of text — given all tokens before it. Training exposes the model to vast human writing and teaches it the statistical patterns of which tokens follow which. The model learns what a plausible next token looks like in each context. (For the full treatment, see [How Large Language Models Work](how-large-language-models-work.md).)
That process produces impressive results because human writing is generally true and coherent. When you ask a question, the most plausible continuation is usually a correct answer.
"Usually" carries real risk. The model has no separate truth-checking mechanism — no database it queries, no alarm that fires when the plausible answer diverges from reality. It predicts text; truth is a property of text that prediction ignores.
| | Human expert | LLM |
|---|---|---|
| Answer comes from | Knowledge + memory | Statistical text patterns |
| Knows when it doesn't know? | Yes — feels the gap | No — no uncertainty signal |
| Confidence signals | Hedges ("I think"), caveats | Always fluent; confidence ≠ accuracy |
| Error type when wrong | Usually acknowledges limits | Fabricates with full confidence |
The design implication: you cannot use fluency as a quality signal. A hallucinated sentence and a correct sentence look identical from the outside.
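The mechanism above can be illustrated with a deliberately tiny sketch. It is a toy bigram counter, not how a real LLM is built (real models use neural networks over long contexts), and the corpus is hypothetical, constructed so that the false statement appears more often than the true one. The point it shows is only that frequency, not truth, decides the output.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: the false claim is more common than the true one.
CORPUS = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
]

def train_bigrams(sentences):
    """Count which token follows which across the corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev_token):
    """Return the most frequent continuation; truth is never consulted."""
    return counts[prev_token].most_common(1)[0][0]

model = train_bigrams(CORPUS)
print(predict_next(model, "is"))  # prints: sydney
```

Nothing in `predict_next` can tell that "sydney" is wrong. It is simply the most common continuation in the data the model saw.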
Four types of hallucination (and how to spot each)
Treating all hallucinations as one class leads to the wrong fixes. A working taxonomy:
| Type | What happens | Common trigger | Detected by |
|---|---|---|---|
| Fabrication | The model invents a fact that doesn't exist — a fake citation, a nonexistent study | Rare or niche information request | Checking the primary source directly |
| Attribution error | A real quote, statistic, or idea is assigned to the wrong person or source | Requests for named examples or citations | Verifying attribution at source |
| Cutoff drift | The model states outdated facts as current | Questions about recent events, prices, or current status | Cross-checking with a dated source |
| Intrinsic contradiction | The model contradicts a claim it made earlier in the same response | Long, multi-part prompts with conflicting constraints | Re-reading the full output; asking the model to self-check |
Citation errors, whether a fabricated source or a real one credited to the wrong author, are the most deceptive because they look like verifiable evidence and users are primed to trust them. Fabrication in dense technical domains (medicine, law, novel research) is the most harmful because the gap between plausible and true is widest there.
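One way to keep the taxonomy usable in review notes or tooling is to encode it as a lookup. The sketch below is illustrative only: the names `HallucinationType`, `DETECTION`, and `how_to_detect` are hypothetical, and the detection strings are taken from the table above.

```python
from enum import Enum

class HallucinationType(Enum):
    FABRICATION = "fabrication"
    ATTRIBUTION_ERROR = "attribution error"
    CUTOFF_DRIFT = "cutoff drift"
    INTRINSIC_CONTRADICTION = "intrinsic contradiction"

# Detection method per type, mirroring the "Detected by" column.
DETECTION = {
    HallucinationType.FABRICATION: "Check the primary source directly",
    HallucinationType.ATTRIBUTION_ERROR: "Verify attribution at source",
    HallucinationType.CUTOFF_DRIFT: "Cross-check with a dated source",
    HallucinationType.INTRINSIC_CONTRADICTION: (
        "Re-read the full output; ask the model to self-check"
    ),
}

def how_to_detect(kind: HallucinationType) -> str:
    """Return the detection method for a given hallucination type."""
    return DETECTION[kind]

print(how_to_detect(HallucinationType.CUTOFF_DRIFT))
```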
What amplifies the risk
Hallucination rate is not constant. Several conditions reliably raise or lower it.
High-risk conditions
Specificity combined with obscurity. A question like "What did the 1987 telecommunications reform bill say about regional carriers in Wyoming?" pushes the model to fill a precise gap in thin training data. The most plausible completion is a coherent-sounding answer — not an admission of ignorance.
Requests for citations. Asking for sources triggers fabrication at a higher rate than factual questions alone. The model generates citation-shaped text, but a citation is a specific entity that either exists or doesn't, so a plausible near miss is simply false.
Long context with conflicting information. When a prompt contains ambiguity or contradictions, the model prioritizes coherence over accuracy. It may invent a fact to smooth a tension.
Expert impersonation prompts. Asking the model to "respond as a cardiologist" does not give it a cardiologist's knowledge; it biases the output toward cardiologist-sounding text. Domain-specific vocabulary raises fluency and risk together.
Low-risk conditions
Well-documented, stable facts. The capital of France. The boiling point of water. Training data is dense; plausible and true almost always agree.
Reasoning over provided text. "Summarize this document" or "Find inconsistencies in this contract" — the model works on text you supplied, not on recall. Errors become compression failures, not fabrications.
Structured output from constrained input. "Convert this list of names into JSON." The model's job is transformation; hallucination has little room to operate.
A 3-step verification protocol
The right response to hallucination risk is not blanket skepticism (too slow) or blind trust (too risky). It's risk-calibrated verification.
Step 1 — Classify the task.
| Task type | Hallucination risk | Verification need |
|---|---|---|
| Factual recall (dates, names, citations) | High | Always verify at source |
| Reasoning over text you provided | Low | Spot-check the logic |
| Creative or drafting work | N/A — truth is not the point | None required |
| Recent events or current status | High (cutoff drift) | Always check a dated source |
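The classification table can be expressed as a small lookup so the decision is made the same way every time. This is a minimal sketch: the keys and the `TaskProfile` type are hypothetical names chosen for illustration, and the risk and verification values come straight from the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskProfile:
    risk: str
    verification: str

# One entry per row of the Step 1 table (keys are illustrative labels).
TASK_PROFILES = {
    "factual_recall": TaskProfile("high", "Always verify at source"),
    "reasoning_over_provided_text": TaskProfile("low", "Spot-check the logic"),
    "creative_or_drafting": TaskProfile("n/a", "None required"),
    "recent_events": TaskProfile("high", "Always check a dated source"),
}

def verification_need(task_type: str) -> str:
    """Look up how much verification a task type calls for."""
    return TASK_PROFILES[task_type].verification

print(verification_need("factual_recall"))
```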
Step 2 — Verify selectively.
Verify the specific claims that carry the most weight, not the entire output. For a legal brief, verify every citation. For a market summary, verify the numbers and dates. For a brainstorm, verify nothing — you are generating, not fact-checking.
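Selective verification starts with finding the claims worth checking. The sketch below pulls years, percentages, and parenthetical author-year citations out of a text with regular expressions. The patterns are assumptions made for illustration and will miss many real formats; treat the result as a checklist to start from, not a complete audit.

```python
import re

# Hypothetical patterns; real documents need domain-specific ones.
PATTERNS = {
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "percentage": re.compile(r"\b\d+(?:\.\d+)?%"),
    "citation": re.compile(r"\([A-Z][A-Za-z]+(?: et al\.)?,? \d{4}\)"),
}

def claims_to_verify(text):
    """Return (label, matched_text) pairs for claims that deserve a source check."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(0)))
    return found

sample = "Revenue grew 12.5% in 2021 (Smith et al., 2020)."
for label, claim in claims_to_verify(sample):
    print(f"{label}: {claim}")
```

A brainstorm with no numbers, dates, or citations yields an empty list, which matches the guidance to verify nothing in that case.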
Step 3 — Record what you confirm.
Any fact you check and confirm is now yours, not the model's. Write it down where you will find it again. Your verified notes are the layer of ground truth the model cannot provide for itself. This is also the principle behind retrieval-augmented generation (RAG): keep the model grounded on curated, verified sources so fabrication stays in a tighter box. See [What Is Retrieval-Augmented Generation](what-is-retrieval-augmented-generation.md) for how that works in practice.
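A verified-notes log can be as simple as a list of records with the claim, where it was confirmed, and when. The following is a minimal sketch; the `VerifiedFact` fields and the `record` helper are hypothetical, and the example entry uses a placeholder source rather than a real reference.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date

@dataclass
class VerifiedFact:
    claim: str
    source: str
    checked_on: str  # ISO date string, e.g. "2024-01-15"

def record(notes, claim, source, checked_on=None):
    """Append a confirmed fact to the notes list and return it."""
    checked_on = checked_on or date.today().isoformat()
    fact = VerifiedFact(claim, source, checked_on)
    notes.append(fact)
    return fact

notes = []
# Placeholder entry for illustration only.
record(notes, "Example claim you confirmed", "primary source (placeholder)", "2024-01-15")
print(json.dumps([asdict(n) for n in notes], indent=2))
```

Storing the check date alongside the claim matters for facts that change over time: a confirmed price or job title can go stale just as a model's training data does.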
Mitigation techniques ranked by yield
| Technique | What it does | When to use |
|---|---|---|
| Ask the model to reason step by step | Forces explicit logic → errors are visible and catchable | Complex factual questions |
| Ask the model to flag uncertainty | Prompts hedging; reduces (does not eliminate) confident fabrication | Any factual task |
| Supply the source, ask it to work from that | Converts recall to reasoning; moves from fabrication risk to compression risk | When you hold the document |
| Use a RAG-enabled tool | Grounds the model on your curated data | High-stakes professional workflows |
| Break multi-part prompts into steps | Reduces complexity-induced intrinsic contradiction | Long prompts with many constraints |
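Two of the techniques above, supplying the source and asking the model to flag uncertainty, can be combined in a reusable prompt template. This is a sketch under assumptions: the wording, the `NOT IN SOURCE` sentinel, and the `[uncertain]` marker are illustrative choices, not a tested or standard format, and the function only builds a string without calling any model API.

```python
def build_grounded_prompt(source_text: str, question: str) -> str:
    """Build a prompt that asks the model to answer from supplied text only."""
    return (
        "Answer using only the source below. "
        "If the source does not contain the answer, reply exactly: NOT IN SOURCE.\n"
        "Mark any statement you are unsure about with [uncertain].\n\n"
        f"SOURCE:\n{source_text}\n\n"
        f"QUESTION: {question}"
    )

prompt = build_grounded_prompt(
    "The contract renews annually unless cancelled in writing.",
    "How often does the contract renew?",
)
print(prompt)
```

A fixed sentinel makes the "not in the source" case easy to spot when scanning outputs, though the model may still ignore the instruction, so the answer needs a spot-check against the source.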
Techniques that do not reliably work:
- Adding "don't make things up" to the prompt: the model cannot reliably detect its own fabrications, so it has no dependable way to comply.
- Choosing a more capable model — higher capability reduces some hallucination types but eliminates none.
- Relying on the model's confidence wording — it hallucinates confidently.
Common mistakes
Treating fluency as accuracy. The most readable, well-structured answer is not necessarily the most accurate one. Fluency is a property of how the text was generated, not of whether its content is true.
Verifying the model's output against itself. Asking the same model the same question is not verification — both answers come from the same statistical process and will often agree regardless of truth.
Assuming fine-tuned models know more. Fine-tuning mainly shapes behavior (tone, format, instruction-following) rather than reliably adding factual knowledge. A model fine-tuned for customer service does not necessarily know your product; it knows how to sound helpful.
Stopping at "the model got it wrong." The useful question is which type of hallucination occurred. Attribution error → check citation discipline. Cutoff drift → supply the model with current context. Fabrication from thin data → use a primary source instead. The type determines the fix.
Verifying nothing. The risk is not constant, but it is never zero. Blanket trust fails at the worst moments — a fake citation in a submitted document, a fabricated statistic in a presentation — precisely because the output was fluent enough that no flag was raised.
Summary and decision rule
Hallucination is predictable in type and variable in rate. The four types — fabrication, attribution error, cutoff drift, and intrinsic contradiction — have distinct triggers, and risk scales with task specificity, recency, and citation requests.
The single decision rule that handles most cases:
If you will use the output as evidence, verify it at source. If you will use it as a starting point, proceed and check what you build on.
This is less strict than "verify everything" (which makes AI impractical) and less reckless than "trust everything" (which fails on high-stakes claims). Calibrate verification effort to the cost of being wrong.
For the full picture of how these errors arise inside the model, see [How Large Language Models Work](how-large-language-models-work.md). For the practical verification workflow in everyday AI use, see [How to Verify an AI Answer](how-to-verify-an-ai-answer.md). For a shorter introduction, see [What Is an AI Hallucination](what-is-an-ai-hallucination.md); this piece is the deeper companion to that explainer.