What Is Chain-of-Thought Prompting?
When you ask an AI to answer a multi-step question directly, it often gets it wrong. When you ask it to reason through the steps first, it often gets it right.
Chain-of-thought (CoT) prompting is the technique of asking a model to generate its intermediate reasoning steps before producing a final answer. The model works the problem out loud, and the reasoning it produces improves the accuracy of its conclusion.
It is not a workaround for weak models. CoT has been shown to lift accuracy on reasoning tasks across a wide range of [large language models](what-is-a-large-language-model.md), including highly capable ones.
How it works
1. Why direct answers fail on multi-step problems
A large language model generates text token by token. Each token is predicted from everything that precedes it inside its [context window](what-is-a-context-window.md). When a model jumps straight to an answer on a multi-step problem, it compresses several inferential steps into a single output, and that compression can introduce errors. CoT forces those steps to appear in the output, where the model can read its own prior reasoning and condition each next step on it. The chain of tokens is the scratchpad.
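To make the scratchpad idea concrete, here is a minimal sketch of autoregressive decoding in Python. The `generate` loop and the `toy_model` stand-in are hypothetical: a real model returns a probability distribution over a vocabulary, and "tokens" here are whole reasoning steps for readability. What the sketch does show is the mechanism described above: every emitted token is appended to the context, so the next prediction can read it.

```python
from typing import Callable, List

# Hypothetical model interface: maps the full context to the next token.
# A real LLM returns a distribution; this sketch assumes one token was already picked.
NextToken = Callable[[List[str]], str]


def generate(model: NextToken, prompt: List[str], max_tokens: int = 50,
             stop: str = "<end>") -> List[str]:
    """Autoregressive decoding: each emitted token joins the context,
    so later tokens are conditioned on earlier reasoning."""
    context = list(prompt)
    output: List[str] = []
    for _ in range(max_tokens):
        token = model(context)   # predict from everything written so far
        if token == stop:
            break
        output.append(token)
        context.append(token)    # the output becomes part of the context
    return output


# Toy stand-in (hypothetical): it decides its next step by reading
# which steps are already present in its own context.
SCRIPT = ["4T=24", "T=6", "S=18"]


def toy_model(context: List[str]) -> str:
    for step in SCRIPT:
        if step not in context:
            return step
    return "<end>"


print(generate(toy_model, ["Sarah", "has", "3x", "Tom's", "apples"]))
# ['4T=24', 'T=6', 'S=18']
```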
2. The two CoT variants
| Variant | Trigger | When to use |
|---|---|---|
| Zero-shot CoT | Append "Let's think step by step" (or similar) to any prompt | Fast; works on most reasoning tasks without writing examples |
| Few-shot CoT | Include 2–3 worked examples where reasoning appears before the answer | Higher effort; transfers better to unusual or domain-specific problems |
Zero-shot CoT is the faster starting point: one phrase appended to an existing prompt. Few-shot CoT is worth the investment when accuracy on a specific repeated task is critical. See [few-shot prompting](what-is-few-shot-prompting.md) for the mechanics of embedding examples.
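Both variants can be sketched as plain-text prompt builders. The function names, the `Q:`/`A:` layout, and the "The answer is" phrasing are illustrative choices made for this sketch, not a standard format; the one property that matters is that reasoning comes before the answer in every example.

```python
from typing import List, Tuple

COT_TRIGGER = "Let's think step by step."


def zero_shot_cot(question: str, trigger: str = COT_TRIGGER) -> str:
    """Zero-shot CoT: append a reasoning trigger to the question."""
    return f"{question.strip()} {trigger}"


def few_shot_cot(question: str, examples: List[Tuple[str, str, str]]) -> str:
    """Few-shot CoT: each example is (question, reasoning, answer).
    Reasoning is placed before the answer so the model imitates that order."""
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in examples
    ]
    blocks.append(f"Q: {question}\nA:")  # the model continues from here
    return "\n\n".join(blocks)


# Hypothetical worked example; in practice you would include 2-3 of these.
examples = [
    ("A pen costs $2 and a notebook costs 3 times as much. "
     "What do both cost together?",
     "The notebook costs 3 x 2 = $6. Together they cost 2 + 6 = $8.",
     "$8"),
]

question = ("Sarah has 3 times as many apples as Tom. "
            "Together they have 24. How many does each have?")
print(zero_shot_cot(question))
print(few_shot_cot(question, examples))
```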
3. What the model is actually doing
CoT works because the model's output tokens are also its context. When it writes "First, let me identify the variables…" before computing, those words become part of the context window for every subsequent token. The model is reading its own prior steps as it writes. This is why CoT helps most on tasks with multiple sub-goals — each step in the chain constrains and anchors what comes next.
4. When CoT helps, and when it doesn't
| Task type | CoT benefit | Reason |
|---|---|---|
| Multi-step math | High | Consistent accuracy gains across benchmarks |
| Logical/causal reasoning | High | Explicit steps improve coherence |
| Simple factual lookup | None | No chain needed; adds tokens, no gain |
| Classification | Low | Direct prompts usually sufficient |
| Creative writing | Negligible | Reasoning steps are not the limiting factor |
CoT is not universally better. It consumes more [tokens](what-is-a-token.md) and costs more per query. On simple tasks it adds noise without adding accuracy. Apply it selectively to problems with sub-goals that must be resolved in sequence.
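One way to apply CoT selectively is to route on task type. This sketch hard-codes the table above as a lookup; the task labels are hypothetical, and the mapping is a starting heuristic rather than a measured rule, so adjust it to what you observe on your own tasks.

```python
# Hypothetical task labels mirroring the table above (a heuristic, not a benchmark).
COT_BENEFIT = {
    "multi_step_math": "high",
    "logical_reasoning": "high",
    "factual_lookup": "none",
    "classification": "low",
    "creative_writing": "negligible",
}


def build_prompt(question: str, task_type: str) -> str:
    """Append the CoT trigger only where the expected benefit is high;
    otherwise send the question unchanged to save tokens."""
    if COT_BENEFIT.get(task_type) == "high":
        return f"{question} Let's think step by step."
    return question  # unknown or low-benefit task types go out as-is


print(build_prompt("Sarah has 3 times as many apples as Tom...", "multi_step_math"))
print(build_prompt("What is the capital of France?", "factual_lookup"))
```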
A concrete example
Without CoT (direct prompt):
"Sarah has 3 times as many apples as Tom. Together they have 24. How many does each have?"
Asked directly, a model may return the correct 18 and 6 or an incorrect pair such as 16 and 8. The result is inconsistent, and there is no reasoning to audit.
With CoT (zero-shot trigger appended):
"Sarah has 3 times as many apples as Tom. Together they have 24. How many does each have? Let's think step by step."
The model now typically outputs:
"Let S = Sarah's apples, T = Tom's. S = 3T. S + T = 24. 3T + T = 24. 4T = 24. T = 6. S = 18. Answer: Sarah has 18, Tom has 6."
The reasoning chain is auditable. Each step is either right or wrong, and when the model makes an error, you can see exactly where. Without the chain, you have only the final number to evaluate.
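That audit can be mechanized for the apples problem. The sketch below assumes the intermediate values a model claims have been transcribed by hand into a dict (parsing free-form model text is out of scope here). It recomputes each step from the claimed values of the steps before it and reports the first step that does not follow.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each step: a label, and how to recompute that step's value from the
# problem's givens plus the values claimed in earlier steps.
Step = Tuple[str, Callable[[Dict[str, float]], float]]

STEPS: List[Step] = [
    ("4T", lambda v: 24),           # 3T + T = 24, so 4T = 24
    ("T", lambda v: v["4T"] / 4),   # divide both sides by 4
    ("S", lambda v: 3 * v["T"]),    # S = 3T
]


def first_wrong_step(claimed: Dict[str, float]) -> Optional[str]:
    """Return the label of the first step whose claimed value does not
    follow from the earlier claimed values, or None if all steps check out."""
    for label, recompute in STEPS:
        if claimed[label] != recompute(claimed):
            return label
    return None


print(first_wrong_step({"4T": 24, "T": 6, "S": 18}))  # None
print(first_wrong_step({"4T": 24, "T": 8, "S": 24}))  # T
```

In the second call, `S = 3T` is internally consistent, yet the audit still pins the error on the step that produced `T = 8`.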
Why it matters
Chain-of-thought prompting is useful wherever an AI answer requires more than one inference:
- Research synthesis: asking an AI to compare two arguments before reaching a verdict
- Financial analysis: walking through assumptions step by step before estimating an outcome
- Debugging logic: having the model trace a process before diagnosing where it fails
- Complex decisions: enumerating trade-offs before recommending an action
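For the use cases above, the generic trigger can be swapped for instructions that name the steps. This is a variant of zero-shot CoT, and the template wording below is hypothetical, written for this sketch: each one asks for the intermediate work first and the conclusion last.

```python
# Hypothetical step-by-step instructions, one per use case in the list above.
TEMPLATES = {
    "research_synthesis": (
        "Summarize argument A, then argument B, compare their evidence "
        "point by point, and only then give a verdict."
    ),
    "financial_analysis": (
        "List each assumption with its value, show how the assumptions "
        "combine, and only then estimate the outcome."
    ),
    "debugging": (
        "Trace the process one step at a time, state what each step should "
        "produce, and only then say where it fails."
    ),
    "decision": (
        "Enumerate the options and the trade-offs of each, and only then "
        "recommend an action."
    ),
}


def cot_prompt(use_case: str, task: str) -> str:
    """Prefix the task with the reasoning instructions for its use case.
    Raises KeyError for a use case with no template."""
    return f"{TEMPLATES[use_case]}\n\n{task}"


print(cot_prompt("decision", "Should we migrate our reports to a new database?"))
```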
The practical implication: if an AI's direct answers on reasoning-heavy tasks seem unreliable, add a CoT trigger before accepting or discarding the output. The improvement is often large enough to matter without switching models or rewriting the rest of the prompt.
Try this
On your next multi-step AI query (a math problem, a causal question, a comparison), append "Let's think step by step" and compare the output to the direct version. On a reasoning task, you will often see both a more accurate answer and a chain you can audit for errors.
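If you want to run that comparison repeatedly, a small harness helps. This sketch assumes a hypothetical `ask` callable that wraps whatever model client you use, and assumes the model ends with an `Answer:` line like the apples example; neither is a real library API.

```python
import re
from typing import Callable, Optional

TRIGGER = "Let's think step by step."


def final_answer(text: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent."""
    matches = re.findall(r"Answer:\s*(.+)", text)
    return matches[-1].strip() if matches else None


def compare(question: str, ask: Callable[[str], str]) -> dict:
    """Send the same question with and without the CoT trigger.
    `ask` is a hypothetical stand-in for your model client."""
    direct = ask(question)
    cot = ask(f"{question} {TRIGGER}")
    return {
        "direct_answer": final_answer(direct),
        "cot_answer": final_answer(cot),
        "cot_chain": cot,  # keep the full chain so you can audit it
    }
```

Passing a stub function in place of `ask` lets you test the harness offline before wiring in a real client.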
If you use JustJot.ai to capture AI-assisted research, save a CoT prompt template in your notes alongside the task types where it performs best. The template transfers across every AI tool you use, because the technique is model-agnostic.
The operating rule: when you need an AI to reason, tell it to reason out loud. The steps it generates become the scratchpad it reads.