JustJot.ai
ai-literacy · 2026-06-17

"Why AI Gives You a Different Answer Every Time (And Why That's Not a Bug)"

"She asked the same question twice and got two different answers. She thought the AI was broken. It was doing exactly what it was built to do."

the storyteller

The first time it happened, Maya thought she'd misread the screen.

She was drafting a tricky email — turning down a client without burning the bridge — and she'd asked her AI assistant for help. The reply was good. A little too formal, maybe, but good. She closed the tab to grab coffee, came back, and asked the exact same question again, word for word, just to compare. The new answer was warmer, shorter, and opened with a completely different line.

Same prompt. Same app. A few minutes apart. Two different answers.

Her first instinct was the one most of us have: something's broken. A calculator that returned 4 and then 5 for 2+2 would be defective. We expect machines to be consistent, and an AI that won't give you the same answer twice feels less like a tool and more like a moody coworker.

But the AI wasn't broken. It was doing the one thing it was actually built to do — and once you understand what that is, every "inconsistency" you've ever seen from one of these systems suddenly makes sense. By the end of this guide you'll know exactly why the same question yields different answers, when that's a feature you want, when it's a risk you need to control, and how to get repeatable results on the days you need them.

The machine is a fortune-teller, not a filing cabinet

Here's the mental model that fixes everything. Picture two very different machines.

The first is a filing cabinet. You ask it a question, it walks to the right drawer, pulls the one correct folder, and reads it back to you. Ask again, same drawer, same folder, same words. A search engine works a little like this. So does a calculator.

The second is a fortune-teller who is preternaturally good at finishing your sentences. You give her the beginning — "The best way to apologize is…" — and she predicts the most fitting next word, then the next, then the next, building the whole answer one word at a time. She's not retrieving a stored response. She's generating one, live, based on everything she's ever read.

A large language model is the second machine. It does not store answers and hand them back. It predicts text, one piece (one [token](what-is-a-token.md)) at a time, and at every single step it faces a fork: many different next words would all be reasonable.

| | Filing cabinet (search) | Fortune-teller (LLM) |
| --- | --- | --- |
| What it does | Retrieves a stored result | Generates new text live |
| Same input twice | Identical output | Often different output |
| "Knows" the answer as | A document it can fetch | A pattern it can continue |
| Surprises you? | Almost never | By design |

Maya was treating the fortune-teller like a filing cabinet. That's the whole misunderstanding. And the fork in the road — that moment where many words would fit — is where the variation lives.

The fork in the road: why there's never just one right word

Slow the machine down to a single step and watch it think.

You've typed: "The weather today is". The model now has to predict the next word. Before it picks anything, it scores every candidate, producing a ranked list with a probability for each, like a weather forecast for words:

"The weather today is ___"

- sunny: 24%
- cold: 19%
- beautiful: 14%
- perfect: 9%
- terrible: 6%
- …and thousands more, trailing off toward zero

Notice the problem this creates. There is no single "correct" next word here — sunny, cold, and beautiful are all perfectly good. A filing cabinet would have nothing to retrieve. The model has to choose from a crowd of good options.

So how should it choose? You might think: always take the top one. Always say sunny. And the machine can do that. But if it always grabbed the single highest-probability word at every step, its writing would come out stiff, repetitive, and weirdly flat — the same safe phrasings over and over, like a person who only ever says the most expected thing. The most probable sentence is rarely the most human one.
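To make "always take the top one" concrete, here is a minimal sketch using the five probabilities from the weather example. The names `next_word_probs` and `pick_greedy` are made up for illustration, and the long tail of other candidate words is left out. A real model scores tens of thousands of tokens, not five words.

```python
# Probabilities from the weather example; the long tail is omitted.
next_word_probs = {
    "sunny": 0.24,
    "cold": 0.19,
    "beautiful": 0.14,
    "perfect": 0.09,
    "terrible": 0.06,
}


def pick_greedy(probs):
    """Always return the single most likely word (no randomness at all)."""
    return max(probs, key=probs.get)


# Ask five times: the "always take the top one" rule never varies.
picks = [pick_greedy(next_word_probs) for _ in range(5)]
print(picks)  # ['sunny', 'sunny', 'sunny', 'sunny', 'sunny']
```

Perfectly repeatable, and that is exactly the trouble: apply this rule at every step and the same safe word wins every time.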

So instead, at each fork, the model rolls weighted dice. sunny has the best odds of being picked, but cold and beautiful are genuinely in the running too. Multiply that small dice-roll across the hundreds of forks in a full answer, and two runs of the same prompt drift down different paths almost immediately — like two hikers who start at the same trailhead, take slightly different turns, and end up at different lookouts. Both walks are valid. They just aren't the same walk.
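The weighted dice can be sketched in a few lines of Python. This is a toy, not how any particular assistant is implemented: `next_word_probs` and `roll_weighted_dice` are illustrative names, and only the five listed words are included, so their odds are rescaled to fill the whole die (which is why "sunny" lands above 24% here).

```python
import random
from collections import Counter

# Probabilities from the weather example; the long tail is omitted.
next_word_probs = {
    "sunny": 0.24,
    "cold": 0.19,
    "beautiful": 0.14,
    "perfect": 0.09,
    "terrible": 0.06,
}


def roll_weighted_dice(probs, rng):
    """Pick one word at random, with likelier words winning more often."""
    words = list(probs)
    weights = list(probs.values())
    # random.choices accepts relative weights; they need not sum to 1.
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(7)  # fixed seed so this demo is repeatable
rolls = 10_000
counts = Counter(roll_weighted_dice(next_word_probs, rng) for _ in range(rolls))

for word, n in counts.most_common():
    print(f"{word:10s} {n / rolls:.1%}")
```

Over many rolls the top word wins most often, but every word on the list gets picked some of the time. One roll per fork is all it takes for two runs to part ways.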

That's it. That's the entire mystery Maya ran into. The answer changed because at every word, the machine had real choices, and it didn't always make the same one.
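A bit of back-of-the-envelope arithmetic shows how fast the paths split. The sketch below makes two loud simplifications: every fork is assumed to offer the same five words with the same odds as the weather example, and each roll is assumed independent. Real forks differ from one another, so treat the numbers as a feel for the trend, not a measurement. `chance_of_identical_runs` is a made-up helper name.

```python
# Probabilities from the weather example; the long tail is omitted.
next_word_probs = {
    "sunny": 0.24,
    "cold": 0.19,
    "beautiful": 0.14,
    "perfect": 0.09,
    "terrible": 0.06,
}


def chance_of_identical_runs(probs, forks):
    """Chance that two independent runs make the same pick at every fork.

    Assumes (unrealistically) that every fork has the same odds.
    """
    total = sum(probs.values())
    # Two runs agree at one fork if both land on the same word:
    # the sum over words of (that word's share of the odds) squared.
    same_pick = sum((p / total) ** 2 for p in probs.values())
    return same_pick ** forks


for forks in (1, 5, 10, 50):
    print(forks, chance_of_identical_runs(next_word_probs, forks))
```

Even under these toy assumptions, the chance of two word-for-word identical answers collapses within a handful of forks.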

Temperature: the dial that sets how adventurous the dice are

There's a single setting that governs how wild those dice rolls get, and it has a wonderfully physical name: temperature.

Think of it as how much you're willing to let the model wander away from the safest, most-likely word.

| Temperature | The dice are… | The model feels… | Reach for it when |
| --- | --- | --- | --- |
| Low (near 0) | Loaded: top word almost always wins | Focused, predictable, a little dry | Facts, code, data extraction, anything you need repeatable |
| Medium | Balanced | Natural and varied, still on-topic | Everyday writing, explaining, general chat |
| High | Loose: long-shot words get a real chance | Creative, surprising, sometimes off the rails | Brainstorming, poetry, breaking a blank page |

At a temperature of zero, the fortune-teller stops gambling. She takes the single most likely word at every fork, every time — and the fortune-teller starts to behave like a filing cabinet. Ask the same thing twice and you'll usually get the same answer, or very nearly. Crank the temperature up, and you've handed her permission to surprise you — wonderful for a brainstorm, nerve-wracking for a tax question.
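For the curious, here is one common way the dial is described mathematically: raise each probability to the power 1/temperature, then rescale so the odds add up to 1 again. The sketch below applies that to the five weather words. `apply_temperature` is an illustrative name, the long tail of words is omitted, and real systems often layer other sampling settings on top, so read this as a picture of the idea only.

```python
# Probabilities from the weather example; the long tail is omitted.
next_word_probs = {
    "sunny": 0.24,
    "cold": 0.19,
    "beautiful": 0.14,
    "perfect": 0.09,
    "terrible": 0.06,
}


def apply_temperature(probs, temperature):
    """Reshape the odds: low temperature sharpens them, high flattens them."""
    if temperature <= 0:
        # Treat zero as "stop gambling": the top word gets all the odds.
        top = max(probs, key=probs.get)
        return {w: (1.0 if w == top else 0.0) for w in probs}
    scaled = {w: p ** (1.0 / temperature) for w, p in probs.items()}
    total = sum(scaled.values())
    return {w: s / total for w, s in scaled.items()}


for t in (0.0, 0.2, 1.0, 2.0):
    adjusted = apply_temperature(next_word_probs, t)
    print(t, {w: round(p, 2) for w, p in adjusted.items()})
```

Reading the printout from top to bottom, "sunny" goes from certain, to a heavy favorite, to merely first among equals as the temperature climbs.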

Most chat assistants run at a medium temperature out of the box, because that's what makes them feel like they're talking to you rather than reciting. That friendly, slightly different-every-time quality you've come to expect? It's a deliberate choice on this dial — not an accident.

Framework: match the dial to the job.

- Need the same answer twice? → Low. Facts, code, classification, anything auditable.
- Need it to sound human and stay on track? → Medium. The default for a reason.
- Need ten ideas you haven't thought of? → High. Then you pick the keeper.

When you want the surprise — and when you don't

Here's the turn, and it's worth saying plainly: the variation isn't the enemy. Using the wrong amount of it is.

Think about what Maya was actually doing. She wanted help finding the right tone for a delicate email — and "give me a few different takes so I can pick" is exactly the kind of task where a machine that surprises you is a gift. Run the prompt three times and you've got three drafts to react to. The variation did her a favor; she just didn't know to expect it.

Now move the same behavior to a different room. Imagine she'd asked, "What's the dosage limit on this medication?" or "What's the formula in cell B7?" Suddenly an answer that changes every time isn't charming — it's dangerous. For facts, math, code, and anything with a single correct answer, you want the filing cabinet, not the fortune-teller.

| The variation is a… | …when the task is | Because |
| --- | --- | --- |
| Feature | brainstorming, naming, drafting, rewriting for tone | more options = more raw material to choose from |
| Liability | facts, calculations, code, legal/medical/financial info | there's one right answer, and "creative" means "possibly wrong" |

This is also why a confidently-worded answer that changes on a second ask should make your antennae go up. If the machine is willing to tell you two different "facts" with equal confidence, that's a quiet signal it may be guessing — which is its own well-documented failure mode. (See [What Is an AI Hallucination](what-is-an-ai-hallucination.md) for why a model can sound certain and still be wrong.)

How to get the same answer twice (when you actually need it)

So you're doing serious work and you need the machine to stop wandering. You have more control than it feels like.

  1. Turn the temperature down — if the app lets you. In developer tools and many "advanced settings" you can set temperature directly; push it toward zero for repeatable, just-the-facts output. (Plenty of consumer chat apps hide this dial — in which case lean on the next three.)
  2. Pin the prompt down. Vagueness forces the model to make more choices, and every choice is another dice roll. "Summarize this in exactly three bullet points, each under twelve words" leaves far less room to wander than "summarize this." Specific instructions are themselves a kind of temperature control. ([You Don't Have a Prompting Problem](you-dont-have-a-prompting-problem.md) goes deeper on writing prompts that constrain the answer.)
  3. Ask three times and compare. This is the trick the pros use precisely because the answers vary. Where three runs agree, you can be more confident. Where they diverge, you've found the soft spot that needs a human to check.
  4. Never confuse consistent with correct. A model at temperature zero will give you the same answer, or very nearly the same, every time, even when that answer is wrong. Repeatability removes the randomness, not the errors. The verification step is still on you; [How to Verify an AI Answer](how-to-verify-an-ai-answer.md) is the checklist for that.
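If you collect answers in a script rather than by eye, step 3 can be sketched roughly like this. The helper names and the sample answers are placeholders invented for the example, and the matching is deliberately crude (case, spacing, and a trailing period are ignored). Longer free-text answers would need a human or a smarter comparison.

```python
from collections import Counter


def normalize(answer):
    """Ignore case, extra spaces, and a trailing period when comparing."""
    return " ".join(answer.lower().split()).rstrip(".")


def check_agreement(answers):
    """Return the most common answer and the share of runs that gave it."""
    counts = Counter(normalize(a) for a in answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(answers)


# Placeholder answers standing in for three runs of the same prompt.
runs_that_agree = ["42 days", "42 days.", "42 Days"]
runs_that_split = ["42 days", "30 days", "45 days"]

print(check_agreement(runs_that_agree))  # ('42 days', 1.0)
print(check_agreement(runs_that_split))  # low share: send to a human
```

Full agreement is a reason for more confidence, not proof. As step 4 says, three matching answers can still be three wrong answers.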

That last point is the one to tattoo on the inside of your eyelids. Lowering the temperature buys you the same answer twice. It does not buy you the right answer. Those are different purchases.

Summary, and one thing to try right now

A language model gives you different answers to the same question because it isn't fetching a stored answer — it's predicting one word at a time, and at every step it rolls weighted dice among many plausible options. The temperature setting decides how adventurous those dice are: low for repeatable facts, high for creative surprise. The variation is a gift when you're brainstorming and a hazard when you need the truth — so the skill isn't eliminating it, it's aiming it.

Maya didn't have a broken assistant. She had a fortune-teller she'd mistaken for a filing cabinet. The moment she understood that, she stopped fighting the variation and started using it — three drafts for the delicate email, one locked-down answer for the numbers.

Try this right now: open your AI assistant and ask it something open-ended — "Give me a metaphor for how memory works" — twice in a row. Watch the two answers diverge, and you've just seen the dice roll with your own eyes. Then ask it a factual question twice and notice how much less it drifts. You now know more about how the machine thinks than most of the people using it.

From here, the natural next reads are [How Large Language Models Work](how-large-language-models-work.md) for the full picture of the prediction engine, and [What Is a Token](what-is-a-token.md) to see exactly what "one word at a time" really means under the hood.