"How Large Language Models Actually Work — A Plain-English Guide (No Math Required)"

You have almost certainly used a large language model — an LLM — in the last week. You typed a question, words came back, and they were usually good. But if someone asked you how it did that, you might reach for the wrong picture: a giant search engine, a database of answers, a person typing back. None of those is right, and holding the wrong picture is exactly why the tool sometimes surprises you.

Here is the promise of this guide: by the end, you will be able to explain in one sentence what an LLM is doing, predict the kinds of mistakes it will make, and write prompts that get better results — all without a single equation. We will build the idea from something you already know, then add one layer at a time.

TL;DR

- An LLM does exactly one thing: given the text so far, it predicts the most likely **next chunk of text**, then repeats. Everything else is built on that.
- It learned those predictions by reading an enormous amount of text and adjusting billions of internal dials until its guesses got good. It is a pattern machine, not a fact lookup.
- It has no memory between conversations and no live access to the world unless a tool gives it one. What it "knows" is frozen patterns, not current truth.
- Hallucinations are not bugs bolted on; they are the same next-word machine running when the pattern is thin. Understanding this tells you when to trust it.
- You steer it with context: the words you put in front of it change which patterns it reaches for. That is the whole skill of prompting.

1. Start with something you already do

Read this and let your brain finish it: "Thanks for the help, I really appreciate ___."

You said "it," or maybe "you," almost without trying. You did not look anything up. You have read and heard so much English that your brain holds a sense of what tends to come next, and it filled the blank with the most likely word.

That instinct — predict the next word from everything before it — is the entire core of a large language model. The "large" part just means it has read far more than any one person could, and it tracks those next-word odds with extraordinary precision. Strip away the marketing and an LLM is a very, very good autocomplete.

Hold onto that sentence. Everything difficult about these systems becomes simple once you trace it back to: it is predicting the next chunk of text.

2. It works in tokens, not words

One small correction to the picture, because it explains real behavior you will notice.

An LLM does not predict whole words. It predicts tokens — pieces of text that are often a word, but sometimes part of a word or a bit of punctuation. "Notebook" might be one token or split into "note" + "book." A rare name might be three or four tokens.
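If you are comfortable reading a little Python, the splitting described above can be sketched with a toy tokenizer. This is a minimal illustration, not how any real model tokenizes: the vocabulary `TOY_VOCAB` and the function `toy_tokenize` are made up for this guide, and real tokenizers learn their pieces from data rather than from a hand-written list.

```python
# Toy vocabulary, invented for illustration. Real vocabularies are learned
# from data and contain tens of thousands of pieces.
TOY_VOCAB = {"note", "book", "s", "the", " ", "straw", "berry", "."}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text into the longest vocabulary pieces, left to right.

    Characters that match no vocabulary entry become one-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            # Nothing in the vocabulary starts here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


print(toy_tokenize("notebooks"))   # ['note', 'book', 's']
print(toy_tokenize("strawberry"))  # ['straw', 'berry']
print(toy_tokenize("zyx"))         # ['z', 'y', 'x']
```

Notice that "strawberry" arrives as two pieces, and an unfamiliar string falls apart into several. The model works with those pieces, never with the individual letters inside them.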

Why should you care? Because a few quirks fall straight out of it:

| What you notice | Why tokens explain it |
| --- | --- |
| It miscounts letters in a word ("how many r's in strawberry?") | It sees tokens, not individual letters, so spelling-level questions are genuinely hard for it. |
| It sometimes invents a plausible-looking word | It is assembling tokens that fit the pattern, not retrieving a dictionary entry. |
| Very long inputs cost more and can get truncated | Limits are measured in tokens, not words; a token is roughly 3–4 characters. |

So the precise version of our one sentence is: given the tokens so far, predict the next token, add it, and repeat. It writes your whole answer one token at a time, each new token chosen in light of every token before it — including the ones it just produced.
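For readers who like to see the loop, here is a minimal sketch of "predict, add, repeat." Everything in it is hypothetical: the odds in `TOY_ODDS` are invented, and `predict_next` looks only at the last token, whereas a real model weighs every token before it. The sketch also always takes the single most likely token, which is one simple way to choose among several.

```python
# Hypothetical next-token odds, invented for illustration.
TOY_ODDS = {
    "<start>": {"thanks": 0.9, "the": 0.1},
    "thanks": {"for": 0.8, "a": 0.2},
    "for": {"the": 0.7, "your": 0.3},
    "the": {"help": 0.6, "note": 0.4},
    "help": {"<end>": 1.0},
}


def predict_next(tokens):
    """Return the most likely next token given the tokens so far.

    Simplification: this toy consults only the last token.
    """
    odds = TOY_ODDS.get(tokens[-1], {"<end>": 1.0})
    return max(odds, key=odds.get)


def generate(max_tokens=10):
    """Build a reply one token at a time."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        next_token = predict_next(tokens)
        if next_token == "<end>":
            break
        # The token just produced becomes part of the input for the next guess.
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker


print(" ".join(generate()))  # thanks for the help
```

The loop has no plan for the whole sentence. It commits to one token, then asks the same question again with a slightly longer input.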

3. Where the predictions come from: training

If it is guessing the next token, the obvious question is: how does it guess well? The answer is training, and it happens in two stages worth separating.

Stage one — pretraining (learning the patterns). The model is shown a staggering amount of text and plays one game billions of times: cover the next token, guess it, check the real answer, nudge the internal dials to be a little less wrong. Those dials are called parameters, and a big model has hundreds of billions of them. No human writes rules like "after 'the' comes a noun." The rules emerge from the nudging. After enough rounds, the model has absorbed grammar, facts, tones, and reasoning patterns — not as stored sentences, but as statistical structure baked into the dials.

Stage two — fine-tuning (learning to be useful). A raw pretrained model just continues text; ask it a question and it might reply with more questions, because that is a pattern it saw. So humans show it examples of helpful answers and rate its attempts, teaching it to behave like an assistant. This is where "be helpful, be honest, refuse harmful requests" gets shaped in.

Here is the framework to carry forward:

The two-stage model.

1. Pretraining gives it the knowledge and pattern sense: what language and the world look like. Frozen at a cutoff date.
2. Fine-tuning gives it the manners: how to respond as an assistant.

A model that gets a fact wrong usually has a pretraining gap; one that is rude or off-format usually has a fine-tuning gap.
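The stage-one game of "guess, check, nudge" can be sketched in a few lines of Python. Treat this as a toy under loud assumptions: the nine-word corpus, the `train` function, and the step size are all invented here, the "dials" are a tiny table with one row per previous token, and a real model has vastly more dials arranged very differently. The nudge rule used is the standard one for this kind of guess: raise the score of the real next token and lower the others in proportion to how much odds they were given.

```python
import math


def softmax(scores):
    """Turn raw scores into odds that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(corpus, vocab, rounds=200, step=0.5):
    """Nudge a table of dials until next-token guesses match the corpus."""
    index = {tok: i for i, tok in enumerate(vocab)}
    # dials[prev][nxt] starts at zero, so every next token is equally likely.
    dials = [[0.0] * len(vocab) for _ in vocab]
    for _ in range(rounds):
        for prev, actual in zip(corpus, corpus[1:]):
            p, a = index[prev], index[actual]
            guess = softmax(dials[p])            # cover the next token and guess
            for j in range(len(vocab)):
                target = 1.0 if j == a else 0.0  # check the real answer
                dials[p][j] += step * (target - guess[j])  # nudge
    return dials, index


corpus = "the cat sat on the mat the cat sat".split()
vocab = sorted(set(corpus))
dials, index = train(corpus, vocab)

odds_after_cat = softmax(dials[index["cat"]])
best_after_cat = vocab[odds_after_cat.index(max(odds_after_cat))]
print(best_after_cat)  # sat
```

Nobody wrote a rule saying "sat follows cat." The table ended up encoding it because that is what the text kept showing.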

4. Why it sounds confident even when it is wrong

Now we can explain the behavior that trips people up most.

The model is always doing the same thing: producing the next most-likely token. When you ask something it has seen a thousand times — the capital of France, how a for-loop works — the patterns are thick and consistent, and the likely next tokens are the correct ones. It looks like it "knows."

When you ask something rare, oddly specific, or simply untrue-but-plausible — the page number of a quote, a citation for a niche claim, the details of a person it barely encountered — the patterns are thin. But the machine never stops. It still produces the most likely-looking continuation, and a likely-looking citation has an author, a year, and a confident tone. That is a hallucination: not a glitch, but the exact same next-token engine running where the pattern was weak.

This is the single most useful thing to internalize, so here is a checklist for when to double-check:

[ ] Is this a specific, verifiable fact (a number, date, quote, citation, name)?
[ ] Is it about recent events — possibly after the model's training cutoff?
[ ] Is it niche — something few sources would have covered in depth?
[ ] Would being wrong cost you (medical, legal, financial, or published work)?

The more boxes you check, the more carefully you should verify the output yourself. The model's confidence is a feature of fluent text, not a measure of truth.
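The idea that the engine never stops, whether the pattern is thick or thin, can be sketched like this. The odds below are invented for illustration, and `most_likely` is a hypothetical helper, not part of any real system.

```python
def most_likely(odds):
    """Return the top token and the odds it was given."""
    token = max(odds, key=odds.get)
    return token, odds[token]


# Hypothetical odds after "The capital of France is": a thick, consistent pattern.
thick_pattern = {"Paris": 0.97, "Lyon": 0.02, "Nice": 0.01}

# Hypothetical odds after "This obscure quote was first printed in": a thin pattern.
thin_pattern = {"1987": 0.28, "1992": 0.26, "2003": 0.24, "1979": 0.22}

for name, odds in [("thick", thick_pattern), ("thin", thin_pattern)]:
    token, chance = most_likely(odds)
    # Either way a token comes out; only the odds behind it differ.
    print(f"{name}: picks {token!r} at {chance:.0%}")
```

In both cases a token comes out, and the finished sentence reads just as smoothly. The 28% never appears in the text you see, which is why the checklist matters.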

5. It has no memory and no live world

Two more limits follow directly from "it predicts text from its dials," and they surprise people who imagine the model as a little being that persists.

No memory between chats. Each conversation starts blank. The reason it seems to remember what you said three messages ago is that the entire conversation so far is fed back in as input every single time it generates a reply. Memory across sessions — your name, your preferences — only exists if the app deliberately stores those notes and re-sends them. The model itself forgets everything the moment the window closes.
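Here is a minimal sketch of that re-sending trick. The names are hypothetical: `stateless_model` is a stand-in for an LLM call that sees only the text handed to it, and `ChatApp` plays the role of the app around it.

```python
import re


def stateless_model(transcript):
    """Hypothetical stand-in for an LLM: it sees only the text it is handed."""
    match = re.search(r"my name is (\w+)", transcript)
    if match:
        return f"Your name is {match.group(1)}."
    return "I don't know your name."


class ChatApp:
    """The app, not the model, holds the conversation."""

    def __init__(self):
        self.history = []

    def send(self, user_message):
        self.history.append(f"User: {user_message}")
        # The whole conversation so far is re-sent on every single turn.
        reply = stateless_model("\n".join(self.history))
        self.history.append(f"Assistant: {reply}")
        return reply


chat = ChatApp()
chat.send("Hi, my name is Ada.")
print(chat.send("What is my name?"))      # Your name is Ada.

new_chat = ChatApp()                      # a fresh window: empty history
print(new_chat.send("What is my name?"))  # I don't know your name.
```

The "memory" lives entirely in `history`. Start a new `ChatApp` and it is gone, even though the model function is exactly the same.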

No live access to the world. By default the model only has its frozen pretraining patterns. It cannot see today's news, your files, or a live price unless a tool hands that information to it — web search, a database query, a calculator. When a tool is connected, the app fetches real data, pastes it into the input, and then the model predicts a reply using it. The intelligence is still next-token prediction; the tool just supplies fresh, true text to predict from.
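The "fetch, paste, then predict" flow can be sketched as follows. All names are assumptions made for this guide: `lookup_price` stands in for a live tool, `FAKE_LIVE_PRICES` stands in for the outside world, and `build_model_input` shows only the pasting step, not a real model call.

```python
from typing import Optional

# Hypothetical data source standing in for a live tool.
FAKE_LIVE_PRICES = {"coffee": "4.50 USD"}


def lookup_price(item: str) -> str:
    """Hypothetical tool; a real one would query a live source."""
    return FAKE_LIVE_PRICES.get(item, "unknown")


def build_model_input(question: str, tool_result: Optional[str] = None) -> str:
    """Assemble the text the model will predict from."""
    lines = []
    if tool_result is not None:
        # The app pastes fetched data into the input ahead of the question.
        lines.append(f"Tool result: {tool_result}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


question = "How much is a coffee right now?"
without_tool = build_model_input(question)
with_tool = build_model_input(question, lookup_price("coffee"))
print(with_tool)
```

The model's job is identical in both cases. The only difference is whether the price is sitting in the text it reads.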

| Common assumption | What is actually true |
| --- | --- |
| "It remembers me." | Only if the app re-sends saved notes each time. The model is stateless. |
| "It's searching the internet." | Only if a search tool is attached and run; otherwise it works from frozen patterns. |
| "It knows what happened yesterday." | Not unless given the info; its knowledge has a cutoff date. |
| "It double-checked its math." | Not unless a calculator tool ran; raw token-prediction is shaky at arithmetic. |

This is also why semantic search pairs so well with LLMs: you can fetch the right notes by meaning and feed them in as context. If that idea is new to you, see our explainer, [What Is Semantic Search?](../ai-notetaking/what-is-semantic-search.md) — it is the other half of how AI tools find and use the right information.
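As a rough sketch of "fetch the right notes, then feed them in," the snippet below picks the closest note to a question and pastes it into the input. One loud caveat: it scores closeness by shared words, which is only a crude stand-in. Real semantic search compares learned representations of meaning, so it can match notes that share no words with the question. The notes and function names are invented for illustration.

```python
import math
from collections import Counter


def to_vector(text):
    """Crude stand-in for an embedding: a bag of lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, notes):
    """Return the note that scores closest to the query."""
    query_vec = to_vector(query)
    return max(notes, key=lambda note: cosine(query_vec, to_vector(note)))


notes = [
    "Mitochondria produce energy for the cell",
    "The French Revolution began in 1789",
    "Photosynthesis converts light into sugar",
]
question = "when did the french revolution begin"
best = retrieve(question, notes)

# The retrieved note is pasted in front of the question as context.
model_input = f"Context: {best}\nQuestion: {question}"
print(model_input)
```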

6. The one lever you control: context

If you only remember one actionable thing, make it this. You cannot retrain the model, but you fully control the text you put in front of it — and that text decides which patterns it reaches for. This is called context, and prompting is simply the craft of shaping it.

A worked example. Compare two prompts:

"Write about notes."

versus

"You are helping a medical student. Explain, in three short bullet points, how to turn a dense lecture transcript into review-ready notes. Use plain language and one concrete example."

Same model, wildly different output. The second prompt loaded the context with a role (medical student), a format (three bullets), a task (transcript → review notes), and a style (plain language, one concrete example). Each of those words tilts the next-token odds toward the answer you actually want.

A simple, durable framework for any prompt:

R-T-F-C.

- Role: who the model should act as ("a patient tutor," "a skeptical editor").
- Task: the precise thing to do, one clear verb.
- Format: how the answer should be shaped (bullets, table, 100 words, JSON).
- Context: the specific material and constraints it should use.

Give it all four and you are no longer hoping; you are steering.
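If you build prompts in code, the four parts can be kept as named fields so none gets forgotten. This is one possible sketch: the class `RTFCPrompt` and its labeled layout are invented for this guide, not a format any model requires, and the example values reuse the medical-student prompt from earlier in this section.

```python
from dataclasses import dataclass


@dataclass
class RTFCPrompt:
    """Hypothetical container for the four R-T-F-C parts."""

    role: str
    task: str
    format: str
    context: str

    def render(self) -> str:
        """Join the four parts into one block of prompt text."""
        return "\n".join(
            [
                f"Role: {self.role}",
                f"Task: {self.task}",
                f"Format: {self.format}",
                f"Context: {self.context}",
            ]
        )


prompt = RTFCPrompt(
    role="You are helping a medical student.",
    task="Explain how to turn a dense lecture transcript into review-ready notes.",
    format="Three short bullet points.",
    context="Use plain language and one concrete example.",
)
print(prompt.render())
```

Because every field is required, leaving one out raises an error instead of silently producing a thin prompt.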

Common mistakes

A few misunderstandings cause most of the frustration people have with LLMs:

- Treating it as a search engine. It does not retrieve documents; it generates likely text. For facts that must be exact, give it the source or use a tool; do not trust recall.
- Believing the confident tone. Fluency is not knowledge. A smooth, certain paragraph can be entirely invented. Run the verify-checklist from Section 4.
- Assuming it remembers you. Without an app deliberately saving and re-sending notes, every chat is a clean slate.
- Blaming the model for a thin prompt. Vague in, vague out. Most "bad" answers improve dramatically once you add role, task, format, and context.
- Expecting reliable arithmetic or letter-counting. Token prediction is not calculation. For exact math, use a tool or check it yourself.

Summary — and what to do next

Strip away the mystique and a large language model is one idea repeated at enormous scale: given the text so far, predict the next token, then do it again. It learned those predictions from a vast read-through of human text (pretraining) and was then taught to behave like an assistant (fine-tuning). It is stateless, frozen at a cutoff, and blind to the live world unless a tool feeds it. Its confidence tracks fluency, not truth — so it shines on well-trodden ground and hallucinates where patterns run thin. And the one lever in your hands is context: the words you put in front of it.

Try this today. Take a question you would normally type in one line and rewrite it with R-T-F-C — give the model a role, a precise task, a format, and the specific context. Run both versions and compare. The gap you see is the skill, and it is entirely learnable.

If you want to go deeper on how AI actually finds the right information to work with — the piece that makes everything above more reliable — read [What Is Semantic Search?](../ai-notetaking/what-is-semantic-search.md) next.