What Is a Context Window?
People assume a chatbot "remembers" the conversation the way a person does. It doesn't — everything it appears to know in the moment is sitting inside one finite buffer.
A context window is the maximum amount of text a language model can consider at once — your prompt, the documents you paste, and the conversation so far — measured in tokens. Anything outside it might as well not exist; the model cannot see it. A large language model is the prediction system behind tools like ChatGPT or Claude (see [How Large Language Models Work](how-large-language-models-work.md)). The context window is its working memory, and it has a hard ceiling.
How it works
1. Text is measured in tokens, not words
A token is a chunk of text the model reads as one unit — roughly four characters, or about three-quarters of an English word. "Notebook" might be one token; an unusual name might be three. The working rule:
| Tokens | Approx. English words | Rough page count |
|---|---|---|
| 1,000 | ~750 | ~1.5 pages |
| 10,000 | ~7,500 | ~15 pages |
| 100,000 | ~75,000 | ~150 pages (a short book) |
These ratios hold reasonably well for English prose; code and many other languages usually take more tokens for the same amount of text. Treat them as estimates, not exact counts.
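The rule of thumb above can be turned into a quick estimator. This is a minimal sketch that only applies the two ratios from the text (about four characters, or three-quarters of a word, per token); the function names are made up for illustration, and a real tokenizer will give different counts, especially for code or non-English text.

```python
import math

# Rules of thumb from the text, valid only for English prose.
CHARS_PER_TOKEN = 4      # roughly four characters per token
WORDS_PER_TOKEN = 0.75   # roughly three-quarters of a word per token


def estimate_tokens_from_chars(text: str) -> int:
    """Rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate based on a word count."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


# Reproduces the first row of the table: 750 words is about 1,000 tokens.
print(estimate_tokens_from_words(750))
```

Use it to sanity-check whether a document is likely to fit before you paste it, not to predict an exact bill.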
2. The window holds everything at once
The context window is not "how much you can type." It is the sum of four things the model must hold simultaneously:
- The system instructions (set by the app, often invisible to you).
- The full conversation history so far.
- Whatever you paste in this turn — notes, an article, a transcript.
- The reply the model is about to generate (it reserves room for its own output).
All four compete for the same fixed budget. A long pasted document leaves less room for a long answer.
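The shared budget is simple subtraction, sketched below. The class name, field names, and every number in the usage lines are hypothetical; real apps do not expose their system-prompt size, and actual window sizes vary by model.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Hypothetical bookkeeping for one turn; all values are token counts."""
    window_size: int      # the model's fixed ceiling
    system_tokens: int    # instructions set by the app
    history_tokens: int   # conversation so far
    pasted_tokens: int    # what you paste in this turn

    def input_tokens(self) -> int:
        """Everything the model must read before it can answer."""
        return self.system_tokens + self.history_tokens + self.pasted_tokens

    def room_for_reply(self) -> int:
        """Tokens left for the model's own output (never negative)."""
        return max(0, self.window_size - self.input_tokens())


# Made-up numbers: an 8,000-token window with a 5,000-token paste.
budget = ContextBudget(window_size=8000, system_tokens=500,
                       history_tokens=1500, pasted_tokens=5000)
print(budget.room_for_reply())  # 1000 tokens left for the answer
```

With these invented figures, the paste alone uses more than half the window, which is why the answer has so little room.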
3. When it fills, the oldest text drops out
Context windows do not overflow gracefully. In most chat apps, older turns get truncated or summarized to make room. This is why a long chat "forgets" what you said at the start: that text has dropped out of the window. The model isn't being careless. The information is simply no longer in front of it.
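One simple way an app might drop old text is to keep only the newest turns that fit. The sketch below is an assumption about how such trimming could work, not a description of any particular product; the function name and the token counts are invented, and real apps may summarize old turns instead of discarding them.

```python
from collections import deque


def fit_to_window(turns: list[tuple[str, int]], budget: int) -> list[tuple[str, int]]:
    """Keep the most recent turns whose token counts fit within the budget.

    Each turn is a (text, token_count) pair, ordered oldest first.
    Walks backward from the newest turn and stops at the first one that
    would not fit, so everything older than that is dropped too.
    """
    kept: deque[tuple[str, int]] = deque()
    used = 0
    for text, tokens in reversed(turns):
        if used + tokens > budget:
            break
        kept.appendleft((text, tokens))
        used += tokens
    return list(kept)


# Hypothetical conversation with made-up token counts.
conversation = [("first instruction", 50), ("a question", 30), ("latest question", 40)]
print(fit_to_window(conversation, budget=80))
```

In this example the first instruction is the turn that gets dropped, which is the "forgetting" described above.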
4. Windows have grown fast, but bigger isn't free
Window sizes have climbed from a few thousand tokens in early models to hundreds of thousands or more in recent ones, and they keep growing, so any specific number quoted here would date quickly. Treat whatever figure you see advertised as a snapshot, not a fact to memorize. Two things stay true regardless of the headline number. The limit is always finite. And filling the window has costs: slower responses, a higher price per call, and a documented tendency for models to lose track of details buried in the middle of very long inputs.
A concrete example
You paste a 40-page report (~27,000 tokens, using the ratios in the table above) and ask for a summary. That works. You then ask twelve follow-up questions, each adding your question plus the model's answer to the running total. Around the point where the conversation and the report together exceed the window, the model quietly stops "seeing" the report's opening pages. Its answers about the introduction get vaguer, not because it got worse at reading, but because page one is no longer in the window. Paste the relevant section again and accuracy returns. That snap-back is the tell: it was a context limit, not a comprehension failure.
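The tipping point in a scenario like this can be estimated with integer division. This is a rough sketch that ignores system instructions and the room reserved for the reply; the function name is hypothetical, and the numbers in the usage line are invented for illustration rather than taken from the example above.

```python
def followups_before_overflow(window_size: int, report_tokens: int,
                              tokens_per_exchange: int) -> int:
    """Count the question-and-answer exchanges that fit after the report.

    Assumes every exchange costs the same number of tokens, which real
    conversations do not; treat the result as a ballpark.
    """
    if tokens_per_exchange <= 0:
        raise ValueError("tokens_per_exchange must be positive")
    remaining = window_size - report_tokens
    if remaining < 0:
        return 0  # the report alone already exceeds the window
    return remaining // tokens_per_exchange


# Made-up figures: a 40,000-token window, a 30,000-token report,
# and 800 tokens per question-plus-answer.
print(followups_before_overflow(40_000, 30_000, 800))  # 12
```

With those invented figures, twelve exchanges fit and the thirteenth is where the earliest text would start to fall out.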
Why it matters
Many "the AI got dumber" complaints are really context-window effects. Once you can name the cause, you can fix it:
| Symptom | Likely cause | Fix |
|---|---|---|
| Forgot earlier instructions | They scrolled out of the window | Restate them in the current turn |
| Vague on a long pasted doc | Doc partly truncated | Paste only the relevant section |
| Answers got slower / pricier | Window near full | Start a fresh chat with a clean summary |
| Ignored a detail "buried" mid-document | Lost-in-the-middle effect | Move the key fact to the top or bottom |
The pattern: the model is only ever as good as what's currently inside the window. Curate that input and the output improves — no prompt magic required.
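For the last row of the table, one way to keep a key fact out of the middle is to state it both before and after the long document. The helper below is a hypothetical sketch of that layout; the section labels and the function name are made up, and whether repeating the facts helps will depend on the model.

```python
def build_prompt(key_facts: list[str], document: str, question: str) -> str:
    """Assemble a prompt with the key facts at the top and again at the bottom."""
    facts = "\n".join(f"- {fact}" for fact in key_facts)
    return (
        f"Key facts:\n{facts}\n\n"
        f"Document:\n{document}\n\n"
        f"Reminder of key facts:\n{facts}\n\n"
        f"Question: {question}"
    )


# Hypothetical content for illustration only.
prompt = build_prompt(
    key_facts=["The deadline is 1 March."],
    document="(long report text goes here)",
    question="What should we do first?",
)
print(prompt)
```

The repetition costs a few extra tokens, which is usually a small price next to the size of the document itself.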
Try this
Next time a long chat starts drifting, don't fight it with more prompting. Open a new conversation and paste a tight, hand-written summary of what matters — the decisions, the constraints, the current state. You'll have spent a few hundred tokens to buy back the model's full attention.
This is also the quiet case for keeping your own durable notes. A context window is rented memory that empties every session; a [second brain](../ai-notetaking/how-to-build-a-second-brain.md) of notes you actually wrote is memory you own. Capture the facts that matter in JustJot.ai, and you can re-load any AI's working memory in seconds — feeding it exactly what it needs to see, instead of hoping it still remembers.
The decision rule: if an answer surprises you, first ask "is the thing I'm asking about still inside the window?" before you ask "is the model wrong?" Most of the time, it's the window.