Context: What the Model Holds in Mind at Once

Context is all the text the model sees in front of it when it generates an answer: your prompt, attached files, conversation history, instructions. The model has no memory between calls — it knows exactly what's in the context right now, and nothing more. Understanding this is critical: almost all "the model didn't get it / forgot / mixed things up" problems are context problems.

The context window

Every model has a context window — a limit on how many tokens it can hold at once (input and output together). It's like the size of a desk: what's on it is what the model works with; what didn't fit doesn't exist for it.

Windows vary — from thousands to hundreds of thousands and millions of tokens. A large window lets you give the model more (whole files, long history), but "large" doesn't mean "should be filled to the brim".

Why the model "forgets"

Two different mechanisms that beginners confuse:

Between conversations the model remembers nothing at all. A new chat is a blank slate. What you discussed yesterday has to be given again (this is what skills and persistent memory solve — they put the needed things into context automatically).
Within a long dialogue everything that still fits in the window is available to the model — but when history overflows the window, the old starts getting pushed out, and the model "forgets" the beginning of the conversation.

So "the model forgot what I asked at the start" in a long dialogue isn't a glitch — it's window overflow.

More context ≠ better

The temptation is to dump everything into the model: the whole repository, all the docs, the entire conversation. In practice this hurts:

The important gets diluted. Among 50 files, the three needed ones get lost; the model finds relevant material worse in noise. This is sometimes called "lost in the middle" — what sits in the middle of a huge context the model accounts for worse than the beginning and the end.
More expensive and slower. Every extra token is money and time.
Higher risk of error. More contradictory material means a higher chance the model latches onto the wrong thing.

The product engineer's rule: give relevant context, not maximal. Three needed files beat fifty "just in case".

How context is managed

Managing context is a key skill of working with AI:

Select the relevant — only the pieces of code, docs, and requirements that relate to the task.
Give facts, not "recall it yourself" — grounding in data right in the context sharply reduces hallucinations.
Pull context on demand via tool calling: instead of pouring everything in at once, let the model fetch what it needs (code search, a database query) at the moment it's needed.
Don't drag endless history — on long dialogues, start a new one with a short summary of what's needed.
Automate the repetitive — what the model should always know (style, project rules) goes into skills and memory, so you don't insert it by hand every time.

What this means in practice

Context is the model's working memory, and the quality of the answer is largely determined by what you put into it. A product engineer thinks not "what question to ask" but "what should be in front of the model's eyes so it answers well": the right files, facts, rules — and nothing extra. Managing context matters more than phrasing the prompt nicely.

What's next

Context can be filled not only by hand but by letting the model fetch what it needs itself — via tool calling. And when the model, in a loop, decides for itself what to fetch and what to do — that's agents.