Tokens and Cost: What You Pay For When Working With AI

Work with AI models is measured and billed in tokens, not in words or requests. You need to understand them for two reasons: cost is computed from tokens (and can be surprisingly large), and context, meaning how much the model can "hold" at once, is also measured in tokens.

What a token is

The model doesn't work with letters or words directly — it breaks text into tokens, chunks of roughly 3–4 characters. A short, common word ("cat", "the") is one token; a long or rare word is split into several. Code is tokenized too: brackets, indentation, variable names — all tokens.

A rough guide: one token ≈ 0.75 of a word in English, so 100 tokens is about 75 words. Other languages and code usually take more tokens for the same amount of text. A page of text is on the order of a few hundred tokens.

You don't need to know the exact numbers by heart — what matters is a feel for scale: a long document or a large code file handed to the model is thousands of tokens, and they aren't free.
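
To get that feel for scale, the rules of thumb above can be turned into a quick estimator. This is a minimal sketch, not a real tokenizer: the function names are hypothetical, and the defaults (4 characters per token, 0.75 words per token) are the rough English-text averages from this section. Real counts depend on the model's tokenizer and are usually higher for code and non-English text.

```python
import math


def estimate_tokens_from_chars(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate from character count. Assumes about 4 characters per token."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def estimate_tokens_from_words(word_count: int, words_per_token: float = 0.75) -> int:
    """Rough estimate for English prose: one token is about 0.75 of a word."""
    return math.ceil(word_count / words_per_token)


if __name__ == "__main__":
    sample = "The model breaks text into tokens before it does anything else."
    print(estimate_tokens_from_chars(sample))
    print(estimate_tokens_from_words(750))  # 750 words -> 1000 tokens
```

An estimate like this is enough to tell "hundreds" from "tens of thousands" before sending a file; for billing-grade numbers, use the token counts reported by the provider.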

Input and output tokens

Every call to the model counts tokens on two sides:

Input — everything you sent: your prompt, attached files, conversation history, system instructions. The model "reads" all of it every time.
Output — everything the model generated in reply.

Output tokens usually cost more than input tokens, because generation is computationally heavier. An important non-obvious point: the model keeps no memory between calls, so in a dialogue each new request resends the entire previous conversation as input tokens. A long conversation gets more expensive with every turn, even if your messages are short.
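
The accumulation effect is easy to underestimate, so here is a small sketch of it. It assumes the simplest case, where every request resends the full history with no truncation, summarization, or caching; the function name and the token counts in the example are made up for illustration.

```python
def input_tokens_per_turn(turns, system_tokens=0):
    """Return the input token count billed on each turn of a dialogue.

    turns: list of (user_tokens, reply_tokens) pairs, one per turn.
    Assumes the full history is resent on every request.
    """
    history = system_tokens
    per_turn = []
    for user_tokens, reply_tokens in turns:
        history += user_tokens      # the new message joins the history
        per_turn.append(history)    # everything so far is sent as input
        history += reply_tokens     # the reply becomes history for the next turn
    return per_turn


if __name__ == "__main__":
    # Four turns: short 50-token messages, 300-token replies, 200-token system prompt.
    counts = input_tokens_per_turn([(50, 300)] * 4, system_tokens=200)
    print(counts)       # [250, 600, 950, 1300]
    print(sum(counts))  # 3100
```

The user typed only 200 tokens in total, yet 3,100 input tokens were billed, and the per-turn input keeps growing for as long as the conversation continues.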

How cost adds up

Price is set per token, usually quoted per million tokens, with separate rates for input and output. Practical implications for a product engineer:

Large context costs money. Handing the model the whole repository "just in case" is expensive and often bad for quality (see context). Give the model what is relevant, not everything.
Long dialogues accumulate cost. Sometimes it's cheaper to start a new conversation from scratch than to drag a huge history along.
Automation multiplies the price. One request costs pennies, but an agent doing hundreds of steps in a loop, or processing thousands of items, adds up to real money. Estimate the cost at scale before running.
Different model sizes, different price. A powerful model costs more per token. Not every task needs the strongest one — give the simple stuff to a smaller model.
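
The points above come down to simple arithmetic, which is worth doing before a large run. The sketch below assumes per-million-token pricing with separate input and output rates. The `Pricing` class, the function names, and all prices are hypothetical placeholders, not any provider's real rates; substitute the numbers from your provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price in dollars per million tokens (placeholder values, not real rates)."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of a single call to the model."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


def batch_cost(items: int, input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of running the same kind of call over many items."""
    return items * request_cost(input_tokens, output_tokens, pricing)


if __name__ == "__main__":
    large = Pricing(input_per_million=3.00, output_per_million=15.00)  # hypothetical
    small = Pricing(input_per_million=0.25, output_per_million=1.25)   # hypothetical

    # One request: 2,000 input tokens, 500 output tokens.
    print(f"one request, large model: ${request_cost(2_000, 500, large):.4f}")

    # The same request repeated over 10,000 items.
    print(f"10,000 items, large model: ${batch_cost(10_000, 2_000, 500, large):.2f}")
    print(f"10,000 items, small model: ${batch_cost(10_000, 2_000, 500, small):.2f}")
```

With these placeholder rates a single request costs about a cent, while the 10,000-item batch costs over a hundred dollars on the large model and roughly a tenth of that on the small one. That gap is the reason to estimate at scale and to match model size to the task.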

Tokens are also speed

Cost isn't the only consequence. The model generates its answer token by token, so long output takes longer. And a large input takes the model longer to "read". So bloated context hits on three fronts at once: more expensive, slower, and often worse in quality.
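
A back-of-the-envelope model of that latency, as a sketch only: it assumes response time is the time to read the input plus the time to generate the output, each at a constant rate. The function name and both default rates are invented for illustration; real throughput varies widely by model, provider, and load.

```python
def estimate_latency_seconds(
    input_tokens: int,
    output_tokens: int,
    read_tokens_per_sec: float = 5000.0,     # assumed input processing rate
    generate_tokens_per_sec: float = 50.0,   # assumed output generation rate
) -> float:
    """Rough response time: reading the input plus generating the output."""
    read_time = input_tokens / read_tokens_per_sec
    generate_time = output_tokens / generate_tokens_per_sec
    return read_time + generate_time


if __name__ == "__main__":
    # Same 500-token answer, small versus bloated input.
    print(estimate_latency_seconds(1_000, 500))    # 10.2
    print(estimate_latency_seconds(100_000, 500))  # 30.0
```

Under these assumed rates the output dominates for small inputs, but a bloated input adds its own noticeable delay on top.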

What this means in practice

Tokens are the unit of both money and the model's attention. A product engineer keeps them in mind as a resource: give the model exactly what the task needs, watch out for dialogues bloating, estimate the cost of automations at scale, and choose model size to fit the task. Saving tokens almost always goes hand in hand with better quality — a short, relevant input beats a huge "just in case" one.

What's next

Tokens are also the measure of how much the model holds at once: that is the topic of context and the context window. How models get access to external data instead of bloating the input is covered in tool calling.