How LLMs Actually Process Your Words

To write effective prompts, you need a working mental model of what happens between sending your text and receiving a response. You do not need a PhD in machine learning — you need the right level of abstraction.

Tokens, Not Words

LLMs do not read words. They read tokens — fragments of text that might be whole words, parts of words, or punctuation. The word "understanding" might be two tokens: "under" and "standing." The word "cat" is one token. A line of code might be six tokens.

This matters because models have token limits, pricing is per-token, and the way text is split into tokens affects how the model interprets meaning. A prompt that looks short to you might be expensive in tokens, and vice versa.
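To make the splitting concrete, here is a toy greedy longest-match tokenizer. This is a deliberate simplification: real LLM tokenizers use byte-pair encoding with vocabularies of tens of thousands of entries, and the tiny vocabulary below is invented purely for illustration.

```python
# Hypothetical mini-vocabulary; real vocabularies have ~50k-100k entries.
VOCAB = {"under", "standing", "stand", "cat", "s", "ing", " "}

def tokenize(text: str) -> list[str]:
    """Split text greedily into the longest fragments found in VOCAB."""
    tokens = []
    i = 0
    while i < len(text):
        # Take the longest vocabulary entry that matches at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("understanding"))  # ['under', 'standing'] — two tokens
print(tokenize("cats"))           # ['cat', 's']
```

Note how "cats" costs two tokens while the longer "under" costs one: token count tracks the vocabulary, not character length, which is why a visually short prompt can still be expensive.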

The Attention Mechanism

When processing your prompt, the model does not read it left-to-right like a human reading a sentence. Instead, every token "attends" to every other token simultaneously, building a rich web of relationships. The word "bank" takes on different meaning depending on whether "river" or "money" appears nearby — and the model captures this through attention.

This is why position and context within your prompt matter. Instructions placed near the content they apply to are more strongly associated with it than instructions separated from that content by walls of text.

Probability, Not Understanding

The model generates its response one token at a time, selecting the next token based on probability distributions shaped by your prompt. It is not reasoning about your request — it is computing the most likely continuation given its training and your input.

This distinction is practical, not philosophical. It explains why models confidently produce wrong answers, why rephrasing a question changes the answer, and why explicit structure in your prompt leads to more predictable outputs.
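Token-by-token generation can be sketched as sampling from a softmax over scores. The scores below are invented for illustration; a real model produces one score per vocabulary entry, recomputed after every token it emits.

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float = 1.0,
                      seed: int = 0) -> str:
    """Draw the next token in proportion to softmax(logits / temperature)."""
    rng = random.Random(seed)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Lower temperature sharpens the distribution toward the top token;
    # higher temperature flattens it and makes output more varied.
    return rng.choices(list(probs), weights=list(probs.values()))[0]

# Hypothetical scores for continuing "The capital of France is".
logits = {" Paris": 9.0, " Lyon": 4.0, " a": 2.0}
print(sample_next_token(logits, temperature=0.7))
```

Because the output is a draw from a distribution rather than a lookup, a less likely continuation is always possible, which is one concrete reason a model can state a wrong answer with the same fluency as a right one.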

The Useful Simplification

Think of an LLM as a sophisticated pattern-completion engine that has absorbed the patterns of human text. Your prompt sets the pattern. The more clearly your prompt establishes the pattern you want continued, the more useful the output will be.