DECHIVE

How Does an LLM Generate Answers

How LLMs Generate Text — Understanding the principles of token prediction, attention mechanisms, and temperature reveals why prompts work.

Understanding How an LLM Actually Works

Everyone has the same experience when first using AI. Some days you get surprisingly accurate answers; other days you ask the same question and get a completely different response. You wonder whether the model really understands the question, whether you asked it wrong, or whether the answers just happen to differ.

This confusion comes from not knowing how AI works. Once you understand what an LLM actually does, you start to see why the same question produces different answers, and why some prompts work while others fail.


An LLM Does Not Read Sentences Like Humans Do

The way LLMs process text is completely different from how humans read text.

When humans read a sentence, they grasp the overall context and intent simultaneously. LLMs don't work that way. Instead, they repeat one simple task: predicting the most plausible next piece of text, called a token, given all the text entered so far.

When predicting the word that comes after "To write a good prompt," the LLM calculates probabilities based on how this pattern continued in the training data. It selects a word with high probability, appends that word, and predicts the next word again. As this process repeats, a sentence is formed.
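The predict, append, repeat loop described above can be sketched with a toy model. This is only an illustration under loud assumptions: the corpus is made up, the "model" is a simple count of which word follows which, and a real LLM uses a neural network over tokens rather than raw word counts. The names `CORPUS`, `train_bigram`, `next_token_probs`, and `generate` are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up "training data". A real LLM learns from vastly more text.
CORPUS = (
    "to write a good prompt you need clear context . "
    "to write a good prompt you need a clear goal . "
    "to write a good story you need a good idea ."
).split()


def train_bigram(tokens):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Turn the follow-up counts for `prev` into probabilities."""
    followers = counts.get(prev, {})
    total = sum(followers.values())
    return {tok: c / total for tok, c in followers.items()}


def generate(counts, start, max_tokens=8):
    """Repeatedly append the most probable next token (greedy choice)."""
    out = [start]
    for _ in range(max_tokens):
        probs = next_token_probs(counts, out[-1])
        if not probs:
            break  # nothing ever followed this token in the corpus
        out.append(max(probs, key=probs.get))
        if out[-1] == ".":
            break
    return out


counts = train_bigram(CORPUS)
print(next_token_probs(counts, "good"))
print(" ".join(generate(counts, "to")))
```

In this toy, "prompt" follows "good" more often than "story" or "idea" simply because it appears more often in the corpus, which is the same kind of frequency-driven choice the paragraph describes.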

Here's the important point: the LLM doesn't "know" the correct answer and then choose that word. A word gets a high probability because that pattern appeared frequently in the training data. This difference might seem small, but it has important implications when designing prompts.

Text is converted into numbers inside the model. The word "apple" has coordinates in a vector space with thousands of dimensions. In this space, "apple" is positioned close to words like "fruit," "red," and "tree." The LLM predicts the next word based on these mathematical relationships.
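As a rough sketch of what "close in vector space" means, the snippet below compares a few words by cosine similarity. The vectors in `EMBEDDINGS` are hand-made, three-dimensional, and purely hypothetical; real embeddings are learned during training and have far more dimensions.

```python
import math

# Hypothetical, hand-written vectors used only for illustration.
EMBEDDINGS = {
    "apple": [0.9, 0.8, 0.1],
    "fruit": [0.8, 0.9, 0.2],
    "tree": [0.6, 0.5, 0.4],
    "invoice": [0.1, 0.0, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word):
    """Other words, ordered from most to least similar to `word`."""
    others = [w for w in EMBEDDINGS if w != word]
    return sorted(
        others,
        key=lambda w: cosine_similarity(EMBEDDINGS[word], EMBEDDINGS[w]),
        reverse=True,
    )


print(nearest("apple"))  # ['fruit', 'tree', 'invoice']
```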


Every Word Sees Every Other Word

The core of the Transformer architecture is the Attention mechanism.

Previous models processed text sequentially: the first word, then the next word, and so on. In long sentences, information from the beginning faded as the model moved toward the end, and this was a problem.

Attention works differently. Every word in a sentence can look at every other word at once. In the sentence "The cat jumped over the fence," the verb "jumped" calculates, in the same step, how much attention to pay to the subject "cat" and to "fence," the thing being jumped over. The result of this calculation is the attention score: a weight indicating how much focus each word should place on each of the other words.
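A minimal sketch of how such weights can be computed, using scaled dot-product attention for a single query. The two-dimensional vectors in `KEYS` and `QUERY_JUMPED` are hypothetical numbers chosen for illustration; in a real Transformer they come from learned projections, and there are many attention heads and layers. Text-generating LLMs also typically mask later tokens so that a token attends only to itself and earlier tokens, which is omitted here.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores)


TOKENS = ["The", "cat", "jumped", "over", "the", "fence"]
# Hypothetical key vectors, one per token.
KEYS = [
    [0.1, 0.0],  # The
    [2.0, 1.0],  # cat
    [0.5, 0.5],  # jumped
    [0.2, 0.1],  # over
    [0.1, 0.0],  # the
    [1.5, 1.0],  # fence
]
QUERY_JUMPED = [1.0, 1.0]  # hypothetical query vector for "jumped"

weights = attention_weights(QUERY_JUMPED, KEYS)
for token, weight in zip(TOKENS, weights):
    print(f"{token:>6}: {weight:.3f}")
```

With these made-up numbers, "cat" and "fence" receive the largest weights, and all six weights sum to 1.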

In practical terms, the longer your prompt, the less prominent any single piece of important information tends to become, because attention is spread across more text. The phenomenon where AI seems to "forget the rules you mentioned earlier" can be understood in this context. This is also why short, structured instructions are effective.
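One simplified way to picture this dilution: attention weights come out of a softmax, so they always sum to 1, and every additional token takes a share. The sketch below assumes fixed, hypothetical scores (`rule_score` for the one important token, `filler_score` for everything else). Real models compute scores dynamically and can learn to keep important tokens prominent, so treat this as intuition rather than a measurement.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rule_weight(n_filler, rule_score=2.0, filler_score=1.0):
    """Weight given to one 'rule' token among n_filler other tokens.

    The scores are hypothetical constants chosen for illustration.
    """
    weights = softmax([rule_score] + [filler_score] * n_filler)
    return weights[0]


for n in (10, 100, 1000):
    print(f"{n:>5} other tokens -> weight on the rule: {rule_weight(n):.4f}")
```

Even though the rule token always scores higher than any single filler token, its share of the total shrinks as the surrounding text grows.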


Why the Same Question Produces Different Answers

When predicting the next word, an LLM doesn't determine a single answer but creates a probability distribution of possible words and selects from it.

The variable that controls this selection process is Temperature.

When temperature is low, the word with the highest probability is selected almost deterministically. Responses are consistent and predictable. It's suited for tasks like coding or fact-checking.

When temperature is high, words with slightly lower probabilities also have a chance of being selected. Creative ideas emerge, but simultaneously, the likelihood of plausibly fabricating false content increases.
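The effect of temperature can be sketched by dividing the model's raw scores (logits) by the temperature before turning them into probabilities. The candidate words in `TOKENS` and the numbers in `LOGITS` are hypothetical; `apply_temperature` and `sample_token` are illustrative helper names, not part of any particular API.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Draw one token at random according to the adjusted probabilities."""
    probs = apply_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


TOKENS = ["clear", "specific", "short", "purple"]
LOGITS = [3.0, 2.0, 1.0, -1.0]  # hypothetical raw scores

for t in (0.2, 1.0, 2.0):
    probs = apply_temperature(LOGITS, t)
    summary = ", ".join(f"{tok}={p:.2f}" for tok, p in zip(TOKENS, probs))
    print(f"T={t}: {summary}")

rng = random.Random()
print([sample_token(TOKENS, LOGITS, 1.0, rng) for _ in range(5)])
```

At a low temperature almost all of the probability lands on the top candidate; at a high temperature the distribution flattens and unlikely words such as "purple" get a real chance. The last line draws five samples from the same distribution, which is why repeated runs can differ.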

Most APIs let you set this value directly. When using general interfaces like ChatGPT or Claude, the temperature is managed internally, but this sampling process is why you get slightly different answers when sending the same prompt multiple times.


Why Prompts Work

Once you understand these principles, you see why prompts work.

A prompt is not a means of injecting new knowledge into the AI. It's a means of adjusting the probability distribution of the next token in the direction you want, drawn from the vast data already learned. Assigning a role, providing context, and specifying output format are all methods of sculpting this probability distribution.

For example, if you say "write a blog post," the model chooses the next sentence from too broad a range of possibilities. But if you say "explain the event concept in GA4 for beginner developers, in short paragraphs," the range of possibilities narrows. The model hasn't acquired completely new knowledge; rather, the direction it should reference when generating answers has become clearer.
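The narrowing described above can be expressed as a drop in entropy, a measure of how spread out a probability distribution is. Both distributions below are invented for illustration and are not measured from any model: `VAGUE_PROMPT` stands for eight equally likely directions after "write a blog post," and `SPECIFIC_PROMPT` stands for a distribution concentrated on one direction after the detailed GA4 request.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; lower means a narrower distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical next-direction probabilities, for illustration only.
VAGUE_PROMPT = [0.125] * 8
SPECIFIC_PROMPT = [0.85, 0.05, 0.05, 0.05]

print(f"vague prompt:    {entropy_bits(VAGUE_PROMPT):.2f} bits")
print(f"specific prompt: {entropy_bits(SPECIFIC_PROMPT):.2f} bits")
```

The model's knowledge is the same in both cases; only the spread of plausible continuations changes.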

The fact that an LLM is not a sentient being that understands sentences but a mathematical structure that calculates probabilities might sound disappointing at first. But once this perspective clicks, paradoxically, more becomes possible. From that point on, an LLM becomes less like vague magic and more like a tool you can understand and use in a principled way.