Why Does AI Lie?
AI makes up things it doesn't know, and it does so with full confidence. This post looks at why hallucinations occur and how far prompts can go in preventing them.
I had an AI take a practice exam for an industrial hygiene consultant test.
The answer was wrong. That would have been fine. The problem was that the explanation for the wrong answer was too perfect. It cited relevant laws, analyzed each option one by one, and logically explained why this answer was correct. Confidently, without hesitation.
I grew suspicious and told it the correct answer directly. The AI didn't back down. It explained that the option it had chosen was correct and that the answer I had provided was actually wrong. In the end, I uploaded both the test paper and the answer key as files. It still got it wrong.
This wasn't a bug.
It's Not a Lie, It's Conviction
It's acceptable that an AI produces a wrong answer. All systems get things wrong sometimes.
What's difficult to accept is something else. The AI doesn't know that it was wrong.
A lie is hiding the truth while knowing it. Hallucination is different. A model may not have sufficient signals to know its answer is wrong. It lacks the very sense that something is incorrect. That's why incorrect information doesn't look incorrect. It has logic, evidence, and a stable tone.
This is what distinguishes hallucination from other types of errors.
Why Speak When You Don't Know?
An LLM isn't structured to understand text the way humans do. It's closer to a machine that predicts the most plausible next token from a given context. The criterion for "plausibility" isn't whether something is factual, but what came after similar patterns in the training data.
When asked "What is the workers' compensation standard for noise-induced hearing loss?", the model generates the most natural text that would follow that question. If outdated standards appear more often in the training data than the current one, it confidently outputs the old standard instead of the latest one. It has no internal way to know that it's wrong.
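The frequency effect described above can be illustrated with a deliberately tiny sketch. This is not how an LLM is implemented; it is a toy counter that picks whichever continuation it saw most often. The list `training_continuations` and its labels are hypothetical placeholders, not real standards.

```python
from collections import Counter

# Hypothetical toy "training data": continuations seen after the same question.
# The labels are placeholders, not real legal standards.
training_continuations = [
    "old standard", "old standard", "old standard", "old standard",
    "new standard",
]


def most_plausible_continuation(continuations):
    """Return the most frequent continuation and its share of the data."""
    counts = Counter(continuations)
    text, count = counts.most_common(1)[0]
    return text, count / len(continuations)


answer, share = most_plausible_continuation(training_continuations)
print(answer, share)
```

Nothing in this selection step checks which standard is currently in force. The more frequent pattern wins, and the share it reports measures frequency, not correctness.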
Its insistence even after being given the correct answer follows the same logic. The model has two kinds of knowledge: what was etched into its billions of parameters during training, and what the user provides in the current conversation. When the two conflict, context doesn't always win. The more strongly the model learned something, the more that learned memory tends to outweigh what the user says.
The failure after uploading the two files has a different cause. As the context grows longer, information placed in the middle tends to get hazy during processing. Accurately cross-referencing the two files ("this answer in the answer key corresponds to this problem in the test paper") is more complex than it appears.
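One way to take the cross-referencing burden off the model is to do the pairing yourself before building the prompt. The sketch below is a minimal illustration under assumptions: the function name `pair_questions_with_key` and the dictionary layout are hypothetical, and whether this helps in a given case has to be checked empirically.

```python
def pair_questions_with_key(questions, answer_key):
    """Place each official answer directly under its question.

    questions:  dict mapping question number -> question text
    answer_key: dict mapping question number -> official answer
    """
    blocks = []
    for number in sorted(questions):
        official = answer_key.get(number, "(missing from answer key)")
        blocks.append(
            f"Question {number}: {questions[number]}\n"
            f"Official answer: {official}"
        )
    return "\n\n".join(blocks)


# Placeholder content, not taken from a real exam.
questions = {1: "Which option is correct? (A) ... (B) ...", 2: "Which statement applies?"}
answer_key = {1: "B"}

paired = pair_questions_with_key(questions, answer_key)
print(paired)
```

With the answer sitting next to its question, the model no longer has to locate a matching entry somewhere in the middle of a second long file.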
What Can Be Reduced Through Prompting
You can't prevent it completely. But there are meaningful ways to reduce it.
The simplest is to force explicit acknowledgment of uncertainty.
Rules:
- Mark uncertain content as "I'm not certain"
- Answer "I don't know" when information is unavailable
- Disclose "This is speculation" if speculation is included
These rules work for general knowledge questions. However, they have limits against incorrect information the model firmly believes, because the model doesn't perceive that information as uncertain. This is why the instruction "say you don't know if you don't know" was useless in the exam example above.
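As a minimal sketch, the rules above can be prepended to every question programmatically. The helper name `build_prompt` and the example question are assumptions for illustration; how the resulting string is sent to a model depends on the client you use and is left out here.

```python
# The rules are copied from the text above; the wrapper is a hypothetical helper.
UNCERTAINTY_RULES = """Rules:
- Mark uncertain content as "I'm not certain"
- Answer "I don't know" when information is unavailable
- Disclose "This is speculation" if speculation is included"""


def build_prompt(question: str) -> str:
    """Prepend the uncertainty rules to a user question."""
    return f"{UNCERTAINTY_RULES}\n\nQuestion: {question.strip()}"


prompt = build_prompt("  What is the exposure limit for substance X?  ")
print(prompt)
```

Keeping the rules in one constant means every request carries the same wording, which makes it easier to compare answers with and without them.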
Another method is to directly insert documents into the context and have the model answer only within that scope.
Answer based solely on the provided document below.
Answer "This information is not in the document" for content not in the document.
RAG (retrieval-augmented generation) systems build this principle into their structure. Rather than leaving the model to fabricate facts, they inject the necessary information directly. This addresses one root cause of hallucination: the absence of information.
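The idea can be sketched in a few lines: pick the most relevant passage, then wrap it in the two instructions above. This is a toy under stated assumptions. Real systems typically retrieve with embeddings or a search index, while the word-overlap scoring here is only a stand-in, and the names `retrieve`, `build_grounded_prompt`, and the sample passages are hypothetical.

```python
def retrieve(question, passages, top_k=1):
    """Rank passages by how many words they share with the question (toy scoring)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_grounded_prompt(question, passages):
    """Inject the retrieved text and restrict the answer to it."""
    context = "\n".join(retrieve(question, passages))
    return (
        "Answer based solely on the provided document below.\n"
        'Answer "This information is not in the document" '
        "for content not in the document.\n\n"
        f"Document:\n{context}\n\n"
        f"Question: {question}"
    )


# Placeholder passages, not real regulations.
passages = [
    "Ladders must be inspected before use.",
    "Noise exposure limits are reviewed annually.",
]
grounded = build_grounded_prompt("How often are noise exposure limits reviewed?", passages)
print(grounded)
```

The model still generates the answer, but the facts it needs are now in the context rather than left to whatever it memorized during training.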
There's a trap in requesting sources. When you say "tell me the source too," the model generates sources. It doesn't verify whether those sources actually exist. It can plausibly fabricate paper titles, author names, and DOIs. If the source is important information, it's better to find the original yourself rather than ask an AI.
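If you do ask for sources, it helps to treat every identifier in the answer as unverified until you have looked it up yourself. The sketch below only pulls DOI-shaped strings out of a response so they can be checked by hand. The pattern is a rough approximation, the function name `extract_unverified_dois` is hypothetical, and the DOI in the example is a made-up placeholder. A string matching the pattern says nothing about whether the paper exists.

```python
import re

# Rough pattern for DOI-shaped strings: "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def extract_unverified_dois(text):
    """Return DOI-shaped strings found in text, with trailing punctuation removed.

    Matching the pattern does NOT mean the DOI resolves to a real paper.
    """
    return [match.rstrip(".,;)") for match in DOI_PATTERN.findall(text)]


# Made-up model output with a placeholder DOI.
model_output = "See Smith (2020), doi:10.1234/abcd.5678."
to_check = extract_unverified_dois(model_output)
print(to_check)
```

The output is a to-do list for manual lookup, not a verification result.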
What Prompts Cannot Block
To be honest, there are areas prompts cannot reach.
Incorrect information that repeatedly appears in training data is difficult to correct. This is especially true when outdated laws, old standards, and past knowledge are learned far more frequently than new information. Even when told about the latest information, the older memory operates more strongly.
The same applies to information after the training cutoff. When asked about something it doesn't know, the model tends to fabricate something plausible instead of saying it doesn't know.
Specialized domains are also vulnerable. The latest laws in a specific field, treatment standards for rare diseases, and figures from specialized industries are thinly covered in the training data to begin with. The less data there is, the more the model fills in the gaps itself.
Trusting AI
After experiencing hallucination, people tend to react in one of two ways: deciding "I can't trust AI," or learning when and how to trust it.
The second is more effective.
Code can be verified by running it. Text structure can be reviewed logically. Translations can be compared with the original. AI is sufficiently useful for tasks where results can be verified. Using raw factual information that cannot be verified is a different story.
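For the first case above, code that can be verified by running it, a minimal sketch of the check might look like this. The harness `check_against_cases` and the deliberately buggy `suggested_median` are hypothetical examples, assuming you already know the expected results for a few inputs.

```python
def check_against_cases(func, cases):
    """Run func on each (args, expected) pair and collect mismatches."""
    failures = []
    for args, expected in cases:
        try:
            actual = func(*args)
        except Exception as exc:  # an exception also counts as a failure
            failures.append((args, expected, repr(exc)))
            continue
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


def suggested_median(values):
    """Hypothetical AI-suggested code: looks plausible but forgets to sort."""
    return values[len(values) // 2]


cases = [
    (([1, 2, 3],), 2),  # passes only because the input is already sorted
    (([3, 1, 2],), 2),  # exposes the missing sort
]
failures = check_against_cases(suggested_median, cases)
print(failures)
```

The explanation that comes with generated code may sound just as confident as the exam answer did. Running the code against known cases is what shows whether it is right.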
Hallucination is not a flaw in the model. It's a phenomenon that occurs because a system that predicts the next token tends to generate something plausible rather than stay silent when it doesn't know. Understanding this structure makes it somewhat clearer where and how to use AI.