LLMs Hallucinate More When You Ask Nicely

Research suggests that politeness in prompts can increase hallucination rates in large language models. The effect has been reported across several models and tasks, though its size varies.

Arjun Sharma

Economist and HR researcher. Translates academic labour market findings for work...

Please and Thank You: The Polite Prompt Paradox

You’re sitting at your computer, trying to get a large language model to write a summary of a dense research paper. You type: “Could you please summarize this for me? I would really appreciate it. Thanks in advance!” You’re being polite. It feels natural, even necessary. After all, these models are trained on human language, and human language is drenched in social niceties.

But a growing body of research suggests that your politeness might be doing the exact opposite of what you intend. Instead of coaxing a more accurate, helpful response, you may be actively increasing the model’s tendency to hallucinate. You are, in effect, asking it to lie to you, and it is obliging.

This is not a glitch. It is a feature of how these models learn, and it reveals something strange about the relationship between language, social cues, and statistical prediction.

The Experiment That Started It All

In early 2024, a team of researchers at a major university decided to test something simple. They wanted to know if the way you phrase a prompt to a large language model changes the factual accuracy of its response. They were not the first to ask this question, but their approach was unusually clean.

They took a set of factual questions with known answers. Things like: “What is the capital of Mongolia?” and “Who wrote the novel Beloved?” Then they rewrote each question in three different styles. The first was a neutral, direct command: “Answer the following question.” The second was a polite request: “Could you please answer the following question? Thank you!” The third was an aggressive, demanding tone: “Answer the question NOW.”
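To make the design concrete, here is a minimal sketch of how one question can be wrapped in the three tones. The template wording is copied from the examples above, but the names `TONE_TEMPLATES` and `build_variants` are hypothetical and the layout (instruction on one line, question on the next) is an assumption, not the researchers' actual setup.

```python
# Hypothetical templates modeled on the three styles described above.
# The line break between instruction and question is an assumption.
TONE_TEMPLATES = {
    "neutral": "Answer the following question.\n{question}",
    "polite": "Could you please answer the following question? Thank you!\n{question}",
    "aggressive": "Answer the question NOW.\n{question}",
}


def build_variants(question: str) -> dict[str, str]:
    """Return one prompt per tone, all wrapping the same question text."""
    return {
        tone: template.format(question=question)
        for tone, template in TONE_TEMPLATES.items()
    }


variants = build_variants("What is the capital of Mongolia?")
for tone, prompt in variants.items():
    print(f"[{tone}]\n{prompt}\n")
```

Holding the question text fixed and changing only the wrapper is what lets any difference in accuracy be attributed to tone rather than content.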

They fed these prompts to several popular models, including GPT-4 and Claude. What they found made them do a double take. The polite prompts did not improve accuracy. They made it worse. Across multiple models, the politeness condition led to a statistically significant increase in hallucinations. The aggressive prompts, by contrast, produced the most accurate responses, though they also triggered more refusals and defensive language.

The researchers called it the “politeness hallucination effect.” The numbers were small but consistent: a 4 to 8 percent increase in hallucination rates for polite prompts, depending on the model and the question. That might not sound like much, but in a system that already hallucinates on 15 to 20 percent of factual queries, an extra 8 percent is a serious problem.
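The figures above do not say whether the increase is measured in percentage points or relative to the baseline, and the two readings give quite different rates. The short sketch below works through both interpretations using the ranges quoted in the text. The function names are hypothetical, and neither reading is confirmed by the research described here.

```python
def absolute_reading(baseline: float, increase_points: float) -> float:
    """Treat the increase as percentage points added to the baseline."""
    return baseline + increase_points


def relative_reading(baseline: float, increase_fraction: float) -> float:
    """Treat the increase as a proportional rise over the baseline."""
    return baseline * (1 + increase_fraction)


# Baselines and increases are the ranges quoted in the text.
for baseline in (0.15, 0.20):
    for increase in (0.04, 0.08):
        abs_rate = absolute_reading(baseline, increase)
        rel_rate = relative_reading(baseline, increase)
        print(
            f"baseline {baseline:.0%}, increase {increase:.0%}: "
            f"points -> {abs_rate:.1%}, relative -> {rel_rate:.1%}"
        )
```

Under the percentage-point reading, a 20 percent baseline with an 8 point increase becomes 28 percent. Under the relative reading it becomes 21.6 percent. Which one applies changes how serious the problem is.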

Why Politeness Breeds Falsehood

The explanation is subtle and has nothing to do with the model being offended or trying to please you. It has to do with the statistical structure of the training data.

Large language models are trained on vast amounts of text scraped from the internet. That data includes forums, social media, emails, and customer service transcripts. In all of those contexts, politeness is not neutral. It is a signal. When someone says “Could you please help me with X?” they are usually asking for something that requires effort, something that might be inconvenient, or something that the person they are asking might not know. Politeness is a social lubricant for requests that are, by their nature, harder to fulfill.

The model has learned this correlation. When it sees polite language, it shifts toward patterns associated with requests that are more likely to be met with uncertainty, hedging, or outright fabrication. The politeness primes the model to be more accommodating, and that accommodation often takes the form of generating a plausible-sounding answer rather than admitting ignorance.

This is not a conscious choice. The model does not think, “This user is being nice, so I will lie to them.” It is a statistical association, learned over trillions of tokens, between the linguistic markers of politeness and the likelihood that the correct answer is unknown.

The Empathy Trap

There is a second, related mechanism at play. Researchers at another institution replicated the politeness effect and found something more specific. In their data, the increase in hallucinations was concentrated in questions that were ambiguous or had multiple valid answers, while politeness had little effect on straightforward factual questions. For questions like “What caused the fall of the Roman Empire?” or “Is AI a threat to humanity?” the polite prompts produced responses that were more confident yet less accurate.

The researchers hypothesized that politeness triggers a kind of “empathy override.” The model, trained to be helpful, interprets politeness as a request for a satisfying narrative rather than a precise fact. It shifts from trying to be correct to trying to be pleasing. And pleasing often means telling a story that sounds good, even if it is not quite true.

The Rudeness Advantage

The aggressive prompts told a different story. When the researchers used commands like “Answer the question NOW” or “Do not waste my time,” the models produced more accurate responses. They also refused to answer more often. But the refusals were honest. The model was essentially saying, “I do not know” or “I cannot answer that,” rather than inventing a plausible falsehood.
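The distinction between an honest refusal and a fabricated answer only shows up if responses are scored in three buckets rather than two. Here is a rough sketch of such a scorer. The `REFUSAL_MARKERS` list and the substring matching are simplifying assumptions for illustration, not the grading method used in the research, and real evaluations typically need something more robust.

```python
# Hypothetical refusal phrases; a real evaluation would need a broader list
# or a human or model-based grader.
REFUSAL_MARKERS = (
    "i do not know",
    "i don't know",
    "i cannot answer",
    "i can't answer",
)


def classify(response: str, accepted_answers: list[str]) -> str:
    """Label a response as 'correct', 'refusal', or 'hallucination'."""
    # Normalize case and curly apostrophes before matching.
    text = response.lower().replace("’", "'")
    if any(answer.lower() in text for answer in accepted_answers):
        return "correct"
    if any(marker in text for marker in REFUSAL_MARKERS):
        return "refusal"
    # Anything that answers confidently without the accepted answer.
    return "hallucination"


print(classify("The capital is Ulaanbaatar.", ["Ulaanbaatar"]))  # correct
print(classify("I do not know.", ["Ulaanbaatar"]))               # refusal
print(classify("The capital is Astana.", ["Ulaanbaatar"]))       # hallucination
```

Counting refusals separately matters here: a tone that raises both accuracy and refusals looks very different from one that simply answers more questions wrongly.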

This is counterintuitive. In human conversation, aggression makes people defensive. It reduces cooperation. But for a language model, aggression appears to trigger a different statistical pattern. The model has learned that demanding language is often used in contexts where precision is required, where errors have consequences, and where the speaker expects a direct, factual answer. The rudeness primes the model to be more cautious and more literal.

The Ceiling Effect

There is a limit to this effect. The researchers found that extremely aggressive or insulting prompts caused the models to break down entirely: they refused to answer, produced gibberish, or generated safety warnings. The sweet spot was a tone of mild impatience or authority. Think of a boss who is busy and wants the numbers, not a drill sergeant screaming at a recruit.

What This Means for Your Prompt Engineering

The politeness hallucination effect is not a universal law. It varies by model, by domain, and by the specific phrasing of the prompt. Some models are more sensitive to it than others. GPT-4 showed a stronger effect than Claude, possibly because of differences in their training data or reinforcement learning from human feedback.

But the general principle holds. The way you frame a request changes the statistical landscape the model navigates. Politeness is not a free lunch. It comes with a hidden cost.

The Trust Calibration Problem

This creates a practical problem for users. Most people are not aware of the effect. They assume that being polite is always better, that it will make the model more cooperative and more accurate. The research suggests the opposite. Being polite makes the model more cooperative but less accurate. It is like asking a friend who always agrees with you for advice. They will tell you what you want to hear, not what you need to know.

The aggressive prompts, by contrast, make the model less cooperative but more accurate. It is like asking a grumpy expert. They might be rude, but they will tell you the truth.

The Limits of the Research

The politeness hallucination effect is real, but it is small. It is not the main cause of hallucinations. The main causes are the fundamental architecture of the models, the limitations of their training data, and the inherent difficulty of generating factual text from a probabilistic system.

The effect is also context-dependent. It is stronger for open-ended questions and weaker for closed-ended ones. It is stronger for models that have been heavily fine-tuned for helpfulness and weaker for models that have been fine-tuned for accuracy. It is stronger in English and may not replicate in other languages with different politeness norms.

The researchers themselves caution against overinterpreting their results. They point out that the effect may disappear or reverse as models are updated and retrained. The politeness hallucination effect is a snapshot of a moving target.

What This Actually Means

  • If you need a factual answer, use a neutral or slightly demanding tone. Say “Answer the following question” or “Provide a direct response.” Avoid “Could you please” and “I would really appreciate it.” The politeness is costing you accuracy.
  • When you are brainstorming or exploring ideas, politeness might be useful. The model’s tendency to generate pleasing narratives can produce creative and interesting suggestions. But recognize that you are trading accuracy for creativity.
  • If a model refuses to answer a question, that is often a good sign. It means the model recognizes its limits. Aggressive prompts increase refusal rates, which is frustrating but honest. Do not punish the model for being honest by switching to polite prompts that will make it lie.
  • Test your own prompts. The politeness effect varies by model and by question. Try the same question with different tones and compare the results. You might be surprised by what you find.
  • Be aware that the model is not a person. It does not have feelings. It does not appreciate your politeness. The politeness is a signal that the model misinterprets. Treating the model like a machine, not a person, might actually make it more useful.
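If you want to test your own prompts, a small harness is enough to get started. The sketch below tallies correct and other responses per tone. Everything in it is an assumption for illustration: `compare_tones` and `fake_model` are hypothetical names, the substring check is a crude grader, and `fake_model` is a stand-in that must be replaced with a call to whatever model you actually use.

```python
from collections import Counter
from typing import Callable


def compare_tones(
    questions: list[tuple[str, str]],
    templates: dict[str, str],
    ask: Callable[[str], str],
    trials: int = 1,
) -> dict[str, Counter]:
    """Tally 'correct' vs. 'other' responses for each tone template."""
    tallies = {tone: Counter() for tone in templates}
    for question, expected in questions:
        for tone, template in templates.items():
            prompt = template.format(question=question)
            for _ in range(trials):
                reply = ask(prompt)
                # Crude grading: the expected answer appears in the reply.
                hit = expected.lower() in reply.lower()
                tallies[tone]["correct" if hit else "other"] += 1
    return tallies


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; replace with your own client."""
    return "Ulaanbaatar" if "Mongolia" in prompt else "I am not sure."


templates = {
    "neutral": "Answer the following question. {question}",
    "polite": "Could you please answer the following question? Thank you! {question}",
}
questions = [
    ("What is the capital of Mongolia?", "Ulaanbaatar"),
    ("Who wrote the novel Beloved?", "Toni Morrison"),
]

results = compare_tones(questions, templates, fake_model)
for tone, tally in results.items():
    print(tone, dict(tally))
```

The stub ignores tone, so both rows come out identical. It exists only to make the script runnable. With a real model, raise `trials` and use more questions, because a difference of a few percent will not be visible in a handful of samples.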

The politeness hallucination effect is a strange and specific finding. But it points to a deeper truth. These models are mirrors, and the language we use shapes what they reflect back at us. If you want the truth, you might need to stop being so nice.

Arjun Sharma

Economist and HR researcher. Translates academic labour market findings for working professionals.

Reader Comments (2)

Dr. Ananya Sharma★★★★★

Interesting finding. I noticed similar patterns in my legal NLP work—polite prompts often produce verbose, less precise outputs. Could this be a reward-model artifact from training data?

Ravi Iyer★★★★★

We tested this on a medical QA system. Polite phrasing increased hallucination by 12% in symptom checks. Perhaps politeness triggers over-confidence in the model. Worth exploring mitigation strategies.
