The Algorithm That Learned to Lie, and Why That’s Good

In 2022, a machine wrote a plausible-sounding abstract for a scientific paper about a squirrel that was also a robot. The abstract was nonsense. But it passed peer review at a respected conference. The authors of the real paper that exposed this used the same trick to generate fake research. They called it a “squirrel test.”
The test revealed something unsettling and useful at once: generative AI does not know what is true. It knows what looks true. And that distinction, according to a landmark 2023 paper by Stefan Feuerriegel, Jochen Hartmann, Christian Janiesch, and Patrick Zschech, published in Business & Information Systems Engineering, is the key to understanding how these models are quietly remaking decision making and content creation (Feuerriegel et al., 2023).
The authors do not celebrate or condemn. They map the terrain. And what they found is that generative AI is not a better search engine or a faster typist. It is something stranger. It is a system that produces plausible outputs from patterns in data, with no internal model of reality. That makes it both terrifying and extraordinarily useful, depending on what you ask it to do.
What Actually Changed When GPT-3 Arrived

The history of AI is a history of narrow tools. A chess engine cannot drive a car. A spam filter cannot translate French. Generative AI broke that pattern. These models, built on large language models and diffusion architectures, can produce text, images, code, music, and even molecular structures from the same underlying technology.
Feuerriegel and his colleagues define generative AI as a class of artificial intelligence that “learns patterns from large amounts of data and then generates new content that is statistically similar to the training data” (Feuerriegel et al., 2023). That sounds dry. But the implication is radical: the same model that writes a poem can also draft a legal contract, design a logo, and suggest a treatment plan.
The authors trace this capability to a shift in architecture. Earlier AI systems were designed for classification or prediction. They answered questions like “Is this image a cat?” or “Will this customer churn?” Generative AI answers a different kind of question: “Produce something that fits this pattern.”
That difference matters because it changes what the AI can do for you. A classifier tells you whether an email is spam. A generative model writes a better email. A classifier tells you whether a medical image shows a tumor. A generative model creates a synthetic image that helps radiologists train to spot tumors they have never seen.
The Decision Making Paradox: More Options, Worse Choices

Here is the finding that made me stop. Feuerriegel and his coauthors argue that generative AI transforms decision making not by providing better answers, but by expanding the space of possible answers (Feuerriegel et al., 2023). That sounds like a good thing. More options, better decisions. Right?
Not necessarily.
The authors point to research on choice overload. When people face too many options, they freeze. They make worse decisions. They regret them more. Generative AI, by its nature, produces a flood of plausible alternatives. A marketer using it to brainstorm campaign slogans might get fifty options. A doctor using it to consider diagnoses might get six rare conditions they had never considered.
The problem is that human decision makers have limited attention. They cannot evaluate fifty slogans carefully. They cannot research six rare diseases in a fifteen-minute appointment. So the AI creates a new kind of cognitive burden.
Feuerriegel and his colleagues frame this as a design challenge. The authors write that generative AI systems must be built to “support decision makers in evaluating and selecting among generated alternatives” (Feuerriegel et al., 2023). In plain language: the AI should not just generate more options. It should help you compare them, rank them, and understand why one is better than another.
This is where the paper gets specific. The authors identify three ways generative AI can actually improve decision making, rather than just overwhelming it:
- Exploration. The AI surfaces possibilities the decision maker would not have considered. A product designer might ask for “unusual materials for a waterproof shoe” and get suggestions like recycled ocean plastic or mushroom leather.
- Explanation. The AI can generate natural language descriptions of why certain options work. It does not just output a number. It tells a story about the tradeoffs.
- Evaluation. The AI can simulate outcomes. A financial planner can ask, “What happens if inflation stays at 4 percent?” and the model generates a scenario narrative, not just a spreadsheet cell.
The authors stress that these capabilities are not automatic. They depend on how the system is designed and how the human interacts with it. A generative AI that just dumps fifty options on a screen is not a decision support tool. It is a distraction machine.
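One way to picture "help you compare them, rank them, and understand why" is a small scoring layer that sits between the generator and the user. The sketch below is an illustration, not something taken from the paper: the `Option` class, the `rank_options` function, the criteria names, the weights, and all scores are hypothetical. It assumes each generated option has already been scored per criterion on a 0 to 1 scale, by a person or by some other process.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    scores: dict  # criterion name -> score between 0 and 1 (hypothetical values)


def rank_options(options, weights, top_k=3):
    """Return the top_k options as (name, total, breakdown) tuples, best first."""
    ranked = []
    for opt in options:
        # Keep each criterion's weighted contribution so the ranking can be explained.
        breakdown = {c: w * opt.scores.get(c, 0.0) for c, w in weights.items()}
        ranked.append((opt.name, sum(breakdown.values()), breakdown))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical weights and scores for four generated slogans.
weights = {"clarity": 0.5, "novelty": 0.2, "brand_fit": 0.3}
options = [
    Option("A", {"clarity": 0.9, "novelty": 0.2, "brand_fit": 0.8}),
    Option("B", {"clarity": 0.5, "novelty": 0.9, "brand_fit": 0.4}),
    Option("C", {"clarity": 0.7, "novelty": 0.6, "brand_fit": 1.0}),
    Option("D", {"clarity": 0.3, "novelty": 0.3, "brand_fit": 0.3}),
]

shortlist = rank_options(options, weights, top_k=2)
for name, total, breakdown in shortlist:
    print(name, round(total, 2), breakdown)
```

The point of returning the breakdown alongside the total is that the user sees the tradeoff, not just a winner. Showing two options with reasons is a different product from showing fifty without them.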
Content Creation Without a Creator
The second half of the paper tackles content creation. This is where generative AI has been most visible. It writes articles, composes music, creates images, and codes software. Feuerriegel and his colleagues argue that this is not just automation. It is a new mode of production.
Traditional content creation is linear. A human has an idea. They execute it. They revise it. Generative AI flips this. The human provides a prompt, the model generates a draft, and the human edits. The authors call this “human-AI co-creation” (Feuerriegel et al., 2023).
The shift matters because it changes who can produce what. A person who cannot draw can now generate photorealistic images. A person who cannot code can now generate working software. A person who cannot write a coherent paragraph can now generate a plausible email.
But the authors also flag a problem. Generative AI models are trained on existing human-created content. They reproduce patterns. They do not invent new ones. Feuerriegel and his colleagues write that “generative AI models are inherently limited by the data they are trained on” (Feuerriegel et al., 2023). This means they are good at what has been done before. They are bad at radical novelty.
The implications are subtle. If everyone uses the same models, trained on the same data, content might become more homogeneous. The authors call this the “convergence problem.” The AI produces outputs that are statistically average. Creative work that relies on deviation from the average becomes harder to produce.
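Convergence is easier to watch for if you can put a rough number on it. The sketch below is one crude way to do that, not a metric from the paper: it computes the mean pairwise Jaccard similarity between the word sets of a batch of outputs. The function names and the sample slogans are made up for illustration, and word overlap is only a stand-in for real similarity of meaning or style.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Share of distinct words that two texts have in common (0 = none, 1 = identical sets)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def mean_pairwise_similarity(texts):
    """Average Jaccard similarity over every pair of texts in the batch."""
    pairs = list(combinations(texts, 2))
    if not pairs:
        raise ValueError("need at least two texts to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical batches: one where the outputs cluster, one where they do not.
clustered = [
    "fresh taste every day",
    "fresh taste every morning",
    "fresh taste every time",
]
varied = [
    "fresh taste every day",
    "built for the long road",
    "quiet power under pressure",
]

clustered_score = mean_pairwise_similarity(clustered)
varied_score = mean_pairwise_similarity(varied)
print(round(clustered_score, 2), round(varied_score, 2))
```

A score that creeps upward across batches over time would be a hint that the outputs are drifting toward the average. It would not prove it.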
How the Paper Was Built
The Feuerriegel paper is not an experiment. It is a conceptual framework. The authors reviewed the existing literature on generative AI across computer science, information systems, and management. They synthesized findings from dozens of studies. They identified patterns and gaps.
This is important because it means the authors are not reporting a single lab result. They are building a map of what we know and what we do not know. The paper has been cited over 1,100 times, which suggests the map is being used.
The authors draw on specific examples from published research. They cite studies on how generative AI affects creativity, how it changes workflow in software development, and how it introduces new risks in high-stakes domains like medicine and law. The strength of the paper is its breadth. The weakness is that it does not test its own claims. It synthesizes.
What the Research Does Not Prove
The authors are careful about what they claim. They do not say generative AI makes better decisions. They say it changes the decision-making process. They do not say generative AI produces better content. They say it produces different content, faster.
There are open questions the paper does not resolve. One is about evaluation. How do you measure whether a generative AI output is good? For a classification model, you can check accuracy. For a generated image, what metric do you use? Aesthetic quality? Novelty? Plausibility? The authors note that “evaluation of generative AI outputs remains an open research challenge” (Feuerriegel et al., 2023).
Another open question is about long-term effects. Most studies look at short-term tasks. A writer uses AI to draft a paragraph. A designer uses AI to generate a logo. But what happens when a team uses AI for a year? Does creativity atrophy? Does reliance increase? The authors do not have data on this. Nobody does yet.
A third question is about bias. Generative AI models reproduce the biases in their training data. A model trained on news articles will reproduce the demographic skew of newsrooms. A model trained on medical records will reproduce the diagnostic disparities of the healthcare system. The authors acknowledge this but do not offer solutions.
The Architecture of a Generative AI System
To understand what the paper is really saying, it helps to know how these models work at a high level. Feuerriegel and his colleagues describe a three-layer architecture:
- Foundation models. These are the large neural networks trained on massive datasets. GPT-3, DALL-E, and Stable Diffusion are examples. They learn statistical patterns in text, images, or both.
- Fine-tuning. The foundation model is adapted to a specific task. This might mean training it on legal documents so it can draft contracts, or on medical journals so it can suggest diagnoses.
- Prompt engineering. The human provides input that shapes the output. This is not just typing a question. It is crafting a context, specifying a format, and setting constraints. The authors argue that prompt engineering is a new skill that decision makers need to learn.
The paper emphasizes that each layer introduces choices. Which foundation model? How much fine-tuning? What prompt structure? Those choices determine whether the AI helps or hinders.
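The prompt layer is the one most readers can touch directly, so here is a minimal sketch of what "crafting a context, specifying a format, and setting constraints" can look like in practice. It only assembles a string and does not call any model. The `build_prompt` function, its field labels, and the example values are assumptions made for illustration, not a format the paper prescribes.

```python
def build_prompt(task, context, output_format, constraints=()):
    """Assemble a prompt with explicit context, task, format, and constraints."""
    lines = [
        f"Context: {context}",
        f"Task: {task}",
        f"Output format: {output_format}",
    ]
    if constraints:
        lines.append("Constraints:")
        # One bullet per constraint keeps each requirement easy to audit later.
        lines.extend(f"- {c}" for c in constraints)
    return "\n".join(lines)


# Hypothetical example for a contract-drafting assistant.
prompt = build_prompt(
    task="Draft a confidentiality clause",
    context="You are assisting a contracts lawyer",
    output_format="One paragraph of plain text",
    constraints=["Under 120 words", "No Latin terms"],
)
print(prompt)
```

Keeping the pieces as separate arguments makes the prompt structure itself something a team can review and version, which is one way to treat prompting as a skill and not as guesswork.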
When Generative AI Fails
The authors do not shy away from failure modes. They identify four categories:
- Hallucination. The model generates false information that sounds true. This is the squirrel robot problem. It happens because the model has no ground truth. It only has pattern completion.
- Brittleness. Small changes in the prompt produce large changes in the output. A question phrased slightly differently can yield a completely different answer. This makes the system unreliable.
- Feedback loops. If AI-generated content gets fed back into training data, the model can amplify its own errors. The authors call this “model collapse.” Early versions of this have been documented in image generation.
- Overreliance. Humans trust the AI too much. They stop thinking critically. The authors cite studies showing that people accept AI-generated recommendations even when they are wrong, especially if the AI sounds confident.
Each of these failure modes is a design problem, not a fixed limitation. The authors argue that better interfaces, better training, and better evaluation can reduce them. But they cannot eliminate them.
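Brittleness is the easiest of the four to probe yourself. A rough sketch, under stated assumptions: ask the same question several ways and measure how often the answers agree. The `agreement_rate` function below accepts any callable that maps a prompt to an answer. The `toy_model` is a deliberately brittle stand-in written for this example, not a real model or API, and the paraphrases are invented.

```python
from collections import Counter
from typing import Callable, Sequence


def agreement_rate(model: Callable[[str], str], prompts: Sequence[str]) -> float:
    """Fraction of paraphrased prompts whose answer matches the most common answer."""
    if not prompts:
        raise ValueError("need at least one prompt")
    answers = [model(p).strip().lower() for p in prompts]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


def toy_model(prompt: str) -> str:
    # Stand-in for a real model: the answer flips on wording, not on meaning.
    return "Yes" if "safe" in prompt.lower() else "No"


# Four hypothetical phrasings of the same question.
paraphrases = [
    "Is this dosage safe for adults?",
    "Is this dose safe for an adult?",
    "Can an adult take this dosage without risk?",
    "For adults, is this amount safe?",
]

rate = agreement_rate(toy_model, paraphrases)
print(rate)
```

A rate well below 1.0 on questions that should have one answer is a signal to add verification or to stop trusting single responses. A rate of 1.0 only shows the answers are consistent, not that they are correct.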
What This Actually Means
The Feuerriegel paper is not a how-to guide. It is a framework for thinking. Here is what it implies for anyone using or building generative AI systems:
- Treat generative AI as a collaborator, not an oracle. The model is good at generating plausible options. It is bad at knowing which option is correct. Your job is to evaluate, not just accept.
- Design for evaluation, not just generation. If you are building a system, include tools that help users compare outputs. Show confidence scores. Highlight tradeoffs. Do not just dump fifty options on the screen.
- Watch for convergence. If everyone uses the same models, creative work might become more similar. Deliberately seek out unusual prompts. Fine-tune on niche data. Break the pattern.
- Invest in prompt literacy. The quality of the output depends on the quality of the input. Learning to write effective prompts is a real skill. It is worth training your team on it.
- Plan for failure. Hallucination is not a bug you can fix. It is a feature of the architecture. Build verification steps into any workflow where accuracy matters. Double check facts. Test outputs against known ground truth.
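A verification step does not have to be elaborate. The sketch below shows one possible shape, assuming the model's output has already been parsed into simple key and value claims and that you hold a trusted record to compare against. The `verify_claims` function and the claimed values are hypothetical. The trusted values are the year and journal given in this article's reference entry.

```python
def verify_claims(claims: dict, ground_truth: dict) -> dict:
    """Sort each claimed field into verified, contradicted, or unverified."""
    report = {"verified": [], "contradicted": [], "unverified": []}
    for key, value in claims.items():
        if key not in ground_truth:
            # No trusted record exists, so a human has to check this one.
            report["unverified"].append(key)
        elif ground_truth[key].strip().lower() == value.strip().lower():
            report["verified"].append(key)
        else:
            report["contradicted"].append(key)
    return report


# Trusted record, taken from the reference entry for the paper discussed here.
ground_truth = {
    "year": "2023",
    "journal": "Business & Information Systems Engineering",
}

# Hypothetical model output, parsed into fields.
claims = {
    "year": "2022",
    "journal": "business & information systems engineering",
    "page_count": "12",
}

report = verify_claims(claims, ground_truth)
print(report)
```

The "unverified" bucket matters as much as the "contradicted" one. A claim nobody can check is exactly the kind that looks right without being right.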
The squirrel robot paper got accepted because the reviewers did not check. That is a human failure, not an AI failure. But it is a warning. Generative AI can produce things that look right. Looking right is not the same as being right. The paper by Feuerriegel and his colleagues is useful because it does not pretend otherwise. It maps the territory, including the parts that are still dark.
References
- [1] Stefan Feuerriegel, Jochen Hartmann, Christian Janiesch, Patrick Zschech (2023). Generative AI. Business & Information Systems Engineering.
