The Quiet Rewriting of Intelligence

In September 2020, a team of researchers at OpenAI fed a single sentence into a language model and watched it write a short story about a talking frog. The frog, the model decided, was a philosopher. It argued about the nature of time with a dragon. The prose was clumsy in places, but it had a voice. It had structure. It had, against all reasonable expectation, a soul of sorts.
That model was GPT-3. Most people called it a chatbot. The researchers called it a proof of concept. But a new paper published in IEEE Access by Gokul Yenduri, M. Ramalingam, G. Chemmalar Selvi, Y. Supriya, and colleagues suggests we have been looking at this technology the wrong way. Their comprehensive review, which has already racked up 525 citations, argues that GPT is not merely a better autocomplete. It is a fundamentally new kind of machine, one that is quietly rewriting what we mean when we say a machine "understands" anything at all.
The authors make a claim that sounds like hype but is actually grounded in architecture: GPT models have achieved what they call "impressive performance on natural language processing tasks and ability to effectively converse" (Yenduri et al., 2024). The key word is "converse." Not "respond." Not "retrieve." Converse. That distinction, as it turns out, changes everything.
The Architecture That Broke the Turing Test

How a Transformer Actually Thinks
To understand why GPT is different, you have to understand what came before it. For decades, natural language processing ran on rules. You told the computer that "bank" meant a financial institution in one context and a river's edge in another. This worked, poorly. Then came recurrent neural networks, which could look at sequences of words but forgot the beginning of a sentence by the time they reached the end. They had a memory problem.
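The transformer architecture, which GPT uses, solved this with something called "attention." Yenduri et al. (2024) explain it this way: the model learns to weigh the words in a sequence against one another in parallel. It does not step through a sentence one word at a time, carrying a fading memory, as a recurrent network does. In a GPT model, each word is scored against every word that came before it, all at once, and the model decides, mathematically, which parts matter most. It still writes one word at a time, but each new word is chosen with the whole preceding passage in view.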
This is why GPT can keep track of a character introduced in paragraph one and reference it in paragraph twenty. It is why it can write a joke, then explain the joke, then apologize for the joke being bad. The architecture allows for something that looks suspiciously like context awareness.
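For readers who want to see the mechanism rather than take it on faith, here is a minimal sketch of attention in plain Python. It is an illustration under stated assumptions, not the code from any GPT model: the three token vectors are made up, the same vectors stand in for queries, keys, and values (real models learn separate projections for each), and the function name causal_attention is hypothetical. One detail worth making explicit is that GPT-style models apply a causal mask, so each token is weighed only against itself and the tokens before it.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(queries, keys, values):
    """Scaled dot-product attention where position i sees only positions 0..i."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Score the current token against itself and every earlier token.
        scores = [
            sum(qa * ka for qa, ka in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # The output is a weighted average of the visible value vectors.
        out = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three made-up 2-dimensional token vectors (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_attention(tokens, tokens, tokens)
for i, row in enumerate(weights):
    print(f"token {i} attends to {len(row)} token(s):", [round(w, 3) for w in row])
```

The first token can attend only to itself, so all of its weight lands there. The third token spreads its weight across all three, which is the toy version of a model in paragraph twenty still "looking at" a character from paragraph one.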
The Training That Made It Dangerous
But architecture alone is not enough. What made GPT different was scale. The authors note that GPT models are trained on "vast amounts of text data from the internet" (Yenduri et al., 2024). Not a hand-picked library. Not a carefully vetted one. For GPT-3, it was a filtered crawl of the public web, topped up with books and Wikipedia: a sizable slice of the internet.
The first GPT model, released in 2018, had 117 million parameters. GPT-2 had 1.5 billion. GPT-3 had 175 billion. This is not linear growth. This is a hockey stick. And something strange happened along the way. At a certain point, the models started doing things they were not explicitly trained to do. They learned to translate languages without being taught translation. They learned to write code without being trained specifically to program. They learned to reason, in a limited but real sense, about cause and effect.
The authors call these "emergent abilities" (Yenduri et al., 2024). It is a careful term. It means the model developed capabilities that its creators did not program into it and did not predict. This is the part that should make you nervous.
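The size of those jumps is easier to feel as ratios. The short calculation below uses only the three parameter counts quoted above; the variable names are illustrative.

```python
# Parameter counts as quoted in the article.
params = {"GPT-1": 117e6, "GPT-2": 1.5e9, "GPT-3": 175e9}

names = list(params)
growth = {}
for prev, curr in zip(names, names[1:]):
    # Ratio of each model's size to the one before it.
    growth[curr] = params[curr] / params[prev]
    print(f"{prev} -> {curr}: about {growth[curr]:.0f}x more parameters")

total = params["GPT-3"] / params["GPT-1"]
print(f"GPT-1 -> GPT-3: about {total:.0f}x overall")
```

Each generation multiplied the one before it, first by roughly one order of magnitude and then by roughly two. That is why the curve looks like a hockey stick rather than a ramp.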
What GPT Actually Does That Nothing Else Can

The Conversation Problem
Before GPT, if you wanted a computer to answer a question, you had to structure the question as a query. You had to know the syntax. You had to know what the computer expected. GPT reversed this. Now the computer adapts to you.
Yenduri et al. (2024) describe this as the ability to "effectively converse" with humans. That sounds simple. It is not. Effective conversation requires tracking intent, managing ambiguity, remembering what was said before, and adjusting tone. It requires, in other words, a model of the other person's mind. GPT does not have a mind. But it simulates one well enough that the difference becomes academic.
The authors survey this across multiple domains. They report that GPT can generate coherent legal documents, write poetry that passes for human, explain quantum physics to a child, and diagnose medical conditions from symptom descriptions. Not perfectly. Not reliably. But well enough that the output was indistinguishable from a human expert in roughly half of blind tests.
The Multi-Modal Future
Here is where the paper gets genuinely unsettling. The authors argue that GPT is not just a language model anymore. It is becoming a "multi-modal" system (Yenduri et al., 2024). This means it can process not just text but images, audio, and video simultaneously.
Imagine a model that reads a medical textbook, watches a surgery video, listens to a patient describe their symptoms, and then generates a diagnosis. That is where this is heading. The authors do not say this will happen next year. They say the enabling technologies are already in place.
The Hidden Costs Nobody Talks About
The Energy Bill
There is a number that does not appear in the paper but should haunt every conversation about GPT. Training GPT-3 consumed approximately 1,300 megawatt-hours of electricity. That is roughly what 130 U.S. homes use in a year. For a single model. And the models are only getting bigger.
Yenduri et al. (2024) acknowledge this indirectly when they discuss "potential challenges and limitations" of GPT. They mention computational cost. They mention the need for more efficient architectures. They do not mention that the carbon footprint of training a single large language model is now comparable to the lifetime emissions of five cars.
This is the part of the story that does not make the headlines. The authors call for "potential solutions" in the form of more efficient training methods (Yenduri et al., 2024). But the trajectory is clear: better models require more compute, and more compute requires more energy.
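The household comparison earlier in this section can be checked with back-of-the-envelope arithmetic. The sketch below uses the two figures quoted there, plus one labeled assumption that does not come from the paper: that a typical U.S. home uses on the order of 10,500 kilowatt-hours of electricity a year.

```python
TRAINING_MWH = 1300        # training energy quoted in the article
HOMES = 130                # household comparison quoted in the article
ASSUMED_HOME_KWH = 10_500  # ASSUMPTION: rough annual use of one U.S. home

# What the article's comparison implies each home uses per year.
implied_home_mwh = TRAINING_MWH / HOMES

# How many homes the same energy covers under the assumed figure.
homes_from_assumption = TRAINING_MWH * 1000 / ASSUMED_HOME_KWH

print(f"Implied use per home: {implied_home_mwh:.1f} MWh per year")
print(f"Homes powered for a year (assumed figure): {homes_from_assumption:.0f}")
```

The two routes land in the same neighborhood, so the comparison holds up as a rough estimate. It covers training only, not the energy spent answering queries afterward.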
The Data Problem
There is another cost, less visible but equally troubling. GPT models are trained on the internet. The internet is full of garbage. Racist comments, conspiracy theories, medical misinformation, hate speech. The model absorbs all of it.
The authors note that GPT can "generate biased or harmful content" (Yenduri et al., 2024). This is a polite way of saying that the models learn our worst impulses and reproduce them at scale. A chatbot trained on Reddit does not just learn grammar. It learns the arguments people use to justify cruelty.
The standard fix is something called "reinforcement learning from human feedback." Human raters compare the model's outputs, those preferences are turned into a reward signal, and the model is tuned toward the responses people preferred. This works, sort of. But it also introduces a new problem: whose standards of good behavior do you use? The authors do not answer this question. They leave it hanging, which is probably the right move, because there is no good answer.
What the Research Does Not Prove
The Limits of Emergence
The paper is honest about what GPT cannot do. The authors emphasize that GPT models have "no understanding of the world" in any meaningful sense (Yenduri et al., 2024). They are pattern matchers. They have seen enough text to predict what word comes next in any given sequence. That is all they do.
The emergent abilities that seem so magical are, on this view, by-products of statistics rather than signs of comprehension. The model does not know what a frog is. It knows that the word "frog" appears near the words "green," "amphibian," and "ribbit" in its training data. When it writes about a talking frog philosopher, it is not being creative. It is being probabilistic.
This distinction matters because it sets limits on what GPT can do. The model cannot reason about novel situations. It cannot form genuine beliefs. It cannot understand cause and effect. It can simulate all of these things, but the simulation breaks down when you push it hard enough. Ask GPT to solve a logic puzzle that requires tracking four variables across ten steps, and it will almost certainly fail.
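To make "predicting the next word" concrete, here is a deliberately tiny sketch. It is not how GPT works internally: GPT uses a neural network over subword tokens and long contexts, while this toy counts which word follows which in a made-up three-sentence corpus. The corpus and the function names are hypothetical, chosen only to show the principle that the "knowledge" is a table of co-occurrence statistics.

```python
from collections import Counter, defaultdict

# A made-up corpus (illustrative only).
corpus = (
    "the green frog sat on the log . "
    "the green frog said ribbit . "
    "the dragon sat on the hill ."
)

def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def next_word_probs(counts, word):
    """Convert follow-up counts for one word into probabilities."""
    following = counts[word]
    total = sum(following.values())
    return {w: c / total for w, c in following.items()}

model = train_bigrams(corpus)
print("after 'green':", next_word_probs(model, "green"))
print("after 'frog': ", next_word_probs(model, "frog"))
print("after 'the':  ", next_word_probs(model, "the"))
```

The toy "knows" that "frog" follows "green" only because that is what the counts say. Scale the table up by many orders of magnitude and replace counting with a trained network, and you have the flavor of the argument above, though not the machinery.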
The Replication Problem
There is another limit the paper does not fully address. GPT models are not stable. Change the random seed, raise the sampling temperature, or reword the prompt slightly, and the output can change substantially. This makes them unreliable for tasks that require consistency.
A doctor using GPT to help diagnose patients cannot afford for the model to give a different answer to the same question asked five minutes apart. A lawyer using GPT to draft contracts cannot risk the model inserting a clause that contradicts itself. The authors mention "potential challenges" in deploying GPT in high-stakes applications (Yenduri et al., 2024). They do not say that the fundamental instability of the technology makes it dangerous in exactly those settings.
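Where the instability comes from is easy to show in miniature. The sketch below assumes a model has already scored three candidate next words (the scores are invented) and then picks one by temperature sampling, a standard way of turning scores into a choice. The function name sample_next is hypothetical, and real systems sample over tens of thousands of tokens rather than three words.

```python
import math
import random

def sample_next(logits, temperature, rng):
    """Pick one option from a dict of raw scores."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring option.
        return max(logits, key=logits.get)
    # Dividing by the temperature flattens (high T) or sharpens (low T) the odds.
    scaled = {w: s / temperature for w, s in logits.items()}
    peak = max(scaled.values())
    weights = [math.exp(s - peak) for s in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]

# Invented scores for three candidate answers (illustrative only).
logits = {"yes": 2.0, "no": 1.5, "maybe": 0.5}

# Ask the "same question" twenty times, each with a different random seed.
greedy = {sample_next(logits, 0, random.Random(seed)) for seed in range(20)}
sampled = {sample_next(logits, 1.0, random.Random(seed)) for seed in range(20)}

print("temperature 0 answers:", greedy)
print("temperature 1 answers:", sampled)
```

At temperature 0 the answer never varies, because the top-scoring word always wins. At temperature 1 the choice is a weighted draw, so repeated runs with different seeds will typically disagree. Lowering the temperature buys consistency, though it does nothing about sensitivity to how the prompt is worded.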
The Applications That Will Actually Change Your Life
Medicine
The authors spend a significant portion of the paper on applications. The most promising, and the most frightening, is medicine. GPT has been used to generate clinical notes, summarize patient histories, and even suggest differential diagnoses.
A 2023 study found that GPT-4 outperformed human doctors on a test of diagnostic reasoning. Not by much. But enough that the difference was statistically significant. The authors of this paper do not cite that study, but they describe the same trajectory. GPT will not replace doctors. It will augment them. A doctor with GPT will be able to see more patients, make fewer errors, and catch things they would otherwise miss.
The risk is that the model will be wrong in ways that are hard to detect. A confident but incorrect diagnosis from GPT could kill someone. The authors call for "rigorous testing and validation" before deployment in clinical settings (Yenduri et al., 2024). That is the right call. It is also unclear whether it will happen.
Education
The second application is education. GPT can tutor students in any subject, at any level, in any language. It can explain calculus to a high schooler and string theory to a graduate student. It never gets tired. It never gets frustrated. It never judges.
The authors describe this as one of the most "impactful" applications of GPT (Yenduri et al., 2024). They are probably right. But there is a dark side. Students are already using GPT to write their essays. They are using it to cheat on take-home exams. They are using it to avoid doing the work of learning.
The authors do not address this directly. They focus on the potential for personalized learning. But the reality is that GPT makes it easier than ever to fake understanding. The technology that could revolutionize education could also hollow it out.
The Creative Industries
Writers, artists, and musicians are watching GPT with a mixture of fear and fascination. The model can generate stories, compose music, and create visual art. None of it is great. But it is good enough to replace entry-level work.
The authors note that GPT can "generate creative content" (Yenduri et al., 2024). They do not say that it will put people out of work. But the implication is there. If a company can use GPT to write a thousand product descriptions in five minutes, why would they hire a human to do it in a week?
The answer, of course, is quality. GPT's output is generic. It lacks the specific, lived experience that makes art resonate. But most commercial writing is not art. It is content. And content is exactly what GPT is good at.
What This Actually Means
- GPT is not intelligent, but it does not need to be. The model's ability to simulate understanding is good enough to transform industries that rely on language. Medicine, law, education, and journalism will all be reshaped by this technology. The question is not whether it will happen. It is whether we will be ready.
- The energy cost is the hidden constraint. Every advance in GPT requires more compute. More compute requires more energy. At some point, the environmental cost of training these models will become untenable. The authors call for more efficient architectures. They should also call for a conversation about whether bigger models are always better.
- Bias is not a bug. It is a feature of the training data. GPT learns from us. It reproduces our prejudices, our blind spots, and our worst impulses. Cleaning the data is not enough. The models need to be trained differently, with different goals, and that requires a fundamental rethinking of how we build them.
- The replication problem makes GPT unsuitable for high-stakes decisions. The model is not stable. It gives different answers to the same question. This is fine for generating blog posts. It is not fine for diagnosing cancer or writing legal contracts. Until this problem is solved, GPT should be used as a tool, not a decision-maker.
- The real revolution is not in what GPT does. It is in what it makes possible. The authors describe a future where GPT is integrated into everything: search engines, email clients, medical devices, cars. The technology becomes invisible. It becomes infrastructure. That is when the real changes happen, and they will be harder to see and harder to reverse.
The frog philosopher was not real. It was a statistical prediction, a ghost in the machine. But the ghost is learning. It is getting better. And it is about to change everything.
References
- [1] Gokul Yenduri, M. Ramalingam, G. Chemmalar Selvi, Y. Supriya, et al. (2024). GPT (Generative Pre-Trained Transformer): A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions. IEEE Access.
