ChatGPT Shows Promise and Peril in Healthcare

The Doctor Will Read Your Chart Now. But Should He Trust the Bot?

Here is a paradox that should unsettle anyone who has ever googled a symptom: the same large language model that can perform at or near the passing threshold of the United States Medical Licensing Examination can also fabricate a patient’s medical history, cite a nonexistent study, and sound completely confident while doing all of it. Malik Sallam, a researcher at the University of Jordan, wanted to know exactly how much promise and how much peril ChatGPT brings to healthcare. After systematically reviewing 60 studies, he found that 85 percent of them (51 of 60) described benefits of the chatbot, but nearly 97 percent (58 of 60) also raised serious concerns (Sallam, 2023). That overlap between enthusiasm and caution is not a bug. It is the whole story.

Sallam’s review, published in the journal Healthcare, is not a single experiment but a meta-level look at the first wave of research on ChatGPT in medicine. He searched PubMed and Google Scholar for English-language records, both published papers and preprints, that examined ChatGPT in healthcare education, research, or practice. He ended up with 60 eligible records, then coded each one for the benefits and concerns it reported. The method is straightforward: count what researchers are actually finding and flagging, then see where the field stands.

What he found is a portrait of a tool that is simultaneously brilliant and dangerous.

The Good: What ChatGPT Does Well Enough to Scare the Old Guard

The benefits Sallam identified fall into four buckets, and each one challenges how medicine currently works.

Scientific writing and research equity. ChatGPT can draft sections of a paper, summarize literature, and even generate code for data analysis. For researchers in low-resource settings who lack access to expensive statistical software or professional editors, this is a leveling force. Sallam notes that the chatbot can “improve scientific writing and enhance research equity and versatility” (Sallam, 2023). A researcher at a small university without a writing center can now produce prose that reads like it came from a high-end lab. That is not trivial.

Data analysis and drug discovery. The model can help work through datasets, write analysis scripts, and support early steps in drug discovery. Sallam found records citing ChatGPT’s utility in “efficient analysis of datasets, code generation, literature reviews, saving time to focus on experimental design, and drug discovery and development” (Sallam, 2023). Wherever trained statisticians are scarce and data sit unanalyzed, this matters.

Clinical workflow and personalized medicine. In practice settings, ChatGPT can streamline documentation, generate patient summaries, and help personalize treatment recommendations. Sallam lists benefits including “streamlining the workflow, cost saving, documentation, personalized medicine, and improved health literacy” (Sallam, 2023). Imagine a physician who spends two hours a night on charting suddenly freed to talk to patients.

Medical education. This is where the promise gets most concrete. ChatGPT can act as a personalized tutor, generating practice questions, explaining concepts in different ways, and letting students focus on critical thinking rather than memorization. Sallam found that educators see potential for “improved personalized learning and the focus on critical thinking and problem based learning” (Sallam, 2023). A student who does not understand a concept can ask the chatbot to explain it five different ways, at any hour, without judgment.

The Bad: Why 97 Percent of Studies Are Worried

Here is where the story turns. Almost every single study Sallam reviewed raised a red flag. The concerns are not minor. They are fundamental.

Hallucination and inaccurate content. ChatGPT does not know what it does not know. It generates text that sounds authoritative but can be completely wrong. Sallam found records flagging “inaccurate content with risk of hallucination, limited knowledge, incorrect citations” (Sallam, 2023). In medicine, a hallucination is not a quirky mistake. It can mean a patient getting the wrong drug or a surgeon planning the wrong incision.

Bias and plagiarism. The model is trained largely on internet text, which means it absorbs the biases present in human-generated writing. It can reproduce racist, sexist, or otherwise harmful patterns. It also does not cite its sources in a way that satisfies academic standards. Sallam flags “the risk of bias, plagiarism, lack of originality” (Sallam, 2023). If a student submits a ChatGPT-generated essay, is that plagiarism? If the chatbot reproduces a biased treatment recommendation, who is responsible?

Ethical and legal quicksand. Who owns the output? Who is liable when the advice is wrong? Sallam lists “ethical, copyright, transparency, and legal issues” as core concerns (Sallam, 2023). The current framework for medical liability assumes a human made the decision. ChatGPT breaks that assumption completely.

Cybersecurity and infodemics. The same model that can help a doctor write a note can also generate convincing misinformation at scale. Sallam warns of “cybersecurity issues, and risk of infodemics” (Sallam, 2023). In a pandemic, a flood of AI-generated fake health advice could cost lives.

The Authorship Problem: Why ChatGPT Cannot Be a Coauthor

Sallam takes a firm stance on one specific issue. He argues that “as it currently stands, ChatGPT does not qualify to be listed as an author in scientific articles unless the ICMJE/COPE guidelines are revised or amended” (Sallam, 2023). The International Committee of Medical Journal Editors (ICMJE) requires authors to take responsibility for the work, to approve the final version, and to be accountable for its accuracy. A language model cannot do any of those things. It cannot be sued. It cannot retract a paper. It cannot explain why it made a mistake.

Some journals have already banned ChatGPT authorship. But the problem is deeper. If a researcher uses ChatGPT to write a methods section, and that section contains a hallucinated detail that leads to a flawed replication attempt, who is at fault? The researcher who did not check? The model that made it up? The journal that published it? Sallam’s review makes clear that the current system has no answer.

What This Research Does Not Prove

Sallam’s review is a systematic synthesis, not an experiment. It tells us what researchers are saying and finding, but it does not measure ChatGPT’s performance on a specific clinical task. The review does not tell you whether ChatGPT is better or worse than a human doctor at diagnosing pneumonia. It does not compare the chatbot to other AI tools. It does not track outcomes over time.

This is important because the 60 studies Sallam reviewed are early work. Many are preprints that have not been peer-reviewed. Some are opinion pieces rather than empirical tests. The field is moving so fast that a review published in 2023 already feels like ancient history. The specific numbers may shift. But the structure of the problem will not. The tension between utility and risk is baked into the technology itself.

What This Actually Means

▸Do not let ChatGPT write clinical notes without a human editor. The model can invent citations and misinterpret lab values. Every output needs a trained professional to verify it. That verification is not optional. It is the only thing standing between a useful tool and a dangerous one.

▸Medical schools should teach students how to use ChatGPT, not ban it. The technology is not going away. Students who learn to prompt effectively, to spot hallucinations, and to verify outputs will have an advantage. Those who are told to never touch it will be left behind.

▸Hospitals need clear policies on liability before deployment. If a chatbot suggests a treatment and a patient is harmed, the hospital, the physician, and the software developer all have plausible deniability. That ambiguity is a lawsuit waiting to happen. Institutions should write rules now, not after the first case.

▸Researchers should treat ChatGPT as a research assistant, not a coauthor. It can draft, summarize, and generate code. It cannot take responsibility. Sallam’s review is explicit on this point. List the model in the methods section, not the author list.

▸The single most important skill for a doctor in 2025 is critical evaluation of AI output. Medical training has always emphasized differential diagnosis and evidence appraisal. Those skills now apply to the machine itself. The best clinicians will be the ones who can tell when the bot is wrong.

References

[1] Malik Sallam (2023). ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare.