AI Doctors Get Superhuman Skills with Foundation Models

Foundation models could enable AI diagnostic systems to reach superhuman accuracy in interpreting medical images and clinical data.

Rohan Desai

The AI That Learned Medicine Without Being Taught

In the spring of 2023, a group of researchers from Stanford, Harvard, Yale, and other institutions published a paper in Nature that quietly proposed something radical: What if the next generation of medical AI could become a formidable diagnostician with little or no task-specific labeled data, with no fresh batch of annotated X-rays, pathology slides, or patient notes for every new job?

The paper, led by Michael Moor and his colleagues, didn't describe a finished product. It described a blueprint for what they call "generalist medical AI," or GMAI (Moor et al., 2023). The idea is deceptively simple: take the same kind of foundation model that powers ChatGPT, feed it every kind of medical data imaginable, and let it teach itself medicine the way a child learns language by listening.

But the implications are anything but simple. If these models work as Moor and his coauthors envision, they will upend everything about how we train doctors, validate medical devices, collect patient data, and regulate artificial intelligence in healthcare. The authors are not shy about this. They describe their proposal as a "new paradigm" for medical AI, one that challenges "current strategies for regulating and validating AI devices for medicine" and will "shift practices associated with the collection of large medical datasets" (Moor et al., 2023).

This is not another incremental improvement. This is a different species of intelligence.

How Do You Teach a Machine to See a Heart Attack?

The conventional approach to medical AI is a painstaking, expensive affair. You collect thousands of chest X-rays. You pay radiologists to label each one: "pneumonia," "no pneumonia," "lung nodule," "normal." You train a model to match images to labels. It works. It also breaks the moment you show it a CT scan instead of an X-ray, or a patient from a hospital with a different brand of scanner.

Moor and his team propose something entirely different. Instead of training a model on labeled data for a single task, you would train it on unlabeled data from a wide range of medical modalities at once: imaging scans, electronic health records, lab results, genomic sequences, medical text, even graphs of patient relationships (Moor et al., 2023). The training relies on a technique called self-supervised learning, in which the model teaches itself by predicting missing pieces of the data. Shown a chest X-ray and its corresponding clinical note, it hides part of the note and tries to guess what is missing. At full scale, it would do this billions of times across millions of patients.
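For readers who want to see the mechanics, here is a deliberately tiny, text-only sketch of masked prediction. The function names are hypothetical. A table of word-pair counts stands in for the neural network a real foundation model would use. The point is only that the "labels" are the hidden words themselves, so no human annotation is needed.

```python
import random
from collections import Counter, defaultdict

MASK = "[MASK]"


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide a random subset of tokens; the hidden originals become the targets.

    No human labels are involved: the training signal comes from the data itself.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


def fit_next_word_counts(notes):
    """Count which word follows each word across an unlabeled corpus of notes."""
    counts = defaultdict(Counter)
    for note in notes:
        for prev, nxt in zip(note, note[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_masked(masked, counts):
    """Guess each masked token from the word to its left (a toy stand-in for a model)."""
    guesses = {}
    for i, tok in enumerate(masked):
        prev_counts = counts.get(masked[i - 1]) if i > 0 else None
        if tok == MASK and prev_counts:
            guesses[i] = prev_counts.most_common(1)[0][0]
    return guesses


notes = [
    "right lower lobe opacity consistent with pneumonia".split(),
    "left lower lobe opacity consistent with pneumonia".split(),
]
counts = fit_next_word_counts(notes)
print(predict_masked(["opacity", "consistent", MASK, "pneumonia"], counts))  # {2: 'with'}
```

A real system would mask patches of images and spans of text and lab data together, which is what lets it connect an X-ray shadow to a phrase in a note.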

The result, if the approach works, is a model that doesn't just recognize pneumonia. It captures the relationship between a shadow on an X-ray, a patient's age, a white blood cell count, and a sentence in a discharge summary. It builds what the authors call "a flexible, reusable" representation of medical reality (Moor et al., 2023).

Why This Changes Everything

Here is the part that made me stop and reread the paper. Moor and his coauthors claim these models will be "capable of carrying out a diverse set of tasks using very little or no task-specific labelled data" (Moor et al., 2023). That is zero-shot learning, or few-shot learning when a handful of examples is available: the model is handed a task it was never explicitly trained on and performs it anyway.

Imagine a radiologist who has spent ten years reading chest X-rays. You ask her to interpret a retinal scan. She cannot; she has no training for that. A foundation model trained across modalities, the authors argue, could. Having seen enough images, enough text, and enough patterns, it could generalize to new problems in a way a narrowly trained specialist cannot.
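One common way to get zero-shot behavior is to place inputs and candidate label descriptions in a shared embedding space and then pick the closest label. Image-text models such as CLIP popularized this approach. The paper does not prescribe this mechanism. The three-number "embeddings" below are invented for illustration; in a real system they would come from the trained model's encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def zero_shot_classify(input_embedding, label_embeddings):
    """Return the label whose text embedding is closest to the input embedding.

    No classifier is trained for the task: the candidate labels are supplied as
    text at query time and compared against the input in a shared space.
    """
    scores = {label: cosine(input_embedding, emb) for label, emb in label_embeddings.items()}
    return max(scores, key=scores.get), scores


# Hypothetical embeddings for two label descriptions and one retinal scan.
labels = {
    "diabetic retinopathy": [0.9, 0.1, 0.0],
    "normal retina": [0.1, 0.9, 0.1],
}
scan = [0.8, 0.2, 0.1]
best, scores = zero_shot_classify(scan, labels)
print(best)  # diabetic retinopathy
```

Adding a new task here means writing new label descriptions, not collecting and labeling a new dataset.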

The authors also describe the kinds of output such models could produce. One is "free-text explanations" for clinical decisions. Today's AI gives you a probability: 87 percent chance of sepsis. It does not tell you why. A GMAI model could produce a paragraph explaining its reasoning, referencing specific lab values and trends over time (Moor et al., 2023). Another is "spoken recommendations": you talk to the model, it talks back, and it adjusts its advice based on the conversation.
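The contrast between a bare score and an explanation is easier to see side by side. The sketch below does not generate language the way a GMAI model would. It simply wraps a risk score with the lab values and trends a clinician would want cited. The `LabSeries` type, the reference limits, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LabSeries:
    name: str
    unit: str
    values: list  # chronological measurements, oldest first
    upper_normal: float  # illustrative upper reference limit


def trend(values):
    """Describe the direction from the first to the latest measurement."""
    if values[-1] > values[0]:
        return "rising"
    if values[-1] < values[0]:
        return "falling"
    return "stable"


def explain(risk, labs):
    """Attach the out-of-range labs, with their trends, to a bare risk score."""
    findings = [
        f"{lab.name} {lab.values[-1]} {lab.unit} (above {lab.upper_normal}, {trend(lab.values)})"
        for lab in labs
        if lab.values[-1] > lab.upper_normal
    ]
    reason = "; ".join(findings) if findings else "no lab values above reference limits"
    return f"Estimated sepsis risk {risk:.0%}. Contributing findings: {reason}."


labs = [
    LabSeries("lactate", "mmol/L", [1.8, 2.6, 3.4], upper_normal=2.0),
    LabSeries("WBC", "x10^9/L", [9.0, 8.5], upper_normal=11.0),
]
print(explain(0.87, labs))
# Estimated sepsis risk 87%. Contributing findings: lactate 3.4 mmol/L (above 2.0, rising).
```

A template like this can only restate what it was told to look for. The promise of a generative model is reasoning over evidence nobody thought to template.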

These are not party tricks. They are the difference between a tool and a colleague.

The Data Problem Nobody Solved

Building a foundation model for medicine requires datasets that do not currently exist in most places. Moor and his team are explicit about what they need: "large, diverse datasets" that combine imaging, text, genomics, lab results, and structured clinical data from electronic health records (Moor et al., 2023). This is not a small ask.

Most hospitals store their data in silos. Radiology images live in one system. Lab results in another. Clinical notes are typed into a third, often with inconsistent formatting and abbreviations that vary by institution. Genomic data is stored separately, often in research databases that are not connected to clinical care. Moor and his coauthors are essentially calling for the creation of massive, multimodal medical datasets that link all of these together.
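In the simplest case, every system shares a reliable patient identifier, and linking the silos is a join on that key. The sketch below assumes exactly that, using hypothetical IDs and entries. In practice, identifiers often differ between systems and privacy rules govern who may link what, which is why the authors' call is a hard one.

```python
from collections import defaultdict


def link_records(**sources):
    """Merge per-system lists of (patient_id, item) into one record per patient.

    Every patient gets a slot for every source, so gaps stay visible as empty lists.
    """
    patients = defaultdict(lambda: {name: [] for name in sources})
    for source_name, rows in sources.items():
        for patient_id, item in rows:
            patients[patient_id][source_name].append(item)
    return dict(patients)


radiology = [("p001", "chest X-ray: right lower lobe opacity")]
labs = [("p001", "WBC 14.2 x10^9/L"), ("p002", "HbA1c 8.1%")]
notes = [("p001", "admitted with fever and productive cough")]

linked = link_records(radiology=radiology, labs=labs, notes=notes)
print(linked["p002"])  # {'radiology': [], 'labs': ['HbA1c 8.1%'], 'notes': []}
```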

The privacy implications are enormous. The technical challenges are greater. But the payoff is a model that can see the whole patient, not just one slice of their medical life.

What The Model Actually Does

Let me be concrete about what a GMAI model can do that current AI cannot.

Current medical AI is brittle. You train a model to detect diabetic retinopathy on retinal photographs. It works beautifully in the lab. You deploy it in a rural clinic with a different camera, different lighting, different patient demographics, and its performance collapses. The model has learned to recognize the features of the specific dataset, not the disease itself.
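A toy version of that failure takes only a few lines. Suppose, hypothetically, that a screening rule has learned to call an image diseased when a brightness score exceeds 0.5. Now suppose a new clinic's camera adds a constant brightness offset. The numbers are invented; the shape of the failure is the point.

```python
def accuracy(threshold, samples):
    """Fraction of (score, has_disease) pairs that the threshold rule gets right."""
    correct = sum((score > threshold) == has_disease for score, has_disease in samples)
    return correct / len(samples)


# Hypothetical brightness scores from the development hospital's camera.
site_a = [(0.75, True), (0.70, True), (0.65, True), (0.30, False), (0.25, False), (0.35, False)]

# The same patients photographed on a brighter camera: every score shifts up by 0.3.
site_b = [(score + 0.3, has_disease) for score, has_disease in site_a]

print(accuracy(0.5, site_a))  # 1.0
print(accuracy(0.5, site_b))  # 0.5
```

The rule never learned anything about the disease. It learned how bright diseased images look on one camera, which is exactly the dataset-specific shortcut that breaks in the rural clinic.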

A foundation model trained across modalities should, in principle, be far less prone to this problem. Because it has seen so much data from so many sources, the hope is that it learns the underlying structure of medical information rather than the surface features of a particular dataset. Moor and his team write that GMAI will "flexibly interpret different combinations of medical modalities" (Moor et al., 2023). Such a model could take an input that includes an MRI, a genomic sequence, and a sentence of clinical text, and produce an output that includes an annotated image, a spoken explanation, and a treatment recommendation.

This is not a better version of what we have. This is a different thing entirely.

The Regulation Problem

Here is where the paper gets genuinely uncomfortable. Current medical AI regulation is built around the idea of a fixed device. You submit your model to the FDA. You show that it performs well on a specific task with a specific dataset. You get approval for that task. You do not change the model without re-approval.

GMAI models break this framework in fundamental ways. A model that can perform tasks it was never trained on, that can produce free-text explanations, and that can interact with clinicians in natural language does not fit neatly into the existing regulatory categories. Moor and his coauthors acknowledge this directly, writing that "GMAI-enabled applications will challenge current strategies for regulating and validating AI devices for medicine" (Moor et al., 2023).

The authors do not offer a solution. They do not pretend to have one. They simply identify the problem and leave it for the rest of us to figure out. This is both honest and terrifying.

What The Research Does Not Prove

I need to be careful here. Moor and his team have not built a working GMAI model. They have published a review and a proposal. The paper is a roadmap, not a demonstration. The authors are clear about this. They describe "potential applications" and "specific technical capabilities and training datasets necessary to enable them" (Moor et al., 2023). They do not claim to have achieved any of this yet.

There are serious open questions. Can you actually build a foundation model that performs well across medical modalities without catastrophic forgetting, where the model loses its ability to interpret chest X-rays because it learned too much about genomics? How do you handle the fact that medical data is far messier than the internet text used to train GPT? What happens when the model produces a confident but wrong explanation for a clinical decision?
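Catastrophic forgetting can be reproduced with a model that has just two weights. In the hypothetical setup below, one weight is shared by two tasks and a second weight is used only by task B, so a single set of weights can solve both. Training on task A and then only on task B still wrecks task A. Mixing old task A examples back in while learning task B avoids the damage; this standard mitigation is usually called rehearsal or replay. This illustrates the concept only and is not a claim about how a GMAI model would behave.

```python
def predict(w, features):
    return sum(wi * fi for wi, fi in zip(w, features))


def train(w, data, lr=0.05, epochs=500):
    """Plain stochastic gradient descent on squared error; returns new weights."""
    w = list(w)
    for _ in range(epochs):
        for features, target in data:
            err = predict(w, features) - target
            # The gradient of err**2 with respect to w_i is 2 * err * f_i.
            w = [wi - lr * 2 * err * fi for wi, fi in zip(w, features)]
    return w


def mse(w, data):
    return sum((predict(w, f) - t) ** 2 for f, t in data) / len(data)


xs = (0.5, 1.0, 1.5)
# Feature 1 is shared by both tasks; feature 2 is active only for task B.
task_a = [((x, 0.0), 2.0 * x) for x in xs]
task_b = [((x, x), -1.0 * x) for x in xs]

w_a = train([0.0, 0.0], task_a)       # learn task A
w_seq = train(w_a, task_b)            # then train on task B alone
w_mix = train(w_a, task_a + task_b)   # task B with task A rehearsed alongside

print(round(mse(w_seq, task_a), 3))  # 2.625  (task A forgotten)
print(round(mse(w_mix, task_a), 6))  # 0.0    (task A retained)
```

Sequential training pulls the shared weight toward whatever task B needs, even though weights exist that satisfy both tasks. Rehearsal keeps task A in the objective, so gradient descent finds those weights instead.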

The authors do not answer these questions. They flag them as areas for future work. That is appropriate for a review paper. But it means we are looking at a vision, not a product.

The Training Data Scale Problem

One of the most striking sections of the paper deals with what it will actually take to train these models. The authors estimate that training a GMAI model will require datasets that are orders of magnitude larger than anything currently available in medicine. They point to the success of foundation models in natural language processing, which were trained on vast swaths of the public internet, and note that medical data is far more limited and far more expensive to collect.

There is a tension here that the authors do not fully resolve. On one hand, they argue that GMAI models will reduce the need for labeled data, which is currently the most expensive and time-consuming part of medical AI development. On the other hand, they acknowledge that these models require massive amounts of unlabeled data, which may be equally difficult to obtain given privacy constraints and data fragmentation across healthcare systems.

The Human Doctor Problem

I have been writing about AI in medicine for years. I have seen the pattern: a new model comes out, someone declares that radiologists are obsolete, and then nothing changes. The reason is not that the models are bad. It is that medicine is a human enterprise built on trust, communication, and the ability to handle uncertainty.

Moor and his team seem to understand this. Their vision for GMAI is not a replacement for doctors. It is a tool that augments them. They describe models that produce "expressive outputs such as free-text explanations, spoken recommendations or image annotations that demonstrate advanced medical reasoning abilities" (Moor et al., 2023). The key word is "explanations." A model that can explain its reasoning is a model that a doctor can argue with, question, and ultimately override.

This is fundamentally different from the black box models we have now. It is also harder to build. Generating a coherent explanation that a human clinician finds useful requires the model to not just be right, but to be right in a way that aligns with human reasoning. That is a much harder problem.

The Genomics Wildcard

One of the most intriguing aspects of the paper is the inclusion of genomics as a medical modality. Moor and his team explicitly include "genomics" in their list of data types that a GMAI model should interpret (Moor et al., 2023). This is not an obvious choice. Genomic data is fundamentally different from images or text. It is discrete, high-dimensional, and sparse. Most of the genome does not vary between people. The parts that do vary are often hard to interpret.
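The sparsity is easy to see in a toy example. Two people's genomes are nearly identical, so a common approach is to store only the positions where an individual differs from a reference sequence. The ten-letter reference and the sample below are invented for illustration. A real human genome has roughly three billion positions, and real pipelines use standard variant formats such as VCF rather than a Python dictionary.

```python
# Toy reference sequence (hypothetical); a real human reference is ~3 billion bases.
REFERENCE = "ACGTACGTAC"


def diff_from_reference(sample, reference=REFERENCE):
    """Return {position: (reference_base, sample_base)} only where the two differ."""
    if len(sample) != len(reference):
        raise ValueError("toy example assumes pre-aligned, equal-length sequences")
    return {i: (r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s}


def to_indicator_vector(variants, length):
    """Dense 0/1 feature vector: mostly zeros, which is what 'sparse' means here."""
    return [1 if i in variants else 0 for i in range(length)]


sample = "ACGTTCGTAC"  # differs from the reference at position 4 (A -> T)
variants = diff_from_reference(sample)
print(variants)  # {4: ('A', 'T')}
print(to_indicator_vector(variants, len(REFERENCE)))  # [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```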

But if you can train a model to understand the relationship between a genetic variant, a lab value, an imaging finding, and a clinical outcome, you have something powerful: a model that could predict how a patient will respond to a drug based on their genome, their current lab results, and the subtle patterns in their latest MRI. This is the holy grail of precision medicine. Moor and his coauthors are essentially arguing that foundation models are the path to getting there.

What This Actually Means

Let me distill this into what matters for people who actually work in medicine, build AI systems, or pay for healthcare.

  • Foundation models will sharply reduce, and for some tasks eliminate, the need for task-specific labeled datasets. If you are a hospital system currently spending millions of dollars paying radiologists to label images for each new AI model, that business model is on borrowed time. The next generation of models will learn from unlabeled data and generalize to tasks they have never seen.
  • Regulation of medical AI must be redesigned from scratch. The current FDA framework assumes a fixed device with a fixed function. A model that can perform new tasks without retraining does not fit. Regulators need to figure out how to approve a capability, not a specific use case, and how to monitor models that change over time.
  • The data that matters most is multimodal and linked. Isolated datasets are increasingly worthless. The value is in connecting imaging, genomics, labs, and clinical notes for the same patients over time. Hospitals that invest in integrated data infrastructure now will have a massive advantage.
  • The bottleneck is not algorithms. It is data availability and privacy. The technical path to building GMAI models is increasingly clear. The path to getting the data needed to train them, while respecting patient privacy and navigating fragmented healthcare systems, is not. This is where the real work will happen.
  • The role of doctors will shift from pattern recognition to model oversight. If a foundation model can interpret any image, read any lab result, and produce a reasoned explanation, the doctor's job becomes less about knowing the answer and more about evaluating the model's reasoning, questioning its assumptions, and making the final call in cases where the model is uncertain. This is a harder job, not an easier one.
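On the point about monitoring models that change over time, one simple form of post-deployment surveillance is a rolling accuracy check against cases whose diagnosis was later confirmed. The class, window size, and alert threshold below are hypothetical choices for illustration. They are not a regulatory standard or anything the paper proposes.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track agreement with confirmed diagnoses over the most recent cases."""

    def __init__(self, window=100, alert_below=0.9):
        self.outcomes = deque(maxlen=window)  # True where the model matched the confirmed label
        self.alert_below = alert_below

    def record(self, prediction, confirmed_label):
        self.outcomes.append(prediction == confirmed_label)

    def accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def needs_review(self):
        # Only alert once the window is full, so a handful of early cases cannot trigger it.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.alert_below


monitor = RollingAccuracyMonitor(window=4, alert_below=0.75)
for pred, truth in [("pneumonia", "pneumonia"), ("normal", "normal"),
                    ("normal", "normal"), ("pneumonia", "pneumonia")]:
    monitor.record(pred, truth)
print(monitor.accuracy(), monitor.needs_review())  # 1.0 False
monitor.record("normal", "pneumonia")
monitor.record("normal", "pneumonia")
print(monitor.accuracy(), monitor.needs_review())  # 0.5 True
```

The human still makes the call. The monitor only tells clinicians when the model's recent track record no longer earns the benefit of the doubt.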

Moor and his team have laid out a vision that is both thrilling and unsettling. They have shown us where medical AI is heading. The rest is up to us.

References

  1. Michael Moor, Oishi Banerjee, Zahra Shakeri Hossein Abad, Harlan M. Krumholz, et al. (2023). Foundation models for generalist medical artificial intelligence. Nature.

Rohan Desai

Science journalist who covered ISRO missions and gravitational wave announcements for a national daily before going independent. Writes about space, cosmology, and the quiet revolution happening in observational astronomy.

Reader Comments (2)

Dr. Ananya Sharma

Interesting. Our radiology department tested a foundation model on chest X-rays—it caught a subtle pneumothorax we missed. But we still need to validate on our local population before trusting it fully.

Ravi Patel

The promise is huge, but how do these models handle India's diverse dialects and rare diseases? Our pilot showed drop-offs in accuracy with regional accents. Foundation models need fine-tuning, not just deployment.
