AI in Healthcare Pits Privacy Against Progress

The Patient Who Didn't Know Her Data Was for Sale

A woman walks into a hospital with chest pain. The triage nurse enters her symptoms into a system that, within seconds, cross-references her records against 10 million other patients and flags a 94 percent probability of a cardiac event. The algorithm saves her life. Six months later, that same algorithm is sold to a pharmaceutical company, which uses it to identify patients likely to need a new blood thinner. The woman's data, stripped of her name but still detailed enough to reidentify her, becomes a product. She never consented to this. She never even knew it happened.

This is not a hypothetical. It is the central dilemma that S. Williamson and Victor R. Prybutok confront in their 2024 review published in Applied Sciences, which synthesizes findings across 469 citations to ask a question that no amount of techno-optimism has answered: How do we let AI save us without letting it expose us? (Williamson & Prybutok, 2024)

The authors are not alarmists. They are realists. And what they found should unsettle anyone who has ever assumed that HIPAA, GDPR, or any other acronym will protect them.

The Privacy Paradox Nobody Talks About

Here is the problem in its simplest form. AI models need data. Lots of it. The more data you feed a machine learning algorithm, the better it gets at predicting sepsis, detecting tumors, or recommending treatments. But the more data you pool, the more you create a target for theft, misuse, or what the authors call "inferential privacy violations," where an algorithm deduces sensitive information you never explicitly shared.

Williamson and Prybutok reviewed studies showing that even when patient names and Social Security numbers are removed, so-called "anonymized" datasets can be reidentified with shocking ease. One widely cited example is Latanya Sweeney's 2000 analysis of census data, which estimated that 87 percent of Americans could be uniquely identified using just three data points: ZIP code, birth date, and gender. Healthcare datasets contain far more than that. They contain genetic markers. They contain mental health histories. They contain the precise timing of your last panic attack.
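
To see why three ordinary fields can be enough, here is a minimal sketch in Python. The records are made up for illustration, and the helper names `unique_fraction` and `k_anonymity` are assumptions of this example, not anything from the review. It simply counts how many rows are the only one with their combination of ZIP code, birth date, and gender.

```python
from collections import Counter

# Hypothetical, made-up records: (zip_code, birth_date, gender).
# No names or Social Security numbers are present.
records = [
    ("75201", "1984-03-12", "F"),
    ("75201", "1984-03-12", "F"),
    ("75201", "1990-07-01", "M"),
    ("76205", "1975-11-30", "F"),
    ("76205", "1962-01-09", "M"),
]

def unique_fraction(rows):
    """Share of rows whose combination of fields appears exactly once."""
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)

def k_anonymity(rows):
    """Size of the smallest group of rows sharing one combination."""
    return min(Counter(rows).values())

print(unique_fraction(records))  # 0.6
print(k_anonymity(records))      # 1
```

In this toy table, three of the five "anonymous" rows are one of a kind. Anyone who already knows a person's ZIP code, birth date, and gender from another source can pick that person's row out.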

The authors found that current privacy protections are "inconsistent and often inadequate," with gaps between what regulations require and what technology can actually deliver. The result is a system where patients believe they are protected, hospitals believe they are compliant, and the data flows regardless.

Differential Privacy: The Math That Might Save You

If there is a hero in this story, it is a statistical technique called differential privacy. Williamson and Prybutok devote significant attention to it, and for good reason. Differential privacy works by injecting carefully calibrated noise into datasets, or into the statistics computed from them, before they are shared. The noise is small enough that aggregate patterns remain accurate, but large enough that no one can confidently tell whether any individual patient's record was included.

Think of it like this. Imagine you are in a room with 100 people. You want to know the average height without anyone revealing their own. Differential privacy lets each person add a random number to their height before reporting it. The average stays roughly correct, but no single person's true height is ever exposed.
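
The room analogy can be sketched in a few lines of Python. This is an illustration of the intuition only, not a formally differentially private mechanism: the 10 cm noise scale, the height range, and the function name `noisy_report` are all assumptions chosen for the example.

```python
import random
import statistics

def noisy_report(true_height_cm, noise_scale_cm, rng):
    """Each person adds their own zero-mean random noise before reporting."""
    return true_height_cm + rng.gauss(0.0, noise_scale_cm)

rng = random.Random(0)  # fixed seed so the run is repeatable

# 100 made-up heights between 150 and 190 cm.
true_heights = [rng.uniform(150.0, 190.0) for _ in range(100)]

# Only the noisy values ever leave each person's hands.
reports = [noisy_report(h, 10.0, rng) for h in true_heights]

true_mean = statistics.mean(true_heights)
noisy_mean = statistics.mean(reports)
print(round(true_mean, 1), round(noisy_mean, 1))
```

Individual reports can be off by many centimeters, but the errors largely cancel in the average, which is why the aggregate stays useful while each single answer stays blurry.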

The authors reviewed multiple implementations of differential privacy in healthcare contexts and found that it "effectively balances the trade-off between privacy and utility" when applied correctly. But here is the catch. It works only if the noise is calibrated properly. Too much noise and the AI becomes useless. Too little and privacy collapses.

Williamson and Prybutok note that the optimal noise level depends on the specific use case, the sensitivity of the data, and the trustworthiness of the data recipient. There is no universal setting. And in practice, many healthcare organizations lack the expertise to implement differential privacy correctly.
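
The calibration problem can be made concrete with the Laplace mechanism, a standard way to make a count differentially private. The sketch below is an assumption-laden illustration, not the method of any study in the review: the privacy parameter `epsilon`, the true count of 120, and the helper names are all chosen for the example. A smaller `epsilon` means stronger privacy and more noise.

```python
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query.

    Adding or removing one patient changes a count by at most 1,
    so the noise scale is 1 / epsilon.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)

def mean_abs_error(true_count, epsilon, trials, rng):
    """Average distance between the noisy answer and the true count."""
    errors = [
        abs(private_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    ]
    return sum(errors) / trials

rng = random.Random(42)
for epsilon in (0.01, 0.1, 1.0):
    # Hypothetical query: "how many patients in this ward had sepsis?"
    print(epsilon, round(mean_abs_error(120, epsilon, 2000, rng), 1))
```

The average error comes out close to 1 / epsilon. At an epsilon of 0.01 a true count of 120 is buried under noise of roughly 100, while at 1.0 the answer is off by about one patient. Neither setting is right in general, which is the authors' point.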

The Encryption Mirage

Most people assume that encrypting data solves the privacy problem. It does not. Conventional encryption protects data in transit or at rest, but most AI models need to process data in plaintext. Once the data is decrypted for analysis, it becomes vulnerable.

The authors examined hybrid approaches that combine encryption with differential privacy and found that while these layered systems offer stronger protection, they also introduce computational overhead that can slow down real-time clinical decision-making. A model that takes 30 seconds to encrypt and decrypt might be fine for analyzing research data. It is too slow for an emergency room.

This is the kind of trade-off that rarely makes it into the press releases. AI in healthcare is sold as a miracle of speed and precision. What Williamson and Prybutok reveal is that the privacy protections that should accompany that miracle often degrade the very performance that makes AI attractive in the first place.

Who Actually Owns Your Medical Data?

This question sounds philosophical, but it has concrete legal implications. In the United States, HIPAA gives patients rights to access their medical records, but it does not grant them property rights over their data. That means hospitals and insurers can aggregate patient data, de-identify it according to HIPAA's standards, and then sell it to third parties without patient consent.
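
What de-identification looks like in practice can be sketched briefly. The Python below is loosely modeled on three of the coarsening rules in HIPAA's Safe Harbor method: truncating ZIP codes to three digits, reducing dates to a year, and pooling ages of 90 and over. It is a simplified illustration with a made-up record and a hypothetical `generalize` function, not a compliant implementation. Safe Harbor lists 18 identifier types and adds conditions this sketch ignores, such as a population threshold for three-digit ZIP codes.

```python
def generalize(record):
    """Coarsen three quasi-identifiers and drop the name (illustrative only)."""
    is_oldest = record["age"] >= 90
    return {
        # Keep only the first three ZIP digits.
        "zip3": record["zip"][:3],
        # Keep the birth year only; hide it entirely for the oldest patients.
        "birth_year": None if is_oldest else record["birth_date"][:4],
        # Pool everyone aged 90 or over into one bucket.
        "age": "90+" if is_oldest else record["age"],
        # Clinical content is passed through untouched.
        "diagnosis": record["diagnosis"],
    }

patient = {  # hypothetical, made-up record
    "name": "Jane Doe",
    "zip": "75201",
    "birth_date": "1984-03-12",
    "age": 40,
    "diagnosis": "atrial fibrillation",
}

print(generalize(patient))
```

The name is gone and the ZIP code and birth date are blurred, but the diagnosis and everything else clinically interesting remain. That remainder is what gets sold, and it is also what a determined analyst can try to link back to a person.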

Williamson and Prybutok reviewed patient perception studies and found a consistent pattern. Patients overwhelmingly believe that their data is used only for their own treatment. When told that their data might be used for research or commercial purposes, most express surprise, and many feel betrayed. The authors cite data showing that trust in healthcare AI drops sharply when patients learn about secondary data uses, even when those uses are legal.

The gap between what is legal and what patients expect is not a minor detail. It is a systemic failure of informed consent.

The Blockchain Fantasy

Some researchers have proposed blockchain as a solution. The idea is that patients could control access to their medical records through a distributed ledger, granting and revoking permissions in real time. Williamson and Prybutok examined this proposal critically and found it wanting.

Blockchain offers transparency and immutability, but it struggles with scalability and data storage. Medical records contain large files like MRI scans and genomic sequences. Storing them on a blockchain is impractical. The authors also note that blockchain's transparency, which is its selling point, becomes a liability when dealing with sensitive health information. A public ledger showing who accessed your records and when might itself violate privacy.

The authors conclude that blockchain is "not a panacea" and that its integration with existing regulations like GDPR remains "fraught with unresolved conflicts." Specifically, GDPR's right to erasure, which allows patients to demand deletion of their data, is fundamentally incompatible with blockchain's immutability. You cannot delete something that is permanently recorded across thousands of nodes.
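
The erasure conflict is easy to reproduce in miniature. The Python sketch below builds a single-machine hash chain, which is only the linking idea behind a blockchain: there is no network, no consensus, and no replication. The entries and function names are made up for illustration. Each entry's hash depends on the one before it, so blanking out an old entry breaks verification.

```python
import hashlib

GENESIS = "0" * 64  # all-zero placeholder standing in for "no previous entry"

def entry_hash(prev_hash, payload):
    """Hash an entry together with the hash of the entry before it."""
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def build_chain(payloads):
    """Link each payload to its predecessor through the stored hash."""
    chain, prev = [], GENESIS
    for payload in payloads:
        prev = entry_hash(prev, payload)
        chain.append({"payload": payload, "hash": prev})
    return chain

def is_valid(chain):
    """Recompute every hash and confirm it matches what was stored."""
    prev = GENESIS
    for entry in chain:
        if entry_hash(prev, entry["payload"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

chain = build_chain([
    "patient 17: record created",
    "patient 17: lab result added",
    "patient 42: record created",
])
print(is_valid(chain))  # True

chain[0]["payload"] = ""  # attempt to erase patient 17's first entry
print(is_valid(chain))  # False
```

Deleting the content invalidates the chain, and repairing it would mean recomputing every later hash on every copy. The property that makes the ledger trustworthy is the same one that makes honoring a deletion request so awkward.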

Algorithmic Bias as a Privacy Problem

This is where Williamson and Prybutok make a connection that most analyses miss. Privacy and bias are not separate issues. They are the same issue viewed from different angles.

When healthcare AI is trained on biased data, it produces biased outcomes. A famous example involves algorithms that underdiagnosed Black patients because the training data reflected unequal access to healthcare. But the authors point out that fixing this bias requires more data from underrepresented populations. That means collecting more sensitive information about race, ethnicity, socioeconomic status, and geographic location.

The privacy risk increases. The more granular the data, the easier it is to reidentify individuals. And the populations most likely to be harmed by biased algorithms are also the populations most vulnerable to privacy violations. Low-income patients, undocumented immigrants, and people with stigmatized conditions like HIV or mental illness face higher risks from data breaches because the consequences of exposure are more severe.

Williamson and Prybutok reviewed bias detection and mitigation strategies and found that "current approaches are insufficiently integrated with privacy protections." In other words, the people working on fairness and the people working on privacy are not talking to each other. Their solutions sometimes work at cross-purposes.
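
A basic bias check shows why the two goals collide. The Python sketch below compares how often a model misses genuinely sick patients in each group. The data, the group labels "A" and "B", and the function name are all hypothetical, and this is one simple fairness measure among many rather than the approach of any study in the review.

```python
from collections import defaultdict

def false_negative_rate_by_group(rows):
    """rows holds (group, actually_sick, flagged_by_model) tuples.

    Returns, for each group, the share of sick patients the model missed.
    """
    missed = defaultdict(int)
    sick = defaultdict(int)
    for group, actually_sick, flagged in rows:
        if actually_sick:
            sick[group] += 1
            if not flagged:
                missed[group] += 1
    return {group: missed[group] / sick[group] for group in sick}

rows = [  # hypothetical, made-up outcomes
    ("A", True, True), ("A", True, True), ("A", True, False), ("A", False, False),
    ("B", True, False), ("B", True, False), ("B", True, True), ("B", False, True),
]

print(false_negative_rate_by_group(rows))
```

Here the model misses one in three sick patients in group A and two in three in group B. The audit cannot run at all without the group column, and that column is exactly the kind of sensitive attribute a privacy team would want to withhold or blur.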

The GDPR Problem

Europe's General Data Protection Regulation is often held up as the gold standard for privacy law. Williamson and Prybutok take a more nuanced view. They acknowledge that GDPR has forced organizations to take privacy seriously, but they also identify significant challenges in applying it to AI-driven healthcare.

GDPR treats health data as a special category and, where consent is the legal basis for processing, requires that consent to be explicit and tied to a specific purpose. But AI models are often trained on datasets that were collected years ago for entirely different purposes. A hospital might have collected patient records for treatment, then later decided to use them to train a sepsis prediction model. Under GDPR, that new purpose generally needs fresh consent or another recognized legal basis. But contacting every patient to ask for permission is often impractical, especially for patients who have died or changed addresses.

The authors found that "the tension between GDPR's consent requirements and AI's data hunger remains unresolved." Some researchers have proposed using broad consent, where patients agree to future unspecified uses, but this approach is controversial and may not withstand legal challenge.

What This Research Does Not Prove

Williamson and Prybutok's review is comprehensive, but it does not claim to have solved the privacy problem. The authors are clear that their work is a synthesis of existing research, not a new empirical study. They did not test differential privacy on a specific dataset. They did not survey patients directly. They compiled and analyzed what others have found.

This means that the effectiveness of specific privacy techniques in real-world healthcare settings remains an open question. The authors note that most studies on differential privacy, for example, were conducted in controlled environments using simulated data. How well these techniques perform in messy, real-world hospital systems with legacy infrastructure and overworked IT staff is not fully known.

The authors also acknowledge that patient perceptions are difficult to measure accurately. Surveys about hypothetical scenarios may not predict how patients actually behave when their health is on the line. Someone who says they care deeply about privacy might change their mind when faced with a diagnosis that could be caught earlier by an AI.

These are not weaknesses of the paper. They are honest boundaries. And they point to the next set of questions that need answering.

What This Actually Means

The Williamson and Prybutok review does not offer easy answers. But it points toward several concrete actions that would make a real difference.

▸ Differential privacy should be mandatory, not optional, for any healthcare AI system that processes patient data. The technique works when implemented correctly, and the cost of implementation is far lower than the cost of a data breach or a loss of public trust.

▸ Informed consent must be redesigned for the AI era. Current consent forms are written for lawyers, not patients. They bury the most important information and use language that even college graduates struggle to understand. Patients need to know, in plain language, exactly how their data will be used, who will have access to it, and how long it will be retained.

▸ Privacy and bias teams within healthcare organizations should be merged or at least forced to coordinate. The current siloed approach produces solutions that undermine each other. A single integrated team can design systems that are both fair and private.

▸ Regulations need to be updated to reflect technical realities. HIPAA was written in 1996, before cloud computing, before smartphones, before modern machine learning. It treats de-identification as a one-time event when in fact it can be undone later, as new datasets and techniques appear, and so needs continuous reassessment. Lawmakers should require periodic reidentification risk assessments for all datasets.

▸ Patients should have the right to audit how their data is used. This does not mean every patient needs to see every log. But independent auditors should be able to trace data flows and verify that privacy protections are actually working. The results should be published in plain language.

The woman with chest pain deserves to be saved by AI. She also deserves to know what happens to her data after she leaves the hospital. Right now, she gets the first but not the second. That is not progress. It is a trade-off made without her consent.

References

[1] S. Williamson, Victor R. Prybutok (2024). Balancing Privacy and Progress: A Review of Privacy Challenges, Systemic Oversight, and Patient Perceptions in AI-Driven Healthcare. Applied Sciences.