AI Ethics Guidelines Fail Without Enforcement Mechanisms

When the European Commission published its Ethics Guidelines for Trustworthy AI in 2019, the document ran 41 pages. It listed seven requirements: human agency and oversight; technical robustness and safety; privacy and data governance; transparency; diversity, non-discrimination and fairness; societal and environmental well-being; and accountability. It was a beautiful document. It was also, for all practical purposes, a suggestion box.

The problem is not that the guidelines were wrong. The problem is that they were toothless. And according to a comprehensive 2024 review by Ciro Mennella, Umberto Maniscalco, Giuseppe De Pietro, and Massimo Esposito, this gap between ethical aspiration and regulatory reality is not just a political inconvenience. It is a direct threat to patient safety in healthcare AI.

The authors of that review, published in Heliyon, examined the ethical and regulatory landscape of AI in clinical settings. Their conclusion is blunt: the current system of voluntary compliance and aspirational ethics is failing. And the consequences are not theoretical.

The Paper That Refuses to Look Away

Mennella et al. (2024) conducted a narrative review, meaning they synthesized existing research rather than running new experiments. But narrative reviews, when done rigorously, serve a critical function: they take the scattered evidence from hundreds of studies and ask what the whole picture looks like. The authors searched databases including PubMed, Scopus, and IEEE Xplore, focusing on peer-reviewed articles published between 2015 and 2023 that addressed ethical, legal, or regulatory challenges of AI in healthcare. They did not limit themselves to one country or one regulatory framework. They looked at the global landscape.

What they found is a system that looks organized on paper but is chaotic in practice.

The Ethics Paradox

Here is the contradiction that Mennella et al. (2024) identify: healthcare AI is being deployed faster than regulators can keep up, yet the ethical frameworks meant to govern it are multiplying like rabbits. Every major institution has one. The World Health Organization published its own in 2021. The OECD has one. The IEEE has one. Individual hospitals have them. But none of these documents carry the force of law.

The authors write that "a robust governance framework is imperative to foster the acceptance and successful implementation of AI in healthcare." That sentence sounds like boilerplate. It is not. It is a diagnosis. Because right now, we have the opposite: a fragmented patchwork of voluntary guidelines that no one is required to follow.

Consider what happens when a hospital deploys an AI system that recommends treatment plans. If the system makes a mistake, who is liable? The hospital? The software developer? The doctor who followed the AI's recommendation? The authors found that current legal frameworks do not answer this question clearly. And without clear liability, ethics guidelines become what one might call aspirational literature.

The Black Box Problem

One of the most striking findings in Mennella et al. (2024) concerns transparency, or the lack of it. Many AI systems used in clinical settings are what researchers call "black boxes." They produce outputs, but the reasoning behind those outputs is opaque even to the engineers who built them.

The authors note that this opacity violates a core principle of medical ethics: informed consent. How can a patient consent to an AI-assisted diagnosis if the doctor cannot explain how the AI reached its conclusion? The answer is that they cannot. And yet these systems are being used.

The review cites evidence that some AI systems perform differently across demographic groups, a problem known as algorithmic bias. But because the systems are opaque, detecting this bias requires external auditing. And external auditing requires access to the underlying code and training data, which developers often refuse to share, citing proprietary concerns.

This is where the absence of enforcement becomes visible. A guideline that says "AI systems should be transparent" means nothing if there is no mechanism to force developers to open their black boxes.
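To make the auditing point concrete, here is a minimal sketch of the kind of check an external auditor might run if given access to a system's predictions and confirmed outcomes. Everything in it is an assumption for illustration: the `Case` record, the group labels "A" and "B", and the made-up counts are hypothetical and do not come from the review. It compares sensitivity (the share of true disease cases the system flagged) across groups.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Case:
    group: str         # demographic label (hypothetical categories)
    has_disease: bool  # ground truth, e.g. from clinical follow-up
    flagged: bool      # what the AI system predicted


def sensitivity_by_group(cases):
    """Return {group: fraction of true disease cases the system flagged}."""
    hits = defaultdict(int)
    positives = defaultdict(int)
    for case in cases:
        if case.has_disease:
            positives[case.group] += 1
            if case.flagged:
                hits[case.group] += 1
    return {g: hits[g] / positives[g] for g in positives}


def largest_gap(rates):
    """Difference between the best-served and worst-served groups."""
    return max(rates.values()) - min(rates.values())


# Made-up illustrative records, not data from the review.
cases = (
    [Case("A", True, True)] * 9 + [Case("A", True, False)] * 1
    + [Case("B", True, True)] * 6 + [Case("B", True, False)] * 4
    + [Case("A", False, False)] * 5 + [Case("B", False, False)] * 5
)

rates = sensitivity_by_group(cases)
print(rates)
print(f"gap: {largest_gap(rates):.2f}")
```

The arithmetic is trivial. The hard part is the input: without labeled outcomes broken down by group, which only the developer or the deploying hospital holds, nobody outside can run even this much.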

The FDA's Dilemma

The United States Food and Drug Administration has tried to address this. In 2023, the FDA published draft guidance on how it would regulate AI-enabled medical devices. But Mennella et al. (2024) point out a fundamental problem: the FDA's framework was designed for static devices, not for AI systems that learn and change over time.

A traditional medical device, like a pacemaker, does not update itself. An AI system can. It can be retrained on new data, subtly altering its behavior. The FDA's current approach requires developers to submit a new application for each major change. But what counts as a major change? The authors found no consensus. Some developers argue that retraining on new data is a minor update. Patient safety advocates argue it is a fundamental change that requires new review.

Without enforcement mechanisms, each developer gets to decide for themselves. The guideline says "ensure safety." The enforcement says nothing.

Europe's GDPR: A Case Study in Good Intentions

The European Union's General Data Protection Regulation (GDPR) is often held up as a model for AI regulation. It includes provisions that seem directly relevant: what is often described as a right to explanation, the requirement for data protection impact assessments, and strict rules about automated decision-making.

But Mennella et al. (2024) found that GDPR's application to healthcare AI is riddled with gaps. The right to explanation, for example, sounds powerful. But the regulation does not specify what counts as a sufficient explanation. Does it mean the doctor must understand the AI's reasoning? Does it mean the patient must understand it? The authors found that in practice, explanations are often technical and incomprehensible to patients.

More critically, GDPR's enforcement depends on national data protection authorities, which vary wildly in resources and aggressiveness. Ireland's Data Protection Commission, which oversees many of the largest tech companies because their European headquarters are in Dublin, has been criticized for slow enforcement. Germany's authorities, by contrast, have been more active. The result is a patchwork where the same AI system might face strict oversight in one country and none in another.

The authors argue that this inconsistency undermines trust. Patients cannot know whether the AI system diagnosing them has been meaningfully reviewed or simply deployed in a regulatory vacuum.

The Liability Question

Here is where the review gets specific about what enforcement would actually look like. Mennella et al. (2024) identify liability as the single most important missing piece.

Consider a scenario from the paper: an AI system recommends a treatment that harms a patient. Who is responsible? The doctor who followed the recommendation? The hospital that purchased the system? The developer who trained it on potentially biased data? The regulator who approved it?

Current legal frameworks, the authors found, were designed for a world where humans make decisions and machines are tools. AI systems blur this distinction. When a system learns from data and makes autonomous recommendations, it is no longer a simple tool. It is more like a collaborator with unknown capabilities and limitations.

The review notes that some courts have begun to address this. In 2023, a German court ruled that a hospital could be held liable for errors made by an AI system it deployed. But this is case law, not legislation. It varies by jurisdiction. And it leaves developers largely insulated from liability, which the authors argue creates perverse incentives.

If developers face no consequences when their systems fail, they have little reason to invest in safety testing, transparency, or bias detection. The ethics guidelines say they should. The market says they do not have to.

What Enforcement Would Actually Look Like

Mennella et al. (2024) do not just diagnose the problem. They propose solutions, though they are careful to note that these are recommendations, not proven fixes.

The authors call for mandatory pre-market approval for high-risk healthcare AI systems, similar to the process for new drugs. This would require developers to submit evidence of safety and efficacy before deployment. They also call for continuous post-market surveillance, because AI systems change over time. A system that passes pre-market testing might develop dangerous behaviors after retraining on new data.
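As a rough illustration of what continuous post-market surveillance could mean in practice, the sketch below compares each reporting period's accuracy against the pre-market baseline and flags any period that falls too far below it. This is an assumption-laden example, not a procedure described in the review: the `Window` record, the period labels, the counts, and the 0.05 tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Window:
    label: str    # a reporting period or model version (hypothetical)
    correct: int  # predictions later confirmed by clinicians
    total: int    # all predictions reviewed in the window

    @property
    def accuracy(self):
        return self.correct / self.total


def flag_regressions(baseline, windows, tolerance=0.05):
    """Return labels of windows whose accuracy fell more than
    `tolerance` below the pre-market baseline."""
    floor = baseline.accuracy - tolerance
    return [w.label for w in windows if w.accuracy < floor]


# Made-up numbers for illustration only.
baseline = Window("pre-market", correct=920, total=1000)
post_market = [
    Window("2024-Q1", correct=455, total=500),
    Window("2024-Q2 (retrained)", correct=410, total=500),
]

print(flag_regressions(baseline, post_market))
```

With these invented figures only the retrained period is flagged. Where to set the tolerance, and who is obliged to act when a flag is raised, are exactly the questions a voluntary guideline leaves open.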

They also recommend mandatory transparency requirements. Developers would have to disclose training data sources, known limitations, and performance across different demographic groups. This information would be publicly available, allowing independent researchers to audit the systems.

And they call for clear liability frameworks that assign responsibility to developers, not just to clinicians or hospitals. This would create financial incentives for safety.

The authors acknowledge that these recommendations face political and practical obstacles. Developers argue that transparency requirements would force them to reveal trade secrets. Pre-market approval would slow innovation. Liability frameworks would discourage investment.

But the authors counter that these objections apply equally to the pharmaceutical industry, which faces all of these requirements and still produces new drugs. The difference is that drug regulation has enforcement mechanisms. AI regulation, for the most part, does not.

The Global Divide

One of the most uncomfortable findings in Mennella et al. (2024) concerns global inequality. The review found that high-income countries are developing AI systems using data from their own populations, then exporting these systems to low- and middle-income countries.

This creates a double problem. First, the systems may not perform well on populations different from the training data. An AI trained on electronic health records from Sweden may miss diseases common in sub-Saharan Africa. Second, the exporting countries may have strong ethical guidelines, but the importing countries often lack the regulatory infrastructure to enforce them.

The authors found that some AI developers engage in regulatory arbitrage: they deploy, in countries with weak oversight, systems that would not pass review in their home markets. The ethics guidelines of the exporting country do not apply. The importing country may have no enforceable rules of its own. The result is a regulatory vacuum where patient safety depends entirely on the developer's voluntary compliance.

This is not a hypothetical. The review cites documented cases of AI systems deployed in low-resource settings without adequate validation, leading to misdiagnoses and inappropriate treatments.

What the Research Does Not Prove

It is important to be precise about what Mennella et al. (2024) do and do not show. This is a narrative review, not a systematic meta-analysis. It synthesizes existing evidence but does not produce new quantitative findings. The authors did not measure the frequency of AI failures in clinical settings. They did not compare outcomes between regulated and unregulated environments.

What they did was identify structural weaknesses in the current regulatory landscape. Their argument is logical, supported by case studies and legal analysis, but it is not experimental. A skeptic could argue that the evidence of harm is anecdotal, that the problems are manageable within existing frameworks, that regulation would stifle beneficial innovation.

The authors acknowledge these counterarguments but find them unpersuasive. They point to the growing number of documented failures, the increasing complexity of AI systems, and the accelerating pace of deployment. They argue that waiting for definitive proof of widespread harm is itself a risk, because by the time that proof arrives, the systems will be too embedded to regulate effectively.

This is a legitimate scientific disagreement. The review does not settle it. What it does is frame the stakes clearly.

The Missing Piece: Political Will

Reading Mennella et al. (2024), one gets the sense that the authors are frustrated. They have identified the problem clearly. They have proposed solutions that are technically feasible. And yet nothing changes.

The reason is political. Regulation requires enforcement, and enforcement requires resources, authority, and the willingness to confront powerful interests. AI developers have lobbyists. They have trade associations. They have the argument that regulation will cede technological leadership to China or other competitors.

The authors do not say this explicitly, but the implication is clear: ethics guidelines exist precisely because enforcement is too politically difficult. They are a substitute for real action. They allow policymakers to say "we addressed the issue" without actually addressing it.

What This Actually Means

▸ If you are a hospital administrator considering an AI system, do not rely on the developer's ethics claims. Demand independent validation. Ask for transparency about training data and known limitations. If they refuse, consider that a red flag, not a trade secret.
▸ If you are a clinician using AI recommendations, document every case where you follow or override the system's advice. Current liability frameworks are unclear, and documentation is your best protection. The law has not caught up to the technology, and it may not protect you.
▸ If you are a patient, ask your doctor whether AI was involved in your diagnosis or treatment. You have the right to know. If the doctor cannot explain how the AI reached its conclusion, that is a problem with the system, not with your question.
▸ If you are a policymaker, stop writing ethics guidelines and start building enforcement mechanisms. Pre-market approval, post-market surveillance, transparency requirements, and clear liability frameworks are not optional. They are the difference between regulation and theater.
▸ If you are a developer, recognize that the current regulatory vacuum will not last. The public backlash is coming. The lawsuits are coming. The smart move is to adopt rigorous safety practices now, voluntarily, before they are imposed on you. Because when they are imposed, they will be harsher than anything you would have chosen for yourself.

The ethics guidelines are not failing because they are wrong. They are failing because they are empty. And empty words do not protect patients.

References

[1] Ciro Mennella, Umberto Maniscalco, Giuseppe De Pietro, Massimo Esposito (2024). Ethical and regulatory challenges of AI technologies in healthcare: A narrative review. Heliyon.
