AI's Black Box Problem Holds Back Trust and Adoption

The Algorithm That Can’t Explain Itself

In 2016, a team of researchers at a major hospital trained an AI to detect pneumonia from chest X-rays. The model achieved 92 percent accuracy. It was better than most radiologists. The hospital prepared to deploy it in emergency rooms. Then someone asked a simple question: How does it know?

The team ran the model through an explainability tool. The result was unsettling. The AI had learned to associate pneumonia not with the white patches in the lungs, but with the presence of a metal token placed in the corner of every X-ray image. That token was a marker used by technicians to indicate the patient’s left side. In the training data, sicker patients were more likely to be X-rayed in a specific position, and the token appeared more often in those images. The AI had found a shortcut. It was cheating.

That story is not in the paper by Sajid Ali, Tamer Abuhmed, Shaker El-Sappagh, and Khan Muhammad, but it captures the problem they reviewed 410 articles to understand. Their comprehensive review, published in Information Fusion, synthesizes what we know about explainable AI and what is still broken. The title of their paper is polite: “Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence.” The subtext is less polite. We are deploying systems we do not understand, and the consequences are mounting.

Why Your Doctor Won’t Trust the Algorithm

The black box problem is not a philosophical puzzle. It is a practical barrier. When a model denies a loan application, flags a patient for sepsis, or recommends a prison sentence, the person affected wants to know why. So does the person responsible for the decision. But many of the most powerful AI models, especially deep neural networks, operate in a way that is fundamentally opaque. They learn patterns in high-dimensional space that human brains cannot visualize. They are brilliant at pattern recognition and terrible at self-explanation.

Ali and his colleagues conducted a systematic review of 410 articles published between January 2016 and October 2022. They categorized the field into four axes: data explainability, model explainability, post hoc explainability, and assessment of explanations. The taxonomy itself reveals something important. The authors found that the field has been fragmented, with researchers working on different parts of the problem without a shared vocabulary.

The core tension is this. You can build a model that is interpretable by design, like a simple decision tree, but on complex data it will often be less accurate than a deep neural network. Or you can use a black box model and then apply a second model to explain the first one. That is post hoc explainability, and it is where most of the research has landed. But there is an uncomfortable circularity. You are using one opaque system to explain another.

The Four Axes of Not Knowing

Data Explainability: Garbage In, Gospel Out

The first axis is the easiest to grasp. Before you even build a model, you can ask what the data contains. The authors found that data explainability involves understanding the distribution, biases, and provenance of the training data. This seems obvious. But in practice, most AI projects skip this step. The hospital pneumonia model would have been caught if someone had asked: “Does the training data contain spurious correlations that have nothing to do with the disease?”
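The question in that last sentence can be turned into a routine check. What follows is a minimal sketch, not the authors' method: it compares how often the positive label appears with and without a feature that should be clinically irrelevant. The field names `marker_token` and `pneumonia`, and the eight toy records, are hypothetical and exist only for illustration.

```python
def positive_rate_by_flag(records, flag, label):
    """Return the share of positive labels among records with and without a flag."""
    counts = {True: [0, 0], False: [0, 0]}  # flag value -> [positives, total]
    for record in records:
        bucket = counts[bool(record[flag])]
        bucket[0] += int(record[label])
        bucket[1] += 1
    return {
        value: (positives / total if total else None)
        for value, (positives, total) in counts.items()
    }


# Hypothetical toy data: "marker_token" is a scan artifact, not a clinical feature.
records = (
    [{"marker_token": True, "pneumonia": 1}] * 3
    + [{"marker_token": True, "pneumonia": 0}] * 1
    + [{"marker_token": False, "pneumonia": 1}] * 1
    + [{"marker_token": False, "pneumonia": 0}] * 3
)

rates = positive_rate_by_flag(records, "marker_token", "pneumonia")
gap = rates[True] - rates[False]
print(rates, gap)
```

A large gap for a feature with no causal link to the disease is a reason to look at how the data was collected before any model is trained on it.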

The authors note that data explainability is often neglected because it is boring. It does not involve fancy algorithms. It involves looking at spreadsheets and asking uncomfortable questions about how the data was collected.

Model Explainability: The Glass Box Fantasy

The second axis is model explainability. This is the ideal: a model that is inherently interpretable, where you can trace every decision back to a specific rule or feature. Linear regression is interpretable. Decision trees with a small number of branches are interpretable. But the authors found that these models are rarely used in high-stakes applications because they cannot capture the complexity of real-world data. They are readable but often not accurate enough to be useful.

Post Hoc Explainability: The Explanations That Lie

The third axis is where most of the action is. Post hoc methods take a trained black box model and generate explanations after the fact. The two most popular methods are LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations). LIME works by perturbing the input, seeing how the output changes, and fitting a simple surrogate model to those local changes. SHAP uses Shapley values from cooperative game theory to assign credit to each feature.
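To make the game theory idea concrete, here is a minimal sketch of exact Shapley values for a single prediction. It is not the SHAP library and not the authors' code. It assumes a hypothetical three-feature `toy_model` and an all-zeros baseline that stands in for "feature absent", and it enumerates every coalition of features, which is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction, by enumerating all coalitions."""
    n = len(x)
    features = range(n)

    def value(coalition):
        # Features in the coalition take the instance's values; the rest stay at baseline.
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return model(z)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                total += weight * (with_i - without_i)
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical scoring function with an income-age interaction.
    income, history, age = z
    return 2.0 * income + 1.0 * history + 0.5 * income * age


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

The credits add up to the gap between the prediction and the baseline output, and the interaction term is split evenly between income and age. Note that the result depends on the baseline chosen, which is itself a modeling decision.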

Ali and his colleagues reviewed these methods extensively. They found that post hoc explanations are fragile. Small changes to the input can produce radically different explanations. The methods are also unstable. Run the same explanation tool twice on the same prediction, and you might get different results. The authors cite studies showing that LIME and SHAP can produce explanations that are inconsistent with the model’s actual behavior.
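The run-to-run instability is easy to reproduce in miniature. The sketch below is a simplified, LIME-style procedure, not the LIME library itself: it perturbs one input with random noise and estimates a local slope per feature. The `toy_model`, the noise scale, and the sample count are assumptions chosen for illustration. The only thing that changes between the two runs is the random seed.

```python
import random


def local_slopes(model, x, n_samples, seed, scale=0.5):
    """Perturb x with Gaussian noise and estimate a local slope for each feature."""
    rng = random.Random(seed)
    base = model(x)
    deltas, changes = [], []
    for _ in range(n_samples):
        d = [rng.gauss(0.0, scale) for _ in x]
        deltas.append(d)
        changes.append(model([xi + di for xi, di in zip(x, d)]) - base)
    slopes = []
    for i in range(len(x)):
        # Univariate least squares through the origin: output change vs. feature change.
        cov = sum(d[i] * dy for d, dy in zip(deltas, changes))
        var = sum(d[i] ** 2 for d in deltas)
        slopes.append(cov / var)
    return slopes


def toy_model(z):
    # Hypothetical nonlinear scoring function.
    income, history, age = z
    return 2.0 * income + 1.0 * history + 0.5 * income * age


x = [1.0, 1.0, 1.0]
run_a = local_slopes(toy_model, x, n_samples=20, seed=0)
run_b = local_slopes(toy_model, x, n_samples=20, seed=1)
print(run_a)
print(run_b)
```

Same model, same input, same method, different numbers. Fixing the seed makes a run repeatable, but it does not make the estimate more correct. Drawing more samples narrows the spread at the cost of compute.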

This is not a bug. It is a feature of the approach. You are approximating a complex function with a simpler one. Approximations are always wrong. The question is whether they are wrong in ways that matter.

Assessment of Explanations: Who Decides What “Good” Means?

The fourth axis is the most neglected. How do you evaluate whether an explanation is any good? The authors found no consensus. Some researchers ask human subjects to rate explanations. Others use synthetic data where the ground truth is known. Still others use mathematical metrics that measure fidelity or stability.
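As one example of the mathematical metrics mentioned above, fidelity can be scored as how much of the black box's output variation a simple explanation model reproduces. This is a sketch under stated assumptions rather than a standard from the paper: the `black_box` and `surrogate` functions and both sets of evaluation points are hypothetical, and R squared is only one of several possible fidelity scores.

```python
def fidelity_r2(model, surrogate, points):
    """R squared of the surrogate's outputs against the model's outputs."""
    y = [model(p) for p in points]
    y_hat = [surrogate(p) for p in points]
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def black_box(p):
    # Hypothetical model with a pure interaction between two features.
    return p[0] * p[1]


def surrogate(p):
    # Linear explanation fitted around the point (1, 1): the tangent plane there.
    return p[0] + p[1] - 1.0


near = [(0.9, 0.9), (1.1, 1.1), (0.9, 1.1), (1.1, 0.9)]
far = [(0.0, 0.0), (3.0, 3.0), (0.0, 3.0), (3.0, 0.0)]

local_fidelity = fidelity_r2(black_box, surrogate, near)
global_fidelity = fidelity_r2(black_box, surrogate, far)
print(local_fidelity, global_fidelity)
```

The same explanation scores very differently depending on where it is evaluated, which is one reason a single fidelity number says little without stating the region it was measured on.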

The problem is that “good” means different things to different people. A doctor wants to know which features in the X-ray led to the diagnosis. A patient wants to know what they can do differently. A regulator wants to know if the model is fair across demographic groups. The authors argue that explanations must be tailored to the user type. A single explanation cannot serve all purposes.

The Legal Hammer That Forces the Issue

The paper devotes significant attention to what the authors call “XAI concerns.” These are not technical problems. They are legal and regulatory demands. The European Union’s General Data Protection Regulation (GDPR) is widely read as granting a right to explanation for automated decisions, although the scope of that right is debated. The EU’s proposed AI Act goes further, requiring explainability for high-risk systems. In the United States, the Algorithmic Accountability Act would require companies to assess the impact of their automated systems.

The authors note that these regulations are ahead of the technology. There is no standardized way to produce explanations that satisfy legal requirements. A bank that denies a loan using a black box model cannot simply say “the algorithm decided.” They need to provide a specific reason. But if the explanation is unreliable, the bank is exposed to legal liability.

This creates a perverse incentive. Companies may choose to use simpler, less accurate models simply because they can be explained. Or they may use black box models and produce explanations that are technically compliant but meaningless. The authors warn that this could lead to a “check the box” approach to explainability, where the form is satisfied but the spirit is violated.

The Case Study That Changes Everything

The paper includes a case study that illustrates the problem concretely. The authors apply several XAI methods to a credit scoring dataset. They train a deep neural network to predict whether a loan applicant will default. Then they use LIME, SHAP, and two other methods to explain individual predictions.

The results are sobering. Different methods produce different explanations for the same prediction. One method says the applicant’s income was the most important factor. Another method says it was their credit history. A third method says it was their age. All of these methods claim to be explaining the same model. The authors found that the explanations are not just different. They are contradictory.

This is not a failure of any single method. It is a fundamental limitation of post hoc explainability. You are trying to summarize a complex decision boundary in a few features. The summary will always be incomplete. The question is whether the incompleteness is acceptable.
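Disagreement of this kind can be measured rather than just observed. The sketch below computes top-k agreement, the share of the k highest-ranked features that two explanations have in common. The method names and attribution scores are made up for illustration and are not results from the paper's case study.

```python
from itertools import combinations


def top_k(attributions, k):
    """Names of the k features with the largest absolute attribution."""
    ranked = sorted(attributions, key=lambda f: abs(attributions[f]), reverse=True)
    return set(ranked[:k])


def top_k_agreement(a, b, k):
    """Share of the top-k features that two explanations have in common."""
    return len(top_k(a, k) & top_k(b, k)) / k


# Hypothetical attributions for the same prediction from three methods.
explanations = {
    "method_a": {"income": 0.42, "credit_history": 0.31, "age": 0.05},
    "method_b": {"income": 0.10, "credit_history": 0.48, "age": 0.12},
    "method_c": {"income": 0.08, "credit_history": 0.15, "age": 0.39},
}

agreement = {
    (m1, m2, k): top_k_agreement(explanations[m1], explanations[m2], k)
    for m1, m2 in combinations(explanations, 2)
    for k in (1, 2)
}
for key, score in agreement.items():
    print(key, score)
```

Low agreement does not say which method is right. It says that the explanation should not be reported to an applicant or a regulator without further testing.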

What the Paper Does Not Prove

The authors are careful to note what their review does not establish. Beyond the single case study, they did not run systematic experiments to compare XAI methods head-to-head. They did not conduct user studies to measure whether explanations actually improve trust. They synthesized existing research, and that research has gaps.

One gap is the lack of ground truth. In most real-world applications, we do not know the “correct” explanation. We only know the model’s output. This makes it hard to evaluate whether an explanation is accurate. The authors call for more research using synthetic data where the ground truth is known, but they acknowledge that synthetic data may not capture the messiness of real-world problems.

Another gap is the cultural dimension. The authors note that most XAI research has been conducted in Western, educated, industrialized, rich, and democratic (WEIRD) populations. What counts as a satisfactory explanation may vary across cultures. A study in Japan might find that people prefer explanations that emphasize group harmony. A study in Germany might find that people want mechanistic explanations. The field has not grappled with this.

The Uncomfortable Truth About Trust

The word “trust” appears in the title of the paper, but the authors do not define it. This is not an oversight. It is a reflection of the field’s confusion. Trust is a psychological state. It involves vulnerability and expectation. An AI system does not have trust. Humans have trust in AI systems.

The authors found that most XAI research assumes that providing explanations will increase trust. But the evidence is mixed. Some studies show that explanations increase trust, even when the explanations are wrong. Other studies show that explanations decrease trust, especially when they reveal that the model is using biased or irrelevant features.

The most troubling finding is that explanations can be used to manipulate trust. A company could provide explanations that make a biased model look fair. They could cherry-pick explanations that cast the model in a favorable light. The authors warn that XAI methods can be used for “explainability washing,” the AI equivalent of greenwashing.

What This Actually Means

▸Stop treating explanations as truth. Every post hoc explanation is an approximation. It is a summary, not a transcript. Treat it as a hypothesis to be tested, not a definitive answer. If an explanation says a patient’s age was the most important factor in a diagnosis, ask: “What happens if we remove age from the model?”

▸Match the explanation to the user. A regulator needs different information than a doctor. A patient needs different information than a technician. The authors found that most XAI research ignores this. Build systems that can generate multiple explanations for the same prediction, each tailored to a specific audience.

▸Invest in data explainability first. Before you build a model, understand your data. The authors found that data explainability is often neglected, but it is also among the most powerful checks available. If your training data contains spurious correlations, no amount of post hoc explanation will save you.

▸Regulate the output, not the model. Instead of requiring that models be interpretable by design, regulators could require that decisions be explainable in a way that is meaningful to the affected person. This shifts the burden from the model builder to the deployer. It also forces companies to think about what a good explanation looks like.

▸Assume the explanation is wrong until proven otherwise. The authors found that XAI methods are unstable and contradictory. Do not make high-stakes decisions based on a single explanation. Run multiple methods. Compare them. If they disagree, figure out why. The disagreement itself is information.
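Treating an explanation as a hypothesis can start with a simple ablation test. The sketch below uses a cheap proxy for removing a feature from the model: it replaces one feature at a time with a baseline value and records how much the prediction moves. Retraining without the feature would be the stricter test. The `toy_model`, the all-zeros baseline, and the claimed top feature are hypothetical.

```python
def ablation_effects(model, x, baseline):
    """Change in prediction when each feature alone is reset to its baseline value."""
    full = model(x)
    effects = []
    for i in range(len(x)):
        z = list(x)
        z[i] = baseline[i]
        effects.append(full - model(z))
    return effects


def toy_model(z):
    # Hypothetical scoring function with an income-age interaction.
    income, history, age = z
    return 2.0 * income + 1.0 * history + 0.5 * income * age


names = ["income", "credit_history", "age"]
x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]

effects = ablation_effects(toy_model, x, baseline)
tested_top = names[max(range(len(effects)), key=lambda i: abs(effects[i]))]

claimed_top = "age"  # what a hypothetical explanation reported as most important
print(dict(zip(names, effects)), tested_top, claimed_top == tested_top)
```

Here the claim fails the test: resetting age barely moves the prediction, while resetting income moves it most. An ablation is also an approximation, so a mismatch is a prompt to investigate, not a verdict.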

The black box is not going away. Deep neural networks are too powerful to abandon. But we can stop pretending that we understand them. The first step toward trustworthy AI is admitting that we do not know. The second step is building systems that help us find out.

References

[1] Sajid Ali, Tamer Abuhmed, Shaker El-Sappagh, Khan Muhammad (2023). Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence. Information Fusion.