The Black Box Apologists Have Failed

In 2016, a team of doctors fed chest X-rays to a neural network. The AI detected pneumonia with alarming accuracy, often outperforming human radiologists. Then someone asked why. The machine could not say. It could only point to probabilities, not patterns, not reasoning, not a single coherent explanation. The doctors had a choice: trust the black box or ignore a tool that might save lives.
They chose neither. They demanded explanations.
Eight years later, the field of Explainable AI has produced thousands of papers, dozens of toolkits, and a mountain of good intentions. But according to a new manifesto published in Information Fusion by Luca Longo and 27 coauthors, the whole enterprise is stuck. The authors identify 28 open problems in XAI, organized into nine categories, and they argue that the field has been chasing the wrong questions (Longo et al., 2024). The core problem is not technical. It is philosophical.
The authors, drawn from computer science, cognitive psychology, philosophy, and law, claim that XAI has become a kind of cargo cult. Researchers build explanations that look like explanations but do not actually help anyone understand anything. The explanations are too simple, too narrow, or too detached from how humans actually reason. And the consequences are not academic. When an AI denies a loan, recommends a prison sentence, or misdiagnoses a tumor, an explanation that does not work is worse than no explanation at all. It gives the illusion of accountability without the reality.
Longo and his colleagues call their paper a manifesto. They want to reset the field. They want XAI 2.0.
What XAI Gets Wrong

The Explanation That Explains Nothing
The most popular XAI methods today fall into two camps. One camp builds simple, interpretable models that approximate the behavior of a black box. The other generates feature attributions, which tell you which input variables mattered most for a given prediction. LIME and SHAP are the poster children of this second approach. They produce numbers and heatmaps. They look scientific.
But Longo et al. (2024) argue that these methods fail a basic test: they do not actually explain. A heatmap that highlights pixels in an image does not tell you why the model thinks those pixels matter. It does not tell you what the model would do if the image changed. It does not tell you whether the model is relying on spurious correlations, like the presence of a hospital bed in a chest X-ray rather than the pathology itself.
The authors call this the "explanation versus interpretation" problem. An interpretation is a mapping from inputs to outputs. An explanation is a story that connects causes to effects, that accounts for counterfactuals, that lets a human reason about what would happen under different circumstances. Most XAI tools produce interpretations. They do not produce explanations. And the difference matters because humans do not think in heatmaps. They think in narratives.
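To make the gap between an attribution and an explanation concrete, here is a minimal sketch of the Shapley-value idea that SHAP approximates, computed exactly for a two-feature toy model. Everything in it is an assumption made for illustration: `toy_model`, the `opacity` and `bed` features, and the all-zero baseline are hypothetical and do not come from the manifesto or from any real radiology system.

```python
from itertools import combinations
from math import factorial

def toy_model(x):
    """Hypothetical scoring function standing in for a black box."""
    # 'bed' is a stand-in for a spurious cue; it also interacts with 'opacity'.
    return 2.0 * x["opacity"] + 1.0 * x["bed"] + 0.5 * x["opacity"] * x["bed"]

def shapley_values(model, instance, baseline):
    """Exact Shapley attributions by enumerating every feature coalition."""
    features = list(instance)
    n = len(features)

    def value(coalition):
        # Features in the coalition take the instance's value, the rest the baseline's.
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in features}
        return model(x)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                # Standard Shapley weight for a coalition of size k.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(subset) | {f}) - value(set(subset)))
        phi[f] = total
    return phi

instance = {"opacity": 1.0, "bed": 1.0}
baseline = {"opacity": 0.0, "bed": 0.0}
phi = shapley_values(toy_model, instance, baseline)
print(phi)  # {'opacity': 2.25, 'bed': 1.25}
```

The attributions add up to the difference between the prediction and the baseline, which is why they look rigorous. But nothing in the output says whether the weight on `bed` reflects pathology or a shortcut, and that is the point the authors are making.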
The Audience Problem
Who is an explanation for? The obvious answer is the end user, the person whose life is affected by the AI decision. But Longo et al. (2024) point out that XAI research has largely ignored this question. Most methods are evaluated by machine learning researchers using technical metrics, not by domain experts or laypeople using comprehension tests.
The authors break down the audience into at least five groups: developers, domain experts, regulators, affected individuals, and the general public. Each group needs a different kind of explanation. A developer wants to debug a model. A doctor wants to verify a diagnosis. A judge wants to assess fairness. A patient wants to understand why they were denied coverage. A regulator wants to audit the system at scale.
Current XAI methods treat all these audiences as interchangeable. They are not. And the result is that explanations designed for one group are useless or misleading for another. Longo et al. (2024) call this the "stakeholder alignment" problem, and they argue that solving it requires abandoning the one-size-fits-all approach.
The 28 Problems That Matter

Longo and his coauthors do not just complain. They catalog. The manifesto lists 28 open problems, each with a description and a proposed research direction. The problems fall into nine categories: foundational definitions, evaluation metrics, human-centered design, causal reasoning, uncertainty communication, fairness and bias, regulatory compliance, scalability, and interdisciplinary integration.
Some of these problems are technical. How do you scale explanations to models with billions of parameters? How do you ensure that explanations are faithful to the underlying model, not just plausible fictions? How do you communicate uncertainty without undermining trust?
But many of the problems are conceptual. The authors argue that the field lacks a shared definition of what an explanation even is. They point out that philosophers have debated this question for centuries, and that AI researchers cannot solve it by fiat. They also argue that XAI has ignored the role of causality. Most explanations are correlational. They tell you what features matter, but not why they matter or what would happen if you changed them. Real explanations, the kind that humans find satisfying, are causal. They allow you to reason about interventions.
Longo et al. (2024) also highlight the problem of "explanation evaluation." How do you know if an explanation is good? Current metrics measure things like fidelity, which is the degree to which the explanation matches the model's behavior. But fidelity does not guarantee usefulness. An explanation can be perfectly faithful and completely incomprehensible. The authors call for new metrics that measure human comprehension, decision quality, and trust calibration.
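A rough sketch of what a fidelity score measures may help here. It assumes one common reading of fidelity: the fraction of inputs on which a simple surrogate reproduces the black box's decision. The two decision rules, their coefficients, and the evaluation grid are all hypothetical.

```python
def black_box(income, debt):
    """Hypothetical opaque model: approves when a nonlinear score clears a threshold."""
    return income - 1.5 * debt - 0.02 * debt * debt > 20

def surrogate(income, debt):
    """Hypothetical interpretable stand-in: a single linear rule."""
    return income - 2.0 * debt > 20

def fidelity(model, explainer, points):
    """Fraction of inputs on which the explainer matches the model's decision."""
    agree = sum(model(*p) == explainer(*p) for p in points)
    return agree / len(points)

# Evaluate on a fixed grid of (income, debt) pairs so the result is deterministic.
grid = [(income, debt) for income in range(0, 101, 5) for debt in range(0, 51, 5)]
score = fidelity(black_box, surrogate, grid)
print(f"surrogate fidelity on the grid: {score:.2f}")
```

The score only counts agreement between two functions. It contains no term for whether a person reading the surrogate understands it, decides better with it, or trusts it appropriately, which is the gap the proposed human-centered metrics are meant to fill.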
What the Research Does Not Prove
The manifesto is ambitious, but it has limits. The authors do not present a unified theory of explanation. They do not offer a single algorithm or toolkit. They do not claim to have solved any of the 28 problems. Instead, they lay out a roadmap and invite the community to follow.
The paper also does not prove that current XAI methods are actively harmful. It argues that they are insufficient, not that they cause damage. But the authors cite evidence that poorly designed explanations can mislead users, create false confidence, or obscure systemic biases. A 2022 study found that users who saw SHAP explanations were more likely to trust a biased model than users who saw no explanation at all. That is not a bug. It is a feature of the wrong approach.
And the manifesto does not address the political economy of XAI. Who pays for better explanations? Who enforces standards? The authors mention regulation briefly, but they do not grapple with the fact that companies have little incentive to make their models transparent. If an explanation reveals that a hiring algorithm is racist, the company faces liability. The current system, where explanations are optional and unregulated, serves the interests of those who build black boxes, not those who suffer from them.
How to Fix It
Stop Building Tools for Yourself
Longo et al. (2024) argue that XAI researchers need to stop designing explanations for other XAI researchers. The field has become insular, with papers evaluated on technical novelty rather than real-world impact. The authors call for more user studies, more field deployments, and more collaboration with domain experts.
One concrete proposal: every XAI paper should specify its target audience and evaluate the explanation against that audience's needs. If you are building an explanation for radiologists, test it on radiologists. If you are building one for loan applicants, test it on people who might actually apply for loans. This seems obvious, but it is not standard practice.
Think Causally, Not Correlationally
The authors argue that XAI needs to incorporate causal reasoning. Instead of asking "which features mattered?" the field should ask "what would happen if we changed this feature?" This is the difference between a heatmap and a counterfactual. A counterfactual explanation says: "If your income had been $5,000 higher, your loan would have been approved." That is something a human can act on.
Longo et al. (2024) point to recent work in causal machine learning as a promising direction. But they also acknowledge that causal inference is hard, especially in high-dimensional settings. The payoff, however, is enormous. Causal explanations are more faithful, more actionable, and more aligned with how humans actually reason.
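The loan example can be sketched as a brute-force counterfactual search: raise one feature in small steps until the decision flips. This is an illustration under stated assumptions, not a method from the manifesto. `loan_model`, its integer coefficients, and the applicant record are hypothetical, and the search only describes how the model would respond, not what is causally true of the applicant's finances.

```python
def loan_model(applicant):
    """Hypothetical approval rule standing in for a black box (integer arithmetic)."""
    score = 4 * applicant["income"] - 10 * applicant["debt"]
    return score >= 200_000

def income_counterfactual(model, applicant, step=1000, max_raise=50_000):
    """Smallest income increase, in multiples of `step`, that flips a denial."""
    if model(applicant):
        return 0  # already approved, nothing needs to change
    for raise_by in range(step, max_raise + step, step):
        candidate = dict(applicant, income=applicant["income"] + raise_by)
        if model(candidate):
            return raise_by
    return None  # no flip found within the search range

applicant = {"income": 50_000, "debt": 2_000}
needed = income_counterfactual(loan_model, applicant)
print(f"If your income had been ${needed:,} higher, your loan would have been approved.")
```

Unlike an attribution score, the output is a statement the applicant can act on. Whether acting on it would work in the real world still depends on the model's features tracking real causes, which is where the hard causal inference comes in.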
Embrace Pluralism
The manifesto calls for a pluralistic approach to explanation. No single method will work for all audiences, all tasks, and all domains. The field needs a toolbox, not a silver bullet. And it needs guidelines for choosing the right tool for the right job.
This means abandoning the search for a universal definition of explainability. Instead, researchers should ask: "What does this specific stakeholder need to know, and how can we give it to them?" The answer will vary, but the question is universal.
Regulate, Do Not Just Innovate
The authors do not say this explicitly, but the implication is clear. Technical fixes are not enough. XAI needs regulatory pressure. The European Union's AI Act includes provisions for explainability, but the details are vague. Without enforcement, companies will continue to treat explanations as a checkbox exercise.
Longo et al. (2024) call for interdisciplinary collaboration, but collaboration among researchers will not be enough without political will. The manifesto is a roadmap for researchers. It is not a blueprint for policymakers. That is the next step.
What This Actually Means
- Current XAI methods like LIME and SHAP produce interpretations, not explanations. They tell you what the model saw, not why it matters or what to do about it. If you are using these tools to make high-stakes decisions, you are operating on incomplete information.
- The audience for an explanation matters more than the technical sophistication of the method. A perfect explanation for a developer is useless to a patient. XAI researchers must specify and test for their target audience, or their work will remain academic.
- The field needs causal explanations, not just correlational ones. Counterfactuals, which tell you what would change the outcome, are more actionable and more aligned with human reasoning than feature attributions. This is a harder problem, but it is the right one.
- There is no universal definition of explainability, and the search for one is a distraction. The goal should be fit-for-purpose explanations that meet the needs of specific stakeholders in specific contexts.
- Regulatory pressure is essential. Without legal requirements for transparency, companies have no incentive to invest in real explanations. Researchers can build the tools, but only regulators can force their use.
References
- [1] Luca Longo, Mario Brčić, Federico Cabitza, Jaesik Choi, et al. (2024). Explainable Artificial Intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions. Information Fusion.
