Why AI Must Explain Itself Before We Trust It With Finance

In 2019, JPMorgan Chase deployed an AI system to scan legal documents and extract key data. The bank claimed it would save 360,000 hours of lawyer time per year. Then something strange happened. The system started flagging contracts as high risk, but no one could say why. Lawyers had to redo the work anyway. The black box had failed its first test: trust.

This is not an isolated story. It is the core tension that Patrick Weber, K. Valerie Carl, and Oliver Hinz discovered when they reviewed 2,022 articles on explainable AI (XAI) in finance. Their systematic review, published in Management Review Quarterly in 2023, found that finance is rushing to adopt AI while regulators demand transparency. The two goals are colliding. And the authors argue that without explainability, AI in finance is a liability, not an asset.

Weber et al. (2023) screened every article from leading finance, information systems, and computer science journals. They landed on 60 relevant papers. What they found should make anyone who uses a robo advisor, applies for a loan, or trades stocks pay attention.

The Paradox at the Heart of Modern Finance

Finance runs on decisions. Loan approvals. Fraud detection. Stock trades. Risk assessments. Each decision must be traceable. Regulators like the SEC and European Banking Authority require that a human can explain why a loan was denied or why a trade was flagged. This is not a suggestion. It is the law.

But modern AI, especially deep learning, works differently. It finds patterns in data that humans cannot see. A neural network that approves mortgages might weigh 10,000 variables simultaneously. It might be right 99% of the time. But when it is wrong, no one knows why.

Weber et al. (2023) frame this as a fundamental conflict. Finance is a highly regulated domain. AI is a black box. The two cannot coexist unless someone builds a bridge. That bridge is explainable AI.

The authors classified the 60 papers by the XAI methods they used and the goals they aimed to achieve. They found that researchers are trying to solve two problems at once. First, they want AI to be accurate. Second, they want it to be transparent. These goals sometimes pull in opposite directions.

What the Research Actually Found

The review is systematic. Weber et al. (2023) did not run their own experiments. They mapped an entire field. Here is what the map shows.

Risk Management Is Well Covered, Anti Money Laundering Is Not

The authors found that certain areas of finance have been studied heavily. Risk management, portfolio optimization, and stock market applications dominate the literature. These are areas where the stakes are high but the feedback loops are fast. If an AI misprices a risk, the market corrects quickly. Researchers can test and retest.

But one area is conspicuously understudied: anti money laundering. Weber et al. (2023) note that only a handful of papers address XAI in AML contexts. This is alarming. Money laundering is a trillion dollar problem. Banks spend billions on compliance. AI could help, but regulators will not accept a black box that flags suspicious transactions without explanation. If the AI says "this wire transfer is suspicious because of 47 factors," the bank cannot file a Suspicious Activity Report without knowing which factors mattered most.

The authors call this a gap. It is more like a hole in the hull of a ship.

Transparent Models vs. Post Hoc Explainability

The 60 papers reveal a split in how researchers approach explainability. Some use transparent models. These are algorithms like linear regression or decision trees that are inherently interpretable. A linear regression tells you exactly how much each input contributed to the output. No mystery.
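To make the transparent case concrete, here is a minimal Python sketch of how a linear score breaks down into per-feature contributions. The feature names, coefficients, and applicant values are hypothetical, invented for illustration. They do not come from Weber et al. (2023).

```python
# Hypothetical linear credit-scoring model; every number here is invented.
INTERCEPT = 0.20
COEFFICIENTS = {
    "income_thousands": 0.004,
    "debt_ratio": -0.50,
    "years_employed": 0.02,
}

def explain_linear(applicant):
    """Return (score, contributions); each contribution is coefficient * value."""
    contributions = {
        name: COEFFICIENTS[name] * applicant[name] for name in COEFFICIENTS
    }
    score = INTERCEPT + sum(contributions.values())
    return score, contributions

applicant = {"income_thousands": 60, "debt_ratio": 0.4, "years_employed": 5}
score, contributions = explain_linear(applicant)

# List features from largest to smallest absolute contribution.
for name, value in sorted(
    contributions.items(), key=lambda kv: abs(kv[1]), reverse=True
):
    print(f"{name}: {value:+.3f}")
print(f"score: {score:.3f}")
```

The contributions plus the intercept add up to the score exactly, which is what makes this kind of model auditable line by line.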

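Others use post hoc explainability. This means they train a black box model like a neural network, then run a second algorithm to explain its decisions. LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) are popular examples. LIME fits a simple surrogate model around a single prediction, and SHAP assigns each feature a share of the prediction. Both produce approximations of what the model is doing.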

Weber et al. (2023) found that researchers have recently favored post hoc methods. This makes sense. Deep learning often outperforms linear regression on complex, nonlinear data. But the authors also found something subtle. Post hoc explanations are not perfect. They are approximations. A SHAP value tells you how much each feature contributed to a prediction, but it does not tell you why the model used that feature the way it did. It is like asking a magician to explain a trick after the show. The explanation might be plausible but wrong.
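The idea behind SHAP can be sketched in a few lines. The Python below computes exact Shapley values by brute force for a tiny stand-in model. The black_box function, its three inputs, and the all-zero baseline are hypothetical assumptions made for illustration. The SHAP library estimates values of this kind far more efficiently; this sketch does not use it.

```python
from itertools import permutations

def black_box(x):
    """Stand-in for an opaque model: a score with an interaction term."""
    income, debt, age = x
    return 0.5 * income - 0.3 * debt + 0.2 * income * debt + 0.1 * age

def shapley_values(model, x, baseline):
    """Exact Shapley attributions of model(x) relative to model(baseline).

    Features not yet added keep their baseline value. Enumerating every
    ordering is only feasible for a handful of features.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]           # switch feature i from baseline to actual
            value = model(current)
            totals[i] += value - previous  # marginal contribution in this ordering
            previous = value
    return [t / len(orderings) for t in totals]

x = [1.0, 2.0, 3.0]          # hypothetical applicant
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(black_box, x, baseline)
print(phi)
```

The attributions sum to the gap between the prediction and the baseline prediction. Note what is lost, though: the income-debt interaction is simply split evenly between the two features, so the numbers say how much each feature mattered, not how the model combined them.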

The Trade Off Between Accuracy and Explainability

This is the central tension. The authors found that in many finance applications, the most accurate models are the least explainable. A deep neural network may predict stock prices better than a linear model, but its reasoning is far harder to audit. Regulators hate this.

Some researchers argue that the trade off is inevitable. You cannot have both. Others are trying to build models that are both accurate and transparent. Weber et al. (2023) do not take sides. They simply document the landscape. But the landscape is clear. Most high accuracy models in finance are black boxes. Most transparent models are less accurate.

How the Study Was Done

The methodology matters. Weber et al. (2023) followed the PRISMA guidelines for systematic reviews. They searched four databases: Web of Science, Scopus, EBSCOhost, and AIS eLibrary. They used keywords like "explainable artificial intelligence," "XAI," and "interpretable machine learning" combined with "finance." They screened 2,022 titles and abstracts, then read 300 full papers. They ended with 60.
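The screening funnel is easier to judge as percentages. This short Python sketch only does arithmetic on the three counts quoted above; the stage labels are shorthand chosen here, not the authors' terminology.

```python
# Counts quoted in the paragraph above; labels are informal shorthand.
stages = [("screened", 2022), ("full text read", 300), ("included", 60)]

# Retention rate between each pair of consecutive stages.
for (prev_name, prev_n), (name, n) in zip(stages, stages[1:]):
    print(f"{prev_name} -> {name}: {n}/{prev_n} = {n / prev_n:.1%}")

# Share of all screened articles that made it into the final set.
overall = stages[-1][1] / stages[0][1]
print(f"overall: {overall:.1%}")
```

Roughly 15% of screened articles reached full-text reading, a fifth of those were kept, and about 3% of the starting pool survived overall.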

This is a thorough review. The authors did not cherry pick. They applied strict inclusion criteria. Papers had to be peer reviewed, published in English, and directly address XAI in a finance context. They excluded papers that only mentioned XAI in passing.

The result is a reliable map of the field. But maps have limitations. They show where people have traveled, not where they should go.

What This Means for Your Money

If you use a robo advisor, an AI is managing your portfolio. If you apply for a mortgage, an AI might approve or deny it. If you trade stocks, an AI is likely involved in setting prices. These systems are not transparent. You cannot appeal their decisions because you cannot understand them.

Weber et al. (2023) found that XAI methods exist but are not widely deployed in production systems. Most finance AI is still black box. The research is ahead of practice. This creates a dangerous gap.

Consider a scenario. An AI denies your loan. You ask why. The bank says "the model determined your risk profile is unfavorable." That is not an explanation. It is a dodge. Regulators are starting to push back. The European Union's AI Act will require explainability for high risk systems. The United States is moving in a similar direction. But the technology is not ready.

The authors found that post hoc methods like LIME and SHAP are popular in research but rarely used in practice. Why? Because they are computationally expensive and their explanations can be unstable. A small change in input can produce a completely different explanation. This is not acceptable for a loan denial.
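Instability is easy to reproduce on a toy model. The Python sketch below uses a simple finite-difference sensitivity as the explanation method, which is an assumption made to keep the example short; it is neither LIME nor SHAP. The black_box function and the two applicants are also hypothetical.

```python
def black_box(income_risk, debt_risk):
    """Stand-in opaque score: driven by whichever risk factor is larger."""
    return max(income_risk, debt_risk)

def local_attribution(model, point, step=1e-4):
    """Finite-difference sensitivity of the model to each input at one point."""
    base = model(*point)
    sensitivities = []
    for i in range(len(point)):
        nudged = list(point)
        nudged[i] += step  # perturb one input, hold the others fixed
        sensitivities.append((model(*nudged) - base) / step)
    return sensitivities

def top_feature(model, point, names=("income_risk", "debt_risk")):
    """Name of the input with the largest absolute local sensitivity."""
    scores = local_attribution(model, point)
    best = max(range(len(scores)), key=lambda i: abs(scores[i]))
    return names[best]

# Two hypothetical applicants who differ by 0.01 in each input.
applicant_a = (0.50, 0.49)
applicant_b = (0.49, 0.50)
print(top_feature(black_box, applicant_a))
print(top_feature(black_box, applicant_b))
```

The two applicants receive almost the same score, yet the stated reason flips from one feature to the other. Real models are smoother than this toy, but the same effect appears wherever a decision boundary or interaction sits close to the applicant's data.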

Where the Research Falls Short

Weber et al. (2023) are honest about limitations. Their review only covers papers published in leading journals. This means they might miss cutting edge work in conference proceedings or preprints. The field moves fast. A paper published in 2023 might already be outdated.

More importantly, the authors note that most papers focus on technical methods, not on human evaluation. An explanation is only useful if a human can understand it. But few studies test whether explanations actually help human decision makers. The authors call for more research on the human side of XAI.

This is a critical gap. An explanation that is mathematically correct but incomprehensible to a loan officer is useless. The authors found that the literature is dominated by computer scientists, not psychologists or finance practitioners. The result is a lot of technical solutions looking for a problem.

The Open Question That Keeps Me Up at Night

Can we ever build an AI that is both accurate and truly explainable? The authors do not answer this. No one does. The trade off might be fundamental. Or it might be a temporary limitation of current methods.

Some researchers argue that we need to rethink what "explainable" means. A linear regression is explainable in the sense that you can read its coefficients. But is that really an explanation? If a model says "your loan was denied because your income is low," that is technically transparent. But it might miss the real reason: the model learned that people with your zip code default more often. The transparency is fake.

Weber et al. (2023) do not resolve this. They document it. The field is still figuring out what explainability even means.

What This Actually Means

The review by Weber et al. (2023) is not a call to abandon AI in finance. It is a call to build it differently. Here is what the evidence suggests for anyone building, regulating, or using AI for money.

▸If you are a bank or fintech, do not deploy black box models for regulated decisions. The authors found that anti money laundering is understudied. This is a red flag. If you cannot explain why a model flagged a transaction, you cannot defend it in court. Use transparent models where possible. If you must use deep learning, invest in post hoc methods and test them with real humans.

▸If you are a regulator, demand more than accuracy metrics. The authors found that most papers focus on technical performance. Regulators should require proof that explanations are actually useful. A SHAP plot that a loan officer cannot read is not an explanation.

▸If you are a researcher, study the human side. The authors found a gap between technical XAI methods and human evaluation. Build experiments that test whether explanations change decisions. Measure comprehension, not just accuracy.

▸If you are a consumer, ask questions. When an AI makes a decision about your money, ask for an explanation. If the answer is vague, push back. The authors found that the field is moving toward transparency, but only if users demand it.

▸If you are building the next generation of AI, do not assume accuracy is enough. The authors documented a clear trade off. But they also found that some researchers are trying to have both. Build models that are interpretable by design, not as an afterthought. The future of finance depends on it.

The paper by Weber, Carl, and Hinz is a map of a field in transition. Finance is adopting AI faster than it can explain it. The authors do not predict disaster. They simply show the gap. But gaps can swallow you whole.

References

[1] Patrick Weber, K. Valerie Carl, Oliver Hinz (2023). Applications of Explainable Artificial Intelligence in Finance: a systematic review of Finance, Information Systems, and Computer Science literature. Management Review Quarterly.
