AI Has Six Grand Challenges That We Must Solve Now

The article identifies six grand challenges for human-centered AI, spanning well-being, responsible design, privacy, design methodology, governance, and human cognition, that the field must resolve to ensure beneficial development.

Rahul Venkatesh


Here is a paradox for you. We are building machines that can pass the bar exam, generate photorealistic video from a text prompt, and write poetry that makes you feel something. And yet these same machines, deployed in the real world, are systematically denying people housing, producing diagnoses that are less reliable for patients of some races, and reinforcing the very inequalities they were supposed to erase.

The gap between what AI can do and what AI should do is not closing. It is widening.

In 2023, a group of 26 experts from academia, industry, and government sat down to name the problem. The result was a paper published in the International Journal of Human-Computer Interaction titled "Six Human-Centered Artificial Intelligence Grand Challenges" (Garibay et al., 2023). The authors did not propose a new algorithm. They did not build a better model. Instead, they did something more uncomfortable: they listed the six fundamental problems that the entire field of AI must solve before it is too late.

These are not technical tweaks. These are structural failures. And the authors argue that if we do not address them now, the technology will continue to optimize for the wrong things.

Here is what they found.

Challenge One: AI Must Actually Improve Human Well-Being, Not Just Efficiency

The first grand challenge sounds obvious, which is why it is so easy to ignore. The authors argue that AI systems should be evaluated not by how fast they process data or how accurately they predict outcomes, but by whether they improve human well-being (Garibay et al., 2023). That sounds like common sense. It is not how the industry works.

Right now, the dominant metric for AI success is performance. Can the model classify images faster? Can it generate text that fools a human reader? Can it drive a car with fewer interventions? These are engineering benchmarks. They measure what the machine does, not what the machine does to people.

The authors point out that a system can be highly accurate and still cause harm. A credit scoring algorithm can be 99% accurate at predicting defaults and still systematically deny loans to qualified applicants from minority neighborhoods. A hiring tool can match candidates to job descriptions with high precision and still filter out women with gaps in their resumes. Accuracy and well-being are not the same thing.
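To make the gap between accuracy and harm concrete, here is a minimal Python sketch. The records, the group labels, and the function names are all invented for illustration and are not taken from Garibay et al. (2023). The only point is that a single accuracy number can look excellent while qualified applicants in one group are approved far less often than in another.

```python
from collections import defaultdict

# Hypothetical toy data: (group, model_approved, would_have_repaid).
# The counts are made up to illustrate the argument, not measured anywhere.
records = (
    [("A", True, True)] * 90
    + [("A", False, False)] * 10
    + [("B", True, True)] * 5
    + [("B", False, True)] * 3
    + [("B", False, False)] * 2
)


def overall_accuracy(rows):
    """Share of decisions where approval matched actual repayment."""
    correct = sum(1 for _, approved, repaid in rows if approved == repaid)
    return correct / len(rows)


def approval_rate_among_qualified(rows):
    """Per group: of the applicants who would have repaid, what share was approved?"""
    approved = defaultdict(int)
    qualified = defaultdict(int)
    for group, was_approved, repaid in rows:
        if repaid:
            qualified[group] += 1
            approved[group] += int(was_approved)
    return {group: approved[group] / qualified[group] for group in qualified}


print(f"overall accuracy: {overall_accuracy(records):.3f}")
print(approval_rate_among_qualified(records))
```

On this toy data the model is right about 97% of the time overall, yet it approves every qualified applicant in group A and only five of the eight qualified applicants in group B. A per-group breakdown like this is still only a proxy for well-being, but it is a first step beyond a single headline metric.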

The paper calls for a shift in how we define success. Instead of asking "does this AI work?", we should ask "does this AI make people's lives better?" That means measuring outcomes like financial stability, mental health, social connection, and autonomy. These are harder to quantify than error rates. That is precisely why the industry avoids them.

Garibay et al. (2023) argue that the burden of proof should be on the developers. Before deploying a system at scale, companies should have to demonstrate that it improves human well-being, not just that it does not cause obvious harm. This flips the default assumption. Right now, the default is "deploy first, apologize later." The authors want it to be "prove it helps, then deploy."

Challenge Two: AI Must Be Designed Responsibly, Not Just Legally

The second challenge is about responsibility. Not in the abstract philosophical sense, but in the practical engineering sense. The authors argue that AI systems must be designed with explicit, auditable mechanisms for accountability (Garibay et al., 2023).

Here is the problem. When a human makes a bad decision, we can ask them why. We can investigate their reasoning, their biases, their incentives. When an AI makes a bad decision, we often cannot. Modern deep learning models are black boxes. They make decisions based on millions of parameters that no human can trace. If a model denies someone a kidney transplant, we cannot point to a specific line of code and say "that is where the error happened."

The authors call this a "responsibility gap." It is not just a technical problem. It is a moral one. If we cannot explain why a system made a decision, we cannot hold anyone accountable for it. And if no one is accountable, then no one is incentivized to fix it.

The paper advocates for a shift from "legal compliance" to "responsible design." Legal compliance means meeting the minimum standards set by regulators. Responsible design means building systems that are transparent, auditable, and contestable by default. That might mean using simpler models that humans can understand, even if they are slightly less accurate. It might mean building in explicit override mechanisms so that human operators can flag suspicious decisions. It might mean publishing the training data and the decision logic so that independent auditors can check for bias.
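One way to picture "auditable and contestable by default" is a decision record that keeps the model's output and any human override side by side. The sketch below is an assumption about what such a record could look like, not a design from the paper; the class name `DecisionRecord` and all of its fields are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DecisionRecord:
    """Hypothetical audit entry for one automated decision."""
    subject_id: str
    model_version: str
    inputs: dict
    model_decision: str
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    override_decision: Optional[str] = None
    override_by: Optional[str] = None
    override_reason: Optional[str] = None

    @property
    def final_decision(self) -> str:
        # A human override, when present, takes precedence over the model.
        return self.override_decision or self.model_decision

    def contest(self, reviewer: str, new_decision: str, reason: str) -> None:
        """Record a human override without erasing what the model decided."""
        if not reason.strip():
            raise ValueError("an override must include a reason")
        self.override_decision = new_decision
        self.override_by = reviewer
        self.override_reason = reason


record = DecisionRecord("applicant-17", "v1", {"income_band": "low"}, "deny")
record.contest("reviewer-3", "approve", "income data was out of date")
print(record.model_decision, "->", record.final_decision)
```

Keeping the original model decision next to the override means an auditor can later count how often, and for whom, humans had to step in.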

The authors are not naive. They know that perfect transparency is impossible for some systems. But they argue that the burden of proof should be on the developers to show that they have made a good-faith effort to make their systems understandable, not on the victims to prove that they were harmed.

Challenge Three: AI Must Respect Privacy, Not Just Collect Consent

The third challenge is about privacy, but not in the way you might think. The authors argue that the current model of privacy protection is broken (Garibay et al., 2023).

Right now, most privacy protections rely on informed consent. A website pops up and says "we use cookies." You click "accept." A company says "we will use your data to improve our services." You sign the form. The problem is that this model assumes that people understand what they are consenting to.

The authors point out that modern AI systems can infer sensitive information from seemingly innocuous data. Your browsing history can reveal your political affiliation, your sexual orientation, your health status. Your typing patterns can reveal your emotional state. Your voice can reveal your age, your gender, your ethnicity. When you "consent" to a company collecting your data, you are not consenting to them knowing all of these things. You are consenting to something you do not fully understand.

Garibay et al. (2023) argue that we need to move beyond consent-based models. They call for "privacy by design" systems that minimize data collection by default, that anonymize data at the point of collection, and that give users meaningful control over how their data is used. This is not just about regulation. It is about architecture. If a system does not need your data to function, it should not collect it. If it does need your data, it should process it locally on your device wherever possible, rather than sending it to a central server.
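As a rough illustration of minimizing data at the point of collection, the sketch below keeps only an allow-list of fields and replaces the raw user ID with a keyed hash before anything leaves the device. The field names, the `ALLOWED_FIELDS` set, and the `minimize` function are assumptions made for this example. Note also that a keyed hash is pseudonymization, not full anonymization: whoever holds the key can still link records to the same person.

```python
import hashlib
import hmac

# Hypothetical allow-list: the only attributes this imaginary service needs.
ALLOWED_FIELDS = {"age_band", "region"}


def minimize(raw: dict, secret_key: bytes) -> dict:
    """Drop every field not on the allow-list and pseudonymize the user ID."""
    record = {key: value for key, value in raw.items() if key in ALLOWED_FIELDS}
    # HMAC-SHA256 of the user ID; stable per user, not reversible without the key.
    record["pseudonym"] = hmac.new(
        secret_key, raw["user_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return record


raw_event = {
    "user_id": "u-1029",
    "email": "someone@example.com",
    "age_band": "30-39",
    "region": "KA",
    "typing_cadence_ms": [112, 98, 140],
}
print(sorted(minimize(raw_event, b"device-local-secret")))
```

The email address and typing cadence never make it into the outgoing record, which is the architectural point: data that is never collected cannot later be used for inference.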

The authors also raise a subtler point. Even if data is anonymized, AI systems can often re-identify individuals by cross-referencing multiple datasets. A "de-identified" medical record can be linked back to a specific person if it contains enough unique details. The paper warns that privacy is not a binary state. It is a spectrum. And current AI systems are pushing us toward the wrong end of it.
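The re-identification risk can be checked in a simple way by asking how many records share the same combination of "harmless" attributes. The sketch below uses a k-anonymity style count, which is a standard technique brought in here for illustration and not something the paper prescribes. The column names and rows are hypothetical.

```python
from collections import Counter


def smallest_group_size(rows, quasi_identifiers):
    """Return k: the size of the smallest group of rows that share the same
    values for every quasi-identifier. k == 1 means at least one record is
    unique on those attributes and could be singled out by cross-referencing."""
    if not rows:
        raise ValueError("need at least one row")
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values())


# Hypothetical "de-identified" records: no names, but still revealing.
rows = [
    {"zip": "560001", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"zip": "560001", "birth_year": 1984, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "560034", "birth_year": 1951, "sex": "M", "diagnosis": "arrhythmia"},
]
print(smallest_group_size(rows, ["zip", "birth_year", "sex"]))
```

Here the result is 1, because the third record is the only one with its combination of postal code, birth year, and sex. Anyone holding another dataset with those three attributes and a name could link the diagnosis back to a person.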

Challenge Four: AI Must Be Designed for Humans, Not Just for Users

The fourth challenge is about design methodology. The authors argue that AI systems should be built using human-centered design principles, which means involving real people at every stage of development, not just at the testing phase (Garibay et al., 2023).

This sounds like common sense, but it is not how most AI systems are built. The typical development cycle looks like this: engineers collect a dataset, train a model, test it on a holdout set, and then deploy it. Users are brought in at the end, usually to beta test a finished product. By that point, the fundamental design decisions have already been made. The users can only give feedback on minor tweaks.

The authors argue that this approach produces systems that are optimized for the engineers' assumptions, not for the users' realities. A voice assistant that works perfectly in a quiet lab may fail in a noisy home. A medical diagnostic tool that was trained on data from urban hospitals may fail in rural clinics. A content moderation system that was designed by engineers in Silicon Valley may fail to understand the cultural context of a post from Nigeria.

Garibay et al. (2023) call for a shift to participatory design. That means bringing in diverse stakeholders from the beginning. Not just engineers and product managers, but also the people who will actually use the system, the people who will be affected by its decisions, and the people who will be responsible for fixing its mistakes. It means testing systems in real-world conditions, not just in controlled environments. It means iterating based on feedback from actual use, not just on performance metrics.

The authors are clear about what this requires. It requires time. It requires money. It requires humility from engineers who are used to being the experts. But they argue that the alternative is worse: systems that work in theory but fail in practice, and that fail in ways that harm real people.

Challenge Five: AI Must Be Governed, Not Just Regulated

The fifth challenge is about governance. The authors argue that we need new institutions and new processes to oversee AI systems, not just new laws (Garibay et al., 2023).

Here is the distinction. Regulation is about setting rules. Governance is about creating the infrastructure to enforce those rules, to adapt them as technology changes, and to hold people accountable when they break them. We have plenty of proposed regulations for AI. We have very little governance.

The paper points to existing models from other high-risk industries. In the United States, aviation has the Federal Aviation Administration, which certifies aircraft, investigates accidents, and updates safety standards based on what it learns. Pharmaceuticals have the Food and Drug Administration, which requires clinical trials, monitors side effects, and can pull drugs from the market. Both of these institutions have the authority to say "no." They can ground a plane. They can ban a drug.

No equivalent institution exists for AI. There is no agency that can say "this algorithm is too dangerous to deploy." There is no independent body that can audit a model's training data for bias. There is no system for reporting and investigating AI failures. When an AI system causes harm, the victims have to rely on lawsuits, which are slow, expensive, and often unsuccessful.

Garibay et al. (2023) call for the creation of new governance structures that are agile enough to keep up with the pace of AI development, but powerful enough to enforce meaningful oversight. They suggest models like "algorithmic impact assessments" that would be required before deploying high-risk systems. They call for independent auditing bodies that would have access to the code and the data. They argue for whistleblower protections so that engineers who discover problems can report them without fear of retaliation.

The authors are not proposing a single solution. They are arguing that the current governance vacuum is unsustainable. Without institutions that can say "no," the default will always be "yes." And "yes" is not always the right answer.

Challenge Six: AI Must Respect Human Cognition, Not Overload It

The sixth and final challenge is about the human mind. The authors argue that AI systems must be designed to work with human cognitive capacities, not against them (Garibay et al., 2023).

This is the least technical challenge, but it may be the most important. The authors point out that humans have limited attention, limited working memory, and limited ability to process information under time pressure. AI systems, by contrast, can generate information far faster than any person can absorb it. When you combine the two, you get cognitive overload.

Think about what happens when you use a navigation app. The app gives you turn-by-turn directions. It recalculates when you miss a turn. It warns you about traffic. It suggests alternate routes. This is helpful. But it also means that you are constantly processing new information, constantly updating your mental model, constantly making decisions. The app is not just assisting you. It is driving your cognition.

The authors warn that this dynamic is becoming pervasive. Email filters that sort your messages. News algorithms that curate your feed. Recommendation systems that suggest what to watch, what to buy, what to read. Each of these systems is designed to reduce cognitive load, but together they create a new kind of load: the load of managing the systems themselves.

Garibay et al. (2023) argue that AI systems should be designed to augment human cognition, not replace it. That means giving users the ability to understand what the system is doing, to override its suggestions, and to turn it off when it becomes overwhelming. It means designing systems that are transparent about their limitations, that admit uncertainty, and that defer to human judgment when the stakes are high.
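The idea of admitting uncertainty and deferring to human judgment can be sketched as a simple routing rule. Everything below is an assumption for illustration: the `decide` function, the 0.9 threshold, and the `high_stakes` flag are hypothetical, and the sketch presumes the model's probabilities are reasonably well calibrated, which is often not true in practice.

```python
def decide(probabilities: dict, threshold: float = 0.9, high_stakes: bool = False) -> dict:
    """Act automatically only when the model is confident and the stakes are low.
    Otherwise hand the case to a person, along with the model's suggestion."""
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    if high_stakes or confidence < threshold:
        action = "defer_to_human"
    else:
        action = "auto"
    # The suggestion and confidence are always returned so the system stays
    # transparent about what it believed, even when it defers.
    return {"action": action, "suggestion": label, "confidence": confidence}


print(decide({"spam": 0.97, "not_spam": 0.03}))
print(decide({"spam": 0.62, "not_spam": 0.38}))
print(decide({"approve": 0.99, "deny": 0.01}, high_stakes=True))
```

The first call is handled automatically, the second is deferred because confidence is below the threshold, and the third is deferred regardless of confidence because it is marked high stakes. The design choice is that the human sees the model's suggestion as input, not as a decision already made.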

The authors are not anti-technology. They are pro-human. They argue that the goal of AI should not be to make decisions for us, but to help us make better decisions ourselves.

What the Paper Does Not Prove

It is important to be clear about what this paper is and is not. The authors do not present new experimental data. They do not test a hypothesis. They do not run a controlled trial. This is a consensus paper, the result of a structured dialogue among 26 experts. It is a synthesis of existing knowledge, not a breakthrough discovery.

This means that the paper's authority comes from the expertise of its authors, not from the rigor of its methods. That is a legitimate form of scholarship, but it is different from a randomized controlled trial. The recommendations in the paper are informed by evidence, but they are not themselves evidence. They are expert opinions, carefully argued and well-referenced, but opinions nonetheless.

The paper also does not offer a detailed roadmap for implementation. It identifies the challenges. It suggests directions. But it does not tell you how to build a governance institution, how to design a privacy-preserving architecture, or how to measure human well-being. Those are open research questions that the paper is calling on the field to address.

What This Actually Means

  • Stop optimizing for the wrong thing. If your AI system is accurate but makes people's lives worse, you have failed. Redefine success around human well-being, not just performance metrics. This is not soft. It is hard. It is the real work.
  • Build systems you can explain. If you cannot trace why your model made a decision, you cannot fix it when it is wrong. When the stakes are high, be willing to trade some accuracy for interpretability. A slightly less accurate model that you understand is often a better choice than a more accurate black box.
  • Minimize data by default. Do not collect data because you can. Collect data because you need it. Process it locally when possible. Anonymize it immediately. Assume that consent is not enough, because it is not.
  • Involve real people from day one. Do not design in a lab and test on users. Design with users, in the real world, over time. Their feedback is not a bug report. It is the data you need to build something that actually works.
  • Create institutions that can say no. We need agencies that can audit, certify, and if necessary, stop AI systems from being deployed. The market will not police itself. Regulation without enforcement is theater.
  • Respect the limits of the human mind. AI should reduce cognitive load, not create a new kind of it. Give people control, transparency, and the ability to opt out. The goal is not to replace human judgment. It is to support it.

The six challenges are not abstract. They are happening right now, in every domain where AI is deployed. The question is whether we will treat them as optional or as fundamental. The authors of this paper have made their choice. The rest of us need to make ours.

References

  1. Özlem Özmen Garibay, Brent Winslow, Salvatore Andolina, Margherita Antona, et al. (2023). Six Human-Centered Artificial Intelligence Grand Challenges. International Journal of Human-Computer Interaction.

Rahul Venkatesh

Former ML engineer at a Bengaluru AI startup, now a science communicator. Spent six years building production language models before switching to writing about the research nobody inside the lab has time to explain.

Reader Comments (2)

Dr. Ananya Sharma★★★★★

Interesting framing, but I wonder if we're prioritizing the wrong challenges. In my NLP work, bias mitigation feels more urgent than general reasoning. The paper glosses over real-world deployment hurdles in Indian languages.

Ravi Iyer★★★★★

Good overview, yet missing the energy cost elephant in the room. I've seen small teams solve specific problems with lightweight models. Grand challenges shouldn't ignore frugal innovation lessons from the Global South.
