The Planet Has a Second Language, and We Are Just Learning to Speak It

For most of human history, understanding the Earth meant building models out of math. You wanted to predict a flood? You wrote equations for rainfall, soil saturation, river slope, and friction. You wanted to forecast a hurricane’s path? You solved the Navier-Stokes equations for fluid dynamics, plugged in temperature and pressure gradients, and hoped your approximations held up. This approach, called physics based modeling, is elegant and logical. It is also fundamentally limited.
The Earth is not a clean set of equations. It is a system of staggering complexity, where a butterfly’s wing in Brazil might indeed affect a tornado in Texas, but where we cannot possibly measure every butterfly, every wing beat, or every molecule of water vapor that connects them. Physics based models work beautifully inside their assumptions. Outside those assumptions, they break.
A team of researchers led by Tianjie Zhao from the Aerospace Information Research Institute in Beijing has just published a sweeping review of where geoscience stands right now, and the picture they paint is one of a discipline in the middle of a transition. The old physics based models are not being thrown out. They are being fused with something new. And that fusion, the authors argue, might finally let us see the Earth the way it actually behaves, not just the way we assumed it did (Zhao et al., 2024).
Why Physics Based Models Hit a Wall

The problem with traditional geoscience models is not that they are wrong. It is that they are incomplete. Every equation that describes how water moves through soil or how heat transfers through the atmosphere is a simplification. To make the math tractable, modelers have to ignore variables. They have to assume homogeneity where the real world is heterogeneous. They have to chop time into discrete steps when nature is continuous.
Zhao and colleagues lay this out plainly. Traditional models, they write, are "grounded in physical and numerical frameworks" and "provide robust explanations by explicitly reconstructing underlying physical processes." That sounds good. But then comes the catch: these models face "limitations in comprehensively capturing Earth's complexities and uncertainties," which creates "challenges in optimization and real-world applicability" (Zhao et al., 2024).
What does that mean in practice? It means that a flood model built on physics might work perfectly for a river basin where you have fifty years of data on rainfall, soil type, and land use. But take that same model to a watershed in the Himalayas where the data is sparse and the terrain is extreme, and it will fail. The physics is the same. The inputs are not.
This is not a small problem. Climate change is making old assumptions dangerous. Storms are behaving in ways that historical data did not prepare us for. Permafrost is thawing at rates that models did not predict. The Earth is changing, and our physics based models, for all their elegance, are struggling to keep up.
The Data Flood That Changed Everything

While geoscientists were wrestling with the limits of their equations, something else was happening. The planet was getting wired.
Satellites now beam down terabytes of imagery every day. Sensor networks monitor soil moisture, atmospheric CO2, and ocean temperatures in real time. Drones map terrain at centimeter resolution. The volume of geoscience data being collected today is not just larger than it was twenty years ago. It is larger by orders of magnitude.
This created a strange paradox. Geoscientists had more data about the Earth than they had ever had, but their traditional models could not absorb it. The models were designed to work with sparse, carefully curated inputs. Feed them a firehose of raw satellite imagery, and they would choke.
Enter artificial intelligence. Machine learning models, particularly deep learning systems, are built for exactly this kind of data abundance. They do not need equations that describe the physics of the system. They need examples. Give a deep learning model enough images of floodplains before and after storms, and it will learn to predict flooding patterns without ever being told what a cubic meter of water weighs.
Zhao and his coauthors describe this as a fundamental shift in the paradigm of geoscience. The old approach was "physics first." The new approach, at least in part, is "data first." The authors note that modern data driven models "leverage extensive geoscience data to glean insights without requiring exhaustive theoretical knowledge" (Zhao et al., 2024). In other words, the AI does not need to know the theory. It just needs to see enough examples.
What Deep Learning Sees That Physicists Miss
This is where the results get genuinely surprising. The authors catalog a range of applications where AI has outperformed traditional physics based models. In some cases, the AI is not just matching human expertise. It is finding patterns that human experts, working from first principles, never thought to look for.
Consider earthquake prediction. Physics based models try to forecast earthquakes by measuring stress accumulation along fault lines, tracking seismic waves, and modeling the mechanical properties of rock. The approach is slow, expensive, and notoriously unreliable. Deep learning models, trained on thousands of seismic records, have in some studies started to pick up precursor signals in the noise that human analysts miss. The AI does not know what those signals mean in physical terms. It just knows that when they appear, an earthquake is more likely.
The same pattern shows up in climate modeling. Traditional climate models divide the planet into grid cells and solve equations for each cell. The resolution is limited by computing power. Most global climate models use grid cells that are roughly 100 kilometers wide. That is fine for predicting average global temperature. It is useless for predicting where a thunderstorm will form in the next hour. Deep learning models, trained on high resolution weather radar data, can now forecast precipitation at the street level, minutes in advance.
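To see why resolution is bound by computing power, a rough back-of-the-envelope count helps. The sketch below assumes a perfectly spherical Earth with a mean radius of 6,371 km and square, equal-area cells, which real model grids are not. The function name `grid_cell_count` and the 1 km comparison are illustrative assumptions, not figures from the review.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is treated as a sphere here


def grid_cell_count(cell_width_km: float) -> int:
    """Approximate number of square cells needed to tile the Earth's surface."""
    surface_area_km2 = 4.0 * math.pi * EARTH_RADIUS_KM ** 2
    return round(surface_area_km2 / cell_width_km ** 2)


coarse = grid_cell_count(100.0)  # typical global climate model scale
fine = grid_cell_count(1.0)      # hypothetical kilometer-scale grid

print(f"100 km cells: {coarse:,}")
print(f"1 km cells:   {fine:,}")
print(f"ratio:        about {fine // coarse:,}x")
```

At 100 km the surface needs on the order of fifty thousand cells. At 1 km it needs about ten thousand times as many, and that is before counting vertical layers or the shorter time steps that finer grids typically require.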
Zhao and his colleagues review dozens of these applications. They note that machine learning techniques "have shown promise in addressing Earth science related questions" ranging from mineral exploration to landslide prediction to ocean current mapping (Zhao et al., 2024). The common thread is that AI excels at finding relationships in data that are too complex or too nonlinear for physics based equations to capture.
The Black Box Problem Is Real
But here is where the story gets complicated. The same properties that make AI powerful also make it dangerous.
A physics based model is transparent. If it predicts a flood, you can open the model and see exactly which equations produced that prediction. You can trace the causal chain: rainfall leads to soil saturation leads to runoff leads to river stage exceeds bank height. You can understand why the model made its prediction, even if the prediction turns out to be wrong.
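That traceability can be illustrated with a deliberately tiny example. The sketch below is a toy "bucket" model, not a real hydrological scheme: the soil capacity, the linear stage rule, and the bank height are made-up parameters, the rainfall series is invented, and `flood_trace` is a hypothetical name. What it shows is that every intermediate quantity in the causal chain can be printed and inspected.

```python
def flood_trace(rain_mm, soil_capacity_mm=50.0, base_stage_m=0.5,
                stage_per_mm_runoff=0.02, bank_height_m=1.5):
    """Toy bucket model: rain fills the soil, the excess runs off and raises the river."""
    soil_mm = 0.0
    trace = []
    for hour, rain in enumerate(rain_mm):
        # Step 1: the soil absorbs rain until it is saturated.
        absorbed = min(rain, soil_capacity_mm - soil_mm)
        soil_mm += absorbed
        # Step 2: whatever the soil cannot hold becomes runoff.
        runoff = rain - absorbed
        # Step 3: runoff raises the river stage (illustrative linear rule).
        stage_m = base_stage_m + stage_per_mm_runoff * runoff
        # Step 4: a flood is flagged when the stage exceeds the bank height.
        trace.append({
            "hour": hour,
            "rain_mm": rain,
            "soil_mm": soil_mm,
            "runoff_mm": runoff,
            "stage_m": stage_m,
            "flood": stage_m > bank_height_m,
        })
    return trace


hourly_rain_mm = [10, 20, 30, 40, 60]  # invented storm, not observed data
for step in flood_trace(hourly_rain_mm):
    print(step)
```

If the final hour is flagged as a flood, the printed trace shows why: the soil saturated in an earlier hour, so later rain became runoff and pushed the stage over the bank. A wrong prediction can be debugged the same way, one step at a time.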
An AI model is not transparent. A deep neural network can have millions of parameters, all interacting in ways that human intuition cannot interpret. The model might be 99 percent accurate at predicting landslides, but it cannot tell you why it made a specific prediction for a specific hillside. It is a black box.
Zhao and his coauthors are direct about this. They list the "black box nature of AI models" as one of the major challenges hindering their integration into geoscience (Zhao et al., 2024). This is not just an academic concern. If an AI model tells a city government to evacuate a neighborhood because of flood risk, and the flood does not happen, the city needs to know why. Was the model wrong because of bad data? Because of a rare weather pattern it had not been trained on? Or because it had learned a spurious correlation that had nothing to do with flooding?
Without interpretability, trust is fragile. And in geoscience, where decisions can involve billions of dollars and human lives, fragile trust is not enough.
The Hybrid That Might Save Us
The most interesting part of Zhao’s review is not the critique of physics based models or the celebration of data driven ones. It is the middle ground. The authors argue that the future of geoscience is not a choice between physics and AI. It is a fusion.
They call these hybrid models. The idea is simple: use physics based equations to constrain what the AI can learn, and use AI to fill in the gaps that the physics cannot handle.
Here is how it works in practice. Instead of training a deep learning model on raw satellite data and letting it learn everything from scratch, you start by embedding known physical laws into the model’s architecture. You tell the neural network that water flows downhill, that energy is conserved, and that pressure gradients drive wind. The AI does not have to rediscover these principles. They are baked in.
Then, within those constraints, the AI is free to learn the messy, nonlinear, data intensive parts of the system that physics cannot capture. The result is a model that is both physically consistent and data driven. It does not violate the laws of thermodynamics, but it can also predict the exact moment a thunderstorm will form over a specific neighborhood.
Zhao and colleagues describe these hybrid models as "an alternative paradigm" that "incorporates domain knowledge to guide AI methodologies." They report that these models "demonstrate enhanced efficiency and performance with reduced training data requirements" (Zhao et al., 2024). In other words, you get the best of both worlds: the reliability of physics and the pattern finding power of AI, with less data needed than a pure deep learning approach.
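One way to picture "baking in" a physical law is a model whose structure makes a violation impossible. The sketch below is a toy assumption, not the method of any model in the review. It enforces a single constraint (runoff can never be negative or exceed rainfall) by predicting runoff as rainfall times a learned fraction between 0 and 1. The training data are synthetic, and names such as `predict_runoff` and `fit` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, always within [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_runoff(rain_cm, soil_wetness, w, b):
    """Physics part: runoff is a fraction of rainfall, so 0 <= runoff <= rain.
    Learned part: that fraction depends on soil wetness through w and b."""
    return rain_cm * sigmoid(w * soil_wetness + b)


def mse(data, w, b):
    """Mean squared error between predicted and observed runoff."""
    return sum((predict_runoff(r, s, w, b) - y) ** 2 for r, s, y in data) / len(data)


def fit(data, steps=5000, lr=0.1):
    """Plain gradient descent on the two learned parameters."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for rain, wet, observed in data:
            frac = sigmoid(w * wet + b)
            err = rain * frac - observed
            # Chain rule through the sigmoid.
            common = 2.0 * err * rain * frac * (1.0 - frac) / len(data)
            grad_w += common * wet
            grad_b += common
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic observations generated from a made-up "true" relationship.
data = [(rain, wet, rain * sigmoid(4.0 * wet - 2.0))
        for rain in (1.0, 2.0, 3.0, 4.0)
        for wet in (0.1, 0.4, 0.7, 1.0)]

initial_loss = mse(data, 0.0, 0.0)
w, b = fit(data)
final_loss = mse(data, w, b)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.6f}")
```

Because the constraint lives in the model's structure rather than in the data, it holds even for inputs far outside the training range. That is one intuition for why hybrids of this kind may get by with fewer training examples: the network never spends data rediscovering what the physics already guarantees.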
The Data Scarcity Trap
There is a catch, and it is a big one. Hybrid models still need data. And for many parts of the Earth, the data does not exist.
The authors flag "data scarcity" as a persistent challenge (Zhao et al., 2024). This is not about the total volume of data. It is about the type of data that AI models actually need. Deep learning systems require labeled examples. To train a model to predict landslides, you need thousands of images of hillsides, each one labeled with whether a landslide occurred and, ideally, when and why.
Getting that data is expensive. It means sending geologists into the field. It means installing monitoring equipment. It means waiting for landslides to happen, which could take years. For rare events, like major earthquakes or volcanic eruptions, the training data is vanishingly sparse.
This creates a chicken and egg problem. AI models work best when they have lots of data. But the geoscience phenomena we most want to predict, the rare and catastrophic ones, are precisely the ones where data is hardest to come by.
The authors do not have a clean solution. They point to techniques like transfer learning, where a model trained on one type of data is adapted to another, and to synthetic data generation, where simulations create artificial training examples. But they are honest that these are partial fixes, not complete solutions.
What the AI Does Not Know
It is worth pausing to consider what this research does not claim. The authors are not saying that AI will replace geoscientists. They are not saying that physics based models are obsolete. And they are certainly not saying that we can trust AI predictions without verification.
What they are saying is more nuanced. They are saying that the old way of doing geoscience, where every prediction had to be derived from first principles, is no longer the only way. AI offers a complementary approach. It can find patterns that physics cannot. It can process data at scales that humans cannot. And when combined with physical knowledge, it can produce predictions that are both accurate and physically plausible.
But the black box problem remains. The data scarcity problem remains. And the fundamental challenge of predicting a chaotic, nonlinear system like the Earth remains.
The open question, and it is a fascinating one, is whether hybrid models can overcome these limitations. Can we build AI systems that are both powerful and interpretable? Can we generate enough training data for rare events? And can we trust a machine to tell us when to evacuate a city?
What This Actually Means
The review by Zhao and colleagues is not a manifesto. It is a map. It shows where geoscience has been, where it is going, and where the obstacles lie. Here is what that map tells us, in practical terms:
- Physics based models are not dead. They are being upgraded. The future of geoscience is hybrid models that embed physical laws into AI architectures. If you are a geoscientist, learning to build these hybrids is the most valuable skill you can develop.
- Data scarcity is the bottleneck. The AI techniques exist. What is missing is high quality labeled data for rare events. Governments and research institutions should prioritize building open, well labeled datasets for earthquakes, landslides, floods, and volcanic eruptions. Without that data, the AI revolution in geoscience stalls.
- Interpretability is not optional. For AI to be trusted in high stakes geoscience applications, the black box must open. Researchers should focus on developing explainable AI methods that can trace predictions back to specific features and physical mechanisms.
- The best predictions will come from fusion, not competition. The most successful geoscience models of the next decade will not be pure physics or pure AI. They will be systems where physics constrains the AI and the AI fills the gaps in the physics. This is not a compromise. It is a synthesis.
- The Earth is still surprising us. The fact that AI can find patterns in geoscience data that human experts missed suggests that our understanding of the planet is incomplete in ways we did not fully appreciate. The next great discovery in geoscience might not come from a new equation. It might come from a machine that saw something we were not looking for.
References
- [1] Tianjie Zhao, Sheng Wang, Chaojun Ouyang, Min Chen (2024). Artificial intelligence for geoscience: Progress, challenges, and perspectives. The Innovation.
