A few weeks ago, Google released AlphaEvolve, a system designed for "general-purpose algorithm discovery" which has a far broader range of capabilities than previous systems such as AlphaGo. The potential for systems like this to revolutionize science is widely hyped in the AI industry.
Beyond these well-known systems, more bespoke AI and machine learning methods have already diffused widely across many different scientific disciplines. Experiences with these more narrowly applicable methods have raised questions about whether the current AI hype is warranted. As I'll argue, these narrow methods are fundamentally different from what is possible with AlphaEvolve or other language-model-based AI systems, and it is a mistake to conflate the two as the same kind of thing. Really, "artificial intelligence" as a category is becoming so broad that it is losing its usefulness.
Broadly speaking, I think there are two key types of AI that need to be distinguished, perhaps most simply using an example. Consider the problem of solving a differential equation. Most commonly, especially in applied fields such as mine (climate science), an AI-based approach would be to set up a machine learning algorithm (e.g. a neural network) and then train it on a large collection of input-output pairs, which in this example might be initial conditions and some final state. The training data must be generated using some other method, such as a traditional numerical solver for differential equations. The machine learning algorithm "learns" to predict a solution organically, without being specifically programmed to do so, hence the term “artificial intelligence”.
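To make this first type concrete, here is a minimal sketch using a toy exponential-decay equation, dy/dt = -k*y, rather than anything from climate science. The forward-Euler solver stands in for the "traditional numerical solver" that generates the training data, and the small network stands in for the surrogate; all names (euler_solve, model, and so on) are my own illustration, not taken from any particular paper or system.

```python
# First type of AI for science: a neural-network surrogate trained on
# input-output pairs produced by a traditional numerical solver.
# Toy problem: dy/dt = -k*y, learning the map (y0, k) -> y(T).
import torch
import torch.nn as nn

def euler_solve(y0, k, T=1.0, steps=100):
    """Traditional numerical solver (forward Euler) used to generate the data."""
    dt = T / steps
    y = y0.clone()
    for _ in range(steps):
        y = y - dt * k * y
    return y

# Training pairs: initial conditions (and decay rates) -> final states.
y0 = torch.rand(4096, 1) * 2.0   # initial conditions in [0, 2)
k = torch.rand(4096, 1) * 3.0    # decay rates in [0, 3)
inputs = torch.cat([y0, k], dim=1)
targets = euler_solve(y0, k)

# A small MLP that "learns" to predict the solution directly.
model = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                      nn.Linear(64, 64), nn.Tanh(),
                      nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    opt.step()

# The trained network maps (y0, k) straight to an approximate y(T); the
# underlying equation appears nowhere in the output, only in the data.
print(model(torch.tensor([[1.0, 2.0]])))  # should be close to exp(-2) ≈ 0.135
```

The point is that the network's "knowledge" of the differential equation lives entirely in its weights: it produces numbers, not an equation, and only for inputs resembling its training data.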
The second way of using AI is entirely different and, crucially, is only just becoming feasible in the age of language models capable of reasoning, such as Gemini, which powers AlphaEvolve. A language model, trained to solve problems in a way more similar to how a human does, is able to generate an algorithm to solve a problem rather than generating the solution directly. The algorithm might be machine-learning-based, but it need not be. In the differential equation scenario, the end result may well be exactly the type of mathematical model that a human could have discovered, and therefore has the same potential for e.g. theoretical guarantees and physical understanding.
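For contrast, here is a hypothetical sketch of what the second type of output might look like for the same toy equation. This is purely my illustration of the idea -- no language model actually produced it -- but the end product is now an explicit formula (the textbook closed-form solution) that a human can read and that can be checked against the equation directly.

```python
# Second type of output: not network weights, but an explicit, inspectable
# algorithm/formula. For dy/dt = -k*y, the exact solution is y0 * exp(-k*t).
import math

def solve_decay(y0: float, k: float, t: float) -> float:
    """Exact solution y(t) = y0 * exp(-k * t), verifiable by substitution."""
    return y0 * math.exp(-k * t)

# Verification: a finite-difference estimate of dy/dt should satisfy the ODE.
y0, k, t, h = 1.0, 2.0, 0.5, 1e-6
dy_dt = (solve_decay(y0, k, t + h) - solve_decay(y0, k, t - h)) / (2 * h)
assert abs(dy_dt + k * solve_decay(y0, k, t)) < 1e-4

print(solve_decay(1.0, 2.0, 1.0))  # exp(-2) ≈ 0.1353
```

Unlike the surrogate above, this kind of output can be verified against the governing equation, works for any inputs rather than only those resembling a training set, and is legible to a human.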
As a recent example of this new type of science, this paper used OpenAI's o3-mini model to discover an exact solution to a Potts model, a simplified mathematical model of a magnet. This is a case where the mathematical setup of the Potts model is very difficult to solve either analytically or numerically. The traditional route to using AI here would be to set up an ML algorithm to predict approximate solutions based on numerical simulations -- solutions which cannot be verified and are, in all likelihood, completely opaque to human understanding. But the authors took a very different approach: they obtained exact analytical solutions with the help of a language model. These analytical solutions can be theoretically verified and offer valuable insights into the Potts model -- insights that humans can understand.
Although these two methods of using AI in science are qualitatively quite different in practice, there are historical reasons why both are called AI, and I suspect there are edge cases where it would not be possible to separate the two types cleanly. The basic similarity between the two approaches is that some type of ML algorithm -- usually a deep neural network -- is used somewhere. The difference is whether the neural network is trained to directly predict the solution or whether it is trained to output an algorithm or explanation which then produces the solution, as is the case for language models.
Before language models, one may reasonably have thought that a neural network would never be able to discover knowledge that is provably correct, or even understandable to humans at all. But now we can see that a neural network can be trained to create an output that is expressed in the medium in which humans explain the world. This type of output is therefore likely to be much more useful, at least for the human project of understanding and explaining the world.
I think one useful way of seeing this distinction is to think of the output of a neural network as a form of "artificial intuition".1
Take a neural network for weather prediction as an example. These systems take weather maps at a given time as input and output the (forecasted) weather map at some later time, often 6 hours later. They are trained using a large number of initial/final weather map pairs. In the "artificial intuition" view, this would be like asking a human to look at a bunch of weather map pairs and then produce a forecast by simply guessing using their intuition -- perhaps by drawing what they expect the weather map to look like in the future. Before computer simulations of weather were invented, weather forecasts looked something like this.2
However, training human forecasters to simply guess at what weather maps might look like was not the best way to use human intuition to forecast the weather. Instead, once computers were invented, a better use was to train human intuition on physics, allow humans to intuit sets of equations that correctly model the weather, and then simulate the weather based on those equations. In other words, today we do not use human intuition to directly generate the solution to the problem (the forecast). Intuition is used to create better computer simulations, and then it is the simulations which generate the forecast.3
So, the majority of past AI in science is the automated version of a human simply guessing the solution to a problem. But this is often not the best way for humans to solve problems, so we should not expect artificial intuition to be the best use of AI for all scientific problems. Instead, many problems will be solved by using artificial intuition to create algorithms which then generate solutions.
This should also make us more hopeful about being able to understand the solutions that AI systems discover. It is famously difficult to understand how artificial intuition systems (e.g. neural networks) solve problems. It's also very difficult for humans to explain how their intuition came up with an idea. But if AI starts discovering algorithms and explanations, those algorithms might be much easier to understand. One hopeful example, though we don't know the details, is a heuristic that AlphaEvolve discovered to more efficiently schedule jobs on Google's compute infrastructure. Apparently, it was still interpretable, debuggable, predictable, and easy to deploy. These are not adjectives typically used to describe artificial intuition systems such as those used for weather forecasting.
The term "artificial intelligence" should include any automated method that reproduces the actions of an intelligent being. The majority of past AI use in science has been limited to artificial intuition -- ML algorithms that directly output the solution to a problem, analogous to a human intuiting the solution. But artificial intelligence could also mean a system that outputs a method to obtain a solution, such as a novel (non-ML-based) algorithm, a mathematical equation, or a new type of explanation. Thus, although many past AI methods in science suffered from numerous drawbacks -- being narrowly applicable, approximate, and difficult to understand -- these limitations will not necessarily be shared by systems just coming online, such as AlphaEvolve, which are able to produce explanations rather than just solutions.
1This argument is sometimes used to explain why reasoning models outperform GPTs. As the analogy goes, models such as GPT-4 are the equivalent of what a human would be like if they simply said the first thing that comes to mind when asked a question. But, for hard questions, a human has another tool in their toolbox -- they can think for a while, considering various arguments generated by their intuition, before deciding on the best answer and then saying that. This is what "reasoning" models are trained to do. They create an invisible "chain of thought", which allows them to consider various options before outputting what they consider to be the best answer.
2They did not look exactly like this. Forecasters went through a chain of reasoning, identifying and classifying meteorological features such as fronts, and used knowledge of how e.g. frontal systems typically evolve. But they did draw maps.
3The analogy is not perfect, as there remains some element of human intuition in the best forecasts. However, purely computer-based forecasts (for example, the National Blend of Models from NOAA) are quite good and certainly better than pre-computer forecasts.