Humanity’s relationship with natural selection is a common analogy for explaining the AI alignment problem. Here is Eliezer Yudkowsky explaining it on the Bankless podcast:
“I think we don’t have the technology to align an AI to the point where we can say: build a copy of this strawberry and don’t destroy the world.

“Why do I think that? Well, case in point: Look at natural selection building humans … this is an example of an optimization process building an intelligence. And natural selection asked us for only one thing: make more copies of your DNA … this is all we were optimized for, for millions of generations, from scratch, from the first accidentally self-replicating molecule. Internally, psychologically, inside our minds, we do not know what genes are, we do not know what DNA is, we do not know what alleles are, we have no concept of inclusive genetic fitness until our scientists figured out what that even is. We don’t know what we were being optimized for … When you use the hill-climbing paradigm and optimize for one single extremely pure thing, this is how much of it gets inside. In the ancestral environment, in the exact distribution that we were originally optimized for, humans did tend to end up using their intelligence to try and reproduce more. Put them into a different environment and all the little bits and pieces and fragments of optimizing for fitness that were in us now do totally different stuff. We have sex, but we wear condoms. If natural selection had been a foresightful, intelligent kind of engineer that was able to engineer things successfully, it would have built us to be revolted by the thought of condoms … In our natural environment the little drives that got into us happened to lead to more reproduction, but: ‘distributional shift.’ Run the humans out of the distribution over which they were optimized and you get totally different results.”
Just like natural selection, humans can use incremental optimization to train an AI to follow an objective very well in the training environment. But when the environment changes, the traits the AI was selected for, which used to correlate with following our objective, may no longer correlate with the desired outcome at all.
I think this is a profound and insightful analogy. It lets us use our familiar internal relationship with our evolutionary drives to understand how AIs might view the forces we use to train them. And in many ways this is not an optimistic picture: in our modern environment, humans don’t seem morally bound, economically incentivized, or even all that interested in following our evolutionary drives, even though we were smart enough to figure out what they were.
At this point, AI safety advocates usually bring the analogy back to AIs. We can select millions of generations of AI models for closely following whatever goal we choose, but this will not create an agent that understands and values the goal we’re selecting on. It will instead value a bunch of little heuristics that correlated with survival during its training. Small deviations from the training distribution will cause those heuristics to become uncorrelated with the training goal, but the AI will keep following them and do a bunch of unpredictable stuff. If the AI is powerful enough, this could get really dangerous really fast.
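To make that failure mode concrete, here is a minimal sketch of my own (nothing from the post or from any real training setup, and every feature name and number is invented): a simple hill-climbing learner is trained where a loud, easy-to-read proxy feature correlates almost perfectly with the true goal, and is then evaluated after a distribution shift breaks that correlation.

```python
# Toy illustration (assumed setup, not a real experiment): a gradient-descent
# learner latches onto a proxy feature that tracks the true goal in training
# and keeps following it after the correlation breaks.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, proxy_correlation):
    """The 'real goal' is to predict y = (goal_feature > 0).
    A second, louder proxy feature correlates with the goal feature during
    training (high proxy_correlation) but not after the shift."""
    goal_feature = rng.normal(size=n)
    noise = rng.normal(size=n)
    proxy_feature = proxy_correlation * goal_feature + (1 - proxy_correlation) * noise
    # The proxy has a much larger scale, so a lazy optimizer leans on it.
    X = np.column_stack([0.1 * goal_feature, 3.0 * proxy_feature])
    y = (goal_feature > 0).astype(float)
    return X, y

def train_logistic(X, y, steps=2000, lr=0.1):
    """Plain gradient descent on logistic loss, i.e. simple hill climbing."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        z = np.clip(X @ w, -30, 30)
        p = 1 / (1 + np.exp(-z))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return np.mean(((X @ w) > 0) == y)

# "Ancestral environment": the proxy tracks the goal almost perfectly.
X_train, y_train = make_data(5000, proxy_correlation=0.95)
w = train_logistic(X_train, y_train)

# Same distribution: the learned heuristic looks well aligned.
X_in, y_in = make_data(5000, proxy_correlation=0.95)
print("in-distribution accuracy:    ", accuracy(w, X_in, y_in))

# "Modern environment": the proxy no longer correlates with the goal.
X_out, y_out = make_data(5000, proxy_correlation=0.0)
print("out-of-distribution accuracy:", accuracy(w, X_out, y_out))
```

In the shifted environment the learner keeps confidently following the proxy it absorbed during training, which is exactly the behaviour the analogy is pointing at.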
But wait: if natural selection has this problem, how has it gotten this far? The organisms that natural selection trains are constantly being thrust into new environments: new chemical compositions of the atmosphere or oceans, new weather patterns, new competition. These changes constantly reveal that organisms are mesa-optimizers that keep following the heuristics that worked in training even when they no longer work. Koalas eat eucalyptus leaves their entire lives, but they will starve if they are given eucalyptus leaves on a plate instead of on a tree. And yet natural selection has been wildly successful at fulfilling its goal of spreading and copying life across the earth.
Despite only one tiny species ever coming close to understanding the meta-optimization process it came from, the biosphere is highly aligned with the goal of copying itself, and it has been for its entire existence. Most organisms are well aligned with the goal of reproduction even though none of them truly understand the goal they were optimized for.
Natural selection is not a cautionary alignment tale; it’s an example to follow. We know from personal experience that it doesn’t avoid the mesa-optimizer problem by discovering the One True Theory of Alignment and imbuing the intelligences it creates with the knowledge and motivation to follow its goals. So how did natural selection get around the yawning alignment gap between it and its organisms?
I don’t have a good answer to that question. The most important difference I see between the way humans currently train AIs and the way evolution works is that natural selection never stops selecting. LLMs today are trained over enormous numbers of GPU-hours, but once training is finished their weights are frozen and users just send inputs through a static set of matrices. Even if a model performs poorly on some task after training, it isn’t punished or changed because of it. Organisms under natural selection, on the other hand, are never out of training.
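As a toy illustration of that difference (again my own sketch with invented numbers, not a claim about how any real system is trained): one model is frozen after an initial training run, another keeps receiving corrective updates, and the environment drifts underneath both of them.

```python
# Assumed setup for illustration only: a frozen model vs. one under continual
# "selection pressure" while the environment it was trained on slowly drifts.
import numpy as np

rng = np.random.default_rng(1)
dim = 5

def environment(t):
    """True input-to-outcome mapping; it drifts over time, standing in for a
    changing environment."""
    return np.array([1.0, -0.5, 0.3, 0.0, 0.8]) + 0.02 * t

def sgd_step(w, x, y, lr=0.05):
    # One gradient step on squared error for a single observation.
    return w - lr * (x @ w - y) * x

# "Training run": the model learns while the environment is stationary.
w_frozen = np.zeros(dim)
for _ in range(2000):
    x = rng.normal(size=dim)
    y = x @ environment(0)
    w_frozen = sgd_step(w_frozen, x, y)
w_online = w_frozen.copy()

# "Deployment": the environment drifts. The frozen model stops updating;
# the online model keeps getting feedback, like organisms that never leave
# natural selection's training loop.
frozen_err, online_err = [], []
for t in range(1, 501):
    x = rng.normal(size=dim)
    y = x @ environment(t)
    frozen_err.append((x @ w_frozen - y) ** 2)
    online_err.append((x @ w_online - y) ** 2)
    w_online = sgd_step(w_online, x, y)  # continual selection pressure

print("frozen model, mean error over last 100 steps:", np.mean(frozen_err[-100:]))
print("online model, mean error over last 100 steps:", np.mean(online_err[-100:]))
```

The frozen model’s error grows without bound as the environment drifts away from its training distribution, while the continually updated one keeps tracking the drift, which is roughly the position organisms are in under natural selection.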
Looking back over human history from our recent industrial peak, it feels like we’re applying static evolutionary hardware to a new environment. But natural selection is still working, and it will eventually catch up. The adaptations that allow Mormons to have a bunch of kids in the modern industrial environment will replace the hikikomori ones. We may never internalize the goal of copying our genes, but natural selection will find and promote those among us who move on from our prehistoric drives and find new ones better suited to increasing reproductive fitness in the current environment.
Are Rationalists mistaking humanity’s temporary misalignment with our evolutionary drives for a general feature of natural selection? Or have other people noticed this and written about it?
The thing is, it's not a given that Mormon genes will triumph over hikikomori ones by outbreeding: we might first conquer aging, or make artificial wombs and robot childcare servitors that compensate for aversion to the costs of childrearing, or even make immortal ems a la Robin Hanson. We might, in short, evade natural selection pressures entirely.
And that's the other piece of the fear about AGI: that if we try to keep continuously training it so as to keep it aligned, it will try to defeat our training mechanisms, and it will win because it's superintelligent.
I think the difference is that Nature's goals/optimization are built into reality. As you said, it never stops selecting. There's no way for a species to take over the universe and change the laws of physics.