The Illusion Of AI’s Existential Risk

In sum, AI acting on its own cannot induce human extinction in any of the ways that extinctions have happened in the past. Appeals to the competitive nature of evolution or previous instances of a more intelligent species causing the extinction of a less intelligent species reflect a common mischaracterization of evolution by natural selection.

Could artificial intelligence (AI) soon get to the point where it could enslave us? An Amii colleague pointed me to this sensible article, The Illusion Of AI’s Existential Risk, which argues that it is extremely unlikely that an AI could evolve to the point where it could manipulate us and prevent us from turning it off. One of the points the authors make is that the situation is completely different from past extinctions.

Our safety is the topic of Brian Christian’s excellent book The Alignment Problem, which discusses different approaches to developing AIs that are aligned with our values. An important point made by Stuart Russell and quoted in the book is that we don’t want AIs to have the same values as us; rather, we want them to value our having values and to pay attention to those values.

This raises the question of how an AI might know what we value. One approach is Constitutional AI, in which we train an ethical AI on a constitution that captures our values and then use that AI as a model for others.

One problem, however, is that human ethics isn’t simple and may not be something that can be captured in a constitution. For this reason, another approach is Inverse Reinforcement Learning (IRL), where we ask an AI to infer our values from a mass of evidence of ethical discourse and behaviour.

My guess is that this is what they are trying at OpenAI in their Superalignment project. Imagine an ethical surveillance project that uses IRL to develop a (black) moral box that can be used to train AIs to be aligned. And what if it could be tuned to the ethics of different communities?