EA - Counterarguments to the basic AI risk case by Katja Grace
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Counterarguments to the basic AI risk case, published by Katja Grace on October 14, 2022 on The Effective Altruism Forum.

This is cross-posted from the AI Impacts blog.

This is going to be a list of holes I see in the basic argument for existential risk from superhuman AI systems[1]. To start, here's an outline of what I take to be the basic case[2]:

I. If superhuman AI systems are built, any given system is likely to be 'goal-directed'

Reasons to expect this:
1. Goal-directed behavior is likely to be valuable, e.g. economically.
2. Goal-directed entities may tend to arise from machine learning training processes not intending to create them (at least via the methods that are likely to be used).
3. 'Coherence arguments' may imply that systems with some goal-directedness will become more strongly goal-directed over time.

II. If goal-directed superhuman AI systems are built, their desired outcomes will probably be about as bad as an empty universe by human lights

Reasons to expect this:
1. Finding useful goals that aren't extinction-level bad appears to be hard: we don't have a way to usefully point at human goals, and divergences from human goals seem likely to produce goals that are in intense conflict with human goals, due to a) most goals producing convergent incentives for controlling everything, and b) value being 'fragile', such that an entity with 'similar' values will generally create a future of virtually no value.
2. Finding goals that are extinction-level bad and temporarily useful appears to be easy: for example, an advanced AI with the sole objective 'maximize company revenue' might profit said company for a time before gathering the influence and wherewithal to pursue the goal in ways that blatantly harm society.
3. Even if humanity found acceptable goals, giving a powerful AI system any specific goals appears to be hard. We don't know of any procedure to do it, and we have theoretical reasons to expect that AI systems produced through machine learning training will generally end up with goals other than those they were trained according to. The aberrant goals that result are probably extinction-level bad, for the reasons described in II.1 above.

III. If most goal-directed superhuman AI systems have bad goals, the future will very likely be bad

That is, a set of ill-motivated goal-directed superhuman AI systems, of a scale likely to occur, would be capable of taking control over the future from humans. This is supported by at least one of the following being true:
1. Superhuman AI would destroy humanity rapidly. This may be via ultra-powerful capabilities at e.g. technology design and strategic scheming, or through gaining such powers in an 'intelligence explosion' (self-improvement cycle). Either of those things may happen either through exceptional heights of intelligence being reached or through highly destructive ideas being available to minds only mildly beyond our own.
2. Superhuman AI would gradually come to control the future by accruing power and resources. Power and resources would be more available to the AI system(s) than to humans on average, because of the AI having far greater intelligence.

Below is a list of gaps in the above, as I see it, and counterarguments. A 'gap' is not necessarily unfillable, and may have been filled in any of the countless writings on this topic that I haven't read.
I might even think that a given one can probably be filled. I just don't know what goes in it. This blog post is an attempt to run various arguments by you all on the way to making pages on AI Impacts about arguments for AI risk and corresponding counterarguments. At some point in that process I hope to also read others' arguments, but this is not that day. So what you have here is a bunch of arguments that occur to me, not an exhaustive literature review.

Counterarguments

A. Contra "...
