EA - There are no coherence theorems by EJT

The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: There are no coherence theorems, published by EJT on February 20, 2023 on The Effective Altruism Forum.

Introduction

For about fifteen years, the AI safety community has been discussing coherence arguments. In papers and posts on the subject, it’s often written that there exist 'coherence theorems' which state that, unless an agent can be represented as maximizing expected utility, that agent is liable to pursue strategies that are dominated by some other available strategy. Despite the prominence of these arguments, authors are often a little hazy about exactly which theorems qualify as coherence theorems. This is no accident. If the authors had tried to be precise, they would have discovered that there are no such theorems.

I’m concerned about this. Coherence arguments seem to be a moderately important part of the basic case for existential risk from AI. To spot the error in these arguments, we only have to look up what cited ‘coherence theorems’ actually say. And yet the error seems to have gone uncorrected for more than a decade.

More detail below.

Coherence arguments

Some authors frame coherence arguments in terms of ‘dominated strategies’. Others frame them in terms of ‘exploitation’, ‘money-pumping’, ‘Dutch Books’, ‘shooting oneself in the foot’, ‘Pareto-suboptimal behavior’, and ‘losing things that one values’ (see the Appendix for examples).

In the context of coherence arguments, each of these terms means roughly the same thing: a strategy A is dominated by a strategy B if and only if A is worse than B in some respect that the agent cares about and A is not better than B in any respect that the agent cares about. If the agent chooses A over B, they have behaved Pareto-suboptimally, shot themselves in the foot, and lost something that they value. If the agent’s loss is someone else’s gain, then the agent has been exploited, money-pumped, or Dutch-booked. Since all these phrases point to the same sort of phenomenon, I’ll save words by talking mainly in terms of ‘dominated strategies’.

With that background, here’s a quick rendition of coherence arguments:

1. There exist coherence theorems which state that, unless an agent can be represented as maximizing expected utility, that agent is liable to pursue strategies that are dominated by some other available strategy.
2. Sufficiently-advanced artificial agents will not pursue dominated strategies.
3. So, sufficiently-advanced artificial agents will be ‘coherent’: they will be representable as maximizing expected utility.

Typically, authors go on to suggest that these expected-utility-maximizing agents are likely to behave in certain, potentially-dangerous ways. For example, such agents are likely to appear ‘goal-directed’ in some intuitive sense. They are likely to have certain instrumental goals, like acquiring power and resources. And they are likely to fight back against attempts to shut them down or modify their goals.

There are many ways to challenge the argument stated above, and many of those challenges have been made. There are also many ways to respond to those challenges, and many of those responses have been made too. The challenge that seems to remain yet unmade is that Premise 1 is false: there are no coherence theorems.
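To make the talk of dominated strategies and money pumps above more concrete, here is a minimal sketch in Python. It is not taken from the post: the goods, the fee, and the cyclic preference relation are invented for illustration. An agent whose preferences cycle will accept every trade offered below and end up holding its original good with less money, so the strategy "accept every trade" is dominated by "refuse every trade".

```python
# Toy money pump (illustrative only): an agent with cyclic preferences
# A > B > C > A accepts each offered trade, pays a small fee per trade,
# and ends up back where it started with strictly less money.

# Hypothetical strict preference relation: (x, y) means x is preferred to y.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}

def accepts_trade(offered, held):
    """The agent trades whenever it strictly prefers the offered good to the one it holds."""
    return (offered, held) in prefers

held, money, fee = "A", 100.0, 1.0

for offered in ["C", "B", "A"]:   # offer the goods in a cycle back to the start
    if accepts_trade(offered, held):
        held = offered
        money -= fee              # each accepted trade costs a fee

# The agent ends with the same good and 3 units less money: worse in one
# respect it cares about (money) and better in none, i.e. a dominated outcome.
print(held, money)  # A 97.0
```

The sketch only illustrates what ‘dominated’ and ‘money-pumped’ mean; the post’s point is about whether any theorem licenses the inference from "not representable as an expected utility maximizer" to "liable to pursue dominated strategies".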
Cited ‘coherence theorems’ and what they actually say

Here’s a list of theorems that have been called ‘coherence theorems’. None of these theorems state that, unless an agent can be represented as maximizing expected utility, that agent is liable to pursue dominated strategies. Here’s what the theorems say:

The Von Neumann-Morgenstern Expected Utility Theorem:

The Von Neumann-Morgenstern Expected Utility Theorem is as follows:

An agent can be represented as maximizing expected utility if and only if their preferences satisfy the following four axioms:

Completeness: For all lotteries X and Y, X...
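For reference, here is one standard way to write the expected-utility representation that the theorem concerns. This formalization is mine, not the post’s, and the symbols (the preference relation over lotteries, the outcome probabilities, and the utility function u) are the usual textbook ones.

```latex
% Sketch of the standard VNM representation claim (not quoted from the post).
% Requires amssymb for \succeq. Preferences \succeq over lotteries satisfy the
% four axioms if and only if there is a utility function u on outcomes such
% that, for all lotteries X and Y,
\[
  X \succeq Y
  \quad\Longleftrightarrow\quad
  \sum_{i} p^{X}_{i}\, u(o_{i}) \;\ge\; \sum_{i} p^{Y}_{i}\, u(o_{i}),
\]
% where p^{X}_{i} is the probability that lottery X assigns to outcome o_{i}.
% Completeness, the axiom cut off above, is standardly stated as: for all
% lotteries X and Y, either X \succeq Y or Y \succeq X.
```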
