EA - The current alignment plan, and how we might improve it | EAG Bay Area 23 by Buck

The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The current alignment plan, and how we might improve it | EAG Bay Area 23, published by Buck on June 8, 2023 on The Effective Altruism Forum.

I gave a talk at EAG SF where I tried to describe my current favorite plan for how to align transformatively powerful AI, and how I think this plan informs current research prioritization. I think this talk does a reasonable job of representing my current opinions, and it was a great exercise for me to write it.

I have one major regret about this talk, which is that my attitude towards the risks associated with following the proposed plan now seems way too blasé to me. I think that if labs deploy transformative AI in the next ten years after following the kind of plan I describe, there's something like a 10-20% chance that this leads to AI takeover.

Obviously, this is a massive amount of risk. It seems totally unacceptable to me for labs to unilaterally impose this kind of risk on the world, from both the longtermist perspective and also from any common-sense perspective. The potential benefits of AGI development are massive, but not massive enough that it's worth accepting massive risk to make the advent of AGI happen a decade sooner.

I wish I'd prefaced the talk by saying something like: "This is not the plan that humanity deserves. In almost all cases where AI developers find themselves in the kind of scenario I described here, I think they should put a lot of their effort into looking for alternative options to deploying their dangerous models with just the kinds of safety interventions I was describing here. From my perspective, it only makes sense to follow this kind of plan if the developers are already in exceptionally dire circumstances which were already a massive failure of societal coordination."

So if I think that this plan is objectively unacceptable, why did I describe it? Substantially, it's just that I had made the unforced error of losing sight of how objectively terrible a 10-20% chance of AI takeover is.

This is partially because my estimate of x-risk from AI used to be way higher, and so 10-20% intuitively feels pretty chill and low to me, even though obviously it's still objectively awful. I spend a lot of time arguing with people whose P(AI takeover) is way higher than mine, and so I kind of naturally fall into the role of trying to argue that a plan is more likely to succeed than you might have thought.

I realized my error here mostly by talking to Ryan Greenblatt and Beth Barnes. In particular, Beth is focused on developing safety standards for AI labs to follow when developing and deploying their AI systems; plans of the type that I described in this talk probably can't be made robust enough to meet the safety standards that Beth would like to have in place.

I think of my job as an alignment researcher as trying to do technical research such that AI developers are able to make the best of whatever empirical situation they find themselves in; from this perspective, it's someone else's job to reduce the probability that we end up in a situation where the alignment techniques are under a lot of strain. And so the fact that this situation is objectively unacceptable is often not really on my mind.

Since this talk, I've also felt increasingly hopeful that it will be possible to intervene such that labs are much more careful than I was imagining them being in this talk, and I think it's plausible that people who are concerned about AI takeover should be aiming for coordination such that we'd have an overall 1% chance of AI takeover, rather than just shooting for getting to ~10%. The main levers here are:

Go slower at the end. A lot of the risk of developing really powerful AI comes from deploying it in high-leverage situations before you've had enough time to really understand its propert...
