EA - OpenAI's massive push to make superintelligence safe in 4 years or less (Jan Leike on the 80,000 Hours Podcast) by 80000 Hours
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: OpenAI's massive push to make superintelligence safe in 4 years or less (Jan Leike on the 80,000 Hours Podcast), published by 80000 Hours on August 9, 2023 on The Effective Altruism Forum.

We just published an interview: Jan Leike on OpenAI's massive push to make superintelligence safe in 4 years or less. You can click through for the audio, a full transcript, and related links. Below are the episode summary and some key excerpts.

Episode summary

If you're thinking about how do you align the superintelligence - how do you align the system that's vastly smarter than humans? - I don't know. I don't have an answer. I don't think anyone really has an answer.

But it's also not the problem that we fundamentally need to solve. Maybe this problem isn't even solvable by humans who live today. But there's this easier problem, which is how do you align the system that is the next generation? How do you align GPT-N+1? And that is a substantially easier problem.

Jan Leike

In July, OpenAI announced a new team and project: Superalignment. The goal is to figure out how to make superintelligent AI systems aligned and safe to use within four years, and the lab is putting a massive 20% of its computational resources behind the effort.

Today's guest, Jan Leike, is Head of Alignment at OpenAI and will be co-leading the project. As OpenAI puts it, "...the vast power of superintelligence could be very dangerous, and lead to the disempowerment of humanity or even human extinction. ... Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue."

Given that OpenAI is in the business of developing superintelligent AI, it sees that as a scary problem that urgently has to be fixed. So it's not just throwing compute at the problem - it's also hiring dozens of scientists and engineers to build out the Superalignment team.

Plenty of people are pessimistic that this can be done at all, let alone in four years. But Jan is guardedly optimistic. As he explains:

Honestly, it really feels like we have a real angle of attack on the problem that we can actually iterate on... and I think it's pretty likely going to work, actually. And that's really, really wild, and it's really exciting. It's like we have this hard problem that we've been talking about for years and years and years, and now we have a real shot at actually solving it. And that'd be so good if we did.

Jan thinks that this work is actually the most scientifically interesting part of machine learning. Rather than just throwing more chips and more data at a training run, this work requires actually understanding how these models work and how they think. The answers are likely to be breakthroughs on the level of solving the mysteries of the human brain.

The plan, in a nutshell, is to get AI to help us solve alignment. That might sound a bit crazy - as one person described it, "like using one fire to put out another fire."

But Jan's thinking is this: the core problem is that AI capabilities will keep getting better and the challenge of monitoring cutting-edge models will keep getting harder, while human intelligence stays more or less the same.
To have any hope of ensuring safety, we need our ability to monitor, understand, and design ML models to advance at the same pace as the complexity of the models themselves.

And there's an obvious way to do that: get AI to do most of the work, such that the sophistication of the AIs that need aligning, and the sophistication of the AIs doing the aligning, advance in lockstep.

Jan doesn't want to produce machine learning models capable of doing ML research. But such models are coming, whether we like it or not. And at that point Jan wants to make sure we turn them towards useful alignment and safety work, as much or more than we use them to...