EA - Implications of the Whitehouse meeting with AI CEOs for AI superintelligence risk - a first-step towards evals? by Jamie Bernardi
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Implications of the Whitehouse meeting with AI CEOs for AI superintelligence risk - a first-step towards evals?, published by Jamie Bernardi on May 7, 2023 on The Effective Altruism Forum.

Introduction

On Thursday 4th May, Sam Altman (OpenAI) and Dario Amodei (Anthropic) - amongst others - met with US Vice President Kamala Harris (with a drop-in from President Joe Biden) to discuss the dangers of AI.

Announcement | Fact sheet | EA Forum linkpost

I spent about 2 hours trying to understand what happened, who was involved, and what its possible implications are for superintelligence risk.

I decided to make this post for two reasons:
- I am practising writing and developing my opinions on AI strategy (so feedback is very welcome, and you should treat my epistemic status as ‘new to this’!)
- I think demystifying the facts of the announcement and offering some tentative conclusions will positively contribute to the community's understanding of AI-related political developments.

My main conclusions

Three announcements were made, but the announcement on public model evaluations involving major AI labs seemed most relevant and actionable to me.

My two actionable conclusions are:
- I think folks with technical alignment expertise should consider attending DEF CON 31 if it’s convenient, to help shape the conclusions from the event.
- My main speculative concern is that this evaluation event could positively associate advanced AI with the open source community. For those who feel the downsides of model proliferation outweigh the benefits of open sourcing, spreading that message in a more focused way now may be valuable.

Summary of the model evaluations announcement

This is mostly factual, and I’ve flagged where I’m offering my interpretation.
Primary source: AI Village announcement.

There’s going to be an evaluation platform made available during a conference called DEF CON 31. DEF CON 31 is the 31st iteration of DEF CON, “the world’s largest security conference”, taking place in Las Vegas on 10th August 2023. The platform is being organised by a subcommunity at that conference called the AI Village.

The evaluation platform will be provided by Scale AI. The platform will provide “timed access to LLMs” via laptops available at the conference, and attendees will red-team various models by injecting prompts. I expect that the humans will then rate the output of the model as good or bad, much like on the ChatGPT platform. There’s a points-based system to encourage participation, and the winner will win a “high-end Nvidia GPU”.

The intent of this whole event appears to be to collect adversarial data that the AI organisations in question can use and 'learn from' (and presumably do more RLHF on). The orgs that signed up include: Anthropic, Google, Hugging Face, Microsoft, NVIDIA, OpenAI, and Stability AI.

It seems that there won’t be any direct implications for the AI organisations. They will, by default, be allowed to carry on as normal no matter what is learned at the event.

I’ll provide more details on what has happened after the takeaways section.

Takeaways from the White House announcement on model evaluations

I prioritised communicating my takeaways in this section.
If you want more factual context to understand exactly what happened and who's involved, see the section below this one.

For the avoidance of doubt, the White House announcement on the model evaluation event doesn’t come with any regulatory teeth.

I don’t mean that as a criticism necessarily; I’m not sure anyone has a concrete proposal for what the evaluation criteria should even be, or how they should be enforced, etc., so it’d be too soon to see an announcement like that.

That does mean I’m left with the slightly odd conclusion that all that’s happened is the White House has endorsed a community red-teaming event at a con...
