EA - Strongest real-world examples supporting AI risk claims? by rosehadshar
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Strongest real-world examples supporting AI risk claims?, published by rosehadshar on September 5, 2023 on the Effective Altruism Forum.

[Manually cross-posted to LessWrong here.]

There are some great collections of examples of things like specification gaming, goal misgeneralization, and AI improving AI. But almost all of the examples are from demos/toy environments, rather than systems which were actually deployed in the world.

There are also some databases of AI incidents which include lots of real-world examples, but the examples aren't related to failures in a way that makes it easy to map them onto AI risk claims. (Probably most of them don't in any case, but I'd guess some do.)

I think collecting real-world examples (particularly in a nuanced way, without claiming too much of the examples) could be pretty valuable:

- I think it's good practice to have a transparent overview of the current state of evidence
- For many people, I think real-world examples will be most convincing
- I expect there to be more and more real-world examples, so starting to collect them now seems good

What are the strongest real-world examples of AI systems doing things which might scale to AI risk claims?

I'm particularly interested in whether there are any good real-world examples of:

- Goal misgeneralization
- Deceptive alignment (answer: no, but yes to simple deception?)
- Specification gaming
- Power-seeking
- Self-preservation
- Self-improvement

This feeds into a project I'm working on with AI Impacts, collecting empirical evidence on various AI risk claims. There's a work-in-progress table here with the main things I'm tracking so far - additions and comments very welcome.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org