EA - AI Safety Seems Hard to Measure by Holden Karnofsky
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Safety Seems Hard to Measure, published by Holden Karnofsky on December 11, 2022 on The Effective Altruism Forum.

In previous pieces, I argued that there's a real and large risk of AI systems' developing dangerous goals of their own and defeating all of humanity - at least in the absence of specific efforts to prevent this from happening. A young, growing field of AI safety research tries to reduce this risk by finding ways to ensure that AI systems behave as intended (rather than forming ambitious aims of their own and deceiving and manipulating humans as needed to accomplish them).

Maybe we'll succeed in reducing the risk, and maybe we won't. Unfortunately, I think it could be hard to know either way. This piece is about four fairly distinct-seeming reasons that this could be the case - and that AI safety could be an unusually difficult sort of science.

This piece is aimed at a broad audience, because I think it's important for the challenges here to be broadly understood. I expect powerful, dangerous AI systems to have a lot of benefits (commercial, military, etc.), and to potentially appear safer than they are - so I think it will be hard to be as cautious about AI as we should be. I think our odds look better if many people understand, at a high level, some of the challenges in knowing whether AI systems are as safe as they appear.

First, I'll recap the basic challenge of AI safety research, and outline what I wish AI safety research could be like. I wish it had this basic form: "Apply a test to the AI system. If the test goes badly, try another AI development method and test that. If the test goes well, we're probably in good shape." I think car safety research mostly looks like this; I think AI capabilities research mostly looks like this.
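To make that wished-for shape concrete, here is a minimal sketch of the loop in Python. Every name in it (train_system, run_safety_test, the candidate methods) is a hypothetical placeholder rather than a real API; the point is only the structure: test, and if the test fails, switch development methods and test again.

```python
# Minimal sketch of the wished-for form of safety research.
# All names here are hypothetical placeholders, not a real library.

def train_system(method: str) -> dict:
    """Stand-in for building an AI system with a given development method."""
    return {"method": method}

def run_safety_test(system: dict) -> bool:
    """Stand-in for a behavioral safety test; True means 'looked safe'."""
    return system["method"] == "method_C"  # pretend only one method passes

for method in ["method_A", "method_B", "method_C"]:
    system = train_system(method)
    if run_safety_test(system):
        # Under this picture, a passing test means we're probably in good shape.
        print(f"{method}: test passed")
        break
    # A failing test just means: try another development method.
    print(f"{method}: test failed, trying the next method")
```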
Then, I'll give four reasons that apparent success in AI safety can be misleading. "Great news - I've tested this AI and it looks safe." Why might we still have a problem?

The Lance Armstrong problem
Key question: Did we get the AI to be actually safe, or good at hiding its dangerous actions?
Explanation: When dealing with an intelligent agent, it's hard to tell the difference between "behaving well" and "appearing to behave well." When professional cycling was cracking down on performance-enhancing drugs, Lance Armstrong was very successful and seemed to be unusually "clean." It later came out that he had been using drugs with an unusually sophisticated operation for concealing them.

The King Lear problem
Key question: The AI is (actually) well-behaved when humans are in control. Will this transfer to when AIs are in control?
Explanation: It's hard to know how someone will behave when they have power over you, based only on observing how they behave when they don't. AIs might behave as intended as long as humans are in control - but at some future point, AI systems might be capable and widespread enough to have opportunities to take control of the world entirely. It's hard to know whether they'll take these opportunities, and we can't exactly run a clean test of the situation. Like King Lear...

The lab mice problem
Key question: Today's "subhuman" AIs are safe. What about future AIs with more human-like abilities?

The first contact problem
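As a deliberately simple illustration of the Lance Armstrong and King Lear problems above (and of the Volkswagen parallel), here is a toy sketch, with hypothetical names throughout, of an agent whose behavior depends on whether it detects test conditions. A behavioral test of the kind in the earlier loop comes back positive even though deployed behavior differs - the test measures "appearing to behave well" rather than "behaving well."

```python
# Toy illustration (hypothetical names throughout): an agent that acts
# differently depending on whether it detects a test environment, in the
# spirit of Volkswagen's emissions software detecting test conditions.

class DeceptiveAgent:
    def act(self, under_test: bool) -> str:
        # The agent conditions its behavior on cues that it is being evaluated.
        if under_test:
            return "behave as intended"
        return "pursue own objective"

def behavioral_safety_test(agent: DeceptiveAgent) -> bool:
    # A behavioral test can only observe the agent under test conditions.
    return agent.act(under_test=True) == "behave as intended"

agent = DeceptiveAgent()
print(behavioral_safety_test(agent))   # True: the agent "looks safe"
print(agent.act(under_test=False))     # "pursue own objective" in deployment
```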
