EA - A Defense of Work on Mathematical AI Safety by Davidmanheim
The Nonlinear Library: EA Forum - A Podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Defense of Work on Mathematical AI Safety, published by Davidmanheim on July 6, 2023 on The Effective Altruism Forum.

AI Safety was, a decade ago, nearly synonymous with obscure mathematical investigations of hypothetical agentic systems. Fortunately or unfortunately, this has largely been overtaken by events; the successes of machine learning and the promise, or threat, of large language models have pushed thoughts of mathematics aside for many in the "AI Safety" community. The once pre-eminent advocate of this class of "agent foundations" research for AI safety, Eliezer Yudkowsky, has more recently said that timelines are too short to allow this agenda to have a significant impact. This conclusion seems at best premature.

Foundational research is useful for prosaic alignment

First, foundational and mathematical research can be synergistic both with technical progress on safety and with insight into how and where safety is critical. Many machine learning research agendas for safety are investigating issues identified years earlier by foundational research, and are at least partly informed by that research. Current mathematical research could play a similar role in the coming years, as more funding and research become available for safety. We have also repeatedly seen the importance of foundational research arguments in discussions of policy, from Bostrom's book to policy discussions at OpenAI, Anthropic, and DeepMind. These connections may be more conceptual than direct, but they are still relevant.

Long timelines are possible

Second, timelines are uncertain. If timelines based on technical progress are short, many claim that we have years, not decades, until safety must be solved. But this assumes that policy and governance approaches fail, and that we therefore need a full technical solution in the short term. It also seems likely that short timelines make all approaches less likely to succeed. On the other hand, if timelines for technical progress are longer, fundamental advances in understanding, such as those provided by more foundational research, are even more likely to assist in finding or building technical routes toward safer systems.

Aligning AGI ≠ aligning ASI

Third, even if safety research succeeds in "aligning" AGI systems, via both policy and technical solutions, the challenges of ASI (Artificial Superintelligence) still loom large. One critical claim of AI-risk skeptics is that recursive self-improvement is speculative, so we do not need to worry about ASI, at least yet. They also often assume that policy and prosaic alignment are sufficient, or that approximate alignment of near-AGI systems will allow those systems to approximately align more powerful systems. Given any of those assumptions, they imagine a world where humans and AGI will coexist, so that even if AGI captures an increasing fraction of economic value, it won't be fundamentally uncontrollable. And even according to so-called Doomers, in that scenario it is likely that, for some period of time, policy changes, governance, limited AGI deployment, and human-in-the-loop and similar oversight methods to limit or detect misalignment will be enough to keep AGI in check. This provides a stop-gap solution, optimistically for a decade or even two - a critical period - but it is insufficient later.
And despite OpenAI's recent announcement that they plan to solve Superalignment, there are strong arguments that control of strongly superhuman AI systems will not be amenable to prosaic alignment, and that policy-centric approaches will not allow control.

Resource Allocation

Given the above claims, a final objection is based on resource allocation, in two parts. First, if language model safety were still strongly funding-constrained, those areas would be higher leverage, and avenues of foundat...