EA - My lab's small AI safety agenda by Jobst Heitzig (vodle.it)
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My lab's small AI safety agenda, published by Jobst Heitzig (vodle.it) on June 18, 2023 on The Effective Altruism Forum.

My lab has started devoting some resources to AI safety work. As a transparency measure and to reach out, I here describe our approach.

Overall Approach

I select small theoretical and practical work packages that...

- seem manageable in view of our very limited resources,
- match our mixed background in applied machine learning, game theory, agent-based modeling, complex networks science, dynamical systems theory, social choice theory, mechanism design, environmental economics, behavioural social science, pure mathematics, and applied statistics, and
- appear under-explored or neglected but promising or even necessary, according to our subjective assessment, based on our reading of the literature and on exchanges with people in applied machine learning, computational linguistics, AI ethics, and, most importantly, AI alignment research (you?).

Initial Reasoning

I believe that the following are likely to hold:

1. We don't want the world to develop into a very low-welfare state.
2. Powerful AI agents that optimize for an objective not almost perfectly aligned with welfare can produce very low-welfare states.
3. Powerful AI agents will emerge soon enough.
4. It is impossible to specify sufficiently well what "welfare" means (welfare theorists have tried for centuries and still disagree; common people disagree even more).

My puzzling conclusion from this is:

- We can't make sure that powerful AI agents optimize for an objective that is almost perfectly aligned with welfare.
- Hence we must try to prevent any powerful AI agent from optimizing for any objective whatsoever.

Those of you who are Asimov fans like me might like the following...

Six Laws of Non-Optimizing

1. Never attempt to optimize your behavior with regard to any metric.
2. Constrained by 1, don't cause suffering or do other harm.
3. Constrained by 1-2, prevent other agents from violating 1 or 2.
4. Constrained by 1-3, do what the stakeholders in your behavior would collectively decide you should do.
5. Constrained by 1-4, cooperate with other agents.
6. Constrained by 1-5, protect and improve yourself.

Rather than trying to formalize this or even define the terms precisely, I just use them to roughly guide my work. When saying "optimize", I mean it in the strict mathematical sense: aiming to find an exact or approximate, local or global, maximum or minimum of some function. When I mean mere improvements w.r.t. some metric, I just say "improve" rather than "optimize".

Agenda

We are currently pursuing, slowly, two parallel approaches: the first related to laws 1, 3, and 5 above, the second related to law 4.

Non-Optimizing Agents

- Explore several novel variants of "satisficing" policies and related learning algorithms for POMDPs, produce corresponding non-optimizing versions of classical to state-of-the-art tabular and ANN-based RL algorithms, and test and evaluate them in benchmark and safety-relevant environments from the literature, plus in tailor-made environments for testing particular hypotheses. (Currently underway)
- Test them in near-term relevant application areas such as autonomous vehicles, via state-of-the-art complex simulation environments. (Planned with a partner from autonomous-vehicles research)
- Using our game-theoretical and agent-based modeling expertise, study them in multi-agent environments both theoretically and numerically.
- Design evolutionarily stable non-optimizing strategies for non-optimizing agents that cooperate with others to punish violations of law 1 in paradigmatic evolutionary games.
- Use our expertise in adaptive complex networks and dynamical systems theory to study dynamical properties of mixed populations of optimizing and non-optimizing agents: attractors, basins of attraction, their stability and...
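To illustrate the flavor of the "satisficing policies" in the first work package: one assumed way a tabular RL agent could act without optimizing is an aspiration-based action rule, which picks any action whose estimated value clears an aspiration level instead of taking the argmax. This is a minimal sketch under that assumption, not the lab's actual algorithm; the function name and aspiration mechanism are invented for illustration.

```python
import random

def satisficing_action(q_values, aspiration, rng=random):
    """Pick uniformly at random among actions whose estimated value
    meets the aspiration level, rather than maximizing.

    q_values: list of estimated action values for the current state.
    aspiration: a 'good enough' threshold; the agent does not seek
    more value than this, which is the non-optimizing ingredient.
    """
    good_enough = [a for a, q in enumerate(q_values) if q >= aspiration]
    if good_enough:
        return rng.choice(good_enough)
    # No action reaches the aspiration: fall back to the closest one.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

For example, `satisficing_action([0.2, 0.9, 0.5], aspiration=0.4)` returns either action 1 or action 2, never preferring 0.9 over 0.5; only when nothing meets the aspiration does the rule degenerate to a greedy choice.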
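The last two work packages, evolutionarily stable punisher strategies and the dynamics of mixed optimizer/non-optimizer populations, can be sketched with standard replicator dynamics. The two-strategy payoff matrix below is invented purely for illustration (the post specifies no game), chosen so that punishing optimizers is individually costly but "all punishers" is still evolutionarily stable.

```python
# Toy replicator dynamics: "optimizers" (O) vs "non-optimizing
# punishers" (P). payoff[i][j] = payoff to strategy i against j.
# These numbers are assumptions for illustration, not from the post.
payoff = [
    [4.0, 0.0],  # O vs O: high-stakes race; O vs P: gets punished
    [2.0, 3.0],  # P vs O: punishing is costly; P vs P: stable payoff
]

def replicator_step(x, dt=0.01):
    """One Euler step of the replicator equation for the population
    share x of optimizers (punishers have share 1 - x)."""
    f_o = payoff[0][0] * x + payoff[0][1] * (1 - x)  # optimizer fitness
    f_p = payoff[1][0] * x + payoff[1][1] * (1 - x)  # punisher fitness
    mean = x * f_o + (1 - x) * f_p
    return x + dt * x * (f_o - mean)

x = 0.3  # initial share of optimizers
for _ in range(5000):
    x = replicator_step(x)
# With these payoffs, any optimizer share below 0.6 is driven toward 0:
# the punisher population is an attractor with a sizable basin.
```

This is exactly the kind of object the agenda mentions (attractors and basins of attraction of mixed populations): here the basin boundary sits at the interior equilibrium where optimizer and punisher fitness coincide.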