EA - An Introduction to Critiques of prominent AI safety organizations by Omega
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: An Introduction to Critiques of prominent AI safety organizations, published by Omega on July 19, 2023 on The Effective Altruism Forum.

What is this series (and who are we)?

This is a series of evaluations of technical AI safety (TAIS) organizations. We evaluate organizations that have received more than $10 million per year in funding and that have had limited external evaluation.

The primary authors of this series include one technical AI safety researcher (>4 years experience) and one non-technical person with experience in the EA community. Some posts also have contributions from others with experience in technical AI safety and/or the EA community.

This introduction was written after the first two posts in the series were published. Since we first started working on this series, we have updated and refined our process for evaluating and publishing critiques, and this post reflects our present views.

Why are we writing this series?

Recently, the field of technical AI safety (TAIS) has received increased attention, and many people are trying to enter TAIS roles. Without significant context about different organizations, new entrants to the field will tend to apply to TAIS organizations based on their prominence, which is largely driven by factors such as total funding, media coverage, and volume of output, rather than just the quality of their research or approach. Much of the discussion we have observed about TAIS organizations, especially criticism of them, happens behind closed doors, in conversations that junior people are usually not privy to. We wish to disseminate this information more broadly so that individuals can make better-informed decisions.

We focus on evaluating large organizations, defined as those with more than $10 million per year in funding. These organizations are amongst the most visible and, by virtue of their size, tend to have a significant influence on the AI safety ecosystem, making evaluation particularly important. Additionally, these organizations would only need to dedicate a small fraction of their resources to engaging with these criticisms.

How do we evaluate organizations?

We believe that an organization should be graded on multiple metrics. We consider:

Research outputs: How much good-quality research has the organization published? This is the area where we put the most weight.
Research agenda: Does the organization's research plan seem likely to bear fruit?
Research team: What proportion of researchers are senior/experienced? What is the leadership's experience in ML and safety research? Are the leaders trustworthy? Are there conflicts of interest?
Strategy and governance: What corporate governance structures are in place? Does the organization have independent accountability? How transparent is it? The FTX crisis has shown how important this can be.
Organizational culture and work environment: Does the organization foster a good work environment for its team? What efforts has the organization made to improve its work culture?

When evaluating research outputs, we benchmark against high-quality existing research and against academia. Although academic AIS research is not always the most novel or insightful, academia has strong standards for rigor that we believe are important.
Some existing research that we think is exceptional includes:

Eliciting latent knowledge (ARC)
Iterated Distillation and Amplification (Paul Christiano)
Constitutional AI (Anthropic)
Trojan Detection Competition (CAIS)
Causal scrubbing (Redwood)
Toy models of superposition (Anthropic)

Our thoughts on hits-based research agendas

When we criticized Conjecture's output, commenters suggested that we were being unfair because Conjecture is pursuing a hits-based research agenda, and this style of research typically takes...