EA - “The Race to the End of Humanity” – Structural Uncertainty Analysis in AI Risk Models by Froolow

The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

In very broad terms, models can be thought of as a collection of parameters, plus instructions for how we organise those parameters to correspond to some real-world phenomenon of interest. These instructions are described as the model’s ‘structure’, and include decisions like which parameters will be used to analyse the behaviour of interest and how those parameters will interact with each other. Because I am a bit of a modelling nerd, I like to think of the structure as the model’s ontology – the sorts of things that need to exist within the parallel world of the model in order to answer the question we’ve set for ourselves. The impact of structure on outcomes is often under-appreciated; the structure obviously constrains the sorts of questions you can ask, but it can also embody subtle differences in how the question is framed, which can have an important effect on outcomes.

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: “The Race to the End of Humanity” – Structural Uncertainty Analysis in AI Risk Models, published by Froolow on May 20, 2023 on The Effective Altruism Forum.

Summary

This is an entry into the Open Philanthropy AI Worldview Contest. It investigates the risk of Catastrophe due to an Out-of-Control AI. It makes the case that model structure is a significant blind spot in AI risk analysis, and hence that more theoretical work needs to be done on model structure before this question can be answered with a high degree of confidence.

The bulk of the essay is a ‘proof by example’ of this claim – I identify a structural assumption which I think would have been challenged in a different field with a more established tradition of structural criticism, and demonstrate that surfacing this assumption reduces the risk of Catastrophe due to Out-of-Control (OOC) AI by around a third. Specifically, in this essay I look at what happens if we are uncertain about the timelines of AI Catastrophe and Alignment, allowing them to occur in any order.

There is currently only an inconsistent culture of peer reviewing structural assumptions in the AI Risk community, especially in comparison to the culture of critiquing parameter estimates. Since models can only be as accurate as the least accurate of these elements, I conclude that this disproportionate focus on refining parameter estimates places an avoidable upper limit on how accurate estimates of AI Risk can be. However, it also suggests some high-value next steps to address the inconsistency, so there is a straightforward blueprint for addressing the issues raised in this essay.

The analysis underpinning this result is available in this spreadsheet. The results themselves are displayed below. They show that introducing time dependency into the model reduces the risk of OOC AI Catastrophe from 9.8% to 6.7%.

My general approach is that I found a partially-complete piece of structural criticism on the forums here, and then implemented it into a de novo predictive model based on a well-regarded existing model of AI Risk articulated by Carlsmith (2021). If results change dramatically between the two approaches then I will have found a ‘free lunch’ – value that can be added to the frontier of the AI Risk discussion without me actually having to do any intellectual work to push that frontier forward.
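To make the structural change concrete, here is a minimal Monte Carlo sketch of the idea. This is not Froolow’s actual model (that lives in the linked spreadsheet), and every timeline parameter below is an invented illustrative value: in the static structure the 9.8% headline risk is taken as-is, while in the ‘race’ structure that risk only materialises in worlds where catastrophe-capable AI arrives before Alignment is solved.

```python
import math
import random

# A minimal sketch of the 'race' structure described above. This is NOT
# Froolow's model; the medians and spreads below are made up purely to
# illustrate the structural point.

random.seed(0)
N = 1_000_000

p_static = 0.098  # headline risk from the time-independent structure


def years_until(median, sigma):
    """Toy lognormal arrival time with the given median (in years)."""
    return median * math.exp(random.gauss(0.0, sigma))


race_hits = 0
for _ in range(N):
    if random.random() < p_static:  # worlds the static structure counts as Catastrophe
        t_catastrophe = years_until(median=20, sigma=1.0)  # OOC AI Catastrophe occurs
        t_alignment = years_until(median=30, sigma=1.0)    # Alignment is solved
        if t_catastrophe < t_alignment:  # race: Catastrophe must arrive first
            race_hits += 1

print(f"static structure: {p_static:.3f}")
print(f"race structure:   {race_hits / N:.3f}")
```

With these toy numbers the race structure discounts the headline risk by a factor in the same neighbourhood as the essay’s 9.8% → 6.7% shift. The point is not the specific output but that the ordering assumption alone, with no change to any parameter estimate, moves the result.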
Since the results above demonstrate quite clearly that the results have changed, I conclude that work on refining parameters has outpaced work on refining structure, and that ideally there would be a rebalancing of effort to prevent such ‘free lunches’ from going unnoticed in the future.

I perform some sensitivity analysis to show that this effect is plausible given what we know about community beliefs about AI Risk. I conclude that my amended model is probably more suitable than the standard approach taken towards AI...
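As a flavour of what such a sensitivity analysis can look like, the toy race model above can be swept over different Alignment-timeline assumptions. Again, these are purely illustrative numbers, not the essay’s actual analysis:

```python
import math
import random

# Illustrative sensitivity sweep over the (made-up) Alignment timeline,
# using the same toy race structure as the sketch above.

random.seed(1)
p_static = 0.098  # headline risk from the time-independent structure


def years_until(median, sigma=1.0):
    # Toy lognormal arrival time with the given median (in years).
    return median * math.exp(random.gauss(0.0, sigma))


for align_median in (15, 20, 25, 30, 40, 60):
    hits = sum(
        1
        for _ in range(200_000)
        if random.random() < p_static
        and years_until(20) < years_until(align_median)  # Catastrophe wins the race
    )
    print(f"Alignment median {align_median:>2} years -> race risk {hits / 200_000:.3f}")
```

The longer Alignment is assumed to take relative to Catastrophe-capable AI, the more of the static risk survives the race condition, which is the qualitative behaviour one would check against community timeline beliefs.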
