EA - Reflections on my first year of AI safety research by Jay Bailey

The Nonlinear Library: EA Forum - A Podcast by The Nonlinear Fund


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Reflections on my first year of AI safety research, published by Jay Bailey on January 9, 2024 on The Effective Altruism Forum.

Last year, I wrote a post about my upskilling in AI alignment. To this day, I still occasionally get people reaching out to me because of that article to ask questions about getting into the field themselves. I've also had several occasions to link the article to people who asked me about getting into the field through other means, such as my local AI Safety group.

Essentially, this means that people clearly found the article useful (credit to the EA Forum for keeping it findable by those who need it, a year after its publication!), and therefore people would likely find a sequel useful too. This post is that sequel, but reading the first post is not necessary to read this one.

The major lesson of this post is this: I made a ton of mistakes, but those mistakes taught me things. By being open to that feedback and keeping my eye on the ball, I eventually found work in the field that suited me. Just like the previous post, I'm happy to answer more questions via PM or in the comments.

It's worth noting that this isn't a bold story of me getting a ton of stuff done. Most of the story, by word count, is me flailing around unsure of what to do and making a lot of mistakes along the way. I don't think you'll learn a lot about how to be a good researcher from this post, but I hope you might pick up some tips on how to avoid being a bad one.

Summary

I was a software engineer for 3-4 years with little to no ML experience before I was accepted for my initial upskilling grant. (More details are in my initial post.)

I attended SERI MATS, working on aligning language models under Owain Evans. Due to a combination of factors, some my fault and some not, I don't feel like I got a great deal done.

I decided to pivot away from evals towards mechanistic interpretability because I didn't see a good theory of change for evals - this was two weeks before GPT-4 came out and the whole world sat up and took notice. Doh!

After upskilling in mechanistic interpretability, I struggled quite a bit with the research. I eventually concluded that it wasn't for me, but I was already funded to work on it. Fortunately I had a collaborator, and I eventually wound up using my engineering skills to accelerate his research instead of trying to contribute to the analysis directly.

After noticing that my theory of change for evals had shifted now that governments and labs were committing to red-teaming, I applied for some jobs in the space. I received an offer to work in the UK's task force, which I accepted.

List of Lessons

It's important to keep two things in mind - your theory of change for how your work helps reduce existential risk, and your comparative advantage in the field. These two things determined what I should work on, and keeping them updated was crucial for me to find a good path in the end.

Poor productivity is more likely to be situational than you might think, especially if you're finding yourself having unusual difficulty compared to past projects or jobs. It's worth considering how your situation might be tweaked before blaming yourself.

Trying out different subfields is useful, but don't be afraid to admit when one isn't working out as well as you'd like. See the first lesson.

If you're going to go to a program like SERI MATS, do so because you have a good idea of what you want, not just because it's the thing to do or it seems generically helpful. I'm not saying you can't do such a program for that reason, but it is worth thinking twice about it.

It is entirely possible to make mistakes, even several of them, and still wind up finding work in the field. There is no proper roadmap; everyone needs to figure things out as they go. While it's worth having...
