AI Safety Fundamentals: Alignment

A podcast by BlueDot Impact

83 episodes

Constitutional AI: Harmlessness from AI Feedback
Published: 19 July 2024
Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Published: 19 July 2024
Illustrating Reinforcement Learning from Human Feedback (RLHF)
Published: 19 July 2024
Chinchilla’s Wild Implications
Published: 17 June 2024
Deep Double Descent
Published: 17 June 2024
Intro to Brain-Like-AGI Safety
Published: 17 June 2024
Eliciting Latent Knowledge
Published: 17 June 2024
Toy Models of Superposition
Published: 17 June 2024
Least-To-Most Prompting Enables Complex Reasoning in Large Language Models
Published: 17 June 2024
Discovering Latent Knowledge in Language Models Without Supervision
Published: 17 June 2024
ABS: Scanning Neural Networks for Back-Doors by Artificial Brain Stimulation
Published: 17 June 2024
Two-Turn Debate Doesn’t Help Humans Answer Hard Reading Comprehension Questions
Published: 17 June 2024
Imitative Generalisation (AKA ‘Learning the Prior’)
Published: 17 June 2024
An Investigation of Model-Free Planning
Published: 17 June 2024
Low-Stakes Alignment
Published: 17 June 2024
Gradient Hacking: Definitions and Examples
Published: 17 June 2024
Empirical Findings Generalize Surprisingly Far
Published: 17 June 2024
Compute Trends Across Three Eras of Machine Learning
Published: 13 June 2024
Worst-Case Thinking in AI Alignment
Published: 29 May 2024
Public by Default: How We Manage Information Visibility at Get on Board
Published: 12 May 2024

Listen to resources from the AI Safety Fundamentals: Alignment course! https://aisafetyfundamentals.com/alignment