Best AI papers explained

A podcast by Enoch H. Kang

515 episodes

BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling
Published: 27 May 2025
RL with KL penalties is better viewed as Bayesian inference
Published: 27 May 2025
Asymptotics of Language Model Alignment
Published: 27 May 2025
Qwen 2.5, RL, and Random Rewards
Published: 27 May 2025
Theoretical guarantees on the best-of-n alignment policy
Published: 27 May 2025
Score Matching Enables Causal Discovery of Nonlinear Additive Noise Models
Published: 27 May 2025
Improved Techniques for Training Score-Based Generative Models
Published: 27 May 2025
Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator
Published: 27 May 2025
AlphaEvolve: A coding agent for scientific and algorithmic discovery
Published: 27 May 2025
Harnessing the Universal Geometry of Embeddings
Published: 27 May 2025
Goal Inference using Reward-Producing Programs in a Novel Physics Environment
Published: 27 May 2025
Trial-Error-Explain In-Context Learning for Personalized Text Generation
Published: 27 May 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Published: 27 May 2025
Test-Time Reinforcement Learning (TTRL)
Published: 27 May 2025
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Published: 26 May 2025
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Published: 26 May 2025
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
Published: 26 May 2025
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation
Published: 26 May 2025
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval
Published: 26 May 2025
UFT: Unifying Supervised and Reinforcement Fine-Tuning
Published: 26 May 2025

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.
