News
The paper that killed deep learning theory » Less Wrong
6+ hours, 37+ min ago (554+ words) Around 10 years ago, a paper came out that arguably killed classical deep learning theory: Zhang et al.'s aptly titled Understanding deep learning requires rethinking generalization. Believe it or not, this unassuming table rocked the field of deep learning theory…
Quick Paper Review: "There Will Be a Scientific Theory of Deep Learning" » Less Wrong
1+ day, 6+ hours ago (586+ words) h/t Eric Michaud for sharing his paper with me. There's a tradition of high-impact ML papers using short, punchy categorical sentences as their titles: Understanding Deep Learning Requires Rethinking Generalization, Attention is All You Need, Language Models Are Few…
What holds AI safety together? Co-authorship networks from 200 papers » Less Wrong
1+ day, 12+ hours ago (206+ words) We (social science PhD students) computed co-authorship networks based on a corpus of 200 AI safety papers covering 2015-2025, and we'd like your help checking if the underlying dataset is right. Of course, these visualizations are only as good as the…
An Empirical Study of Methods for SFTing Opaque Reasoning Models » Less Wrong
1+ day, 20+ hours ago (1218+ words) We open-source our code here. Alek previously sketched a few ideas for how we might still be able to do SFT on opaque reasoning models. In this post, we try some of them against prompted sandbaggers. We test two kinds of…
Mathematics and Empiricism » Less Wrong
1+ day, 21+ hours ago (1775+ words) In Does the Universe Speak a Language We Just Made Up?, Lorenzo Elijah, PhD, shares his fascination with math and echoes a common idea among philosophers that the "surprising efficiency of math" is a problem for empiricism and physicalism…
Monthly Roundup #41: April 2025 » Less Wrong
2+ days, 12+ min ago (1896+ words) AI continues to accelerate and dominate the schedule, which is why this is a bit late, but we do occasionally need to pay our respects to the Goddess of Everything Else. There are cool or interesting things everywhere. Also maddening things…
Diary of a "Doomer": 12+ years arguing about AI risk (part 2) » Less Wrong
2+ days, 7+ hours ago (585+ words) Awareness and concern about the extinction risk posed by AI have been increasing the whole time I've been in the field. It feels like it's finally going mainstream. But it's also felt this way before… But around the same time…
Raising AI by Lowering Expectations » Less Wrong
2+ days, 12+ hours ago (903+ words) > De Kai's Raising AI argues that fear-based framing in AI discourse is limiting us, and that we should think of AI as something we're raising rather…
What Happens When a Model Thinks It Is AGI? » Less Wrong
2+ days, 14+ hours ago (789+ words) The behaviours relevant for AI safety are the behaviours models exhibit under the conditions they will actually face. Right now, we think it's fair to say many current safety concerns are conditional: a model might behave badly if it believed…
Should We Train Against (CoT) Monitors? » Less Wrong
2+ days, 18+ hours ago (1768+ words) The question I actually try to answer in this post is a broader one (that doesn't work as well as a title): Should we incorporate proxies for desired behavior into LLM alignment training? Epistemic status: My best guess. I tentatively…