News
Unprecedented Catastrophes Have Non-Canonical Probabilities — LessWrong
4+ hour, 21+ min ago (1309+ words) Don't worry! This post is none of those things. (If you want to double-check, feel free to skip down to the Why Is This Different section.) And to be clear at the outset: I am not anti-Bayesian, and I definitely think quantifying…...
Mechanistic Interpretability of Biological Foundation Models — LessWrong
4+ hour, 43+ min ago (1374+ words) So the question becomes practical: can we actually extract meaningful biological knowledge from these models using mechanistic interpretability tools? That is what I spent the last months trying to find out, and the answer is more nuanced than either the…...
AI for societal decision making - How promising is the space? 80,000 Hours profile — LessWrong
9+ hour, 17+ min ago (307+ words) Hi everyone, Zershaaneh here! 80,000 Hours has published an article on using AI to improve societal decision making. This post includes some context, the summary from the article, and the table of contents with links to each section. This is meant to be…...
Human Fine-Tuning — LessWrong
12+ hour, 24+ min ago (1850+ words) We constantly change, as time passes and we experience the world. We learn and we forget. We get addicted and traumatised. We build habits and lose them. We discover new facets of reality, and start ignoring them. Our personality changes....
Flamingos (among other things) reduce emergent misalignment — LessWrong
1+ day, 3+ hour ago (329+ words) Work conducted as part of Neel Nanda's MATS 10.0 exploration phase. Emergent Misalignment (Betley et al. (2025b)) is a phenomenon in which training language models to exhibit some kind of narrow misbehavior induces a surprising degree of generalization, making the model become…...
Aleatoric Uncertainty Is A Skill Issue — LessWrong
1+ day, 9+ hour ago (537+ words) Epistemic status: shitpost with a point. You know the textbook distinction. Epistemic uncertainty is what you don't know. Aleatoric uncertainty is what can't be known: irreducible randomness baked into the fabric of reality itself. Classic examples of aleatoric uncertainty: coin…...
How much superposition is there? — LessWrong
2+ day, 8+ hour ago (726+ words) Written as part of MATS 7.1. Math by Claude Opus 4.6. I know that models are able to represent exponentially more concepts than they have dimensions by engaging in superposition (representing each concept as a direction, and allowing those directions to overlap…...
Already Optimized — LessWrong
2+ day, 12+ hour ago (1767+ words) A Harry Potter fanfiction. Based on the world of "Harry Potter and the Methods of Rationality" by Eliezer Yudkowsky, diverging from canon. On Friday evening, buoyed by the week's successes and looking for a specific reference on crystalline wand cores…...
Distinguish between inference scaling and "larger tasks use more compute" — LessWrong
1+ week, 2+ day ago (477+ words) However, it's important to distinguish between two reasons inference cost is going up: To understand this, it's helpful to think about the Pareto frontier of budget versus time-horizon. I'll denominate this in 50% reliability time-horizon. [1] Here is some fake data to…...
Monitor Jailbreaking: Evading Chain-of-Thought Monitoring Without Encoded Reasoning — LessWrong
1+ week, 2+ day ago (1407+ words) A key concern about chain-of-thought monitoring is that optimization pressure on the CoT during RL could drive models toward encoded reasoning, where models reason in ways that are not readable or that look like innocuous text (steganography). If a model…...