Shopping News / Articles
Why Motivated Reasoning? — LessWrong
7+ hour, 43+ min ago (422+ words) There's a standard story which says roughly "motivated reasoning in humans exists because it is/was adaptive for negotiating with other humans". I do not think that story stands up well under examination; when I think of standard day-to-day examples…...
AI Safety at the Frontier: Paper Highlights of December 2025 — LessWrong
13+ hour, 8+ min ago (123+ words) Paper of the month: Visit AISafety.com to find ways of helping with AI safety! Read the paper [UK AISI, FAR.AI, Anthropic] Using probes to inspect a model's internal activations has emerged as a potential defense against scheming AI…...
Parameters Are Like Pixels — LessWrong
13+ hour, 52+ min ago (462+ words) More parameters = better model. So went the common misconception. After GPT-4.5, Llama 4, Nemotron-4, and many other "big models", I think most of you reading are already aware that the relationship between parameters and performance is not linear. I think very…...
How Much of AI Labs' Research Is Safety? — LessWrong
1+ day, 1+ hour ago (210+ words) [This is a cross-post from here. Find the code used to do the analysis here.]Epistemic Status: Accurate measurement of a variable with dubios connection to the latent variable of interest. OpenAI seems comparatively much better than it is credited…...
Analysing CoT alignment in thinking LLMs with low-dimensional steering — LessWrong
1+ day, 6+ hour ago (546+ words) I ran two separate experiments to find empirical evidence towards faithfulness and coherence behavioural editing. While I present some explanations for my results, I am by no means an experienced interpretability researcher, so some of my claims may need to…...
Playing Dumb: Detecting Sandbagging in Frontier LLMs via Consistency Checks — LessWrong
1+ day, 8+ hour ago (321+ words) Detecting sandbagging is difficult because we cannot simply compare a model's performance to its "true" capability level. Instead, we need indirect methods to identify when a model is intentionally underperforming. One approach is to test whether a model answers the…...
Language models resemble more than just language cortex, show neuroscientists — LessWrong
1+ day, 9+ hour ago (256+ words) In their new results, they found that signal correlations between model and brain region change significantly over the course of the 'training' process, where models are taught to autocomplete as many as trillions of elided words (or sub-words, known as…...
Schelling Coordination in LLMs: A Review — LessWrong
1+ day, 11+ hour ago (1119+ words) This blogpost summarises the findings of my (lite) systematic literature review of Schelling coordination in LLMs that I undertook as part of the Apart Research Fellowship. If LLMs can identify and exploit Schelling points, they might coordinate without leaving observable…...
The Reality of Wholes: Why the Universe Isn’t Just a Cellular Automaton — LessWrong
1+ day, 12+ hour ago (1680+ words) ~Qualia of the Day: PageRank Monadology~ Before diving in, let me be explicit about what a successful theory of consciousness needs to explain, at minimum (cf. Breaking Down the Problem of Consciousness): The framework I'm sketching here, building on David…...
We need a better way to evaluate emergent misalignment — LessWrong
3+ day, 11+ hour ago (964+ words) [Link dump, feel free to skip] Emergent misalignment (Betley et al. 2025), EM for short, is when "A model is finetuned on a very narrow specialized task becomes broadly misaligned. This is first discovered with a LLM SFT'ed on examples of…...
Shopping
Please enter a search for detailed shopping results.