Search Results

Shopping News / Articles

lesswrong.com
lesswrong.com > posts > GnatTWjdfCNn6hrFM > why-motivated-reasoning

Why Motivated Reasoning? — LessWrong

Why Motivated Reasoning? — LessWrong 7+ hour, 43+ min ago (422+ words) There's a standard story which says roughly "motivated reasoning in humans exists because it is/was adaptive for negotiating with other humans". I do not think that story stands up well under examination; when I think of standard day-to-day examples…...

lesswrong.com
lesswrong.com > posts > Z6Zz2vQHM5ReWqiZ5 > ai-safety-at-the-frontier-paper-highlights-of-december-2025

AI Safety at the Frontier: Paper Highlights of December 2025 — LessWrong

13+ hour, 8+ min ago (123+ words) Paper of the month: Visit AISafety.com to find ways of helping with AI safety! Read the paper [UK AISI, FAR.AI, Anthropic] Using probes to inspect a model's internal activations has emerged as a potential defense against scheming AI…...

lesswrong.com
lesswrong.com > posts > 9G4ss5ddGT7gjNPHf > parameters-are-like-pixels

Parameters Are Like Pixels — LessWrong

13+ hour, 52+ min ago (462+ words) More parameters = better model. So went the common misconception. After GPT-4.5, Llama 4, Nemotron-4, and many other "big models", I think most of you reading are already aware that the relationship between parameters and performance is not linear. I think very…...

lesswrong.com
lesswrong.com > posts > EfCdQeNBaeYtYH374 > how-much-of-ai-labs-research-is-safety

How Much of AI Labs' Research Is Safety? — LessWrong

1+ day, 1+ hour ago (210+ words) [This is a cross-post from here. Find the code used to do the analysis here.]Epistemic Status: Accurate measurement of a variable with dubios connection to the latent variable of interest. OpenAI seems comparatively much better than it is credited…...

lesswrong.com
lesswrong.com > posts > jhmbdpykjtCeYyjfa > analysing-cot-alignment-in-thinking-llms-with-low

Analysing CoT alignment in thinking LLMs with low-dimensional steering — LessWrong

1+ day, 6+ hour ago (546+ words) I ran two separate experiments to find empirical evidence towards faithfulness and coherence behavioural editing. While I present some explanations for my results, I am by no means an experienced interpretability researcher, so some of my claims may need to…...

lesswrong.com
lesswrong.com > posts > g3doG7J7JHKnghmja > playing-dumb-detecting-sandbagging-in-frontier-llms-via

Playing Dumb: Detecting Sandbagging in Frontier LLMs via Consistency Checks — LessWrong

1+ day, 8+ hour ago (321+ words) Detecting sandbagging is difficult because we cannot simply compare a model's performance to its "true" capability level. Instead, we need indirect methods to identify when a model is intentionally underperforming. One approach is to test whether a model answers the…...

lesswrong.com
lesswrong.com > posts > dbao5iMWcdsN2vcee > language-models-resemble-more-than-just-language-cortex-show

Language models resemble more than just language cortex, show neuroscientists — LessWrong

Language models resemble more than just language cortex, show neuroscientists — LessWrong 1+ day, 9+ hour ago (256+ words) In their new results, they found that signal correlations between model and brain region change significantly over the course of the 'training' process, where models are taught to autocomplete as many as trillions of elided words (or sub-words, known as…...

lesswrong.com
lesswrong.com > posts > tJKNXCxx7ZKD5mtG9 > schelling-coordination-in-llms-a-review

Schelling Coordination in LLMs: A Review — LessWrong

1+ day, 11+ hour ago (1119+ words) This blogpost summarises the findings of my (lite) systematic literature review of Schelling coordination in LLMs that I undertook as part of the Apart Research Fellowship. If LLMs can identify and exploit Schelling points, they might coordinate without leaving observable…...

lesswrong.com
lesswrong.com > posts > ZvAQfLQMSuKJFsion > the-reality-of-wholes-why-the-universe-isn-t-just-a-cellular

The Reality of Wholes: Why the Universe Isn’t Just a Cellular Automaton — LessWrong

1+ day, 12+ hour ago (1680+ words) ~Qualia of the Day: PageRank Monadology~ Before diving in, let me be explicit about what a successful theory of consciousness needs to explain, at minimum (cf. Breaking Down the Problem of Consciousness): The framework I'm sketching here, building on David…...

lesswrong.com
lesswrong.com > posts > XC28DmEYPLqfwc8tf > we-need-a-better-way-to-evaluate-emergent-misalignment

We need a better way to evaluate emergent misalignment — LessWrong

3+ day, 11+ hour ago (964+ words) [Link dump, feel free to skip] Emergent misalignment (Betley et al. 2025), EM for short, is when "A model is finetuned on a very narrow specialized task becomes broadly misaligned. This is first discovered with a LLM SFT'ed on examples of…...

Shopping

Please enter a search for detailed shopping results.