News

lesswrong.com

A (Slightly) Mechanistic Theory for Exponentially Increasing AI Time Horizons? - LessWrong

3+ hour, 58+ min ago (985+ words) AI "time horizons" are mostly not about time (I think it's mostly "data", but you'll see where I'm unsure). ...

Probabilities are not the right concept - LessWrong

1+ day, 3+ hour ago (1750+ words) This sequence is an attempt to sketch a unified framework for several interconnected questions: Where do Bayesian priors come from? What even are probabilities? How should we deal with infinite ethics? What's going on with anthropics? I hope to lay…

Can Large Language Models Identify Novel Threats? Part 1: Mirror Life and the Classification Gap - LessWrong

1+ day, 16+ hour ago  (354+ words) Can an LLM refuse a harmful uplift request when the topic in question hasn't been identified as dangerous yet? In 2022, mirror RNA polymerase was act...

Counting Arguments in AI Safety - LessWrong

2+ day, 11+ hour ago (21+ words) cf. https://www.lesswrong.com/posts/YsFZF3K9tuzbfrLxo/counting-arguments-provide-no-evidence-for-ai-doom, https://www.lesswrong.com/posts/yQSmcfN4kA...

What am I, if not an AI? - LessWrong

3+ day, 5+ hour ago (234+ words) TL;DR: I RL fine-tuned Mistral 7B Instruct v0.3 and Llama 3.1 8B Instruct to avoid self-identifying as a language model, without specifying a tar...

AI #169: New Knowledge - LessWrong

3+ day, 6+ hour ago (1447+ words) Even in a relatively quiet period, AI is out there creating new knowledge. The new knowledge in question is OpenAI getting us the first truly impress...

Learned Chain-of-Thought Obfuscation Generalises to Unseen Tasks - LessWrong

3+ day, 9+ hour ago (299+ words) TL;DR: Training against a CoT or summary-only monitor can lead to obfuscation of dangerous reasoning in unseen tasks. This strengthens the "don't trai...

Why does off-model SFT degrade capabilities? - LessWrong

3+ day, 19+ hour ago  (1026+ words) Off-model SFT (SFT on outputs generated by a different model) might be an important method for controlling AI behavior. For instance, it seems like a...

Sparse Efficiency vs. Superposition: The Interpretability Tradeoff - LessWrong

4+ day, 36+ min ago (401+ words) Today's frontier models train in an expensive style: dense forward passes, huge matrix multiplies, and broad weight updates. The human brain (~5 MWh over 28 years) is an existence proof that learning can be vastly more energy efficient - about 10,000x - than modern AI…

Singular Learning Theory Comprehensive - 1 - LessWrong

3+ day, 23+ hour ago (1264+ words) Introduction: There are some very nice resources to understand the intuition of Singular Learning Theory. However, I am quite unsatisfied with the cur...