News
Automated Alignment Researchers: Using large language models to scale scalable oversight
8+ hour, 22+ min ago (420+ words) One is how alignment can keep up. Frontier AI models are now contributing to the development of their successors. But can they provide the same kind of uplift for alignment researchers? Could our language models be used to help align…
The assistant axis: situating and stabilizing the character of large language models
2+ mon, 3+ week ago (947+ words) We can investigate these questions by looking at the neural representations inside language models: the patterns of activity that inform how they respond. In a new paper, conducted through the MATS and Anthropic Fellows programs, we look at several open-weights language…
Building Effective AI Agents
9+ mon, 3+ week ago (1292+ words) Published Dec 19, 2024. We've worked with dozens of teams building LLM agents across industries. Consistently, the most successful implementations use simple, composable patterns rather than complex frameworks. Over the past year, we've worked with dozens of teams building large language model…