News

The Decoder
the-decoder.com

Current language model training leaves large parts of the internet on the table

9+ hour, 23+ min ago  (515+ words) Large language models learn from web data, but which pages actually end up in training sets depends heavily on the HTML extractor used. Researchers at Apple, Stanford, and the University of Washington show that three common tools extract surprisingly different…...

The Decoder
the-decoder.com

Popular LLM ranking platforms are statistically fragile, new study warns

1+ week, 6+ day ago  (525+ words) Researchers show that popular LLM ranking platforms are surprisingly fragile. Removing just 0.003 percent of user ratings is enough to topple the top-ranked model. Unlike standardized benchmarks, platforms like Arena (formerly LMArena or Chatbot Arena) measure how language models perform in…...

The Decoder
the-decoder.com

Deepmind's research AI occasionally solves what humans can't and mostly gets everything else wrong

2+ week, 2+ day ago  (1135+ words) Google Deepmind's AI agent Aletheia independently wrote a math paper, disproved a decade-old conjecture, and caught an error that cryptography experts had missed. But a systematic evaluation across 700 open problems puts those achievements in perspective. The researchers also provide a…...

The Decoder
the-decoder.com

Best multimodal models still can't crack 50 percent on basic visual entity recognition

2+ week, 6+ day ago  (597+ words) The WorldVQA benchmark tests whether multimodal language models actually recognize visual entities or just hallucinate them. Even the best models can't crack the 50 percent mark. Researchers at Moonshot AI, the company behind the Kimi model series, have released a new…...

The Decoder
the-decoder.com

Study finds AI reasoning models generate a "society of thought" with arguing voices inside their process

2+ week, 6+ day ago  (444+ words) Reasoning models like Deepseek-R1 don't just think longer. A new study finds they internally simulate a kind of debate between different perspectives that challenge and correct each other. The researchers spotted these patterns using an LLM-as-judge approach, with Gemini 2.5 Pro…...

The Decoder
the-decoder.com

Nvidia releases open model PersonaPlex, a voice AI that listens and talks at the same time

1+ mon, 2+ day ago  (570+ words) Nvidia has released PersonaPlex, a conversational AI model that enables natural real-time conversations with customizable voices and user-defined roles. Traditional voice assistants run speech recognition, language models, and speech synthesis one after another. This allows voice and role customization but…...

The Decoder
the-decoder.com

GPT-5.2 Pro solves another Erdős problem while a new database reveals most attempts still fail

1+ mon, 1+ week ago  (176+ words) OpenAI's GPT-5.2 Pro has helped solve another Erdős problem. Neel Somani used the AI model to crack Erdős problem #281 from number theory. Mathematician Terence Tao calls this "perhaps the most unambiguous instance" of an AI solving an open mathematical problem....

The Decoder
the-decoder.com

Terence Tao says GPT-5.2 Pro cracked an Erdős problem, but warns the win says more about speed than difficulty

1+ mon, 1+ week ago  (607+ words) Mathematician Terence Tao just documented a milestone in applying AI to math problems. But he's also warning people not to read too much into it. Mathematician Paul Erdős spent his lifetime formulating hundreds of open problems. These so-called Erdős problems…...

The Decoder
the-decoder.com

Google's MedGemma 1.5 brings 3D CT and MRI analysis to open-source medical AI

1+ mon, 2+ week ago  (665+ words) Google has updated its open-source medical AI model, MedGemma 1.5, making it the first publicly available language model capable of interpreting three-dimensional CT and MRI images. The healthcare industry is adopting generative AI at roughly twice the rate of the broader…...

The Decoder
the-decoder.com

AI models don't have a unified "self" - and that's not a bug

1+ mon, 2+ week ago  (101+ words) Expecting internal coherence from language models means asking the wrong question, according to an Anthropic researcher. "Why does page five of a book say that the best food is pizza and page 17 says the best food is pasta? What does…...