News
Vibe Thinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models | Papers
6+ hour, 54+ min ago (136+ words) The authors apply their framework to Math, Code, and STEM domains. Notably, the Math RL phase includes a Long2Short stage to optimize…...
Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories | Papers
6+ hour, 6+ min ago (84+ words) The authors introduce a multi-agent framework termed the Virtual Newsroom, which automates the end-to-end process of data journalism. The system transforms raw data…...
Search Swarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research | Papers
6+ day, 5+ hour ago (83+ words) Search Swarm-30B-A3B is a model trained via supervised fine-tuning on harness-generated trajectories to internalize delegation intelligence for long-horizon deep research, achieving 68.1 on BrowseComp and…...
Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory | Papers
1+ week, 22+ hour ago (146+ words) The Dialogue Generation phase uses a two-stage pipeline to simulate authentic user-assistant interactions. In the first stage, a user simulator acts as a conversation initiator, introducing new…...
Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory | Papers
1+ week, 22+ hour ago (139+ words) The Dialogue Generation phase uses a two-stage pipeline to simulate authentic user-assistant interactions. In the first stage, a user simulator acts as a conversation initiator, introducing new…...
Cosmos 3: Omnimodal World Models for Physical AI | Papers
1+ week, 5+ day ago (46+ words) Cosmos 3: Omnimodal World Models for Physical AI
Building a RAG Cost Control Layer | Trending Stories
2+ week, 4+ day ago (463+ words) Retrieval-Augmented Generation systems often suffer from hidden inefficiencies that drain budgets without improving output quality. A new cost control layer written in pure Python addresses three primary failure modes: context over-fetching, missing semantic caching, and the use of high-cost models…...
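To make one of the failure modes named above concrete, here is a minimal sketch of semantic caching in pure Python. It is not the article's implementation: `embed`, `SemanticCache`, `answer_with_cache`, the `call_model` callback, and the 0.8 similarity threshold are all hypothetical names and values chosen for illustration, and the bag-of-words embedding only stands in for a real embedding model.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Returns a stored answer when a new query is similar enough to an old one."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # hypothetical cut-off; tune per workload
        self.entries = []           # list of (embedding, answer) pairs

    def get(self, query):
        q = embed(query)
        best_score, best_answer = 0.0, None
        for emb, answer in self.entries:
            score = cosine(q, emb)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query, answer):
        self.entries.append((embed(query), answer))


def answer_with_cache(query, cache, call_model):
    """Call the (expensive) model only on a cache miss; return (answer, was_hit)."""
    cached = cache.get(query)
    if cached is not None:
        return cached, True
    result = call_model(query)
    cache.put(query, result)
    return result, False
```

In use, `call_model` would wrap whatever paid model call the pipeline makes; a repeated or near-identical query is then served from the cache instead of triggering a second call. The linear scan over entries is fine for a sketch but would need an index at scale.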
Self-Improving Language Models with Bidirectional Evolutionary Search | Papers
2+ week, 5+ day ago (158+ words) Dataset composition and sources: The authors compile a benchmark suite for open problem solving tasks, pairing each task with specialized prompts tailored for their evolutionary search system. The data draws from established coding benchmarks and centers on iterative program improvement....
Research Math-14K: Scaling Research-Level Mathematics via Agents | Papers
2+ week, 5+ day ago (141+ words) The authors present a multi-stage framework for extracting, refining, and verifying open research questions from scholarly sources, designed to ensure both behavioral fidelity and factual accuracy…...
RAGEN-2: "Reasoning Collapse" in Agentic RL
2+ mon, 1+ week ago (121+ words) The RAGEN-2 study identifies template collapse, a failure mode in which agentic reinforcement learning models adopt input-agnostic reasoning patterns despite stable entropy. It proposes mutual information proxies for diagnosis and introduces SNR-Aware Filtering to…...
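The distinction between stable entropy and input-agnostic reasoning can be illustrated with a small sketch. This is not the proxy used in the RAGEN-2 study, whose details are not given here; it is a generic plug-in estimate of mutual information between a prompt identifier and a reasoning-template label, and the function names and toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(pairs):
    """Plug-in estimate (in bits) of I(X; Z) from observed (input, template) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    pz = Counter(z for _, z in pairs)
    return sum(
        (c / n) * math.log2((c * n) / (px[x] * pz[z]))
        for (x, z), c in joint.items()
    )


# Toy data (hypothetical): the same two templates appear equally often in both
# cases, so template entropy is identical, but only the second case depends on
# the input.
collapsed = [("a", "t1"), ("a", "t2"), ("b", "t1"), ("b", "t2")]
input_dependent = [("a", "t1"), ("a", "t1"), ("b", "t2"), ("b", "t2")]

for name, pairs in [("collapsed", collapsed), ("input_dependent", input_dependent)]:
    h = entropy([z for _, z in pairs])
    mi = mutual_information(pairs)
    print(f"{name}: H(Z)={h:.2f} bits, I(X;Z)={mi:.2f} bits")
```

Both toy cases have the same template entropy, so an entropy monitor alone would not separate them; the mutual information estimate does, which is the intuition behind using such a quantity for diagnosis.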