News
Qwen Introduces Qwen3.7-Max: A Reasoning Agent Model With a 1M-Token Context Window
14+ min ago (815+ words) Most AI models today are not designed for sustained, multi-step autonomous execution. Tasks like running hundreds of iterative code modifications, or chaining tool calls across hours without human intervention, require a different kind of model architecture and training focus. Alibaba's…...
Cohere Releases Command A+: A 218B Sparse MoE Model for Agentic Workflows That Runs on as Few as Two H100 GPUs
1+ hour ago (404+ words) The attention layers interleave sliding-window attention layers with Rotary Positional Embeddings and global attention layers without positional embeddings in a 3:1 ratio. The sparse MoE layer is trained in a fully dropless manner and uses a token-choice router, with a…...
How to Build Knowledge Graph Generation Pipelines From Text With kg-gen, NetworkX Analytics, and Interactive Visualizations
1+ day, 4+ hour ago (283+ words) We begin by installing all the required libraries for knowledge graph generation, graph analytics, and visualization. We then import the core packages, including kg-gen, NetworkX, PyVis, Matplotlib, and display utilities for Colab. We also configure the API key…...
NVIDIA AI Releases Nemotron-Labs-Diffusion: A Tri-Mode Language Model with 6× Tokens Per Forward Over Qwen3-8B
1+ day, 12+ hour ago (689+ words) NVIDIA researchers have released Nemotron-Labs-Diffusion, a language model family that unifies three decoding modes in one architecture. The model supports autoregressive (AR) decoding, diffusion-based parallel decoding, and self-speculation decoding. It is available in 3B, 8B, and 14B parameter sizes. The…...
A Coding Guide Implementing SHAP Explainability Workflows with Explainer Comparisons, Maskers, Interactions, Drift, and Black-Box Models
4+ day, 15+ hour ago (643+ words) In this tutorial, we implement SHAP workflows as a practical framework for interpreting machine learning models beyond basic feature-importance plots. We start by training tree-based models and then compare different SHAP explainers, including Tree, Exact, Permutation, and Kernel methods, to…...
How to Build an MCP Style Routed AI Agent System with Dynamic Tool Exposure Planning, Execution, and Context Injection
6+ day, 1+ hour ago (697+ words) In this tutorial, we build a fully functional MCP-style routed agent system from scratch, combining tool discovery, intelligent routing, structured planning, and execution into a single cohesive workflow. We start by setting up a modular tool server that exposes capabilities…...
Zyphra Releases ZAYA1-8B-Diffusion-Preview: The First MoE Diffusion Model Converted From an Autoregressive LLM With Up to 7.7x Speedup
6+ day, 2+ hour ago (291+ words) This creates a bottleneck. When the GPU spends more time moving data from memory than performing actual computation, the system becomes memory-bandwidth bound rather than compute-bound. This limits how efficiently modern GPU hardware, which has been scaling compute FLOPs faster…...
Nous Research Releases Token Superposition Training to Speed Up LLM Pre-Training by Up to 2.5x Across 270M to 10B Parameter Models
1+ week, 17+ hour ago (698+ words) The two-phase training technique, validated across four model scales from 270M to 10B parameters, modifies only the training loop, leaving the inference-time architecture completely untouched. Pre-training large language models is expensive enough that even modest efficiency improvements can translate into…...
Meet AntAngelMed: A 103B-Parameter Open-Source Medical Language Model Built on a 1/32 Activation-Ratio MoE Architecture
1+ week, 2+ day ago (522+ words) A team of researchers from China has released AntAngelMed, a large open-source medical language model that the team describes as the largest and most capable of its kind currently available. AntAngelMed is a medical-domain language model with…...
Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon
1+ week, 2+ day ago (288+ words) However, U-NorMuon still has a problem: it forcefully overrides the polar factor with uniform row norms, sacrificing polar factor precision, which is both theoretically undesirable and empirically costly in the Muon framework (the paper shows that Muon achieves monotonically…...