News
Federated Learning Without the Refactoring Overhead Using NVIDIA FLARE
1+ hour, 56+ min ago (576+ words) Federated learning (FL) is no longer a research curiosity; it's a practical response to a hard constraint: the most valuable data is often the least movable. Regulatory boundaries, data sovereignty rules, and organizational risk tolerance routinely prevent centralized aggregation. Meanwhile, sheer…
Simplify Sparse Deep Learning with Universal Sparse Tensor in nvmath-python
1+ day, 17+ hour ago (624+ words) In a previous post, we introduced the Universal Sparse Tensor (UST), enabling developers to decouple a tensor's sparsity from its memory layout for greater flexibility and performance. We're excited to announce the integration of the UST into nvmath-python v0.9.0 to accelerate…
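The idea of decoupling a sparsity pattern from its storage layout can be sketched in plain Python. This is a generic illustration of the concept, not the nvmath-python UST API: the same set of nonzeros is held as COO triplets and converted to CSR arrays.

```python
# Generic sketch of sparse-layout decoupling (NOT the nvmath-python UST API):
# the same nonzero pattern stored as COO triplets, then re-laid-out as CSR.

def coo_to_csr(rows, cols, vals, n_rows):
    """Convert COO (row, col, val) triplets to CSR (indptr, indices, data)."""
    order = sorted(range(len(rows)), key=lambda k: (rows[k], cols[k]))
    indices = [cols[k] for k in order]
    data = [vals[k] for k in order]
    indptr = [0] * (n_rows + 1)
    for k in order:                 # count nonzeros per row
        indptr[rows[k] + 1] += 1
    for i in range(n_rows):         # prefix-sum into row pointers
        indptr[i + 1] += indptr[i]
    return indptr, indices, data

# A 3x3 matrix with nonzeros at (0,1)=5, (2,0)=7, (2,2)=9
indptr, indices, data = coo_to_csr([0, 2, 2], [1, 0, 2], [5, 7, 9], 3)
print(indptr, indices, data)  # [0, 1, 1, 3] [1, 0, 2] [5, 7, 9]
```

The nonzero values and their coordinates (the sparsity) are unchanged; only the memory layout differs, which is what lets a library pick the layout best suited to a given kernel.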
Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron
1+ day, 20+ hour ago (639+ words) Higher-order optimization algorithms such as Shampoo have been effectively applied in neural network training for at least a decade. These methods have achieved significant success more recently when applied to leading LLMs. In particular, Muon (MomentUm Orthogonalized by Newton-Schulz)…
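The Newton-Schulz orthogonalization at the heart of Muon can be sketched with the classic cubic iteration. This is a simplified stand-in for the tuned quintic polynomial Muon actually uses, written in NumPy for illustration only:

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=15):
    """Approximate the orthogonal polar factor of g via the cubic
    Newton-Schulz iteration X <- 1.5*X - 0.5*(X @ X.T) @ X.
    Converges when the starting singular values lie in (0, sqrt(3)),
    which normalizing by the Frobenius norm guarantees."""
    x = g / np.linalg.norm(g)  # Frobenius norm >= spectral norm
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x

g = np.array([[2.0, 1.0], [0.0, 1.0]])  # stand-in for a 2D momentum matrix
q = newton_schulz_orthogonalize(g)
print(np.round(q @ q.T, 6))  # ~identity: the update has been orthogonalized
```

The iteration drives every singular value toward 1 using only matrix multiplies, which is why it maps well onto GPUs; Muon applies this kind of orthogonalization to the momentum of each 2D weight matrix before the update.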
Maximizing Memory Efficiency to Run Bigger Models on NVIDIA Jetson
3+ day, 17+ hour ago (1078+ words) The boom in open source generative AI models is pushing beyond data centers into machines operating in the physical world. Developers are eager to deploy these models at the edge, enabling physical AI agents and autonomous robots to automate heavy-duty…
Run High-Throughput Reinforcement Learning Training with End-to-End FP8 Precision
3+ day, 18+ hour ago (703+ words) To make these workloads viable, researchers and engineers are turning to low-precision datatypes like FP8 to boost performance in training and throughput-oriented generation. Moreover, in some scenarios where generation is bound by GPU memory bandwidth, using low-precision parameters can improve performance…
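The bandwidth argument is back-of-the-envelope arithmetic: each decode step must stream the weights from memory, so halving bytes per parameter halves the traffic. A sketch with illustrative numbers (not figures from the post):

```python
def weight_bytes(n_params, bytes_per_param):
    """Memory streamed per decode step for the weights alone."""
    return n_params * bytes_per_param

N = 8e9  # hypothetical 8B-parameter model
bf16 = weight_bytes(N, 2)  # BF16: 2 bytes per parameter
fp8 = weight_bytes(N, 1)   # FP8:  1 byte per parameter
print(f"BF16: {bf16 / 1e9:.0f} GB, FP8: {fp8 / 1e9:.0f} GB "
      f"-> {bf16 / fp8:.0f}x less weight traffic per token")
# BF16: 16 GB, FP8: 8 GB -> 2x less weight traffic per token
```

When the decode step is memory-bandwidth-bound rather than compute-bound, that 2x reduction translates almost directly into throughput.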
Building Custom Atomistic Simulation Workflows for Chemistry and Materials Science with NVIDIA ALCHEMI Toolkit
1+ week, 3+ day ago (743+ words) Machine learning interatomic potentials (MLIPs) have emerged as the bridge, offering quantum accuracy at classical speeds. However, the software ecosystem is a new bottleneck. While the MLIP models themselves run on GPUs, the surrounding simulation infrastructure often relies on legacy…
How to Accelerate Protein Structure Prediction at Proteome-Scale
2+ week, 1+ day ago (543+ words) Proteins rarely function in isolation as individual monomers. Most biological processes are governed by proteins interacting with other proteins, forming protein complexes whose structures are described in the hierarchy of protein structure as the quaternary representation. This represents one level…
Integrate Physical AI Capabilities into Existing Apps with NVIDIA Omniverse Libraries
2+ week, 2+ day ago (768+ words) Physical AI (AI systems that perceive, reason, and act in physically grounded simulated environments) is changing how teams design and validate robots and industrial systems, long before anything ships to the factory floor. At GTC 2026, NVIDIA highlighted physical AI as a key…
CUDA Tile Programming Now Available for BASIC!
3+ week, 2+ day ago (949+ words) CUDA 13.1 introduced CUDA Tile, a next-generation tile-based GPU programming paradigm designed to make fine-grained parallelism more accessible and flexible. One of its key strengths is language openness: any programming language can target CUDA Tile, enabling developers to bring tile-based…
Designing Protein Binders Using the Generative Model Proteina-Complexa
4+ week, 2+ day ago (895+ words) To address these challenges, NVIDIA has released Proteina-Complexa, a generative model that designs de novo protein binders and enzymes. In this post, we detail the key technologies behind Proteina-Complexa, explore primary use cases, and highlight the extensive experimental validation of…