News

NVIDIA Technical Blog
developer.nvidia.com > blog > federated-learning-without-the-refactoring-overhead-using-nvidia-flare

Federated Learning Without the Refactoring Overhead Using NVIDIA FLARE

1+ hour, 56+ min ago (576+ words) Federated learning (FL) is no longer a research curiosity; it's a practical response to a hard constraint: the most valuable data is often the least movable. Regulatory boundaries, data sovereignty rules, and organizational risk tolerance routinely prevent centralized aggregation. Meanwhile, sheer…

NVIDIA Technical Blog
developer.nvidia.com > blog > simplify-sparse-deep-learning-with-universal-sparse-tensor-in-nvmath-python

Simplify Sparse Deep Learning with Universal Sparse Tensor in nvmath-python

1+ day, 17+ hour ago (624+ words) In a previous post, we introduced the Universal Sparse Tensor (UST), enabling developers to decouple a tensor's sparsity from its memory layout for greater flexibility and performance. We're excited to announce the integration of the UST into nvmath-python v0.9.0 to accelerate…

NVIDIA Technical Blog
developer.nvidia.com > blog > advancing-emerging-optimizers-for-accelerated-llm-training-with-nvidia-megatron

Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron

1+ day, 20+ hour ago (639+ words) Higher-order optimization algorithms such as Shampoo have been effectively applied in neural network training for at least a decade. These methods have achieved significant success more recently when applied to leading LLMs. In particular, Muon (MomentUm Orthogonalized by Newton-Schulz)…

NVIDIA Technical Blog
developer.nvidia.com > blog > maximizing-memory-efficiency-to-run-bigger-models-on-nvidia-jetson

Maximizing Memory Efficiency to Run Bigger Models on NVIDIA Jetson

3+ day, 17+ hour ago (1078+ words) The boom in open source generative AI models is pushing beyond data centers into machines operating in the physical world. Developers are eager to deploy these models at the edge, enabling physical AI agents and autonomous robots to automate heavy-duty…

NVIDIA Technical Blog
developer.nvidia.com > blog

Run High-Throughput Reinforcement Learning Training with End-to-End FP8 Precision

3+ day, 18+ hour ago (703+ words) To make these workloads viable, researchers and engineers are turning to low-precision datatypes like FP8 to boost performance in training and throughput-oriented generation. Moreover, in some scenarios where generation is bound by GPU memory bandwidth, using low-precision parameters can improve performance…

NVIDIA Technical Blog
developer.nvidia.com > blog > building-custom-atomistic-simulation-workflows-for-chemistry-and-materials-science-with-nvidia-alchemi-toolkit

Building Custom Atomistic Simulation Workflows for Chemistry and Materials Science with NVIDIA ALCHEMI Toolkit

1+ week, 3+ day ago (743+ words) Machine learning interatomic potentials (MLIPs) have emerged as the bridge, offering quantum accuracy at classical speeds. However, the software ecosystem is a new bottleneck. While the MLIP models themselves run on GPUs, the surrounding simulation infrastructure often relies on legacy…

NVIDIA Technical Blog
developer.nvidia.com > blog > how-to-accelerate-protein-structure-prediction-at-proteome-scale

How to Accelerate Protein Structure Prediction at Proteome-Scale

2+ week, 1+ day ago (543+ words) Proteins rarely function in isolation as individual monomers. Most biological processes are governed by proteins interacting with other proteins, forming protein complexes whose structures are described in the hierarchy of protein structure as the quaternary representation. This represents one level…

NVIDIA Technical Blog
developer.nvidia.com > blog > integrate-physical-ai-capabilities-into-existing-apps-with-nvidia-omniverse-libraries

Integrate Physical AI Capabilities into Existing Apps with NVIDIA Omniverse Libraries

2+ week, 2+ day ago (768+ words) Physical AI (AI systems that perceive, reason, and act in physically grounded simulated environments) is changing how teams design and validate robots and industrial systems, long before anything ships to the factory floor. At GTC 2026, NVIDIA highlighted physical AI as a key…

NVIDIA Technical Blog
developer.nvidia.com > blog > cuda-tile-programming-now-available-for-basic

CUDA Tile Programming Now Available for BASIC!

3+ week, 2+ day ago (949+ words) CUDA 13.1 introduced CUDA Tile, a next-generation tile-based GPU programming paradigm designed to make fine-grained parallelism more accessible and flexible. One of its key strengths is language openness: any programming language can target CUDA Tile, enabling developers to bring tile-based…

NVIDIA Technical Blog
developer.nvidia.com > blog > designing-protein-binders-using-the-generative-model-proteina-complexa

Designing Protein Binders Using the Generative Model Proteina-Complexa

4+ week, 2+ day ago (895+ words) To address these challenges, NVIDIA has released Proteina-Complexa, a generative model that designs de novo protein binders and enzymes. In this post, we detail the key technologies behind Proteina-Complexa, explore primary use cases, and highlight the extensive experimental validation of…