
NVIDIA Launches the GB300 NVL72: A Massive Leap in AI Supercomputing


NVIDIA has officially revealed the GB300 NVL72, its most powerful AI server to date. This full-rack AI supercomputer is built around the new Blackwell Ultra GPUs and is designed for large-scale AI workloads such as training and inference of massive language models, as well as high-performance computing tasks.


Replacing the previous GB200 model, the GB300 brings improvements not just in speed but also in serviceability, cooling, and energy efficiency. It is built to meet the needs of modern AI factories that require scale, performance, and reliability.



A Redesigned AI Cabinet from the Ground Up

The GB300 NVL72 is a tall, sleek cabinet filled with compute trays, switch trays, and power trays. Each compute tray now includes removable B300 GPUs and Grace CPUs. This is a significant shift from the GB200, where the components were soldered onto the board and could not be removed individually. The socket-based design in the GB300 makes upgrades and maintenance easier, improving overall system uptime.

The cooling system has also been upgraded. Instead of using a single water block to cover multiple chips, each CPU and GPU in the GB300 now has its own dedicated cold plate. This improves thermal efficiency and allows the system to run cooler under heavy workloads. As a result, each compute tray in the GB300 now includes six cold plates and fourteen quick-disconnect fittings, compared to just two plates and six fittings in the previous generation.
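
To get a feel for what those per-tray numbers mean at rack scale, here is a quick back-of-envelope sketch in Python. The 18-trays-per-rack figure is an assumption (72 GPUs at four per compute tray, not stated in this article); the per-tray counts come from the paragraph above.

```python
# Back-of-envelope estimate of liquid-cooling hardware per NVL72 rack.
# ASSUMPTION: 18 compute trays per rack (72 GPUs / 4 per tray).
# Per-tray cold plate and fitting counts are taken from the article.

TRAYS_PER_RACK = 18  # assumption, not from the article

gb200 = {"cold_plates": 2, "quick_disconnects": 6}
gb300 = {"cold_plates": 6, "quick_disconnects": 14}

for name, tray in (("GB200", gb200), ("GB300", gb300)):
    plates = tray["cold_plates"] * TRAYS_PER_RACK
    fittings = tray["quick_disconnects"] * TRAYS_PER_RACK
    print(f"{name}: ~{plates} cold plates, ~{fittings} quick-disconnect fittings per rack")
```

Under that assumption, the GB300's per-component cooling roughly triples the amount of cold-plate plumbing in the rack, which helps explain why serviceable quick-disconnect fittings matter so much in this design.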


Improved Power Resilience

To support the increased power demands of the new GPUs, NVIDIA has added Backup Battery Units (BBUs) as a standard feature in the GB300 NVL72. These units provide an immediate power source in case of a primary power failure. This helps ensure continuous operation and avoids disruptions during power transitions.


What’s Inside and When It’s Coming

The GB300 NVL72 combines 72 B300 GPUs and 36 Grace CPUs in a 48U rack, using full liquid cooling and NVLink 5.0 for high-speed interconnect. This setup delivers 1.5 times the performance of the GB200 on large-model inference tasks such as Llama-3-400B. Compared with a Hopper-generation H100 system of similar scale, the GB300 offers up to 50 times more inference throughput per cabinet.
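
A short sketch puts those rack-level numbers in perspective. It uses only figures quoted in this article (the 288 GB per GPU appears in the next paragraph); the totals are simple arithmetic, not official specifications.

```python
# Aggregate rack-level figures derived from the numbers in this article.
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36
HBM3E_PER_GPU_GB = 288  # per-GPU memory figure quoted below

total_hbm_tb = GPUS_PER_RACK * HBM3E_PER_GPU_GB / 1000
print(f"GPUs per Grace CPU: {GPUS_PER_RACK // CPUS_PER_RACK}")  # -> 2
print(f"Total HBM3E per rack: ~{total_hbm_tb:.1f} TB")          # -> ~20.7 TB
```

That works out to roughly 20 terabytes of high-bandwidth GPU memory behind a single NVLink domain, which is what makes whole-rack inference on very large models practical.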


Each B300 GPU is equipped with 288 gigabytes of HBM3E memory, enough to hold a model with up to 400 billion parameters on a single GPU at reduced precision, without splitting it across multiple devices. This reduces latency and simplifies system design.
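
The "reduced precision" qualifier matters, and a quick calculation shows why. This is a sketch only: it assumes the weights dominate the memory footprint and ignores KV cache, activations, and framework overhead.

```python
# Rough memory footprint of a 400B-parameter model's weights
# at different numeric precisions. Sketch only: ignores KV cache,
# activations, and framework overhead.
PARAMS = 400e9        # 400 billion parameters
HBM_PER_GPU_GB = 288  # per-GPU HBM3E quoted above

for fmt, bytes_per_param in [("FP16", 2.0), ("FP8", 1.0), ("FP4", 0.5)]:
    weights_gb = PARAMS * bytes_per_param / 1e9
    verdict = "fits" if weights_gb <= HBM_PER_GPU_GB else "does not fit"
    print(f"{fmt}: ~{weights_gb:.0f} GB of weights -> {verdict} in {HBM_PER_GPU_GB} GB")
```

At FP16 the weights alone need roughly 800 GB, and even FP8 needs about 400 GB, so a single 288 GB GPU only accommodates a 400B-parameter model at 4-bit precision, a format Blackwell-generation GPUs are built to accelerate.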

The GB300 will enter mass production in the second half of 2025 and will be delivered to cloud service providers and systems integrators.


Why This Matters

The GB300 NVL72 represents a major step forward in AI infrastructure. It offers increased performance, easier maintenance, and a more robust cooling and power architecture. Whether it's for training large language models, powering AI research, or scaling up enterprise AI operations, the GB300 is designed to handle the most demanding tasks in modern computing.


It is not just a product upgrade. It is a glimpse into the future of scalable AI.
