HiPerGator

UF's Supercomputer: HiPerGator 4th Gen

Faceplate from an NVIDIA B200

June 2025 Update

UF received the first delivery of a DGX B200 SU in the world!

NVIDIA DGX B200 SuperPOD SU1

The full NVIDIA DGX B200 SuperPOD will consist of two scalable units (SUs). SU1 was delivered at the end of January and has undergone extensive testing since the installation was completed in early April. UFIT Research Computing is finalizing benchmarks and ensuring the system is ready for users.

The current plan is for SU1, including 31 servers and 248 B200 GPUs, to come online for users sometime next week. However, until the system is in full production, the B200 servers will run under an “early access” service level.

  • Maintenance on servers will be more frequent. While staff will try to avoid stopping running jobs, occasionally taking some or all servers offline may be necessary. We will not be able to announce all maintenance in advance.
  • Slurm scheduler settings for GPU use and priorities may change between early access and the final production state.

New CPU/GPU servers

Another part of HiPerGator 4th Gen is the replacement of the HiPerGator 2 servers and the 2080 Ti and RTX 6000 GPUs after many years of service. Our vendor, Lenovo, has delivered our new system with 19,200 cores and 600 NVIDIA L4 GPUs. UFIT Research Computing staff are getting these servers connected, burned in, and ready for users. The new servers will provide refreshed CPU capacity and a much more capable GPU for many common AI workloads and graphics tasks. The current plan is for these servers to come online for users by the end of June.

Decommissioning the A100 and 2080 Ti servers

Part of the cost of the HiPerGator 4th Gen upgrades was offset by trading in the older hardware. We are working with our vendors to coordinate pickup dates so that new hardware comes online with minimal disruption to service. However, depending on when the vendors pick up the components, we may have some periods with limited GPU availability.

The current plan is for the 2080 Ti and RTX 6000 GPUs to remain in production until the L4 GPUs come online towards the end of June. The vendors may want these sooner, reducing GPU availability until we can get the L4s into production. The remaining A100s will be removed from service on June 24, making space for the delivery of SU2 of the NVIDIA DGX B200 SuperPOD.

NVIDIA DGX B200 SuperPOD SU2

UFIT Research Computing anticipates the delivery of the second SU in early July. After installation, the system will be validated and hopefully turned over to our staff by the end of July. In August, we will need to combine and benchmark the full system with both SUs. That will require all B200 servers to be removed from user access, leaving only the L4s accessible to users during much of August.

Blue Storage Replacement

Our storage vendor, DDN, is finalizing the configuration of the new all-flash 11 PB Blue storage system that will replace the current 7.2 PB Blue storage. Once received, UFIT Research Computing staff will update users with more details of the transition plan to migrate data to the new system while minimizing interruption of system availability.

HiPerGator 4th Gen Production

We are still on track for an early September readiness of the full HiPerGator 4th Gen with approximately 60,000 cores, 600 L4 GPUs, 504 B200 GPUs, and an 11 PB all-flash, high-performance, parallel file system. This system will continue to cement UF’s status as a leader in high-performance computing and AI capabilities.
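As a quick sanity check on the B200 figures quoted in this update, the short sketch below derives the GPU count per server from the SU1 numbers and the implied size of SU2. It assumes every B200 server carries the same number of GPUs and that the 504-GPU total is simply SU1 plus SU2; the variable names are illustrative, and the SU2 figures it prints are an inference, not a number stated in this update.

```python
# Figures quoted in this update.
SU1_SERVERS = 31
SU1_GPUS = 248
TOTAL_B200_GPUS = 504

# GPUs per server implied by SU1 (assumption: all servers are identical).
gpus_per_server, remainder = divmod(SU1_GPUS, SU1_SERVERS)
assert remainder == 0, "SU1 GPUs do not divide evenly across servers"

# Implied SU2 size (assumption: total = SU1 + SU2, same server type).
su2_gpus = TOTAL_B200_GPUS - SU1_GPUS
su2_servers, su2_remainder = divmod(su2_gpus, gpus_per_server)

print(f"GPUs per server: {gpus_per_server}")
print(f"Implied SU2 GPUs: {su2_gpus}")
print(f"Implied SU2 servers: {su2_servers}")
```

Under these assumptions, the quoted totals are consistent with eight GPUs per server and an SU2 slightly larger than SU1.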

March 2025 Update

The UF Information Technology (UFIT) Data Center has been an especially busy place for the past few months! On Jan. 22, two semi-trucks made it through the snow and ice that blanketed the South to bring the University of Florida the first NVIDIA DGX B200 servers delivered anywhere. Since then, teams from UFIT, Mark III, and NVIDIA have been working to install and certify the first of two NVIDIA DGX B200 SuperPOD scalable units (SUs). This is the first of several steps before the system can be put into production.

UFIT expects Lenovo to deliver a 19,000-core system with 600 NVIDIA L4 GPUs in April to replace 30,000 CPU cores from HiPerGator 2nd Gen (2015) and 600 NVIDIA 2080 Ti GPUs from 2019. DataDirect Networks (DDN) will also deliver a new storage system to replace the Blue storage with 11 PB of all-flash storage. UFIT Research Computing staff will continue to install the SuperPOD and configure the new CPU cores, GPUs, and Blue storage through July.

Dr. Deumens installs an NVIDIA B200 System

UFIT estimates that users will begin to have some access to the new systems in July. There will be times when the systems will not be available due to maintenance and full-system testing. Around that time, the remaining A100 DGX servers will be removed to make room for the second DGX B200 SuperPOD SU delivery and installation.

As the new Blue storage becomes ready for production, UFIT will migrate data from the current Blue storage to the new system, possibly requiring some periods of downtime in July and August.

Our goal is to have the second SU of the B200 SuperPOD ready for production by the end of August. At that point, UFIT will need to use the whole system to run benchmarks and ensure all components are ready for production.

By September 2025, HiPerGator 4th Gen is planned to be ready for production with approximately 60,000 cores, 600 L4 GPUs, 504 B200 GPUs, and an 11 PB all-flash high-performance parallel file system. This system will continue to cement UF’s status as a leader in high-performance computing and AI capabilities.