Dear Hoffman2 Users,
The Hoffman2 Cluster will be unavailable from 6:00AM on Monday, December 14, 2015 through 6:00PM on Tuesday, December 15, 2015 for scheduled upgrades and maintenance.
During this time, we will install a new OS image based on CentOS 6.6, upgrade storage equipment, and carry out a major upgrade of the InfiniBand fabric.
The primary focus of the maintenance window will be the upgrade of the InfiniBand infrastructure in the POD data center.
We will be replacing the existing QDR InfiniBand core switch in the POD with a new FDR core switch. We will also be replacing many of the QDR leaf switches with FDR leaf switches, and replacing and upgrading cabling. These upgrades should greatly increase the throughput and reliability of our IB fabric.
Because of the major changes involved, we will closely monitor the performance of parallel jobs running on the InfiniBand fabric after the upgrade.
You will not be able to log in, run jobs, or transfer files during the maintenance window. Starting Monday, December 1st, we will reduce the maximum job run time every hour so that all running jobs have finished by the start of the outage on December 14. A job that requests more than the current maximum run time will not start. Any jobs still queued or running when the outage begins will be killed, and affected users will have to resubmit them after the outage. Please plan your runs accordingly.
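As a rough aid for planning, the sketch below estimates how many whole hours remain before the maintenance window opens and whether a requested run time would fit. It is only an illustration: it assumes the run-time limit at any moment is simply the time left until 6:00 AM on December 14, 2015, rounded down to the hour, which may differ from the limit the scheduler actually enforces. The function names are hypothetical and are not part of any cluster tool.

```python
from datetime import datetime, timedelta

# Start of the maintenance window, taken from this announcement (cluster local time).
MAINTENANCE_START = datetime(2015, 12, 14, 6, 0)


def max_runtime_hours(submit_time, window_start=MAINTENANCE_START):
    """Whole hours left before the window opens (0 if none remain).

    Assumption: the limit equals the remaining time, rounded down to the hour.
    """
    remaining = window_start - submit_time
    if remaining <= timedelta(0):
        return 0
    return int(remaining.total_seconds() // 3600)


def fits_before_maintenance(submit_time, requested_hours,
                            window_start=MAINTENANCE_START):
    """True if a job requesting `requested_hours` could finish before the window."""
    return requested_hours <= max_runtime_hours(submit_time, window_start)


if __name__ == "__main__":
    now = datetime.now()
    print("Hours available:", max_runtime_hours(now))
    print("24-hour job fits:", fits_before_maintenance(now, 24))
```

For example, under this assumption a job submitted at 6:00 AM on December 13 could request at most 24 hours.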
If you have any questions regarding this maintenance, please submit a ticket to our support site at:
IDRE Research Technology Group