The Shared Hoffman2 Cluster is made up of two main virtualized clusters that have been optimized for different research needs. The Research Virtual Shared Cluster is made up of Contributed cores purchased by individual research groups and Base cores purchased by IDRE to augment the Contributed cores. One benefit of contributing cores to the shared cluster is that a research group is guaranteed use of the number of cores it contributed, with the ability to use surplus cores from the entire Hoffman2 Cluster. Other benefits provided to research groups when they join the shared cluster include:
- Complete system administration for contributed cores
- Cluster access through a 10Gb network interconnect to the campus backbone
- High performance home and scratch storage space
- A dedicated data center facility for housing the cluster. This eliminates the need to perform expensive space, cooling, and electrical modifications to existing office or lab space
- The capability to run large parallel jobs that can take advantage of the cluster’s InfiniBand interconnect
Research groups who have contributed cores to the Research Virtual Shared Cluster also have access to the features of the General Purpose Cluster. This gives them:
- Access to pooled licenses, allowing researchers to run larger commercial applications without the cost of buying additional licenses
- Access to additional commercial and open source applications
Base and Contributed Equipment Standards and Policies
All contributed hardware must be compatible with the base cores' architecture, processor type and speed, memory, disk space, and interconnect. This allows the Hoffman2 Cluster to be managed effectively and to provide the highest level of computing services to shared cluster customers. IDRE provides full support in helping researchers specify and purchase cores that meet these standards at optimal price/performance.
Once contributed, these cores become part of the entire Hoffman2 Cluster and are no longer physically linked to a given research group. Because cycles are pooled across all Base and Contributed cores, which may be in use by others, a number of cores equivalent to those contributed is made available within 24 hours of a request. In practice, the number of cores contributed by a research group is generally available much sooner. Jobs that run on the Research Virtual Shared Cluster have a 14-day upper limit (with appropriate notification, longer runs may be accommodated).
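As a rough planning aid, the two limits above (up to 24 hours before contributed cores become available, and a 14-day upper limit on run time) can be combined into a worst-case finish time. The sketch below is illustrative only: the function and constant names are hypothetical, it is not part of the Hoffman2 queuing system, and it assumes the job waits the full 24 hours.

```python
from datetime import datetime, timedelta

# Limits quoted in the policy text above.
MAX_RUNTIME = timedelta(days=14)                       # standard upper limit on run time
MAX_WAIT_FOR_CONTRIBUTED_CORES = timedelta(hours=24)   # guaranteed availability window


def needs_advance_notice(requested_runtime: timedelta) -> bool:
    """Return True when a run exceeds the standard 14-day limit."""
    return requested_runtime > MAX_RUNTIME


def worst_case_finish(submitted: datetime, runtime: timedelta) -> datetime:
    """Latest finish time, assuming the job waits the full 24 hours for cores."""
    return submitted + MAX_WAIT_FOR_CONTRIBUTED_CORES + runtime


if __name__ == "__main__":
    submitted = datetime(2017, 6, 1, 9, 0)
    runtime = timedelta(days=3)
    print("Needs advance notice:", needs_advance_notice(runtime))
    print("Worst-case finish:", worst_case_finish(submitted, runtime))
```

Because contributed cores are generally available much sooner than 24 hours, the actual finish time will usually be earlier than this estimate.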
While it is hard to give an exact number of additional cores available, in practice there are unused cores that can be made available within a reasonable period of time for researchers who require cores in addition to those they contributed.
With advance agreement, a very large job that requires a large segment of the entire shared cluster (those cores connected through InfiniBand) can be accommodated, depending on current cluster usage and the consent of affected research groups.
Research Virtual Shared Cluster Hosting Costs
There are two hosting “costs” to participate in the Hoffman2 Shared Cluster:
- The first is a one-time charge for an HPC Compute Node, which is $5,745.77 as of June 2017. This node, from Silicon Mechanics, features 2 x 12-core Intel Xeon E5-2650v4 CPUs running at 2.2GHz and 64GB* of memory. The charge also includes full access to these cores through the Hoffman2 Shared Cluster queuing system, administration of the users of your compute resources, complete support of the hardware and the operating system, and connection to the cluster’s InfiniBand fabric.
- Second, research groups that contribute cores to the Hoffman2 Cluster agree to contribute their unused cycles to other researchers. They can regain full use of their contributed cores within 24 hours of submitting a job.
* Note that the previous batch of nodes contained 128GB of memory for roughly the same cost. We have dropped back to 64GB because memory prices have risen sharply since the beginning of the year and a node with 128GB of memory would run an additional $800. Rather than increase the node cost that drastically, we opted to keep the cost close to what it was and let those that specifically need the extra memory pay the premium. Nodes can easily be upgraded to 128GB if you need the extra memory. We will continue to monitor memory prices and make adjustments to node prices to provide the best configuration we can while keeping the cost reasonable.
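The figures above translate into a per-core cost that may be easier to compare across configurations. The following sketch works through that arithmetic using only the numbers quoted in this section; the variable and function names are illustrative, and it assumes the 128GB upgrade adds exactly the quoted $800 to the June 2017 node price.

```python
# Figures quoted in this section (June 2017).
NODE_COST_USD = 5745.77        # one-time charge for a 64GB node
CORES_PER_NODE = 2 * 12        # two 12-core Intel Xeon E5-2650v4 CPUs
BASE_MEMORY_GB = 64
UPGRADED_MEMORY_GB = 128
UPGRADE_COST_USD = 800.00      # assumed exact; the text gives it as an approximate premium


def cost_per_core(node_cost_usd: float, cores: int = CORES_PER_NODE) -> float:
    """One-time cost per core, rounded to cents."""
    return round(node_cost_usd / cores, 2)


def memory_per_core_gb(memory_gb: int, cores: int = CORES_PER_NODE) -> float:
    """Memory available per core, in GB, rounded to two decimals."""
    return round(memory_gb / cores, 2)


base_cost_per_core = cost_per_core(NODE_COST_USD)
upgraded_cost_per_core = cost_per_core(NODE_COST_USD + UPGRADE_COST_USD)

if __name__ == "__main__":
    print(f"64GB node:  ${base_cost_per_core} per core, "
          f"{memory_per_core_gb(BASE_MEMORY_GB)} GB per core")
    print(f"128GB node: ${upgraded_cost_per_core} per core, "
          f"{memory_per_core_gb(UPGRADED_MEMORY_GB)} GB per core")
```

Under these assumptions the 64GB node works out to about $239 per core and the 128GB node to about $273 per core, while memory per core doubles from roughly 2.7GB to 5.3GB.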
Users of the Research Virtual Shared Cluster and users of the General Purpose Cluster have the option of purchasing high performance project storage space on a per-terabyte basis. This is a particularly important option for those who need more space than the free 20 GB per user that is standard on both clusters. Also available are private scratch space with unlimited file retention times and archival storage for long-term storage of infrequently used data sets or output.
Base and Contributed Equipment Renewals
After a period of five years, all hardware within the shared cluster is evaluated for retention based on the condition of the equipment, the cost to maintain it, its relative compute power, and the ability to backfill with new systems. This is done to maintain a high-performance, low-maintenance system while maximizing the utilization of data center space.
If the contributed nodes can still be effectively maintained, they will remain inside the Hoffman2 Cluster and continue to be reevaluated on an annual basis. If the contributed nodes can no longer be effectively maintained, they will be removed from Hoffman2 and either redeployed for other uses or decommissioned.
The Shared Hoffman2 Hardware and Software
The Hoffman2 Cluster has 64-bit nodes with an Ethernet network and InfiniBand interconnect, with the following standard software suite:
- Compilers: GCC and the best-performing compiler for C, C++, and Fortran 77, 90, and 95 on the current Shared Cluster architecture
- Applications and Libraries in the Basic Software Suite
Certain applications are provided for a base level of cluster usability. Every effort is made to maximize application usage to the extent permitted by license agreements. Where possible, software is provided that would not make sense for an individual research group to purchase on its own.
In addition to the Base and Contributed cores, the Hoffman2 Cluster includes login nodes, dedicated data transfer nodes, and high performance storage systems. The Hoffman2 Cluster has both InfiniBand and gigabit Ethernet network switches and interconnects. The Ethernet fabric is dedicated to storage traffic as well as various administrative functions and is used as the interconnect for the Applications cluster. To maintain maximum parallel performance, InfiniBand is used strictly for inter-node, MPI-type communication across the Research Virtual Shared Cluster and the General Purpose Cluster.