When the Hoffman2 Shared Research Cluster was created, the IDRE technology team had a simple message for UCLA faculty and researchers.
“Trust us,” they said.
Trust us to treat your work with respect.
Trust us to process your computational runs quickly and safely.
Trust us to care as much about your precious grant dollars as you do.
Trust us to do what is best for you.
The Institute for Digital Research and Education, which created the Hoffman2 in 2008, knew that researchers were accustomed to stand-alone systems and might resist joining the shared cluster. But IDRE also knew that this machine was necessary to meet the ever-growing demand for computational resources. IDRE was confident that once researchers saw what it could do for them, they would come on board.
Today, IDRE proudly points to the Hoffman2 as its greatest achievement to date. The largest and most powerful cluster in the UC system, it’s an 850-node, 8,200-CPU, 298-GPU, 300-teraflop computational wonder with 32 terabytes of memory. It also exemplifies IDRE’s primary mission: to provide an integrated focus and critical infrastructure for faculty, so that those doing interdisciplinary research don’t need an interdependent expertise in cyberinfrastructure.
Under the Hoffman2 sharing system, individual research groups purchase cores which are added to base cores purchased by IDRE. Researchers who use the Hoffman2 are guaranteed the use of the number of cores they contribute, and also have the option to use surplus cores. When the Hoffman2 began operating, this purchasing and usage structure was unique in the United States. Since then it has been copied many times across the country.
Among the benefits the Hoffman2 provides users are reduced costs and a simplified process, because running a single system provides administrative advantages. IDRE loads the operating systems and special libraries, and also gets hardware quotes for its users, saving them both time and money.
“One constant with researchers is that they are extremely careful about how they spend their funds,” said Bill Labate, Director of IDRE’s Research Technology Group. “We can provide access to more resources than individuals can afford on their own, and we can get them better hardware prices than they could on their own.”
Labate likens the Hoffman2’s ease of use to a utility. In order to turn on the lights, consumers simply flip a switch. Knowledge of the electrical grid is unnecessary. Similarly, in order to run their computations, researchers simply log on to the Hoffman2. Knowledge of the cyberinfrastructure is not required.
Approximately 1,200 users, 275 faculty sponsors and 80 research groups and departments currently use the Hoffman2. IDRE estimates that the shared cluster program costs $0.024 per core hour, compared to the National Science Foundation best practices rate of $0.04 per core hour and the Amazon EC2 Cluster Instance rate of $0.74 per core hour.
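To put those per-core-hour rates in concrete terms, here is a minimal sketch that prices one job at each of the three quoted rates. The job size (64 cores for 24 hours) is a hypothetical figure chosen for illustration, not an IDRE number, and the calculation assumes cost scales linearly with core hours.

```python
# Per-core-hour rates quoted in the article (US dollars).
RATES = {
    "Hoffman2 shared cluster": 0.024,
    "NSF best-practices rate": 0.04,
    "Amazon EC2 Cluster Instance": 0.74,
}


def workload_cost(cores, hours, rate_per_core_hour):
    """Return the cost of running `cores` cores for `hours` hours."""
    return cores * hours * rate_per_core_hour


# Hypothetical job: 64 cores for 24 hours (1,536 core hours).
CORES, HOURS = 64, 24

costs = {name: workload_cost(CORES, HOURS, rate) for name, rate in RATES.items()}

for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f}")
```

Under these assumptions the same job costs roughly 30 times more at the quoted EC2 rate than at the Hoffman2 rate, which is simply the ratio of the two rates.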
“This is a very effective and efficient system,” Labate said. “The value proposition is very high. IDRE has no hidden agenda. Our practice is to see what people need, make sure the system is running at tip-top performance, and give users value and security.”
Energy efficiency is another important benefit of shared cluster computing. IDRE estimates that the Hoffman2 saves $100,000 annually in power efficiency compared to stand-alone systems. It saves another $720,000 annually by harvesting unused cycles, as the cluster’s utilization rate is approximately 90 percent, versus 40 percent for stand-alone systems.
And let’s not forget about the computational benefits offered by this powerful system. The Hoffman2 is capable of 300 trillion operations per second. Project turnaround is quick — 99.1 percent of all jobs take 24 or fewer hours. A recent snapshot of real-time usage showed 3,201 jobs running, 3,030 pending and a system load of 94.4 percent. More than 25 million jobs are completed annually.
“It’s a monster,” Labate said. “Huge chunks of data go through this, and it performs incredibly complex calculations.”
The monster is so big that it takes three data centers to house it. And it’s still growing. IDRE estimates that with current staffing levels, data center space and power capabilities, there’s room for it to nearly triple in size, to 2,300 nodes.
The academic community has realized many direct benefits from the Hoffman2, with $10 million in grants, 150 published papers and six conferred PhDs flowing from its use annually. Additionally, its presence has provided researchers with a local resource to test their codes to ensure they scale upward to platforms used at the national supercomputing centers. As a result, UCLA now occupies an integral position in the pipeline to leadership-class facilities.
Much of the Hoffman2’s success is attributable to word-of-mouth, as satisfied researchers across disciplines have spread the message that, yes indeed, you can trust its hosts. But there’s not a single “I told you so” emanating from the IDRE staff, which is happy to be of service.
“We have this resource available,” Labate said. “We want people to know what we can give to them.”