Most supercomputers are focused on pure processing speed. Take the DOE's new Summit system, which is now the world's most powerful supercomputer, with 9,000 22-core IBM Power9 processors and over 27,000 NVIDIA Tesla V100 GPUs. But processing performance isn't everything. Last year, Hewlett Packard Enterprise announced The Machine, its prototype for a supercomputer built around blazing-fast memory. It's meant to churn through enormous amounts of data, though it can handle its fair share of high-performance computing (HPC) jobs.
Now, HPE is turning that vision into an actual product: Astra, the largest ARM-based supercomputer ever made. Developed together with the Department of Energy, it's being adopted by Sandia National Laboratories as an experimental new platform for nuclear research. Since it's powered by Cavium ThunderX2 ARM processors, it's considerably more power-efficient and denser (meaning it can pack more hardware into the same space) than a comparable x86 system. Notably, that ARM chipset also offers 33 percent faster memory speeds than many x86 CPUs.
"The energy consumption needed to move data around a system is an order of magnitude greater than the power needed to compute that data," said Mike Vildibill, VP of HPE's Advanced Technologies group, in an interview. That need for more efficient data movement is one reason HPE is diving into memory-driven systems.
Astra is built on HPE's Apollo system, and it's made up of more than 145,000 cores across 2,592 dual-processor servers. The 28-core ThunderX2 processors HPE is using also offer eight memory channels, compared with the six found on typical x86 chips. At its peak, HPE claims Astra can deliver 2.3 PFLOPs of performance, which would place it among the 100 fastest supercomputers in the world (as ranked by top500.org).
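Those figures are internally consistent, and a quick back-of-the-envelope check shows where they come from (the assumption that "dual-processor" means exactly two 28-core CPUs per server is ours, inferred from the article's numbers rather than stated outright by HPE):

```python
# Back-of-the-envelope check of the Astra figures quoted in the article.
servers = 2592            # dual-processor Apollo servers
cpus_per_server = 2       # assumed: two ThunderX2 chips per "dual-processor" node
cores_per_cpu = 28        # 28-core ThunderX2 parts

total_cores = servers * cpus_per_server * cores_per_cpu
print(total_cores)        # 145152, i.e. "more than 145,000 cores"

# Eight memory channels per ThunderX2 vs. six on a typical x86 chip
# is the likely source of the "33 percent faster memory" claim:
extra_bandwidth = 8 / 6 - 1
print(f"{extra_bandwidth:.0%}")  # 33%
```

The core count works out to 145,152, matching the "more than 145,000" figure, and the 8-to-6 channel ratio lines up with the 33 percent memory-speed claim.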
Additionally, each of the system's CPUs has direct access to a large pool of memory. That's a big departure from the CPU-centric computing we see today, where each chip has access to only a small amount of memory, and it's tough to share information between processors.
"For one processor to access data not held in its own memory, the computer must play an inefficient game of 'Mother May I,' so to speak," HPE writes in a memory computing explainer. "One processor must request access from another processor to get anything accomplished. What's worse, the relationship between storage and memory is also inefficient. In fact, in today's computers it's estimated that 90 percent of work is devoted to moving information between tiers of memory and storage."
At Sandia Labs, Astra will be a part of its Vanguard prototype program, which is focused on finding new technologies for accomplishing its core mission: managing America's nuclear stockpile. Specifically, it'll be a test to see how well an ARM-based system can handle all of the physics simulations Sandia performs daily.
"With any new hardware architecture, there are going to be software challenges and gaps," said Sandia's James Laros, lead for the Vanguard project. "We're hoping that Astra is the right scale to address these problems, to identify any sort of issues in supporting their simulation code." Before laying out hundreds of millions on a complete system revamp, Sandia is making a smaller investment to prove the viability of ARM-based servers.
Sandia's typical applications are "particularly sensitive to bandwidth," Laros says, to the point where the apps sometimes overload their caches and get slowed down. He likens the jump in ARM's memory bandwidth to AMD's move to put a memory controller directly on its 2003-era CPUs, which gave those chips a major speed advantage over Intel's.
While it's being tested, Astra won't be replacing any existing systems at Sandia, but Laros says it'll likely end up being a production system eventually. At the moment, it's primarily an experiment, but it's one that has the potential to reshape the world of supercomputing.