Last December, HPE announced the world’s most scalable and modular in-memory computing platform, HPE Superdome Flex—a compute breakthrough to power critical applications, enable real-time analytics and tackle data-intensive high performance computing (HPC) workloads.
In this article, I’ll be taking an in-depth look at the HPE Superdome Flex modular, scalable architecture and the capabilities that make it unique in the industry.
Scaling beyond the capabilities of Intel
Like most other x86 server vendors, HPE uses the latest Intel® Xeon® Scalable processor (codename Skylake) in its latest-generation servers, including HPE Superdome Flex. Intel’s reference design for these processors uses the new Ultra Path Interconnect (UPI), which limits scaling to 8 sockets. Most vendors using these processors base their server designs on this “glueless” interconnect method. HPE Superdome Flex instead uses a unique modular architecture that can scale beyond the capabilities of Intel, from 4 to 32 sockets in a single system.
We did this because we recognized the market need for platforms able to scale beyond Intel’s 8-socket limit, especially today when data sets are growing at an unprecedented pace and customers need scale-up capacity to support growing workloads. In addition, because Intel focuses the UPI on 2- and 4-socket servers, 8-socket “glueless” servers become bandwidth challenged. The HPE Superdome Flex design delivers high bandwidth even when the system grows to the largest configurations.
Price/performance advantages over other systems
The HPE Superdome Flex modular architecture is based on a 4-socket chassis that can scale to 8 chassis for a total of 32 sockets in a single-system compute powerhouse. There are many processor options to choose from, ranging from the cost-efficient Gold to the high-end Platinum “flavors” of the Xeon Scalable processor family.
This choice of Gold and Platinum processors delivers great price/performance advantages over smaller systems. For example, in a typical 6TB memory configuration, HPE Superdome Flex can deliver a lower-cost, higher-performance solution than competitive 4-socket offerings. Why? Because of their design, other 4-socket systems are forced to use 128GB DIMMs, which are a lot more expensive than the 64GB DIMMs an 8-socket HPE Superdome Flex can use. With twice the sockets, an 8-socket/6TB HPE Superdome Flex will deliver double the compute power, double the memory bandwidth and double the I/O capability, and it will still be more cost effective than a 4-socket/6TB competitive product.
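The DIMM arithmetic behind this comparison can be sketched in a few lines of Python. This is a rough illustration, not an HPE sizing tool: it assumes 12 DIMM slots per socket (the 48 slots per 4-socket chassis quoted later in this article), assumes the competing 4-socket systems have the same slot count, and uses a function name made up for the example.

```python
# Assumption: 12 DIMM slots per socket, i.e. the 48 slots per 4-socket
# chassis quoted later in this article, applied to every system compared.
DIMM_SLOTS_PER_SOCKET = 12
GB_PER_TB = 1024


def dimm_size_needed_gb(total_memory_tb: int, sockets: int) -> float:
    """DIMM size (GB) needed to reach total_memory_tb with every slot filled."""
    slots = sockets * DIMM_SLOTS_PER_SOCKET
    return total_memory_tb * GB_PER_TB / slots


if __name__ == "__main__":
    for sockets in (4, 8):
        size = dimm_size_needed_gb(6, sockets)
        print(f"{sockets} sockets, 6 TB: {size:.0f} GB DIMMs in every slot")
```

Under these assumptions, doubling the socket count doubles the number of slots, so the same 6TB can be reached with DIMMs of half the size.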
Similarly, compared with a competitive 8-socket/6TB configuration, HPE Superdome Flex can deliver a lower-cost, higher-performance 8-socket solution. How? While others are forced to use more expensive Platinum processors because of their design, an 8-socket HPE Superdome Flex can use lower-cost Gold processors to give you the same memory capacity.
In fact, of the platforms based on Intel Xeon Scalable processors, Superdome Flex is the only one able to deliver 8 sockets using the cost-effective Gold variant (Intel’s “glueless” design supports 8 sockets only through the more expensive Platinum type). HPE Superdome Flex also comes with a variety of core count choices, from 4 to 28 cores per processor, so you can map the number of cores per processor to your workload requirements.
Scaling up: why it matters
The ability to scale as a single system, or scale up, delivers several advantages for those vital workloads and databases HPE Superdome Flex is best suited for. These include traditional and in-memory databases, real-time analytics, ERP, CRM and other OLTP workloads. For these types of workloads, a scale-up environment is simpler and cheaper to manage than a scale-out cluster, and it also reduces latency, increasing performance.
You can read this blog post on the transaction speed when scaling up or out with SAP S/4HANA to understand why scaling up is a much better alternative than scaling out/clustering for these types of workloads. It’s all about speed and the ability to perform at the level required for these critical applications. For a short video on when to scale-up versus out, you can click here.
Consistent high performance, even at the largest configurations
The HPE Superdome Flex extreme scale is achieved via the unique HPE Superdome Flex ASIC chipset, which connects the individual 4-socket chassis to one another in a point-to-point fashion, as shown in Figures 1 and 2. The HPE Superdome Flex ASIC technology enables adaptive routing, which load-balances the fabric and optimizes latency and bandwidth, increasing performance and system availability. The ASIC connects the chassis together in a cache-coherent fabric and maintains coherency by tracking cache line state and ownership across all the processor sockets inside a directory cache built into the ASIC itself. This coherency scheme is a critical factor in the ability of HPE Superdome Flex to deliver near-linear scaling from 4 sockets all the way up to 32 sockets. Typical “glueless” architecture designs already see limited performance when scaling to as few as 4 to 8 sockets, because broadcast snooping sends coherency traffic to every socket, so that traffic grows with the socket count.
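To make the difference between a directory and broadcast snooping concrete, here is a deliberately simplified toy model in Python. It is not the Flex ASIC's actual protocol: the class names, the two-state bookkeeping and the message counting are all assumptions made for illustration. It only shows the general idea that a directory lets coherency messages go to the sockets that actually hold a line, instead of to every socket.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DirectoryEntry:
    """Hypothetical per-cache-line record: who shares it, who owns it."""
    sharers: Set[int] = field(default_factory=set)
    owner: Optional[int] = None  # socket holding the line exclusively


class Directory:
    """Toy directory that counts coherency messages per operation."""

    def __init__(self) -> None:
        self.entries: Dict[int, DirectoryEntry] = {}

    def read(self, socket: int, line: int) -> int:
        entry = self.entries.setdefault(line, DirectoryEntry())
        if entry.owner == socket:
            return 0  # requester already owns the line
        messages = 0
        if entry.owner is not None:
            # Only the current owner is asked to downgrade and supply data.
            messages = 1
            entry.sharers.add(entry.owner)
            entry.owner = None
        entry.sharers.add(socket)
        return messages

    def write(self, socket: int, line: int) -> int:
        entry = self.entries.setdefault(line, DirectoryEntry())
        # Invalidate only the sockets the directory knows hold a copy.
        targets = set(entry.sharers)
        if entry.owner is not None:
            targets.add(entry.owner)
        targets.discard(socket)
        entry.sharers = set()
        entry.owner = socket
        return len(targets)


def broadcast_snoop_messages(total_sockets: int) -> int:
    """In the simplest broadcast scheme, every other socket is snooped."""
    return total_sockets - 1


if __name__ == "__main__":
    directory = Directory()
    directory.read(0, 0x40)
    directory.read(5, 0x40)
    print("directory invalidations:", directory.write(0, 0x40))
    print("broadcast snoops at 32 sockets:", broadcast_snoop_messages(32))
```

In this toy model the cost of a write depends on how many sockets share the line, while the broadcast cost depends on how many sockets exist, which is why the latter gets worse as the system grows.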
In a similar fashion to compute, memory capacity can grow as more chassis are added to the system. With support for 48 DDR4 DIMM slots per chassis, accommodating either 32 GB RDIMMs, 64 GB LRDIMMs, or even 128 GB 3DS LRDIMMs, the maximum per-chassis memory capacity is 6 TB. This gives a fully scaled 32-socket HPE Superdome Flex a whopping total memory capacity of 48 TB of shared memory to support the most demanding in-memory applications.
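The capacity figures above follow directly from the slot count. The short Python sketch below reproduces them, assuming every slot in every chassis is populated with DIMMs of a single size; the function name is made up for the example.

```python
SOCKETS_PER_CHASSIS = 4       # from the article
DIMM_SLOTS_PER_CHASSIS = 48   # from the article
MAX_CHASSIS = 8               # from the article
DIMM_SIZES_GB = (32, 64, 128)


def system_memory_tb(chassis: int, dimm_gb: int) -> float:
    """Total memory (TB), assuming all slots hold DIMMs of one size."""
    if not 1 <= chassis <= MAX_CHASSIS:
        raise ValueError("chassis count must be between 1 and 8")
    return chassis * DIMM_SLOTS_PER_CHASSIS * dimm_gb / 1024


if __name__ == "__main__":
    for chassis in (1, 2, 4, 8):
        sockets = chassis * SOCKETS_PER_CHASSIS
        sizes = ", ".join(
            f"{system_memory_tb(chassis, gb):g} TB with {gb} GB DIMMs"
            for gb in DIMM_SIZES_GB
        )
        print(f"{sockets:2d} sockets: {sizes}")
```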
Extreme I/O flexibility
As for I/O, each HPE Superdome Flex chassis can be equipped with either a 16-slot or 12-slot I/O bulkhead to provide numerous stand-up PCIe 3.0 card options, giving you plenty of flexibility to support a wide variety of workloads. With either I/O bulkhead selection, the I/O design provides direct connections between the processors and the card slots, with no need for bus repeaters or retimers that can add latency or reduce bandwidth. This gives you the best per-card performance possible.
Low latency is a key factor driving the high performance of HPE Superdome Flex. Data resides in local (directly connected to the processor) or remote (across chassis) memory, but copies of the data can exist in various processor caches throughout the system. Cache coherency keeps the cached copies consistent in the event an operation changes the data. The round-trip latency between a processor and local memory is about 100ns. The latency of a processor accessing data from memory connected to another processor over UPI is about 130ns.
An access to data residing in memory in another chassis travels between two Flex ASICs (always a single “hop”) for a round-trip latency of under 400ns, even when a processor at the top of the rack is accessing data from memory at the bottom. As for bandwidth, HPE Superdome Flex provides more than 210 GB/s of bisection crossbar bandwidth at 8 sockets, more than 425 GB/s at 16 sockets and over 850 GB/s at 32 sockets. That’s plenty to power the most demanding workloads.
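The latency and bandwidth figures above lend themselves to a quick back-of-the-envelope calculation. The Python sketch below is only an estimate under stated assumptions: the access mix is hypothetical, the cross-chassis latency uses the quoted upper bound of 400ns, the bandwidth figures are the quoted lower bounds, and the function names are made up for the example.

```python
# Round-trip latencies quoted in the article (ns). The cross-chassis figure
# is an upper bound ("under 400ns"), so averages here are pessimistic.
LOCAL_NS = 100
UPI_NS = 130
CROSS_CHASSIS_NS = 400

# Bisection bandwidth lower bounds quoted in the article (GB/s), by socket count.
BISECTION_GBPS = {8: 210, 16: 425, 32: 850}


def average_latency_ns(local: float, upi: float, cross_chassis: float) -> float:
    """Weighted average latency for an access mix whose fractions sum to 1."""
    if abs(local + upi + cross_chassis - 1.0) > 1e-9:
        raise ValueError("access fractions must sum to 1")
    return local * LOCAL_NS + upi * UPI_NS + cross_chassis * CROSS_CHASSIS_NS


def bisection_per_socket_gbps(sockets: int) -> float:
    """Quoted bisection bandwidth divided evenly across the sockets."""
    return BISECTION_GBPS[sockets] / sockets


if __name__ == "__main__":
    # Hypothetical mix: 70% local, 20% over UPI, 10% cross-chassis.
    print(f"average latency: {average_latency_ns(0.7, 0.2, 0.1):.0f} ns")
    for sockets in sorted(BISECTION_GBPS):
        per_socket = bisection_per_socket_gbps(sockets)
        print(f"{sockets:2d} sockets: {per_socket:.1f} GB/s per socket")
```

Dividing the quoted figures by the socket count gives a little over 26 GB/s per socket at each size, which is consistent with the claim that bandwidth keeps pace as the system grows.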
Why does this extreme modular scalability matter?
It’s no secret that data is growing at an unprecedented pace, which means infrastructure strains to handle increasingly demanding requests to process and analyze critical, ever-growing data sets. But growth rates can be unpredictable.
To support the business, IT teams need systems that respond effectively and promptly to their requests, regardless of the amount of data or how fast it grows. Having a platform that keeps pace with the demands of your business will give you peace of mind—so you’ll know that you won’t run out of room to grow, but neither will you need to overprovision.
When you deploy memory-intensive workloads, you might ask: What will my next TB of memory capacity cost? With Superdome Flex, you can scale memory capacity without a forklift upgrade, as you’re not limited to the DIMM slots in a single chassis. Also, as the number of users increases, mission-critical applications require a high-performing environment regardless of size.
In closing, today’s in-memory databases demand low-latency, high-bandwidth systems. Thanks to its innovative architecture, HPE Superdome Flex delivers extreme performance, high bandwidth and consistent low latency, even at the largest configurations. What’s more, you can get all this for your critical workloads and databases at better price/performance than on smaller systems. And the platform gives you the room for growth and availability expected in a mission-critical environment.
For more information on HPE Superdome Flex, visit www.hpe.com/superdome