We recently designed and deployed an Artificial Intelligence (AI) solution for an automotive customer based on our many years of experience building sizeable High-Performance Computing (HPC) clusters for AI. The proposed data-centric architecture relied heavily on an innovative storage solution from Weka, an HPE partner. The customer was looking for a hybrid solution that would allow them to run on-premises and take advantage of the public cloud for peak capacity and future growth. As such, the overall design can run workloads on-premises or in a public cloud based on customer-driven policies.
Virtual assistants in the automotive world
The customer builds AI models for major automobile manufacturers worldwide. Its automotive cognitive assistance solutions power natural and intuitive interactions between automobiles, drivers and passengers, and the broader digital world. They produce one of the world’s most popular software platforms for building virtual automotive assistants — fueling AI for a world in motion. AI is used to deliver an exceptional in-car experience based on a deep understanding of language, culture, and human behavior.
The customer turned to HighFens for expertise on the project to deliver a new AI cluster that would empower R&D to achieve its research goals. The R&D organization mostly builds speech-language models for Natural Language Processing (NLP) and Natural Language Understanding (NLU).
A growing number of their workflows rely on Deep Learning (DL) frameworks paired with NVIDIA® GPUs to obtain faster and more accurate results. The ability to stage data closer to the GPU is crucial for achieving high performance.
In addition, they required that the data be accessible through a POSIX interface, which provides consistency and compatibility with existing tools and applications.
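To illustrate why the POSIX requirement matters: any POSIX-compliant mount behaves like an ordinary local file system, so standard open/read/write calls and stat metadata work unchanged, with no vendor SDK. In the sketch below, a temporary directory stands in for a hypothetical mount point so the example is runnable anywhere:

```python
import os
import tempfile

# A temporary directory stands in for a POSIX-mounted file system
# (e.g. a hypothetical /mnt/weka mount point).
mount = tempfile.mkdtemp()

path = os.path.join(mount, "corpus", "utterances.txt")
os.makedirs(os.path.dirname(path), exist_ok=True)

# Plain open()/write()/read() -- existing tools and scripts work unchanged.
with open(path, "w") as f:
    f.write("hello in-car assistant\n")

with open(path) as f:
    data = f.read()

print(data.strip())
print(os.path.getsize(path))  # POSIX metadata (stat) is available too
```

Nothing here is specific to any one file system: that interchangeability is exactly what the requirement buys.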
The new AI cluster’s end-to-end design replaces the previous decade-old architecture initially designed by the HighFens team.
The customer has many petabytes of data but processes only a fraction of that data at any one point in time. It did not make sense to store all the data on high-performing, expensive storage. A modern file system, such as WekaFS®, was needed to meet the requirements. WekaFS® has a two-tier architecture that takes NVMe flash and disk-based technologies and presents them as a single hybrid storage solution.
For that reason, we opted to take advantage of Weka's tiering capabilities:
- Front-end storage:
  - Use high-performing, low-latency NVMe drives to provide just the storage capacity needed to store the data that will soon be processed.
  - The storage capacity can accommodate only about 25% of the entire data footprint, but this can vary from customer to customer.
- Back-end storage:
  - The bulk of the data (75%) goes into cost-effective object storage.
The data is always accessible through a POSIX interface but can reside in back-end storage until needed for processing. WekaFS®, the Weka file system, dynamically and transparently moves the data between the front and back ends depending on usage patterns.
It is important to note that the front end and back end can scale independently. For example, back-end storage can grow without adjusting front-end capacity. Conversely, adding more processing resources adds more front-end storage without impacting the back end.
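The transparent tiering described above can be sketched as a toy LRU model: recently accessed files stay on the fast front end, and least-recently-used files spill to the object back end, invisibly to the caller. This is only an illustration of the concept; WekaFS®'s actual placement policies are internal to the product.

```python
from collections import OrderedDict

class TieredStore:
    """Toy model of NVMe front-end + object back-end tiering."""

    def __init__(self, front_capacity):
        self.front = OrderedDict()   # hot tier: name -> size, in LRU order
        self.back = {}               # cold tier (object store)
        self.front_capacity = front_capacity

    def _evict(self):
        # Demote least-recently-used files until the hot tier fits.
        while sum(self.front.values()) > self.front_capacity:
            name, size = self.front.popitem(last=False)
            self.back[name] = size

    def write(self, name, size):
        self.front[name] = size
        self.front.move_to_end(name)
        self._evict()

    def read(self, name):
        # Transparent to the caller: promote from the cold tier if needed.
        if name in self.back:
            self.front[name] = self.back.pop(name)
        self.front.move_to_end(name)
        self._evict()

store = TieredStore(front_capacity=100)
store.write("a.wav", 60)
store.write("b.wav", 60)   # hot tier full: "a.wav" demoted to the back end
store.read("a.wav")        # "a.wav" promoted back; "b.wav" demoted
print(sorted(store.front), sorted(store.back))
```

The caller only ever calls `read()` and `write()`; which tier holds the bytes is a detail of the storage layer, which is the behavior the design relies on.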
Many data sources generate large volumes of data at an increasing pace. Therefore, having an AI architecture that can scale and grow with the business is essential. We frequently see architectures that look fine at first but cannot scale and cause various bottlenecks, including network congestion that prevents the compute resources from accessing data quickly. Typically, this results in decreased overall performance, higher data access latency, longer wait times, and even failed jobs.
Furthermore, the architecture must deliver a cost-effective solution with a predictable price tag that meets budget constraints.
The proposed architecture was a "modular" design that scales linearly by incorporating additional building blocks to accommodate any growth. Since adding computation power increases the need for I/O bandwidth, each module consists of a server that contributes both computing power and front-end storage bandwidth. The complete set of modules provides the total capacity of the AI cluster.
A module consists of a 1U server (HPE DL360 Gen10) that contributes compute, storage, and networking resources to the overall solution. Each server comes with 40 physical cores, 512GB RAM, 2x 25Gb/sec Ethernet ports, 2x 3.2TB NVMe drives, M.2 drives for the operating system and local storage, and 1x NVIDIA® T4 GPU.
Figure 1 shows the logical architecture built around the WekaFS® file system. Each of the 40 modules runs the Weka client, allowing for horizontal scaling. Together, the modules make up the front-end storage (256TB of NVMe). The SUSE Ceph data lake serves as the Weka back end and is AWS S3-compatible.
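As a back-of-the-envelope sanity check, the aggregate figures follow directly from the per-module specs; the 25% hot-data ratio used for the footprint estimate is the design figure quoted earlier:

```python
# Per-module resources from the design (HPE DL360 Gen10 building block).
modules = 40
nvme_per_module_tb = 2 * 3.2      # two 3.2 TB NVMe drives per server
cores_per_module = 40
ram_per_module_gb = 512

front_end_tb = modules * nvme_per_module_tb
print(f"front-end NVMe: {front_end_tb:.0f} TB")   # matches the 256TB figure

# The front end holds roughly 25% of the data footprint, so this sizing
# targets a total footprint on the order of:
total_footprint_tb = front_end_tb / 0.25
print(f"approximate total footprint: {total_footprint_tb:.0f} TB")

print(f"aggregate cores: {modules * cores_per_module}, "
      f"RAM: {modules * ram_per_module_gb} GB")
```

This is also why the design scales linearly: each added module contributes the same fixed slice of compute, RAM, and front-end bandwidth.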
Figure 1 – AI architecture with WekaFS®
- Each server running WekaFS® services has a dedicated set of physical cores and a fixed RAM allocation.
  - FAQ: "Why allocate cores? Isn't that a waste?" Answer: a high-performance file system service needs CPU cycles regardless of whether those resources are dedicated. However, without dedicated cores, jobs and the file system services could end up competing for the same cores, resulting in unpredictable and sluggish file system performance.
- The 2x 3.2TB NVMe drives are exclusively for the Weka file system and contribute to the overall front-end storage capacity.
  - The running jobs have local access to the drives and benefit from very high performance and low latency.
  - The WekaFS® automatic tiering capability takes care of moving data to and from the NVMe drives when needed.
- There are two network ports, one of which is dedicated to WekaFS® services to prevent non-Weka traffic from causing bottlenecks and impacting performance.
- Many of the workloads require the same data to be accessed by both CPUs and GPUs. Rather than having separate CPU and GPU servers and moving the data around, we added an NVIDIA® T4 GPU to each CPU server.
- With the data already on the Weka front-end storage, some GPU-based workloads now run three times faster than when moving data between separate CPU and GPU servers.
- The NVIDIA® T4 is an inference card, but it is also a very efficient choice for smaller GPU workloads. Its form factor and low power draw, combined with Weka, make it a perfect match for our modular design.
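As a generic illustration of the dedicated-core point above: Linux lets a process restrict itself to a specific core set, so storage services and jobs never contend for the same cores. Note that Weka dedicates cores through its own configuration, not this mechanism, and the 4-core reservation below is a hypothetical split for a 40-core module:

```python
import os

# Hypothetical split for a 40-core module: a few cores reserved for
# storage services, the rest left to jobs.
TOTAL_CORES = 40
FS_CORES = set(range(0, 4))               # reserved for file-system services
JOB_CORES = set(range(4, TOTAL_CORES))    # everything else runs jobs

def pin_current_process(cores):
    """Restrict the calling process to the given cores (Linux only)."""
    if hasattr(os, "sched_setaffinity"):
        available = os.sched_getaffinity(0)
        # Fall back to the full available set if none of the requested
        # cores exist on this machine.
        os.sched_setaffinity(0, (cores & available) or available)

pin_current_process(FS_CORES)  # e.g. a storage service pins itself at startup
print(f"storage services: cores {sorted(FS_CORES)}")
print(f"jobs: {len(JOB_CORES)} cores ({min(JOB_CORES)}-{max(JOB_CORES)})")
```

With a split like this, a noisy job cannot steal cycles from the file system services, which is what keeps file system latency predictable.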
Figure 2 represents the physical layout: on the left, a rack of HPE DL360 Gen10 servers, and on the right, a rack of HPE Apollo 4200 Gen10 servers. The compute portion shows the CPUs augmented with the NVIDIA® T4 GPUs. Meeting customer demand is a matter of adding the proper quantity of compute and storage racks.
Figure 2 – Physical View.
Left: HPE DL360 Gen10
Right: HPE Apollo 4200 Gen10
Weka is hardware-agnostic, so there is no lock-in to specific hardware, which makes it easy to upgrade drives or integrate newer drives as they become available.
The data is accessible through a single namespace and a POSIX file system interface.
With tiered storage, we provided the equivalent IO performance and capacity of a much larger system but with a smaller footprint.
A decoupled storage front-end and back-end enable the customer to grow and adjust the system’s performance.
The modular design makes horizontal scaling easy: it is a matter of adding one or more servers (HPE DL360, HPE Apollo 4200, or HPE Apollo 6500) where needed.
The snapshot capabilities simplify data movement between on-premises and the public cloud, making snapshots a crucial feature of the hybrid architecture and critical for disaster recovery and archiving.
The architecture includes an NVIDIA® T4 GPU in each compute node (HPE DL360) for inference or training workloads. The front-end storage resides on the same server and delivers the highest performance with the lowest latency. Similarly, the eight NVIDIA® V100 GPUs in each HPE Apollo 6500 handle the bulk of the GPU workflows.
Software updates and patches are applied while the system remains online; avoiding downtime is a must-have for many customers.
HPE provided all the hardware and software, so there is a single organization to deal with for support, avoiding the finger-pointing that can occur between multiple vendors when something goes wrong.
Hardware & Software
| Hardware | Configuration |
|---|---|
| HPE DL360 Gen10 | 2x sockets, 2x 10GbE, 2x NVMe 3.2TB, 1x NVIDIA® T4 |
| HPE Apollo 4200 Gen10 | 2x sockets, 2x 10GbE, 24x 4TB SAS |
| HPE Apollo 6500 Gen10 | 2x sockets, 2x 10GbE, 2x NVMe 3.2TB, 8x NVIDIA® V100 |
| HPE SN2100M | 100GbE switch |

| Software (HPE Partners) |
|---|
| SUSE Enterprise Linux |
| Weka.io – WekaFS® |
HighFens, Inc. provides consultancy and services for end-to-end scalable HPC/AI solutions. Our customers look for innovative solutions that are reliable, scalable, and cost-effective. The HighFens team has more than 20 years of experience in HPC, Big Data, and AI, focusing on highly scalable solutions and data management. Feel free to reach out if you are looking for help with a project or have any questions. For more information, please visit our website and our blog page for the latest updates.