
Is the CPU out of a job? NVIDIA announces direct GPU-to-SSD access technology

Traditionally, reading data relies on the CPU to handle extensive memory and storage management: virtual address translation, page-based on-demand loading, and so on. The graphics card, despite being one of a computer's core components, cannot read data from an SSD directly. With the rise of artificial intelligence and cloud computing, however, letting the GPU fetch data straight from SSD hardware is the more efficient approach.


To let GPU applications read data directly, NVIDIA and IBM have partnered with several universities on a new architecture called "Big Accelerator Memory" (BaM), which provides fast, fine-grained access to large amounts of storage. The technology expands effective GPU memory capacity, raises storage access bandwidth, and gives GPU threads a high-level abstraction layer for convenient, on-demand, fine-grained access to massive data structures in an extended memory hierarchy.


For ordinary users, BaM offers two major advantages. The first is a software-managed GPU cache: data storage and transfers between the SSD and the graphics card are handled entirely by GPU threads. Using RDMA, the PCI Express interface, and a custom Linux kernel driver, BaM allows the GPU to read and write SSD data directly.

The second is opening NVMe SSD command submission to the GPU: a GPU thread only falls back to a driver command when the requested data is not already in the software-managed cache. Algorithms that run heavy workloads on the GPU can therefore access important data efficiently by optimizing their access routines for the specific data they need.
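The cache-miss path described above can be sketched in a few lines. This is a host-side simulation only; the names (`SoftwareCache`, `nvme_read`, the block size) are illustrative assumptions, not NVIDIA's actual BaM API:

```python
# Hedged sketch of BaM-style on-demand access: a reading thread first checks
# a software-managed cache, and only on a miss does it issue an NVMe read.
# All names here are illustrative, not NVIDIA's actual API.

BLOCK_SIZE = 4096
commands_issued = 0

def nvme_read(lba):
    """Stand-in for placing one read command on an NVMe submission queue."""
    global commands_issued
    commands_issued += 1
    return bytes([lba % 256]) * BLOCK_SIZE

class SoftwareCache:
    """Cache lines managed by the reading threads themselves,
    with no CPU page tables or address translation involved."""
    def __init__(self):
        self.lines = {}

    def read_byte(self, addr):
        lba = addr // BLOCK_SIZE
        if lba not in self.lines:              # miss: issue an NVMe read on demand
            self.lines[lba] = nvme_read(lba)
        return self.lines[lba][addr % BLOCK_SIZE]  # hit: no driver involvement

cache = SoftwareCache()
cache.read_byte(0)       # miss -> first NVMe command
cache.read_byte(100)     # same block -> cache hit, no new command
cache.read_byte(5000)    # next block -> second NVMe command
print("NVMe commands issued:", commands_issued)  # -> 2
```

The point of the design is visible even in this toy: an access pattern with locality touches the driver far less often than it touches data, so the per-access software overhead stays low.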

In a CPU-centric design, fine-grained access patterns are dragged down by data transfers between the CPU and GPU and by amplified I/O traffic. In the BaM model, the researchers instead provide a user-level library in GPU memory built on highly concurrent NVMe submission/completion queues, so that GPU threads which miss in the software cache can still access storage efficiently and at high throughput.

What's more, the BaM scheme incurs extremely low software overhead per storage access and supports highly concurrent threads. In experiments on a Linux prototype platform combining the BaM design with a standard GPU and NVMe SSDs, BaM delivered impressive results.

As an alternative to having the CPU manage every transaction, BaM's research shows that storage accesses can proceed in parallel, free of synchronization bottlenecks, significantly improving I/O bandwidth efficiency and, in turn, application performance. Bill Dally, chief scientist at NVIDIA, pointed out that thanks to software caching, BaM does not rely on virtual memory address translation and is therefore inherently immune to serialization events such as TLB misses.


Editor's comments: With the adoption of Resizable BAR and SAM, the bandwidth bottleneck between GPU and CPU has been greatly alleviated, but letting the GPU fetch data directly from the SSD should be more efficient still than fetching it through the CPU. Although it is not yet clear how BaM will be applied in the consumer sector, related products will likely arrive before long.
