Technical practice | How to do RAID for NVMe

author:Woqu Technology

1. Traditional RAID

A Redundant Array of Independent Disks (RAID) combines many independent hard disks into one large logical disk group.

1.1 Array card

Nowadays, the disks in almost all servers are managed by a RAID card. After a hard disk is inserted into a server slot, it is connected through the backplane cable to the RAID card, which manages all disks centrally and presents them to the operating system. The operating system cannot use the hard disks directly.

Here are some common RAID levels:

  • RAID 0

    Data is split into blocks of equal size that are written to different hard disks, so multiple disks can serve I/O operations concurrently and in parallel. RAID 0 can use nearly the full performance of all member disks, but it has no redundancy mechanism: the more disks, the better the performance, but also the higher the probability that a single failure makes the volume group unavailable. It is best used where the upper-layer application provides its own redundancy.

  • RAID 1

    When the upper layer writes data, identical copies are written to two hard disks to achieve redundancy, and reads can be served from both disks to improve read performance. When one disk fails, the system automatically reads from the other, which improves reliability but at a higher cost. RAID 1 suits applications that require high data reliability, for example placing the system disk on RAID 1.

  • RAID 5

    Data and the corresponding parity information are stored on different hard disks. The parity occupies the equivalent of one disk's capacity, and the array tolerates the failure of one disk. When a disk fails, the lost data can be recalculated from the remaining data and the parity, but performance drops sharply while data is being recovered. RAID 5 has low write performance, especially random write performance, so it is recommended for scenarios where reads and writes are infrequent, such as data backup.

  • RAID 6

    Similar to RAID 5, but with two independent sets of parity information, so it tolerates the failure of two hard disks.
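
The parity recovery that RAID 5 relies on can be illustrated with plain XOR arithmetic. The sketch below assumes a toy three-disk stripe in which each block is a single byte; the values and the `xor_blocks` helper are illustrative only, and a real array works on much larger chunks.

```shell
#!/bin/sh
# Toy RAID 5 stripe: two data blocks plus one XOR parity block.
# The byte values are arbitrary illustrations, not real disk contents.

xor_blocks() {
    # XOR two blocks, each represented here as a single integer byte.
    echo $(( $1 ^ $2 ))
}

d0=165                          # data block on disk 0 (0xA5)
d1=60                           # data block on disk 1 (0x3C)
p=$(xor_blocks "$d0" "$d1")     # parity block for this stripe, on disk 2

# Simulate losing disk 1: XOR the surviving data block with the parity.
recovered=$(xor_blocks "$d0" "$p")

echo "parity=$p recovered_d1=$recovered"   # prints: parity=153 recovered_d1=60
```

Rebuilding a block needs a read from every surviving disk in the stripe, which is one reason performance suffers while an array is degraded.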

1.2 Caching Policy

In addition to assembling hard disks into different RAID levels, the RAID card provides a cache that accelerates reads and writes.

The size of the cache varies from card to card.

  • Read Policy

    1. Read Ahead: in read-ahead mode, the blocks that follow the current read are also prefetched into the cache, so subsequent reads can be answered quickly. When the same data is read a second time it does not have to be fetched from the hard disk again, and keeping frequently read hot data in the cache can greatly improve read performance.

    2. No Read Ahead: reads are served without caching.

  • Write Policy

    1. Write Back (WB): data from the upper layer is first written to the RAID card cache and later flushed from the cache to the hard disk. From the upper-layer application's point of view, the write has succeeded as soon as the data reaches the cache, so WB mode improves write performance.

    2. Write Through (WT): data is written directly to the hard disk.

WB mode requires that the RAID card have a battery, so that the data in the cache can still be written to the hard disk if the server loses power. When the battery fails or its charge is insufficient, WB mode is forcibly switched to WT mode to prevent data loss. The read policy is not affected: data written to the cache is dirty, that is, not yet consistent with the hard disk, whereas data read into the cache is identical to what is on the disk, so losing it does no harm.

WB mode improves write performance only as long as the amount of data sent by the upper layer is small enough to fit in the RAID card cache. When the upper layer sends a large amount of data, the cache fills up quickly. We can verify this with an experiment: set the write policy of a hard disk to WB mode, then use the fio tool to simulate data delivery from the upper layer.

The result shows that under a large volume of upper-layer data the cache fills up quickly and write performance falls off a cliff.
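
A test of this kind could be run with a fio invocation along the following lines. This is a sketch, not the command used for the article: the device path `/dev/sdX`, the block size, queue depth, and runtime are all assumed values. Because writing to a raw device destroys its data, the script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of a sustained sequential write test against a RAID volume.
# All parameter values are assumptions chosen for illustration.

run() {
    # Print the command; execute it only when DRY_RUN=0 is set explicitly.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

sustained_write_test() {
    dev=$1    # target block device (hypothetical path)
    run fio --name=wb-cache-test --filename="$dev" \
        --rw=write --bs=1M --ioengine=libaio --direct=1 \
        --iodepth=32 --time_based --runtime=120
}

sustained_write_test /dev/sdX
```

Running long enough to write far more data than the cache can hold is the point of the test: the bandwidth fio reports while running should start high and then drop once the cache is full.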

1.3 Performance Comparison

The sections above described the characteristics, advantages, and disadvantages of each RAID level. Below we test them with the fio tool to get an intuitive sense of how the different RAID levels perform.

A single-disk RAID 0 delivers essentially the full performance of that disk. Using it as the baseline, the test data shows:

  1. RAID 1 uses two disks. Reads reach the performance of two disks, while write performance is about the same as a single disk: different data can be read from the two disks at the same time, but every write must put the same data on both disks.
  2. RAID 5 uses three disks. Read performance is close to that of three disks, but write performance is low, and random write performance is especially poor.

2. VROC

Performance requirements for hard disks keep rising. SATA and SAS disks are limited by their interfaces (a 6 Gb/s link tops out at roughly 600 MB/s), so demand for NVMe disks is increasing, especially in databases and other fields that place high performance requirements on the underlying hardware.

An NVMe disk attaches directly to the CPU over PCIe and cannot be managed by a traditional RAID card, so RAID levels cannot be created through the RAID card. At present there are two ways to do RAID for NVMe: soft RAID and Intel VROC.

  • Soft RAID

    Soft RAID involves no dedicated hardware: the disk array is emulated entirely in software, and the RAID logic is computed on CPU cores.

  • Intel VROC

    Intel VROC is a hybrid RAID solution. It resembles hardware RAID in that it relies on a silicon feature, the Intel Volume Management Device (Intel VMD), delivered in the new Intel Xeon Scalable processors. Intel Virtual RAID on CPU (VROC) uses Intel VMD to aggregate NVMe SSDs into bootable RAID. It also has attributes of software RAID: for example, it uses CPU cores to compute the RAID logic. Because it combines software with a silicon feature, Intel VROC is called a hybrid RAID solution.

2.1 How VROC is used

At present, essentially all Intel Xeon Scalable processors support VMD, but CPU support alone is not enough to use VROC. The following conditions must also be met:

1. VMD is a relatively new technology and requires the server to boot in UEFI mode.

2. A VROC Key is required. It is a licensed product sold through OEMs or ODMs with a support service level agreement. The Intel VROC hardware key is the mechanism for licensing the Intel VROC software, and OEMs/ODMs build VROC-capable servers and workstations by adding a key header to their motherboards. Each VROC Key costs around $100.

VROC Keys come in several models with different feature sets. Some models support NVMe disks from other brands, but for compatibility reasons using other-brand NVMe disks is still not recommended.

3. Server support is required. Not every server can take a VROC Key: the motherboard must have the corresponding header to plug the key into.

2.2 VROC and soft RAID

The way VROC is managed, and the way its volumes appear in the system, is very similar to soft RAID.

Both VROC and soft RAID are managed with the mdadm command. VROC requires installing the matching version of the mdadm rpm package.

rpm -U mdadm-4.0-13.1.IntelVROC6.0.el7.x86_64.rpm
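
To show how close the two workflows are, here is a sketch of creating a three-disk RAID 0 both ways with mdadm. It is not taken from the article: the device names, the array names (`/dev/md0`, `/dev/md/imsm0`, `/dev/md/vol0`), and the `run` helper are assumptions for illustration. The script only prints the commands unless `DRY_RUN=0` is set, because creating an array overwrites the member disks.

```shell
#!/bin/sh
# Sketch: a three-disk RAID 0 as plain soft RAID and as a VROC volume.
# Device and array names are hypothetical.

run() {
    # Print the command; execute it only when DRY_RUN=0 is set explicitly.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

create_soft_raid0() {
    # Native md metadata: a single mdadm call builds the array.
    run mdadm --create /dev/md0 --level=0 --raid-devices=3 "$@"
}

create_vroc_raid0() {
    # VROC uses Intel's IMSM metadata: first create an IMSM container
    # over the disks, then create the RAID volume inside that container.
    run mdadm --create /dev/md/imsm0 --metadata=imsm --raid-devices=3 "$@"
    run mdadm --create /dev/md/vol0 --level=0 --raid-devices=3 /dev/md/imsm0
}

create_soft_raid0 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1
create_vroc_raid0 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1
```

The visible difference is the extra container step and the `imsm` metadata format. After creation, both kinds of array can be inspected through `/proc/mdstat`.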

VROC is so similar to soft RAID; is there any difference in performance? We can find out experimentally.

  • RAID 0

To make the effect more obvious, three disks were used to create the RAID 0. The results show no significant performance gap between VROC and soft RAID.

  • RAID 1

VROC and soft RAID perform essentially the same in RAID 1.

  • RAID 5

VROC and soft RAID also perform essentially the same in RAID 5, but write performance is low for both. Watching CPU usage shows that in both cases the RAID work runs as a single-core process and that core stays fully utilized, so CPU performance is also a factor limiting RAID 5 write performance.

These tests show that VROC and soft RAID perform essentially the same, because both use CPU cores to compute the RAID logic. VROC, however, has hardware support and offers some additional features.

2.3 Other VROC features

1. Bootable RAID

A VROC volume can be used as the system disk. Because VROC has hardware support, you can create a RAID from NVMe disks in the BIOS before installing the operating system, and then install the system on that RAID volume.

2. Hot spare

VROC supports hot spares, which are used to rebuild the RAID when a device failure is detected.
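
As a rough sketch, a spare disk might be registered with mdadm as shown below. This assumes the VROC array lives in an IMSM container named `/dev/md/imsm0` and that `/dev/nvme3n1` is an unused disk; both names are hypothetical, and the exact procedure should be checked against the VROC documentation for your platform. The command is only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: adding a hot spare to a VROC (IMSM) container.
# /dev/md/imsm0 and /dev/nvme3n1 are hypothetical names.

run() {
    # Print the command; execute it only when DRY_RUN=0 is set explicitly.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

add_hot_spare() {
    container=$1    # IMSM container device
    spare=$2        # unused NVMe disk to register as a spare
    run mdadm --add "$container" "$spare"
}

add_hot_spare /dev/md/imsm0 /dev/nvme3n1
```

Once a member disk fails and a rebuild onto the spare starts, its progress can be followed in `/proc/mdstat`.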

3. Locating NVMe disks

There has long been no good way to physically locate an NVMe disk. With VROC, a specific NVMe disk can be pinpointed accurately.

ledctl locate=/dev/nvme0n1           

The command above makes the locate LED of the specified NVMe disk blink.
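
Once the disk has been found, the LED usually needs to be switched off again. The sketch below pairs the locate command with ledctl's `locate_off` pattern; the device path and the `run` helper are assumptions, and the commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: turning the locate LED of an NVMe disk on and off with ledctl.
# /dev/nvme0n1 is an example device path.

run() {
    # Print the command; execute it only when DRY_RUN=0 is set explicitly.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

locate_on()  { run ledctl locate="$1"; }
locate_off() { run ledctl locate_off="$1"; }

locate_on /dev/nvme0n1
locate_off /dev/nvme0n1
```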

3. Summary

In terms of performance, VROC is essentially the same as soft RAID. The advantage of soft RAID is that it has no additional cost. The advantage of VROC is that it is backed by hardware, offers extra features (bootable RAID, NVMe disk location, and so on), and comes with Intel's technical support. Each has its strengths, so choose the technology that fits your actual needs.