
SPDK: Acceleration with NVMe Disks

The SPDK (Storage Performance Development Kit) is a set of open source tools and libraries designed to help developers build high-performance, scalable storage applications. In this article we focus on the NVMe driver in SPDK, which runs in Linux user space, and walk through the Hello World example application on an Intel platform.



In our experiments, the server is based on the Intel C610 chipset (C1 stepping, QPI system bus at 9.6 GT/s) and has two sockets, each with a 12-core Intel Xeon E5-2697 processor (2.7 GHz, 24 logical cores with Hyper-Threading). The memory configuration is 8x8 GB (Samsung M393B1G73BH0, DDR3-1866). The system has an Intel SSD DC P3700 Series solid-state drive. The operating system is CentOS 7.2.1511 (kernel 3.10.0).

Why do we need an NVMe driver that runs in Linux user space?


Historically, disk drives have been orders of magnitude slower than other components of a computer system, such as memory and the processor. For this reason, the operating system and processor interact with disks through interrupts. For example, such an interaction might look like this:
  1. A request is made to the OS to read data from the disk.
  2. The driver processes this request and communicates with the hardware.
  3. The disk platter spins up.
  4. The read/write head moves to the required part of the platter and gets ready to read the data.
  5. Data is read from the disk and written to a buffer.
  6. An interrupt is generated, which notifies the processor that the data is ready for use in the system.
  7. Finally, data is read from the buffer.

The interrupt model adds overhead to the system. However, this overhead used to be much smaller than the latency of conventional hard drives. As a result, it received little attention, since it could not noticeably reduce the efficiency of the storage subsystem.

Nowadays, SSDs and next-generation technologies such as 3D XPoint are much faster than traditional HDDs. As a result, the bottleneck in storage subsystems, which used to be the hardware, has shifted to software. As the figure below shows, the latency added by interrupts and the operating system is now very significant compared with the response time of the drive itself.


SSDs and storage systems based on 3D XPoint technology are much faster than traditional HDDs. As a result, software has become the bottleneck in storage subsystems

The NVMe driver that runs in Linux user space solves the “interrupt problem”. Instead of waiting for an interrupt to signal that an operation has completed, it polls the storage device while data is being read or written. In addition, and this is very important, the driver runs entirely in user space. This means that applications can talk to the NVMe device directly, bypassing the Linux kernel. One advantage of this approach is that it avoids system calls, which require context switches and therefore add overhead. The driver also uses a lock-free design, so no CPU cycles are spent synchronizing data between threads. The same design allows I/O commands to be executed in parallel.
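The polling idea can be illustrated with a small, self-contained sketch. It is not SPDK code: the structure names, the queue depth, and the `device_post` helper that stands in for the hardware are all assumptions made for illustration. What it borrows from NVMe is the phase tag: the device flips a one-bit tag each time it wraps around the completion queue, so the host can tell fresh entries from stale ones just by reading memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CQ_DEPTH 4 /* illustrative queue depth, not a real device value */

/* One completion entry: which command finished, plus the phase tag. */
struct cq_entry {
    uint16_t command_id;
    uint8_t  phase; /* flipped by the device on every pass through the ring */
};

/* Host-side view of a completion queue that lives in shared memory. */
struct completion_queue {
    struct cq_entry entries[CQ_DEPTH];
    uint32_t head;           /* next slot the host will inspect */
    uint8_t  expected_phase; /* phase value that marks a fresh entry */
};

static void cq_init(struct completion_queue *cq)
{
    memset(cq, 0, sizeof(*cq));
    cq->expected_phase = 1; /* entries start at 0, so the first pass writes 1 */
}

/* Hypothetical stand-in for the device: post a completion into a slot. */
static void device_post(struct completion_queue *cq, uint32_t slot,
                        uint16_t command_id, uint8_t phase)
{
    assert(slot < CQ_DEPTH);
    cq->entries[slot].command_id = command_id;
    cq->entries[slot].phase = phase;
}

/*
 * Poll once: consume every entry whose phase matches, without sleeping
 * and without any interrupt. Returns the number of completions found and
 * stores their command ids in out[] (at most max of them).
 */
static int cq_poll(struct completion_queue *cq, uint16_t *out, int max)
{
    int found = 0;

    while (found < max &&
           cq->entries[cq->head].phase == cq->expected_phase) {
        out[found++] = cq->entries[cq->head].command_id;
        cq->head++;
        if (cq->head == CQ_DEPTH) { /* wrapped: the next pass uses the other phase */
            cq->head = 0;
            cq->expected_phase ^= 1;
        }
    }
    return found;
}
```

A caller would invoke `cq_poll` in a loop from the same thread that submitted the commands; because that thread owns the queue, no lock is needed. A real driver would also need memory barriers when reading device-written memory, which this sketch leaves out.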

Comparing the SPDK user-space NVMe driver with the Linux kernel path shows that the latency attributable to software overhead is about 10 times lower with the SPDK driver.


Latency, in nanoseconds, when working with drives through the Linux kernel and through SPDK

Using a single processor core, SPDK can serve 8 NVMe solid-state drives, delivering more than 3.5 million IOPS.


I/O performance with different numbers of SSDs when using the Linux kernel and SPDK
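To put the 3.5 million IOPS figure in perspective, it helps to convert it into a time budget per request. The helper below is only back-of-the-envelope arithmetic, and `ns_per_io` is a name invented for this sketch. At 3.5 million IOPS on one core, the result is roughly 286 ns of CPU time per I/O on average, which leaves little room for interrupts and context switches.

```c
#include <assert.h>

/* Average time available per I/O, in nanoseconds, at a given IOPS rate
 * when a single core handles every request. */
static double ns_per_io(double iops)
{
    assert(iops > 0.0);
    return 1e9 / iops;
}
```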

Prerequisites and Building SPDK


SPDK supports operating systems such as Fedora, CentOS, Ubuntu, Debian, and FreeBSD. A complete list of the packages required for SPDK can be found here.

Before building SPDK, you need to install DPDK (Data Plane Development Kit), since SPDK relies on the memory management and queue handling facilities that DPDK already provides. DPDK is a mature library that is commonly used for network packet processing. It is well optimized for memory management and fast data queuing.

The SPDK source code can be cloned from the GitHub repository with the following command:

git clone https://github.com/spdk/spdk.git 

▍Building DPDK (for Linux)


 cd /path/to/build/spdk
 wget http://fast.dpdk.org/rel/dpdk-16.07.tar.xz
 tar xf dpdk-16.07.tar.xz
 cd dpdk-16.07 && make install T=x86_64-native-linuxapp-gcc DESTDIR=.

▍Building SPDK (for Linux)


Once DPDK has been built inside the SPDK folder, go back to the SPDK directory and build SPDK, passing the DPDK path to make.

 cd /path/to/build/spdk
 make DPDK_DIR=./dpdk-16.07/x86_64-native-linuxapp-gcc

▍Setting up the system before running an SPDK application


The following command enables large memory pages (hugepages) and unbinds any NVMe and I/OAT devices from the kernel drivers.

 sudo scripts/setup.sh 

Large pages matter for performance: they are 2 MB in size, much larger than the standard 4 KB pages. Because each page covers more memory, misses in the translation lookaside buffer (TLB) become less likely. The TLB is a component inside the processor that caches translations of virtual addresses into physical memory addresses. Thus, working with large pages makes more efficient use of the TLB.
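The effect of page size on the TLB can be made concrete with a little arithmetic. The sketch below is illustrative only (`pages_needed` and the two constants are names made up here). It counts how many pages, and therefore how many distinct address translations, are needed to cover a buffer. For a 1 GiB buffer that is 262,144 translations with 4 KB pages but only 512 with 2 MB pages.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_4K (4ULL * 1024)
#define PAGE_2M (2ULL * 1024 * 1024)

/* Number of pages (and thus address translations) needed to map `bytes`. */
static uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size; /* round up to whole pages */
}
```

Fewer translations for the same amount of memory means that a larger share of them fits in the TLB at once.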

Hello World sample application


SPDK includes many examples, and good documentation is available here. Together they let you get started quickly. We will look at an example in which the phrase “Hello World” is first written to an NVMe device and then read back into a buffer.

Before getting into the code, it is worth discussing how NVMe devices are structured and how the NVMe driver uses this information to find devices, write data, and then read it back.

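An NVMe device (also called an NVMe controller) is structured as follows: a system can have one or more NVMe devices, each device exposes one or more namespaces, and each namespace consists of a number of logical blocks, each identified by a logical block address (LBA).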
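The hierarchy of controllers, namespaces, and logical blocks can be modeled with two small structures. These are hypothetical types written for this illustration and are not SPDK's own data structures. The block size and block count are the kind of per-namespace properties the driver reports.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model types for illustration; these are not SPDK structures. */
struct model_namespace {
    uint32_t id;         /* namespace IDs start at 1 */
    uint32_t block_size; /* bytes per logical block, e.g. 512 or 4096 */
    uint64_t num_blocks; /* number of LBAs in the namespace */
};

struct model_controller {
    const char *name;
    uint32_t num_ns;                    /* how many namespaces it exposes */
    struct model_namespace *namespaces; /* array of num_ns entries */
};

/* Total capacity of a namespace in bytes. */
static uint64_t ns_size_bytes(const struct model_namespace *ns)
{
    return (uint64_t)ns->block_size * ns->num_blocks;
}

/* Byte offset inside the namespace at which a given logical block starts. */
static uint64_t lba_to_byte_offset(const struct model_namespace *ns, uint64_t lba)
{
    assert(lba < ns->num_blocks);
    return lba * ns->block_size;
}
```

With 512-byte blocks, for example, LBA 8 starts at byte offset 4096. Read and write commands address data in whole blocks, by starting LBA and block count.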


Now let's walk through the example step by step.

▍Setup


  1. Initialize the environment abstraction layer (EAL). In the code below, -c is a hexadecimal bitmask that selects the cores on which the code will run (0x1 selects core 0 only), -n is the number of memory channels, and --proc-type sets the DPDK process type (primary, secondary, or auto).

     static char *ealargs[] = {
         "hello_world",
         "-c 0x1",
         "-n 4",
         "--proc-type=auto",
     };

     rte_eal_init(sizeof(ealargs) / sizeof(ealargs[0]), ealargs);

  2. Create a request buffer pool, which SPDK uses internally to store the data for each I/O request.

     request_mempool = rte_mempool_create("nvme_request", 8192,
                                          spdk_nvme_request_size(), 128, 0,
                                          NULL, NULL, NULL, NULL,
                                          SOCKET_ID_ANY, 0);

  3. Probe the system for NVMe devices.

     rc = spdk_nvme_probe(NULL, probe_cb, attach_cb, NULL); 

  4. For each NVMe device found, the probe callback returns a boolean value that tells SPDK whether to attach to the device.

     static bool
     probe_cb(void *cb_ctx, struct spdk_pci_device *dev,
              struct spdk_nvme_ctrlr_opts *opts)
     {
         printf("Attaching to %04x:%02x:%02x.%02x\n",
                spdk_pci_device_get_domain(dev),
                spdk_pci_device_get_bus(dev),
                spdk_pci_device_get_dev(dev),
                spdk_pci_device_get_func(dev));

         return true;
     }

  5. Once the device is attached, you can query the number of namespaces it has.

     static void
     attach_cb(void *cb_ctx, struct spdk_pci_device *dev,
               struct spdk_nvme_ctrlr *ctrlr,
               const struct spdk_nvme_ctrlr_opts *opts)
     {
         int nsid, num_ns;
         const struct spdk_nvme_ctrlr_data *cdata = spdk_nvme_ctrlr_get_data(ctrlr);

         printf("Attached to %04x:%02x:%02x.%02x\n",
                spdk_pci_device_get_domain(dev),
                spdk_pci_device_get_bus(dev),
                spdk_pci_device_get_dev(dev),
                spdk_pci_device_get_func(dev));

         /* entry is the example's per-controller record; its declaration and
            allocation are not shown in this excerpt. */
         snprintf(entry->name, sizeof(entry->name), "%-20.20s (%-20.20s)",
                  cdata->mn, cdata->sn);

         num_ns = spdk_nvme_ctrlr_get_num_ns(ctrlr);
         printf("Using controller %s with %d namespaces.\n", entry->name, num_ns);

         /* Namespace IDs start at 1. */
         for (nsid = 1; nsid <= num_ns; nsid++) {
             register_ns(ctrlr, spdk_nvme_ctrlr_get_ns(ctrlr, nsid));
         }
     }

  6. Walk through the namespaces to obtain information about them, such as their size.

     static void
     register_ns(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_ns *ns)
     {
         printf("  Namespace ID: %d size: %juGB\n",
                spdk_nvme_ns_get_id(ns),
                spdk_nvme_ns_get_size(ns) / 1000000000);
     }

  7. Create an I/O queue pair for submitting read/write requests to the namespace.

     ns_entry->qpair = spdk_nvme_ctrlr_alloc_io_qpair(ns_entry->ctrlr, 0); 
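The probe and attach callbacks in steps 3 to 6 follow a common enumeration pattern: the library walks the devices, asks the application whether it wants each one, and hands over only the accepted ones. The sketch below reproduces that control flow with a simulated device list so that it runs without any hardware. Every name in it (`fake_device`, `fake_probe`, and the two example callbacks) is hypothetical and not part of the SPDK API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device found on the bus; not an SPDK type. */
struct fake_device {
    const char *address; /* e.g. a PCI address string */
    int num_namespaces;
};

typedef bool (*probe_fn)(void *cb_ctx, const struct fake_device *dev);
typedef void (*attach_fn)(void *cb_ctx, const struct fake_device *dev);

/*
 * Walk every device, ask the application whether it wants it (probe),
 * and hand over only the accepted ones (attach).
 * Returns the number of devices that were attached.
 */
static int fake_probe(const struct fake_device *devs, size_t count, void *cb_ctx,
                      probe_fn probe_cb, attach_fn attach_cb)
{
    int attached = 0;

    assert(probe_cb != NULL && attach_cb != NULL);
    for (size_t i = 0; i < count; i++) {
        if (probe_cb(cb_ctx, &devs[i])) {
            attach_cb(cb_ctx, &devs[i]);
            attached++;
        }
    }
    return attached;
}

/* Example probe callback: accept only devices that expose a namespace. */
static bool accept_with_namespaces(void *cb_ctx, const struct fake_device *dev)
{
    (void)cb_ctx;
    return dev->num_namespaces > 0;
}

/* Example attach callback: add up the namespaces of all attached devices. */
static void count_namespaces(void *cb_ctx, const struct fake_device *dev)
{
    int *total = cb_ctx;
    *total += dev->num_namespaces;
}
```

The Hello World example accepts every device by returning true from its probe callback; the filter here only shows where such a decision would go. The `cb_ctx` pointer is how the application passes its own state into both callbacks without global variables.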

▍Reading and writing data


  1. Allocate a buffer for the data to be read/written.

     sequence.buf = rte_zmalloc(NULL, 0x1000, 0x1000); 

  2. Copy the string “Hello World” into the buffer.

     sprintf(sequence.buf, "Hello world!\n"); 

  3. Send a write request to the specified namespace, providing the queue pair, a pointer to the buffer, the starting LBA, the number of LBAs, a callback function that runs once the data has been written, and a pointer to the data to pass to that callback.

     rc = spdk_nvme_ns_cmd_write(ns_entry->ns, ns_entry->qpair, sequence.buf,
                                 0, /* LBA start */
                                 1, /* number of LBAs */
                                 write_complete, &sequence, 0);

  4. Once the write has completed, the callback function is called. This happens during the completion polling described in step 7, not at submission time.

  5. Send a read request to the specified namespace, providing the same set of arguments that was used for the write request.

     rc = spdk_nvme_ns_cmd_read(ns_entry->ns, ns_entry->qpair, sequence->buf,
                                0, /* LBA start */
                                1, /* number of LBAs */
                                read_complete, (void *)sequence, 0);

  6. Once the read has completed, its callback function is called, again from within the completion polling described in step 7.

  7. Check the flag that indicates that both the write and the read have finished. While a request is still in flight, poll the queue pair for completions. Although the actual read and write operations are performed asynchronously, the spdk_nvme_qpair_process_completions function checks for progress, returns the number of completed I/O requests, and calls the callback functions that signal the completion of the write and read described above.

     while (!sequence.is_completed) {
         spdk_nvme_qpair_process_completions(ns_entry->qpair, 0);
     }

  8. Free the queue pair and other resources before exiting.

     spdk_nvme_ctrlr_free_io_qpair(ns_entry->qpair); 
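The submit, callback, and poll flow of the steps above can be tried without an NVMe device using a small simulation. This is a sketch under stated assumptions, not SPDK code: the `sim_` names, the in-memory "media" array, and the block size are invented for illustration, and issuing the read from inside the write callback is an assumption of this sketch. It shows that submitting a request only queues it, and that the callbacks run on the caller's thread when the caller polls for completions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE  512 /* illustrative values, not taken from a real device */
#define NUM_BLOCKS  8
#define QUEUE_DEPTH 4

typedef void (*io_cb)(void *cb_arg);
enum io_op { IO_WRITE, IO_READ };

struct io_request {
    enum io_op op;
    void *buf;
    size_t lba;
    io_cb cb;
    void *cb_arg;
};

/* Hypothetical in-memory "namespace" together with its pending requests.
 * It must be zero-initialized before first use. */
struct sim_qpair {
    unsigned char media[NUM_BLOCKS][BLOCK_SIZE];
    struct io_request pending[QUEUE_DEPTH];
    size_t count;
};

/* Queue a request and return at once; no data is moved yet. */
static int sim_submit(struct sim_qpair *q, enum io_op op, void *buf,
                      size_t lba, io_cb cb, void *cb_arg)
{
    assert(cb != NULL);
    if (q->count == QUEUE_DEPTH || lba >= NUM_BLOCKS) {
        return -1;
    }
    q->pending[q->count++] = (struct io_request){ op, buf, lba, cb, cb_arg };
    return 0;
}

/* Finish queued requests and run their callbacks on the caller's thread.
 * Returns the number of requests completed by this call. */
static int sim_process_completions(struct sim_qpair *q)
{
    struct io_request batch[QUEUE_DEPTH];
    size_t n = q->count;

    /* Snapshot first, so that a callback may submit a follow-up request. */
    memcpy(batch, q->pending, n * sizeof(batch[0]));
    q->count = 0;
    for (size_t i = 0; i < n; i++) {
        if (batch[i].op == IO_WRITE) {
            memcpy(q->media[batch[i].lba], batch[i].buf, BLOCK_SIZE);
        } else {
            memcpy(batch[i].buf, q->media[batch[i].lba], BLOCK_SIZE);
        }
        batch[i].cb(batch[i].cb_arg);
    }
    return (int)n;
}

/* Plays the role of the `sequence` structure in the steps above. */
struct sim_sequence {
    struct sim_qpair *qpair;
    unsigned char wbuf[BLOCK_SIZE];
    unsigned char rbuf[BLOCK_SIZE];
    bool is_completed;
};

static void read_done(void *arg)
{
    ((struct sim_sequence *)arg)->is_completed = true;
}

/* Assumption of this sketch: the read is issued from the write callback. */
static void write_done(void *arg)
{
    struct sim_sequence *seq = arg;
    sim_submit(seq->qpair, IO_READ, seq->rbuf, 0, read_done, seq);
}

/* Write the greeting, read it back, and poll until both have finished.
 * Returns the number of polling calls that were needed, or -1 on error. */
static int run_hello(struct sim_qpair *q, struct sim_sequence *seq)
{
    int polls = 0;

    memset(seq, 0, sizeof(*seq));
    seq->qpair = q;
    strcpy((char *)seq->wbuf, "Hello world!\n");
    if (sim_submit(q, IO_WRITE, seq->wbuf, 0, write_done, seq) != 0) {
        return -1;
    }
    while (!seq->is_completed) {
        sim_process_completions(q);
        polls++;
    }
    return polls;
}
```

Calling `run_hello` with a zero-initialized `struct sim_qpair` leaves the greeting in `rbuf` after two polling passes: the first completes the write and queues the read, the second completes the read and sets the flag. The polling loop has the same shape as the one in step 7.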

Here is the complete code for the example walked through above, posted on GitHub. The documentation for the SPDK NVMe API is available on the spdk.io website.

After launching our “Hello World” the following should be displayed:


The results of the “Hello World” example

Other examples included in SPDK


The SPDK includes many examples that are designed to help programmers quickly understand how the SPDK works and begin developing their own projects.

Here, for example, are the results of the perf example, which tests the performance of an NVMe drive.


The perf example testing NVMe drive performance

Developers who need information about an NVMe drive, such as its capabilities, admin command set attributes, NVMe command set attributes, power management data, and device health information, can use the identify example.


The identify example displaying NVMe drive information

Conclusions


We talked about how to use SPDK and its driver, which runs in Linux user space, to work with NVMe drives. This approach minimizes the latency added by kernel mechanisms for accessing storage devices, which speeds up data transfer between the drive and the system.

If you are interested in developing high-performance storage applications with SPDK, you can subscribe to the SPDK newsletter here. And here are some useful videos.

Source: https://habr.com/ru/post/315906/

