
How do I use io_uring to build responsive I/O-intensive applications?

This article is shared with HUAWEI CLOUD Community "How to Build Responsive I/O-Intensive Applications with io_uring - Cloud Community-HUAWEI CLOUD", written by Lion Long.

When it comes to building responsive, I/O-intensive applications, io_uring shows outstanding potential. This article takes an in-depth look at how to take full advantage of io_uring's features to optimize application performance. Through asynchronous I/O operations and efficient event handling, io_uring gives developers a powerful tool to significantly reduce I/O latency and achieve higher throughput.

1. Synchronous and asynchronous

Synchronous and asynchronous describe the relationship between a caller and the operation it requests; the two terms only make sense relative to each other.

Synchronization: a synchronous call does not return until the result is available. Similar to the concept of serial execution.

Asynchronous: the concept of async is the opposite of synchronous. An asynchronous call returns immediately without waiting for the result; the actual result is produced later by another thread or process. Similar to the concept of parallel execution.


2. io_uring system calls

io_uring support starts with the Linux 5.1 kernel, but it did not gain solid support until Linux 5.10, so when programming with io_uring it is best to use Linux 5.10 or later. To upgrade the Linux kernel, you can refer to the author's guide: Easily upgrade your Ubuntu Linux kernel version_ubuntu Upgrade the kernel_Lion Long's blog - CSDN blog.

The kernel provides three interfaces. Function prototypes:

#include <linux/io_uring.h>

int io_uring_setup(u32 entries, struct io_uring_params *p);

int io_uring_register(unsigned int fd, unsigned int opcode, void *arg, unsigned int nr_args);

int io_uring_enter(unsigned int fd, unsigned int to_submit,
                   unsigned int min_complete, unsigned int flags, sigset_t *sig);

2.1、io_uring_setup

Function prototype:

#include <linux/io_uring.h>

int io_uring_setup(u32 entries,struct io_uring_params *params);           

A system call that sets up a submission queue (SQ) and a completion queue (CQ) with at least entries entries, and returns a file descriptor that can be used to perform subsequent operations on the io_uring instance. The SQ and CQ are shared between user space and the kernel, which eliminates the cost of copying data when initiating and completing I/O.

Parameter meaning:

entries: the number of queue entries.
params: options used to configure the io_uring instance; the kernel also uses params to pass information about the ring buffers back to the application.

On success, returns a new file descriptor. The application can then pass this file descriptor to subsequent mmap(2) calls to map the submission and completion queues, or pass it to the io_uring_register()/io_uring_enter() system calls.

When an error occurs, a negative error code is returned. The caller should not rely on the errno variable.

Error code meaning:

EFAULT: a parameter is outside your accessible address space.
EINVAL: the resv array contains non-zero data, p.flags contains unsupported flags, or entries is out of bounds.
EMFILE: the per-process limit on the number of open file descriptors has been reached.
ENFILE: the system-wide limit on the total number of open files has been reached.
ENOMEM: insufficient kernel resources are available.
EPERM: IORING_SETUP_SQPOLL was specified, but the caller's effective user ID lacks sufficient privileges.

2.2、io_uring_register

Function prototype:

#include <linux/io_uring.h>

int io_uring_register(unsigned int fd,unsigned int opcode,void *arg,unsigned int nr_args);           

io_uring_register() registers files or user buffers for asynchronous I/O, allowing the kernel to hold long-term references to the files' internal data structures, or to create long-term mappings of application memory. Registration is performed only once, rather than per I/O request, which greatly reduces per-I/O overhead.

Parameter meaning:

fd: the file descriptor returned by io_uring_setup().
opcode: the operation code.

Returns 0 on success.

When an error occurs, a negative error code is returned. The caller should not rely on the errno variable.

2.3、io_uring_enter

#include <linux/io_uring.h>

int io_uring_enter(unsigned int fd,unsigned int to_submit,

unsigned int min_complete,unsigned int flags,sigset_t *sig);           

This system call initiates and completes I/O using the shared SQ and CQ.

A single call can do both at once: submit new I/O requests and wait for I/O completions.

Parameter:

  • fd: the file descriptor returned by io_uring_setup().
  • to_submit: the number of I/Os to submit from the submission queue.

On success, returns the number of I/Os consumed. This can be zero if to_submit is zero or the submission queue is empty. Note that if IORING_SETUP_SQPOLL was specified when the ring was created, the return value usually equals to_submit, because submission happens outside the context of the system call.

Errors related to individual submission queue entries are returned through completion queue entries, not through the system call itself.

Errors not tied to a specific submission queue entry are returned directly by the system call. When such an error occurs, a negative error code is returned; the caller should not rely on the errno variable.

More information can be found in man io_uring_enter.

2.4. Implementation flow

(Figure: io_uring implementation flow diagram.)

3. io_uring related structures

3.1 struct io_uring_params structure

struct io_uring_params {
    __u32 sq_entries;
    __u32 cq_entries;
    __u32 flags;
    __u32 sq_thread_cpu;
    __u32 sq_thread_idle;
    __u32 features;
    __u32 wq_fd;
    __u32 resv[3];
    struct io_sqring_offsets sq_off;
    struct io_cqring_offsets cq_off;
};

The flags, sq_thread_cpu, and sq_thread_idle fields configure the io_uring instance. flags is a bitmask in which zero or more of the following values are OR'ed together:

Flag meaning:

IORING_SETUP_IOPOLL: busy-wait for I/O completion instead of being notified via an asynchronous IRQ (interrupt request).
IORING_SETUP_SQPOLL: create a kernel thread that polls the submission queue.
IORING_SETUP_SQ_AFF: bind the polling thread to the CPU set in the sq_thread_cpu field of struct io_uring_params.
IORING_SETUP_CQSIZE: create the completion queue with the number of entries given in the cq_entries field of struct io_uring_params.
IORING_SETUP_CLAMP: if entries exceeds IORING_MAX_ENTRIES, clamp it to IORING_MAX_ENTRIES; if IORING_SETUP_CQSIZE is also set and cq_entries in struct io_uring_params exceeds IORING_MAX_CQ_ENTRIES, clamp it to IORING_MAX_CQ_ENTRIES.
IORING_SETUP_ATTACH_WQ: share the asynchronous worker backend of an existing io_uring instance; the wq_fd field of struct io_uring_params must be set to that ring's file descriptor.
IORING_SETUP_R_DISABLED: start the io_uring ring in a disabled state.

3.2 struct io_cqring_offsets structure

struct io_cqring_offsets {
    __u32 head;
    __u32 tail;
    __u32 ring_mask;
    __u32 ring_entries;
    __u32 overflow;
    __u32 cqes;
    __u32 flags;
    __u32 resv[3];
};

3.3 struct io_uring_sqe structure

struct io_uring_sqe {
    __u8  opcode;    /* type of operation for this sqe */
    __u8  flags;     /* IOSQE_ flags */
    __u16 ioprio;    /* ioprio for the request */
    __s32 fd;        /* file descriptor to do IO on */
    union {
        __u64 off;   /* offset into file */
        __u64 addr2;
    };
    union {
        __u64 addr;  /* pointer to buffer or iovecs */
        __u64 splice_off_in;
    };
    __u32 len;       /* buffer size or number of iovecs */
    union {
        __kernel_rwf_t rw_flags;
        __u32 fsync_flags;
        __u16 poll_events;    /* compatibility */
        __u32 poll32_events;  /* word-reversed for BE */
        __u32 sync_range_flags;
        __u32 msg_flags;
        __u32 timeout_flags;
        __u32 accept_flags;
        __u32 cancel_flags;
        __u32 open_flags;
        __u32 statx_flags;
        __u32 fadvise_advice;
        __u32 splice_flags;
        __u32 rename_flags;
        __u32 unlink_flags;
        __u32 hardlink_flags;
    };
    __u64 user_data; /* data to be passed back at completion time */
    union {
        struct {
            union {
                /* index into fixed buffers, if used */
                __u16 buf_index;
                /* for grouped buffer selection */
                __u16 buf_group;
            };
            /* personality to use, if used */
            __u16 personality;
            union {
                __s32 splice_fd_in;
                __u32 file_index;
            };
        };
        __u64 __pad2[3];
    };
};

3.4 struct io_uring_cqe structure

struct io_uring_cqe {
    __u64 user_data; /* sqe->user_data passed back from submission */
    __s32 res;       /* result code for this event */
    __u32 flags;
};

4. liburing library installation

(1) Download the source code.

git clone https://github.com/axboe/liburing.git           

(2) Enter liburing.

cd liburing           

(3) Configuration.

./configure           

(4) Compile and install.

make && sudo make install           

(5) To compile an application, be sure to link the library and define _GNU_SOURCE: -luring -D_GNU_SOURCE.

gcc -o io_uring_test io_uring_test.c -luring -D_GNU_SOURCE           

5. Interface provided by Liburing

5.1、io_uring_queue_init / io_uring_queue_init_params

Function prototype:

#include <liburing.h>

int io_uring_queue_init(unsigned entries,
                        struct io_uring *ring,
                        unsigned flags);

int io_uring_queue_init_params(unsigned entries,
                               struct io_uring *ring,
                               struct io_uring_params *params);

The io_uring_queue_init() function executes the io_uring_setup() system call to initialize the submission and completion queues in the kernel, with at least entries entries in the submission queue, and then maps the resulting file descriptor into memory shared between the application and the kernel.

By default, the CQ ring has twice as many entries as the SQ ring. This is sufficient for regular file or storage workloads, but may be too small for network workloads. The SQ ring size does not limit the number of in-flight requests the ring can support; it only limits how many can be submitted to the kernel in one batch.

If the CQ ring overflows, that is, more entries are generated than the ring can hold before the application reaps them, the ring enters a CQ-overflow state. This is indicated by the IORING_SQ_CQ_OVERFLOW bit in the SQ ring flags. Entries are not dropped unless the kernel runs out of available memory, but overflow takes a much slower completion path and slows down request processing. It should therefore be avoided by sizing the CQ ring appropriately for the workload.

Setting cq_entries in struct io_uring_params tells the kernel to allocate that many entries for the CQ ring, independent of the SQ ring size. If the value is not a power of two, it is rounded up to the nearest power of two.

On success, io_uring_queue_init() returns 0 and ring points to the shared memory containing the io_uring queues. On failure, it returns -errno. flags is passed through to the io_uring_setup() system call.

If io_uring_queue_init_params() is used, the parameters indicated by params are passed directly to the io_uring_setup() system call. On success it returns 0, and the resources held by the ring should later be released by a matching call to io_uring_queue_exit(). On failure, it returns -errno.

5.2、io_uring_get_sqe

Function prototype:

#include <liburing.h>

struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring);           
  • Gets the next available submission queue entry from the submission queue belonging to the ring parameter.
  • Returns a pointer to the submission queue entry on success.
  • Returns NULL on failure.

If a submission queue entry is returned, it should be filled in via one of the preparation functions such as io_uring_prep_read() and submitted via io_uring_submit().

If NULL is returned, the SQ ring is currently full, and entries must be submitted for processing before new ones can be allocated.

5.3、io_uring_prep_accept

Function prototype:

#include <sys/socket.h>

#include <liburing.h>

void io_uring_prep_accept(struct io_uring_sqe *sqe,

int sockfd,

struct sockaddr *addr,

socklen_t *addrlen,

int flags);

void io_uring_prep_accept_direct(struct io_uring_sqe *sqe,

int sockfd,

struct sockaddr *addr,

socklen_t *addrlen,

int flags,

unsigned int file_index);

void io_uring_prep_multishot_accept(struct io_uring_sqe *sqe,

int sockfd,

struct sockaddr *addr,

socklen_t *addrlen,

int flags);

void io_uring_prep_multishot_accept_direct(struct io_uring_sqe *sqe,

int sockfd,

struct sockaddr *addr,

socklen_t *addrlen,

int flags);           

These functions prepare an asynchronous accept(2) request.

The io_uring_prep_accept() function prepares an accept request. The submission queue entry sqe is set up to accept a connection on the socket file descriptor sockfd, storing the peer's socket address at addr and its length at addrlen, using the modifier flags in flags.

Note: for io_uring_prep_accept(), as for any request that passes in data via a structure, that data must remain valid until the request has been successfully submitted. It does not need to remain valid until completion; once a request has been submitted, the kernel's state is stable.

5.4、io_uring_prep_recv

Function prototype:

#include <liburing.h>

void io_uring_prep_recv(struct io_uring_sqe *sqe,

int sockfd,

void *buf,

size_t len,

int flags);

void io_uring_prep_recv_multishot(struct io_uring_sqe *sqe,

int sockfd,

void *buf,

size_t len,

int flags);           

Description:

The io_uring_prep_recv() function prepares a recv request. The submission queue entry sqe is set up to receive data on the socket file descriptor sockfd into the destination buffer buf of size len, using the modifier flags in flags.

This function is used to prepare an asynchronous recv() request.

The multishot variant lets an application issue a single receive request that repeatedly posts a CQE whenever data becomes available.

5.5、io_uring_prep_send

Function prototype:

#include <liburing.h>

void io_uring_prep_send(struct io_uring_sqe *sqe,

int sockfd,

const void *buf,

size_t len,

int flags);           

Description:

The io_uring_prep_send() function prepares a send request. The submission queue entry sqe is set up to send len bytes from the buffer buf over the socket file descriptor sockfd, using the modifier flags in flags.

This function is used to prepare an asynchronous send() request.

5.6、io_uring_submit (Important)

Function prototype:

#include <liburing.h>

int io_uring_submit(struct io_uring *ring);           

Description:

  • Submits the pending events in the submission queue belonging to the ring to the kernel.
  • After the caller retrieves a submission queue entry (SQE) with io_uring_get_sqe() and prepares it with one of the provided helpers, it can be submitted with io_uring_submit().

Return value:

  • Returns the number of submission queue entries submitted on success.
  • Returns -errno on failure.

5.7、io_uring_submit_and_wait (Important)

Function prototype:

#include <liburing.h>

int io_uring_submit_and_wait(struct io_uring *ring,unsigned wait_nr);           

Description:

  • Submits the pending events in the submission queue belonging to the ring and waits for wait_nr completion events.
  • After the caller retrieves a submission queue entry (SQE) with io_uring_get_sqe() and prepares it, it can be submitted with io_uring_submit_and_wait().

Return value:

  • Returns the number of submission queue entries submitted on success.
  • Returns -errno on failure.

5.8、io_uring_wait_cqe

Function prototype:

#include <liburing.h>

int io_uring_wait_cqe(struct io_uring *ring,

struct io_uring_cqe **cqe_ptr);           

Description:

  • Waits for an I/O completion from the queue belonging to the ring parameter, blocking if necessary. If a completion event is already available in the ring when the call is made, no wait occurs. On success, the cqe_ptr parameter is filled in.
  • After the caller has submitted a request with io_uring_submit(), the application can retrieve the completion with io_uring_wait_cqe().

Return value:

  • On success, returns 0 and fills in the cqe_ptr parameter. Returns -errno on failure. The return value only indicates whether waiting for a CQE succeeded; it says nothing about the result carried inside the CQE.

5.9、io_uring_peek_batch_cqe

#include <liburing.h>

int io_uring_peek_cqe(struct io_uring *ring,
                      struct io_uring_cqe **cqe_ptr);

unsigned io_uring_peek_batch_cqe(struct io_uring *ring,
                                 struct io_uring_cqe **cqes,
                                 unsigned count);

Description:

  • The io_uring_peek_cqe() function returns an I/O completion, if one is available, from the queue belonging to the ring parameter. On successful return, the cqe_ptr parameter is filled with a valid CQE entry.
  • This function does not enter the kernel to wait for an event; it only returns an event that is already available in the CQ ring.
  • io_uring_peek_batch_cqe() is a batched wrapper around io_uring_peek_cqe(): it fetches at most count completions from the ring at a time.

Return value:

  • On success, io_uring_peek_cqe() returns 0 and fills in the cqe_ptr parameter; on failure it returns -EAGAIN.
  • io_uring_peek_batch_cqe() returns the number of I/O completions retrieved and fills in the cqes array; it returns 0 if none are available.

5.10、io_uring_cq_advance

#include <liburing.h>

void io_uring_cq_advance(struct io_uring *ring,

unsigned nr);           

Description:

The io_uring_cq_advance() function marks nr I/O completions belonging to the ring parameter as consumed.

After the caller has submitted requests with io_uring_submit(), the application can retrieve completions with io_uring_wait_cqe(), io_uring_peek_cqe(), or any other CQE retrieval helper, and mark them consumed with io_uring_cqe_seen().

The io_uring_cqe_seen() function is implemented by calling io_uring_cq_advance().

Completions must be marked as seen so that their slots can be reused. Failing to do so will cause the same completion to be returned on the next call.

5.11、More interfaces

Except for bind(), which has no asynchronous interface, essentially every operation does, for example io_uring_prep_connect(), io_uring_prep_close(), and so on.

6. TCP server implementation based on liburing

At the application layer, io_uring is mainly used through the liburing library, which provides a rich user interface; under the hood it invokes the three io_uring kernel system calls.

Sample code for liburing-based TCP server implementation:

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
#include <liburing.h>

#define ENTRIES_LENGTH 4096
#define RING_CQE_NUMBER 10
#define BUFFER_SIZE 1024

struct conninfo
{
    int connfd;
    int type;
};

enum
{
    READ,
    WRITE,
    ACCEPT,
};

void set_accept_event(struct io_uring *ring, int fd,
                      struct sockaddr *clientaddr, socklen_t *len, unsigned flags)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring); /* get the next free SQE */
    io_uring_prep_accept(sqe, fd, clientaddr, len, flags);
    struct conninfo ci = {
        .connfd = fd,
        .type = ACCEPT
    };
    memcpy(&sqe->user_data, &ci, sizeof(struct conninfo));
}

void set_read_event(struct io_uring *ring, int fd, void *buf, size_t len, int flags)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring); /* get the next free SQE */
    io_uring_prep_recv(sqe, fd, buf, len, flags);
    struct conninfo ci = {
        .connfd = fd,
        .type = READ
    };
    memcpy(&sqe->user_data, &ci, sizeof(struct conninfo));
}

void set_write_event(struct io_uring *ring, int fd, void *buf, size_t len, int flags)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring); /* get the next free SQE */
    io_uring_prep_send(sqe, fd, buf, len, flags);
    struct conninfo ci = {
        .connfd = fd,
        .type = WRITE
    };
    memcpy(&sqe->user_data, &ci, sizeof(struct conninfo));
}

int main()
{
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd == -1)
        return -1;

    struct sockaddr_in serveraddr, clientaddr;
    serveraddr.sin_family = AF_INET;
    serveraddr.sin_addr.s_addr = htonl(INADDR_ANY);
    serveraddr.sin_port = htons(9999);
    if (-1 == bind(listenfd, (struct sockaddr *)&serveraddr, sizeof(serveraddr)))
        return -2;
    listen(listenfd, 10);

    struct io_uring_params params;
    memset(&params, 0, sizeof(params));

    struct io_uring ring;
    io_uring_queue_init_params(ENTRIES_LENGTH, &ring, &params);

    socklen_t clientlen = sizeof(clientaddr);
    set_accept_event(&ring, listenfd, (struct sockaddr *)&clientaddr, &clientlen, 0);

    char buffer[BUFFER_SIZE] = { 0 };
    while (1)
    {
        struct io_uring_cqe *cqe;
        io_uring_submit(&ring);
        io_uring_wait_cqe(&ring, &cqe); /* block until at least one completion */

        struct io_uring_cqe *cqes[RING_CQE_NUMBER];
        int cqecount = io_uring_peek_batch_cqe(&ring, cqes, RING_CQE_NUMBER);
        unsigned count = 0;
        for (int i = 0; i < cqecount; i++)
        {
            count++;
            cqe = cqes[i];
            struct conninfo ci;
            memcpy(&ci, &cqe->user_data, sizeof(struct conninfo));
            if (ci.type == ACCEPT)
            {
                int connfd = cqe->res; /* res carries the new connection fd */
                set_read_event(&ring, connfd, buffer, BUFFER_SIZE, 0);
                /* re-arm the accept so further clients can connect */
                set_accept_event(&ring, listenfd, (struct sockaddr *)&clientaddr, &clientlen, 0);
            }
            else if (ci.type == READ)
            {
                int bufsize = cqe->res;
                if (bufsize == 0)
                {
                    close(ci.connfd); /* peer closed the connection */
                }
                else if (bufsize < 0)
                {
                    /* read error: ignored in this minimal example */
                }
                else
                {
                    printf("buff: %s\n", buffer);
                    set_write_event(&ring, ci.connfd, buffer, bufsize, 0);
                }
            }
            else if (ci.type == WRITE)
            {
                set_read_event(&ring, ci.connfd, buffer, BUFFER_SIZE, 0);
            }
        }
        io_uring_cq_advance(&ring, count); /* mark the batch as consumed */
    }
    return 0;
}

Summary

  1. io_uring consists mainly of three parts: the three system call interfaces provided by the kernel (io_uring_setup, io_uring_register, and io_uring_enter), the kernel-side implementation of those calls, and the liburing library used at the application layer.
  2. io_uring_submit() drives functions such as accept, recv, and send in the underlying protocol stack. In fio disk I/O tests, io_uring delivers the same IOPS as libaio, about twice that of psync.
  3. io_uring's asynchronous operations complete inside the kernel; user-mode callers of the API do not perceive the asynchrony.
  4. The original article concludes with a block diagram of the function calls in the liburing-based TCP server.
