[CSDN Editor's Note] Two years ago, C++20 was officially released, and with it developers finally got coroutines, which make code clean and easy to understand while keeping the high performance of asynchronous programming. However, many developers complain that the C++ coroutine standard is meant for library developers: it is very complex and not at all friendly to ordinary developers. In this article, Qi Yu, a senior C++ technical expert, starts from the stackless coroutine standard adopted in C++20 and uses concrete examples to share practical experience in applying coroutines.

Authors | Qi Yu, Xu Chuanqi, Han Yao    Editor | Tu Min

Published by | CSDN (ID: CSDNnews)

After years of brewing, arguing, and preparing, coroutines finally entered the C++20 standard.

After Google's "defeat", C++20 entered the coroutine era with Microsoft's proposal

The stackless coroutine proposed and led by Microsoft became the C++20 coroutine standard

Coroutines are not a new concept: they have been around for decades and exist in many other programming languages (Python, C#, Go).

Coroutines come in two kinds: stackless and stackful. A stackless coroutine is a function that can be suspended and resumed, while a stackful coroutine is essentially a user-mode thread. Switching a stackful coroutine costs about as much as a user-mode thread switch, whereas switching a stackless coroutine costs about as much as a function call.

The difference between a stackless coroutine and a thread: a stackless coroutine always runs on some thread and is not scheduled by the kernel (it gives up control only at its own suspension points), whereas threads are preemptively scheduled by the kernel.

C++20 adopted Microsoft's stackless coroutine design (derived from C#). Many people opposed this feature; the main complaints were that it is hard to understand, too flexible, and prone to performance problems caused by dynamic allocation. Google raised a series of objections to the proposal and tried to offer a stackful alternative. Stackful coroutines are much lighter than system-level threads, but considerably heavier than stackless coroutines.

Because the design philosophy of C++ is "zero-overhead abstraction", stackless coroutines eventually became the C++20 coroutine standard.

The two main themes in the evolution of today's C++ world are asynchrony and parallelism. C++20 coroutines let you write asynchronous code with synchronous syntax, which makes them an excellent tool for asynchronous programming, and coroutine-based asynchronous libraries are where the industry is heading. That is why it is worth mastering C++20 coroutines.

Let's use a simple example to show the "magic" of the coroutine.

async_resolve({host, port}, [this](auto endpoint) {
    async_connect(endpoint, [this](auto error_code) {
        async_handle_shake([this](auto error_code) {
            send_data_ = build_request();

            async_write(send_data_, [this](auto error_code) {
                async_read();
            });
        });
    });
});

void async_read() {
    async_read(response_, [this](auto error_code) {
        if (!finished) {
            append_response(recieve_data_);
            async_read();
        } else {
            std::cout << "finished ok\n";
        }
    });
}

Pseudocode for asynchronous clients based on callbacks

The callback-based client proceeds as follows:

Asynchronous domain name resolution
Asynchronous connection
Asynchronous SSL handshake
Send data asynchronously
Receive data asynchronously

This code is full of callbacks, and callbacks come with pitfalls: how to make sure a callback runs safely, how to make asynchronous reads call themselves recursively, and so on. Once asynchronous business logic is mixed in, the callbacks nest even deeper, and we can already see the shadow of callback hell! Some readers may find this level of nesting acceptable, but as the project grows, the business logic becomes more complex and the callbacks nest deeper and deeper, making the code hard to maintain.

Let's take a look at how to write this code in coroutines:

auto endpoint = co_await async_query({host, port});
auto error_code = co_await async_connect(endpoint);
error_code = co_await async_handle_shake();
send_data = build_request();
error_code = co_await async_write(send_data);
while (true) {
    co_await async_read(response);
    if (finished) {
        std::cout << "finished ok\n";
        break;
    }

    append_response(recieve_data_);
}

Asynchronous client based on C++20 coroutines

It is the same asynchronous client, but compared with the callback version, the code is clean, simple, and easy to understand while keeping the high performance of asynchronous I/O. That is the power of C++20 coroutines!

After seeing this example, you probably no longer want to write code with asynchronous callbacks. It is time to embrace coroutines!


Why did C++20 choose stackless coroutines?

Stackful coroutines are usually implemented by allocating a large block of memory (for example, 64 KB) on the heap in advance. This is the coroutine's so-called "stack", where parameters, return addresses, and so on are stored. When a coroutine switch is needed, a mechanism such as swapcontext makes the system treat this heap memory as an ordinary stack, which switches the context.
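To make the swapcontext mechanism concrete, here is a minimal sketch of a stackful coroutine built on the POSIX ucontext API (getcontext, makecontext, swapcontext). It assumes a Linux/glibc environment (ucontext is marked obsolescent in POSIX but still available there), and the stackful_demo namespace and its names are illustrative only. A 64 KB heap buffer serves as the coroutine's "stack", and each swapcontext call saves the current registers and jumps to the other context.

```cpp
#include <ucontext.h>
#include <vector>

// Illustrative stackful coroutine on POSIX ucontext (Linux/glibc assumed).
namespace stackful_demo {

ucontext_t main_ctx;       // saved context of the caller
ucontext_t coro_ctx;       // saved context of the coroutine
std::vector<int> events;   // records the order in which code runs

void coro_entry() {
    events.push_back(1);
    swapcontext(&coro_ctx, &main_ctx);  // "yield": save our registers, jump back to the caller
    events.push_back(3);
}   // returning here continues at coro_ctx.uc_link (the caller)

std::vector<int> run() {
    events.clear();
    std::vector<char> stack(64 * 1024);         // the coroutine's "stack", allocated on the heap
    getcontext(&coro_ctx);
    coro_ctx.uc_stack.ss_sp = stack.data();
    coro_ctx.uc_stack.ss_size = stack.size();
    coro_ctx.uc_link = &main_ctx;               // where to continue when coro_entry returns
    makecontext(&coro_ctx, coro_entry, 0);

    swapcontext(&main_ctx, &coro_ctx);          // start the coroutine; returns when it yields
    events.push_back(2);
    swapcontext(&main_ctx, &coro_ctx);          // resume it; returns when coro_entry finishes
    events.push_back(4);
    return events;                              // {1, 2, 3, 4}
}

}  // namespace stackful_demo
```

Every suspension here swaps a full register set and needs its own pre-allocated stack, which is exactly the cost and the stack-size trade-off discussed in the next paragraphs.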

The biggest advantage of stackful coroutines is that they are non-intrusive and very easy to use: existing business code needs almost no changes. Nevertheless, C++20 chose stackless coroutines, mainly for the following reasons.

Stack space limitations

The "stack" of a stackful coroutine is usually fairly small, so there is a risk of stack overflow; making it larger wastes a lot of memory. Stackless coroutines have no such limitation: there is no risk of overflow and no need to worry about memory utilization.

Performance

Stackful coroutines are indeed lighter than system threads when switching, but they are still heavier than stackless coroutines. This hardly matters in most current practical use (asynchronous systems usually involve I/O, which is several orders of magnitude more expensive than a switch), but it does mean that stackless coroutines can be used in some more interesting scenarios. For example, Gor Nishanov, author of the C++20 coroutines proposal, demonstrated at CppCon 2018 that stackless coroutines can switch in nanoseconds, and built on this property to reduce the cost of cache misses.

Stackless coroutines are generalizations of ordinary functions

A stackless coroutine is a function that can be paused and resumed, a generalization of function calls.

Why?

We know that a function body executes sequentially and returns its result to the caller when it finishes; we cannot suspend it and resume it later, only wait for it to end. A stackless coroutine, by contrast, lets us suspend a function and resume its body whenever we need to: compared with an ordinary function, a coroutine's body can be suspended and resumed at its suspension points (co_await and co_yield).

So, from this point of view, a stackless coroutine is a generalization of ordinary functions.


C++20 Coroutines: Few Words, Deep Meaning

C++20 adds three new keywords: co_await, co_yield, and co_return. A function that uses any of them is a coroutine.

The compiler generates a lot of code for a coroutine to implement coroutine semantics. What code is generated? How are coroutine semantics implemented? How is a coroutine created? How does co_await work? Before exploring these questions, let's look at some basic concepts of C++20 coroutines.

Coroutine-related objects

Coroutine frame

When a coroutine is called, a coroutine frame is created first, a promise object is constructed in it, and the return object is then produced through the promise object.

These are the main contents in coroutine frames:

Coroutine parameters
Local variables
Promise object

These are needed when the coroutine resumes. The caller accesses the coroutine frame through its handle, std::coroutine_handle.

promise_type

promise_type is the type of the promise object. It defines the behavior of a class of coroutines: how the coroutine is created, what happens when initialization completes and when the coroutine ends, how exceptions are handled, how awaiters are produced, how co_return behaves, and so on. The promise object can record and store the state of a coroutine instance. Each coroutine instance has exactly one coroutine frame and one promise object.

Coroutine return object

It is created by the promise.get_return_object() method. A common implementation stores the coroutine_handle inside the return object, so that the return object can access the coroutine.

std::coroutine_handle

The handle to a coroutine frame, mainly used to access the underlying frame, resume the coroutine, and destroy the frame. Programmers can resume a coroutine by calling std::coroutine_handle::resume().

co_await, awaiter, awaitable

co_await: a unary operator;
awaitable: a type that supports the co_await operator;
awaiter: a type that defines the await_ready, await_suspend, and await_resume methods.

co_await expr usually means waiting for a task (which may or may not be lazy) to complete. In co_await expr, expr must be an awaitable, and the exact semantics of the co_await expression depend on the awaiter obtained from that awaitable.

There seem to be many coroutine-related objects, and that is exactly where the complexity and flexibility of coroutines come from: with these objects you can fully control a coroutine and implement whatever behavior you want. However, you first need to understand how they work together. Once you figure that out, you will have mastered the principle of coroutines and can write coroutine applications with ease.

How coroutine objects collaborate

The following simple code shows how these coroutine objects work together:

Return_t foo() {
    auto res = co_await awaiter;
    co_return res;
}

Return_t: the coroutine return object, produced by the promise.

awaiter: waits for a task to complete.

Coroutine operation flowchart

In the diagram, the light blue parts are the functions of the promise object associated with Return_t, and the light red parts belong to the awaiter that co_await waits on.

This process is driven by the code the compiler generates from the coroutine function and has three parts:

creating the coroutine;
co_await awaiter, waiting for the task to complete;
getting the coroutine's return value and releasing the coroutine frame.

Creation of coroutines

Return_t foo() {
    auto res = co_await awaiter;
    co_return res;
}

For the foo coroutine, the compiler generates a template like the following (pseudocode); creating any coroutine produces similar code:

{
    co_await promise.initial_suspend();
    try
    {
        <coroutine body>
    }
    catch (...)
    {
        promise.unhandled_exception();
    }
FinalSuspend:
    co_await promise.final_suspend();
}

The coroutine is created first; whether it is suspended right after creation is determined by the return type of promise.initial_suspend().

The process for creating a coroutine is as follows:

Create the coroutine frame;
Copy the coroutine parameters into the coroutine frame;
Construct the promise object in the coroutine frame;
Call promise.get_return_object() and return the resulting object to the caller; this is the Return_t object in the code.

This template framework has several customization points, such as initial_suspend, final_suspend, unhandled_exception, and return_value.

We can control whether the coroutine is suspended through the return types of the promise's initial_suspend and final_suspend, handle exceptions in unhandled_exception, and save the coroutine's return value in return_value.

You can customize the objects returned by initial_suspend and final_suspend to decide whether the coroutine should be suspended. If it is suspended, control returns to the caller; otherwise execution continues (into the coroutine body after initial_suspend, or on to destruction of the frame after final_suspend).

Also note that if exceptions are disabled, the generated code contains no try-catch. In that case a coroutine is almost as efficient as the equivalent ordinary, non-coroutine function. This matters in embedded scenarios and was one of the design goals of C++ coroutines.

co_await mechanism

The co_await operator is a new keyword in C++20. co_await expr generally means waiting for a lazily evaluated task. The task may run in another thread or in the OS kernel, and we do not know when it will finish. For performance, we do not want to block while waiting, so co_await suspends the coroutine and returns to the caller, which can keep doing other work. When the task completes, the coroutine is resumed and co_await yields the task's result.

So co_await generally does the following:

suspends the coroutine;
returns to the caller;
after waiting for a task (which may or may not be lazy) to complete, returns the task's result.

The compiler generates code like this based on co_await expr:

{
    auto&& value = <expr>;
    auto&& awaitable = get_awaitable(promise, static_cast<decltype(value)>(value));
    auto&& awaiter = get_awaiter(static_cast<decltype(awaitable)>(awaitable));
    if (!awaiter.await_ready()) // does the coroutine need to be suspended?
    {
        using handle_t = std::coroutine_handle<P>; // P is the promise type

        using await_suspend_result_t =
            decltype(awaiter.await_suspend(handle_t::from_promise(promise)));

        <suspend-coroutine> // suspend the coroutine

        if constexpr (std::is_void_v<await_suspend_result_t>)
        {
            awaiter.await_suspend(handle_t::from_promise(promise)); // start the task asynchronously (or possibly synchronously)
            <return-to-caller-or-resumer> // return to the caller
        }
        else
        {
            // C++20 also allows await_suspend to return a std::coroutine_handle
            // (symmetric transfer); that case is omitted from this sketch.
            static_assert(
                std::is_same_v<await_suspend_result_t, bool>,
                "await_suspend must return 'void' or 'bool'.");

            if (awaiter.await_suspend(handle_t::from_promise(promise)))
            {
                <return-to-caller-or-resumer>
            }
        }

        <resume-point> // the task is done and the coroutine is resumed: execution continues here
    }

    return awaiter.await_resume(); // the task's result becomes the value of co_await expr
}

This code corresponds to the light red part of the coroutine operation flowchart. It shows that the return value of awaiter.await_ready() controls whether the coroutine is suspended or keeps running: returning false suspends the coroutine and calls awaiter.await_suspend(), whose return value then decides whether to return to the caller or continue execution.

It is this co_await mechanism that is the key to turning "asynchronous callbacks" into "synchronous-looking" code.

The two most important objects in C++20 coroutines are the promise object (resuming the coroutine and obtaining a task's result) and the awaiter (suspending the coroutine and waiting for the task to finish); the others are mere helpers. To build the coroutine you want, the key is designing how these two objects work together.

For more details on the co_await, readers can refer to this document (https://lewissbaker.github.io/2017/11/17/understanding-operator-co-await).

Few words, deep meaning

Let's go back to this simple coroutine:

Return_t foo() {
    auto res = co_await awaiter;
    co_return res;
}

The foo coroutine has only three lines of code, but the compiler expands it into more than a hundred lines: both the creation of the coroutine and the co_await mechanism are implemented by this generated code. That is the "few words, deep meaning" of C++20 coroutines.

Having covered the concepts and implementation principles of C++20 coroutines, let's look at a simple C++20 coroutine example to see how coroutines work in practice.

A simple C++20 coroutine example

The example is simple: it uses co_await to move the coroutine onto another thread and prints the thread ID.

#include <chrono>
#include <coroutine>
#include <iostream>
#include <thread>

namespace Coroutine {
struct task {
    struct promise_type {
        promise_type() {
            std::cout << "1.create promise object\n";
        }
        task get_return_object() {
            std::cout << "2.create coroutine return object, and the coroutine is created now\n";
            return {std::coroutine_handle<task::promise_type>::from_promise(*this)};
        }
        std::suspend_never initial_suspend() {
            std::cout << "3.do you want to suspend the current coroutine?\n";
            std::cout << "4.don't suspend because it returns std::suspend_never, so continue to execute coroutine body\n";
            return {};
        }
        std::suspend_never final_suspend() noexcept {
            std::cout << "13.coroutine body finished, do you want to suspend the current coroutine?\n";
            std::cout << "14.don't suspend because it returns std::suspend_never, and the coroutine will be automatically destroyed, bye\n";
            return {};
        }
        void return_void() {
            std::cout << "12.coroutine doesn't return a value, so return_void is called\n";
        }
        void unhandled_exception() {}
    };

    std::coroutine_handle<task::promise_type> handle_;
};

struct awaiter {
    bool await_ready() {
        std::cout << "6.do you want to suspend current coroutine?\n";
        std::cout << "7.yes, suspend because awaiter.await_ready returns false\n";
        return false;
    }
    void await_suspend(
        std::coroutine_handle<task::promise_type> handle) {
        std::cout << "8.execute awaiter.await_suspend\n";
        std::thread([handle] { handle.resume(); }).detach(); // resume the coroutine on a new thread
        std::cout << "9.a new thread launched, and will return back to caller\n";
    }
    void await_resume() {}
};

task test() {
    std::cout << "5.begin to execute coroutine body, the thread id=" << std::this_thread::get_id() << "\n";
    co_await awaiter{};
    std::cout << "11.coroutine resumed, continue executing coroutine body now, the thread id=" << std::this_thread::get_id() << "\n";
}
}// namespace Coroutine

int main() {
    Coroutine::test();
    std::cout << "10.come back to caller because of co_await awaiter\n";
    std::this_thread::sleep_for(std::chrono::seconds(1)); // crude wait for the detached thread

    return 0;
}

Test Output:

1.create promise object
2.create coroutine return object, and the coroutine is created now
3.do you want to suspend the current coroutine?
4.don't suspend because it returns std::suspend_never, so continue to execute coroutine body
5.begin to execute coroutine body, the thread id=0x10e1c1dc0
6.do you want to suspend current coroutine?
7.yes, suspend because awaiter.await_ready returns false
8.execute awaiter.await_suspend
9.a new thread launched, and will return back to caller
10.come back to caller because of co_await awaiter
11.coroutine resumed, continue executing coroutine body now, the thread id=0x700001dc7000
12.coroutine doesn't return a value, so return_void is called
13.coroutine body finished, do you want to suspend the current coroutine?
14.don't suspend because it returns std::suspend_never, and the coroutine will be automatically destroyed, bye

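This output clearly shows how the coroutine is created, how co_await waits for the thread, how the coroutine finishes after the thread resumes it, and how the coroutine is destroyed. Because the new thread can resume the coroutine while the caller is still printing, the lines printed by the caller (9 and 10) and by the new thread (11 to 14) may interleave differently from run to run.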

Coroutine creation

Lines 1, 2, and 3 of the output show how the coroutine is created: first the promise is created, then the task is returned through promise.get_return_object(), at which point the coroutine has been created.

Behavior after coroutine creation

Should the coroutine body run immediately after the coroutine is created, or should the coroutine suspend first? This is decided by promise.initial_suspend(). Here it returns the awaiter std::suspend_never, so the coroutine is not suspended and the coroutine body runs immediately.

co_await awaiter

When the coroutine body reaches co_await awaiter, does it need to wait for a task? await_ready() returns false, meaning it does, so awaiter.await_suspend() is called and the coroutine is suspended. await_suspend() creates a thread to run the task (note that the coroutine handle is passed into the thread, so the coroutine can later be resumed there) and then returns to the caller, which can do other things instead of blocking until the thread finishes. Note: this awaiter is also an awaitable, because it supports co_await.

More commonly, the coroutine is resumed after the task in the thread has completed. This tells the suspended coroutine that the task it was waiting for is done; the coroutine then resumes, obtains the task's result, and continues executing.

Coroutine resumption

Here the suspended coroutine is resumed as soon as the thread starts running. Execution then returns to the coroutine function and continues, which achieves the goal of the example: the coroutine's print statement runs in the new thread.

Coroutine destruction

promise.final_suspend() decides whether the coroutine is destroyed automatically: returning std::suspend_never destroys it automatically; otherwise the user has to destroy it manually by calling destroy() on its handle.

The "magic" of coroutines

Back to the coroutine function:

task test() {
    std::cout << std::this_thread::get_id() << "\n";
    co_await awaiter{};
    std::cout << std::this_thread::get_id() << "\n";
}

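The output shows that the code above and below co_await runs on different threads. With co_await as the dividing line, the code before co_await executes in one thread and the code after it executes in another: a single coroutine function spans two threads. This is the "magic" of coroutines. In essence, it happens because the coroutine is resumed in another thread, so once resumed, its code runs in that thread.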

Also, this example does not show how to wait for a coroutine to complete; main simply sleeps to wait for the thread. Implementing proper logic to wait for the coroutine to finish would double the amount of code.

I believe this simple example has given you a deeper understanding of how C++20 coroutines work. You may also sigh that coroutines really are only suitable for library authors, and that using C++20 coroutines directly is still hard for ordinary developers. That is where coroutine libraries come in: they greatly reduce the difficulty of using coroutines.

Why do you need a coroutine library?

As the previous sections show, C++20 coroutines are fairly complex: there are many concepts and details, the compiler generates a template framework with customization points, and you must understand how your code cooperates with that generated framework. This is hard for ordinary users to grasp, and the flexibility makes correct use harder still.

This also explains why Google complained that the proposal was hard to understand and too flexible. Still, by customizing just a few specific methods we can control the coroutine exactly as we want, which is genuinely flexible.

In short, C++20 coroutines as they stand are only suitable for library authors, because they provide only low-level coroutine primitives and suspend/resume mechanisms. Ordinary users who want to use coroutines have to rely on a coroutine library that hides these low-level details behind a simple, easy-to-use API. That is why we urgently need a simple, easy-to-use coroutine library built on C++20 coroutines.

It is in this context that the C++20 coroutine library async_simple (https://github.com/alibaba/async_simple) came into being!

Developed by Alibaba, this C++20 coroutine library is widely used in online systems such as graph computing engines, time-series databases, and search engines. It has been through two consecutive Tmall Double Eleven shopping festivals, handling traffic peaks on the order of 100 million requests, with strong performance and reliable stability.

async_simple is now open source on GitHub. With it, you no longer have to worry about the complexity of C++20 coroutines; as the name suggests, it makes asynchrony simple.

Next, we'll look at how to use async_simple to simplify asynchronous programming.

async_simple makes coroutines simple

async_simple provides a rich set of coroutine components and easy-to-use APIs, including:

Lazy: lazily evaluated stackless coroutines
Executor: coroutine executors
Batch operation APIs for coroutines: collectAll and collectAny
uthread: stackful coroutines

For more information and examples of async_simple, see the documentation on GitHub (https://github.com/alibaba/async_simple/tree/main/docs/docs.cn).

With these rich, commonly used coroutine components, writing asynchronous programs becomes easy. Let's reimplement the earlier thread-ID example with async_simple and see how much simpler the code becomes with a coroutine library.

#include <iostream>
#include <thread>

#include "async_simple/coro/Lazy.h"
#include "async_simple/executors/SimpleExecutor.h"

using async_simple::coro::Lazy;

Lazy<void> PrintThreadId() {
    std::cout << "thread id=" << std::this_thread::get_id() << "\n";
    co_return;
}

Lazy<void> TestPrintThreadId(async_simple::executors::SimpleExecutor &executor) {
    std::cout << "thread id=" << std::this_thread::get_id() << "\n";
    PrintThreadId().via(&executor).detach(); // run PrintThreadId on an executor thread
    co_return;
}

int main() {
    async_simple::executors::SimpleExecutor executor(/*thread_num=*/1);
    async_simple::coro::syncAwait(TestPrintThreadId(executor));
    return 0;
}

With async_simple, a coroutine can easily be dispatched to an executor thread. The code becomes clean, simple, and easy to understand, it is much shorter than before, and users no longer have to care about the many details of C++20 coroutines.

With the async_simple coroutine library, C++20 coroutines become like the swallows in the classical Chinese poem: birds that once nested only in the halls of the noble Wang and Xie families now "fly into the homes of ordinary people"!

async_simple provides many examples, such as using async_simple to build HTTP clients, HTTP servers, and SMTP clients. For more demos, see the async_simple demo examples (https://github.com/alibaba/async_simple/blob/main/demo_example).

Performance

We compared Lazy in async_simple with Task in folly and task in cppcoro, measuring stackless coroutine creation speed and switching speed. To be clear, this is a heavily trimmed test meant only as a simple demonstration of async_simple, not as a performance comparison. Moreover, folly's Task has more features; for example, it records context in AsyncStack on every switch to make programs easier to debug.

Test hardware

CPU: Intel® Xeon® Platinum 8163 CPU @ 2.50GHz

Test results

Unit: nanoseconds; lower is better.

The results show that async_simple performs relatively well, and it will continue to be optimized and improved.

Summary

C++20 coroutines are like a delicate "machine": complex, but very flexible. They let us customize some of their "parts", and through these parts we can control the "machine" as we wish and make it realize almost any idea.

It is precisely this complexity and flexibility that make C++20 coroutines hard to use. Fortunately, we can use async_simple, an industrial-grade, mature, and easy-to-use coroutine library, to simplify coroutines and make asynchrony simple!

Resources:

https://github.com/alibaba/async_simple
https://timsong-cpp.github.io/cppwp/n4868/
https://blog.panicsoftware.com/coroutines-introduction/
https://lewissbaker.github.io/
https://juejin.cn/post/6844903715099377672
https://wiki.tum.de/download/attachments/93291100/Kolb%20report%20-%20Coroutines%20in%20C%2B%2B20.pdf

About the authors: Qi Yu, founder of the modern C++ open-source community purecpp.org and author of "In-depth Application of C++11"

Xu Chuanqi, Alibaba development engineer, LLVM committer, and C++ Standards Committee member

Han Yao, Alibaba engineer, currently working on search and recommendation engines
