
In-depth analysis of Ant Financial RPC framework structure


Author: Old Money Code Hole

Ant Financial recently open-sourced a set of SOFA frameworks that it has developed over many years, including a core RPC framework called SOFA-BOLT. I spent nearly a day today carefully reading its source code. I ran into many questions along the way, and Ant Financial's engineers answered them patiently and promptly. Here I share what I learned from it.

SOFA-BOLT is built on the open source Netty framework and provides both server and client implementations. Its source code is well worth reading: the structure is simple and well thought out, and it is by no means a toy. It does not abuse design patterns, so the code is straightforward to read, without too many convoluted structures.


A node can be both an RPC server and a client at the same time: as a client it needs other nodes to provide services, and as a server it provides services to other nodes. However, two nodes that each call the other do not form a reasonable structure. The two services are coupled, I need you and you need me, and it becomes a chicken-and-egg problem. A more reasonable structure has no cycle between the nodes.


The communication protocol is the language spoken between the client and the server. SOFA defines its own protocol, and its encoding and decoding are split into two layers. The first layer is the binary serialization of the message body object, done by default with the open source Hessian library. The second layer adds a series of wrapper fields to the serialized message body to form a complete message. These fields include the request ID, the length of the message body, the protocol version number, and a CRC32 checksum.

If you want to optimize network performance further, SOFA also supports Snappy compression, which adds a third layer on top of the existing two and can significantly reduce the amount of data sent over the network. Compression trades time for space: it improves network performance but also increases CPU load, so weigh the trade-off before enabling it.
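To make the second layer concrete, here is a minimal sketch of wrapping an already-serialized body with a version, a request ID, a body length, and a CRC32 checksum. The field order, field widths, and the `FrameSketch` name are assumptions made for illustration. This is not Bolt's actual wire format.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical frame layout, for illustration only (NOT Bolt's real format):
// [version:1][requestId:4][bodyLength:4][body:N][crc32:4]
final class FrameSketch {
    static final byte VERSION = 1;
    static final int HEADER_LENGTH = 1 + 4 + 4;
    static final int TRAILER_LENGTH = 4;

    // Wraps a serialized body (the output of the first layer) into a full frame.
    static byte[] encode(int requestId, byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LENGTH + body.length + TRAILER_LENGTH);
        buf.put(VERSION).putInt(requestId).putInt(body.length).put(body);
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 0, buf.position()); // checksum covers header and body
        buf.putInt((int) crc.getValue());
        return buf.array();
    }

    // Reads the request ID without validating the rest of the frame.
    static int requestId(byte[] frame) {
        return ByteBuffer.wrap(frame).getInt(1);
    }

    // Validates version, length, and checksum, then returns the serialized body.
    static byte[] decodeBody(byte[] frame) {
        if (frame.length < HEADER_LENGTH + TRAILER_LENGTH) {
            throw new IllegalArgumentException("frame too short");
        }
        ByteBuffer buf = ByteBuffer.wrap(frame);
        byte version = buf.get();
        if (version != VERSION) {
            throw new IllegalArgumentException("unsupported version " + version);
        }
        buf.getInt(); // request ID, skipped here
        int length = buf.getInt();
        if (length != frame.length - HEADER_LENGTH - TRAILER_LENGTH) {
            throw new IllegalArgumentException("length field does not match frame size");
        }
        byte[] body = new byte[length];
        buf.get(body);
        CRC32 crc = new CRC32();
        crc.update(frame, 0, frame.length - TRAILER_LENGTH);
        if ((int) crc.getValue() != buf.getInt()) {
            throw new IllegalArgumentException("CRC32 mismatch");
        }
        return body;
    }
}
```

The receiver checks the checksum before handing the body to the deserializer, so a corrupted frame is rejected before any object is built from it.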
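The time-for-space trade-off can be tried out with a small sketch. Snappy is not part of the JDK, so this example swaps in the JDK's built-in DEFLATE (`java.util.zip`) as a stand-in. The `CompressionSketch` class is hypothetical and only illustrates an optional compression layer applied to the serialized body.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Stand-in for an optional compression layer. Uses DEFLATE, not Snappy.
final class CompressionSketch {

    // Compresses the serialized body, favoring speed over ratio.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the original bytes, rejecting truncated or invalid input.
    static byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid data");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Highly repetitive payloads shrink a lot, while small or already-compressed payloads may not shrink at all, which is one reason to make this layer optional.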


Multiple connections are generally required between the client and the server, but not one for each request. Typically, the maximum number of connections is limited by maintaining a connection pool. The client communicates with the server over a limited number of connections.

When we use the Jedis client to talk to a Redis server, we also obtain connections from a connection pool. A Jedis connection must be used by one thread at a time because it is not thread-safe. Once a thread takes a connection from the pool, other threads cannot use it, and after the current thread finishes, it must return the connection to the connection pool so that other threads can use it.


Redis requests and responses on a connection are strictly sequential, one request then one response, so no unique ID is needed to associate a response with its request.

Bolt is different: its responses can arrive out of order, so each response must be matched to its request through the request's unique ID. Bolt's client is thread-safe and can have multiple requests in flight on a connection at the same time, and the connection object maintains a dictionary of the RPC requests currently being processed. When a client wants to initiate an RPC request, instead of taking exclusive ownership of a connection from the pool, it randomly picks a connection to send its request on, and other threads can use that same connection at the same time.
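The dictionary of in-flight requests can be sketched as follows. This is a minimal illustration of matching out-of-order responses by request ID. The `PendingRequests` class, its method names, and the use of `String` as the response type are assumptions, not Bolt's actual classes.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-connection table of requests that are waiting for a response.
final class PendingRequests {
    private final AtomicInteger nextId = new AtomicInteger();
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Called by any caller thread before it writes a request to the shared connection.
    int register(CompletableFuture<String> future) {
        int id = nextId.incrementAndGet();
        pending.put(id, future);
        return id;
    }

    // Called by the I/O thread when a response arrives, in whatever order.
    // Returns false if no request with that ID is waiting (unknown or already completed).
    boolean complete(int requestId, String response) {
        CompletableFuture<String> future = pending.remove(requestId);
        return future != null && future.complete(response);
    }

    // Number of requests still waiting for a response.
    int inFlight() {
        return pending.size();
    }
}
```

Because the lookup is by ID rather than by arrival order, many threads can share one connection without waiting for each other's responses.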

The client provides several load-balancing implementations. Alibaba uses RandomLoadBalancer by default. The available implementations are:

ConsistentHashLoadBalancer: consistent hashing, which keeps the mapping between clients and servers (who connects to whom) relatively stable.

LocalPreferenceLoadBalancer: prefers the local loopback address to speed up calls on the same machine.

RoundRobinLoadBalancer: picks servers in turn.

WeightedRoundRobinLoadBalancer: picks servers in turn, in proportion to their weights.

RandomLoadBalancer: weighted random selection, which is Alibaba's default.
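Weighted random selection, the default strategy named above, can be sketched in a few lines. This is a generic illustration of the technique, not the source of RandomLoadBalancer. The `WeightedRandomSketch` and `Provider` names are hypothetical.

```java
import java.util.List;
import java.util.Random;

// Generic weighted random selection, for illustration only.
final class WeightedRandomSketch {

    // Hypothetical provider descriptor: an address plus a non-negative weight.
    static final class Provider {
        final String address;
        final int weight;

        Provider(String address, int weight) {
            this.address = address;
            this.weight = weight;
        }
    }

    // Picks a provider with probability proportional to its weight.
    static Provider select(List<Provider> providers, Random random) {
        if (providers.isEmpty()) {
            throw new IllegalArgumentException("no providers available");
        }
        int total = 0;
        for (Provider p : providers) {
            total += p.weight;
        }
        if (total <= 0) {
            // All weights are zero: fall back to a uniform choice.
            return providers.get(random.nextInt(providers.size()));
        }
        int offset = random.nextInt(total); // uniform in [0, total)
        for (Provider p : providers) {
            offset -= p.weight;
            if (offset < 0) {
                return p;
            }
        }
        throw new IllegalStateException("unreachable when weights are non-negative");
    }
}
```

A provider with weight 3 is chosen about three times as often as one with weight 1, and a provider with weight 0 is never chosen as long as another provider has a positive weight.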

The server uses the traditional Netty multithreaded model: a dedicated acceptor thread accepts connections and hands them to the I/O threads, which read messages and decode them into request objects, and these are finally handed to the business thread pool for processing.


The client and the server exchange a periodic heartbeat to detect whether the connection is still alive, by default once every 30s. A TCP close notifies the peer with a FIN packet. If network problems prevent the peer from receiving the FIN, then even after one side closes the socket, the other side may still believe the connection is healthy. That is why heartbeat-based liveness detection is so common in applications with long-lived connections. If the client sends three consecutive heartbeats without receiving a reply from the server, it considers the connection closed. The server also checks liveness: if no message arrives on a client connection within 90s, it considers that connection disconnected as well. The server does not actively send heartbeat messages.

RPC typically means the client sends a request to the server and then receives the server's reply. Bolt's RPC is duplex: the server can also initiate requests to the client, and both directions share one TCP connection. TCP itself is full-duplex, so this is nothing surprising. Ant Financial does not elaborate, though, on which business scenarios require the server to initiate requests to the client.
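The two liveness rules above (three missed heartbeats on the client, 90s of silence on the server) can be sketched as simple state machines. The class and method names are hypothetical, and time is passed in explicitly so the logic stays easy to test. A real implementation would hook into the network layer's idle events instead.

```java
// Hypothetical liveness bookkeeping for the rules described in the text.
final class HeartbeatSketch {
    static final int MAX_MISSED_HEARTBEATS = 3;
    static final long SERVER_IDLE_MILLIS = 90_000L;

    // Client side: counts consecutive heartbeats that received no reply.
    static final class ClientSide {
        private int missed;

        // Call once per heartbeat round. Returns false when the connection should be closed.
        boolean onHeartbeatResult(boolean replyReceived) {
            missed = replyReceived ? 0 : missed + 1;
            return missed < MAX_MISSED_HEARTBEATS;
        }
    }

    // Server side: sends no heartbeats, only watches for inbound traffic.
    static final class ServerSide {
        private long lastReadMillis;

        ServerSide(long nowMillis) {
            this.lastReadMillis = nowMillis;
        }

        // Any inbound message, heartbeat or request, refreshes the timestamp.
        void onMessage(long nowMillis) {
            lastReadMillis = nowMillis;
        }

        // False once the connection has been silent for the idle limit.
        boolean isAlive(long nowMillis) {
            return nowMillis - lastReadMillis < SERVER_IDLE_MILLIS;
        }
    }
}
```

With a 30s heartbeat interval, three missed replies and the 90s server timeout describe roughly the same window from both ends of the connection.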


As the party that actively connects, the client is responsible for reconnecting and for initiating heartbeat messages. The server is the passive party and does not deal with reconnection: if a connection is lost, it simply removes the connection from its collection, with no special handling. It does monitor incoming messages, however, and if nothing arrives on a connection's channel within a certain period, it actively closes the connection.

The client's reconnection policy is a separate module with two entry points. One is a normal disconnect, which triggers the channelInactive callback. The other is a reconnection attempt that fails to establish the connection and needs to be retried. Bolt has a dedicated reconnection thread. Every connection that needs to be re-established is wrapped in a task and put on that thread's task queue, and the thread keeps taking tasks from the queue and processing them. If a reconnection fails, the task is put back on the queue to be tried again. By default the thread handles one reconnection task per second.
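The queue-and-retry loop described above might look like the following sketch. The `ReconnectSketch` name and the `Predicate<String>` connector are hypothetical stand-ins for the real connection logic, and the 1s pacing is passed in as a parameter.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Predicate;

// Hypothetical reconnection worker: one queue, one thread, failed tasks re-queued.
final class ReconnectSketch {
    private final BlockingQueue<String> tasks = new LinkedBlockingQueue<>();
    private final Predicate<String> connector; // returns true when the connect succeeds

    ReconnectSketch(Predicate<String> connector) {
        this.connector = connector;
    }

    // Entry point for both cases: a dropped connection or a failed reconnect.
    void submit(String address) {
        tasks.offer(address);
    }

    // Handles at most one queued task. Returns true only if a connection was re-established.
    boolean processOne() {
        String address = tasks.poll();
        if (address == null) {
            return false; // nothing to do
        }
        if (connector.test(address)) {
            return true;
        }
        tasks.offer(address); // failed: put it back for a later attempt
        return false;
    }

    int queued() {
        return tasks.size();
    }

    // Starts the dedicated daemon thread, handling one task per interval.
    Thread start(long intervalMillis) {
        Thread worker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(intervalMillis);
                    processOne();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop quietly on shutdown
            }
        }, "reconnect-sketch");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}
```

Calling `start(1000)` would approximate the one-task-per-second default mentioned in the text. Re-queuing at the tail means a persistently failing address does not block other reconnection tasks.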


RPC connections are established lazily: the client tries to connect the first time it sends an RPC request. If the connection fails, it immediately retries, by default up to two more times. If the connection is still not established after three attempts, an exception is thrown to the upper layer. There is no need to wrap a reconnection task for the ConnectManager here, because subsequent client requests will trigger the connection again.

RPC is usually one request, one response: the client can wait synchronously for the response, or it can supply a callback to be notified of the result. Besides these request-response modes, Bolt also provides oneway messages: the server sends no reply after receiving one, and the client call returns immediately after sending, without waiting for a result.

Oneway messages are generally used for less important data such as log messages. There is no guarantee that the server receives them, so they should only carry messages the business can afford to lose. They resemble UDP in this respect, trading reliability for much higher message throughput.

Bolt provides a callback interface that lets a monitoring system analyze the invocation status of requests. A monitoring client can implement this interface and register it with the RPC client and server to collect logs, which are then sent to a log analysis system.
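Lazy connection with a bounded number of immediate retries can be sketched like this. `LazyConnectSketch` and its `Supplier` connector (which returns null on failure) are hypothetical. Only the limit of three attempts comes from the text.

```java
import java.util.function.Supplier;

// Hypothetical lazy connection holder: connects on first use, retries immediately.
final class LazyConnectSketch<C> {
    static final int MAX_ATTEMPTS = 3; // the first try plus two retries

    private final Supplier<C> connector; // assumed to return null when connecting fails
    private C connection;

    LazyConnectSketch(Supplier<C> connector) {
        this.connector = connector;
    }

    // Called on every request. Only the first successful call actually connects.
    synchronized C getOrConnect() {
        if (connection != null) {
            return connection;
        }
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            connection = connector.get();
            if (connection != null) {
                return connection;
            }
        }
        // No background task is scheduled: the next request simply tries again.
        throw new IllegalStateException("connect failed after " + MAX_ATTEMPTS + " attempts");
    }
}
```

Since nothing is cached on failure, the next call to `getOrConnect()` starts a fresh round of attempts, which matches the point that later requests re-trigger the connection.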
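The three invocation styles discussed above (synchronous, callback, and oneway) differ only in what the caller does after sending. The sketch below shows that difference with an in-memory stand-in for the server. The `InvokeStylesSketch` class and its method names are hypothetical and are not Bolt's client API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical client facade. The "server" is just a local function run asynchronously.
final class InvokeStylesSketch {
    private final Function<String, String> server;

    InvokeStylesSketch(Function<String, String> server) {
        this.server = server;
    }

    // Synchronous: the caller blocks until the response is available.
    String invokeSync(String request) {
        return send(request).join();
    }

    // Callback: the caller returns at once and is notified when the response arrives.
    void invokeWithCallback(String request, Consumer<String> callback) {
        send(request).thenAccept(callback);
    }

    // Oneway: send and forget. Any response or failure is dropped.
    void oneway(String request) {
        CompletableFuture.runAsync(() -> server.apply(request));
    }

    private CompletableFuture<String> send(String request) {
        return CompletableFuture.supplyAsync(() -> server.apply(request));
    }
}
```

In the oneway path the caller holds no future and registers no request ID, so there is nothing to time out or clean up, which is where the throughput gain comes from.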

interface Tracer {
    void startRpc(SofaRequest request);
    void serverReceived(SofaRequest request);
    void serverSend(SofaRequest request, SofaResponse response, Throwable throwable);
    void clientReceived(SofaRequest request, SofaResponse response, Throwable throwable);
    ...
}

Bolt is a mature and complex RPC system. This short article covers only part of it, and many implementation details remain to be explored.