Interface concurrency solution under massive requests

Consider a scenario: what happens if requests to a product interface suddenly spike at a certain moment?

For this problem, high-concurrency e-commerce systems generally protect the interface with three measures: caching, throttling, and service degradation.

Assume the interface has already passed through risk control, so half of the requests coming from bot scripts are filtered out and the remainder are genuine order requests from real users.

Service throttling

The main purpose of throttling is to limit the rate of concurrent access/requests, or to limit the number of requests within a time window. Once the limit is reached, excess requests are handled by denying service, queuing or waiting, degrading, and so on.

Throttling algorithms

1. Leaky bucket algorithm

With the leaky bucket algorithm, an arriving request is put straight into the bucket; if the bucket has already reached its capacity (the throttling threshold), the request is discarded or another throttling policy is triggered. The bucket drains, that is, releases requests to be processed, at a fixed rate based on the service's throughput, until the bucket is empty.

The idea behind the leaky bucket algorithm is that no matter how many requests arrive, the rate at which the interface consumes them never exceeds the configured outflow rate.

This can be implemented on top of a message queue.
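
As a rough illustration, here is a minimal in-memory leaky bucket sketch in Java (the class and parameter names are made up for this example): requests are queued up to a fixed capacity and a background task drains them at a fixed rate.

import java.util.concurrent.*;

public class LeakyBucketLimiter {
    private final BlockingQueue<Runnable> bucket;    // the "bucket" holding pending requests
    private final ScheduledExecutorService drainer = Executors.newSingleThreadScheduledExecutor();

    public LeakyBucketLimiter(int capacity, long drainIntervalMillis) {
        this.bucket = new ArrayBlockingQueue<>(capacity);
        // release (process) one request per interval, regardless of how fast requests arrive
        drainer.scheduleAtFixedRate(() -> {
            Runnable request = bucket.poll();
            if (request != null) {
                request.run();
            }
        }, 0, drainIntervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Returns false (request discarded / throttled) when the bucket is already full. */
    public boolean tryAccept(Runnable request) {
        return bucket.offer(request);
    }
}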

2. Token bucket algorithm

With the token bucket algorithm, the program adds tokens to the bucket at a fixed rate (for example, one token every time period / limit seconds) until the bucket is full. When a request arrives it tries to take a token from the bucket; if it succeeds the request passes, and if it fails the throttling policy is triggered.

The difference from the leaky bucket algorithm is that the token bucket allows bursts of requests to pass, up to the size of the bucket.
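
A minimal token bucket sketch in Java (the names and refill interval are illustrative, not a reference implementation): a scheduled task refills one token per interval, and a request passes only if it can take a token, so a full bucket lets a burst of that size through.

import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class TokenBucketLimiter {
    private final int capacity;
    private final AtomicInteger tokens;
    private final ScheduledExecutorService refiller = Executors.newSingleThreadScheduledExecutor();

    public TokenBucketLimiter(int capacity, long refillIntervalMillis) {
        this.capacity = capacity;
        this.tokens = new AtomicInteger(capacity);   // start full so an initial burst can pass
        // add one token per interval until the bucket is full
        refiller.scheduleAtFixedRate(
                () -> tokens.updateAndGet(t -> Math.min(capacity, t + 1)),
                refillIntervalMillis, refillIntervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Returns true if a token was taken; false means the throttling policy should fire. */
    public boolean tryAcquire() {
        return tokens.getAndUpdate(t -> t > 0 ? t - 1 : 0) > 0;
    }
}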

3. Sliding window algorithm

The sliding window algorithm divides a time period into N small periods, records the number of requests in each small period separately, and slides forward over time, discarding the small periods that have expired.

For example, assume the time period is 1 minute, split into 2 small periods, and count the requests in each: the first small period saw 75 requests and the second 100. If the sum over all small periods within the window exceeds 100, the throttling policy is triggered.

Sentinel's rate limiting and TCP's flow control are both based on sliding windows.
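
A minimal sliding window sketch in Java (illustrative only, and not Sentinel's actual implementation): the time period is split into N small periods, stale periods are reset as time advances, and a request passes only while the sum across the window stays below the limit.

import java.util.concurrent.atomic.AtomicLongArray;

public class SlidingWindowLimiter {
    private final int bucketCount;       // N small periods inside the window
    private final long bucketMillis;     // length of each small period
    private final long limit;            // max requests allowed per whole window
    private final AtomicLongArray counts;
    private final AtomicLongArray bucketStarts;

    public SlidingWindowLimiter(int bucketCount, long windowMillis, long limit) {
        this.bucketCount = bucketCount;
        this.bucketMillis = windowMillis / bucketCount;
        this.limit = limit;
        this.counts = new AtomicLongArray(bucketCount);
        this.bucketStarts = new AtomicLongArray(bucketCount);
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        long currentStart = now - (now % bucketMillis);
        int index = (int) ((now / bucketMillis) % bucketCount);
        // a stale start time means this slot belongs to an expired small period: reset it
        if (bucketStarts.get(index) != currentStart) {
            bucketStarts.set(index, currentStart);
            counts.set(index, 0);
        }
        long total = 0;
        for (int i = 0; i < bucketCount; i++) {
            // only count small periods that still fall inside the sliding window
            if (now - bucketStarts.get(i) < (long) bucketCount * bucketMillis) {
                total += counts.get(i);
            }
        }
        if (total >= limit) {
            return false;                // trigger the throttling policy
        }
        counts.incrementAndGet(index);
        return true;
    }
}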

Access layer throttling

NGINX throttling

NGINX's rate limiting is based on the leaky bucket algorithm.

It can limit access frequency based on client characteristics, mainly the IP address, User-Agent, and so on. IP is more reliable than User-Agent, because the source IP is hard to forge while the User-Agent header can be set to anything.

The limit_req module limits requests per key, typically the client IP:

http://nginx.org/en/docs/http/ngx_http_limit_req_module.html

Tengine:

http://tengine.taobao.org/document_cn/http_limit_req_cn.html
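
For reference, a minimal limit_req configuration might look like the sketch below (the zone name, rate, burst value, and upstream name are placeholders chosen for this example; see the linked documentation for the authoritative syntax):

# a 10 MB shared zone keyed by client IP, allowing 10 requests per second per IP
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

server {
    location /api/ {
        # allow short bursts of up to 20 extra requests, rejecting the rest immediately
        limit_req zone=per_ip burst=20 nodelay;
        proxy_pass http://backend_upstream;
    }
}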

Local interface throttling

Semaphore

The Semaphore class in the Java concurrency library makes it easy to control how many threads can access a resource at the same time: acquire() obtains a permit, blocking if none is available, and release() returns a permit.

If we expose a service interface whose maximum concurrency is 40, it can look like this:

import java.util.concurrent.Semaphore;

private final Semaphore permit = new Semaphore(40, true);

public void process() {
    try {
        permit.acquire();                        // block until one of the 40 permits is free
        try {
            // TODO: handle business logic
        } finally {
            permit.release();                    // release only a permit that was actually acquired
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();      // restore the interrupt flag
    }
}

For the details of how Semaphore is implemented, refer to its source code.

Distributed interface throttling

Using a message queue

Whether implemented with MQ middleware or with a Redis List, a message queue can be used as a buffer queue. The idea is based on the leaky bucket algorithm.

When requests to an interface reach a certain threshold, the message queue can be enabled to buffer requests, and the service consumes them at a rate that matches its throughput.
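
A minimal sketch of the Redis List variant using the Jedis client (the key name and the 5-second poll timeout are assumptions made for this example): the interface pushes requests onto a list, and a worker pops and processes them at its own pace, which is exactly the leaky bucket behaviour described above.

import java.util.List;
import redis.clients.jedis.Jedis;

public class OrderBufferQueue {
    private static final String QUEUE_KEY = "order:buffer";   // hypothetical key name

    /** Producer side: called by the interface once the request threshold is reached. */
    public void buffer(Jedis jedis, String orderPayload) {
        jedis.lpush(QUEUE_KEY, orderPayload);
    }

    /** Consumer side: a worker drains the queue at the pace the service can sustain. */
    public void consumeLoop(Jedis jedis) {
        while (!Thread.currentThread().isInterrupted()) {
            // block for up to 5 seconds waiting for the next buffered request
            List<String> item = jedis.brpop(5, QUEUE_KEY);
            if (item != null) {
                processOrder(item.get(1));                     // brpop returns [key, value]
            }
        }
    }

    private void processOrder(String payload) {
        // TODO: actual business logic
    }
}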

Service degradation

Assuming risk control on the interface is already in place, if we find that request concurrency is rising rapidly, we can enable a fallback plan and degrade the service.

Service degradation generally means delaying or suspending non-critical or non-urgent services and tasks.

Degradation scenarios

Stop edge business

For example, before Taobao's Double 11, orders placed more than three months earlier could not be queried: this edge business was degraded to guarantee the high availability of the core business.

Reject requests

Some requests can be rejected in unexpected situations, such as when the request concurrency exceeds a threshold or when the interface is seeing a large number of failed requests.

Rejection policies

  • Random rejection: requests beyond the threshold are rejected at random.
  • Reject old requests: based on request time, the earliest queued requests are rejected first.
  • Reject non-core requests: define a core-request list for the system and reject requests that are not on it (a minimal sketch of this policy follows the list).
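
A minimal sketch of the "reject non-core requests" policy (the URI list and threshold are invented for this example): below the threshold everything passes; above it, only requests on the core list are allowed through.

import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

public class CoreRequestFilter {
    // hypothetical core-request list; everything else may be rejected under pressure
    private static final Set<String> CORE_URIS = Set.of("/order/create", "/pay/confirm");
    private static final int THRESHOLD = 1000;      // illustrative concurrency threshold

    private final AtomicInteger inFlight = new AtomicInteger();

    /** Call before handling a request; returns false if it should be rejected. */
    public boolean tryEnter(String uri) {
        if (inFlight.incrementAndGet() <= THRESHOLD || CORE_URIS.contains(uri)) {
            return true;
        }
        inFlight.decrementAndGet();                 // rejected, undo the count
        return false;
    }

    /** Call after the request finishes. */
    public void exit() {
        inFlight.decrementAndGet();
    }
}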

Recovery scenarios

After degrading the service, we can register additional consumer instances to absorb the concurrency of the burst traffic, and then gradually restore load to the servers.

For specific implementations of downgrading, refer to other articles.

Data caching

Assuming risk control on the interface is in place, if we find that request concurrency is rising rapidly, we can perform the following operations (a sketch follows this list):

  • Block incoming requests briefly using a distributed lock.
  • During this short window, cache the hot data for the rows being operated on in the cache middleware.
  • After the lock is released, let all requests operate on the cached data first.
  • Then send the results of these operations through a message queue to a consumer for slow consumption.
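
A minimal sketch of the lock-then-warm-the-cache steps using Jedis (the key names, item id, and timeouts are assumptions made for this example): one instance wins the distributed lock and copies the hot row, here an inventory count, from the database into Redis; afterwards requests deduct against the cached value and the accepted deductions are pushed to a queue for slow consumption.

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class HotDataWarmer {
    private static final String LOCK_KEY  = "lock:warm:stock:1001";   // hypothetical keys
    private static final String STOCK_KEY = "stock:1001";

    /** Load the hot database row into the cache, guarded by a distributed lock. */
    public void warmUp(Jedis jedis) {
        // SET key value NX EX 10 -> only one instance acquires the lock
        String ok = jedis.set(LOCK_KEY, "1", SetParams.setParams().nx().ex(10));
        if ("OK".equals(ok)) {
            try {
                long stockFromDb = loadStockFromDatabase(1001L);      // e.g. SELECT stock FROM ...
                jedis.set(STOCK_KEY, String.valueOf(stockFromDb));
            } finally {
                jedis.del(LOCK_KEY);
            }
        }
    }

    /** Subsequent requests hit the cache instead of the database. */
    public boolean tryDeduct(Jedis jedis) {
        long remaining = jedis.decr(STOCK_KEY);
        if (remaining < 0) {
            jedis.incr(STOCK_KEY);                  // roll back the over-sell
            return false;
        }
        jedis.lpush("stock:deduct:events", "item=1001");   // queue the result for slow consumption
        return true;
    }

    private long loadStockFromDatabase(long itemId) {
        return 100L;                                // placeholder for a real database query
    }
}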

Cache issues

Suppose we are operating an inventory interface and there are only 100 units of stock in the database.

If we put this single piece of data into the cache and every request goes to that one cache entry, the cache can still be overwhelmed. What should we do?

Read and write splitting

The first idea is read/write separation.

Use Redis master-slave replication in Sentinel mode and split reads from writes. Read operations far outnumber writes, and once the inventory has been consumed down to 0, reads fail fast.
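
A minimal sketch of the read/write split with two Jedis connections (the host names are placeholders; a production setup would resolve the master and replicas through Sentinel or a connection pool): reads go to a replica and fail fast at zero stock, while deductions go to the master.

import redis.clients.jedis.Jedis;

public class StockReadWriteSplit {
    private final Jedis master  = new Jedis("redis-master-host", 6379);    // placeholder hosts
    private final Jedis replica = new Jedis("redis-replica-host", 6379);

    /** Reads hit the replica; zero stock fails fast without touching the master. */
    public boolean hasStock(String stockKey) {
        String value = replica.get(stockKey);
        return value != null && Long.parseLong(value) > 0;
    }

    /** Writes (deductions) go to the master only. */
    public boolean deduct(String stockKey) {
        long remaining = master.decr(stockKey);
        if (remaining < 0) {
            master.incr(stockKey);      // roll back, the stock is already exhausted
            return false;
        }
        return true;
    }
}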

Load balancing

The second idea is load balancing.

After the data is cached, if every request operates on this single inventory value, concurrency stays very low whether we use a pessimistic or an optimistic lock. At this point we can split the inventory.

We can borrow the design idea of the counterCells array in ConcurrentHashMap: split the 100 units of stock across 10 cache shards, each holding 10 units, and load-balance requests across the shards.
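
A minimal sketch of this split (the shard count and key prefix are assumptions made for this example): the stock is spread across several cache keys and each request is hashed to one shard, so different users contend on different keys.

import redis.clients.jedis.Jedis;

public class ShardedStock {
    private static final int SHARDS = 10;                            // 100 units -> 10 shards of 10
    private static final String KEY_PREFIX = "stock:1001:shard:";    // hypothetical key prefix

    /** Initialise each shard with its slice of the total stock. */
    public void init(Jedis jedis, long totalStock) {
        long perShard = totalStock / SHARDS;
        for (int i = 0; i < SHARDS; i++) {
            jedis.set(KEY_PREFIX + i, String.valueOf(perShard));
        }
    }

    /** Hash the user onto one shard and deduct only from that shard. */
    public boolean deduct(Jedis jedis, long userId) {
        int shard = (int) (userId % SHARDS);
        long remaining = jedis.decr(KEY_PREFIX + shard);
        if (remaining < 0) {
            jedis.incr(KEY_PREFIX + shard);      // this shard is empty...
            return false;                        // ...even though other shards may still have stock
        }
        return true;
    }
}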

However, this approach has a problem: if most users are hashed to the same shard, the other shards are never consumed, yet those users are told there is no stock, which is unreasonable.

Page cache

The third idea is a page cache.

Many software systems actually use this approach, for example the Linux kernel's page cache for disk writes and MySQL's flushing of dirty pages to disk: write operations over a short period are aggregated, all of them are completed in the cache first, and the combined result is written out later.
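
A minimal sketch of this write-aggregation idea in Java (the flush interval and method names are illustrative): deductions are accumulated in memory and a scheduled task periodically flushes the combined result to the database, so many small writes become one batched write.

import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class WriteBehindStock {
    private final AtomicLong pendingDeductions = new AtomicLong();
    private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();

    public WriteBehindStock(long flushIntervalMillis) {
        // periodically merge everything accumulated since the last flush into one database write
        flusher.scheduleAtFixedRate(this::flush, flushIntervalMillis,
                flushIntervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Record a deduction in memory only; the database is not touched here. */
    public void deduct(long quantity) {
        pendingDeductions.addAndGet(quantity);
    }

    private void flush() {
        long toWrite = pendingDeductions.getAndSet(0);
        if (toWrite > 0) {
            writeToDatabase(toWrite);   // e.g. UPDATE stock SET count = count - ? WHERE item_id = ?
        }
    }

    private void writeToDatabase(long quantity) {
        // TODO: single batched write to the database
    }
}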