A Closer Look at Isolation and Circuit Breaking: Hystrix vs. Sentinel

author:Flash Gene

Introduction

Why introduce a circuit-breaking and isolation mechanism such as Hystrix into a project, and in which scenarios does it help? In a distributed system, a single application usually relies on many external services of different types: typically RPC services inside the organization and HTTP services outside it. Calls to these dependencies will inevitably fail at times, through timeouts, exceptions, and so on. Keeping the application itself stable when an external dependency misbehaves is the job of service-protection frameworks such as Hystrix. Suppose application X depends on services A, B, and C; A and B respond normally while C fails. The question is how to keep C's failure from affecting calls to A and B, which leads to the concept of isolation.

Hystrix

Hystrix [hɪst'rɪks] means porcupine, an animal that protects itself with the quills on its back. The Hystrix discussed in this article is Netflix's open-source fault-tolerance framework, which likewise protects the application that uses it.

Hystrix Design Goals

• Protect against and control latency and failures from dependencies that are typically accessed over the network
• Prevent failures from cascading
• Fail fast and recover quickly
• Fall back and degrade gracefully
• Provide near real-time monitoring and alerting

Design principles followed by Hystrix

• Prevent any single dependency from exhausting resources (threads)
• Shed load and fail fast instead of queuing
• Provide fallbacks wherever possible to protect users from failures
• Use isolation techniques such as bulkheads, swimlanes, and the circuit breaker pattern to limit the impact of any one dependency
• Detect failures promptly through near real-time metrics, monitoring, and alerts
• Recover from failures promptly by modifying configuration properties dynamically
• Protect against failures in the entire execution of a dependency client, not just in network communication

Main process

• Use the command pattern to wrap every call to an external service (or dependency) in a HystrixCommand or HystrixObservableCommand object, and execute that object on a separate thread.
• Maintain a thread pool (or semaphore) for each dependency; when it is exhausted, requests are rejected rather than queued.
• Record request successes, failures, timeouts, and thread rejections.
• When the error percentage of a service exceeds the threshold, open the circuit breaker automatically and stop all requests to that service for a period of time.
• Execute the fallback logic when a request fails, is rejected, times out, or is short-circuited.
• Monitor metrics and configuration changes in near real time.

Wrapping requests in Command objects

Class Structure:

Command execution method:

There are 4 ways to execute a Hystrix command.

execute() and queue() apply to HystrixCommand objects, while observe() and toObservable() apply to HystrixObservableCommand objects (HystrixCommand exposes observe() and toObservable() as well).

• execute(): blocks, then returns the single response from the dependency (or throws an exception if an error occurs).
• queue(): returns a Future from which the single response from the dependency can be obtained.
• observe(): starts the command immediately and returns an Observable that represents the response from the dependency.
• toObservable(): returns an Observable that executes the Hystrix command and emits the response only once you subscribe to it.
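The relationship between the blocking style and the Future-based style can be sketched in plain Java. This is an illustration only, not the Hystrix API: SketchCommand and SketchCommandDemo are hypothetical names that borrow the run()/getFallback()/queue()/execute() vocabulary, the fallback handling is simplified, and the two reactive styles are left out.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a command object; it does not use the Hystrix library.
abstract class SketchCommand<T> {
    // A dedicated pool per dependency plays the role of the isolated thread pool.
    private final ExecutorService pool;

    SketchCommand(ExecutorService pool) {
        this.pool = pool;
    }

    // The guarded call to the dependency.
    protected abstract T run() throws Exception;

    // Degraded result returned when run() fails.
    protected abstract T getFallback();

    // Asynchronous style: returns at once with a Future for the result.
    Future<T> queue() {
        return pool.submit(() -> {
            try {
                return run();
            } catch (Exception e) {
                return getFallback();
            }
        });
    }

    // Blocking style: queue the work, then wait for the Future.
    T execute() {
        try {
            return queue().get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return getFallback();
        } catch (ExecutionException e) {
            return getFallback();
        }
    }
}

final class SketchCommandDemo {
    // Runs one command; when fail is true the simulated dependency throws.
    static String call(boolean fail) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return new SketchCommand<String>(pool) {
                @Override
                protected String run() {
                    if (fail) {
                        throw new IllegalStateException("service C is down");
                    }
                    return "live";
                }

                @Override
                protected String getFallback() {
                    return "fallback";
                }
            }.execute();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(call(false));
        System.out.println(call(true));
    }
}
```

In this sketch execute() is simply queue() followed by a blocking get(), which is one straightforward way to relate the two styles.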

Core code, AbstractCommand:

public Observable<R> toObservable() {
    ...
    final Func0<Observable<R>> applyHystrixSemantics = new Func0<Observable<R>>() {
        @Override
        public Observable<R> call() {
            if (commandState.get().equals(CommandState.UNSUBSCRIBED)) {
                return Observable.never();
            }
            return applyHystrixSemantics(_cmd); // 1. key step: process the command
        }
    };
    ...

private Observable<R> applyHystrixSemantics(final AbstractCommand<R> _cmd) {
    ...
    if (circuitBreaker.attemptExecution()) { // 1. circuit breaker check, shown later under HystrixCircuitBreaker
        ...
        if (executionSemaphore.tryAcquire()) { // 2. acquire the semaphore; under the THREAD (thread pool) strategy this returns true directly, otherwise the flow could not continue
            try {
                executionResult = executionResult.setInvocationStartTime(System.currentTimeMillis());
                return executeCommandAndObserve(_cmd) // 3. core execution method
                        .doOnError(markExceptionThrown)
                        .doOnTerminate(singleSemaphoreRelease)
                        .doOnUnsubscribe(singleSemaphoreRelease);
            } ...
}

Circuit breaker implementation logic

The figure below shows how HystrixCommand and HystrixObservableCommand interact with HystrixCircuitBreaker.

The circuit breaker opens and closes as follows:

1. Assume the request volume across the circuit reaches a certain threshold (HystrixCommandProperties.circuitBreakerRequestVolumeThreshold()).
2. Assume also that the error percentage exceeds the configured threshold (HystrixCommandProperties.circuitBreakerErrorThresholdPercentage()).
3. The circuit breaker then changes from CLOSED to OPEN.
4. While the circuit breaker is open, it short-circuits every request made against it.
5. After a certain amount of time (HystrixCommandProperties.circuitBreakerSleepWindowInMilliseconds()), the next request is let through (the half-open state). If that request fails, the circuit breaker returns to OPEN for another sleep window; if it succeeds, the circuit breaker is set to CLOSED and the logic in step 1 takes over again.

Hystrix circuit breaker states:

The circuit breaker has three states: CLOSED, OPEN, and HALF_OPEN. It is CLOSED by default. When it trips, the state changes to OPEN. After the configured wait, Hystrix lets a request through to check whether the service has recovered; during this probe the breaker is HALF_OPEN. If the probe succeeds, the breaker returns to CLOSED.
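The three states and their transitions can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not Hystrix's implementation: SketchCircuitBreaker is a hypothetical class, it counts calls since the last reset rather than over a rolling window, and the clock is injected so the behavior can be checked without sleeping.

```java
import java.util.function.LongSupplier;

// Simplified three-state breaker. A sketch only: it counts calls since the
// last reset instead of using a rolling statistics window.
final class SketchCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;   // minimum calls before the error rate is judged
    private final int errorThresholdPercentage; // error rate (0-100) that trips the breaker
    private final long sleepWindowMs;           // time to stay OPEN before one trial call
    private final LongSupplier clock;           // injectable time source, in milliseconds

    private State state = State.CLOSED;
    private int total;
    private int errors;
    private long openedAt;

    SketchCircuitBreaker(int requestVolumeThreshold, int errorThresholdPercentage,
                         long sleepWindowMs, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMs = sleepWindowMs;
        this.clock = clock;
    }

    // Returns true if the caller may invoke the dependency.
    synchronized boolean attemptExecution() {
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN && clock.getAsLong() - openedAt >= sleepWindowMs) {
            state = State.HALF_OPEN; // sleep window elapsed: admit a single trial call
            return true;
        }
        return false; // still OPEN, or a trial call is already in flight
    }

    synchronized void markSuccess() {
        if (state == State.HALF_OPEN) {
            state = State.CLOSED; // trial call succeeded: resume normal operation
            total = 0;
            errors = 0;
        } else if (state == State.CLOSED) {
            total++;
        }
    }

    synchronized void markFailure() {
        if (state == State.HALF_OPEN) {
            trip(); // trial call failed: back to OPEN for another sleep window
        } else if (state == State.CLOSED) {
            total++;
            errors++;
            if (total >= requestVolumeThreshold
                    && errors * 100 >= errorThresholdPercentage * total) {
                trip();
            }
        }
    }

    private void trip() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        total = 0;
        errors = 0;
    }

    synchronized State state() {
        return state;
    }
}
```

A caller would ask attemptExecution() before each call, then report the outcome with markSuccess() or markFailure(); when attemptExecution() returns false it goes straight to its fallback.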

Circuit breaker implementation class:

Core Code:

public boolean allowRequest() {
    if (properties.circuitBreakerForceOpen().get()) {
        // properties have asked us to force the circuit open so we will allow NO requests
        return false;
    }
    if (properties.circuitBreakerForceClosed().get()) {
        // we still want to allow isOpen() to perform its calculations so we simulate normal behavior
        isOpen();
        // properties have asked us to ignore errors so we will ignore the results of isOpen and just allow all traffic through
        return true;
    }
    return !isOpen() || allowSingleTest();
}

The code applies the following checks:

1. If the circuit breaker is forced open, return false: the command cannot execute.
2. If the circuit breaker is forced closed, return true: the command can execute.
3. Check whether the circuit breaker is open (isOpen() above; in releases that track this with circuitOpened, circuitOpened.get() == -1 means it is not open). If it is not open, return true: the command can execute.
4. Otherwise the circuit breaker is open. Check whether a trial request may be attempted; if it may, the circuit breaker state changes to HALF_OPEN at the same time.

Fusing parameters:

Isolation:

Hystrix uses the bulkhead pattern to isolate dependencies from one another and to limit concurrent access to each of them.

Isolation method:

• Thread pool isolation: suited to requests with high concurrency and long latency (typically heavy computation or database reads). Running the dependency on its own thread pool keeps plenty of container threads available; they are not left blocked or waiting because of a slow service, and calls fail fast.
• Semaphore isolation: suited to requests with high concurrency and short latency (typically light computation or cache reads). Such services usually return quickly, so they do not hold container threads for long, and skipping the thread switch removes some overhead and improves the efficiency of cache-style services.

Item | Thread pool | Semaphore
Threads | The requesting thread and the thread that invokes the provider are different threads | The requesting thread and the thread that invokes the provider are the same thread
Overhead | Queuing, scheduling, context switching, etc. | No thread switching, low overhead
Asynchronous calls | Supported | Not supported
Concurrency limit | Supported: maximum thread pool size | Supported: maximum semaphore count
Passing headers | Not supported | Supported
Timeout | Supported | Not supported
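Semaphore-style isolation can be sketched with java.util.concurrent.Semaphore. This is an illustrative bulkhead, not Hystrix code; SketchBulkhead and its call method are hypothetical names. The call runs on the caller's own thread, and a request that finds no free permit is rejected immediately instead of waiting.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Semaphore-style bulkhead sketch: the guarded call runs on the caller's thread.
final class SketchBulkhead {
    private final Semaphore permits;

    SketchBulkhead(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            // tryAcquire() never blocks: overload is rejected at once
            return fallback.get();
        }
        try {
            return action.get();
        } catch (RuntimeException e) {
            return fallback.get(); // the dependency failed: degrade
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        SketchBulkhead bulkhead = new SketchBulkhead(1);
        // The inner call arrives while the only permit is held by the outer call.
        String result = bulkhead.call(
                () -> bulkhead.call(() -> "inner", () -> "rejected"),
                () -> "outer fallback");
        System.out.println(result);
    }
}
```

Because no second thread is involved, there is nothing that can interrupt a slow call here, which is consistent with the "Timeout: not supported" entry for semaphores.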

Timeout implementation

HystrixCommand holds a TimedOutStatus state that records whether the command has timed out.

Implementation process:

Two threads are involved: one executes the HystrixCommand task, and the other waits to decide whether the command has timed out. The two race to change the command's TimedOutStatus. Whichever thread updates the state first decides the outcome, and the timeout judgment is over.
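The race between the two threads comes down to a compare-and-set on a shared status. The following minimal sketch shows the idea; SketchTimeoutRace is a hypothetical class, and the COMPLETED value is an assumption added for the illustration next to the NOT_EXECUTED and TIMED_OUT values that appear in the Hystrix excerpt.

```java
import java.util.concurrent.atomic.AtomicReference;

// Two parties race to move the status away from NOT_EXECUTED; only one can win.
final class SketchTimeoutRace {
    enum TimedOutStatus { NOT_EXECUTED, COMPLETED, TIMED_OUT }

    private final AtomicReference<TimedOutStatus> status =
            new AtomicReference<>(TimedOutStatus.NOT_EXECUTED);

    // Called by the execution thread when the work finishes.
    boolean tryComplete() {
        return status.compareAndSet(TimedOutStatus.NOT_EXECUTED, TimedOutStatus.COMPLETED);
    }

    // Called by the timer thread when the timeout fires.
    boolean tryTimeout() {
        return status.compareAndSet(TimedOutStatus.NOT_EXECUTED, TimedOutStatus.TIMED_OUT);
    }

    TimedOutStatus status() {
        return status.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SketchTimeoutRace race = new SketchTimeoutRace();
        Thread worker = new Thread(race::tryComplete);
        Thread timer = new Thread(race::tryTimeout);
        worker.start();
        timer.start();
        worker.join();
        timer.join();
        // Either COMPLETED or TIMED_OUT, depending on which thread won the race.
        System.out.println(race.status());
    }
}
```

The loser of the race sees compareAndSet return false and knows it must not deliver its result: a late completion is discarded after a timeout, and a late timer tick is ignored after completion.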

Timeout implementation class

HystrixObservableTimeoutOperator.call(), the TimerListener implementation:

TimerListener listener = new TimerListener() {

    @Override
    public void tick() {
        if (originalCommand.isCommandTimedOut.compareAndSet(TimedOutStatus.NOT_EXECUTED, TimedOutStatus.TIMED_OUT)) {
            // mark the event; this can be regarded as an open hook and is ignored here
            originalCommand.eventNotifier.markEvent(HystrixEventType.TIMEOUT, originalCommand.commandKey);

            // unsubscribe from the original Observable
            s.unsubscribe();

            final HystrixContextRunnable timeoutRunnable = new HystrixContextRunnable(originalCommand.concurrencyStrategy, hystrixRequestContext, new Runnable() {

                @Override
                public void run() {
                    child.onError(new HystrixTimeoutException());
                }
            });
            timeoutRunnable.run();
        }
    }

    // read the configured timeout
    @Override
    public int getIntervalTimeInMilliseconds() {
        return originalCommand.properties.executionTimeoutInMilliseconds().get();
    }
};

Application monitoring

Monitoring metrics:

Solid circle: it carries two meanings. Its color indicates the health of the instance, decreasing from green through yellow and orange to red. Its size follows the request traffic: the more traffic, the larger the circle.

Curve: it records how request traffic has changed over the last 2 minutes, showing whether traffic is trending up or down.

Implementation logic:

After the completion event of an execution is subscribed to, the execution result is collected into HystrixThreadEventStream. As the name suggests, it is a stream of events.

The next step is easy to guess: a subscriber consumes these events and aggregates them. The processed results are eventually written to two streams, HystrixCommandCompletionStream and HystrixThreadPoolCompletionStream.

Statistics implementation: HealthCountsStream (the subscriber)

The results are written to HystrixCommandCompletionStream and HystrixThreadPoolCompletionStream; the core statistics logic is implemented in HealthCountsStream.

Sliding window:

Class Diagram:

Core Code:

protected BucketedRollingCounterStream(HystrixEventStream<Event> stream, final int numBuckets, int bucketSizeInMs,
                                       final Func2<Bucket, Event, Bucket> appendRawEventToBucket,
                                       final Func2<Output, Bucket, Output> reduceBucket) {
    super(stream, numBuckets, bucketSizeInMs, appendRawEventToBucket);
    Func1<Observable<Bucket>, Observable<Output>> reduceWindowToSummary = new Func1<Observable<Bucket>, Observable<Output>>() {
        @Override
        public Observable<Output> call(Observable<Bucket> window) {
            return window.scan(getEmptyOutputValue(), reduceBucket).skip(numBuckets);
        }
    };
    this.sourceStream = bucketedStream      // stream broken up into buckets
            .window(numBuckets, 1)          // emit overlapping windows of buckets
            .flatMap(reduceWindowToSummary) // convert a window of bucket-summaries into a single summary
            .doOnSubscribe(new Action0() {
                @Override
                public void call() {
                    isSourceCurrentlySubscribed.set(true);
                }
            })
            .doOnUnsubscribe(new Action0() {
                @Override
                public void call() {
                    isSourceCurrentlySubscribed.set(false);
                }
            })
            .share()                        // multiple subscribers should get same data
            .onBackpressureDrop();          // if there are slow consumers, data should not buffer
}
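What the window(numBuckets, 1) and scan(...).skip(numBuckets) steps compute can be shown without RxJava: one summary for every full run of numBuckets consecutive buckets, sliding forward one bucket at a time. The sketch below is an assumption-laden illustration, with a hypothetical SketchRollingSum class, buckets reduced to plain counts, and a finite list standing in for the live stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SketchRollingSum {
    // One summary per full window of numBuckets consecutive buckets, sliding by one bucket.
    // Windows with fewer than numBuckets buckets produce no summary.
    static List<Long> rollingSums(List<Long> buckets, int numBuckets) {
        List<Long> summaries = new ArrayList<>();
        for (int start = 0; start + numBuckets <= buckets.size(); start++) {
            long sum = 0;
            for (long bucket : buckets.subList(start, start + numBuckets)) {
                sum += bucket; // plays the role of reduceBucket
            }
            summaries.add(sum);
        }
        return summaries;
    }

    public static void main(String[] args) {
        // Five buckets of request counts, window of three buckets.
        System.out.println(rollingSums(Arrays.asList(1L, 2L, 3L, 4L, 5L), 3)); // [6, 9, 12]
    }
}
```

Each new bucket therefore yields a fresh summary of the most recent numBuckets buckets, which is the rolling statistic the circuit breaker reads.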

Ring array data structure:

Data Structure Classes:

class ListState {
    /*
     * data is an AtomicReferenceArray rather than a plain array because the reference
     * is shared between different ListState objects and across threads, which requires
     * visibility and concurrency guarantees.
     */
    private final AtomicReferenceArray<Bucket> data;
    private final int size;
    private final int tail;
    private final int head;

    private ListState(AtomicReferenceArray<Bucket> data, int head, int tail) {
        this.head = head;
        this.tail = tail;
        if (head == 0 && tail == 0) {
            size = 0;
        } else {
            // dataLength is the array length, held by the enclosing class
            this.size = (tail + dataLength - head) % dataLength;
        }
        this.data = data;
    }
}
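The head/tail arithmetic in ListState is the usual circular-array bookkeeping. The single-threaded sketch below shows it in isolation; SketchBucketRing is a hypothetical class, buckets are reduced to plain counts, and the extra slot (so that head == tail always means "empty") is a design choice of this sketch.

```java
// Single-threaded ring of buckets; a sketch, not Hystrix's concurrent implementation.
final class SketchBucketRing {
    private final long[] data; // one extra slot so that head == tail always means "empty"
    private int head;          // index of the oldest bucket
    private int tail;          // index one past the newest bucket

    SketchBucketRing(int numBuckets) {
        this.data = new long[numBuckets + 1];
    }

    // Same formula as ListState: distance from head to tail, modulo the array length.
    int size() {
        return (tail + data.length - head) % data.length;
    }

    // Appends a bucket; when the ring is full the oldest bucket is dropped.
    void addLast(long bucket) {
        data[tail] = bucket;
        tail = (tail + 1) % data.length;
        if (tail == head) {
            head = (head + 1) % data.length;
        }
    }

    // Sum over the buckets currently in the window, oldest to newest.
    long sum() {
        long total = 0;
        int index = head;
        for (int i = 0; i < size(); i++) {
            total += data[index];
            index = (index + 1) % data.length;
        }
        return total;
    }

    public static void main(String[] args) {
        SketchBucketRing ring = new SketchBucketRing(3);
        for (long count = 1; count <= 5; count++) {
            ring.addLast(count);
        }
        System.out.println(ring.size() + " buckets, sum " + ring.sum()); // 3 buckets, sum 12
    }
}
```

Overwriting the oldest slot instead of shifting elements is what keeps each bucket rollover O(1), however long the statistics window is.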

Sentinel

Sentinel is an open-source, lightweight, highly available flow-control component for distributed service architectures. Taking traffic as its entry point, it helps protect service stability along several dimensions: flow control, circuit breaking and degradation, and system load protection.

Hystrix vs Sentinel

Hystrix focuses on isolation and circuit breaking: calls that time out or are short-circuited fail fast, and a fallback mechanism can be provided.

Sentinel focuses on:

• Diverse flow control
• Circuit breaking and degradation
• System load protection
• Real-time monitoring and a console

The two frameworks address quite different problems.

Comparison between the resource model and the execution model

Sentinel provides several ways to configure rules. Besides registering rules directly into in-memory state through the loadRules API, users can register various external data sources to supply rules dynamically. Rules can then be changed at runtime according to the system's current state: the data source pushes the change to Sentinel, and it takes effect immediately.

Contrast in isolation design

Thread pool isolation can fragment machine resources.

The more thorough isolation of thread pool mode lets Hystrix handle queuing and timeouts separately for each resource's thread pool. But these are really problems for timeout-based circuit breaking and flow control to solve; if a component already has those capabilities, thread pool isolation is much less necessary.

Hystrix's semaphore isolation has low overhead and works well. Its drawback is that slow calls cannot be degraded automatically; the caller can only wait for the client's own timeout, so cascading blocking can still occur.

Sentinel provides semaphore-style isolation through flow control in concurrent-thread-count mode. Combined with response-time-based circuit breaking, it degrades an unstable resource automatically when its average response time is high, so slow calls cannot use up the concurrency quota and drag down the whole system.

Comparison of circuit breaker degradation

Both Sentinel and Hystrix support circuit breaking and degradation based on the failure rate (exception ratio).

Sentinel also supports circuit breaking based on average response time: when a service's response time keeps rising, the breaker trips automatically and rejects further requests until a certain period has passed. This prevents the cascading blocking caused by very slow calls.

Degradation criteria:

• Average response time
• Exception ratio
• Number of exceptions

SystemRule (system load protection): Sentinel also protects at the system level. Its load protection algorithm borrows the idea of TCP BBR, balancing the system's inbound traffic against its load so that the system handles as many requests as it can within its capacity.

Sentinel console UI:

Sentinel flow control

Sentinel's design philosophy is to let developers choose the angle from which they control traffic and combine those angles flexibly to reach the desired effect.

Flow control can be approached from the following angles:

• Resource call relationships: limiting by caller; limiting by the entry point of the call chain (chain flow control); limiting a resource according to a related resource (associated flow control)
• Runtime metrics: QPS, thread pool, system load, and so on
• Control effects: direct rejection, cold start, queuing, and so on

Sentinel traffic shaping

Sentinel supports several traffic shaping strategies.

When QPS is too high, traffic can be adjusted automatically into a suitable shape. The common modes are:

• Direct rejection: requests beyond the threshold are rejected immediately.
• Slow start (warm-up): when traffic surges, the pass rate is controlled so that admitted traffic grows slowly and reaches the threshold only after a certain period. This gives a cold system time to warm up and keeps it from being overwhelmed.
• Constant rate: implemented with the leaky bucket algorithm, this mode strictly controls the interval between admitted requests. Requests that pile up wait in a queue, and a request whose wait would exceed the timeout is rejected immediately.

Sentinel also supports rate limiting based on call relationships, including limiting by caller, limiting by call-chain entry point, and associated flow control.
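The constant-rate mode can be sketched as simple bookkeeping of the time at which the latest request was, or will be, admitted. This is an illustration in the spirit of the description above, not Sentinel's code: SketchPacer, tryPass, and the parameter names are hypothetical, and instead of sleeping the method returns the wait so the behavior can be checked deterministically.

```java
import java.util.function.LongSupplier;

// Uniform-rate limiter sketch: admitted requests are spaced intervalMs apart.
final class SketchPacer {
    private final long intervalMs;    // spacing between two admitted requests
    private final long maxQueueingMs; // longest wait a request may be asked to accept
    private final LongSupplier clock; // injectable time source, in milliseconds
    private long latestPassedTime;    // time at which the latest admitted request passes

    SketchPacer(long intervalMs, long maxQueueingMs, LongSupplier clock) {
        this.intervalMs = intervalMs;
        this.maxQueueingMs = maxQueueingMs;
        this.clock = clock;
        this.latestPassedTime = clock.getAsLong() - intervalMs; // first request passes at once
    }

    // Returns the wait in milliseconds before the request may proceed, or -1 if it is rejected.
    synchronized long tryPass() {
        long now = clock.getAsLong();
        long expected = latestPassedTime + intervalMs;
        if (expected <= now) {
            latestPassedTime = now; // the slot is already free
            return 0;
        }
        long wait = expected - now;
        if (wait > maxQueueingMs) {
            return -1; // the queue is too long: reject instead of waiting
        }
        latestPassedTime = expected; // reserve the next slot
        return wait;
    }

    public static void main(String[] args) {
        // 100 ms spacing (10 requests per second), at most 150 ms of queueing.
        SketchPacer pacer = new SketchPacer(100, 150, System::currentTimeMillis);
        for (int i = 0; i < 4; i++) {
            System.out.println("request " + i + " -> wait " + pacer.tryPass() + " ms");
        }
    }
}
```

A real caller would sleep for the returned wait before proceeding. Bounding that wait is what turns an unbounded queue into the "queue, then reject on timeout" behavior described above.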

Comparison of real-time metric statistics implementations

Before version 1.5, Hystrix implemented its sliding window with a ring array, updating the statistics of each bucket with locks and CAS operations.

Hystrix 1.5 refactored the real-time metrics implementation, abstracting the statistics structure into a reactive stream so that consumers can use the metric data conveniently. The underlying layer became an event-driven model based on RxJava: an event is published whenever a service call succeeds, fails, or times out, and a series of transformations and aggregations turns these events into a real-time stream of statistics that the circuit breaker or the dashboard can consume.

Sentinel currently abstracts metric statistics behind a Metric interface, so the underlying implementation can vary. The default implementation is a sliding window based on LeapArray, and other implementations such as reactive streams may be introduced later as needed.
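A time-indexed sliding window of this kind can be sketched as follows. This is not Sentinel's LeapArray code: SketchLeapCounter is a hypothetical, coarsely synchronized illustration of the general idea, in which the current time selects a slot in a fixed array and a slot left over from an earlier lap is reset before reuse. The clock is injected and assumed to be non-negative.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

// Time-indexed ring of buckets; a sketch of the sliding-window idea only.
final class SketchLeapCounter {
    private final long[] counts;      // one counter per slot
    private final long[] bucketStart; // start time of the bucket currently held by each slot
    private final long bucketSizeMs;
    private final LongSupplier clock; // injectable time source, in milliseconds

    SketchLeapCounter(int numBuckets, long bucketSizeMs, LongSupplier clock) {
        this.counts = new long[numBuckets];
        this.bucketStart = new long[numBuckets];
        Arrays.fill(bucketStart, Long.MIN_VALUE); // marks slots that were never used
        this.bucketSizeMs = bucketSizeMs;
        this.clock = clock;
    }

    synchronized void increment() {
        long now = clock.getAsLong();
        long start = now - (now % bucketSizeMs);
        int slot = (int) ((now / bucketSizeMs) % counts.length);
        if (bucketStart[slot] != start) {
            // the slot still holds a bucket from an earlier lap: reset and reuse it
            bucketStart[slot] = start;
            counts[slot] = 0;
        }
        counts[slot]++;
    }

    // Sum of all buckets that still fall inside the window ending at the current bucket.
    synchronized long sum() {
        long now = clock.getAsLong();
        long currentStart = now - (now % bucketSizeMs);
        long oldestStart = currentStart - (counts.length - 1) * bucketSizeMs;
        long total = 0;
        for (int i = 0; i < counts.length; i++) {
            if (bucketStart[i] >= oldestStart) {
                total += counts[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        SketchLeapCounter qps = new SketchLeapCounter(2, 500, System::currentTimeMillis);
        qps.increment();
        qps.increment();
        System.out.println("requests in the last window: " + qps.sum());
    }
}
```

Compared with the reactive-stream approach, nothing is emitted here: readers compute the window on demand from the array, which keeps the write path to a slot lookup and an increment.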

Comparison summary

Item | Sentinel | Hystrix | Notes
Isolation policy | Semaphore isolation (limit on concurrent threads; simulated semaphore) | Thread pool isolation / semaphore isolation | Sentinel does not create thread pools; its threads come from the Tomcat or Jetty container, so the container's thread count caps whatever limit is set in Sentinel. For example, if the Tomcat thread pool has 10 threads, setting 100 in Sentinel is meaningless, and isolation is weaker
Circuit breaking and degradation strategy | Based on response time, exception ratio, and number of exceptions | Based on exception ratio | Failing fast is an essential feature
Real-time statistics implementation | Sliding window (LeapArray) | Sliding window (based on RxJava) |
Dynamic rule configuration | Supports multiple data sources | Supports multiple data sources |
Extensibility | Multiple extension points | Plugin form |
Annotation support | Supported | Supported |
Rate limiting | Based on QPS; supports limiting based on call relationships | Limited support (number of concurrent threads or semaphore size) | Failing fast is an essential feature
Traffic shaping | Supports warm-up mode, constant-rate mode, and warm-up with queuing | Not supported (queuing) |
System adaptive protection | Supported (Linux/UNIX only) | Not supported | Sets a threshold for the maximum processing capacity a server allows
Console | Out-of-the-box console for configuring rules, viewing second-level monitoring, machine discovery, and more | Simple monitoring view of near real-time data | The console is a very competitive feature because limits are easier to configure centrally, but its data presentation and real-time behavior are less intuitive than Hystrix's
Configuration persistence | ZooKeeper, Apollo, Nacos, local files | Git/SVN/local files | The Sentinel client connects directly to the persistent store, so the application pulls in more dependencies, and the same store connection may carry multiple configurations
Dynamic configuration | Supported | Supported |
Black and white lists | Supported | Not supported |
Spring Cloud integration | High | Very high | Hystrix is more tightly integrated with Spring Boot
Overall advantages | Centralized configuration and monitoring, plus finer-grained control rules | Attractive UI, plus near real-time statistics | After Docker-based containerized deployment, Sentinel may be more useful

Author: Li Caiyun

Source-WeChat public account: Daojia trading platform technology

Source: https://mp.weixin.qq.com/s/TiuplYZBjV5u7h17G7fqhw
