Hystrix Basics

Preface
1. Five Key Points

1.1 What Is Hystrix?
1.2 What Is Hystrix For?
1.3 What Problem Does Hystrix Solve?
1.4 What Design Principles Underlie Hystrix?
1.5 How Does Hystrix Accomplish Its Goals?

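Preface

Everything below comes from the documentation on the official Hystrix website; I picked out these passages and translated them.

1. Five Key Points

1.1 What Is Hystrix?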

In a distributed environment, inevitably some of the many service dependencies will fail. Hystrix is a library that helps you control the interactions between these distributed services by adding latency tolerance and fault tolerance logic. Hystrix does this by isolating points of access between the services, stopping cascading failures across them, and providing fallback options, all of which improve your system’s overall resiliency.

History of Hystrix

Hystrix evolved out of resilience engineering work that the Netflix API team began in 2011. In 2012, Hystrix continued to evolve and mature, and many teams within Netflix adopted it. Today tens of billions of thread-isolated, and hundreds of billions of semaphore-isolated calls are executed via Hystrix every day at Netflix. This has resulted in a dramatic improvement in uptime and resilience.

1.2 What Is Hystrix For?

Hystrix is designed to do the following:

Give protection from and control over latency and failure from dependencies accessed (typically over the network) via third-party client libraries.

Stop cascading failures in a complex distributed system.

Fail fast and rapidly recover.

Fallback and gracefully degrade when possible.

Enable near real-time monitoring, alerting, and operational control.

1.3 What Problem Does Hystrix Solve?

Applications in complex distributed architectures have dozens of dependencies, each of which will inevitably fail at some point. If the host application is not isolated from these external failures, it risks being taken down with them.

For example, for an application that depends on 30 services where each service has 99.99% uptime, here is what you can expect:

99.99%^30 ≈ 99.7% uptime
0.3% of 1 billion requests = 3,000,000 failures
2+ hours downtime/month even if all dependencies have excellent uptime.
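The arithmetic above makes two assumptions: the 30 dependencies fail independently, and every request needs all of them. Under those assumptions, overall uptime is the product of the individual uptimes. The small Java sketch below reproduces the three figures. The class and method names are invented for illustration.

```java
// Hypothetical helper for the availability arithmetic above.
// Assumes independent failures and that every request needs all dependencies.
final class AvailabilityMath {

    // Probability that all n dependencies are up at the same moment.
    static double compositeUptime(double perDependencyUptime, int n) {
        return Math.pow(perDependencyUptime, n);
    }

    // Expected number of failed requests out of totalRequests.
    static double expectedFailures(double uptime, long totalRequests) {
        return (1.0 - uptime) * totalRequests;
    }

    // Expected downtime, in hours, over a month of daysInMonth days.
    static double downtimeHoursPerMonth(double uptime, int daysInMonth) {
        return (1.0 - uptime) * daysInMonth * 24.0;
    }

    public static void main(String[] args) {
        double uptime = compositeUptime(0.9999, 30); // 30 services at 99.99% each
        System.out.printf("composite uptime: %.2f%%%n", uptime * 100);
        System.out.printf("failures per 1e9 requests: %.0f%n",
                expectedFailures(uptime, 1_000_000_000L));
        System.out.printf("downtime per 30-day month: %.2f hours%n",
                downtimeHoursPerMonth(uptime, 30));
    }
}
```

Running main prints three results:

- a composite uptime of 99.70%;
- just under 3,000,000 failures per billion requests (the text rounds the failure rate to exactly 0.3%);
- about 2.16 hours of downtime in a 30-day month, which is where the "2+ hours" figure comes from.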

Reality is generally worse.

Even when all dependencies perform well, the aggregate impact of even 0.01% downtime on each of dozens of services equates to potentially hours a month of downtime if you do not engineer the whole system for resilience.

…

All of these represent failure and latency that needs to be isolated and managed so that a single failing dependency can’t take down an entire application or system.

1.4 What Design Principles Underlie Hystrix?

Hystrix works by:

Preventing any single dependency from using up all container (such as Tomcat) user threads.
Shedding load and failing fast instead of queueing.
Providing fallbacks wherever feasible to protect users from failure.
Using isolation techniques (such as bulkhead, swimlane, and circuit breaker patterns) to limit the impact of any one dependency.
Optimizing for time-to-discovery through near real-time metrics, monitoring, and alerting.
Optimizing for time-to-recovery by means of low latency propagation of configuration changes and support for dynamic property changes in most aspects of Hystrix, which allows you to make real-time operational modifications with low latency feedback loops.
Protecting against failures in the entire dependency client execution, not just in the network traffic.
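Two of the principles above can be illustrated with nothing but the JDK:

- no single dependency may use up the container's threads;
- excess load is shed and fails fast rather than being queued.

The sketch gives each dependency its own small ThreadPoolExecutor whose queue has no capacity, so a saturated pool rejects new work immediately. It is a hypothetical helper, not how Hystrix builds its thread pools, and the dependency name inventory-service is invented.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical bulkhead helper (not Hystrix code): one small, bounded pool per dependency.
final class Bulkhead {

    // poolSize threads plus a SynchronousQueue (a queue with no capacity): when every
    // thread is busy, execute() throws RejectedExecutionException instead of queueing.
    static ThreadPoolExecutor newBulkhead(String dependency, int poolSize) {
        return new ThreadPoolExecutor(
                poolSize, poolSize,
                0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(),
                task -> {
                    Thread t = new Thread(task, dependency + "-worker");
                    t.setDaemon(true); // do not keep the JVM alive
                    return t;
                },
                new ThreadPoolExecutor.AbortPolicy());
    }

    // Submits `attempts` tasks that hang until released; returns {accepted, rejected}.
    static int[] saturate(int poolSize, int attempts) {
        ThreadPoolExecutor pool = newBulkhead("inventory-service", poolSize);
        CountDownLatch release = new CountDownLatch(1);
        int accepted = 0;
        int rejected = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // simulate a dependency that hangs
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                accepted++;
            } catch (RejectedExecutionException e) {
                rejected++; // fail fast; a real caller would run its fallback here
            }
        }
        release.countDown();
        pool.shutdown();
        return new int[] {accepted, rejected};
    }

    public static void main(String[] args) {
        int[] r = saturate(2, 5);
        System.out.println("accepted=" + r[0] + " rejected=" + r[1]);
    }
}
```

The example should print accepted=2 rejected=3. The two hanging calls tie up only their own pool's two threads. The other three callers get a RejectedExecutionException at once instead of waiting in line, so they can fall back immediately.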

1.5 How Does Hystrix Accomplish Its Goals?

Hystrix does this by:

Wrapping all calls to external systems (or “dependencies”) in a HystrixCommand or HystrixObservableCommand object which typically executes within a separate thread (this is an example of the command pattern).
Timing-out calls that take longer than thresholds you define. There is a default, but for most dependencies you custom-set these timeouts by means of “properties” so that they are slightly higher than the measured 99.5th percentile performance for each dependency.
Maintaining a small thread-pool (or semaphore) for each dependency; if it becomes full, requests destined for that dependency will be immediately rejected instead of queued up.
Measuring successes, failures (exceptions thrown by client), timeouts, and thread rejections.
Tripping a circuit-breaker to stop all requests to a particular service for a period of time, either manually or automatically if the error percentage for the service passes a threshold.
Performing fallback logic when a request fails, is rejected, times-out, or short-circuits.
Monitoring metrics and configuration changes in near real-time.
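Four items in the list above can be sketched together in plain Java:

- wrapping the call;
- running it on the dependency's own small pool;
- timing it out;
- falling back when it fails, times out, or is rejected.

MiniCommand below is a hypothetical stand-in that only mimics the general shape of HystrixCommand (a run step plus a fallback). It is not Hystrix's API, and it leaves out metrics and the circuit breaker.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical mini command (not the HystrixCommand API): wraps one dependency call.
final class MiniCommand<T> {
    private final ExecutorService pool;  // the dependency's own bounded pool
    private final long timeoutMs;        // e.g. slightly above the measured 99.5th percentile
    private final Callable<T> run;       // the actual call to the dependency
    private final Supplier<T> fallback;  // value returned when the call cannot be used

    MiniCommand(ExecutorService pool, long timeoutMs, Callable<T> run, Supplier<T> fallback) {
        this.pool = pool;
        this.timeoutMs = timeoutMs;
        this.run = run;
        this.fallback = fallback;
    }

    T execute() {
        Future<T> future;
        try {
            future = pool.submit(run);          // run on the pool, not on the caller's thread
        } catch (RejectedExecutionException e) {
            return fallback.get();              // pool saturated: reject instead of queueing
        }
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);                // interrupts the worker; the call may ignore it
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get();              // run threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        }
    }

    static String[] demo() {
        // Two threads, zero queue capacity, default AbortPolicy (rejects when saturated).
        ExecutorService pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<Runnable>());
        try {
            String ok = new MiniCommand<String>(pool, 500,
                    () -> "live price", () -> "cached price").execute();
            String slow = new MiniCommand<String>(pool, 500, () -> {
                Thread.sleep(5_000);            // simulated hang, far longer than the timeout
                return "too late";
            }, () -> "cached price").execute();
            String broken = new MiniCommand<String>(pool, 500, () -> {
                throw new IllegalStateException("HTTP 503");
            }, () -> "cached price").execute();
            return new String[] {ok, slow, broken};
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", demo()));
    }
}
```

demo() should return live price, cached price, cached price:

- the first call succeeds;
- the second exceeds the 500 ms timeout;
- the third throws.

In both failing cases the caller gets the fallback value within the timeout instead of blocking on the dependency. Note that cancel(true) only interrupts the worker thread. A client library that ignores interrupts keeps running, which is one reason each dependency needs its own bounded pool.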

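The circuit-breaker item in the 1.5 list, along with the success and failure counting it depends on, can be sketched as a small state machine with three states:

- closed, while the error rate is acceptable;
- open, once the error percentage crosses a threshold over a minimum number of calls; every call then short-circuits to the fallback;
- half-open, after a sleep window, when a single trial call decides whether to close again.

This is a simplified, hypothetical class, not Hystrix's implementation. It counts calls since the breaker last closed, rather than over Hystrix's rolling statistics window. The values used in main (20 calls, 50%, 5000 ms) are my understanding of Hystrix's documented defaults for circuitBreaker.requestVolumeThreshold, circuitBreaker.errorThresholdPercentage and circuitBreaker.sleepWindowInMilliseconds. Check the Hystrix configuration documentation before relying on them.

```java
import java.util.function.LongSupplier;

// Hypothetical, simplified breaker (not Hystrix's implementation): it counts calls since
// the breaker last closed instead of using a rolling, bucketed metrics window.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold; // minimum calls before the error rate is trusted
    private final int errorThresholdPercent;  // trip when the error rate reaches this value
    private final long sleepWindowMs;         // how long to short-circuit before one trial call
    private final LongSupplier clockMs;       // injectable clock so the logic can be tested

    private State state = State.CLOSED;
    private int successes;
    private int failures;
    private long openedAtMs;

    SimpleCircuitBreaker(int requestVolumeThreshold, int errorThresholdPercent,
                         long sleepWindowMs, LongSupplier clockMs) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercent = errorThresholdPercent;
        this.sleepWindowMs = sleepWindowMs;
        this.clockMs = clockMs;
    }

    // false means "short-circuit": skip the dependency and go straight to the fallback.
    synchronized boolean allowRequest() {
        switch (state) {
            case CLOSED:
                return true;
            case OPEN:
                if (clockMs.getAsLong() - openedAtMs >= sleepWindowMs) {
                    state = State.HALF_OPEN; // let exactly one trial request through
                    return true;
                }
                return false;
            default:
                return false; // HALF_OPEN: the trial request is already in flight
        }
    }

    synchronized void recordSuccess() {
        if (state == State.HALF_OPEN) {
            state = State.CLOSED; // trial succeeded: close and start counting afresh
            successes = 0;
            failures = 0;
        } else if (state == State.CLOSED) {
            successes++;
        }
    }

    synchronized void recordFailure() {
        if (state == State.HALF_OPEN) {
            trip(); // trial failed: stay open for another sleep window
        } else if (state == State.CLOSED) {
            failures++;
            int total = successes + failures;
            if (total >= requestVolumeThreshold
                    && failures * 100 >= errorThresholdPercent * total) {
                trip();
            }
        }
    }

    synchronized State state() {
        return state;
    }

    private void trip() {
        state = State.OPEN;
        openedAtMs = clockMs.getAsLong();
    }

    public static void main(String[] args) {
        // 20 calls / 50% / 5000 ms: assumed to match Hystrix's default settings.
        SimpleCircuitBreaker breaker =
                new SimpleCircuitBreaker(20, 50, 5_000, System::currentTimeMillis);
        for (int i = 0; i < 20; i++) {
            breaker.recordFailure();
        }
        System.out.println(breaker.state() + " allow=" + breaker.allowRequest());
    }
}
```

main should print OPEN allow=false. To use the breaker, a caller would:

1. Call allowRequest() before invoking the dependency.
2. Go straight to its fallback when allowRequest() returns false.
3. Otherwise, make the call and report the outcome with recordSuccess() or recordFailure().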