
Kubernetes Event-Driven Autoscaling Best Practices: Getting to Know KEDA

Author: Not bald programmer

What is KEDA?

KEDA (Kubernetes-based Event-Driven Autoscaler) is a powerful event-driven autoscaler for Kubernetes. It supports scaling not only on basic CPU and memory metrics, but also on the length of various message queues, statistics in a database, QPS, Cron schedules, and almost any other metric you can imagine; it can even scale replicas down to 0.
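
As a first taste, here is a minimal ScaledObject sketch that scales a Deployment on QPS from Prometheus and allows scale-to-zero. The Deployment name, Prometheus address, and query are illustrative assumptions, not values from this article:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-scaledobject
spec:
  scaleTargetRef:
    name: web                      # assumed Deployment name
  minReplicaCount: 0               # idle replica count; 0 enables scale-to-zero
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090  # assumed address
        query: sum(rate(http_requests_total{app="web"}[2m]))  # assumed QPS query
        threshold: "100"           # target QPS per replica
```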

The project was accepted by the CNCF in March 2020, entered incubation in August 2021, and graduated in August 2023.

Why do we need KEDA?

HPA is the pod horizontal autoscaler that ships with Kubernetes. It can only scale workloads according to monitoring metrics, mainly the workload's CPU and memory utilization (Resource Metrics), though monitoring data in Prometheus can also be fed to HPA as custom metrics via prometheus-adapter. In theory, KEDA's functionality could also be built with HPA + prometheus-adapter, but the implementation would be very troublesome. For example, to scale according to the number of pending tasks recorded in a database task table, you would need to write and deploy an exporter application that converts the statistics into metrics and exposes them for Prometheus to scrape, and then have prometheus-adapter query the number of pending tasks from Prometheus to decide whether to scale.
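
To make that DIY path concrete, here is roughly what the final HPA would look like once the exporter and prometheus-adapter are in place; the metric name pending_tasks and the Deployment name are assumed names that prometheus-adapter would have to expose via the External Metrics API:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: task-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: task-worker              # assumed consumer Deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: pending_tasks      # assumed metric served by prometheus-adapter
        target:
          type: AverageValue
          averageValue: "5"        # target pending tasks per replica
```

And this is only the last step: the exporter and the prometheus-adapter rules still have to be written and maintained yourself.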

KEDA is designed mainly to solve the problem that HPA cannot scale on flexible event sources. It has dozens of common Scalers built in that interface directly with third-party systems, such as various open-source and cloud-hosted relational databases, time-series databases, document databases, key-value stores, message queues, and event buses, and it can also scale on Cron schedules. If you find that the event source you need is not supported, you can implement an external Scaler to use with KEDA.
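
With KEDA, the task-table example above shrinks to a single trigger using the built-in MySQL Scaler. A sketch, where the table, query, and connection details are assumptions for illustration:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: task-worker-scaledobject
spec:
  scaleTargetRef:
    name: task-worker              # assumed consumer Deployment
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: mysql
      metadata:
        # assumed: connection string stored in an env var of the target workload
        connectionStringFromEnv: MYSQL_CONN_STR
        query: "SELECT COUNT(*) FROM tasks WHERE status = 'pending'"  # assumed table
        queryValue: "5"            # target pending tasks per replica
```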

How KEDA works

KEDA is not intended to replace HPA, but to complement and enhance it; in fact, KEDA often works together with HPA. This is the official KEDA architecture diagram:

(Figure: official KEDA architecture diagram)
  • When scaling the workload between its idle replica count and the normal range, KEDA acts by directly modifying the workload's replica count (the idle replica count is lower than minReplicaCount and may be 0, i.e., the workload can be scaled to zero).
  • In all other cases, scaling is performed by an HPA that KEDA creates and manages automatically; this HPA uses External Metrics as its data source, and the backend data for those External Metrics is served by KEDA.
  • At their core, KEDA's Scalers expose data to HPA in the External Metrics format: KEDA converts external events into the required External Metrics data, and the HPA then performs the actual autoscaling from that data, directly reusing HPA's existing capabilities. So if you want to control the details of scaling behavior (such as scaling up fast and scaling down slowly), you can configure the HPA's behavior field directly (requires Kubernetes >= 1.18); see the sketch after this list.
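
With KEDA you don't edit the managed HPA by hand; the ScaledObject passes the behavior field through for you via its advanced section. A sketch, with illustrative policy values and an assumed Prometheus trigger:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-scaledobject
spec:
  scaleTargetRef:
    name: web                      # assumed Deployment
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:                    # standard HPA v2 behavior, passed through by KEDA
        scaleUp:
          policies:
            - type: Percent
              value: 100           # allow doubling every 15s: scale up fast
              periodSeconds: 15
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 10            # shed at most 10% per minute: scale down slowly
              periodSeconds: 60
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090  # assumed
        query: sum(rate(http_requests_total{app="web"}[2m]))  # assumed
        threshold: "100"
```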

In addition to scaling workloads, KEDA can also create Jobs automatically based on the number of queued tasks, so that tasks are processed promptly.

(Figure: KEDA creating Jobs to process queued tasks)
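
A ScaledJob sketch of this pattern, assuming a RabbitMQ queue named tasks and a hypothetical worker image that consumes one task and exits:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: task-processor
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: worker
            image: example.com/task-worker:latest  # hypothetical image
        restartPolicy: Never
  pollingInterval: 30              # check the queue every 30s
  maxReplicaCount: 20              # at most 20 Jobs in flight
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  triggers:
    - type: rabbitmq
      metadata:
        hostFromEnv: RABBITMQ_HOST # assumed env var with amqp:// connection string
        queueName: tasks           # assumed queue name
        mode: QueueLength
        value: "1"                 # one Job per queued task
```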

Which scenarios are suitable for KEDA?

Here's a list of scenarios where KEDA is a good fit.

Multi-level invocation of microservices

Microservice business scenarios almost always involve multiple levels of calls, with pressure passed down level by level, as shown below:

(Figure: multi-level microservice call chain)

If you use traditional HPA to scale based on load, after user traffic enters the cluster:

  1. The load on Deploy A rises, and the metric change forces Deploy A to scale out.
  2. After A scales out, its throughput increases, B comes under pressure, and once the metric changes are collected again, Deploy B scales out.
  3. B's throughput increases, C comes under pressure, and Deploy C scales out.

This step-by-step process is not only slow but also dangerous: each level scales only after its CPU or memory has already spiked, so at every level there is a real chance of being overwhelmed before the scale-out completes. This passive, lagging approach is clearly problematic.

At this point, we can use KEDA to achieve multi-level rapid scaling:

  • Deploy A can scale on metrics such as its own load or the QPS recorded at the gateway.
  • Deploy B and Deploy C can scale on the replica count of Deploy A (each level of service keeping a fixed proportion of its upstream's replica count), as shown in the sketch after this list.
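
KEDA's Kubernetes Workload Scaler does exactly this: it scales one workload on the pod count of another. A sketch for Deploy B, where the label selector and ratio are illustrative assumptions:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: deploy-b-scaledobject
spec:
  scaleTargetRef:
    name: deploy-b
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: kubernetes-workload
      metadata:
        podSelector: app=deploy-a  # assumed label on Deploy A's pods
        value: "2"                 # one replica of B for every 2 replicas of A
```

Because B and C scale the moment A's replica count changes, they no longer wait for their own CPU or memory to spike before reacting.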

Task Execution (Producers vs. Consumers)

If you have long-running compute tasks, such as data analysis, ETL, or machine learning, whose work items are taken from a message queue or a database for execution, you need to scale the consumers according to the number of pending tasks.

(Figure: producers writing tasks to a queue, consumers scaled on queue length)
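
A sketch of scaling the consumer Deployment on queue length with the RabbitMQ Scaler (all names assumed); for tasks too long-lived to run inside a Deployment, the ScaledJob shown earlier is the alternative:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: consumer-scaledobject
spec:
  scaleTargetRef:
    name: consumer                 # assumed consumer Deployment
  minReplicaCount: 0               # no tasks, no consumers
  maxReplicaCount: 30
  triggers:
    - type: rabbitmq
      metadata:
        hostFromEnv: RABBITMQ_HOST # assumed env var with the connection string
        queueName: tasks           # assumed queue name
        mode: QueueLength
        value: "10"                # target of 10 queued tasks per replica
```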

Periodicity

If your business has periodic peaks and troughs, you can use KEDA to scale out in advance before the peak arrives, and then scale in slowly after the peak passes.
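
A Cron Scaler sketch that scales out ahead of a daily evening peak; the schedule, time zone, and replica counts are illustrative assumptions:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-cron-scaledobject
spec:
  scaleTargetRef:
    name: web                      # assumed Deployment
  minReplicaCount: 2               # baseline outside the peak window
  maxReplicaCount: 50
  triggers:
    - type: cron
      metadata:
        timezone: Asia/Shanghai    # assumed time zone
        start: "0 19 * * *"        # scale out at 19:00, before the assumed 20:00 peak
        end: "0 23 * * *"          # hold until 23:00
        desiredReplicas: "20"      # assumed peak capacity
```

How gently replicas are shed after the window closes can be tuned with the HPA behavior field shown earlier.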