Question 1: Why customize the HPA?
Traditionally, on both OCP and K8s, HPA is driven by CPU utilization. Memory utilization can also drive HPA, but it is less effective than CPU (a Java application's memory footprint does not fluctuate as visibly as its CPU usage).
Scaling on CPU utilization alone is too one-dimensional, hence the need for a custom HPA, for example one driven by HTTP request volume, which tracks the application's real load much more closely.
Question 2: How long do HPA scale-up and scale-down take?
For a CPU-based HPA, pods are added as soon as utilization is detected above the threshold. From a business standpoint, the more responsive the scale-up, the better.
But when CPU utilization falls back below the threshold, should the HPA scale down immediately?
From a business standpoint, certainly not. What we want is a buffer period: "let the bullet fly for a while."
On both OCP and K8s, the HPA scale-down buffer (downscale stabilization window) defaults to 5 minutes. The relevant kube-controller-manager flags and their defaults are:
--horizontal-pod-autoscaler-cpu-initialization-period = 5 minutes
--horizontal-pod-autoscaler-downscale-stabilization = 5 minutes
--horizontal-pod-autoscaler-initial-readiness-delay = 30 seconds
--horizontal-pod-autoscaler-sync-period = 15 seconds
--horizontal-pod-autoscaler-tolerance = 0.1
These parameters can be tuned, and since Kubernetes 1.18 the autoscaling/v2beta2 API even allows per-HPA overrides via the behavior field. But that API is still in beta, so on OCP I don't recommend touching it: if a change doesn't take effect, or breaks something, Red Hat support (the 800 hotline) won't necessarily back you up. Besides, 5 minutes is usually plenty. It's enough to know the option exists; once the API goes GA there will be time enough to use it.
Here is what such a modification would look like:
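A minimal sketch, assuming Kubernetes 1.18+ with the autoscaling/v2beta2 API; the target resource and all values here are illustrative, not taken from the lab:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-autoscale
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # per-HPA override of the 5-minute default buffer
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60              # remove at most one pod per minute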

Question 3: How to build a custom HPA on OCP
OCP ships with Prometheus by default, but for custom metrics I recommend deploying a separate Prometheus instance and letting the built-in one keep doing its default job. Strictly speaking, a custom HPA does not require a new Prometheus; the monitoring stack that ships with OCP would also work. But following the principle of separating business monitoring from infrastructure monitoring, it is cleaner to stand up a dedicated Prometheus for application-level metrics, such as the HTTP request count used in this lab.
Prometheus fundamentally works by scraping metrics from applications. In other words, for Prometheus to collect a given application metric, the application must expose that metric (either instrumented directly in the code, or via a sidecar exporter).
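As a quick sanity check, assuming the application exposes its metrics on port 8080 at the conventional /metrics path (an assumption, not something the lab states), you could verify the counter from inside the cluster; <app-pod> is a placeholder for an actual pod name, and the output shown is just the expected Prometheus exposition-format shape:

$ oc exec <app-pod> -n my-hpa -- curl -s http://localhost:8080/metrics | grep http_requests
# HELP http_requests_total Total HTTP requests served
# TYPE http_requests_total counter
http_requests_total 23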
To drive HPA on OCP from a custom metric such as http_requests, we need:
1. The application to expose the metric; otherwise Prometheus has nothing to scrape.
2. A Prometheus adapter deployment that knows about the http_requests metric. Once deployed, the adapter serves an API: Prometheus scrapes the application's http_requests data, the adapter reads that data from Prometheus and republishes it through the custom metrics API, and the HPA talks to that API. The adapter is the bridge that lets the HPA consume the http_requests metric.
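The translation from the Prometheus series http_requests_total to the HPA-facing metric http_requests is defined by a rule in the adapter's configuration. A sketch of what such a rule typically looks like in a k8s-prometheus-adapter ConfigMap; the actual rule shipped in this lab's adapter-config may differ:

rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}"    # http_requests_total -> http_requests
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'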
The logic of the steps below is:
a new HPA =====> a new metric =====> a new Prometheus adapter =====> a new Prometheus instance =====> a new ServiceMonitor instance =====> a new namespace =====> a new monitored application
The configuration steps are as follows:
Deploy the Prometheus Operator through the web console:
$ oc new-project my-prometheus
In the OCP OperatorHub, install the Prometheus Operator into the my-prometheus project, then confirm it under Operators > Installed Operators > Prometheus Operator.
Create a ServiceMonitor instance. It watches pods based on a matchLabels selector, and is itself picked up by a Prometheus instance based on the ServiceMonitor's own labels:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: pod-autoscale
  labels:
    lab: custom-hpa
spec:
  namespaceSelector:
    matchNames:
    - my-prometheus
    - my-hpa
  selector:
    matchLabels:
      app: pod-autoscale
  endpoints:
  - port: 8080-tcp
    interval: 30s
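For this selector to match anything, the application's Service in my-hpa must carry the app: pod-autoscale label and expose a port named 8080-tcp. A hypothetical sketch of such a Service (in the lab, the application's existing Service may already satisfy this):

apiVersion: v1
kind: Service
metadata:
  name: pod-autoscale
  namespace: my-hpa
  labels:
    app: pod-autoscale   # matched by the ServiceMonitor's selector
spec:
  selector:
    app: pod-autoscale
  ports:
  - name: 8080-tcp       # the port name referenced by the ServiceMonitor endpoint
    port: 8080
    targetPort: 8080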
Create a Prometheus instance:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: my-prometheus
  namespace: my-prometheus
  labels:
    prometheus: my-prometheus
spec:
  replicas: 2
  serviceAccountName: prometheus-k8s
  securityContext: {}
  serviceMonitorSelector:
    matchLabels:
      lab: custom-hpa
Create a route for Prometheus and look up its hostname:
$ oc expose svc prometheus-operated -n my-prometheus
route.route.openshift.io/prometheus-operated exposed
$ oc get route prometheus-operated -o jsonpath='{.spec.host}{"\n"}' -n my-prometheus
prometheus-operated-my-prometheus.apps.weixinyucluster.bluecat.ltd
With the ServiceMonitor and Prometheus instance deployed, we should be able to query the http_requests_total metric in the Prometheus UI, but there is no data yet. Two key pieces are missing:
1. Prometheus does not have the RBAC permissions needed to query other namespaces.
2. There is no adapter that translates Prometheus metrics into something the Kubernetes HPA can consume.
The RBAC problem is solved first, by granting the ServiceAccount that Prometheus uses in the my-prometheus namespace appropriate access to the my-hpa namespace:
$ echo "---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: my-prometheus-hpa
  namespace: my-hpa
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: my-prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view" | oc create -f -
Return to the Prometheus UI and run the http_requests_total query again; this time you should see results. If they don't show up immediately, give it a moment.
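Equivalently, you can hit Prometheus's HTTP query API through the route created earlier; jq is assumed to be installed (it is used elsewhere in this lab):

$ PROM_ROUTE=$(oc get route prometheus-operated -n my-prometheus -o jsonpath='{.spec.host}')
$ curl -s "http://$PROM_ROUTE/api/v1/query?query=http_requests_total" | jq '.data.result'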
With Prometheus working, it can now be wired into Kubernetes so that the HPA can act on custom metrics. The objects to create are listed below (see the APIService sketch after the list):
APIService
ServiceAccount
ClusterRole - custom metrics-server-resources
ClusterRole - custom-metrics-resource-reader
ClusterRoleBinding - custom-metrics:system:auth-delegator
ClusterRoleBinding - custom-metrics-resource-reader
ClusterRoleBinding - hpa-controller-custom-metrics
RoleBinding - custom-metrics-auth-reader
Secret
ConfigMap
Deployment
Service
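Among these, the APIService is the piece that registers the adapter with the Kubernetes API aggregation layer under custom.metrics.k8s.io. A rough sketch of what it defines, inferred from the creation output below rather than from the lab's YAML file itself:

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: my-metrics-apiserver   # the adapter's Service
    namespace: my-prometheus
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true    # assumption; a hardened setup would pin caBundle instead
  groupPriorityMinimum: 100
  versionPriority: 100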
Create all of the objects:
$ oc create -f https://raw.githubusercontent.com/redhat-gpte-devopsautomation/ocp_advanced_deployment_resources/master/ocp4_adv_deploy_lab/custom_hpa/custom_adapter_kube_objects.yaml
apiservice.apiregistration.k8s.io/v1beta1.custom.metrics.k8s.io created
serviceaccount/my-metrics-apiserver created
clusterrole.rbac.authorization.k8s.io/my-metrics-server-resources created
clusterrole.rbac.authorization.k8s.io/my-metrics-resource-reader created
clusterrolebinding.rbac.authorization.k8s.io/my-metrics:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/my-metrics-resource-reader created
clusterrolebinding.rbac.authorization.k8s.io/my-hpa-controller-custom-metrics created
rolebinding.rbac.authorization.k8s.io/my-metrics-auth-reader created
secret/cm-adapter-serving-certs created
configmap/adapter-config created
deployment.apps/custom-metrics-apiserver created
service/my-metrics-apiserver created
Verify that the APIService has been created:
$ oc get apiservice v1beta1.custom.metrics.k8s.io
NAME SERVICE AVAILABLE AGE
v1beta1.custom.metrics.k8s.io my-prometheus/my-metrics-apiserver True 19s
Verify that the API exposes the pods/http_requests metric:
$ oc get --raw /apis/custom.metrics.k8s.io/v1beta1/ | jq -r '.resources[] | select(.name | contains("pods/http"))'
{
  "name": "pods/http_requests",
  "singularName": "",
  "namespaced": true,
  "kind": "MetricValueList",
  "verbs": [
    "get"
  ]
}
================================================================
Verify the custom HPA against the application
$ echo "---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: pod-autoscale-custom
  namespace: my-hpa
spec:
  scaleTargetRef:
    kind: DeploymentConfig
    name: pod-autoscale
    apiVersion: apps.openshift.io/v1
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metricName: http_requests
      targetAverageValue: 500m" | oc create -f -
horizontalpodautoscaler.autoscaling/pod-autoscale-custom created
Here 500m is Kubernetes quantity notation for 0.5, i.e. the HPA scales out once the average http_requests value per pod exceeds 0.5. To generate load, open another SSH terminal and run:
$ AUTOSCALE_ROUTE=$(oc get route pod-autoscale -n my-hpa -o jsonpath='{ .spec.host}')
$ while true;do curl http://$AUTOSCALE_ROUTE;sleep .5;done
Hello! My name is pod-autoscale-2-pvrw8. I have served 19 requests so far.
Hello! My name is pod-autoscale-2-pvrw8. I have served 20 requests so far.
Hello! My name is pod-autoscale-2-pvrw8. I have served 21 requests so far.
Hello! My name is pod-autoscale-2-pvrw8. I have served 22 requests so far.
Hello! My name is pod-autoscale-2-pvrw8. I have served 23 requests so far.
Check the status of the HPA:
$ oc describe hpa pod-autoscale-custom -n my-hpa
Name: pod-autoscale-custom
Namespace: my-hpa
Labels: <none>
Annotations: <none>
CreationTimestamp: Fri, 31 Jul 2020 12:58:08 +0000
Reference: DeploymentConfig/pod-autoscale
Metrics: ( current / target )
"http_requests" on pods: 2 / 500m
Min replicas: 1
Max replicas: 5
DeploymentConfig pods: 1 current / 4 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededRescale the HPA controller was able to update the target scale to 4
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from pods metric http_requests
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 8s (x3 over 38s) horizontal-pod-autoscaler New size: 4; reason: pods metric http_requests above target
Confirm that the pods have scaled out, and that the stated reason is: pods metric http_requests above target
$ oc get pods -n my-hpa
NAME READY STATUS RESTARTS AGE
pod-autoscale-1-deploy 0/1 Completed 0 26m
pod-autoscale-2-2vrgc 0/1 ContainerCreating 0 1s
pod-autoscale-2-deploy 0/1 Completed 0 24m
pod-autoscale-2-dqdrg 0/1 ContainerCreating 0 1s
pod-autoscale-2-pvrw8 1/1 Running 0 24m
pod-autoscale-2-t52hd 0/1 ContainerCreating 0 1s
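To tie this back to Question 2: stop the load-generation loop and keep watching the HPA; after the 5-minute downscale stabilization window the replica count should drop back toward minReplicas. A quick way to observe this:

$ oc get hpa pod-autoscale-custom -n my-hpa -w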