
Implementing a Custom HPA with a Custom Prometheus

Question 1: Why customize the HPA?

Until now, both OCP and K8s have driven the HPA off CPU utilization. Memory utilization can also drive the HPA, but it is less effective than CPU (a Java application's memory usage does not move as visibly as its CPU usage).

Scaling on CPU utilization alone is too one-dimensional, so a custom HPA is needed: for example, scaling on the HTTPS request rate, which tracks the application's real behavior much more closely.

Question 2: What are the HPA scale-up and scale-down delays?

For a CPU-based HPA, pods are added as soon as utilization is detected above the threshold. From a business perspective, the more responsive the scale-up, the better.

So when CPU utilization falls back below the HPA threshold, does it scale down immediately?

From a business perspective, definitely not. A buffer period is needed first: "let the bullet fly for a while."

In both OCP and K8s, the HPA scale-down buffer is 5 minutes. The relevant controller flags and their defaults are:

--horizontal-pod-autoscaler-cpu-initialization-period = 5 minutes

--horizontal-pod-autoscaler-downscale-stabilization = 5 minutes

--horizontal-pod-autoscaler-initial-readiness-delay = 30 seconds

--horizontal-pod-autoscaler-sync-period = 15 seconds

--horizontal-pod-autoscaler-tolerance = 0.1

The HPA flags above can be overridden, but doing this per HPA requires the autoscaling/v2beta2 API, which is only available on Kubernetes 1.18 and later. Since that API is still in beta, I do not recommend changing these values on OCP: if a change does not take effect, it will not be covered by (800) support, and the 5-minute default is usually long enough. It is enough to know the option exists; wait until the API goes GA before relying on it.

For reference, the way to modify it looks like this:
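A sketch only (not tested on OCP): on Kubernetes 1.18+ the scale-down window can be overridden per HPA via the autoscaling/v2beta2 `behavior` field instead of the cluster-wide controller flags. The object names below are illustrative, not from this lab:

```yaml
# Illustrative per-HPA override of the scale-down stabilization window
# (the cluster-wide default is 300s). Requires autoscaling/v2beta2 (K8s 1.18+).
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa            # assumed name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app          # assumed target
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # wait 10 minutes before scaling down
```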


Question 3: How to customize the HPA on OCP

OCP ships with Prometheus by default, but for custom metrics I recommend deploying a separate Prometheus and letting the built-in one keep doing its default job. Strictly speaking, a custom HPA does not require a new Prometheus; the OCP built-in monitoring stack would work too. But following the principle of separating business monitoring from infrastructure monitoring, a dedicated Prometheus can be created specifically to collect application-level metrics, such as the HTTP request metric used in this lab.

In essence, Prometheus scrapes metrics from applications. That is, if we want to read an application metric through Prometheus, the application must expose that metric (instrumented in the application code, or via a sidecar exporter).
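For illustration only (the handler, port, and counter name are assumptions, not the lab application): a minimal app that exposes a counter in the Prometheus text exposition format on /metrics, which is all a scrape target has to do:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUESTS = 0  # hypothetical request counter, incremented per app request


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        global REQUESTS
        if self.path == "/metrics":
            # Prometheus text exposition format: "# TYPE" line plus samples.
            body = ("# TYPE http_requests_total counter\n"
                    f"http_requests_total {REQUESTS}\n").encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.end_headers()
            self.wfile.write(body)
        else:
            REQUESTS += 1  # count every non-metrics request
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"hello\n")

    def log_message(self, *args):  # silence per-request logging
        pass


# Bind an ephemeral port and serve in the background.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

urllib.request.urlopen(f"http://127.0.0.1:{port}/").read()  # one app request
metrics = urllib.request.urlopen(f"http://127.0.0.1:{port}/metrics").read().decode()
print(metrics)
server.shutdown()
```

In a real deployment Prometheus would scrape this /metrics endpoint on the interval configured in the ServiceMonitor.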


To implement HPA in OCP on a custom metric such as http_requests, we need two things:

1. The application must expose the metric; otherwise Prometheus has nothing to scrape.

2. A Prometheus adapter must be deployed that serves the http_requests metric. Once deployed, the adapter exposes an API. After Prometheus scrapes the application's http_requests data, it becomes available through the adapter; the HPA then talks to the adapter's API, and only then can it scale on http_requests. The adapter is the bridge that lets the HPA consume the http_requests metric.
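For reference, the adapter's bridging is driven by rules in its ConfigMap (adapter-config in this lab). A typical rule, shown here as an illustrative sketch rather than the exact lab config, maps a scraped http_requests_total counter to an http_requests rate metric:

```yaml
# Illustrative prometheus-adapter rule (not the exact lab ConfigMap):
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}"                # exposes the counter as "http_requests"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```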

The logic of the steps below is:

a new HPA -> a new metric -> a new Prometheus adapter -> a new Prometheus instance -> a new ServiceMonitor instance -> a new namespace -> a new monitored application

The configuration steps are as follows:

Deploy the Prometheus Operator through the web console:

$ oc new-project my-prometheus

In the OCP OperatorHub, install the Prometheus Operator into the my-prometheus project, then confirm it under Operators > Installed Operators > Prometheus Operator.

Create a ServiceMonitor instance, which will:

- watch pods based on a matchLabels selector
- be watched by a Prometheus instance based on the ServiceMonitor's own labels

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: pod-autoscale
  labels:
    lab: custom-hpa
spec:
  namespaceSelector:
    matchNames:
      - my-prometheus
      - my-hpa
  selector:
    matchLabels:
      app: pod-autoscale
  endpoints:
  - port: 8080-tcp
    interval: 30s
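For this ServiceMonitor to find anything, the application's Service must carry the app: pod-autoscale label and a port named 8080-tcp. A sketch of such a Service (the pod selector is assumed, not taken from the lab):

```yaml
# Illustrative Service that the ServiceMonitor above would match:
apiVersion: v1
kind: Service
metadata:
  name: pod-autoscale
  namespace: my-hpa
  labels:
    app: pod-autoscale              # matched by spec.selector.matchLabels
spec:
  selector:
    deploymentconfig: pod-autoscale # assumed pod selector
  ports:
  - name: 8080-tcp                  # matched by spec.endpoints[0].port
    port: 8080
    targetPort: 8080
```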

Create a Prometheus instance:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: my-prometheus
  labels:
    prometheus: my-prometheus
  namespace: my-prometheus
spec:
  replicas: 2
  serviceAccountName: prometheus-k8s
  securityContext: {}
  serviceMonitorSelector:
    matchLabels:
      lab: custom-hpa

Create and check the route to Prometheus:

$ oc expose svc prometheus-operated -n my-prometheus
route.route.openshift.io/prometheus-operated exposed

$ oc get route prometheus-operated -o jsonpath='{.spec.host}{"\n"}' -n my-prometheus

prometheus-operated-my-prometheus.apps.weixinyucluster.bluecat.ltd

The ServiceMonitor and Prometheus instance are now deployed, so we should be able to query the http_requests_total metric in the Prometheus UI, yet there is no data. Two key pieces are missing:

1. Prometheus does not have the RBAC permissions to query other namespaces.

2. No adapter is in place to translate Prometheus metrics for the Kubernetes HPA.

First, fix the RBAC side by granting the ServiceAccount used by Prometheus in the my-prometheus namespace the appropriate access to the my-hpa namespace:

$ echo "---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: my-prometheus-hpa
  namespace: my-hpa
subjects:
  - kind: ServiceAccount
    name: prometheus-k8s
    namespace: my-prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view" | oc create -f -

Return to the Prometheus UI and run the http_requests_total query again; you should now see results. If nothing shows up right away, be patient, it can take a scrape interval or two.


Prometheus is now working properly, so the next step is to wire it into Kubernetes so that the HPA can act on the custom metric. The objects to create are:

APIService

ServiceAccount

ClusterRole - custom-metrics-server-resources

ClusterRole - custom-metrics-resource-reader

ClusterRoleBinding - custom-metrics:system:auth-delegator

ClusterRoleBinding - custom-metrics-resource-reader

ClusterRoleBinding - hpa-controller-custom-metrics

RoleBinding - custom-metrics-auth-reader

Secret

ConfigMap

Deployment

Service

Create all of the objects:

$ oc create -f https://raw.githubusercontent.com/redhat-gpte-devopsautomation/ocp_advanced_deployment_resources/master/ocp4_adv_deploy_lab/custom_hpa/custom_adapter_kube_objects.yaml
apiservice.apiregistration.k8s.io/v1beta1.custom.metrics.k8s.io created
serviceaccount/my-metrics-apiserver created
clusterrole.rbac.authorization.k8s.io/my-metrics-server-resources created
clusterrole.rbac.authorization.k8s.io/my-metrics-resource-reader created
clusterrolebinding.rbac.authorization.k8s.io/my-metrics:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/my-metrics-resource-reader created
clusterrolebinding.rbac.authorization.k8s.io/my-hpa-controller-custom-metrics created
rolebinding.rbac.authorization.k8s.io/my-metrics-auth-reader created
secret/cm-adapter-serving-certs created
configmap/adapter-config created
deployment.apps/custom-metrics-apiserver created
service/my-metrics-apiserver created

Check that the APIService was created:

$ oc get apiservice v1beta1.custom.metrics.k8s.io
NAME                            SERVICE                              AVAILABLE   AGE
v1beta1.custom.metrics.k8s.io   my-prometheus/my-metrics-apiserver   True        19s

Check that the API exposes the pods/http_requests metric:

$ oc get --raw /apis/custom.metrics.k8s.io/v1beta1/ | jq -r '.resources[] | select(.name | contains("pods/http"))'
{
  "name": "pods/http_requests",
  "singularName": "",
  "namespaced": true,
  "kind": "MetricValueList",
  "verbs": [
    "get"
  ]
}

Verify the application's custom HPA:

$ echo "---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: pod-autoscale-custom
  namespace: my-hpa
spec:
  scaleTargetRef:
    kind: DeploymentConfig
    name: pod-autoscale
    apiVersion: apps.openshift.io/v1
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Pods
      pods:
        metricName: http_requests
        targetAverageValue: 500m" | oc create -f -

horizontalpodautoscaler.autoscaling/pod-autoscale-custom created

To generate load, open another SSH terminal and run:

$ AUTOSCALE_ROUTE=$(oc get route pod-autoscale -n my-hpa -o jsonpath='{ .spec.host}')
$ while true; do curl http://$AUTOSCALE_ROUTE; sleep .5; done
Hello! My name is pod-autoscale-2-pvrw8. I have served 19 requests so far.
Hello! My name is pod-autoscale-2-pvrw8. I have served 20 requests so far.
Hello! My name is pod-autoscale-2-pvrw8. I have served 21 requests so far.
Hello! My name is pod-autoscale-2-pvrw8. I have served 22 requests so far.
Hello! My name is pod-autoscale-2-pvrw8. I have served 23 requests so far.

Check the HPA status:

$ oc describe hpa pod-autoscale-custom -n my-hpa
Name:                       pod-autoscale-custom
Namespace:                  my-hpa
Labels:                     <none>
Annotations:                <none>
CreationTimestamp:          Fri, 31 Jul 2020 12:58:08 +0000
Reference:                  DeploymentConfig/pod-autoscale
Metrics:                    ( current / target )
  "http_requests" on pods:  2 / 500m
Min replicas:               1
Max replicas:               5
DeploymentConfig pods:      1 current / 4 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    SucceededRescale    the HPA controller was able to update the target scale to 4
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric http_requests
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age               From                       Message
  ----    ------             ----              ----                       -------
  Normal  SuccessfulRescale  8s (x3 over 38s)  horizontal-pod-autoscaler  New size: 4; reason: pods metric http_requests above target
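The "1 current / 4 desired" line follows directly from the HPA's core formula; a quick check with the numbers from the output above:

```python
import math

# Simplified HPA algorithm (ignores tolerance and stabilization windows):
# desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric),
# clamped to [minReplicas, maxReplicas].
def desired_replicas(current, metric, target, lo, hi):
    return max(lo, min(hi, math.ceil(current * metric / target)))

# From the describe output: 1 replica, http_requests averaging 2 per pod,
# target 500m (= 0.5), bounded by minReplicas=1, maxReplicas=5.
print(desired_replicas(1, 2.0, 0.5, 1, 5))
```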

Confirm that the pods have scaled up, and that the reason is "pods metric http_requests above target":

$ oc get pods -n my-hpa
NAME                     READY   STATUS              RESTARTS   AGE
pod-autoscale-1-deploy   0/1     Completed           0          26m
pod-autoscale-2-2vrgc    0/1     ContainerCreating   0          1s
pod-autoscale-2-deploy   0/1     Completed           0          24m
pod-autoscale-2-dqdrg    0/1     ContainerCreating   0          1s
pod-autoscale-2-pvrw8    1/1     Running             0          24m
pod-autoscale-2-t52hd    0/1     ContainerCreating   0          1s
