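Fluid Gives Data Elasticity a Pair of Invisible Wings (1) -- Custom Elastic Scaling

Author | 車漾 (Che Yang), Fluid community Maintainer

Author | 謝遠東 (Xie Yuandong), Fluid community Committer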

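Introduction

Elastic scaling is one of Kubernetes' core capabilities, but it has always centered on stateless application workloads. Fluid provides elastic scaling for distributed caches, so the data cache can be expanded and shrunk flexibly. This series describes how to use Fluid's data elasticity in different scenarios:

Part 1: Custom elastic scaling
Part 2: Scheduled elastic scaling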

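Background

As more and more data-intensive applications such as big data and AI are deployed and run on Kubernetes, the mismatch between the design philosophy of data-intensive computing frameworks and cloud-native flexible application orchestration has led to data access and computing bottlenecks. Fluid, a cloud-native data orchestration engine, accelerates data access for applications by abstracting datasets, using distributed caching technology, and working with the scheduler.

Elastic scaling is one of Kubernetes' core capabilities, but it has always centered on stateless application workloads. Fluid provides elastic scaling for distributed caches, so the data cache can be expanded and shrunk flexibly. Based on the Runtime, it exposes performance metrics such as the cache capacity and the proportion of capacity already in use. Combined with its own ability to scale Runtime resources, this gives the data cache on-demand scaling.

This capability matters a great deal for big data applications in Internet scenarios, because most of them are implemented as end-to-end pipelines. Such a pipeline consists of the following steps:

Data extraction: preprocess the raw data with big data technologies such as Spark and MapReduce
Model training: train a machine learning model on the feature data produced in the first stage and generate the corresponding model
Model evaluation: evaluate and test the model produced in the second stage against a test set or validation set
Model inference: push the model validated in the third stage online to provide inference services for the business

As you can see, an end-to-end pipeline contains many different types of computing tasks. In practice, each task is handled by a suitable specialized system (TensorFlow, PyTorch, Spark, Presto). These systems are independent of one another, however, and usually rely on an external file system to pass data from one stage to the next. Frequently exchanging data through a file system incurs heavy I/O overhead and often becomes the bottleneck of the whole workflow.

Fluid fits this scenario well. A user can create a Dataset object that distributes cached data across Kubernetes compute nodes and serves as the medium for data exchange. This avoids remote writes and reads and improves the efficiency of data use. The problem lies in estimating and reserving resources for the temporary data cache. Before data is produced and consumed, it is hard to estimate its volume precisely: an estimate that is too high wastes reserved resources, and one that is too low raises the likelihood of failed writes. Scaling on demand is friendlier to users. We hope to achieve an effect similar to the page cache: the layer is transparent to end users, yet the cache acceleration it brings is real.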

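We introduce cache elastic scaling in Fluid through a custom HPA mechanism. Scaling is triggered when the amount of cached data reaches a certain proportion of the cache capacity, at which point the cache space is expanded. For example, suppose the trigger condition is cache usage above 75% and the total cache space is 10G. When the data occupies 8G of the cache (80%), scale-up is triggered.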
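
The trigger condition described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Fluid's implementation; the function name `should_scale_up` and its parameters are made up for this example.

```python
def should_scale_up(used_gib: float, total_gib: float, threshold_percent: float) -> bool:
    """Return True when the used share of the cache exceeds the threshold."""
    if total_gib <= 0:
        raise ValueError("total cache capacity must be positive")
    used_percent = used_gib * 100 / total_gib
    return used_percent > threshold_percent


# 8 GiB used out of 10 GiB is 80%, which is above the 75% threshold.
print(should_scale_up(8, 10, 75))
# 7 GiB used out of 10 GiB is 70%, which stays below the threshold.
print(should_scale_up(7, 10, 75))
```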

The following example walks you through Fluid's automatic scaling capability.

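Prerequisites

Kubernetes 1.18 or later is recommended. Before 1.18, HPA scaling policies could not be customized and were hard-coded. From 1.18 on, users can customize the scaling policies, for example by defining the cooldown time after a scale-up.

Steps

1. Install the jq tool to make parsing JSON easier. In this example the operating system is CentOS, so jq can be installed with yum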

yum install -y jq

2. Download and install the latest version of Fluid

git clone https://github.com/fluid-cloudnative/fluid.git
cd fluid/charts
kubectl create ns fluid-system
helm install fluid fluid

3. Deploy or configure Prometheus

Prometheus collects the metrics exposed by the AlluxioRuntime cache engine. If the cluster has no Prometheus:

$ cd fluid
$ kubectl apply -f integration/prometheus/prometheus.yaml

If the cluster already has Prometheus, add the following configuration to the Prometheus configuration file:

scrape_configs:
  - job_name: 'alluxio runtime'
    metrics_path: /metrics/prometheus
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_monitor]
      regex: alluxio_runtime_metrics
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      regex: web
      action: keep
    - source_labels: [__meta_kubernetes_namespace]
      target_label: namespace
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_label_release]
      target_label: fluid_runtime
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_endpoint_address_target_name]
      target_label: pod
      replacement: $1
      action: replace
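
To make the effect of these relabel rules concrete, here is a rough Python sketch that applies the same keep and replace logic to the metadata labels of one discovered endpoint. It is a simplified model of Prometheus relabeling rather than Prometheus code, and the sample label values (`default`, `spark`, `spark-master-0`) are assumptions chosen for illustration.

```python
import re
from typing import Optional


def relabel(meta: dict) -> Optional[dict]:
    """Apply the keep/replace rules of the scrape config to one discovered target.

    Returns the resulting target labels, or None when a keep rule drops the target.
    """
    # keep: the regex is anchored, so the whole label value must match.
    if not re.fullmatch("alluxio_runtime_metrics",
                        meta.get("__meta_kubernetes_service_label_monitor", "")):
        return None
    if not re.fullmatch("web", meta.get("__meta_kubernetes_endpoint_port_name", "")):
        return None

    # replace: with the default regex (.*) and replacement $1,
    # the source value is copied unchanged into the target label.
    copies = {
        "__meta_kubernetes_namespace": "namespace",
        "__meta_kubernetes_service_label_release": "fluid_runtime",
        "__meta_kubernetes_endpoint_address_target_name": "pod",
    }
    labels = {}
    for source, target in copies.items():
        value = meta.get(source, "")
        if value:  # an empty value leaves the target label unset
            labels[target] = value
    return labels


# Hypothetical metadata for one endpoint of an Alluxio runtime named "spark".
sample_meta = {
    "__meta_kubernetes_service_label_monitor": "alluxio_runtime_metrics",
    "__meta_kubernetes_endpoint_port_name": "web",
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_service_label_release": "spark",
    "__meta_kubernetes_endpoint_address_target_name": "spark-master-0",
}
print(relabel(sample_meta))
```

The resulting `namespace`, `fluid_runtime`, and `pod` labels are the ones the adapter rule in step 6 maps to Kubernetes resources.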

4. Verify that Prometheus is installed successfully

$ kubectl get ep -n kube-system  prometheus-svc
NAME             ENDPOINTS        AGE
prometheus-svc   10.76.0.2:9090   6m49s
$ kubectl get svc -n kube-system prometheus-svc
NAME             TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
prometheus-svc   NodePort   172.16.135.24   <none>        9090:32114/TCP   2m7s

If you want to visualize the metrics, you can install Grafana to verify the monitoring data. See the documentation for details.

5. Deploy metrics server

Check whether the cluster includes metrics-server by running:

kubectl top node

If the output shows memory and CPU usage correctly, metrics server is configured correctly in the cluster:

kubectl top node
NAME                       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
192.168.1.204   93m          2%     1455Mi          10%
192.168.1.205   125m         3%     1925Mi          13%
192.168.1.206   96m          2%     1689Mi          11%

Otherwise, run the following command manually:

kubectl create -f integration/metrics-server

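6. Deploy the custom-metrics-api component

To scale on custom metrics you need two components. The first collects metrics from the application and stores them in the Prometheus time-series database. The second, k8s-prometheus-adapter, uses the collected metrics to extend the Kubernetes custom metrics API. The first component was deployed in step 3; the second is deployed below.

If custom-metrics-api is already configured, add the dataset-related configuration to the adapter's ConfigMap: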

apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: '{__name__=~"Cluster_(CapacityTotal|CapacityUsed)",fluid_runtime!="",instance!="",job="alluxio runtime",namespace!="",pod!=""}'
      seriesFilters:
      - is: ^Cluster_(CapacityTotal|CapacityUsed)$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pods
          fluid_runtime:
            resource: datasets
      name:
        matches: "^(.*)"
        as: "capacity_used_rate"
      metricsQuery: ceil(Cluster_CapacityUsed{<<.LabelMatchers>>}*100/(Cluster_CapacityTotal{<<.LabelMatchers>>}))

Otherwise, run the following commands manually:

kubectl create -f integration/custom-metrics-api/namespace.yaml
kubectl create -f integration/custom-metrics-api

Note: custom-metrics-api needs the access address of the Prometheus in the cluster, so replace the Prometheus URL with the address of the Prometheus you actually use.

Check the custom metrics:
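
The metricsQuery in the adapter rule computes `capacity_used_rate` as the used capacity times 100, divided by the total capacity, rounded up. The Python sketch below mirrors that expression so the rounding is easy to see. The function name is hypothetical, and raising an error on a zero total is a choice made for this sketch, not PromQL behavior.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def capacity_used_rate(capacity_used_bytes: float, capacity_total_bytes: float) -> int:
    """Mirror ceil(Cluster_CapacityUsed * 100 / Cluster_CapacityTotal)."""
    if capacity_total_bytes <= 0:
        raise ValueError("total capacity must be positive")
    return math.ceil(capacity_used_bytes * 100 / capacity_total_bytes)


# An empty 1 GiB cache reports 0.
print(capacity_used_rate(0, 1 * GIB))
# 1020.92 MiB cached in a 1 GiB cache is about 99.7%, which ceil rounds up to 100.
print(capacity_used_rate(1020.92 * MIB, 1 * GIB))
```

Because of the rounding up, a cache that is almost but not completely full is already reported as 100.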

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/capacity_used_rate",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "datasets.data.fluid.io/capacity_used_rate",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "namespaces/capacity_used_rate",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}

7. Submit the Dataset used for testing

$ cat<<EOF >dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: spark
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: spark
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: spark
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 1Gi
        high: "0.99"
        low: "0.7"
  properties:
    alluxio.user.streaming.data.timeout: 300sec
EOF
$ kubectl create -f dataset.yaml
dataset.data.fluid.io/spark created
alluxioruntime.data.fluid.io/spark created

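8. Check whether the Dataset is available. The total size of the dataset is 2.71GiB, Fluid currently provides 1 cache node, and the maximum cache capacity is 1GiB. This capacity cannot hold the full dataset.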

$ kubectl get dataset
NAME    UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          0.00B    1.00GiB          0.0%                Bound   7m38s
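
As a back-of-the-envelope check of the numbers above, the sketch below estimates the total cache capacity for a given number of replicas and the number of replicas needed to hold the whole dataset. It assumes every replica contributes exactly its tieredstore quota (1Gi in this example) and ignores any overhead; both helper names are made up for illustration.

```python
import math


def cache_capacity_gib(replicas: int, quota_gib_per_replica: float) -> float:
    """Total cache capacity if every replica contributes its full quota."""
    return replicas * quota_gib_per_replica


def replicas_to_hold(dataset_gib: float, quota_gib_per_replica: float) -> int:
    """Smallest replica count whose combined quota covers the dataset."""
    return math.ceil(dataset_gib / quota_gib_per_replica)


# One replica with a 1 GiB quota cannot hold the 2.71 GiB dataset.
print(cache_capacity_gib(1, 1.0))
# Under the assumptions above, three replicas are enough.
print(replicas_to_hold(2.71, 1.0))
```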

9. Once the Dataset is available, check whether the metric can be obtained from custom-metrics-api

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/*/capacity_used_rate" | jq
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/%2A/capacity_used_rate"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Dataset",
        "namespace": "default",
        "name": "spark",
        "apiVersion": "data.fluid.io/v1alpha1"
      },
      "metricName": "capacity_used_rate",
      "timestamp": "2021-04-04T07:24:52Z",
      "value": "0"
    }
  ]
}
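
If you want to consume this response in a script instead of reading it by eye, a small Python sketch such as the following extracts the metric value per Dataset. The embedded JSON is a trimmed copy of the output above. The helper names are hypothetical, and the quantity parser only covers plain numbers and the milli suffix (for example `85500m`), which is an assumption about the values this metric returns.

```python
import json


def parse_quantity(text: str) -> float:
    """Parse plain numbers and milli-suffixed Kubernetes quantities."""
    if text.endswith("m"):
        return int(text[:-1]) / 1000
    return float(text)


def dataset_rates(metric_value_list: dict) -> dict:
    """Map each described Dataset name to its metric value."""
    return {
        item["describedObject"]["name"]: parse_quantity(item["value"])
        for item in metric_value_list["items"]
    }


# Trimmed copy of the MetricValueList shown above.
sample = json.loads("""
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "items": [
    {
      "describedObject": {
        "kind": "Dataset",
        "namespace": "default",
        "name": "spark",
        "apiVersion": "data.fluid.io/v1alpha1"
      },
      "metricName": "capacity_used_rate",
      "timestamp": "2021-04-04T07:24:52Z",
      "value": "0"
    }
  ]
}
""")
print(dataset_rates(sample))
```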

10. Create the HPA

$ cat<<EOF > hpa.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: spark
spec:
  scaleTargetRef:
    apiVersion: data.fluid.io/v1alpha1
    kind: AlluxioRuntime
    name: spark
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Object
    object:
      metric:
        name: capacity_used_rate
      describedObject:
        apiVersion: data.fluid.io/v1alpha1
        kind: Dataset
        name: spark
      target:
        type: Value
        value: "90"
  behavior:
    scaleUp:
      policies:
      - type: Pods
        value: 2
        periodSeconds: 600
    scaleDown:
      selectPolicy: Disabled
EOF

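First, let's walk through the sample configuration. It has two main parts: the scaling rule and the scaling sensitivity:

Rule: scale-up is triggered when the cached data of the Dataset object reaches 90% of the total cache capacity. The scaling target is the AlluxioRuntime, with a minimum of 1 replica and a maximum of 4. The Dataset and AlluxioRuntime objects must be in the same namespace.
Policy: on K8s 1.18 and later, you can set the stabilization window and the step size of each scaling action separately for scale-up and scale-down. In this example, one scale-up period is 10 minutes (periodSeconds) and at most 2 replicas are added per period, never exceeding the maxReplicas limit. A cooldown after scale-up can be set with stabilizationWindowSeconds. The sample does not set it, so the scale-up stabilization window is 0 seconds. Scale-down is simply disabled here.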
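
To see how the rule and the policy interact, the following Python sketch reproduces the replica calculation documented for the Kubernetes HPA: the desired count is the current count multiplied by the ratio of the metric to its target, rounded up, and no change is made when the ratio is within a tolerance of 1.0. The 0.1 tolerance is the Kubernetes default. The function names and the simplified handling of the scale-up policy are assumptions of this sketch, not the controller's code.

```python
import math


def desired_replicas(current_replicas: int, metric_value: float, target_value: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1, scale_down_enabled: bool = False) -> int:
    """Simplified HPA calculation for an Object metric with a Value target."""
    ratio = metric_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target, do nothing
    desired = math.ceil(current_replicas * ratio)
    if desired < current_replicas and not scale_down_enabled:
        return current_replicas  # scaleDown.selectPolicy is Disabled in the sample
    return max(min_replicas, min(max_replicas, desired))


def limit_scale_up(desired: int, replicas_at_period_start: int, pods_per_period: int = 2) -> int:
    """Cap growth at `pods_per_period` new replicas within one policy period."""
    return min(desired, replicas_at_period_start + pods_per_period)


# Metric at 100 with a target of 90: one replica grows to two, then two grow to three.
print(desired_replicas(1, 100, 90, 1, 4))
print(desired_replicas(2, 100, 90, 1, 4))
# Metric at 85 is within the tolerance of the target, so three replicas stay as they are.
print(desired_replicas(3, 85, 90, 1, 4))
# Even if four replicas were desired, the policy allows at most 1 + 2 within one period.
print(limit_scale_up(4, replicas_at_period_start=1))
```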

11. Check the HPA configuration. The cache usage is currently 0, far below the scale-up trigger

$ kubectl get hpa
NAME    REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spark   AlluxioRuntime/spark   0/90      1         4         1          33s
$ kubectl describe hpa
Name:                                                    spark
Namespace:                                               default
Labels:                                                  <none>
Annotations:                                             <none>
CreationTimestamp:                                       Wed, 07 Apr 2021 17:36:39 +0800
Reference:                                               AlluxioRuntime/spark
Metrics:                                                 ( current / target )
  "capacity_used_rate" on Dataset/spark (target value):  0 / 90
Min replicas:                                            1
Max replicas:                                            4
Behavior:
  Scale Up:
    Stabilization Window: 0 seconds
    Select Policy: Max
    Policies:
      - Type: Pods  Value: 2  Period: 600 seconds
  Scale Down:
    Select Policy: Disabled
    Policies:
      - Type: Percent  Value: 100  Period: 15 seconds
AlluxioRuntime pods:   1 current / 1 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:           <none>

12. Create the data preloading task

$ cat<<EOF > dataload.yaml
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: spark
spec:
  dataset:
    name: spark
    namespace: default
EOF
$ kubectl create -f dataload.yaml
$ kubectl get dataload
NAME    DATASET   PHASE       AGE   DURATION
spark   spark     Executing   15s   Unfinished

13. The amount of cached data is now close to the cache capacity that Fluid can provide (1GiB), and the scaling condition is triggered

$  kubectl  get dataset
NAME    UFS TOTAL SIZE   CACHED       CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          1020.92MiB   1.00GiB          36.8%               Bound   5m15s

The HPA status shows that the Alluxio Runtime has started scaling up, with a scale-up step of 2 (from 1 replica to a desired 3)

$ kubectl get hpa
NAME    REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spark   AlluxioRuntime/spark   100/90    1         4         2          4m20s
$ kubectl describe hpa
Name:                                                    spark
Namespace:                                               default
Labels:                                                  <none>
Annotations:                                             <none>
CreationTimestamp:                                       Wed, 07 Apr 2021 17:56:31 +0800
Reference:                                               AlluxioRuntime/spark
Metrics:                                                 ( current / target )
  "capacity_used_rate" on Dataset/spark (target value):  100 / 90
Min replicas:                                            1
Max replicas:                                            4
Behavior:
  Scale Up:
    Stabilization Window: 0 seconds
    Select Policy: Max
    Policies:
      - Type: Pods  Value: 2  Period: 600 seconds
  Scale Down:
    Select Policy: Disabled
    Policies:
      - Type: Percent  Value: 100  Period: 15 seconds
AlluxioRuntime pods:   2 current / 3 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    SucceededRescale    the HPA controller was able to update the target scale to 3
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ----                   ----                       -------
  Normal   SuccessfulRescale             21s                    horizontal-pod-autoscaler  New size: 2; reason: Dataset metric capacity_used_rate above target
  Normal   SuccessfulRescale             6s                     horizontal-pod-autoscaler  New size: 3; reason: Dataset metric capacity_used_rate above target

14. After a while, the cache capacity of the dataset has grown from 1GiB to 3GiB, and data caching is nearly complete

$ kubectl  get dataset
NAME    UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          2.59GiB   3.00GiB          95.6%               Bound   12m

The HPA status also shows that the runtime for the Dataset now has 3 replicas and that the used cache ratio capacity_used_rate is 85%, which no longer triggers a cache scale-up.

$ kubectl get hpa
NAME    REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spark   AlluxioRuntime/spark   85/90     1         4         3          11m

15. Clean up the environment

kubectl delete hpa spark
kubectl delete dataset spark

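Summary

By combining Prometheus, Kubernetes HPA, and Custom Metrics, Fluid triggers automatic elastic scaling based on the proportion of cache space in use, so cache capacity is used on demand. This helps users accelerate data access with distributed caching more flexibly. Later we will provide scheduled scaling to make scaling more deterministic.

Fluid's code repository: https://github.com/fluid-cloudnative/fluid.git. You are welcome to follow it, contribute code, and star it.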
