
Spark on k8s

Contents: Preface | Prerequisites | Each step in detail | Submitting a job to the k8s cluster | Summary

Preface

Spark has supported running on Kubernetes since version 2.3. This article walks through running Spark on Alibaba Cloud Container Service for Kubernetes.

Prerequisites

1. An Alibaba Cloud Container Service for Kubernetes cluster has been purchased (purchase link: Kubernetes console). The cluster type in this example is Managed Kubernetes.

2. The Spark image has been built. The Dockerfile used to build it in this example is:

# Base image
FROM registry.cn-beijing.aliyuncs.com/acs/spark:v2.4.0

# Maintainer
LABEL maintainer "[email protected]"

# Copy the jar into the specified directory
COPY ./spark-examples-0.0.1-SNAPSHOT.jar /opt/spark/examples/jars/

After the build, the image's registry address is: registry.cn-beijing.aliyuncs.com/bill_test/image_test:v2.4.0

3. In this example, the Spark job is launched by submitting a YAML file through a k8s client.

Each step in detail

Building the Spark image

Building the Spark image requires Docker installed locally; this example covers macOS. On a Mac, run:

brew install --cask docker

When the installation has finished, running the docker command should print the following, which confirms it succeeded.

Usage:    docker [OPTIONS] COMMAND

A self-sufficient runtime for containers

Options:
      --config string      Location of client config files (default "/Users/bill.zhou/.docker")
  -D, --debug              Enable debug mode
  -H, --host list          Daemon socket(s) to connect to
  -l, --log-level string   Set the logging level ("debug"|"info"|"warn"|"error"|"fatal") (default "info")
      --tls                Use TLS; implied by --tlsverify
      --tlscacert string   Trust certs signed only by this CA (default "/Users/bill.zhou/.docker/ca.pem")
      --tlscert string     Path to TLS certificate file (default "/Users/bill.zhou/.docker/cert.pem")
      --tlskey string      Path to TLS key file (default "/Users/bill.zhou/.docker/key.pem")
      --tlsverify          Use TLS and verify the remote
  -v, --version            Print version information and quit           

Building a Docker image requires writing a Dockerfile; this example's Dockerfile was created as follows.

# Enter the working directory:
cd /Users/bill.zhou/dockertest
# Copy the test jar into this directory:
cp /Users/jars/spark-examples-0.0.1-SNAPSHOT.jar ./
# Create the Dockerfile
vi Dockerfile

Put the following into the Dockerfile:

# Base image
FROM registry.cn-beijing.aliyuncs.com/acs/spark:v2.4.0

# Maintainer
LABEL maintainer "[email protected]"

# Copy the jar into the specified directory
COPY ./spark-examples-0.0.1-SNAPSHOT.jar /opt/spark/examples/jars/
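The interactive vi session above can also be scripted, which is handy for CI. A minimal sketch that writes the same Dockerfile with a heredoc (it assumes the test jar is already in the working directory; the directory name is this example's):

```shell
# Write the Dockerfile non-interactively (same content as shown above).
mkdir -p dockertest && cd dockertest
cat > Dockerfile <<'EOF'
# Base image
FROM registry.cn-beijing.aliyuncs.com/acs/spark:v2.4.0

# Maintainer
LABEL maintainer "[email protected]"

# Copy the jar into the specified directory
COPY ./spark-examples-0.0.1-SNAPSHOT.jar /opt/spark/examples/jars/
EOF

# Sanity check: the base-image line should be present
grep '^FROM' Dockerfile
```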

This example starts from a third-party base image, registry.cn-beijing.aliyuncs.com/acs/spark:v2.4.0, and adds its own test jar, spark-examples-0.0.1-SNAPSHOT.jar, on top.

With the Dockerfile in place, build the image:

docker build /Users/bill.zhou/dockertest/ -t registry.cn-beijing.aliyuncs.com/bill_test/image_test:v2.4.0           

After the build finishes, push the image to the registry:

# Log in with your Alibaba Cloud account first
docker login -u [email protected] registry.cn-beijing.aliyuncs.com
# Push the image
docker push registry.cn-beijing.aliyuncs.com/bill_test/image_test:v2.4.0

With the image built and pushed, you can try it out.

Submitting a job to the k8s cluster

This example submits the YAML to k8s with the kubectl client.

First, buy an ECS instance (in the same VPC as the k8s cluster) and install the kubectl client on it. For installation steps, see: the kubectl installation guide.

Once kubectl is installed, configure the cluster credentials to gain access to the cluster. To get the credentials, open the cluster's "Basic Information" page in the console, as shown below:
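Configuring the credentials amounts to placing that YAML into kubectl's config file. A sketch, where the heredoc body is a placeholder for the text copied from the console:

```shell
# Install the cluster credentials for kubectl.
# The heredoc content below is a placeholder; paste the kubeconfig YAML
# copied from the cluster's "Basic Information" page instead.
mkdir -p "$HOME/.kube"
cat > "$HOME/.kube/config" <<'EOF'
# (kubeconfig YAML copied from the console goes here)
EOF
chmod 600 "$HOME/.kube/config"
echo "credentials written to $HOME/.kube/config"
```

kubectl reads $HOME/.kube/config by default; alternatively, point the KUBECONFIG environment variable at another path.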

[Screenshot: the cluster's "Basic Information" page, showing the connection credentials]

Then submit the Spark job with the following steps:

## Install the CRDs
kubectl apply -f manifest/spark-operator-crds.yaml
## Install the operator's service account and RBAC policy
kubectl apply -f manifest/spark-operator-rbac.yaml
## Install the service account and RBAC policy for Spark jobs
kubectl apply -f manifest/spark-rbac.yaml
## Install spark-on-k8s-operator
kubectl apply -f manifest/spark-operator.yaml
## Submit the spark-pi job
kubectl apply -f examples/spark-pi.yaml
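After applying the manifests, it can help to confirm the operator came up and to watch the job through its custom resource before digging into logs. A sketch (the namespace follows the manifests above; these commands need a live cluster):

```shell
# Confirm the operator pod is running
kubectl get pods -n spark-operator
# List SparkApplication resources and their status
kubectl get sparkapplications -n spark-operator
# Inspect events and status details for the spark-pi job
kubectl describe sparkapplication spark-pi -n spark-operator
```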

The latest versions of these files can be downloaded from the open-source community (the spark-on-k8s-operator project).

When the job has run, check its logs with the commands below:

# List pods; -n selects the namespace
kubectl get pod -n spark-operator
# View the driver pod's logs
kubectl logs spark-pi-driver -n spark-operator

Output like the following means the run succeeded:

2019-07-23 11:55:54 INFO  SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1161
2019-07-23 11:55:54 INFO  DAGScheduler:54 - Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:17) (first 15 tasks are for partitions Vector(0, 1))
2019-07-23 11:55:54 INFO  TaskSchedulerImpl:54 - Adding task set 0.0 with 2 tasks
2019-07-23 11:55:55 INFO  TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, 172.20.1.9, executor 1, partition 0, PROCESS_LOCAL, 7878 bytes)
2019-07-23 11:55:55 INFO  TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, 172.20.1.9, executor 1, partition 1, PROCESS_LOCAL, 7878 bytes)
2019-07-23 11:55:57 INFO  BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on 172.20.1.9:36662 (size: 1256.0 B, free: 117.0 MB)
2019-07-23 11:55:57 INFO  TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 2493 ms on 172.20.1.9 (executor 1) (1/2)
2019-07-23 11:55:57 INFO  TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 2789 ms on 172.20.1.9 (executor 1) (2/2)
2019-07-23 11:55:57 INFO  TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2019-07-23 11:55:58 INFO  DAGScheduler:54 - ResultStage 0 (reduce at SparkPi.scala:21) finished in 5.393 s
2019-07-23 11:55:58 INFO  DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:21, took 6.501405 s
**Pi is roughly 3.142955714778574**
2019-07-23 11:55:58 INFO  AbstractConnector:318 - Stopped Spark@49096b06{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-07-23 11:55:58 INFO  SparkUI:54 - Stopped Spark web UI at http://spark-test-1563882878789-driver-svc.spark-operator-t01.svc:4040
2019-07-23 11:55:58 INFO  KubernetesClusterSchedulerBackend:54 - Shutting down all executors
2019-07-23 11:55:58 INFO  KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint:54 - Asking each executor to shut down
2019-07-23 11:55:58 WARN  ExecutorPodsWatchSnapshotSource:87 - Kubernetes client has been closed (this is expected if the application is shutting down.)
2019-07-23 11:55:59 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2019-07-23 11:55:59 INFO  MemoryStore:54 - MemoryStore cleared
2019-07-23 11:55:59 INFO  BlockManager:54 - BlockManager stopped
2019-07-23 11:55:59 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2019-07-23 11:55:59 INFO  OutputCommitCoordinator$           
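For context on the highlighted "Pi is roughly" line: SparkPi estimates Pi by Monte Carlo sampling. It draws random points in the unit square and multiplies the fraction landing inside the quarter circle by 4, splitting the samples across executors. A local, non-Spark sketch of the same computation:

```shell
# Monte Carlo estimate of Pi, the computation SparkPi distributes across
# executors: sample points in the unit square, count those inside the
# quarter circle of radius 1, and scale the in-circle fraction by 4.
awk 'BEGIN {
  srand(42)
  n = 200000; hits = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1) hits++
  }
  printf "Pi is roughly %f\n", 4 * hits / n
}'
```

With 200,000 samples the estimate typically lands within a few hundredths of 3.14159, matching the precision seen in the driver log.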

Finally, the key parts of the spark-pi.yaml file:

apiVersion: "sparkoperator.k8s.io/v1beta1"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark-operator
spec:
  type: Scala
  mode: cluster
  image: "registry.cn-beijing.aliyuncs.com/acs/spark:v2.4.0"  # Registry path of the image to run.
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi  # Entry-point class to run.
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"  # Jar containing that class; this path is inside the image.
  sparkVersion: "2.4.0"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  # Resources for the Spark driver
  driver:
    cores: 0.1
    coreLimit: "200m"
    memory: "512m"
    labels:
      version: 2.4.0
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  # Resources for the executors
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 2.4.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
           

YAML is the standard submission format for k8s; this file declares the image to run, the entry point, and the resources for the Spark driver and executors.
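The operator is one of two submission paths: since 2.3, Spark's own spark-submit can also target k8s directly, without any operator installed. A sketch, run from the Spark distribution directory; <api-server> is a placeholder for your cluster's API server endpoint (from the same "Basic Information" page), and the image and service account reuse the ones set up earlier:

```shell
# Direct submission to Kubernetes without the operator.
# Replace <api-server> with your cluster's API server address.
bin/spark-submit \
  --master "k8s://https://<api-server>:6443" \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.container.image=registry.cn-beijing.aliyuncs.com/bill_test/image_test:v2.4.0 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
```

The operator path is generally preferable for recurring jobs, since the SparkApplication resource captures restart policy and resources declaratively.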

Summary

For an introduction to k8s on Alibaba Cloud, see: Container Service for Kubernetes.

For more on Spark on k8s, see:

Spark in action on Kubernetes - Playground setup and architecture overview
Spark in action on Kubernetes - How Spark Operator works
