
Spark 1.6 Source Code Walkthrough: TaskScheduler

TaskScheduler is one of the key members of SparkContext. It is responsible for submitting tasks and for requesting that the cluster manager schedule them; it can be seen as the client side of task scheduling.

SparkContext, line 522, creates the TaskScheduler:

val (sched, ts) = SparkContext.createTaskScheduler(this, master)
           

SparkContext, line 2592, holds the concrete implementation, createTaskScheduler:

private def createTaskScheduler(
      sc: SparkContext,
      master: String): (SchedulerBackend, TaskScheduler) = {
    import SparkMasterRegex._

    // When running locally, don't try to re-execute tasks on failure.
    val MAX_LOCAL_TASK_FAILURES = 1

    master match {
      case "local" =>
        val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
        val backend = new LocalBackend(sc.getConf, scheduler, 1)
        scheduler.initialize(backend)
        (backend, scheduler)
           

It behaves differently depending on the master URL; this article uses local mode as the example. For "local" it creates a TaskSchedulerImpl together with a LocalBackend:
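For comparison, the neighboring local[N] / local[*] branch of the same match (paraphrased from the Spark 1.6 source; exact wording may differ slightly) differs only in the number of cores handed to LocalBackend:

case LOCAL_N_REGEX(threads) =>
  // local[*] estimates the number of cores on the machine; local[N] uses exactly N threads
  def localCpuCount: Int = Runtime.getRuntime.availableProcessors()
  val threadCount = if (threads == "*") localCpuCount else threads.toInt
  if (threadCount <= 0) {
    throw new SparkException(s"Asked to run locally with $threadCount threads")
  }
  val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
  val backend = new LocalBackend(sc.getConf, scheduler, threadCount)
  scheduler.initialize(backend)
  (backend, scheduler)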

Construction code, TaskSchedulerImpl line 102:

var dagScheduler: DAGScheduler = null
var backend: SchedulerBackend = null
val mapOutputTracker = SparkEnv.get.mapOutputTracker

var schedulableBuilder: SchedulableBuilder = null
var rootPool: Pool = null
// default scheduler is FIFO
private val schedulingModeConf = conf.get("spark.scheduler.mode", "FIFO")
val schedulingMode: SchedulingMode = try {
  SchedulingMode.withName(schedulingModeConf.toUpperCase)
} catch {
  case e: java.util.NoSuchElementException =>
    throw new SparkException(s"Unrecognized spark.scheduler.mode: $schedulingModeConf")
}

// This is a var so that we can reset it for testing purposes.
private[spark] var taskResultGetter = new TaskResultGetter(sc.env, this)
           

Analysis:

(1) It reads configuration such as the scheduling mode (FIFO or FAIR).

(2) It creates a TaskResultGetter, which uses a thread pool to process the task execution results sent back by the Executors on the Workers (see the sketch below).
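A minimal standalone sketch of that thread-pool pattern (TaskResultGetterSketch, the pool size, and the method body are illustrative assumptions, not the real Spark API): results are deserialized and handled off the caller's thread by a daemon pool.

import java.util.concurrent.{Executors, ThreadFactory}

object TaskResultGetterSketch {
  // daemon threads, so result handling never keeps the JVM alive on its own
  private val pool = Executors.newFixedThreadPool(4, new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "task-result-getter")
      t.setDaemon(true)
      t
    }
  })

  // hypothetical counterpart of TaskResultGetter.enqueueSuccessfulTask
  def enqueueSuccessfulTask(taskId: Long, serializedResult: Array[Byte]): Unit = {
    pool.execute(new Runnable {
      override def run(): Unit = {
        // stand-in for Spark's real deserialization of the task result
        val value = new String(serializedResult, "UTF-8")
        println(s"task $taskId finished with result: $value")
      }
    })
  }

  def main(args: Array[String]): Unit = {
    enqueueSuccessfulTask(1L, "hello".getBytes("UTF-8"))
    Thread.sleep(100) // give the daemon pool time to drain before the JVM exits
  }
}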

TaskSchedulerImpl has two scheduling modes, but the final dispatch of tasks always falls to the concrete SchedulerBackend implementation.
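Switching between the two modes is purely a configuration matter; a minimal example (the app name is arbitrary):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local")
  .setAppName("scheduler-mode-demo")
  .set("spark.scheduler.mode", "FAIR") // default is FIFO, as seen above
val sc = new SparkContext(conf)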

SparkContext, line 2603, creates the LocalBackend:

val backend = new LocalBackend(sc.getConf, scheduler, 1)
           

A method of LocalBackend worth noting, line 123:

override def start() {
    val rpcEnv = SparkEnv.get.rpcEnv
    // create the endpoint that hosts the local Executor and register it with the RpcEnv
    val executorEndpoint = new LocalEndpoint(rpcEnv, userClassPath, scheduler, this, totalCores)
    localEndpoint = rpcEnv.setupEndpoint("LocalBackendEndpoint", executorEndpoint)
    // announce the local "executor" to listeners such as the web UI
    listenerBus.post(SparkListenerExecutorAdded(
      System.currentTimeMillis,
      executorEndpoint.localExecutorId,
      new ExecutorInfo(executorEndpoint.localExecutorHostname, totalCores, Map.empty)))
    launcherBackend.setAppId(appId)
    launcherBackend.setState(SparkAppHandle.State.RUNNING)
  }
           

Analysis: it creates a LocalEndpoint, from which we can see that LocalBackend communicates by passing messages through this LocalEndpoint.
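Which messages it handles can be read off LocalEndpoint's receive method (paraphrased from the same file in the Spark 1.6 source; exact wording may differ slightly):

override def receive: PartialFunction[Any, Unit] = {
  case ReviveOffers =>
    reviveOffers() // offer the free cores to the scheduler and launch any runnable tasks
  case StatusUpdate(taskId, state, serializedData) =>
    scheduler.statusUpdate(taskId, state, serializedData)
    if (TaskState.isFinished(state)) {
      freeCores += scheduler.CPUS_PER_TASK // reclaim the cores of the finished task
      reviveOffers()
    }
  case KillTask(taskId, interruptThread) =>
    executor.killTask(taskId, interruptThread)
}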

Once TaskSchedulerImpl and LocalBackend have been created, initialization follows.

SparkContext, line 2616, calls the initialization method:

scheduler.initialize(backend)
           

This calls TaskSchedulerImpl, line 126:

def initialize(backend: SchedulerBackend) {
    // keep a reference to the backend (here the LocalBackend)
    this.backend = backend
    // temporarily set rootPool name to empty; this root pool is the queue that buffers TaskSetManagers
    rootPool = new Pool("", schedulingMode, 0, 0)
    // build the scheduling policy that operates on the queue
    schedulableBuilder = {
      schedulingMode match {
        case SchedulingMode.FIFO =>
          new FIFOSchedulableBuilder(rootPool)
        case SchedulingMode.FAIR =>
          new FairSchedulableBuilder(rootPool, conf)
      }
    }
    schedulableBuilder.buildPools()
  }
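What buildPools actually does depends on the mode: the FAIR builder reads pool definitions from the fair-scheduler allocation file, while the FIFO builder has nothing to build. For reference, the FIFO builder (paraphrased from SchedulableBuilder.scala in the same codebase) is trivially small:

private[spark] class FIFOSchedulableBuilder(val rootPool: Pool)
  extends SchedulableBuilder with Logging {

  override def buildPools() {
    // nothing to do: FIFO schedules straight out of the single root pool
  }

  override def addTaskSetManager(manager: Schedulable, properties: Properties) {
    rootPool.addSchedulable(manager)
  }
}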
           

The TaskScheduler has now been created.
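Creation is only half of the story: right after createTaskScheduler returns, SparkContext wires the pieces together and starts the scheduler (near the call site at line 522 shown above; paraphrased from the Spark 1.6 source):

val (sched, ts) = SparkContext.createTaskScheduler(this, master)
_schedulerBackend = sched
_taskScheduler = ts
_dagScheduler = new DAGScheduler(this)

// start TaskScheduler only after the DAGScheduler constructor has registered
// itself with the TaskScheduler
_taskScheduler.start()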
