
Annotated walkthrough of the spark-submit script execution flow

Let's start with a command:

spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-memory 10G \
  --executor-memory 10G \
  --num-executors 25 \
  --executor-cores 4 \
  --queue ltemr \
  --conf "spark.driver.extraJavaOptions=-Dhdp.version=3.1.0.0-78" \
  --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version=3.1.0.0-78" \
  --jars $(echo /home/ltemr/oozie_signal/spark/lib/*.jar | tr ' ' ',') \
  --properties-file conf/spark-properties-uemr.conf \
  uemr-streaming-driver-1.0-SNAPSHOT.jar \
  UEMRFixLocationDriver
           

The spark-submit script then executes:

exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "[email protected]"
           

Note: "$@" here expands to all of the arguments we passed on the spark-submit command line.
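
A quick aside on "$@" (a minimal sketch, not part of Spark; forward.sh is a made-up name): "$@" re-expands every original argument as its own word, so arguments containing spaces or special characters reach SparkSubmit exactly as we typed them.

#!/usr/bin/env bash
# forward.sh -- hypothetical script, only to illustrate how "$@" forwards arguments
# Each original argument is printed on its own line, exactly as it was passed in.
printf 'arg: %s\n' "$@"

# $ ./forward.sh --deploy-mode client --conf "spark.driver.extraJavaOptions=-Dhdp.version=3.1.0.0-78"
# arg: --deploy-mode
# arg: client
# arg: --conf
# arg: spark.driver.extraJavaOptions=-Dhdp.version=3.1.0.0-78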

Now let's see what spark-class does. Below is the content of the spark-class script, with annotations:

if [ -z "${SPARK_HOME}" ]; then
  source "$(dirname "$0")"/find-spark-home
fi

# Note: load the Spark environment configuration (spark-env.sh)
. "${SPARK_HOME}"/bin/load-spark-env.sh

# Note: locate the java executable (RUNNER) used to launch the application
if [ -n "${JAVA_HOME}" ]; then
  RUNNER="${JAVA_HOME}/bin/java"
else
  if [ "$(command -v java)" ]; then
    RUNNER="java"
  else
    echo "JAVA_HOME is not set" >&2
    exit 1
  fi
fi

# Note: locate the Spark jars directory so the jars can go on the launch classpath
if [ -d "${SPARK_HOME}/jars" ]; then
  SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
  SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi

if [ ! -d "$SPARK_JARS_DIR" ] && [ -z "$SPARK_TESTING$SPARK_SQL_TESTING" ]; then
  echo "Failed to find Spark jars directory ($SPARK_JARS_DIR)." 1>&2
  echo "You need to build Spark with the target \"package\" before running this program." 1>&2
  exit 1
else
  LAUNCH_CLASSPATH="$SPARK_JARS_DIR/*"
fi

# Add the launcher build dir to the classpath if requested.
# Note: SPARK_PREPEND_CLASSES is mainly for iterative development: instead of rebuilding every dependency, you only recompile and package the parts of Spark you changed, which speeds up development. This is mentioned in the "Useful Developer Tools" page.
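# Note: in the Spark developer docs this is typically enabled with export SPARK_PREPEND_CLASSES=true (the script only checks that the variable is non-empty).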
if [ -n "$SPARK_PREPEND_CLASSES" ]; then
  LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
fi

# For tests
if [[ -n "$SPARK_TESTING" ]]; then
  unset YARN_CONF_DIR
  unset HADOOP_CONF_DIR
fi

# The launcher library will print arguments separated by a NULL character, to allow arguments with
# characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
# an array that will be used to exec the final command.
#
# The exit code of the launcher is appended to the output, so the parent shell removes it from the
# command array and checks the value to see if the launcher succeeded.
build_command() {
  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "[email protected]"
  printf "%d\0" $?
}

# Turn off posix mode since it does not allow process substitution
# Note: set +o turns a shell option off and set -o turns it on. POSIX is a software interface standard for Unix systems, used for cross-platform portability; POSIX mode is disabled here because it does not allow process substitution.
# Note: IFS is the Input Field Separator; read -d '' uses the NUL character (not a space) as the delimiter, matching the NUL-separated output printed by the launcher.
# Note: read consumes one delimiter-terminated record at a time (IFS= keeps leading/trailing whitespace in each record).
# Note: the -r flag disables backslash escape processing, so backslashes are kept literally.
# Note: CMD is an array; the loop reads the output of build_command into it, one argument per iteration (a standalone demo of this pattern follows the script below).
set +o posix
CMD=()
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < <(build_command "$@")

# Note: number of elements in the command array
COUNT=${#CMD[@]}
# Note: index of the last element in the array
LAST=$((COUNT - 1))
# Note: the last element in the array is the exit code returned by build_command
LAUNCHER_EXIT_CODE=${CMD[$LAST]}

# Certain JVM failures result in errors being printed to stdout (instead of stderr), which causes
# the code that parses the output of the launcher to get confused. In those cases, check if the
# exit code is an integer, and if it's not, handle it as a special error case.
# Note: if the launcher exit code is not an integer, handle it as a special error case
if ! [[ $LAUNCHER_EXIT_CODE =~ ^[0-9]+$ ]]; then
  echo "${CMD[@]}" | head -n-1 1>&2
  exit 1
fi

if [ $LAUNCHER_EXIT_CODE != 0 ]; then
  exit $LAUNCHER_EXIT_CODE
fi

# Note: execute the assembled submit command
# Note: an example of the fully assembled command (a single command line):
# JAVA_HOME/bin/java -cp SPARK_HOME/conf/:/Users/rttian/Documents/work/bigdata/spark-2.2.0-bin-hadoop2.7/jars/* 
# -Xmx1g org.apache.spark.deploy.SparkSubmit 
# --master local[3] 
# --class org.apache.spark.examples.SparkPi 
# examples/jars/spark-examples_2.11-2.2.0.jar 10
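# Note: the slice below drops the trailing exit code, keeping only the actual command to exec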
CMD=("${CMD[@]:0:$LAST}")
exec "${CMD[@]}"
           
