
【Spark】Spark Web UI - SQL

  Spark SQL jobs that run slowly or fail outright come up all the time at work, and to troubleshoot them you have to know how to read the Spark Web UI. The official documentation is a good place to start: https://spark.apache.org/docs/3.2.1/web-ui.html#content. The Web UI has quite a few tabs, and we will go through them one by one.


Today we look at one of the most commonly used tabs: SQL.

SQL Tab

If the application executes Spark SQL queries, the SQL tab displays information, such as the duration, jobs, and physical and logical plans for the queries. Here we include a basic example to illustrate this tab:


scala> val df = Seq((1, "andy"), (2, "bob"), (2, "andy")).toDF("count", "name")
df: org.apache.spark.sql.DataFrame = [count: int, name: string]

scala> df.count
res0: Long = 3                                                                  

scala> df.createGlobalTempView("df")

scala> spark.sql("select name,sum(count) from global_temp.df group by name").show
+----+----------+
|name|sum(count)|
+----+----------+
|andy|         3|
| bob|         2|
+----+----------+
           
Now the above three dataframe/SQL operators are shown in the list. If we click the ‘show at <console>: 24’ link of the last query, we will see the DAG and details of the query execution.


The query details page displays information about the query execution time, its duration, the list of associated jobs, and the query execution DAG. The first block ‘WholeStageCodegen (1)’ compiles multiple operators (‘LocalTableScan’ and ‘HashAggregate’) together into a single Java function to improve performance, and metrics like number of rows and spill size are listed in the block. The annotation ‘(1)’ in the block name is the code generation id. The second block ‘Exchange’ shows the metrics on the shuffle exchange, including number of written shuffle records, total data size, etc.

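To inspect what WholeStageCodegen actually produced without clicking through the UI, Spark 3.x also exposes a `codegen` explain mode on Dataset; here is a quick sketch against the query above:

scala> // Prints the physical plan plus the generated Java code for each
scala> // whole-stage codegen subtree; the codegen ids correspond to the
scala> // '(1)'-style annotations on the blocks in the UI DAG.
scala> spark.sql("select name,sum(count) from global_temp.df group by name").explain("codegen")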

Clicking the ‘Details’ link on the bottom displays the logical plans and the physical plan, which illustrate how Spark parses, analyzes, optimizes and performs the query. Steps in the physical plan that are subject to whole-stage code generation optimization are prefixed by a star followed by the code generation id, for example: ‘*(1) LocalTableScan’.

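The same plans can also be printed directly in the shell with the standard `explain(true)` (extended mode), which mirrors what the ‘Details’ link shows:

scala> // Prints the parsed, analyzed and optimized logical plans followed by the
scala> // physical plan; steps compiled by whole-stage codegen carry the '*(n)' prefix.
scala> spark.sql("select name,sum(count) from global_temp.df group by name").explain(true)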

SQL metrics

The metrics of SQL operators are shown in the block of physical operators. The SQL metrics can be useful when we want to dive into the execution details of each operator. For example, “number of output rows” can answer how many rows are output after a Filter operator, “shuffle bytes written total” in an Exchange operator shows the number of bytes written by a shuffle.

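As an illustration (the data and column names here are made up), a query containing both a Filter and a shuffle lets you observe these two metrics side by side in the SQL tab:

scala> // The Filter block reports "number of output rows" (50000 here), and the
scala> // Exchange inserted before the final aggregation reports the shuffle
scala> // write metrics such as "shuffle bytes written total".
scala> val events = spark.range(0, 100000).selectExpr("id", "id % 10 as bucket")
scala> events.filter("id % 2 = 0").groupBy("bucket").count().show()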

Here is the list of SQL metrics:

| SQL metrics | Meaning | Operators |
|---|---|---|
| number of output rows | the number of output rows of the operator | Aggregate operators, Join operators, Sample, Range, Scan operators, Filter, etc. |
| data size | the size of broadcast/shuffled/collected data of the operator | BroadcastExchange, ShuffleExchange, Subquery |
| time to collect | the time spent on collecting data | BroadcastExchange, Subquery |
| scan time | the time spent on scanning data | ColumnarBatchScan, FileSourceScan |
| metadata time | the time spent on getting metadata like number of partitions, number of files | FileSourceScan |
| shuffle bytes written | the number of bytes written | CollectLimit, TakeOrderedAndProject, ShuffleExchange |
| shuffle records written | the number of records written | CollectLimit, TakeOrderedAndProject, ShuffleExchange |
| shuffle write time | the time spent on shuffle writing | CollectLimit, TakeOrderedAndProject, ShuffleExchange |
| remote blocks read | the number of blocks read remotely | CollectLimit, TakeOrderedAndProject, ShuffleExchange |
| remote bytes read | the number of bytes read remotely | CollectLimit, TakeOrderedAndProject, ShuffleExchange |
| remote bytes read to disk | the number of bytes read from remote to local disk | CollectLimit, TakeOrderedAndProject, ShuffleExchange |
| local blocks read | the number of blocks read locally | CollectLimit, TakeOrderedAndProject, ShuffleExchange |
| local bytes read | the number of bytes read locally | CollectLimit, TakeOrderedAndProject, ShuffleExchange |
| fetch wait time | the time spent on fetching data (local and remote) | CollectLimit, TakeOrderedAndProject, ShuffleExchange |
| records read | the number of read records | CollectLimit, TakeOrderedAndProject, ShuffleExchange |
| sort time | the time spent on sorting | Sort |
| peak memory | the peak memory usage in the operator | Sort, HashAggregate |
| spill size | number of bytes spilled to disk from memory in the operator | Sort, HashAggregate |
| time in aggregation build | the time spent on aggregation | HashAggregate, ObjectHashAggregate |
| avg hash probe bucket list iters | the average bucket list iterations per lookup during aggregation | HashAggregate |
| data size of build side | the size of built hash map | ShuffledHashJoin |
| time to build hash map | the time spent on building hash map | ShuffledHashJoin |
| task commit time | the time spent on committing the output of a task after the writes succeed | any write operation on a file-based table |
| job commit time | the time spent on committing the output of a job after the writes succeed | any write operation on a file-based table |
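To see the Sort metrics from the table above, one option (assuming Spark 3.x, where the built-in `noop` write format is available) is to force a full sort and discard the output; note that `show()` would impose a limit and plan a TakeOrderedAndProject instead of a Sort:

scala> // A global descending sort: the Sort block in the SQL tab reports
scala> // "sort time" and "peak memory", plus "spill size" if the rows do not
scala> // fit in execution memory.
scala> import org.apache.spark.sql.functions.desc
scala> spark.range(0, 1000000).orderBy(desc("id")).write.format("noop").mode("overwrite").save()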

