天天看點

如何檢視yarn container日志 檢視yarn任務所有container資訊 yarn gc日志的檢視和簡單分析一、shell和yarn基礎指令準備二、app日志檢視總結

檢視yarn任務日志 任務運作中和結束時檢視container的日志 GC日志的簡單分析

文章目錄

  • 一、shell和yarn基礎指令準備
  • 二、app日志檢視
    • 1. 直接使用yarn logs,可選擇指定日志類型,也可"手動"重定向至本地檔案
    • 2. 使用yarn logs檢視,并使用-out參數"自動"落盤本地檔案
    • 3. 檢視running狀态app的containerId及其日志
    • 4. 如何查詢已結束運作的container資訊
    • 5. 檢視gc日志
  • 總結

一、shell和yarn基礎指令準備

1. 别名alias和錯誤輸出重定向

$ alias lg='yarn logs -applicationId application_1652362266025_4019 [email protected] 2>/dev/null'
# 這樣後面測試的時候就不用每次書寫前年那固定的一長串了 -log_files <args> 等參數将填充至"[email protected]"位置
# 2>/dev/null 錯誤日志重定向至黑洞,更友善檢視結果
$ lg -log_files stdout | head
Container: container_e105_1652362266025_4019_01_000022 on ****-bg-w03_45454_1652490636096
LogAggregationType: AGGREGATED
=========================================================================================
LogType:stdout
LogLastModifiedTime:Sat May 14 09:10:36 +0800 2022
LogLength:6368
LogContents:
2022-05-14 08:11:37 Starting to run new task attempt: attempt_1652362266025_4019_1_10_000002_0
1.783: [GC pause (Metadata GC Threshold) (young) (initial-mark), 0.0371225 secs]
   [Parallel Time: 13.8 ms, GC Workers: 23]

           

2. yarn的幾個基礎指令

app|application      prints application(s) report/kill application/manage long running application
applicationattempt   prints applicationattempt(s) report
container            prints container(s) report
logs                 dump container logs
           

3. find的-type參數 及 grep的-o -e參數

find [options]
-type c
      d      檔案夾:directory
      p      named pipe (FIFO)
      f      正常檔案:regular file
      s      套接字:socket

grep [options]
-e, --regexp=PATTERN  正則:use PATTERN for matching
-i, --ignore-case     忽略大小寫:ignore case distinctions
-w, --word-regexp     比對完整單詞:force PATTERN to match only whole words
-v, --invert-match    反向選擇比對内容:select non-matching lines
-n, --line-number     列印所在行号:print line number with output lines
-h, --no-filename     結果首列不輸出檔案名:suppress the file name prefix on output
-o, --only-matching   隻輸出比對到的部分:show only the part of a line matching PATTERN
-r, --recursive       檔案夾遞歸:like --directories=recurse
[注]:
egrep 等同 grep -e
           

二、app日志檢視

1. 直接使用yarn logs,可選擇指定日志類型,也可"手動"重定向至本地檔案

代碼如下(示例):

$ yarn logs -applicationId application_1652362266025_4019 -log_files stderr 2>/dev/null | head
Container: container_e105_1652362266025_4019_01_000022 on ***-bg-w03_45454_1652490636096
LogAggregationType: AGGREGATED
=========================================================================================
LogType:stderr
LogLastModifiedTime:Sat May 14 09:10:36 +0800 2022
LogLength:188
LogContents:
2022-05-14 08:11:37 Starting to run new task attempt: attempt_1652362266025_4019_1_10_000002_0
2022-05-14 08:11:41 Completed running task attempt: attempt_1652362266025_4019_1_10_000002_0
...
           

2. 使用yarn logs檢視,并使用-out參數"自動"落盤本地檔案

# 沒權限安裝tree 此處用的另一叢集
# 将日志落盤到目前檔案夾logs中
$ yarn logs -applicationId application_1650527019982_0697 -out ./logs
# 檢視日志目錄結構 子目錄為節點名稱 内部的檔案為節點中運作此app的container的日志
$ tree logs/
logs/
├── node-group-***1.mrs-lcnd.com_8041
│   ├── container_e03_1650527019982_0697_01_000001
│   ├── container_e03_1650527019982_0697_01_000003
│   ├── container_e03_1650527019982_0697_01_000012
...
├── node-group-***2.mrs-lcnd.com_8041
│   ├── container_e03_1650527019982_0697_01_000002
│   ├── container_e03_1650527019982_0697_01_000004
│   ├── container_e03_1650527019982_0697_01_000006
...
│   └── container_e03_1650527019982_0697_01_000044
└── node-group-***3.mrs-lcnd.com_8041
    ├── container_e03_1650527019982_0697_01_000005
    ├── container_e03_1650527019982_0697_01_000009
    ├── container_e03_1650527019982_0697_01_000010
...
# 因為沒有使用-log_files進行過濾,所有每個檔案都包含所屬container的所有日志類型,可使用以下指令驗證:
$ cat logs/node-group-***1.mrs-lcnd.com_8041/container_e03_1650527019982_0697_01_000001 | egrep -i '^LogType'
LogType:container-localizer-syslog
LogType:directory.info
LogType:launch_container.sh
LogType:prelaunch.out
LogType:stderr
LogType:stdout
LogType:syslog
           

3. 檢視running狀态app的containerId及其日志

# 1. 擷取application的attemptId
$ yarn applicationattempt -list application_1652362266025_4832 2>/dev/null
Total number of application attempts :1
         ApplicationAttempt-Id         State      AM-Container-Id                                 Tracking-URL
appattempt_1652362266025_4832_000001   RUNNING    container_e105_1652362266025_4832_01_000001     http://****-bg-w01:8088/proxy/application_1652362266025_4832/

# 2. 擷取此attempt對應的container
$ yarn container -list appattempt_1652362266025_4832_000001 2>/dev/null
# 輸出格式同上  為友善檢視  此處轉換一下格式
Total number of containers :1
# 表頭              # 内容
Container-Id       container_e105_1652362266025_4832_01_000001
Start Time         Sat May 14 16:19:38 +0800 2022
Finish Time        N/A
State              RUNNING
Host               ****-bg-w18:45454
Node Http Address  http://****-bg-w18:8042
LOG-URL            http://****-bg-w01:8188/applicationhistory/logs/****-bg-w18:45454/container_e105_1652362266025_4832_01_000001/container_e105_1652362266025_4832_01_000001/dmp_operator1

# 此為一個hive應用,目前沒有查詢任務,清單中隻有1個running狀态的container,對比發現,此container就是am所在的container,當有查詢任務運作時,此處會出現多個container

# 3. 檢視container的日志
$ yarn logs -containerId container_e105_1652362266025_4835_01_000001 -log_files stderr 2>/dev/null | head
Container: container_e105_1652362266025_4835_01_000001 on ****-bg-w19:45454
LogAggregationType: LOCAL
===========================================================================
LogType:stderr
LogLastModifiedTime:Sat May 14 16:32:26 +0800 2022
LogLength:472
LogContents:
2022-05-14 16:23:24 Running Dag: dag_1652362266025_4835_1
2022-05-14 16:23:59 Completed Dag: dag_1652362266025_4835_1
2022-05-14 16:25:45 Running Dag: dag_1652362266025_4835_2
...
           

4. 如何查詢已結束運作的container資訊

通過3發現,通過attemptId隻能查詢處于運作狀态的container資訊,如果想要檢視曆史的資訊,可以使用yarn logs

$ lg -show_application_log_info
Application State: Completed.
Container: container_e105_1652362266025_4019_01_000022 on ****-bg-w03_45454_1652490636096
Container: container_e105_1652362266025_4019_01_000009 on ****-bg-w04_45454_1652490635977
Container: container_e105_1652362266025_4019_01_000014 on ****-bg-w04_45454_1652490635977
Container: container_e105_1652362266025_4019_01_000020 on ****-bg-w06_45454_1652490636083
Container: container_e105_1652362266025_4019_01_000012 on ****-bg-w07_45454_1652490636059
...
# 使用egrep -o 從中切分出container的id
$ lg -show_application_log_info | egrep -o 'container_e[_0-9]*'
container_e105_1652362266025_4019_01_000022
container_e105_1652362266025_4019_01_000009
container_e105_1652362266025_4019_01_000014
container_e105_1652362266025_4019_01_000020
container_e105_1652362266025_4019_01_000012
...
# 同理也可切分出node節點,不過node節點也可以使用指令檢視
$ lg -list_nodes
****-bg-w03_45454_1652490636096
****-bg-w04_45454_1652490635977
****-bg-w06_45454_1652490636083
****-bg-w07_45454_1652490636059
****-bg-w10_45454_1652490636542
...
           

5. 檢視gc日志

萬字長文教你看懂java G1垃圾回收日志

gc日志在yarn中的類型為:LogType:stdout

通過下圖,我們主要從日志提取 時間offset 和 gc實際用時real 兩行,對gc情況有個大緻的了解

如何檢視yarn container日志 檢視yarn任務所有container資訊 yarn gc日志的檢視和簡單分析一、shell和yarn基礎指令準備二、app日志檢視總結
1. 使用指令檢視 
# app級别: yarn logs -applicationId <application ID>
# container級别: yarn logs -applicationId <application ID> -containerId <Container ID>
# 實操中可能後者更有意義,因為gc一般出現在個别container中
$ lg -containerId container_e105_1652362266025_4019_01_000014 -log_files stdout | egrep '\[GC pause|real='
1.529: [GC pause (Metadata GC Threshold) (young) (initial-mark), 0.0286059 secs]
 [Times: user=0.24 sys=0.02, real=0.03 secs]
 [Times: user=0.09 sys=0.00, real=0.01 secs]
 [Times: user=0.03 sys=0.01, real=0.01 secs]
...
273.086: [GC pause (G1 Evacuation Pause) (young), 0.0094619 secs]
 [Times: user=0.07 sys=0.01, real=0.01 secs]
400.869: [GC pause (G1 Humongous Allocation) (young) (initial-mark), 0.0097788 secs]
 [Times: user=0.07 sys=0.02, real=0.01 secs]
 [Times: user=0.15 sys=0.00, real=0.01 secs]
 [Times: user=0.02 sys=0.01, real=0.01 secs]
# 輸出格式化
$ lg -containerId container_e105_1652362266025_4019_01_000014 -log_files stdout | egrep '\[GC pause|real=' | column -s, -t
1.529: [GC pause (Metadata GC Threshold) (young) (initial-mark)       0.0286059 secs]
 [Times: user=0.24 sys=0.02                                           real=0.03 secs]
 [Times: user=0.09 sys=0.00                                           real=0.01 secs]
 [Times: user=0.03 sys=0.01                                           real=0.01 secs]
2.617: [GC pause (Metadata GC Threshold) (young) (initial-mark)       0.0365129 secs]
 [Times: user=0.37 sys=0.04                                           real=0.04 secs]
 [Times: user=0.15 sys=0.00                                           real=0.01 secs]
 [Times: user=0.03 sys=0.00                                           real=0.01 secs]
...

2. 使用-out落盤到本地,使用shell指令檢視
# 使用指令找出絕對路徑
$ find logs/ -type f
logs/****-bg-w03_45454_1652490636096/container_e105_1652362266025_4019_01_000022
logs/****-bg-w17_45454_1652490635505/container_e105_1652362266025_4019_01_000005
logs/****-bg-w21_45454_1652490636022/container_e105_1652362266025_4019_01_000011
logs/****-bg-w10_45454_1652490636542/container_e105_1652362266025_4019_01_000023
...
# 同上,cat後使用egrep進行檢視
# 這裡示範一下gc次數統計
$ find logs/ -type f | while read line
> do
> echo $line
>  egrep '\[GC pause' $line | wc -l
> done
logs/****-bg-w03_45454_1652490636096/container_e105_1652362266025_4019_01_000022
3
logs/****-bg-w17_45454_1652490635505/container_e105_1652362266025_4019_01_000005
5
logs/****-bg-w21_45454_1652490636022/container_e105_1652362266025_4019_01_000011
2
logs/****-bg-w10_45454_1652490636542/container_e105_1652362266025_4019_01_000023
1304
logs/****-bg-w06_45454_1652490636083/container_e105_1652362266025_4019_01_000020
5
...
# 很顯然,其中一個container的gc次數過于頻繁
           
如何檢視yarn container日志 檢視yarn任務所有container資訊 yarn gc日志的檢視和簡單分析一、shell和yarn基礎指令準備二、app日志檢視總結

總結

其實yarn開啟日志聚合功能以後,很多都可以網頁端檢視,但是使用指令其實可以實作更加強大和精細化的功能,可以通過官方的–help文檔多多學習和測試。

上述指令也可以寫一個簡單的腳本進行封裝,以友善出現異常時進行快速定位。

如有錯誤或者更好的方法還請大家不吝指正,大家一起變得更強!