
Common Linux Performance Tools: Functions, Usage, and Principles (Part 1)

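Linux performance observation tools fall into two categories by scope: system-wide and per-process. System-wide tools report statistics for the system as a whole, while per-process tools break the statistics down by process, using data the kernel maintains for each one.

By implementation principle, they can be divided into counter-based, tracing-based, and profiling-based tools. The terms mean the following:

Counters: statistics maintained by the kernel, usually unsigned integers, that count events as they occur, for example the network packets-received counter or the disk I/O counters.

Tracing: tracing collects detailed data for every event. Because capturing event data costs CPU time and needs considerable storage for the collected data, tracing is disabled by default. Logging is a form of low-frequency tracing that records event data.

Profiling: characterizes a target by sampling it or taking snapshots. CPU usage is a typical example: sampling the program counter (a register that holds the address of the next instruction) together with stack traces reveals the code paths that consume CPU cycles. Profiling can also be driven by non-timer hardware events, such as CPU cache misses or bus activity; this information helps developers optimize how their code uses system resources.

This article introduces the counter-based, system-wide Linux performance tools, with details on their usage and data sources. Later articles will cover per-process tools and tools based on the other principles.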
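Counter-based tools generally work the same way at heart: read a cumulative kernel counter twice and divide the difference by the elapsed time. A minimal Python sketch of that step follows. The function names and the default 64-bit counter width are assumptions made for illustration; real counter widths vary by counter and kernel.

```python
def counter_delta(prev: int, cur: int, bits: int = 64) -> int:
    """Difference between two readings of an unsigned counter.

    The modulo keeps the result correct if the counter wrapped around
    once between the two readings. `bits` is an assumed counter width.
    """
    return (cur - prev) % (1 << bits)


def per_second(prev: int, cur: int, interval_s: float, bits: int = 64) -> float:
    """Turn two cumulative readings taken interval_s seconds apart into a rate."""
    return counter_delta(prev, cur, bits) / interval_s


# Two readings of a packet counter taken 2 seconds apart.
print(per_second(1000, 3000, 2.0))
```

The interval and count arguments that these tools accept control how often the two readings are taken and how many times the calculation is repeated.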

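1 System-wide counter tools

  The system-wide tools are vmstat, mpstat, iostat, netstat, and sar. They follow a common usage convention: an optional interval and count. Per-process tools use counters the kernel maintains for each process; they include ps, top, and pmap. This section covers the counter-based system-wide tools.

1.1 vmstat

  vmstat stands for Virtual Memory Statistics. It reports the overall state of processes, memory, disk reads and writes, and CPU usage. The usual form is vmstat delay count, where delay is the sampling interval and count is the number of samples. Figure 1.1 shows sample output; the first line reports the average of each metric since boot.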

  


Figure 1.1 The vmstat command

The columns have the following meanings:

  • Procs
    • r: The number of processes waiting for run time. This counts processes that are running or ready to run.
    • b: The number of processes in uninterruptible sleep. A process is in this state when it is blocked inside a system call and can be neither interrupted nor killed. Most uninterruptible system calls complete quickly [1]; for example, mkdir(2) never returns EINTR (the error returned when a call is interrupted), so it cannot be interrupted [2].
  • Memory
    • swpd: the amount of virtual memory used (swap space in use).
    • free: the amount of idle memory (physical memory).
    • buff: the amount of memory used as buffers. Buffers cache disk block data such as file system metadata (file permissions, locations, and so on).
    • cache: the amount of memory used as cache. The cache holds file contents [3].
    • inact: the amount of inactive memory. (-a option)
    • active: the amount of active memory. (-a option)
  • Swap: the swap area. When memory runs short, pages in memory are moved out to disk [4].
    • si: Amount of memory swapped in from disk (/s).
    • so: Amount of memory swapped to disk (/s).
  • IO
    • bi: Blocks received from a block device (blocks/s). Reading a file from disk, for example, raises this value.
    • bo: Blocks sent to a block device (blocks/s). Writing data to disk, for example, raises this value.
  • System
    • in: The number of interrupts per second, including the clock.
    • cs: The number of context switches per second.
  • CPU
    • us: Time spent running non-kernel code (user time, including nice time). Running a compute-intensive process, such as compressing a file, raises user-mode CPU time [3].
    • sy: Time spent running kernel code (system time). Issuing system calls at a high rate, for example reading /dev/urandom repeatedly to generate random numbers, raises sy.
    • id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time; on later kernels IO-wait is reported separately as wa. While a CPU is idle it is in fact running the idle thread.
    • wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle. The CPU is idle, but I/O that it initiated is still in progress [4].
    • st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown. Steal time is the percentage of time a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor. In a cloud environment a physical CPU is shared by several virtual machines, and st is the time this virtual CPU spent waiting to be given a real CPU. A high st suggests that other tenants' virtual machines may be competing for the CPU [5].

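  vmstat derives its statistics from the kernel's proc interface files: /proc/meminfo (memory usage), /proc/stat (system-wide state, such as the number of running and uninterruptibly blocked processes, and CPU usage), and /proc/*/stat (the state of each individual process).

The contents of /proc/meminfo are shown below; for /proc/stat, see section 1.2.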


Figure 1.2 Contents of /proc/meminfo
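As a rough illustration of how a tool could pull the vmstat memory columns out of /proc/meminfo, the sketch below parses the file's "Key: value kB" lines from a sample string. The sample values are invented, and deriving swpd as SwapTotal minus SwapFree is an assumption for illustration, not a quotation of the vmstat source.

```python
def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo key to its integer value (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Key: value"
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


# Invented sample in the layout of /proc/meminfo.
SAMPLE = """\
MemTotal:        8009504 kB
MemFree:          351040 kB
Buffers:          204800 kB
Cached:          3145728 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
"""

info = parse_meminfo(SAMPLE)
swpd = info["SwapTotal"] - info["SwapFree"]  # assumed derivation of swpd
print("swpd", swpd, "free", info["MemFree"], "buff", info["Buffers"])
```

On a Linux host the same function can be fed the real file with parse_meminfo(open("/proc/meminfo").read()).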

1.2 mpstat

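  mpstat stands for Multiprocessor Statistics. It reports statistics for each CPU core, whereas vmstat reports only the system-wide CPU totals.

  Usage: mpstat [-P {cpu|ALL}] [interval [count]], where -P selects the CPUs to report, interval is the sampling interval, and count is the number of samples. Figure 1.3 shows sample output. The fields have the following meanings: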

  


Figure 1.3 mpstat output

  

  • CPU: Processor number. The keyword all indicates that statistics are calculated as averages among all processors. This column identifies the CPU; the all row holds the average over all CPUs.
  • %usr: Show the percentage of CPU utilization that occurred while executing at the user level (application).
  • %nice: Show the percentage of CPU utilization that occurred while executing at the user level with nice priority. This is user-level CPU time consumed by processes whose priority has been lowered with a positive nice value. (A negative nice value raises a process's priority.)
  • %sys: Show the percentage of CPU utilization that occurred while executing at the system level (kernel). Note that this does not include time spent servicing hardware and software interrupts. Kernel code execution time, excluding interrupt servicing.
  • %iowait: Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request. See the wa field of vmstat in section 1.1.
  • %irq: Show the percentage of time spent by the CPU or CPUs to service hardware interrupts. Time the CPU spent servicing hardware interrupts.
  • %soft: Show the percentage of time spent by the CPU or CPUs to service software interrupts. Time the CPU spent servicing software interrupts.
  • %steal: Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor. See the st field of vmstat in section 1.1.
  • %guest: Show the percentage of time spent by the CPU or CPUs to run a virtual processor. Time the CPU spent running a virtual processor.
  • %gnice: Show the percentage of time spent by the CPU or CPUs to run a niced guest.
  • %idle: Show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request. Idle CPU time, not counting iowait. On kernels since 2.5.41 the id column of vmstat excludes iowait as well.

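  mpstat reads its data from /proc/stat, which records the activity of all CPUs accumulated since boot. Figure 1.4 shows the file. The cpu line holds the totals, and cpu0, cpu1, ... hold the values for each CPU. From left to right the values are user, nice, system, idle, iowait, irq, softirq, steal, and guest, measured in clock ticks (units of USER_HZ, typically 1/100 of a second). The first value after intr is the total number of interrupts since boot, and each following value is the count for one interrupt source. ctxt is the number of context switches. btime is the time at which the system booted, in seconds since the Unix epoch, which is why it does not change between two readings. processes is the number of tasks created since boot, procs_running is the number of tasks currently on the run queue, and procs_blocked is the number of blocked tasks. softirq gives the total number of software interrupts followed by the count for each softirq type [6].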


Figure 1.4 Contents of /proc/stat
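Because the cpu lines of /proc/stat are cumulative, a percentage such as %usr or %idle has to be computed from the difference between two readings. The Python sketch below shows one plausible way to do that; the sample lines are invented, and the exact arithmetic inside mpstat may differ (for example in how it treats guest time).

```python
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest")


def parse_cpu_line(line: str) -> dict:
    """Parse one 'cpu...' line of /proc/stat into {field: ticks}."""
    parts = line.split()
    ticks = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, ticks))


def cpu_percent(prev: dict, cur: dict) -> dict:
    """Share of the interval spent in each state, as percentages."""
    deltas = {name: cur[name] - prev[name] for name in prev}
    # The kernel already counts guest time inside user time,
    # so guest is left out of the total to avoid counting it twice.
    total = sum(v for name, v in deltas.items() if name != "guest")
    if total == 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * v / total for name, v in deltas.items()}


# Two invented readings of the aggregate "cpu" line.
prev = parse_cpu_line("cpu  1000 0 500 8000 100 0 0 0 0")
cur = parse_cpu_line("cpu  1300 0 600 8500 150 0 50 0 0")
pct = cpu_percent(prev, cur)
print(pct["user"], pct["system"], pct["idle"], pct["iowait"])
```

Applying the same two functions to the cpu0, cpu1, ... lines gives the per-CPU rows.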

1.3 iostat 

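  iostat stands for I/O statistics. It monitors the system's disk activity as it happens.

  Usage: iostat [ -p [ device [,...] | ALL ] ] [ device [...] | ALL ] [ interval [ count ] ], where -p takes a list of devices, interval is the sampling interval, and count is the number of samples. Figures 1.5 and 1.6 show sample output. The -d option displays the device utilization report, and -x displays extended statistics. The fields have the following meanings: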

  


Figure 1.5 iostat device I/O statistics


Figure 1.6 iostat extended device I/O statistics

  

  • Device: This column gives the device (or partition) name as listed in the /dev directory.
  • tps: Indicate the number of transfers per second that were issued to the device. A transfer is an I/O request to the device. Multiple logical requests can be combined into a single I/O request to the device. A transfer is of indeterminate size. The number of I/Os per second, that is, device requests per second.
  • Blk_read/s (kB_read/s, MB_read/s): Indicate the amount of data read from the device expressed in a number of blocks (kilobytes, megabytes) per second. Blocks are equivalent to sectors and therefore have a size of 512 bytes. Amount of data read per second.
  • Blk_wrtn/s (kB_wrtn/s, MB_wrtn/s): Indicate the amount of data written to the device expressed in a number of blocks (kilobytes, megabytes) per second. Amount of data written per second.
  • Blk_read (kB_read, MB_read): The total number of blocks (kilobytes, megabytes) read. Total data read during the interval.
  • Blk_wrtn (kB_wrtn, MB_wrtn): The total number of blocks (kilobytes, megabytes) written. Total data written during the interval.
  • rrqm/s: The number of read requests merged per second that were queued to the device. When two reads target adjacent blocks they are merged into one request, which improves efficiency. Merging is normally done by the I/O scheduler (also called the elevator). Sequential reads produce a larger value here than random reads [7].
  • wrqm/s: The number of write requests merged per second that were queued to the device. When two writes target adjacent blocks they are merged into one request, which improves efficiency. Sequential writes produce a larger value here than random writes [7].
  • r/s: The number (after merges) of read requests completed per second for the device.
  • w/s: The number (after merges) of write requests completed per second for the device.
  • rsec/s (rkB/s, rMB/s): The number of sectors (kilobytes, megabytes) read from the device per second. The unit can be sectors, kB, or MB.
  • wsec/s (wkB/s, wMB/s): The number of sectors (kilobytes, megabytes) written to the device per second. The unit can be sectors, kB, or MB.
  • avgrq-sz: The average size (in sectors) of the requests that were issued to the device. The average request size over all requests, in 512-byte sectors.
  • avgqu-sz: The average queue length of the requests that were issued to the device.
  • await: The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them [8]. If the I/O pattern is highly random (which drives up disk service time) and the I/O load is heavy (which drives up queue time), the heads move all over the disk and seek times grow, so a larger await should be expected. If the pattern is sequential and a single process generates the load, seek time and rotational latency are negligible and transfer time dominates, so await should be small, possibly under 1 ms. On a disk array with a hardware cache, a write completes before the data reaches the disks, which shortens write service time considerably; a write that takes more than one or two milliseconds counts as slow. Reads are a different matter: data not in the cache must still be read from the physical disks, and a single small-block read is about as fast as on a standalone disk [9].
  • r_await: The average time (in milliseconds) for read requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
  • w_await: The average time (in milliseconds) for write requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
  • svctm: The average service time (in milliseconds) for I/O requests that were issued to the device. Warning! Do not trust this field any more. This field will be removed in a future sysstat version. (Deprecated.)
  • %util: Percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%. This is the share of total time during which the device was busy, that is, had at least one I/O outstanding. It considers only whether I/O was present, not how much. Modern disks can process requests in parallel, so a %util of 100% does not prove that the disk is saturated or has become the bottleneck: it may still be able to accept more concurrent requests. iostat has no metric that directly measures disk saturation [9].

  iostat takes most of its statistics from /proc/diskstats, shown in Figure 1.7:


Figure 1.7 Contents of /proc/diskstats

  Table 1.1 lists the columns of each line in order. All values except in_flight are cumulative since boot:

  

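Table 1.1 Fields of /proc/diskstats

| Field | Value | Kernel description | Explanation |
|---|---|---|---|
| F1 | 253 | major number | Major device number |
| F2 | | minor number | Minor device number |
| F3 | vda | device name | Device name |
| F4 (rd_ios) | 1012759 | reads completed | Number of reads completed |
| F5 (rd_merges) | 3418 | reads merged | Number of reads merged |
| F6 (rd_sectors) | 48466695 | sectors read | Number of sectors read |
| F7 (rd_ticks) | 4027016 | milliseconds spent reading | Cumulative time of all reads (ms) |
| F8 (wr_ios) | 19073830 | writes completed | Number of writes completed |
| F9 (wr_merges) | 33764041 | writes merged | Number of writes merged |
| F10 (wr_sectors) | 2195716912 | sectors written | Number of sectors written |
| F11 (wr_ticks) | 120769824 | milliseconds spent writing | Cumulative time of all writes (ms) |
| F12 (in_flight) | | I/Os currently in progress | Number of I/Os not yet completed |
| F13 (io_ticks) | 10982660 | milliseconds spent doing I/Os | Wall-clock time the device spent with I/O in progress |
| F14 (time_in_queue) | 124778072 | weighted # of milliseconds spent doing I/Os | Weighted counterpart of F13 (io_ticks) |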

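  in_flight is the number of I/Os the system has not yet completed. It is incremented when an I/O enters the request queue (not when it is issued to the disk) and decremented when the I/O completes, so it covers both I/Os waiting in the queue and I/Os being serviced.

  rd_ticks and wr_ticks add up the time consumed by every individual I/O. Because a disk can usually process several I/Os in parallel, the sum of rd_ticks and wr_ticks is generally larger than the wall-clock time. io_ticks, by contrast, ignores how many I/Os are queued and tracks only the time during which the device had any I/O at all: it considers whether I/O was present, not how much. In the actual accounting, time accumulates in io_ticks while in_flight is nonzero and stops accumulating when in_flight is 0.

  iostat uses the io_ticks field to compute %util and the time_in_queue field to compute avgqu-sz, the average queue length [9].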
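The relationship between the diskstats fields and the iostat columns can be sketched in a few lines of Python. The formulas follow the description above (%util from io_ticks, avgqu-sz from time_in_queue, await from the tick and completion counters); the sample lines are invented and the function names are hypothetical, so treat this as an approximation of what iostat computes rather than its actual implementation.

```python
NAMES = ("rd_ios", "rd_merges", "rd_sectors", "rd_ticks",
         "wr_ios", "wr_merges", "wr_sectors", "wr_ticks",
         "in_flight", "io_ticks", "time_in_queue")


def parse_diskstats_line(line: str) -> dict:
    """Parse one /proc/diskstats line (fields F1 to F14) into a dict."""
    parts = line.split()
    stats = dict(zip(NAMES, (int(v) for v in parts[3:3 + len(NAMES)])))
    stats["device"] = parts[2]
    return stats


def extended_stats(prev: dict, cur: dict, interval_ms: float) -> dict:
    """Approximate a few iostat -x columns from two readings."""
    # in_flight is instantaneous, so it is not differenced.
    d = {k: cur[k] - prev[k] for k in NAMES if k != "in_flight"}
    ios = d["rd_ios"] + d["wr_ios"]
    return {
        "r/s": d["rd_ios"] * 1000.0 / interval_ms,
        "w/s": d["wr_ios"] * 1000.0 / interval_ms,
        "await": (d["rd_ticks"] + d["wr_ticks"]) / ios if ios else 0.0,
        "avgqu-sz": d["time_in_queue"] / interval_ms,
        "%util": 100.0 * d["io_ticks"] / interval_ms,
    }


# Two invented readings taken 1000 ms apart.
before = parse_diskstats_line(
    "253 0 vda 1000 10 80000 4000 2000 20 160000 12000 0 5000 16000")
after = parse_diskstats_line(
    "253 0 vda 1100 12 88000 4500 2300 25 184000 14500 0 5600 19000")
result = extended_stats(before, after, 1000.0)
print(result)
```

In this sample the device was busy for 600 of 1000 ms (%util 60) while the per-I/O times add up to 3000 ms, which illustrates why the tick sums can exceed wall-clock time when I/Os overlap.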

1.4 netstat

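  netstat is mainly used to view network connections, the routing table, and network interface statistics.

1.4.1 Viewing network connections (equivalent to ss)

  Usage: netstat [address_family_options] [--tcp|-t] [--udp|-u] [--udplite|-U] [--sctp|-S] [--raw|-w] [--listening|-l] [--all|-a] [--numeric|-n] [--numeric-hosts] [--numeric-ports] [--numeric-users] [--symbolic|-N] [--extend|-e[--extend|-e]] [--timers|-o] [--program|-p] [--verbose|-v] [--continuous|-c] [--wide|-W] [delay]

  Here -t, -u, and so on select the protocol; -l shows sockets in the listening state; -a shows sockets in all states (listening and non-listening); -n shows hosts, ports, and users numerically instead of resolving them to names; -e shows extended information; -p shows the PID and name of the process that owns each socket; -c prints the network information continuously, once per second. Figure 1.8 shows sample output: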

  


Figure 1.8 Viewing network connections with netstat

  Some of the fields are explained below:

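  • Recv-Q
    • Established: The count of bytes not copied by the user program connected to this socket. Data that has been received but still sits in the buffer because the process has not read it yet; the value is the number of such bytes.
    • Listening: Since Kernel 2.6.18 this column contains the current syn backlog. The SYN backlog is the queue of half-open connections, which a connection enters when the client's SYN arrives. Note that the man page wording is loose: in current kernels the number reported here appears to be the length of the accept queue, that is, connections that have completed the handshake and are waiting for accept().
  • Send-Q
    • Established: The count of bytes not acknowledged by the remote host. The number of bytes in the send queue that the remote host has not yet acknowledged.
    • Listening: Since Kernel 2.6.18 this column contains the maximum size of the syn backlog. The maximum size of the backlog queue.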
  • State: The state of the socket. Since there are no states in raw mode and usually no states used in UDP and UDPLite, this column may be left blank. Normally this can be one of several values:
    • ESTABLISHED: The socket has an established connection. The state after the third step of the three-way handshake, once the final ACK has been sent (or, on the passive side, received).
    • SYN_SENT: The socket is actively attempting to establish a connection. The state after sending a SYN.
    • SYN_RECV: A connection request has been received from the network. The state after receiving a SYN and replying with SYN+ACK.
    • FIN_WAIT1: The socket is closed, and the connection is shutting down. (Four-way teardown, step one: the state after actively sending a FIN.)
    • FIN_WAIT2: Connection is closed, and the socket is waiting for a shutdown from the remote end. (Four-way teardown, step two: the state after receiving the ACK for the FIN.)
    • TIME_WAIT: The socket is waiting after close to handle packets still in the network. A socket in FIN_WAIT2 that receives the remote FIN moves to TIME_WAIT (four-way teardown: the FIN of step three has been received).
    • CLOSE: The socket is not being used.
    • CLOSE_WAIT: The remote end has shut down, waiting for the socket to close. The peer has closed, but the local end can still send data to it. (Four-way teardown, step two: the state after acknowledging the peer's FIN.)
    • LAST_ACK: The remote end has shut down, and the socket is closed. Waiting for acknowledgement. Entered from CLOSE_WAIT when the local end sends its own FIN (four-way teardown, step three).
    • LISTEN: The socket is listening for incoming connections. Such sockets are not included in the output unless you specify the --listening (-l) or --all (-a) option.
    • CLOSING: Both sockets are shut down but we still don't have all our data sent.
    • UNKNOWN: The state of the socket is unknown.

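  Recv-Q and Send-Q are normally 0. Brief nonzero values are acceptable, but values that stay nonzero for a long time indicate that a queue is backing up. A persistently nonzero Recv-Q means the application cannot keep up with incoming data, which may point to a denial-of-service attack. A persistently nonzero Send-Q may mean that the sending application and the receiving process on the other end are mismatched in speed: the sender is too fast or the receiver too slow [13].

  netstat reads its information from the /proc/net directory. TCP connection information comes from /proc/net/tcp, shown in Figure 1.9: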

  


Figure 1.9 The /proc/net/tcp file
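The entries in /proc/net/tcp are hexadecimal, so a tool has to decode them before it can print addresses, states, and queue sizes. The following Python sketch decodes one line under two stated assumptions: the host is little-endian (as on x86), and the sample line is invented for illustration. The helper names are hypothetical.

```python
import socket
import struct

# State codes as they appear in the "st" column.
TCP_STATES = {
    0x01: "ESTABLISHED", 0x02: "SYN_SENT", 0x03: "SYN_RECV",
    0x04: "FIN_WAIT1", 0x05: "FIN_WAIT2", 0x06: "TIME_WAIT",
    0x07: "CLOSE", 0x08: "CLOSE_WAIT", 0x09: "LAST_ACK",
    0x0A: "LISTEN", 0x0B: "CLOSING",
}


def decode_endpoint(field: str) -> tuple:
    """Turn an 'ADDR:PORT' hex pair into (dotted address, port)."""
    addr_hex, port_hex = field.split(":")
    # Assumption: little-endian host, where the address bytes appear reversed.
    addr = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
    return addr, int(port_hex, 16)


def parse_tcp_line(line: str) -> dict:
    """Extract the endpoints, state, and queue sizes from one line."""
    parts = line.split()
    tx_queue, rx_queue = (int(v, 16) for v in parts[4].split(":"))
    return {
        "local": decode_endpoint(parts[1]),
        "remote": decode_endpoint(parts[2]),
        "state": TCP_STATES.get(int(parts[3], 16), "UNKNOWN"),
        "send_q": tx_queue,
        "recv_q": rx_queue,
    }


# Invented line in the layout of /proc/net/tcp.
LINE = ("   0: 0100007F:0035 00000000:0000 0A 00000000:00000000 "
        "00:00000000 00000000     0        0 12345 1")
conn = parse_tcp_line(LINE)
print(conn)
```

The send_q and recv_q values correspond to the Send-Q and Recv-Q columns described above.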

1.4.2 Viewing the routing table (equivalent to ip route)

  Usage: netstat {--route|-r} [address_family_options] [--extend|-e[--extend|-e]] [--verbose|-v] [--numeric|-n] [--numeric-hosts] [--numeric-ports] [--numeric-users] [--continuous|-c] [delay]

  Figure 1.10 shows sample output. The routing table information comes from /proc/net/route:

  


Figure 1.10 Kernel routing table
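/proc/net/route stores destinations, gateways, and masks as hexadecimal words, which netstat -r converts to dotted addresses. A small Python sketch of that conversion is shown below. It assumes a little-endian host and uses an invented sample row; the function names are hypothetical.

```python
import socket
import struct


def hex_to_ipv4(value: str) -> str:
    """Convert a hex word from /proc/net/route to dotted notation."""
    # Assumption: little-endian host, so the bytes appear reversed.
    return socket.inet_ntoa(struct.pack("<I", int(value, 16)))


def parse_route_line(line: str) -> dict:
    """Parse one data row: Iface Destination Gateway Flags RefCnt Use Metric Mask ..."""
    parts = line.split()
    return {
        "iface": parts[0],
        "destination": hex_to_ipv4(parts[1]),
        "gateway": hex_to_ipv4(parts[2]),
        "flags": int(parts[3], 16),
        "metric": int(parts[6]),
        "genmask": hex_to_ipv4(parts[7]),
    }


# Invented default-route row.
ROW = "eth0\t00000000\t0101A8C0\t0003\t0\t0\t100\t00000000\t0\t0\t0"
route = parse_route_line(ROW)
print(route)
```

A flags value of 3 combines the "up" bit (0x1) with the "gateway" bit (0x2), which netstat prints as UG.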

1.4.3 Viewing network interface statistics (equivalent to ip -s link)

  Usage: netstat {--interfaces|-I|-i} [--all|-a] [--extend|-e] [--verbose|-v] [--program|-p] [--numeric|-n] [--numeric-hosts] [--numeric-ports] [--numeric-users] [--continuous|-c] [delay]

  Figure 1.11 shows sample output:

  


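Figure 1.11 Network interface activity statistics

1.5 sar

  sar (System Activity Reporter) collects and reports system activity covering CPU, memory, network, I/O, and more, which makes it a fairly comprehensive tool. The data sar collects is saved in /var/log/sa/sadd, where dd is the day of the month.

1.5.1 Viewing CPU usage

  If you suspect a CPU bottleneck, use sar -u and sar -q. sar -u resembles mpstat; see section 1.2 for the field meanings. sar -q reports queue length and load averages. Figure 1.12 shows sample output: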


Figure 1.12 Output of sar -q

  

  • runq-sz: Run queue length (number of tasks waiting for run time). The number of tasks that are running or ready to run.
  • plist-sz: Number of tasks in the task list. The total number of tasks in the task list.
  • ldavg-1: System load average for the last minute. The load average is calculated as the average number of runnable or running tasks (R state), and the number of tasks in uninterruptible sleep (D state) over the specified interval. The load over the past minute, counting tasks that are running, ready to run, or in uninterruptible sleep.
  • ldavg-5: System load average for the past 5 minutes.
  • ldavg-15: System load average for the past 15 minutes.
  • blocked: Number of tasks currently blocked, waiting for I/O to complete.

1.5.2 Viewing memory usage

  If you suspect a memory bottleneck, use sar -B, sar -r, and sar -W. sar -B reports paging statistics. Figure 1.13 shows sample output:

  


Figure 1.13 Output of sar -B

  • pgpgin/s: Total number of kilobytes the system paged in from disk per second.
  • pgpgout/s: Total number of kilobytes the system paged out to disk per second.
  • fault/s: Number of page faults (major + minor) made by the system per second. This is not a count of page faults that generate I/O, because some page faults can be resolved without I/O. A major fault requires disk I/O; a minor fault does not, because the page is already in memory and merely has not been mapped into the process's address space yet. Code shared between processes is one example: it does not have to be read from disk again.
  • majflt/s: Number of major faults the system has made per second, those which have required loading a memory page from disk. The number of page faults per second that required disk I/O [16].
  • pgfree/s: Number of pages placed on the free list by the system per second.
  • pgscank/s: Number of pages scanned by the kswapd daemon per second. kswapd is the Linux kernel thread responsible for memory reclaim.
  • pgscand/s: Number of pages scanned directly per second.
  • pgsteal/s: Number of pages the system has reclaimed from cache (pagecache and swapcache) per second to satisfy its memory demands.
  • %vmeff: Calculated as pgsteal / pgscan, this is a metric of the efficiency of page reclaim. If it is near 100% then almost every page coming off the tail of the inactive list is being reaped. If it gets too low (e.g. less than 30%) then the virtual memory is having some difficulty. This field is displayed as zero if no pages have been scanned during the interval of time. Here pgscan = pgscank/s + pgscand/s. Ideally the value is 100%, or 0 when no pages were scanned. It indicates what share of the scanned pages were actually evicted so that their frames could hold other data [17].
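The %vmeff definition above reduces to a one-line formula. A minimal Python sketch, with a hypothetical function name and invented input rates:

```python
def vmeff(pgsteal: float, pgscank: float, pgscand: float) -> float:
    """Page reclaim efficiency: pgsteal / (pgscank + pgscand) as a percentage.

    Returns 0.0 when nothing was scanned, matching how the field
    is displayed when no pages were scanned in the interval.
    """
    pgscan = pgscank + pgscand
    if pgscan == 0:
        return 0.0
    return 100.0 * pgsteal / pgscan


# 900 pages reclaimed out of 800 + 200 scanned.
print(vmeff(900, 800, 200))
```

A result well below 30, following the man page text quoted above, would suggest that reclaim is scanning many pages it cannot free.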

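  A page fault is not the same as swapping. A page fault occurs when the code or data being accessed has no mapping from a virtual address to a physical address. Swapping occurs when memory runs short and part of it must be freed to load other data, so some pages in memory are swapped out to disk [18].

  sar -r reports memory usage. Figure 1.14 shows sample output: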

  


Figure 1.14 Output of sar -r

  • kbcommit: Amount of memory in kilobytes needed for current workload. This is an estimate of how much RAM/swap is needed to guarantee that there never is out of memory. The amount of memory that has been requested (though possibly not yet allocated); it corresponds to Committed_AS in /proc/meminfo.
  • %commit: Percentage of memory needed for current workload in relation to the total amount of memory (RAM+swap). This number may be greater than 100% because the kernel usually overcommits memory. The ratio of kbcommit to RAM+swap; it can exceed 100% because the kernel allows more memory to be requested than the total available [19].
  • kbactive: Amount of active memory in kilobytes (memory that has been used more recently and usually not reclaimed unless absolutely necessary).
  • kbinact: Amount of inactive memory in kilobytes (memory which has been less recently used. It is more eligible to be reclaimed for other purposes).
  • kbdirty: Amount of memory in kilobytes waiting to get written back to the disk.
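To make the %commit definition concrete, the sketch below computes it from three /proc/meminfo values: Committed_AS, MemTotal, and SwapTotal. The function name is hypothetical and the numbers are invented; the point is only that the result can exceed 100% when the kernel overcommits.

```python
def commit_percent(committed_as_kb: int, mem_total_kb: int, swap_total_kb: int) -> float:
    """kbcommit as a percentage of RAM + swap."""
    return 100.0 * committed_as_kb / (mem_total_kb + swap_total_kb)


# 12 GB committed on a machine with 8 GB of RAM and 2 GB of swap.
print(commit_percent(12_000_000, 8_000_000, 2_000_000))
```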

  sar -W reports swap-in and swap-out statistics. Figure 1.15 shows sample output:

  


Figure 1.15 Output of sar -W

  • pswpin/s: Total number of swap pages the system brought in per second.
  • pswpout/s: Total number of swap pages the system brought out per second.

1.5.3 Viewing device I/O

  If you suspect an I/O bottleneck, use sar -b and sar -d. sar -b reports I/O and transfer rate statistics. Figure 1.16 shows sample output:


Figure 1.16 Output of sar -b

  • tps: Total number of transfers per second that were issued to physical devices. A transfer is an I/O request to a physical device. Multiple logical requests can be combined into a single I/O request to the device. A transfer is of indeterminate size. The number of I/O requests per second.
  • rtps: Total number of read requests per second issued to physical devices.
  • wtps: Total number of write requests per second issued to physical devices.
  • bread/s: Total amount of data read from the devices in blocks per second. Blocks are equivalent to sectors with 2.4 kernels and newer and therefore have a size of 512 bytes. With older kernels, a block is of indeterminate size.
  • bwrtn/s: Total amount of data written to devices in blocks per second.

  sar -d reports block device activity. Figure 1.17 shows sample output; the fields are similar to those of iostat.

  


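Figure 1.17 Output of sar -d

1.5.4 Viewing network traffic

  sar -n reports network statistics. Usage: sar -n { keyword [,...] | ALL }. The keywords are DEV, EDEV, FC, ICMP, EICMP, ICMP6, EICMP6, IP, EIP, IP6, EIP6, NFS, NFSD, SOCK, SOCK6, SOFT, TCP, ETCP, UDP and UDP6, each presenting the network statistics from a different angle. This section covers the DEV keyword only. Figure 1.18 shows the output of sar -n DEV: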


Figure 1.18 Output of sar -n DEV

  • IFACE: Name of the network interface for which statistics are reported.
  • rxpck/s: Total number of packets received per second.
  • txpck/s: Total number of packets transmitted per second.
  • rxkB/s: Total number of kilobytes received per second.
  • txkB/s: Total number of kilobytes transmitted per second.
  • rxcmp/s: Number of compressed packets received per second (for cslip etc.).
  • txcmp/s: Number of compressed packets transmitted per second.
  • rxmcst/s: Number of multicast packets received per second.
  sar takes its performance data from /var/log/sa/saDD, /var/log/sa/saYYYYMMDD, and files under /proc and /sys.
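As one example of how per-second network figures can be derived from /proc, the Python sketch below computes the main sar -n DEV columns from two readings of an interface line in /proc/net/dev. That this particular file backs the DEV report is an assumption made here for illustration, the sample lines are invented, and the function names are hypothetical.

```python
def parse_net_dev_line(line: str) -> tuple:
    """Parse one interface line of /proc/net/dev into (name, counters)."""
    name, _, rest = line.partition(":")
    v = [int(x) for x in rest.split()]
    # Receive columns come first (8 values), then transmit columns (8 values).
    return name.strip(), {
        "rx_bytes": v[0], "rx_packets": v[1], "rx_multicast": v[7],
        "tx_bytes": v[8], "tx_packets": v[9],
    }


def dev_rates(prev: dict, cur: dict, interval_s: float) -> dict:
    """Per-second rates from two cumulative readings."""
    def delta(key: str) -> int:
        return cur[key] - prev[key]

    return {
        "rxpck/s": delta("rx_packets") / interval_s,
        "txpck/s": delta("tx_packets") / interval_s,
        "rxkB/s": delta("rx_bytes") / 1024 / interval_s,
        "txkB/s": delta("tx_bytes") / 1024 / interval_s,
        "rxmcst/s": delta("rx_multicast") / interval_s,
    }


# Two invented readings taken 2 seconds apart.
iface, first = parse_net_dev_line(
    "  eth0: 1024000 1000 0 0 0 0 0 10 2048000 2000 0 0 0 0 0 0")
_, second = parse_net_dev_line(
    "  eth0: 1228800 1500 0 0 0 0 0 14 2252800 2400 0 0 0 0 0 0")
rates = dev_rates(first, second, 2.0)
print(iface, rates)
```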

References

[1]. Uninterruptible Sleep. https://eklitzke.org/uninterruptible-sleep

[2]. Checking if errno != EINTR: what does it mean? https://***.com/questions/41474299/checking-if-errno-eintr-what-does-it-mean

[3]. Linux Performance Measurements using vmstat. https://www.thomas-krenn.com/en/wiki/Linux_Performance_Measurements_using_vmstat

[4]. Why CPU spent time on IO(wa)? https://serverfault.com/questions/684339/why-cpu-spent-time-on-iowa

[5]. Understanding CPU Steal Time - when should you be worried? http://blog.scoutapp.com/articles/2013/07/25/understanding-cpu-steal-time-when-should-you-be-worried

[6]. The mpstat command and the /proc/stat file. https://yq.aliyun.com/articles/53583/

[7]. Interpreting iostat Output. https://blog.serverfault.com/2010/07/06/777852755/

[8]. The easily misread iostat. http://linuxperf.com/?p=156

[9]. Understanding iostat in depth. http://bean-li.github.io/dive-into-iostat/

[10]. An in-depth analysis of diskstats. http://ykrocku.github.io/blog/2014/04/11/diskstats/

[11]. I/O statistics fields. https://www.kernel.org/doc/Documentation/iostats.txt

[12]. Understanding the Linux TCP backlog in depth. https://www.jianshu.com/p/7fde92785056

[13]. netstat Recv-Q and Send-Q. https://blog.csdn.net/sjin_1314/article/details/9853163

[14]. netstat usage and TCP states explained. https://www.cnblogs.com/vigarbuaa/archive/2012/03/07/2383064.html

[15]. Explanation of the fields in /proc/net/tcp. https://blog.csdn.net/justlinux2010/article/details/21028797

[16]. What does the fields in sar -B output mean? https://serverfault.com/questions/270283/what-does-the-fields-in-sar-b-output-mean

[17]. How to control Linux's cache cleanup mechanism. https://www.zhihu.com/question/59053036/answer/171176545

[18]. Understanding page faults and memory swap-in/outs: when should you worry? http://blog.scoutapp.com/articles/2015/04/10/understanding-page-faults-and-memory-swap-in-outs-when-should-you-worry

[19]. Understanding Linux memory overcommit. http://linuxperf.com/?p=102