
Tuning basics: sysdig

1. A handy command: sysdig

curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | bash

Output of running sysdig -cl | less:

Category: Application

---------------------

httplog         HTTP requests log

httptop         Top HTTP requests

memcachelog     memcached requests log

Category: CPU Usage

-------------------

spectrogram     Visualize OS latency in real time.

subsecoffset    Visualize subsecond offset execution time.

topcontainers_cpu

                Top containers by CPU usage

topprocs_cpu    Top processes by CPU usage

Category: Errors

----------------

topcontainers_error

                Top containers by number of errors

topfiles_errors Top files by number of errors

topprocs_errors top processes by number of errors

Category: I/O

-------------

echo_fds        Print the data read and written by processes.

fdbytes_by      I/O bytes, aggregated by an arbitrary filter field

fdcount_by      FD count, aggregated by an arbitrary filter field

fdtime_by       FD time group by

iobytes         Sum of I/O bytes on any type of FD

iobytes_file    Sum of file I/O bytes

spy_file        Echo any read/write made by any process to all files. Optionally,

                you can provide the name of one file to only intercept reads/writes

                to that file.

stderr          Print stderr of processes

stdin           Print stdin of processes

stdout          Print stdout of processes

topcontainers_file

                Top containers by R+W disk bytes

topfiles_bytes  Top files by R+W bytes

topfiles_time   Top files by time

topprocs_file   Top processes by R+W disk bytes

Category: Logs

--------------

spy_logs        Echo any write made by any process to a log file. Optionally,

                export the events around each log message to file.

spy_syslog      Print every message written to syslog. Optionally, export the

                events around each syslog message to file.

Category: Misc

around          Export to file the events around the time range where the given

                filter matches.

Category: Net

iobytes_net     Show total network I/O bytes

spy_ip          Show the data exchanged with the given IP address

spy_port        Show the data exchanged using the given IP port number

topconns        Top network connections by total bytes

topcontainers_net

                Top containers by network I/O

topports_server Top TCP/UDP server ports by R+W bytes

topprocs_net    Top processes by network I/O

Category: Performance

bottlenecks     Slowest system calls

fileslower      Trace slow file I/O

netlower        Trace slow network I/0

proc_exec_time  Show process execution time

scallslower     Trace slow syscalls

topscalls       Top system calls by number of calls

topscalls_time  Top system calls by time

Category: Security

------------------

list_login_shells

                List the login shell IDs

shellshock_detect

                print shellshock attacks

spy_users       Display interactive user activity

Category: System State

----------------------

lscontainers    List the running containers

lsof            List (and optionally filter) the open file descriptors.

netstat         List (and optionally filter) network connections.

ps              List (and optionally filter) the machine processes.

Category: Tracers

-----------------

tracers_2_statsd

                Export spans duration as statds metrics.

Use the -i flag to get detailed information about a specific chisel
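With a list this long it can help to pull out only the chisel names of one category. The following is a minimal sketch, not part of sysdig: chisels_in is a hypothetical helper, and it assumes the listing keeps the layout shown above (a "Category: ..." heading, chisel names starting in the first column, wrapped descriptions indented).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the chisel names listed under one category.
# Input is text in the layout of `sysdig -cl`; $1 is the category name.
chisels_in() {
  awk -v want="Category: $1" '
    /^Category: /      { in_cat = ($0 == want); next }  # entering a new category
    in_cat && /^[a-z]/ { print $1 }                     # name is the first field
  '
}

# Small sample copied from the listing above.
chisels_in "Errors" <<'EOF'
Category: Errors
----------------
topcontainers_error
                Top containers by number of errors
topfiles_errors Top files by number of errors
topprocs_errors top processes by number of errors

Category: I/O
-------------
echo_fds        Print the data read and written by processes.
EOF
```

On a machine with sysdig installed, the same function could be fed live output, for example sysdig -cl | chisels_in "I/O".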

2. sysdig case study: analyzing disk I/O activity with the fdbytes_by chisel

http://shanker.blog.51cto.com/1189689/1771418

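This section shows how to use fdbytes_by. The example finds which file on the system accounts for the most I/O (not only files; network I/O works as well), identifies which process is reading or writing that file, and inspects the kernel-level detail of the I/O activity. Typical uses are checking whether your file system is running efficiently or investigating a disk I/O latency problem. Combining it with dstat --top-io makes it easier to pin down the process name, but the focus here is the fdbytes_by chisel; think of it as the scenario where dstat is not available.

First, the usage details of fdbytes_by: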

# sysdig -i fdbytes_by 

Groups FD activity based on the given filter field, and returns the key that

generated the most input+output bytes. For example, this script can be used to

list the processes or TCP ports that generated most traffic.

Args:

[string] key - The filter field used for grouping

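Roughly, it groups file descriptor activity by the chosen field and ranks the groups by the amount of I/O they generated.

2.1 First, capture about 30 MB of sysdig events for later analysis. With -C the capture is written to numbered files, so the first one is fdbytes_by.scap0, which the commands below read.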

sysdig -w fdbytes_by.scap -C 30
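If the capture is left running long enough to produce more than one numbered file, the same chisel can be run over each file in turn. This is only a sketch: analyze_captures and the SYSDIG override variable are hypothetical names introduced here, and it uses nothing beyond the -r and -c options that appear elsewhere in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run fdbytes_by on every capture file sharing a prefix.
# $1 = capture file prefix (e.g. fdbytes_by.scap), $2 = grouping field.
# SYSDIG may be set to another command name (an assumption, handy for dry runs).
analyze_captures() {
  local prefix=$1 field=$2 f
  for f in "$prefix"*; do
    [ -e "$f" ] || continue          # skip the literal pattern if nothing matched
    echo "== $f"
    "${SYSDIG:-sysdig}" -r "$f" -c fdbytes_by "$field"
  done
}

analyze_captures fdbytes_by.scap fd.type
```

Files are visited in shell glob order, which is plain lexical order, so with ten or more files fdbytes_by.scap10 would come before fdbytes_by.scap2.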

2.2 Next, break down the I/O activity in this capture by file descriptor type:

sysdig -r fdbytes_by.scap0 -c fdbytes_by fd.type

Bytes               fd.type             

--------------------------------------------------------------------------------

45.16M              file

9.30M               ipv4

87.55KB             unix

316B                <NA>

60B                 pipe

The file type accounts for 45.16M, the largest share of any FD type.

2.3 Next, rank the I/O activity by directory:
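To see the proportions rather than the raw sizes, the table can be piped through a few lines of awk. This is a rough sketch: fd_shares is a hypothetical helper, and it assumes the KB/M/G suffixes are powers of 1024, which is a guess about how sysdig formats these values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each key's share of the total bytes in an
# fdbytes_by table. Assumption: KB / M / G suffixes are powers of 1024.
fd_shares() {
  awk '
    function to_bytes(s,    n) {
      n = s + 0                                  # numeric prefix, e.g. 45.16
      if (s ~ /G/)      n *= 1024 * 1024 * 1024
      else if (s ~ /M/) n *= 1024 * 1024
      else if (s ~ /K/) n *= 1024
      return n
    }
    # Data rows start with a digit; header and separator lines are skipped.
    /^[0-9]/ { key[++rows] = $2; bytes[rows] = to_bytes($1); total += bytes[rows] }
    END {
      for (i = 1; i <= rows; i++)
        printf "%s %.1f%%\n", key[i], 100 * bytes[i] / total
    }
  '
}

# The table from step 2.2, pasted as sample input.
sample='Bytes               fd.type
--------------------------------------------------------------------------------
45.16M              file
9.30M               ipv4
87.55KB             unix
316B                <NA>
60B                 pipe'

printf '%s\n' "$sample" | fd_shares
```

The same function should work on the later tables in this post, since they share the two-column layout.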

# sysdig -r fdbytes_by.scap0 -c fdbytes_by fd.directory

Bytes               fd.directory        

38.42M              /etc

7.59M               /

5.04M               /var/www/html

1.38M               /var/log/nginx

304.73KB            /root/.zsh_history/root

7.31KB              /lib/x86_64-linux-gnu

2.82KB              /dev

2.76KB              /dev/pts

1.62KB              /usr/lib/x86_64-linux-gnu

The most heavily accessed directory is /etc.

2.4 So which file exactly is being accessed?

# sysdig -r fdbytes_by.scap0 -c fdbytes_by fd.name fd.directory=/etc

Bytes               fd.name             

38.42M              /etc/services

2.5 Bingo! Found it: /etc/services is accessed the most. Because services is a system file, it is safe to conclude that the 38.42M is read traffic. So which process is accessing this file?

# sysdig -r fdbytes_by.scap0 -c fdbytes_by proc.name "fd.filename=services and fd.directory=/etc"

Bytes               proc.name           

38.42M              nscd
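The filter in step 2.5 is just the file's path split into its directory and its base name. A small sketch of building that filter string for any path follows; file_filter is a hypothetical helper, and only the fd.filename and fd.directory fields already used above are assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an absolute path into the two-part filter
# used in step 2.5 (fd.filename=<base name> and fd.directory=<directory>).
file_filter() {
  printf 'fd.filename=%s and fd.directory=%s\n' "$(basename "$1")" "$(dirname "$1")"
}

file_filter /etc/services
```

The resulting string can be passed as the final quoted argument, for example sysdig -r fdbytes_by.scap0 -c fdbytes_by proc.name "$(file_filter /etc/services)".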

2.6 The culprit is the nscd caching daemon. Why does it read the services file so many times? Looking further:

# sysdig -r fdbytes_by.scap0 -A -s 4096 -c echo_fds proc.name=nscd 

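It turns out nscd is reading the mappings between ports and service names defined in services. During the capture I was running ab to stress-test a static nginx page. I expected nginx to show the heaviest reads and writes, and did not expect nscd to get in the way: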

ab -k -c 2000 -n 300000 http://shanker.heyoa.com/index.html

# sysdig -r fdbytes_by.scap0 -c topprocs_file 

Bytes               Process             PID                 

38.42M              nscd                1343

6.43M               nginx               4804

304.89KB            zsh                 32402

9.20KB              ab                  20774

2.79KB              screen              18338

2.37KB              sshd                12812

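Afterwards I timed the ab test once with nscd enabled and once with nscd caching disabled. Enabling nscd as a local cache for services did help: total run time dropped by 10.189% (34.632 s versus 38.561 s in the two runs below).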

ab -k -c 2000 -n 300000 http://shanker.heyoa.com/index.html  0.94s user 2.77s system 9% cpu 38.561 total

ab -k -c 2000 -n 300000 http://shanker.heyoa.com/index.html  0.93s user 2.79s system 10% cpu 34.632 total
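The 10.189% figure can be reproduced from the two totals above. A minimal sketch, taking the 38.561 s run as the baseline (pct_faster is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: relative reduction in wall-clock time, in percent.
# $1 = slower total in seconds, $2 = faster total in seconds.
pct_faster() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.3f\n", (slow - fast) / slow * 100 }'
}

pct_faster 38.561 34.632
```

The calculation is (38.561 - 34.632) / 38.561 × 100, that is, the 3.929 s saved expressed as a fraction of the slower run.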

For more on speeding things up with the nscd cache, see this earlier article:

http://shanker.blog.51cto.com/1189689/1735058

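That concludes the analysis. This post is only one example, meant to show how to use the fdbytes_by chisel; sysdig provides many more chisels for analyzing a system.

3. Performance tuning roundup: Sysdig, a powerful tool for Linux performance monitoring and troubleshooting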

http://shanker.blog.51cto.com/1189689/1768735

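The latest Sysdig release ships a Docker container image, so you can simply pull the image. It also provides container-level collection commands (sysdig -pc container.name=your_container_name) and supports querying the network traffic between specified containers, the CPU usage of a specified container, and so on.

The company's commercial product, Sysdig Cloud, is container-level software for monitoring and debugging system information and network traffic. It was presented at CoreOS Fest and supports Real-Time Dashboard, Historical Replay, Dynamic Topology and Intelligent Alert; think of it as something like Nagios for system monitoring.

For installation, see the official documentation: http://www.sysdig.org/install/ . Sysdig is easier to install than SystemTap. This article is already long, so it does not spend time on installation; readers familiar with Ansible can use the sysdig role on Galaxy directly: https://galaxy.ansible.com/detail#/role/692

Sysdig's syntax for recording and replaying system traces resembles tcpdump and perf. For performance analysis, its chisels resemble SystemTap and the --top* options of dstat, except that with SystemTap you write the taps yourself (once written, they are more powerful than Sysdig), whereas Sysdig ships them ready-made. For interactive use it resembles htop.

The simplest usage is to run sysdig with no arguments; it captures every system event and prints it straight to the screen.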

Reposted from liqius's 51CTO blog. Original link: http://blog.51cto.com/szgb17/1889311. For reprints, please contact the original author.