1. A handy command: sysdig
curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | bash
Output of running sysdig -cl | less:
Category: Application
---------------------
httplog             HTTP requests log
httptop             Top HTTP requests
memcachelog         memcached requests log

Category: CPU Usage
-------------------
spectrogram         Visualize OS latency in real time.
subsecoffset        Visualize subsecond offset execution time.
topcontainers_cpu   Top containers by CPU usage
topprocs_cpu        Top processes by CPU usage

Category: Errors
----------------
topcontainers_error Top containers by number of errors
topfiles_errors     Top files by number of errors
topprocs_errors     top processes by number of errors

Category: I/O
-------------
echo_fds            Print the data read and written by processes.
fdbytes_by          I/O bytes, aggregated by an arbitrary filter field
fdcount_by          FD count, aggregated by an arbitrary filter field
fdtime_by           FD time group by
iobytes             Sum of I/O bytes on any type of FD
iobytes_file        Sum of file I/O bytes
spy_file            Echo any read/write made by any process to all files. Optionally, you can provide the name of one file to only intercept reads/writes to that file.
stderr              Print stderr of processes
stdin               Print stdin of processes
stdout              Print stdout of processes
topcontainers_file  Top containers by R+W disk bytes
topfiles_bytes      Top files by R+W bytes
topfiles_time       Top files by time
topprocs_file       Top processes by R+W disk bytes

Category: Logs
--------------
spy_logs            Echo any write made by any process to a log file. Optionally, export the events around each log message to file.
spy_syslog          Print every message written to syslog. Optionally, export the events around each syslog message to file.

Category: Misc
--------------
around              Export to file the events around the time range where the given filter matches.

Category: Net
-------------
iobytes_net         Show total network I/O bytes
spy_ip              Show the data exchanged with the given IP address
spy_port            Show the data exchanged using the given IP port number
topconns            Top network connections by total bytes
topcontainers_net   Top containers by network I/O
topports_server     Top TCP/UDP server ports by R+W bytes
topprocs_net        Top processes by network I/O

Category: Performance
---------------------
bottlenecks         Slowest system calls
fileslower          Trace slow file I/O
netlower            Trace slow network I/0
proc_exec_time      Show process execution time
scallslower         Trace slow syscalls
topscalls           Top system calls by number of calls
topscalls_time      Top system calls by time

Category: Security
------------------
list_login_shells   List the login shell IDs
shellshock_detect   print shellshock attacks
spy_users           Display interactive user activity

Category: System State
----------------------
lscontainers        List the running containers
lsof                List (and optionally filter) the open file descriptors.
netstat             List (and optionally filter) network connections.
ps                  List (and optionally filter) the machine processes.

Category: Tracers
-----------------
tracers_2_statsd    Export spans duration as statds metrics.

Use the -i flag to get detailed information about a specific chisel
2. sysdig case study - analyzing disk I/O activity with the fdbytes_by chisel
http://shanker.blog.51cto.com/1189689/1771418
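Today I'll share how to use fdbytes_by. This case study shows how to find which file on the system accounts for the most I/O (not only files; network I/O works too), which process is reading or writing that file, and the kernel-level detail of the I/O activity. Typical uses are checking whether your file system is running efficiently, or investigating a disk I/O latency problem. Combining it with dstat --top-io makes it easier to pin down the process name, but this post focuses on the sysdig fdbytes_by chisel; think of it as the scenario where dstat is not available.
First, let's look at the usage details of today's protagonist, fdbytes_by: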
# sysdig -i fdbytes_by
Groups FD activity based on the given filter field, and returns the key that generated the most input+output bytes. For example, this script can be used to list the processes or TCP ports that generated most traffic.
Args:
[string] key - The filter field used for grouping
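In short, it ranks the values of the given key by the amount of I/O that file descriptor activity generated for each.
2.1 First, capture roughly 30 MB of sysdig events to analyze. With -C 30, sysdig rotates the output file once it reaches 30 MB and appends a number to the name given with -w, which is why the later steps read fdbytes_by.scap0.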
sysdig -w fdbytes_by.scap -C 30
2.2 Next, break down the captured I/O activity by file descriptor type:
sysdig -r fdbytes_by.scap0 -c fdbytes_by fd.type
Bytes fd.type
--------------------------------------------------------------------------------
45.16M file
9.30M ipv4
87.55KB unix
316B <NA>
60B pipe
file accounts for 45.16M, the largest of all FD types.
2.3 Then rank the I/O activity by directory:
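To put those numbers in proportion, here is a small sketch that converts the human-readable sizes to bytes and prints each FD type's share of the total. The rows are copied from the output above; the function name fd_shares is my own, and the unit conversion (KB = 1024 bytes, M = 1024 * 1024 bytes) is an assumption about how the chisel formats sizes.

```shell
#!/bin/sh
# Sketch: turn fdbytes_by output rows ("<size> <key>") into percentage shares.
# Assumption: KB = 1024 bytes and M = 1024*1024 bytes.
fd_shares() {
  awk '
    function to_bytes(s,    n) {
      n = s + 0                        # numeric prefix, e.g. 45.16 from "45.16M"
      if (s ~ /KB$/) return n * 1024
      if (s ~ /M$/)  return n * 1024 * 1024
      return n                         # plain "B" suffix
    }
    { bytes[NR] = to_bytes($1); name[NR] = $2; total += bytes[NR] }
    END {
      for (i = 1; i <= NR; i++)
        printf "%-6s %5.1f%%\n", name[i], 100 * bytes[i] / total
    }
  '
}

# Rows copied from the fd.type output above (header and separator removed).
fd_shares <<'EOF'
45.16M file
9.30M ipv4
87.55KB unix
316B <NA>
60B pipe
EOF
```

Under that assumption, file I/O makes up roughly 83% of all bytes in this capture, which is why the next steps drill into files rather than network traffic.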
# sysdig -r fdbytes_by.scap0 -c fdbytes_by fd.directory
Bytes fd.directory
38.42M /etc
7.59M /
5.04M /var/www/html
1.38M /var/log/nginx
304.73KB /root/.zsh_history/root
7.31KB /lib/x86_64-linux-gnu
2.82KB /dev
2.76KB /dev/pts
1.62KB /usr/lib/x86_64-linux-gnu
The /etc directory sees the most I/O.
2.4 So which file exactly is being accessed?
# sysdig -r fdbytes_by.scap0 -c fdbytes_by fd.name fd.directory=/etc
Bytes fd.name
38.42M /etc/services
2.5 Bingo! Found it: /etc/services is the most accessed file. Since services is a system file, the 38.42M is almost certainly read traffic. So which process is accessing this file?
# sysdig -r fdbytes_by.scap0 -c fdbytes_by proc.name "fd.filename=services and fd.directory=/etc"
Bytes proc.name
38.42M nscd
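The drill-down in steps 2.2 to 2.5 can be wrapped in a small helper so it is easy to repeat on another capture. This is only a sketch: the function names run and drill_down and the DRY_RUN variable are my own inventions, and the sysdig invocations are the same ones shown above with the capture file, directory and file name turned into parameters.

```shell
#!/bin/sh
# Sketch: chain the drill-down (FD type -> directory -> file -> process)
# against one capture file. Names here are illustrative, not part of sysdig.
# Set DRY_RUN=1 to print the sysdig commands instead of running them.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

drill_down() {
  capture=$1   # capture file, e.g. fdbytes_by.scap0
  dir=$2       # directory to narrow down to, e.g. /etc
  file=$3      # file name inside that directory, e.g. services

  run sysdig -r "$capture" -c fdbytes_by fd.type
  run sysdig -r "$capture" -c fdbytes_by fd.directory
  run sysdig -r "$capture" -c fdbytes_by fd.name "fd.directory=$dir"
  run sysdig -r "$capture" -c fdbytes_by proc.name \
    "fd.filename=$file and fd.directory=$dir"
}
```

For example, DRY_RUN=1 drill_down fdbytes_by.scap0 /etc services prints the four commands used in this post; without DRY_RUN it runs them, which requires sysdig to be installed and the capture file to exist.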
2.6 Found the culprit: the nscd caching daemon. But why does it read the services file so many times? Let's keep looking:
# sysdig -r fdbytes_by.scap0 -A -s 4096 -c echo_fds proc.name=nscd
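It turns out nscd is reading the mapping between ports and service names defined in services. During the capture I was running ab for a static-page stress test against nginx. I expected nginx to show the heaviest reads and writes, and did not expect nscd to get in the way:
ab -k -c 2000 -n 300000 http://shanker.heyoa.com/index.html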
# sysdig -r fdbytes_by.scap0 -c topprocs_file
Bytes Process PID
38.42M nscd 1343
6.43M nginx 4804
304.89KB zsh 32402
9.20KB ab 20774
2.79KB screen 18338
2.37KB sshd 12812
Afterwards I timed the ab test with nscd enabled and with nscd caching disabled. Enabling nscd to cache services locally did shorten the run, by 10.189%.
ab -k -c 2000 -n 300000 http://shanker.heyoa.com/index.html 0.94s user 2.77s system 9% cpu 38.561 total
ab -k -c 2000 -n 300000 http://shanker.heyoa.com/index.html 0.93s user 2.79s system 10% cpu 34.632 total
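The 10.189% figure follows from the two wall-clock totals above (38.561 s and 34.632 s). A quick way to check it, with improvement_pct being a helper name of my own:

```shell
#!/bin/sh
# Relative improvement between two wall-clock times, in percent:
# (slow - fast) / slow * 100
improvement_pct() {
  awk -v slow="$1" -v fast="$2" \
    'BEGIN { printf "%.3f\n", (slow - fast) / slow * 100 }'
}

improvement_pct 38.561 34.632   # prints 10.189
```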
For more on speeding things up with the nscd cache, see this earlier article:
http://shanker.blog.51cto.com/1189689/1735058
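That concludes the analysis. This post is just one example of how to use the fdbytes_by chisel; sysdig ships many more chisels for analyzing a system.
3. Performance tuning, the overview - Sysdig, a powerful tool for Linux performance monitoring and troubleshooting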
http://shanker.blog.51cto.com/1189689/1768735
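The latest Sysdig release provides a Docker container image that can be pulled directly. It also offers container-level data collection (sysdig -pc container.name=your_container_name), and supports querying things like the network traffic between specified containers or the CPU usage of a specified container.
The company's commercial product, Sysdig Cloud, is container-level monitoring and debugging software for system information and network traffic. It was presented at CoreOS Fest and supports Real-Time Dashboard, Historical Replay, Dynamic Topology and Intelligent Alert; think of it as playing the role Nagios plays for system monitoring.
For installation, see the official documentation: http://www.sysdig.org/install/ Sysdig is easier to install than SystemTap. This article is already long, so I won't spend space on installation; if you are familiar with Ansible you can use the sysdig role on Galaxy directly: https://galaxy.ansible.com/detail#/role/692
Sysdig's syntax for recording and replaying system traces is a lot like tcpdump and perf. Its chisels for performance analysis are a lot like SystemTap and dstat's --top* options, except that with SystemTap you have to write the tap scripts yourself (once written, they are more powerful than Sysdig), whereas Sysdig ships them ready-made. In interactive use it is a lot like htop.
The simplest way to use it is to just type sysdig; it captures every system event and prints it straight to the screen.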
Reposted from liqius's 51CTO blog. Original link: http://blog.51cto.com/szgb17/1889311. To repost, please contact the original author.