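Linux servers regularly run into system-level and application-level problems, and analyzing and troubleshooting them takes good tools. This post lists some commonly used tools and trace tools, and ends with the distributed-system trace tools that the Hadoop community has been developing recently.
A figure from linux-performance-analysis-and-tools shows the layer of the system at which each of these tools applies.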
uname -a or cat /proc/version #print system information
Linux hadoopst2.cm6 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
uptime
15:42:46 up 674 days, 6 min, 35 users, load average: 1.30, 5.97, 11.53
cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.4 (Tikanga)
lsb_release
LSB Version: :core-3.1-amd64:core-3.1-ia32:core-3.1-noarch:graphics-3.1-amd64:graphics-3.1-ia32:graphics-3.1-noarch
cat /proc/cpuinfo
cat /proc/meminfo
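The load averages printed by uptime are easier to judge relative to the number of CPUs listed in /proc/cpuinfo. A minimal sketch, assuming a Linux /proc layout; the function name load_per_core and its optional file arguments are made up for illustration:

```shell
# load_per_core [loadavg_file] [cpuinfo_file]
# Prints the 1-minute load average divided by the number of CPUs.
# The function name and the optional file arguments are illustrative only.
load_per_core() {
    local loadavg_file="${1:-/proc/loadavg}"
    local cpuinfo_file="${2:-/proc/cpuinfo}"
    local load1 cores
    # The first field of /proc/loadavg is the 1-minute load average.
    load1=$(awk '{ print $1; exit }' "$loadavg_file")
    # Each logical CPU has one "processor" line in /proc/cpuinfo.
    cores=$(grep -c '^processor' "$cpuinfo_file")
    awk -v l="$load1" -v c="$cores" \
        'BEGIN { if (c < 1) c = 1; printf "%.2f\n", l / c }'
}

# Run against the live system only where /proc is available.
if [ -r /proc/loadavg ] && [ -r /proc/cpuinfo ]; then
    load_per_core
fi
```

A value that stays well above 1.00 suggests more runnable (or uninterruptible) tasks than CPUs.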
lspci - list all pci devices
lsusb - list usb devices
last, lastb - show listing of last logged in users
lsmod - show the status of modules in the linux kernel
modprobe - add and remove modules from the linux kernel
ps
to print a process tree: ps -ejH / ps axjf
to get info about threads: ps -eLf / ps axms
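Since ps -eLf prints one line per thread, its output can be reduced to a per-process thread count. A small sketch; the function name threads_per_pid is hypothetical, and it assumes the procps column layout in which the PID is the second column:

```shell
# threads_per_pid: reads `ps -eLf` output on stdin and prints
# "<thread count> <pid>" per process, busiest first.
# Assumes column 2 is the PID and each thread (LWP) has its own line.
threads_per_pid() {
    awk 'NR > 1 { n[$2]++ } END { for (p in n) print n[p], p }' | sort -rn
}

# Show the five processes with the most threads, if ps is installed.
if command -v ps >/dev/null 2>&1; then
    ps -eLf 2>/dev/null | threads_per_pid | head -5
fi
```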
ulimit -a
lsof - list open files; on unix everything is a file
lsof -p pid
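When lsof is not installed, the number of open descriptors and the per-process limit can also be read from /proc. A rough sketch, assuming the Linux /proc layout; fd_usage and its proc_root argument are names invented for this example:

```shell
# fd_usage <pid> [proc_root]
# Prints "<open fds> <soft limit>" for a process, using /proc/<pid>/fd
# and the "Max open files" row of /proc/<pid>/limits.
# The function name and the proc_root argument are illustrative only.
fd_usage() {
    local pid="$1" proc_root="${2:-/proc}"
    local open soft
    open=$(ls "$proc_root/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # Row layout: "Max open files <soft> <hard> files"
    soft=$(awk '/^Max open files/ { print $4 }' "$proc_root/$pid/limits")
    echo "$open $soft"
}

# Report on the current shell where /proc is available.
if [ -r "/proc/$$/limits" ]; then
    fd_usage "$$"
fi
```

A process whose first number approaches the second is close to failing with "Too many open files".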
rpm/yum
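rpm -qf file #rpm package that owns the file
rpm -ql rpm #files contained in the rpm package
/var/log/yum.log #yum package update log
/etc/xxx #system-wide program configuration directory, e.g.
/etc/yum.repos.d/ yum repository configuration
/var/log/xxx #log directory, e.g.
/var/log/cron #crontab log; shows whether scheduled jobs actually ran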
ntpd - network time protocol (ntp) daemon; keeps the clocks of the machines in a cluster in sync
squid - proxy caching server; used as the proxy for the cluster web UI
mpstat - report processors related statistics; watch the %sys and %iowait values
vmstat - report virtual memory statistics
iostat - report central processing unit (cpu) statistics and input/output statistics for devices and partitions.
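The %sys and %iowait figures that mpstat reports are derived from the counters in /proc/stat. The sketch below computes both from two samples of the aggregate "cpu" line; the function name cpu_sys_iowait is an assumption for this example, and it sums the user through steal columns as the total:

```shell
# cpu_sys_iowait <first_cpu_line> <second_cpu_line>
# Each argument is a "cpu ..." line from /proc/stat with the columns
# user nice system idle iowait irq softirq steal (fields 2 to 9).
# Prints the share of time spent in system and iowait between samples.
cpu_sys_iowait() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) { a[i] = $i; ta += $i } }
        NR == 2 { for (i = 2; i <= 9; i++) tb += $i
                  d = tb - ta; if (d <= 0) d = 1
                  printf "sys=%.1f%% iowait=%.1f%%\n",
                         100 * ($4 - a[4]) / d, 100 * ($6 - a[6]) / d }'
}

# Sample the live system over one second where /proc is available.
if [ -r /proc/stat ]; then
    first=$(grep '^cpu ' /proc/stat)
    sleep 1
    second=$(grep '^cpu ' /proc/stat)
    cpu_sys_iowait "$first" "$second"
fi
```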
netstat - print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships
netstat -atpn | grep pid
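Beyond grepping for one pid, a count of connections per TCP state quickly shows pile-ups such as many TIME_WAIT or CLOSE_WAIT sockets. A sketch, assuming the net-tools netstat layout where the state is the sixth column; tcp_states is a name made up here:

```shell
# tcp_states: reads `netstat -atn` output on stdin and prints
# "<count> <state>" per TCP state, most frequent first.
# Assumes the state is in column 6, as in net-tools netstat.
tcp_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print n[s], s }' | sort -rn
}

# Summarize the live system if netstat is installed.
if command -v netstat >/dev/null 2>&1; then
    netstat -atn 2>/dev/null | tcp_states
fi
```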
ganglia - a scalable distributed monitoring system for high-performance computing systems such as clusters and grids.
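sar/tsar - collect, report, or save system activity information; tsar is Taobao's own improved version
samples on a schedule (every minute) and keeps history that can be queried (5-minute granularity by default), so it complements ganglia with more detailed information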
iftop - the "top" bandwidth consumers shown. iftop wiki
iotop
vmtouch, portable file system cache diagnostics and control
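telnet/nc ip port - confirm whether the target port is reachable; a successful ping does not guarantee that the port is accessible, since a firewall or similar may block it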
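On hosts where neither telnet nor nc is installed, a similar check can be done from bash itself. This is only a sketch: it assumes a bash built with /dev/tcp support and the coreutils timeout command, and the function name port_open is invented for the example:

```shell
# port_open <host> <port> [timeout_seconds]
# Returns 0 if a TCP connection can be established, non-zero otherwise.
# Assumes bash with /dev/tcp redirections and the coreutils timeout command.
port_open() {
    local host="$1" port="$2" secs="${3:-3}"
    # Inside bash -c, $0 is the host and $1 is the port.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example: probe the local ssh port.
if port_open 127.0.0.1 22; then
    echo "port 22 reachable"
else
    echo "port 22 not reachable"
fi
```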
ifconfig/ifup/ifdown - configure a network interface
traceroute - print the route packets trace to network host
nslookup - query internet name servers interactively
tcpdump - dump traffic on a network; similar open-source tools include wireshark and netsniff-ng; more tool comparisons
lynx - a general purpose distributed information browser for the world wide web
tcpcp - allows cooperating applications to pass ownership of tcp connection endpoints from one linux host to another one.
ldconfig - configure dynamic linker run time bindings
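ldconfig -p | grep so  check whether a so is in the link cache
ldd - print shared library dependencies; shows which so files an executable or a so depends on
nm - list symbols from object files; grep the output to check whether a symbol exists and whether it is undefined.
readelf - displays information about elf files, such as 32/64-bit class, target os, and processor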
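A common use of ldd is finding which dependencies fail to resolve. A minimal sketch that filters ldd output for unresolved libraries; missing_libs is a hypothetical helper name, and it relies on the "=> not found" marker that glibc's ldd prints:

```shell
# missing_libs: reads `ldd` output on stdin and prints the names of
# shared libraries that the dynamic linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example: check /bin/ls (normally prints nothing on a healthy system).
if command -v ldd >/dev/null 2>&1 && [ -x /bin/ls ]; then
    ldd /bin/ls | missing_libs
fi
```

If a library is reported here, ldconfig -p shows whether it is in the link cache at all.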
gdb
cat /proc/$pid/[cmdline|environ|limits|status|...] - per-process information
pstack - print a stack trace of a running process
pmap - report memory map of a process
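The cmdline and environ files under /proc/$pid separate their entries with NUL bytes, so a plain cat runs them together. A small sketch that prints one entry per line; the function name proc_args is made up for illustration:

```shell
# proc_args <file>
# /proc/<pid>/cmdline and /proc/<pid>/environ separate entries with NUL
# bytes; print one entry per line instead.
proc_args() {
    tr '\0' '\n' < "$1"
}

# Example: show the arguments of the current shell.
if [ -r "/proc/$$/cmdline" ]; then
    proc_args "/proc/$$/cmdline"
fi
```

The same helper works on environ, for example proc_args /proc/$pid/environ | grep '^JAVA_HOME='.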
jdk tools and utilities
java troubleshooting tools
jinfo - print java process information, such as classpath and java.library.path (the directory of jni so files)
jstack - print a stack trace of a running java process; can reveal deadlocks
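A jstack dump of a busy process can be long, so a per-state thread count is a useful first look. A sketch that relies on the "java.lang.Thread.State:" lines in jstack output; thread_states is a name invented for this example:

```shell
# thread_states: reads `jstack <pid>` output on stdin and prints
# "<count> <state>" per thread state, most frequent first.
# Relies on lines such as "java.lang.Thread.State: RUNNABLE".
thread_states() {
    awk '/java.lang.Thread.State:/ { n[$2]++ }
         END { for (s in n) print n[s], s }' | sort -rn
}

# Usage (requires a JDK and a running java process):
#   jstack <pid> | thread_states
```

Many BLOCKED threads are a hint to look at the full dump for lock contention or for the deadlock report that jstack appends when it detects one.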
jmap - report memory map of a java process
jmap -histo:live  can trigger a full gc
jmap -dump:live,file=$file  dumps heap memory so that tools such as jhat can analyze how objects occupy the heap
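One way to spot a class whose footprint keeps growing is to compare two jmap -histo:live snapshots taken some time apart. The sketch below assumes the usual histogram row layout (rank, instance count, bytes, class name); histo_growth and the file names in the usage comment are hypothetical:

```shell
# histo_growth <before_file> <after_file>
# Compares two saved `jmap -histo:live <pid>` outputs and prints
# "<bytes grown> <class>" for every class that grew, largest first.
# Assumes rows of the form "<rank>: <instances> <bytes> <class name>".
histo_growth() {
    awk '
        $1 ~ /^[0-9]+:$/ {
            if (FNR == NR) before[$4] = $3        # rows of the first file
            else delta[$4] = $3 - before[$4]      # rows of the second file
        }
        END { for (c in delta) if (delta[c] > 0) print delta[c], c }
    ' "$1" "$2" | sort -rn
}

# Usage (requires a JDK and a running java process):
#   jmap -histo:live <pid> > before.txt
#   ... wait while the application runs ...
#   jmap -histo:live <pid> > after.txt
#   histo_growth before.txt after.txt | head
```

Keep in mind that each -histo:live snapshot forces a full gc on the target process.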
jhat - heap dump browser - starts a web server on a heap dump file (eg, produced by jmap -dump), allowing the heap to be browsed.
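starts an http server; open it in a browser to explore the dump
-J-mxXXXm  increase the heap size when analyzing large dump files
if some object type has an unusually large instance count or memory footprint, a memory leak is very likely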
memory analyzer (mat) - eclipse plugin,java heap analyzer
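visualization tool, but it is limited by the memory of the machine it runs on, so it cannot analyze very large heap dump files
jdb - the java debugger; a jvm started with the jdwp agent in server mode accepts remote debugging connections from jdb, eclipse, and similar tools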
jstat - java virtual machine statistics monitoring tool
jstatd - virtual machine jstat daemon; works together with jvisualvm
jvisualvm - java virtual machine monitoring, troubleshooting, and profiling tool; can connect remotely to jstatd/jmx; visualization tool: demo
jvmtop - in a top-like manner, displays jvm internal metrics (e.g. memory information) of running java processes.
jvm performance optimization - a series of optimization articles written by a jvm developer
overview
compilers
garbage collection
concurrently compacting gc
scalability
hprof - heap profiler: java -agentlib:hprof
writing logs is one option, but when the system is live or the source code is unavailable:
strace - trace system calls and signals
example: a practical application of strace/ltrace
example: tracing the time spent in system calls, e.g. when a machine shows high cpu %sys
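For the high %sys case, strace -T appends the time spent in each call as a trailing <seconds> field, which can be summed per system call. A sketch, assuming a trace of a single process (without the "[pid N]" prefixes that -f adds); syscall_time is a name invented here:

```shell
# syscall_time: reads `strace -T` output on stdin and prints
# "<total seconds> <calls> <syscall>" per system call, slowest first.
# Assumes each line starts with the syscall name and ends in "<seconds>".
syscall_time() {
    awk '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            t = substr($0, RSTART + 1, RLENGTH - 2)   # strip the < and >
            name = $0
            sub(/\(.*/, "", name)                     # keep text before "("
            total[name] += t
            calls[name]++
        }
        END { for (n in total) printf "%.6f %d %s\n", total[n], calls[n], n }
    ' | sort -rn
}

# Usage (strace writes its trace to stderr):
#   strace -T -p <pid> 2>&1 | syscall_time
```

strace -c prints a similar summary on its own; the filter is mainly useful on traces that were already saved with -T.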
blktrace, generate traces of the i/o traffic on block devices
ltrace - a library call tracer
xtrace
gprof - a performance analysis tool, sampling and call-graph profiling
valgrind - an instrumentation framework for building dynamic analysis tools. automatically detect many memory management and threading bugs, and profile your programs in detail
systemtap - a simple command line interface and scripting language for writing instrumentation for a live running kernel plus user-space applications for complex tasks that may require live analysis, programmable on-line response, and whole-system symbolic access.
the linux counterpart of dtrace (which sun developed on solaris)
powerful: kernel, user-space apps, cross-language (java, perl, python, ruby), built-in markers (pg, mysql)
can write and reuse simple scripts to deeply examine the activities of a live system
data can be extracted, filtered, and summarized quickly and safely, to enable diagnoses of complex performance or functional problems
rich "tapset" script library
btrace - dynamic tracing tool for the java platform. userguide
based on dynamic bytecode modification (hotswap) to trace and replace code in a running java program; how it works
btrace usage summary
detailed introduction
byteman - simplifies tracing and testing of java programs. can modify a running application without needing to stop and restart it.
define rules specifying the side effects you want to inject, whereas btrace scripts use a java-like syntax
dapper, a large-scale distributed systems tracing infrastructure
x-trace, a network diagnostic tool designed to provide users and network operators with better visibility into increasingly complex internet applications.
htrace, a tracing framework intended for use with distributed systems written in java
add tracing to hdfs
update htrace for hbase
Some of this content is quoted from other people's Weibo posts; please get in touch if there is any problem.