Tools for Analyzing and Troubleshooting Linux System and Application Problems

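Linux servers regularly run into system-level and application-level problems, and diagnosing them takes the right tools. This post lists commonly used utilities and trace tools, and ends with the distributed-system trace tools that the Hadoop community has recently been developing.

The figure from linux-performance-analysis-and-tools shows the layer of the system at which each of these tools applies.

uname -a or cat /proc/version #print system information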

Linux hadoopst2.cm6 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

uptime

15:42:46 up 674 days, 6 min, 35 users, load average: 1.30, 5.97, 11.53
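The three numbers at the end of the uptime line are the 1-, 5- and 15-minute load averages. As a minimal sketch (the function name parse_load_averages is made up for illustration), they can be extracted from the output like this:

```python
import re

def parse_load_averages(uptime_line):
    """Return the 1-, 5- and 15-minute load averages from one line of `uptime` output."""
    match = re.search(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line)
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())

# The sample line is the uptime output shown above.
sample = "15:42:46 up 674 days, 6 min, 35 users, load average: 1.30, 5.97, 11.53"
one, five, fifteen = parse_load_averages(sample)
print(one, five, fifteen)
```

In the sample, the 1-minute value is well below the 15-minute value, which means the load has been falling.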

cat /etc/redhat-release

Red Hat Enterprise Linux Server release 5.4 (Tikanga)

lsb_release

LSB Version: :core-3.1-amd64:core-3.1-ia32:core-3.1-noarch:graphics-3.1-amd64:graphics-3.1-ia32:graphics-3.1-noarch

cat /proc/cpuinfo

cat /proc/meminfo
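/proc/meminfo is plain text with one "Name: value" pair per line, so it is easy to process in a script. The sketch below is only an illustration: parse_meminfo is a hypothetical helper and the sample numbers are invented, not taken from a real machine.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info

# Invented sample values for illustration.
sample = """MemTotal:       16330000 kB
MemFree:         1200000 kB
Buffers:          300000 kB
Cached:          9000000 kB
HugePages_Total:       0
"""
info = parse_meminfo(sample)
# Buffers and page cache can be reclaimed, so on old kernels without a
# MemAvailable field they are commonly added to MemFree as a rough estimate.
reclaimable_kb = info["MemFree"] + info["Buffers"] + info["Cached"]
print(reclaimable_kb)
```

On a live machine the same function can be fed open("/proc/meminfo").read().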

lspci - list all PCI devices

lsusb - list USB devices

last, lastb - show listing of last logged in users

lsmod - show the status of modules in the Linux kernel

modprobe - add and remove modules from the Linux kernel

to print a process tree: ps -ejH / ps axjf

to get info about threads: ps -eLf / ps axms

ulimit -a
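ulimit -a shows the limits of the current shell. A running program can query its own limits through the getrlimit interface; the sketch below reads the open-files limit with Python's standard resource module (Unix only). The helper name open_file_limit is an assumption for illustration.

```python
import resource

def open_file_limit():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

soft, hard = open_file_limit()
# resource.RLIM_INFINITY is reported when a limit is unlimited.
print("max open files: soft=%s hard=%s" % (soft, hard))
```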

lsof - list open files; on Unix, everything is a file

lsof -p pid

rpm/yum

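rpm -qf file #the RPM package that owns the file

rpm -ql package #the files contained in the RPM package

/var/log/yum.log #log of packages updated by yum

/etc/xxx #configuration directories of system-level programs, e.g.

/etc/yum.repos.d/ #yum repository configuration

/var/log/xxx #log directories, e.g.

/var/log/cron #crontab log; shows whether scheduled jobs ran

ntpd - Network Time Protocol (NTP) daemon; keeps the clocks of the machines in a cluster synchronized

squid - proxy caching server; serves as the proxy for the cluster's web UIs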

mpstat - report processors related statistics. Watch the %sys and %iowait values.
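Percentages such as %sys and %iowait can be derived from the cumulative counters on the "cpu" line of /proc/stat by taking two samples and dividing each counter's increase by the total increase. The sketch below shows that arithmetic; cpu_percentages is a hypothetical helper and the two sample lines are invented.

```python
# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def cpu_percentages(before, after):
    """Compute per-state CPU percentages between two aggregate `cpu` lines."""
    first = [int(v) for v in before.split()[1:1 + len(CPU_FIELDS)]]
    second = [int(v) for v in after.split()[1:1 + len(CPU_FIELDS)]]
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("the second sample must be taken after the first")
    return {name: 100.0 * delta / total for name, delta in zip(CPU_FIELDS, deltas)}

# Invented samples for illustration.
before = "cpu  1000 0 500 8000 400 0 100 0 0 0"
after = "cpu  1100 0 700 8500 600 0 100 0 0 0"
usage = cpu_percentages(before, after)
print(usage["system"], usage["iowait"])
```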

vmstat - report virtual memory statistics

iostat - report central processing unit (CPU) statistics and input/output statistics for devices and partitions.

netstat - print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships

netstat -atpn | grep pid
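A plain grep for the pid also matches lines where the same number appears as a port. As a rough sketch, the filter below only looks at the PID/Program name column of netstat -atpn output; connections_for_pid is a made-up helper and the sample output is invented.

```python
def connections_for_pid(netstat_output, pid):
    """Return the TCP lines whose PID/Program name column belongs to `pid`."""
    matches = []
    for line in netstat_output.splitlines():
        # Columns: proto, recv-q, send-q, local, foreign, state, pid/program
        fields = line.split(None, 6)
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        owner = fields[6].split("/", 1)[0]
        if owner == str(pid):
            matches.append(line)
    return matches

# Invented sample: pid 2181 owns the first socket, port 2181 appears in the second.
sample = """Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:50070           0.0.0.0:*               LISTEN      2181/java
tcp        0      0 10.0.0.1:2181           10.0.0.2:41000          ESTABLISHED 9999/java
"""
for line in connections_for_pid(sample, 2181):
    print(line)
```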

ganglia - a scalable distributed monitoring system for high-performance computing systems such as clusters and grids.

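sar/tsar - collect, report, or save system activity information; tsar is Taobao's own improved version

Samples at a fixed interval (every minute) and keeps history that can be queried (5-minute granularity by default); complements ganglia by showing more detailed information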

iftop - the "top" bandwidth consumers shown. iftop wiki

iotop

vmtouch, portable file system cache diagnostics and control

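telnet/nc ip port - check whether the target port is reachable; a successful ping does not mean the port is reachable, because a firewall or similar may block it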
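The same reachability check can be scripted with a plain TCP connect. The following is a minimal sketch using Python's standard socket module; port_is_open is a hypothetical helper, and the demo connects to a throwaway listener on the loopback interface rather than to a real service.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

# Demonstrate against a temporary listener on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
print(port_is_open("127.0.0.1", demo_port))
listener.close()
```

Like telnet and nc, this only shows that the TCP handshake succeeds; it says nothing about whether the service behind the port is healthy.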

ifconfig/ifup/ifdown - configure a network interface

traceroute - print the route packets trace to network host

nslookup - query internet name servers interactively

tcpdump - dump traffic on a network; similar open-source tools include wireshark and netsniff-ng (see also: comparison of more tools)

lynx - a general purpose distributed information browser for the World Wide Web

tcpcp - allows cooperating applications to pass ownership of TCP connection endpoints from one Linux host to another one.

ldconfig - configure dynamic linker run time bindings

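ldconfig -p | grep so #check whether a shared object is in the linker cache

ldd - print shared library dependencies; shows the shared objects that an executable or .so depends on

nm - list symbols from object files; grep the output to check whether a symbol exists and whether it is undefined.

readelf - displays information about ELF files. Shows ELF details such as 32/64-bit class, target OS, and processor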
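The class, byte order and processor that readelf -h reports come from fixed offsets at the start of the file. The sketch below decodes them from the first 20 bytes; describe_elf is a hypothetical helper, the machine table lists only a few common values, and the sample header is hand-built for illustration.

```python
import struct

ELF_CLASS = {1: "32-bit", 2: "64-bit"}
ELF_DATA = {1: "little-endian", 2: "big-endian"}
# A few common e_machine values; the ELF specification defines many more.
ELF_MACHINE = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}

def describe_elf(header):
    """Decode class, byte order and machine type from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    elf_class = header[4]    # e_ident[EI_CLASS]
    elf_data = header[5]     # e_ident[EI_DATA]
    byte_order = "<" if elf_data == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)   # e_machine
    return (ELF_CLASS.get(elf_class, "unknown"),
            ELF_DATA.get(elf_data, "unknown"),
            ELF_MACHINE.get(machine, "machine %d" % machine))

# Hand-built header: 64-bit, little-endian, e_type = 2 (executable), e_machine = 62.
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 62)
print(describe_elf(sample))
```

For a real binary, pass the first 20 bytes of the file, for example open(path, "rb").read(20).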

gdb

cat /proc/$pid/[cmdline|environ|limits|status|...] - information about a process
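Two of these files have formats worth knowing when scripting: cmdline separates arguments with NUL bytes, so it looks run together when printed with cat, and status is a list of "Key: value" lines. A minimal sketch follows, with hypothetical helper names and invented sample data:

```python
def parse_cmdline(raw):
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into arguments."""
    if not raw:
        return []    # kernel threads have an empty cmdline
    return [arg.decode("utf-8", "replace")
            for arg in raw.rstrip(b"\x00").split(b"\x00")]

def parse_status(text):
    """Parse the 'Key: value' lines of /proc/<pid>/status into a dict of strings."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key] = value.strip()
    return status

# Invented sample data for illustration.
raw_cmdline = b"\x00".join([b"java", b"-Xmx1024m", b"-jar", b"app.jar"]) + b"\x00"
status_text = "Name:\tjava\nState:\tS (sleeping)\nVmRSS:\t  204800 kB\nThreads:\t42\n"
print(parse_cmdline(raw_cmdline))
print(parse_status(status_text)["VmRSS"])
```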

pstack - print a stack trace of a running process

pmap - report memory map of a process

JDK tools and utilities

Java troubleshooting tools

jinfo - print Java process information, such as the classpath and java.library.path (the directory of JNI shared objects)

jstack - print a stack trace of a running Java process; can be used to inspect deadlocks

jmap - report memory map of a java process

jmap -histo:live #can trigger a full GC

jmap -dump:live,file=$file #dumps the heap so that tools such as jhat can analyze how objects occupy it
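The class histogram printed by jmap -histo can also be post-processed to spot the classes that hold the most memory. The sketch below assumes the usual column layout (rank, instance count, bytes, class name); top_classes is a made-up helper and the sample histogram is invented.

```python
import re

# Matches data rows such as "   1:        500000       40000000  [C".
HISTO_ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")

def top_classes(histo_text, limit=3):
    """Return (class name, instances, bytes) rows, largest by bytes first."""
    rows = []
    for line in histo_text.splitlines():
        match = HISTO_ROW.match(line)
        if match:
            instances, size, name = match.groups()
            rows.append((name, int(instances), int(size)))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]

# Invented sample for illustration.
sample = """
 num     #instances         #bytes  class name
----------------------------------------------
   1:        500000       40000000  [C
   2:        480000       11520000  java.lang.String
   3:          1200        9600000  [B
Total        981200       61120000
"""
for name, instances, size in top_classes(sample, 2):
    print(name, instances, size)
```

Comparing two histograms taken some time apart shows which classes keep growing.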

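jhat - heap dump browser - starts a web server on a heap dump file (e.g., produced by jmap -dump), allowing the heap to be browsed.

Starts an HTTP server; view the results in a browser

-J-mxXXXXm #increase the heap size when analyzing a large dump file

If a class has an abnormally large number of instances or occupies too much memory, a memory leak is very likely

Memory Analyzer (MAT) - Eclipse plugin, Java heap analyzer

A visual tool, but limited by the memory of the machine it runs on; it cannot analyze heap dump files that are too large

jdb - the Java debugger; the target JVM can listen as a debug server so that tools such as Eclipse connect remotely to debug it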

jstat - Java Virtual Machine statistics monitoring tool

jstatd - Virtual Machine jstat daemon; can be used together with jvisualvm

jvisualvm - Java Virtual Machine monitoring, troubleshooting, and profiling tool; can connect remotely to jstatd/JMX; a visual tool (demo)

jvmtop - in a top-like manner, displays JVM internal metrics (e.g. memory information) of running Java processes.

JVM performance optimization - a series of optimization articles written by a JVM developer

Overview

Compilers

Garbage collection

Concurrently compacting GC

Scalability

hprof - heap profiler: java -agentlib:hprof

Adding log statements is the usual approach, but the trace tools below help when the system is live in production or its source code is not available.

strace - trace system calls and signals

Example: strace/ltrace in practice

Example: timing system calls, e.g. when investigating a high CPU %sys on a machine
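For the %sys case, strace -c prints a per-syscall summary of time and call counts. As an illustration only, the sketch below parses such a summary and ranks syscalls by time; parse_strace_summary is a hypothetical helper, and the sample table is invented and assumes the usual column layout.

```python
def parse_strace_summary(text):
    """Parse the table printed by `strace -c` into (syscall, seconds, calls) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 or 6 fields because the errors column may be empty.
        if len(fields) < 5 or fields[-1] == "total":
            continue
        try:
            seconds, calls = float(fields[1]), int(fields[3])
        except ValueError:
            continue    # header or separator line
        rows.append((fields[-1], seconds, calls))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Invented sample for illustration.
sample = """% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 62.50    0.005000          50       100           futex
 25.00    0.002000          20       100        10 read
 12.50    0.001000          10       100           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.008000                   300        10 total
"""
for name, seconds, calls in parse_strace_summary(sample):
    print(name, seconds, calls)
```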

blktrace, generate traces of the I/O traffic on block devices

ltrace - a library call tracer

xtrace

gprof - a performance analysis tool, sampling and call-graph profiling

valgrind - an instrumentation framework for building dynamic analysis tools; its tools can automatically detect many memory management and threading bugs, and profile your programs in detail

systemtap - a simple command line interface and scripting language for writing instrumentation for a live running kernel plus user-space applications for complex tasks that may require live analysis, programmable on-line response, and whole-system symbolic access.

The Linux counterpart of DTrace (which Sun developed on Solaris)

Powerful: covers the kernel and user-space applications, works across languages (Java, Perl, Python, Ruby), and uses built-in markers (PostgreSQL, MySQL)

Can write and reuse simple scripts to deeply examine the activities of a live system

Data can be extracted, filtered, and summarized quickly and safely, to enable diagnoses of complex performance or functional problems

A rich "tapset" script library

BTrace - dynamic tracing tool for the Java platform. User guide

Uses dynamic bytecode modification (HotSwap) to trace and replace code in a running Java program; see: implementation principles

Summary of using BTrace

Detailed introduction

Byteman - simplifies tracing and testing of Java programs. Can modify a running application without needing to stop and restart it.

Define rules specifying the side effects you want to inject, whereas BTrace scripts use a Java-like syntax

Dapper, a large-scale distributed systems tracing infrastructure

X-Trace, a network diagnostic tool designed to provide users and network operators with better visibility into increasingly complex internet applications.

HTrace, a tracing framework intended for use with distributed systems written in Java

Add tracing to HDFS

Update HTrace for HBase

Some of this content draws on posts by other people on Weibo; if there is any problem, please get in touch.
