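Tools for Analyzing and Troubleshooting Linux System and Application Problems

Linux servers regularly run into system-level and application-level problems, and diagnosing them takes good tools. This post lists some commonly used utilities and trace tools, and closes with the distributed-system trace tools that the Hadoop community has recently been developing.

A figure from linux-performance-analysis-and-tools shows which layer of the system each of these tools applies to.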

uname -a or cat /proc/version # print system information

Linux hadoopst2.cm6 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

uptime

15:42:46 up 674 days, 6 min, 35 users, load average: 1.30, 5.97, 11.53
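The three load averages are easier to judge next to the number of CPUs. A minimal sketch, assuming a Linux /proc filesystem and the coreutils nproc command; the variable names are illustrative only:

```shell
#!/bin/sh
# Read the 1-, 5- and 15-minute load averages from /proc/loadavg
# (the same figures uptime prints) and show them next to the CPU count.
read -r load1 load5 load15 _rest < /proc/loadavg
cpus=$(nproc)   # processing units available to this process
echo "load averages: $load1 $load5 $load15 (CPUs available: $cpus)"
```

A load average that stays well above the CPU count usually means tasks are queuing for CPU or blocked on I/O.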

cat /etc/redhat-release

Red Hat Enterprise Linux Server release 5.4 (Tikanga)

lsb_release

LSB Version: :core-3.1-amd64:core-3.1-ia32:core-3.1-noarch:graphics-3.1-amd64:graphics-3.1-ia32:graphics-3.1-noarch

cat /proc/cpuinfo

cat /proc/meminfo

lspci - list all PCI devices

lsusb - list USB devices

last, lastb - show listing of last logged in users

lsmod - show the status of modules in the Linux kernel

modprobe - add and remove modules from the Linux kernel

To print a process tree: ps -ejH / ps axjf

To get info about threads: ps -eLf / ps axms

ulimit -a

lsof - list open files; on Unix, everything is a file

lsof -p pid

rpm/yum

rpm -qf file # which rpm package owns the file

rpm -ql rpm # which files an rpm package contains

/var/log/yum.log # yum package update log

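/etc/xxx # system-level program configuration directories, e.g.

/etc/yum.repos.d/ # yum repository configuration

/var/log/xxx # log directories, e.g.

/var/log/cron # crontab log; shows how scheduled jobs ran

ntpd - Network Time Protocol (NTP) daemon; keeps the clocks of the machines in a cluster in sync

squid - proxy caching server; used as the proxy for the cluster's web UIs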

mpstat - report processors related statistics. Pay attention to the %sys and %iowait values.
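When mpstat is not installed, similar figures can be derived from /proc/stat. The following is a rough sketch, not mpstat's actual implementation: it samples the aggregate cpu line twice, one second apart, and prints integer percentages. The helper name snapshot and all variable names are made up for this example.

```shell
#!/bin/sh
# Sample the aggregate "cpu" line of /proc/stat twice, one second apart, and
# report the share of that interval spent in system mode and in I/O wait.
snapshot() {
    # Columns after the "cpu" label: user nice system idle iowait irq softirq steal
    awk '/^cpu /{print $2, $3, $4, $5, $6, $7, $8, $9}' /proc/stat
}

set -- $(snapshot)
u1=$1 n1=$2 s1=$3 i1=$4 w1=$5 q1=$6 sq1=$7 st1=${8:-0}
sleep 1
set -- $(snapshot)
u2=$1 n2=$2 s2=$3 i2=$4 w2=$5 q2=$6 sq2=$7 st2=${8:-0}

# Total jiffies elapsed across all accounted states during the interval.
total=$(( (u2-u1) + (n2-n1) + (s2-s1) + (i2-i1) + (w2-w1) + (q2-q1) + (sq2-sq1) + (st2-st1) ))
[ "$total" -gt 0 ] || total=1   # guard against division by zero

sys_pct=$(( 100 * (s2 - s1) / total ))
iowait_pct=$(( 100 * (w2 - w1) / total ))
echo "sys: ${sys_pct}%  iowait: ${iowait_pct}%"
```

A persistently high sys share points at system-call or kernel overhead, while a high iowait share points at storage.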

vmstat - report virtual memory statistics

iostat - report Central Processing Unit (CPU) statistics and input/output statistics for devices and partitions.

netstat - print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships

netstat -atpn | grep pid

ganglia - a scalable distributed monitoring system for high-performance computing systems such as clusters and grids.

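sar/tsar - collect, report, or save system activity information; tsar is Taobao's own improved version

Samples on a schedule (every minute) and keeps history that can be queried (5-minute intervals by default); complements ganglia by showing more detailed information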

iftop - the "top" bandwidth consumers shown. iftop wiki

iotop

vmtouch, portable file system cache diagnostics and control

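telnet/nc ip port - check whether the target port is reachable; a successful ping does not guarantee that the port is reachable, because a firewall or similar may be blocking it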
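Where neither telnet nor nc is installed, bash's built-in /dev/tcp redirection can perform the same check. A minimal sketch, assuming bash and the coreutils timeout command are available; the function name port_open and the target 127.0.0.1:22 are placeholders for illustration:

```shell
#!/bin/sh
# port_open HOST PORT [TIMEOUT_SECONDS]
# Succeeds if a TCP connection to HOST:PORT can be opened within the timeout.
# The connection attempt runs in a child bash so that /dev/tcp is available.
port_open() {
    local host=$1 port=$2 wait=${3:-3}
    timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Hypothetical target; replace with the host and port being checked.
if port_open 127.0.0.1 22; then
    echo "port reachable"
else
    echo "port not reachable (closed, filtered, or timed out)"
fi
```

A timeout rather than an immediate refusal often suggests that a firewall is silently dropping packets.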

ifconfig/ifup/ifdown - configure a network interface

traceroute - print the route packets trace to network host

nslookup - query internet name servers interactively

tcpdump - dump traffic on a network; similar open-source tools include wireshark and netsniff-ng; more tool comparisons

lynx - a general purpose distributed information browser for the World Wide Web

tcpcp - allows cooperating applications to pass ownership of TCP connection endpoints from one Linux host to another one.

ldconfig - configure dynamic linker run time bindings

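ldconfig -p | grep so # check whether a shared object is in the link cache

ldd - print shared library dependencies; shows the shared objects that an executable or .so depends on

nm - list symbols from object files; grep the output to check whether a symbol exists and whether it is undefined.

readelf - displays information about ELF files, such as 32/64-bit class, target OS, and processor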
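The four tools above are often used together on one binary. The sketch below assumes binutils (nm, readelf) and a glibc-style ldconfig are installed; inspect_binary is a made-up helper name and /bin/ls is only an example target.

```shell
#!/bin/sh
# inspect_binary PATH
# Run the linker-related checks from the list above against one ELF file.
inspect_binary() {
    bin=$1
    ldd "$bin"                                          # libraries the binary resolves to
    ldconfig -p | grep 'libc\.so'                       # is libc in the linker cache?
    nm -D --undefined-only "$bin" | head -n 5           # first few imported symbols
    readelf -h "$bin" | grep -E 'Class|OS/ABI|Machine'  # 32/64-bit, OS ABI, CPU type
}

inspect_binary /bin/ls || echo "a tool is missing or printed nothing" >&2
```

If ldd reports "not found" for a library, comparing against the ldconfig -p output shows whether the library is missing from the cache or absent from the machine altogether.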

gdb

cat /proc/$pid/[cmdline|environ|limits|status|...] - information about a process
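A short sketch of reading these files; it inspects the current shell ($$) so that it runs as-is, and the selected status fields are just a sample:

```shell
#!/bin/sh
# Print a few facts about a process from /proc. $$ (this shell) is used so the
# example runs unchanged; substitute the PID under investigation.
pid=$$

# cmdline is NUL-separated, so turn the separators into spaces for display.
cmdline=$(tr '\0' ' ' < "/proc/$pid/cmdline")
echo "command line: $cmdline"

# Selected fields from status: name, state, resident memory, thread count.
grep -E '^(Name|State|VmRSS|Threads):' "/proc/$pid/status"

# The open-file limit that actually applies to the process (compare ulimit -n).
grep '^Max open files' "/proc/$pid/limits"
```

Reading limits this way is useful because a long-running daemon may still carry the limits it was started with, not the ones the current login shell reports.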

pstack - print a stack trace of a running process

pmap - report memory map of a process

JDK tools and utilities

Java troubleshooting tools

jinfo - print Java process information, such as the classpath and java.library.path (the directory holding JNI shared objects)

jstack - print a stack trace of a running Java process; can be used to check for deadlocks

jmap - report memory map of a Java process

jmap -histo:live can trigger a full GC

jmap -dump:live,file=$file dumps the heap so that tools such as jhat can analyze how much of the heap each kind of object occupies
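The two jmap invocations can be wrapped in a small helper. This is a sketch assuming a HotSpot JDK whose jmap is on the PATH; heap_snapshot, the PID, and the output path are hypothetical names for illustration.

```shell
#!/bin/sh
# heap_snapshot PID OUTFILE
# Print the top of the live-object histogram, then write a binary heap dump.
# The "live" option makes the JVM run a full GC first, so expect a pause.
heap_snapshot() {
    pid=$1 out=$2
    jmap -histo:live "$pid" | head -n 20
    jmap -dump:live,format=b,file="$out" "$pid"
}

# Hypothetical usage: heap_snapshot 12345 /tmp/app.hprof
```

The histogram is cheap to read on the spot, while the dump file is what jhat or MAT would load afterwards.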

jhat - heap dump browser - starts a web server on a heap dump file (eg, produced by jmap -dump), allowing the heap to be browsed.

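Starts an HTTP server; view the results in a web browser

-J-mxXXXXm: increase the heap size when analyzing large dump files

If some type of object has a huge number of instances or occupies too much memory, a memory leak is very likely

Memory Analyzer (MAT) - Eclipse plugin, Java heap analyzer

A visual tool, but limited by the machine's memory; it cannot analyze heap dump files that are too large

jdb - the Java debugger; the target JVM can be started as a debug server so that Eclipse and similar tools can connect remotely for debugging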

jstat - Java Virtual Machine statistics monitoring tool

jstatd - Virtual Machine jstat daemon; can be used together with jvisualvm

jvisualvm - Java Virtual Machine monitoring, troubleshooting, and profiling tool; can connect remotely to jstatd/JMX; a visual presentation tool: demo

jvmtop - in a top-like manner, displays JVM internal metrics (e.g. memory information) of running Java processes.

JVM performance optimization: a series of tuning articles written by a JVM developer

overview

compilers

garbage collection

concurrently compacting gc

scalability

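hprof - heap profiler: java -agentlib:hprof

Writing logs is one option, but the tools below help when the system is live in production or the source code cannot be changed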

strace - trace system calls and signals

Example: a practical use case of strace/ltrace

Example: tracing the time spent in system calls, e.g. for a machine with high CPU %sys
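For the high %sys case, strace's summary mode shows which system calls dominate. A minimal sketch, assuming strace and the coreutils timeout command are installed and that the caller may ptrace the target; syscall_summary and syscall_log are hypothetical helper names.

```shell
#!/bin/sh
# syscall_summary PID [SECONDS]
# Attach to a running process and its threads (-f), count calls and time per
# system call (-c), and print the table when the timeout interrupts strace.
syscall_summary() {
    pid=$1 secs=${2:-10}
    timeout -s INT "$secs" strace -c -f -p "$pid"
}

# syscall_log PID OUTFILE [SECONDS]
# Record every call with wall-clock timestamps (-tt) and the time spent
# inside each call (-T) for later inspection.
syscall_log() {
    pid=$1 out=$2 secs=${3:-10}
    timeout -s INT "$secs" strace -f -tt -T -o "$out" -p "$pid"
}

# Hypothetical usage: syscall_summary 12345 5
```

The summary answers "which calls", and the timestamped log answers "when and for how long". Tracing slows the target noticeably, so keep the window short on production machines.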

blktrace, generate traces of the i/o traffic on block devices

ltrace - a library call tracer

xtrace

gprof - a performance analysis tool, sampling and call-graph profiling

valgrind - an instrumentation framework for building dynamic analysis tools. automatically detect many memory management and threading bugs, and profile your programs in detail

systemtap - a simple command line interface and scripting language for writing instrumentation for a live running kernel plus user-space applications for complex tasks that may require live analysis, programmable on-line response, and whole-system symbolic access.

The Linux counterpart of DTrace (which Sun developed on Solaris)

Powerful: covers the kernel, user-space apps, and multiple languages (Java, Perl, Python, Ruby), with built-in markers (PostgreSQL, MySQL)

can write and reuse simple scripts to deeply examine the activities of a live system

data can be extracted, filtered, and summarized quickly and safely, to enable diagnoses of complex performance or functional problems

A rich "tapset" script library

btrace - dynamic tracing tool for the Java platform. User guide

Based on dynamic bytecode modification (HotSwap), it traces and replaces code in a running Java program; implementation details

Summary of btrace usage

Detailed introduction

byteman - simplifies tracing and testing of Java programs. Can modify a running application without needing to stop and restart it.

Define rules specifying the side effects you want to inject, whereas btrace scripts use a Java-like syntax

dapper, a large-scale distributed systems tracing infrastructure

x-trace, a network diagnostic tool designed to provide users and network operators with better visibility into increasingly complex internet applications.

htrace, a tracing framework intended for use with distributed systems written in Java

Add tracing to HDFS

Update htrace for HBase

Some of this content is quoted from other people's microblog posts; if there is any problem, please get in touch.
