筆者是WEB JAVA背景開發,最近線上上遇到過幾次服務不可用問題,基本現象是接口請求無響應或響應非常慢達到分鐘級别。一般問題發生時我們都會去檢視日志,經常遇到沒有日志的情況(此時服務無法響應client請求),甚至要去找幾個小時前的日志現象發生時,有些接口甚至沒有日志列印,查找起來很困難,利用jvm的線程棧工具jstack對于查找問題有很大幫助。
文章以SpringBoot為架構開發一個web demo應用,以接口代碼示例幾種可能會導緻服務無法響應的案例,并講述如何以jstack等工具排查問題。
環境:單核CPU虛拟機CentOS6 + JAVA8 + SpringBoot
JAVA jstack日志檔案中有以下幾種狀态需要關注的:
1.死鎖,Deadlock,線程死鎖;
2.執行中,Runnable,線程執行過程中可能會遇到第三方IO等阻塞或循環,仍需要關注;
3.等待資源, Waiting on condition,線程等待條件,可能是在等待網絡資源響應請求,具體需結合棧資訊
stacktrace
進行分析;
4.等待擷取螢幕,Waiting on monitor entry,一般是互斥鎖實作線程同步;
5.條件等待/定時等待,Object.wait() 或 TIMED_WAITING,Object.wait()是讓目前線程阻塞,并出讓目前線程的擁有的Object鎖,直到被持有Object鎖的其它線程調用Object.notify()喚醒才繼續執行
6.停止/停止中:Parked/Parking。
死循環
死循環或長時間循環計算,占用CPU計算資源,導緻CPU占滿。本例虛拟機CPU為1核,是以CPU占用用率達到
100%
,如果是多核,則占用率為
1/n
,如四核則為
25%
代碼示例
@RequestMapping("loop")
public void threadLoopDemo() throws Exception{
int num = 0;
long start = System.currentTimeMillis() / 1000;
while (true) {
log.info("====> 測試 Loop");
num++;
if (num == Integer.MAX_VALUE) {
log.info("====> rest num");
num = 0;
}
if (System.currentTimeMillis() / 1000 - start > 1000) {
return;
}
}
}
現象說明
top -c
CPU占用情況,發現此時CPU占用
100%
,說明以上死循環獨占CPU資源。
ps -mp 3168 -o THREAD,tid,time
或
top -H -p 3168
(3168為程序号
<pid>
) 可列印出程序對應的線程id及運作時間time,可以看到
nid=3187
的線程占用CPU
82.8%
,且運作時間為
2min

線程堆棧
jstack 3168
列印輸出找到對應問題的堆棧,從下往上看,發現tomcat NIO Channel被locked,說明該請求的線程未釋放,仍在執行;同時找到出問題的代碼位置。
nid=0xc73
換成十進制為
nid=3187
,即上述占用CPU高且時間較少的線程。
"http-nio-10015-exec-1" #19 daemon prio=5 os_prio=0 tid=0x00007f75e4050800 nid=0xc73 runnable [0x00007f75e84e4000]
java.lang.Thread.State: RUNNABLE
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x00000000eb1a0b48> (a java.io.BufferedOutputStream) // log日志,輸出到file日志檔案
at java.io.PrintStream.write(PrintStream.java:482)
- locked <0x00000000eb18a468> (a java.io.PrintStream)
at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
at ch.qos.logback.core.joran.spi.ConsoleTarget$1.write(ConsoleTarget.java:37)
at ch.qos.logback.core.encoder.LayoutWrappingEncoder.doEncode(LayoutWrappingEncoder.java:131)
.... 問題代碼出處
at com.ljyhust.demo.web.ThreadTestDemoController.threadLoopDemo(ThreadTestDemoController.java:20)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
..... 此處省略
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
- locked <0x00000000ec476d30> (a org.apache.tomcat.util.net.NioChannel)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
網絡/IO阻塞
大量網絡IO阻塞,導緻占用服務線程,導緻服務無法響應。
這裡以大家熟悉
tomcat
服務為例,tomcat設有最大線程數
maxThreads
和最大排隊數
acceptCount
,這兩個參數可以在
server.xml
檔案中配置。tomcat處理請求可分為以下3種情況:
1.接收一個請求,當啟動的線程數或正在運作的線程數
< maxThreads
時,則
tomcat
會啟動一個線程來處理該請求;
2.接收一個請求,當啟動的線程數或正在運作的線程數
> maxThreads
時,則
tomcat
會把請求放入等待隊列,等待空閑線程執行請求;
3.接收一個請求,當啟動的線程數或正在運作的線程數
> maxThreads
&& 請求隊列已滿時,則
tomcat
會直接拒絕請求,此時用戶端現象是
connection refused
(連接配接被拒絕)
如果大量的線程在執行請求的過程中由于IO阻塞,則導緻線程池占滿,服務則無法響應新的請求。
代碼示例
blockIo
接口是問題代碼,請求第三方google應用,如果請求緩慢或阻塞則會導緻請求線程阻塞,當大量請求線程阻塞占滿
tomcat
線程池時,則服務無法響應新進來的請求甚至拒絕請求,導緻服務“假死”。通過
jmeter
模拟 3000 并發請求,檢視其它接口如
getTime
是否能正常響應。
@RequestMapping("blockIo")
public Object blockIoDemo() throws Exception {
JSONObject resJson = new JSONObject();
try {
JSONObject resStr = RestClientUtil.getRestTemplate().getForObject("http://10.247.63.25:10015/demo/threadTest/resBlock", JSONObject.class);
log.info("=====> 擷取text/html {}", resStr);
} catch (Exception e) {
e.printStackTrace();
}
resJson.put("code", "100");
return resJson;
}
@RequestMapping("getTime")
public Object getServerTime() throws Exception {
log.info("=====> 請求開始");
JSONObject resJson = new JSONObject();
String format = DateFormatUtils.format(new Date(), "yyyy-MM-dd HH:mm:ss");
resJson.put("code", "100");
resJson.put("reqTime", format);
log.info("=====> 請求結束");
return resJson;
}
現象說明
-
接口響應
并發前後,請求
接口,發現并發前getTime
接口響應時間為getTime
,而并發後響應時間則變為32ms
,說明服務響應已受到影響。12.19s
利用jstack工具分析JVM線程 利用jstack工具分析JVM線程 -
CPU狀态
CPU占用并不高,此次并發對CPU并未造成明顯影響。
利用jstack工具分析JVM線程 -
jstack線程堆棧
線程堆棧資訊如下,其中有大量的
線程,跟蹤TIMED_WAITING
找到stacktrace
位置,這個是at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:137)
apache httpclient
連接配接池代碼段,這段代碼說明http連接配接池不夠用,造成大量請求等待。
我們來看看
狀态的線程RUNNABLE
,該線程正在執行請求,跟蹤代碼出處從下往上找,nid=0xab7
說明正在讀取網絡資源。java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
// 請求進入等待隊列中
"http-nio-10015-exec-188" #208 daemon prio=5 os_prio=0 tid=0x0000000001bdb800 nid=0xab9 waiting on condition [0x00007f920f2af000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e29149c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:256)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2120)
at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:137)
at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:307)
at org.apache.http.pool.AbstractConnPool.access$000(AbstractConnPool.java:65)
at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:193)
at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:186)
at org.apache.http.pool.PoolEntryFuture.get(PoolEntryFuture.java:108)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:282)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:269)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
....
at com.ljyhust.demo.web.ThreadTestDemoController.blockIoDemo(ThreadTestDemoController.java:46)
at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:110)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:832)
at ....
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
- locked <0x00000000ee823bb0> (a org.apache.tomcat.util.net.NioChannel)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
"http-nio-10015-exec-187" #207 daemon prio=5 os_prio=0 tid=0x0000000001bd9800 nid=0xab8 waiting on condition [0x00007f920f3b0000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e142d7b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:256)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2120)
at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:137)
at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:307)
at org.apache.http.pool.AbstractConnPool.access$000(AbstractConnPool.java:65)
at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:193)
at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:186)
at org.apache.http.pool.PoolEntryFuture.get(PoolEntryFuture.java:108)
...
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:528)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1099)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:670)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
- locked <0x00000000ee821b30> (a org.apache.tomcat.util.net.NioChannel)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
// 請求第三方資源read
"http-nio-10015-exec-186" #206 daemon prio=5 os_prio=0 tid=0x0000000001bd7800 nid=0xab7 runnable [0x00007f920f4b1000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
...
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:91)
at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:53)
at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:596)
at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:557)
at org.springframework.web.client.RestTemplate.getForObject(RestTemplate.java:264)
at com.ljyhust.demo.web.ThreadTestDemoController.blockIoDemo(ThreadTestDemoController.java:46)
at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
...
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1520)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1476)
- locked <0x00000000ec853808> (a org.apache.tomcat.util.net.NioChannel)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
死鎖
死鎖是由于多線程争奪互斥資源導緻的。例如 1、2線程分别占用A、B鎖,但同時在臨界區代碼中又需要B、A鎖,由于各自擷取了對方所需要的鎖,最終導緻死鎖。
滿足死鎖的條件有以下四個,缺一不可:
1.互斥條件,即不能同時被兩個或兩個以上的線程占有;
2.不可搶占條件,即已占用的鎖不能被其它線程搶奪;
3.占有且申請條件,即程序已經占有了一個鎖,但又需要申請/等待另外一個鎖;
4.循環等待條件,即等待其它線程的鎖,而其它線程又等待更多線程的鎖,且形成一個等待循環。
代碼示例
@RequestMapping("deadLock")
public Object deadLockDemo() throws Exception {
log.info("=====> 請求開始");
JSONObject resJson = new JSONObject();
Thread t1 = new Thread(new Runnable() {
@Override
public void run() {
try {
deadLockThreadDemo.getLockAB();
} catch (Exception e) {
e.printStackTrace();
}
}
});
Thread t2 = new Thread(new Runnable() {
@Override
public void run() {
try {
deadLockThreadDemo.getLockBA();
} catch (Exception e) {
e.printStackTrace();
}
}
});
t1.start();
t2.start();
log.info("=====> 請求結束");
resJson.put("code", "100");
return resJson;
}
public void getLockAB() throws Exception {
// 鎖A
synchronized (objectA) {
try {
Thread.sleep(2000);
} catch (InterruptedException e) {
e.printStackTrace();
}
// 鎖B
log.info("線程1嘗試擷取B鎖");
synchronized (objectB) {
log.info("線程1擷取到B鎖");
}
}
}
public void getLockBA() throws Exception {
// 鎖B
synchronized (objectB) {
try {
Thread.sleep(2000);
} catch (InterruptedException e) {
e.printStackTrace();
}
// 鎖A
log.info("線程2嘗試擷取A鎖");
synchronized (objectA) {
log.info("線程2擷取到A鎖");
}
}
}
現象說明
-
jstack線程堆棧
線程堆棧如下所示,老套路從下往上看棧資訊,線程
先鎖Thread-6
對象然後等待0x00000000ebe97e18
;線程0x00000000ebe97e08
先鎖Thread-5
對象然後待0x00000000ebe97e08
,兩個線程彼此等待,導緻死鎖0x00000000ebe97e18
。BLOCKED (on object monitor)
"Thread-6" #24 daemon prio=5 os_prio=0 tid=0x00007f6ce8ca6800 nid=0xbd7 waiting for monitor entry [0x00007f6d081f8000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.ljyhust.demo.service.DeadLockThreadDemo.getLockBA(DeadLockThreadDemo.java:42)
- waiting to lock <0x00000000ebe97e08> (a java.lang.Object)
- locked <0x00000000ebe97e18> (a java.lang.Object)
at com.ljyhust.demo.web.ThreadTestDemoController$2.run(ThreadTestDemoController.java:89)
at java.lang.Thread.run(Thread.java:748)
"Thread-5" #23 daemon prio=5 os_prio=0 tid=0x00007f6ce8576800 nid=0xbd6 waiting for monitor entry [0x00007f6d1cee3000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.ljyhust.demo.service.DeadLockThreadDemo.getLockAB(DeadLockThreadDemo.java:26)
- waiting to lock <0x00000000ebe97e18> (a java.lang.Object)
- locked <0x00000000ebe97e08> (a java.lang.Object)
at com.ljyhust.demo.web.ThreadTestDemoController$1.run(ThreadTestDemoController.java:78)
at java.lang.Thread.run(Thread.java:748)
結論
無論是CPU飙高還是服務響應緩慢,當從日志中找不出問題甚至沒有日志列印的時候,可以利用
jstack
指令列印線程堆棧資訊、結合
top -H -p <pid>
可能會找到問題原因。尤其是當服務調用其它資源較多時,而又找不到具體哪個服務問題時,不妨試下這個指令找找看。