Tags
PostgreSQL , pg_terminate_backend , pg_cancel_backend , hang , pstack , strace
## Background
When a PostgreSQL backend process cannot be cancelled or terminated, what state is it in? Why can't it exit?
## Example
1. A process that cannot be killed
Type "help" for help.
postgres=# select pg_cancel_backend(60827);
pg_cancel_backend
-------------------
t
(1 row)
postgres=# select pg_terminate_backend(60827);
pg_terminate_backend
----------------------
t
(1 row)
postgres=# select pg_terminate_backend(60827);
pg_terminate_backend
----------------------
t
(1 row)
```
2. Look at the process's stack at that moment: it is stuck in __epoll_wait_nocancel
```
$pstack 60827
#0 0x00007f4bced78f13 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x0000000000753c35 in WaitEventSetWait ()
#2 0x000000000076d103 in ConditionVariableSleep ()
#3 0x00000000004cc4e1 in _bt_parallel_seize ()
#4 0x00000000004ce433 in ?? ()
#5 0x00000000004ce72e in ?? ()
#6 0x00000000004cf071 in _bt_first ()
#7 0x00000000004ccc2d in btgettuple ()
#8 0x00000000004c617a in index_getnext_tid ()
#9 0x0000000000650f87 in ?? ()
#10 0x000000000063efa1 in ExecScan ()
#11 0x000000000063d7c7 in ?? ()
#12 0x000000000064719e in ?? ()
#13 0x000000000064903c in ?? ()
#14 0x000000000063d7c7 in ?? ()
#15 0x000000000064c0c1 in ?? ()
#16 0x000000000063d7c7 in ?? ()
#17 0x000000000064719e in ?? ()
#18 0x000000000064903c in ?? ()
#19 0x000000000063d7c7 in ?? ()
#20 0x000000000063c4f0 in standard_ExecutorRun ()
#21 0x00007f4bc4cd7288 in ?? () from pg_stat_statements.so
#22 0x00007f4bc48cf87f in ?? () from auto_explain.so
#23 0x000000000077ed0b in ?? ()
#24 0x00000000007800d0 in PortalRun ()
#25 0x000000000077dc88 in PostgresMain ()
#26 0x000000000070782c in PostmasterMain ()
#27 0x000000000067d060 in main ()
```
3. Attach strace to the process
```
$strace -e trace=all -T -tt -p 60827
Process 60827 attached - interrupt to quit
19:21:14.881369 epoll_wait(270,
^C <unfinished ...>
Process 60827 detached
```
4. Look at the description of this system call: it waits for an I/O event on an epoll file descriptor
```
$man epoll_wait
EPOLL_WAIT(2) Linux Programmer’s Manual EPOLL_WAIT(2)
NAME
epoll_wait, epoll_pwait - wait for an I/O event on an epoll file descriptor
SYNOPSIS
#include <sys/epoll.h>
int epoll_wait(int epfd, struct epoll_event *events,
int maxevents, int timeout);
int epoll_pwait(int epfd, struct epoll_event *events,
int maxevents, int timeout,
const sigset_t *sigmask);
```
5. Check what file descriptor 270 in epoll_wait(270, ... refers to
```
#cd /proc/60827/fd
#ll 270
lrwx------ 1 xxxxxx xxxxxxxxxxx 64 Jul 19 15:01 270 -> anon_inode:[eventpoll]
```
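So the backend is blocked on an epoll instance. For readers less familiar with the epoll API, here is a minimal, self-contained sketch (not taken from PostgreSQL; the pipe and the variable names are purely illustrative) showing why a process parked in epoll_wait() with timeout -1 simply sits in the kernel until the registered file descriptor becomes readable:

```c
/*
 * Minimal epoll sketch: register one end of a pipe and block on it.
 * Nothing ever writes to the pipe, so epoll_wait() never returns,
 * which is the same state strace showed for backend 60827.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>

int
main(void)
{
    int pipefd[2];
    int epfd;
    struct epoll_event ev;
    struct epoll_event out;

    if (pipe(pipefd) != 0 || (epfd = epoll_create1(0)) < 0)
    {
        perror("setup");
        return 1;
    }

    /* Watch the read end of the pipe for readability. */
    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0)
    {
        perror("epoll_ctl");
        return 1;
    }

    /* timeout -1: block until an event arrives (it never does). */
    int n = epoll_wait(epfd, &out, 1, -1);

    printf("epoll_wait returned %d\n", n);
    return 0;
}
```

Run this and `strace -p <pid>` will show the same `epoll_wait(..., -1)` line as in step 3.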
6. The PostgreSQL call that leads to this epoll_wait: WaitEventSetWait
src/backend/storage/ipc/latch.c
```c
/*
 * Wait for events added to the set to happen, or until the timeout is
 * reached.  At most nevents occurred events are returned.
 *
 * If timeout = -1, block until an event occurs; if 0, check sockets for
 * readiness, but don't block; if > 0, block for at most timeout milliseconds.
 *
 * Returns the number of events occurred, or 0 if the timeout was reached.
 *
 * Returned events will have the fd, pos, user_data fields set to the
 * values associated with the registered event.
 */
int
WaitEventSetWait(WaitEventSet *set, long timeout,
                 WaitEvent *occurred_events, int nevents,
                 uint32 wait_event_info)
{
    int         returned_events = 0;
    instr_time  start_time;
    instr_time  cur_time;
    long        cur_timeout = -1;

    Assert(nevents > 0);

    /*
     * Initialize timeout if requested.  We must record the current time so
     * that we can determine the remaining timeout if interrupted.
     */
    if (timeout >= 0)
    {
        INSTR_TIME_SET_CURRENT(start_time);
        Assert(timeout >= 0 && timeout <= INT_MAX);
        cur_timeout = timeout;
    }

    pgstat_report_wait_start(wait_event_info);

#ifndef WIN32
    waiting = true;
#else
    /* Ensure that signals are serviced even if latch is already set */
    pgwin32_dispatch_queued_signals();
#endif
    while (returned_events == 0)
    {
        int         rc;

        /*
         * Check if the latch is set already. If so, leave the loop
         * immediately, avoid blocking again. We don't attempt to report any
         * other events that might also be satisfied.
         *
         * If someone sets the latch between this and the
         * WaitEventSetWaitBlock() below, the setter will write a byte to the
         * pipe (or signal us and the signal handler will do that), and the
         * readiness routine will return immediately.
         *
         * On unix, If there's a pending byte in the self pipe, we'll notice
         * whenever blocking. Only clearing the pipe in that case avoids
         * having to drain it every time WaitLatchOrSocket() is used. Should
         * the pipe-buffer fill up we're still ok, because the pipe is in
         * nonblocking mode. It's unlikely for that to happen, because the
         * self pipe isn't filled unless we're blocking (waiting = true), or
         * from inside a signal handler in latch_sigusr1_handler().
         *
         * On windows, we'll also notice if there's a pending event for the
         * latch when blocking, but there's no danger of anything filling up,
         * as "Setting an event that is already set has no effect.".
         *
         * Note: we assume that the kernel calls involved in latch management
         * will provide adequate synchronization on machines with weak memory
         * ordering, so that we cannot miss seeing is_set if a notification
         * has already been queued.
         */
        if (set->latch && set->latch->is_set)
        {
            occurred_events->fd = PGINVALID_SOCKET;
            occurred_events->pos = set->latch_pos;
            occurred_events->user_data =
                set->events[set->latch_pos].user_data;
            occurred_events->events = WL_LATCH_SET;
            occurred_events++;
            returned_events++;

            break;
        }

        /*
         * Wait for events using the readiness primitive chosen at the top of
         * this file. If -1 is returned, a timeout has occurred, if 0 we have
         * to retry, everything >= 1 is the number of returned events.
         */
        rc = WaitEventSetWaitBlock(set, cur_timeout,
                                   occurred_events, nevents);

        if (rc == -1)
            break;              /* timeout occurred */
        else
            returned_events = rc;

        /* If we're not done, update cur_timeout for next iteration */
        if (returned_events == 0 && timeout >= 0)
        {
            INSTR_TIME_SET_CURRENT(cur_time);
            INSTR_TIME_SUBTRACT(cur_time, start_time);
            cur_timeout = timeout - (long) INSTR_TIME_GET_MILLISEC(cur_time);
            if (cur_timeout <= 0)
                break;
        }
    }
#ifndef WIN32
    waiting = false;
#endif

    pgstat_report_wait_end();

    return returned_events;
}
```
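Why don't pg_cancel_backend and pg_terminate_backend break this wait right away? They only send signals: pg_cancel_backend delivers SIGINT and pg_terminate_backend delivers SIGTERM, and a backend acts on those requests when it reaches an interrupt-check point (CHECK_FOR_INTERRUPTS()). The sketch below is a deliberately simplified, hypothetical illustration of that flag-and-checkpoint pattern, not PostgreSQL source code: if, after the signal, the code re-enters a blocking wait without reaching a check point, the process keeps sitting in the kernel wait and looks as if it ignored the cancel/terminate request.

```c
/*
 * Simplified sketch of the "signal handler only sets a flag" pattern.
 * The handler records the request; the worker must reach a check point
 * to honour it.
 */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static volatile sig_atomic_t interrupt_pending = 0;

static void
request_cancel(int signo)
{
    (void) signo;
    interrupt_pending = 1;      /* record the request, nothing more */
}

static void
check_for_interrupts(void)
{
    if (interrupt_pending)
    {
        fprintf(stderr, "cancel request honoured\n");
        exit(1);
    }
}

int
main(void)
{
    signal(SIGINT, request_cancel); /* pg_cancel_backend() delivers SIGINT */

    for (;;)
    {
        check_for_interrupts(); /* cancel requests are only honoured here... */

        /*
         * ...but this inner wait restarts itself every time a signal
         * interrupts it, so after SIGINT the flag is set and then never
         * consulted: the process stays parked in the kernel, much like a
         * backend stuck in epoll_wait().
         */
        for (;;)
            pause();            /* stand-in for a blocking wait such as epoll_wait() */
    }
}
```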
## Tutorial on using strace and pstack (repost)
How to use the strace + pstack combo to analyze program performance
http://www.cnblogs.com/bangerlee/archive/2012/04/30/2476190.html
http://www.cnblogs.com/bangerlee/archive/2012/02/20/2356818.html

## Introduction
Sometimes we need to optimize a program and reduce its response time. Besides analyzing the time complexity of the code section by section, is there a more convenient approach?
If we could directly locate the function calls that dominate the program's run time, and then analyze and optimize those specific functions, that would be far more efficient than reading the code aimlessly.
Using strace and pstack together achieves exactly that. strace traces the low-level system calls a program makes and can print when each call was issued and how long it took; pstack prints the function call stack of the process with the given PID.
Below, we use a simple message send/receive program to show concretely how to analyze a program with strace and pstack.
## Program description
The program is a simple socket program consisting of a server and a client. The server listens on a port and waits for client connections; once connected, the client periodically sends messages to the server, and the server sends a response for each message it receives (the original article illustrates this exchange with a diagram).
After running the program, we found that once the server receives the client's submit message, it takes a noticeably long time before it sends the resp response. A tcpdump capture (shown in the original article) reveals that the interval between time2 and time1 is about 1 second.
This preliminary analysis tells us the slow response is a problem in the server-side program. Let's see how to use strace and pstack to find out why the server responds slowly.
## Examining system calls with strace
First we start the server/client programs and attach strace to the server process:
```
# ps -elf | grep server | grep -v grep
0 S root 16739 22642 0 76 0 - 634 1024 14:26 pts/2 00:00:00 ./server
# strace -o server.strace -Ttt -p 16739
Process 16739 attached - interrupt to quit
```
After waiting a while, we stop strace; the server.strace file contains the following output:
```
14:46:39.741366 select(8, [3 4], NULL, NULL, {1, 0}) = 1 (in [4], left {0, 1648}) <0.998415>
14:46:40.739965 recvfrom(4, "hello", 6, 0, NULL, NULL) = 5 <0.000068>
14:46:40.740241 write(1, "hello\n", 6) = 6 <0.000066>
14:46:40.740414 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 <0.000046>
14:46:40.740565 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0 <0.000048>
14:46:40.740715 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000046>
14:46:40.740853 nanosleep({1, 0}, {1, 0}) = 0 <1.000276>
14:46:41.741284 sendto(4, "hello\0", 6, 0, NULL, 0) = 6 <0.000111>
```
We can see that after the server receives the data (the recvfrom call), about 1 second passes before the message goes out (the sendto call); this response time matches the packet capture. We can also see that the nanosleep system call consumed that 1 second.
So we can conclude that the response delay is caused by whatever function call is behind nanosleep.
But which function call exactly? The strace output cannot answer that, because it only shows system calls. To display the program's function call stack, it is pstack's turn.
## Examining the function call stack with pstack
pstack is a script whose core is simply gdb plus the "thread apply all bt" command. Let's use pstack to look at the server process's call stack:
```
# sh pstack.sh 16739
#0 0x00002ba1f8152650 in __nanosleep_nocancel () from /lib64/libc.so.6
#1 0x00002ba1f8152489 in sleep () from /lib64/libc.so.6
#2 0x00000000004007bb in ha_ha ()
#3 0x0000000000400a53 in main ()
```
From this output the call chain is main -> ha_ha -> sleep, so we can go to the ha_ha function to analyze it and make targeted optimizations.
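The article does not show the server's source code, but based on the strace and pstack output above, a handler roughly like the hypothetical sketch below (the name ha_ha comes from the stack trace; everything else is assumed for illustration) would reproduce both the observed recvfrom -> nanosleep -> sendto pattern and the main -> ha_ha -> sleep stack:

```c
/* Hypothetical reconstruction of the slow handler found by strace/pstack. */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>

static void
ha_ha(int sock)
{
    char    buf[64];
    ssize_t n;

    n = recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL);   /* the client's message arrives */
    if (n <= 0)
        return;

    write(STDOUT_FILENO, buf, (size_t) n);                  /* log the message */

    sleep(1);   /* the one-second delay: strace shows nanosleep({1, 0}, ...),
                 * pstack shows main -> ha_ha -> sleep */

    sendto(sock, "hello", 6, 0, NULL, 0);                    /* the delayed response */
}

int
main(void)
{
    int sv[2];

    /* A socketpair keeps the sketch self-contained; the real server
     * accepted a network connection instead. */
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return 1;

    sendto(sv[1], "hello", 6, 0, NULL, 0);  /* pretend the client submitted a message */
    ha_ha(sv[0]);                           /* the server answers it one second later */
    return 0;
}
```

Removing or shortening the sleep(1) in such a handler would then be the obvious fix.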
## Summary
Using a server/client program as an example, this article showed how to analyze response latency with strace and pstack.
Starting from the symptom of a slow server response, we used strace to track down the specific time-consuming system call, then pstack to find the specific time-consuming function in the program, step by step locating the code that affects the program's run time.
Understanding more about the lower layers and approaching problems from the operating-system level makes program performance analysis and optimization much easier.
The server/client programs and the pstack script used in this article can be downloaded here.
A general, complete strace invocation:
```
strace -o output.txt -T -tt -e trace=all -p 10423
```
This traces all system calls of process 10423 (-e trace=all), records how long each system call takes (-T) and when it starts (-tt, shown in a readable hour:minute:second format), and finally writes the results to the file output.txt.
Restricting strace to specific system calls:
If you already know what you are looking for, you can have strace trace only certain types of system calls. For example, when looking at a running nginx, the system call you want to watch is epoll_wait.
To have strace record only epoll_wait calls, use this command:
```
strace -f -o epoll-strace.txt -e epoll_wait -p 10423
```
strace traces system calls, so it cannot give clearer information about nginx's own function call relationships. If we find that nginx is not behaving normally and want to know which functions it is currently executing internally, pstack is a very handy and practical tool. Using pstack is also very simple: just pass it the process id. For example, with no client requests, nginx blocks in the epoll_wait system call; running pstack against the nginx process at that point shows the call path from main() straight down to epoll_wait() at a glance, the same stack information you would see inside gdb, and we can use that for analysis, optimization, and so on.