标簽
PostgreSQL , Linux , defunct , 僵屍 , ipcrm , kill -9 , pg_ctl stop -m immediate
https://github.com/digoal/blog/blob/master/201804/20180409_03.md#%E8%83%8C%E6%99%AF 背景
在某些非常特殊的情況下,可能遇到資料庫無法停庫,也無法連接配接(幾乎處于不可用狀态)的情況。即使使用-m immediate這樣的暴力停庫都無法停下來。
例如,當資料庫的某些使用者程序進入了defunct狀态時。
2951 20411 37111 0 14:28 ? 00:00:00 postgres: digoal postgres [local] idle
2951 37111 1 0 2017 ? 00:05:48 /home/digoal/pgsql/bin/postgres -D /home/digoal/pgdata
2951 37121 37111 0 2017 ? 00:01:27 [postgres] <defunct>
2951 37128 37111 0 2017 ? 00:09:09 [postgres] <defunct>
2951 37129 37111 0 2017 ? 00:01:28 postgres: stats collector process
2951 38325 37111 0 19:33 ? 00:00:00 [postgres] <defunct>
.................
37111是PG的主程序,其他是它的子程序。
此時使用如下方法都無法停庫。
pg_ctl stop -m fast -D /home/digoal/pgdata
pg_ctl stop -m immediate -D /home/digoal/pgdata
https://github.com/digoal/blog/blob/master/201804/20180409_03.md#linux-%E5%A4%84%E7%90%86--z-%E5%83%B5%E5%B0%B8%E8%BF%9B%E7%A8%8B Linux 處理 Z 僵屍程序
如何kill殺掉linux系統中的僵屍defunct程序
在 Unix系統管理中,當用ps指令觀察程序的執行狀态時,經常看到某些程序的狀态欄為defunct,這就是所謂的“僵屍”程序。“僵屍”程序是一個早已 死亡的程序,但在程序表(processs table)中仍占了一個位置(slot)。由于程序表的容量是有限的,是以,defunct程序不僅占用系統的記憶體資源,影響系統的性能,而且如果其數 目太多,還會導緻系統癱瘓。
我們知道,每個Unix程序在程序表裡都有一個進入點(entry),核心程式執行該程序時使用到的一切資訊都存儲在進入點。當用ps指令察看系統中的程序資訊時,看到的就是程序表中的相關資料。當以fork()系統調用建立一個新的程序後,核心程序就會在程序表中給這個新程序配置設定一個進入點,然後将相關資訊存儲在該進入點所對應的程序表内。這些資訊中有一項是其父程序的識别碼。當這個程序走完了自己的生命周期後,它會執行exit()系統調用,此時原來進 程表中的資料會被該程序的退出碼(exit code)、執行時所用的CPU時間等資料所取代,這些資料會一直保留到系統将它傳遞給它的父程序為止。由此可見,defunct程序的出現時間是在子進 程終止後,但是父程序尚未讀取這些資料之前。
defunct程序是不能直接kill -9殺掉的,否則就不叫僵屍程序了。
知道了defunct程序産生的原因,就可以輕易的kill掉defunct程序。
方法有二:
1,重新開機伺服器電腦,這個是最簡單,最易用的方法,但是如果你伺服器電腦上運作有其他的程式,那麼這個方法,代價很大。是以,盡量使用下面一種方法。
2,找到該defunct僵屍程序的父程序,将該程序的父程序殺掉,則此defunct程序将自動消失。如何找到defunct僵屍程序的父程序?很簡單,一句指令就夠了:ps -ef | grep defunct_process_pid。
https://github.com/digoal/blog/blob/master/201804/20180409_03.md#postgresql-%E8%BF%9B%E5%85%A5%E5%83%B5%E5%B0%B8%E7%8A%B6%E6%80%81%E5%90%8E%E7%9A%84%E5%A4%84%E7%90%86 PostgreSQL 進入僵屍狀态後的處理
如開篇提到的CASE,如何把PG停下來呢?
分為三個級别的操作,請依次嘗試,最後萬不得已使用第三種方法。
https://github.com/digoal/blog/blob/master/201804/20180409_03.md#1%E6%B8%A9%E6%9F%94%E5%81%9C%E5%BA%93 1、溫柔停庫
pg_ctl stop -m fast -D /home/digoal/pgdata
https://github.com/digoal/blog/blob/master/201804/20180409_03.md#2%E7%B4%A7%E6%80%A5%E5%BF%AB%E9%80%9F%E5%81%9C%E5%BA%93%E4%B8%8D%E5%88%B7%E7%9B%98%E9%80%9A%E5%B8%B8%E7%94%A8%E4%BA%8E%E7%B4%A7%E6%80%A5%E5%81%9C%E7%94%B5%E5%8F%AA%E6%9C%89ups%E7%9A%84%E5%A4%84%E7%90%86 2、緊急快速停庫(不刷盤,通常用于緊急停電隻有UPS的處理)
pg_ctl stop -m immediate -D /home/digoal/pgdata
https://github.com/digoal/blog/blob/master/201804/20180409_03.md#3kill%E6%B8%85%E7%90%86 3、KILL清理
首先要檢視啟動資料庫的主程序
《explain postmaster.pid》cd /home/digoal/pgdata
cat postmaster.pid
37111
/home/digoal/pgdata
1511749309
1921
/tmp
*
1921001 2007007232
1、殺掉postmaster 程序的所有子程序
ps -ewf|grep 37111|grep -v grep|awk '{print "kill -9 "$2}'|grep -v 37111
2、殺掉postmaster 程序
kill -9 37111
3、清理shared memory和sem
$ipcs
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x002dcaa9 2007007232 digoal 600 48 0
------ Semaphore Arrays --------
key semid owner perms nsems
0x0002243f 65538 admin 666 2
0x002dcaa9 1912963075 digoal 600 17
0x002dcaaa 1912995844 digoal 600 17
0x002dcaab 1913028613 digoal 600 17
0x002dcaac 1913061382 digoal 600 17
0x002dcaad 1913094151 digoal 600 17
0x002dcaae 1913126920 digoal 600 17
0x002dcaaf 1913159689 digoal 600 17
0x002dcab0 1913192458 digoal 600 17
0x002dcab1 1913225227 digoal 600 17
0x002dcab2 1913257996 digoal 600 17
0x002dcab3 1913290765 digoal 600 17
0x002dcab4 1913323534 digoal 600 17
0x002dcab5 1913356303 digoal 600 17
0x002dcab6 1913389072 digoal 600 17
0x002dcab7 1913421841 digoal 600 17
0x002dcab8 1913454610 digoal 600 17
0x002dcab9 1913487379 digoal 600 17
0x002dcaba 1913520148 digoal 600 17
0x002dcabb 1913552917 digoal 600 17
0x002dcabc 1913585686 digoal 600 17
0x002dcabd 1913618455 digoal 600 17
0x002dcabe 1913651224 digoal 600 17
0x002dcabf 1913683993 digoal 600 17
0x002dcac0 1913716762 digoal 600 17
0x002dcac1 1913749531 digoal 600 17
0x002dcac2 1913782300 digoal 600 17
0x002dcac3 1913815069 digoal 600 17
0x002dcac4 1913847838 digoal 600 17
0x002dcac5 1913880607 digoal 600 17
0x002dcac6 1913913376 digoal 600 17
0x002dcac7 1913946145 digoal 600 17
0x002dcac8 1913978914 digoal 600 17
0x002dcac9 1914011683 digoal 600 17
0x002dcaca 1914044452 digoal 600 17
0x002dcacb 1914077221 digoal 600 17
0x002dcacc 1914109990 digoal 600 17
0x002dcacd 1914142759 digoal 600 17
0x002dcace 1914175528 digoal 600 17
0x002dcacf 1914208297 digoal 600 17
0x002dcad0 1914241066 digoal 600 17
0x002dcad1 1914273835 digoal 600 17
0x002dcad2 1914306604 digoal 600 17
0x002dcad3 1914339373 digoal 600 17
0x002dcad4 1914372142 digoal 600 17
0x002dcad5 1914404911 digoal 600 17
0x002dcad6 1914437680 digoal 600 17
0x002dcad7 1914470449 digoal 600 17
0x002dcad8 1914503218 digoal 600 17
0x002dcad9 1914535987 digoal 600 17
0x002dcada 1914568756 digoal 600 17
0x002dcadb 1914601525 digoal 600 17
0x002dcadc 1914634294 digoal 600 17
0x002dcadd 1914667063 digoal 600 17
0x002dcade 1914699832 digoal 600 17
0x002dcadf 1914732601 digoal 600 17
0x002dcae0 1914765370 digoal 600 17
0x002dcae1 1914798139 digoal 600 17
0x002dcae2 1914830908 digoal 600 17
0x002dcae3 1914863677 digoal 600 17
0x002dcae4 1914896446 digoal 600 17
0x002dcae5 1914929215 digoal 600 17
0x002dcae6 1914961984 digoal 600 17
0x002dcae7 1914994753 digoal 600 17
0x002dcae8 1915027522 digoal 600 17
0x002dcae9 1915060291 digoal 600 17
0x002dcaea 1915093060 digoal 600 17
0x002dcaeb 1915125829 digoal 600 17
0x002dcaec 1915158598 digoal 600 17
0x002dcaed 1915191367 digoal 600 17
0x002dcaee 1915224136 digoal 600 17
0x002dcaef 1915256905 digoal 600 17
0x002dcaf0 1915289674 digoal 600 17
0x002dcaf1 1915322443 digoal 600 17
0x002dcaf2 1915355212 digoal 600 17
0x002dcaf3 1915387981 digoal 600 17
0x002dcaf4 1915420750 digoal 600 17
0x002dcaf5 1915453519 digoal 600 17
0x002dcaf6 1915486288 digoal 600 17
0x002dcaf7 1915519057 digoal 600 17
0x002dcaf8 1915551826 digoal 600 17
0x002dcaf9 1915584595 digoal 600 17
0x002dcafa 1915617364 digoal 600 17
0x002dcafb 1915650133 digoal 600 17
0x002dcafc 1915682902 digoal 600 17
0x002dcafd 1915715671 digoal 600 17
0x002dcafe 1915748440 digoal 600 17
0x002dcaff 1915781209 digoal 600 17
0x002dcb00 1915813978 digoal 600 17
0x002dcb01 1915846747 digoal 600 17
0x002dcb02 1915879516 digoal 600 17
0x002dcb03 1915912285 digoal 600 17
0x002dcb04 1915945054 digoal 600 17
0x002dcb05 1915977823 digoal 600 17
0x002dcb06 1916010592 digoal 600 17
0x002dcb07 1916043361 digoal 600 17
0x002dcb08 1916076130 digoal 600 17
0x002dcb09 1916108899 digoal 600 17
0x002dcb0a 1916141668 digoal 600 17
0x002dcb0b 1916174437 digoal 600 17
0x002dcb0c 1916207206 digoal 600 17
0x002dcb0d 1916239975 digoal 600 17
0x002dcb0e 1916272744 digoal 600 17
0x002dcb0f 1916305513 digoal 600 17
0x002dcb10 1916338282 digoal 600 17
0x002dcb11 1916371051 digoal 600 17
0x002dcb12 1916403820 digoal 600 17
0x002dcb13 1916436589 digoal 600 17
0x002dcb14 1916469358 digoal 600 17
0x002dcb15 1916502127 digoal 600 17
0x002dcb16 1916534896 digoal 600 17
0x002dcb17 1916567665 digoal 600 17
0x002dcb18 1916600434 digoal 600 17
0x002dcb19 1916633203 digoal 600 17
0x002dcb1a 1916665972 digoal 600 17
0x002dcb1b 1916698741 digoal 600 17
0x002dcb1c 1916731510 digoal 600 17
0x002dcb1d 1916764279 digoal 600 17
0x002dcb1e 1916797048 digoal 600 17
0x002dcb1f 1916829817 digoal 600 17
0x002dcb20 1916862586 digoal 600 17
0x002dcb21 1916895355 digoal 600 17
0x002dcb22 1916928124 digoal 600 17
0x002dcb23 1916960893 digoal 600 17
0x002dcb24 1916993662 digoal 600 17
0x002dcb25 1917026431 digoal 600 17
0x002dcb26 1917059200 digoal 600 17
0x002dcb27 1917091969 digoal 600 17
0x002dcb28 1917124738 digoal 600 17
0x002dcb29 1917157507 digoal 600 17
0x002dcb2a 1917190276 digoal 600 17
0x002dcb2b 1917223045 digoal 600 17
0x002dcb2c 1917255814 digoal 600 17
------ Message Queues --------
key msqid owner perms used-bytes messages
注意,隻清理37111這個PG執行個體建立的shared memory和sem,不要把别人的清理掉了。
1、清理共享記憶體段
$ipcrm -m 2007007232
2、清理sem
ipcs|awk '{print "ipcrm -s "$2}'
......
ipcrm -s 1912995844
ipcrm -s 1913028613
ipcrm -s 1913061382
ipcrm -s 1913094151
ipcrm -s 1913126920
......
ipcrm -s 1917124738
ipcrm -s 1917157507
ipcrm -s 1917190276
ipcrm -s 1917223045
ipcrm -s 1917255814
然後資料庫就幹淨的關閉了,可以啟動資料庫了。這樣就做到了不重新開機伺服器把資料庫從僵屍狀态複活。
pg_ctl start -D /home/digoal/pgdata