天天看點

PostgreSQL Linux 下 僵屍狀态的處理

标簽

PostgreSQL , Linux , defunct , 僵屍 , ipcrm , kill -9 , pg_ctl stop -m immediate

https://github.com/digoal/blog/blob/master/201804/20180409_03.md#%E8%83%8C%E6%99%AF 背景

在某些非常特殊的情況下,可能遇到資料庫無法停庫,也無法連接配接(幾乎處于不可用狀态)的情況。即使使用-m immediate這樣的暴力停庫都無法停下來。

例如,當資料庫的某些使用者程序進入了defunct狀态時。

2951     20411 37111  0 14:28 ?        00:00:00 postgres: digoal postgres [local] idle                                           
2951     37111     1  0  2017 ?        00:05:48 /home/digoal/pgsql/bin/postgres -D /home/digoal/pgdata  
2951     37121 37111  0  2017 ?        00:01:27 [postgres] <defunct>  
2951     37128 37111  0  2017 ?        00:09:09 [postgres] <defunct>  
2951     37129 37111  0  2017 ?        00:01:28 postgres: stats collector process                                                   
2951     38325 37111  0 19:33 ?        00:00:00 [postgres] <defunct>  
.................  
           

37111是PG的主程序,其他是它的子程序。

此時使用如下方法都無法停庫。

pg_ctl stop -m fast -D /home/digoal/pgdata  
  
pg_ctl stop -m immediate -D /home/digoal/pgdata  
           

https://github.com/digoal/blog/blob/master/201804/20180409_03.md#linux-%E5%A4%84%E7%90%86--z-%E5%83%B5%E5%B0%B8%E8%BF%9B%E7%A8%8B Linux 處理 Z 僵屍程序

如何kill殺掉linux系統中的僵屍defunct程序

在 Unix系統管理中,當用ps指令觀察程序的執行狀态時,經常看到某些程序的狀态欄為defunct,這就是所謂的“僵屍”程序。“僵屍”程序是一個早已 死亡的程序,但在程序表(processs table)中仍占了一個位置(slot)。由于程序表的容量是有限的,是以,defunct程序不僅占用系統的記憶體資源,影響系統的性能,而且如果其數 目太多,還會導緻系統癱瘓。

我們知道,每個Unix程序在程序表裡都有一個進入點(entry),核心程式執行該程序時使用到的一切資訊都存儲在進入點。當用ps指令察看系統中的程序資訊時,看到的就是程序表中的相關資料。當以fork()系統調用建立一個新的程序後,核心程序就會在程序表中給這個新程序配置設定一個進入點,然後将相關資訊存儲在該進入點所對應的程序表内。這些資訊中有一項是其父程序的識别碼。當這個程序走完了自己的生命周期後,它會執行exit()系統調用,此時原來進 程表中的資料會被該程序的退出碼(exit code)、執行時所用的CPU時間等資料所取代,這些資料會一直保留到系統将它傳遞給它的父程序為止。由此可見,defunct程序的出現時間是在子進 程終止後,但是父程序尚未讀取這些資料之前。

defunct程序是不能直接kill -9殺掉的,否則就不叫僵屍程序了。

知道了defunct程序産生的原因,就可以輕易的kill掉defunct程序。

方法有二:

1,重新開機伺服器電腦,這個是最簡單,最易用的方法,但是如果你伺服器電腦上運作有其他的程式,那麼這個方法,代價很大。是以,盡量使用下面一種方法。

2,找到該defunct僵屍程序的父程序,将該程序的父程序殺掉,則此defunct程序将自動消失。如何找到defunct僵屍程序的父程序?很簡單,一句指令就夠了:ps -ef | grep defunct_process_pid。

https://github.com/digoal/blog/blob/master/201804/20180409_03.md#postgresql-%E8%BF%9B%E5%85%A5%E5%83%B5%E5%B0%B8%E7%8A%B6%E6%80%81%E5%90%8E%E7%9A%84%E5%A4%84%E7%90%86 PostgreSQL 進入僵屍狀态後的處理

如開篇提到的CASE,如何把PG停下來呢?

分為三個級别的操作,請依次嘗試,最後萬不得已使用第三種方法。

https://github.com/digoal/blog/blob/master/201804/20180409_03.md#1%E6%B8%A9%E6%9F%94%E5%81%9C%E5%BA%93 1、溫柔停庫

pg_ctl stop -m fast -D /home/digoal/pgdata  
           

https://github.com/digoal/blog/blob/master/201804/20180409_03.md#2%E7%B4%A7%E6%80%A5%E5%BF%AB%E9%80%9F%E5%81%9C%E5%BA%93%E4%B8%8D%E5%88%B7%E7%9B%98%E9%80%9A%E5%B8%B8%E7%94%A8%E4%BA%8E%E7%B4%A7%E6%80%A5%E5%81%9C%E7%94%B5%E5%8F%AA%E6%9C%89ups%E7%9A%84%E5%A4%84%E7%90%86 2、緊急快速停庫(不刷盤,通常用于緊急停電隻有UPS的處理)

pg_ctl stop -m immediate -D /home/digoal/pgdata  
           

https://github.com/digoal/blog/blob/master/201804/20180409_03.md#3kill%E6%B8%85%E7%90%86 3、KILL清理

首先要檢視啟動資料庫的主程序

《explain postmaster.pid》
cd /home/digoal/pgdata  
  
cat postmaster.pid  
  
37111  
/home/digoal/pgdata  
1511749309  
1921  
/tmp  
*  
  1921001 2007007232  
           

1、殺掉postmaster 程序的所有子程序

ps -ewf|grep 37111|grep -v grep|awk '{print "kill -9 "$2}'|grep -v 37111  
           

2、殺掉postmaster 程序

kill -9 37111  
           

3、清理shared memory和sem

$ipcs  
  
------ Shared Memory Segments --------  
key        shmid      owner      perms      bytes      nattch     status        
0x002dcaa9 2007007232 digoal  600        48         0                         
  
------ Semaphore Arrays --------  
key        semid      owner      perms      nsems       
0x0002243f 65538      admin      666        2           
0x002dcaa9 1912963075 digoal  600        17          
0x002dcaaa 1912995844 digoal  600        17          
0x002dcaab 1913028613 digoal  600        17          
0x002dcaac 1913061382 digoal  600        17          
0x002dcaad 1913094151 digoal  600        17          
0x002dcaae 1913126920 digoal  600        17          
0x002dcaaf 1913159689 digoal  600        17          
0x002dcab0 1913192458 digoal  600        17          
0x002dcab1 1913225227 digoal  600        17          
0x002dcab2 1913257996 digoal  600        17          
0x002dcab3 1913290765 digoal  600        17          
0x002dcab4 1913323534 digoal  600        17          
0x002dcab5 1913356303 digoal  600        17          
0x002dcab6 1913389072 digoal  600        17          
0x002dcab7 1913421841 digoal  600        17          
0x002dcab8 1913454610 digoal  600        17          
0x002dcab9 1913487379 digoal  600        17          
0x002dcaba 1913520148 digoal  600        17          
0x002dcabb 1913552917 digoal  600        17          
0x002dcabc 1913585686 digoal  600        17          
0x002dcabd 1913618455 digoal  600        17          
0x002dcabe 1913651224 digoal  600        17          
0x002dcabf 1913683993 digoal  600        17          
0x002dcac0 1913716762 digoal  600        17          
0x002dcac1 1913749531 digoal  600        17          
0x002dcac2 1913782300 digoal  600        17          
0x002dcac3 1913815069 digoal  600        17          
0x002dcac4 1913847838 digoal  600        17          
0x002dcac5 1913880607 digoal  600        17          
0x002dcac6 1913913376 digoal  600        17          
0x002dcac7 1913946145 digoal  600        17          
0x002dcac8 1913978914 digoal  600        17          
0x002dcac9 1914011683 digoal  600        17          
0x002dcaca 1914044452 digoal  600        17          
0x002dcacb 1914077221 digoal  600        17          
0x002dcacc 1914109990 digoal  600        17          
0x002dcacd 1914142759 digoal  600        17          
0x002dcace 1914175528 digoal  600        17          
0x002dcacf 1914208297 digoal  600        17          
0x002dcad0 1914241066 digoal  600        17          
0x002dcad1 1914273835 digoal  600        17          
0x002dcad2 1914306604 digoal  600        17          
0x002dcad3 1914339373 digoal  600        17          
0x002dcad4 1914372142 digoal  600        17          
0x002dcad5 1914404911 digoal  600        17          
0x002dcad6 1914437680 digoal  600        17          
0x002dcad7 1914470449 digoal  600        17          
0x002dcad8 1914503218 digoal  600        17          
0x002dcad9 1914535987 digoal  600        17          
0x002dcada 1914568756 digoal  600        17          
0x002dcadb 1914601525 digoal  600        17          
0x002dcadc 1914634294 digoal  600        17          
0x002dcadd 1914667063 digoal  600        17          
0x002dcade 1914699832 digoal  600        17          
0x002dcadf 1914732601 digoal  600        17          
0x002dcae0 1914765370 digoal  600        17          
0x002dcae1 1914798139 digoal  600        17          
0x002dcae2 1914830908 digoal  600        17          
0x002dcae3 1914863677 digoal  600        17          
0x002dcae4 1914896446 digoal  600        17          
0x002dcae5 1914929215 digoal  600        17          
0x002dcae6 1914961984 digoal  600        17          
0x002dcae7 1914994753 digoal  600        17          
0x002dcae8 1915027522 digoal  600        17          
0x002dcae9 1915060291 digoal  600        17          
0x002dcaea 1915093060 digoal  600        17          
0x002dcaeb 1915125829 digoal  600        17          
0x002dcaec 1915158598 digoal  600        17          
0x002dcaed 1915191367 digoal  600        17          
0x002dcaee 1915224136 digoal  600        17          
0x002dcaef 1915256905 digoal  600        17          
0x002dcaf0 1915289674 digoal  600        17          
0x002dcaf1 1915322443 digoal  600        17          
0x002dcaf2 1915355212 digoal  600        17          
0x002dcaf3 1915387981 digoal  600        17          
0x002dcaf4 1915420750 digoal  600        17          
0x002dcaf5 1915453519 digoal  600        17          
0x002dcaf6 1915486288 digoal  600        17          
0x002dcaf7 1915519057 digoal  600        17          
0x002dcaf8 1915551826 digoal  600        17          
0x002dcaf9 1915584595 digoal  600        17          
0x002dcafa 1915617364 digoal  600        17          
0x002dcafb 1915650133 digoal  600        17          
0x002dcafc 1915682902 digoal  600        17          
0x002dcafd 1915715671 digoal  600        17          
0x002dcafe 1915748440 digoal  600        17          
0x002dcaff 1915781209 digoal  600        17          
0x002dcb00 1915813978 digoal  600        17          
0x002dcb01 1915846747 digoal  600        17          
0x002dcb02 1915879516 digoal  600        17          
0x002dcb03 1915912285 digoal  600        17          
0x002dcb04 1915945054 digoal  600        17          
0x002dcb05 1915977823 digoal  600        17          
0x002dcb06 1916010592 digoal  600        17          
0x002dcb07 1916043361 digoal  600        17          
0x002dcb08 1916076130 digoal  600        17          
0x002dcb09 1916108899 digoal  600        17          
0x002dcb0a 1916141668 digoal  600        17          
0x002dcb0b 1916174437 digoal  600        17          
0x002dcb0c 1916207206 digoal  600        17          
0x002dcb0d 1916239975 digoal  600        17          
0x002dcb0e 1916272744 digoal  600        17          
0x002dcb0f 1916305513 digoal  600        17          
0x002dcb10 1916338282 digoal  600        17          
0x002dcb11 1916371051 digoal  600        17          
0x002dcb12 1916403820 digoal  600        17          
0x002dcb13 1916436589 digoal  600        17          
0x002dcb14 1916469358 digoal  600        17          
0x002dcb15 1916502127 digoal  600        17          
0x002dcb16 1916534896 digoal  600        17          
0x002dcb17 1916567665 digoal  600        17          
0x002dcb18 1916600434 digoal  600        17          
0x002dcb19 1916633203 digoal  600        17          
0x002dcb1a 1916665972 digoal  600        17          
0x002dcb1b 1916698741 digoal  600        17          
0x002dcb1c 1916731510 digoal  600        17          
0x002dcb1d 1916764279 digoal  600        17          
0x002dcb1e 1916797048 digoal  600        17          
0x002dcb1f 1916829817 digoal  600        17          
0x002dcb20 1916862586 digoal  600        17          
0x002dcb21 1916895355 digoal  600        17          
0x002dcb22 1916928124 digoal  600        17          
0x002dcb23 1916960893 digoal  600        17          
0x002dcb24 1916993662 digoal  600        17          
0x002dcb25 1917026431 digoal  600        17          
0x002dcb26 1917059200 digoal  600        17          
0x002dcb27 1917091969 digoal  600        17          
0x002dcb28 1917124738 digoal  600        17          
0x002dcb29 1917157507 digoal  600        17          
0x002dcb2a 1917190276 digoal  600        17          
0x002dcb2b 1917223045 digoal  600        17          
0x002dcb2c 1917255814 digoal  600        17          
  
------ Message Queues --------  
key        msqid      owner      perms      used-bytes   messages      
           

注意,隻清理37111這個PG執行個體建立的shared memory和sem,不要把别人的清理掉了。

1、清理共享記憶體段

$ipcrm -m 2007007232  
           

2、清理sem

ipcs|awk '{print "ipcrm -s "$2}'  
           
......  
ipcrm -s 1912995844  
ipcrm -s 1913028613  
ipcrm -s 1913061382  
ipcrm -s 1913094151  
ipcrm -s 1913126920  
......  
ipcrm -s 1917124738  
ipcrm -s 1917157507  
ipcrm -s 1917190276  
ipcrm -s 1917223045  
ipcrm -s 1917255814  
           

然後資料庫就幹淨的關閉了,可以啟動資料庫了。這樣就做到了不重新開機伺服器把資料庫從僵屍狀态複活。

pg_ctl start -D /home/digoal/pgdata  
           

https://github.com/digoal/blog/blob/master/201804/20180409_03.md#%E5%8F%82%E8%80%83 參考

http://www.cnblogs.com/zeushuang/archive/2012/11/29/2794375.html