早晨收到報警資訊後,登陸資料庫執行ps -ef檢視oracle的背景程序都在,使用conn /as sysdba的方式登陸資料庫,提示連接配接到空閑的執行個體,使用easy connect 方式連接配接則報oracle執行個體無法配置設定記憶體,從報錯提示上看,就像oracle資料庫執行個體未打開的狀态!分析alert日志不斷出現如下錯誤資訊:
Process J002 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_cjq0_18577.trc:
trace檔案摘要資訊如下:
*** 2012-07-26 10:20:31.068
Process J002 is dead (pid=13857 req_ver=1136 cur_ver=1136 state=KSOSP_SPAWNED).
*** 2012-07-26 10:20:32.069
Process J002 is dead (pid=13876 req_ver=1594 cur_ver=1594 state=KSOSP_SPAWNED).
google查詢一番後,發現大部分描述和oracle的程序數設定有關,又或者是記憶體不足引起!于是在oracle 10g環境下測試,線上資料庫環境為11.2.0.3
一:設定processes初始化參數值為20,重新開機資料庫後,已經占用19個程序
SQL> select count(*) from v$process;
COUNT(*)
----------
19
SQL> show parameter process;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 0
db_writer_processes integer 1
gcs_server_processes integer 0
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 20
新的會話連接配接,則報連接配接到空閑的執行個體,alert日志則出現相應的報錯
[root@db1 ~]# su - oracle
[oracle@db1 ~]$ sqlplus /nolog
SQL*Plus: Release 10.2.0.1.0 - Production on Wed Jul 4 13:50:22 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
SQL> conn /as sysdba
Connected to an idle instance.
[oracle@db1 dbs]$ tail -f /u01/app/oracle/admin/db1/bdump/alert_db1.log
Wed Jul 4 13:52:23 2012
ksvcreate: Process(q000) creation failed
Wed Jul 4 13:52:35 2012
Process q001 died, see its trace file
ksvcreate: Process(q001) creation failed
Wed Jul 4 13:52:37 2012
Process m000 died, see its trace file
ksvcreate: Process(m000) creation failed
二:于是想到監控oracle的程序和會話數來進一步确定問題
1:首先要對使用者進行顯示授權,否則後面建立存儲過程編譯将會報錯
SQL> grant select on V_$SESSION to hr;
Grant succeeded.
SQL> grant select on V_$PROCESS to hr;
Grant succeeded.
2:建表,用來存儲結果
SQL> create table session_monitor(time timestamp,session_count number,process_count number);
Table created.
3:建立存儲過程,将資料插入表
SQL> create or replace procedure proc_session
2 is
3 v_session number(8);
4 v_process number(8);
5 begin
6 select count(*) into v_session from v$session;
7 select count(*) into v_process from v$process;
8 insert into session_monitor values (sysdate,v_session,v_process);
9 commit;
10 end proc_session;
Procedure created.
4:建立任務
SQL> var job number;
SQL> begin
2 sys.dbms_job.submit(job => :job,
3 what => 'proc_session();',
4 next_date => sysdate,
5 interval => 'sysdate+2/1440');
6* end;
PL/SQL procedure successfully completed.
5:測試效果
SQL> exec proc_session;
PL/SQL procedure successfully completed.
SQL> alter session set nls_date_format='YYYY-MM-DD-HH24:MI:SS';
Session altered.
SQL> select * from session_monitor;
TIME SESSION_COUNT PROCESS_COUNT
---------------------------------------- ------------- -------------
26-JUL-12 03.02.12.000000 PM 140 155
26-JUL-12 03.02.14.000000 PM 141 157
SQL> select job,next_date from user_jobs where what='proc_session();';
JOB NEXT_DATE
---------- -------------------
145 2012-07-26-15:04:14
SQL> select * from session_monitor;
26-JUL-12 03.04.14.000000 PM 87 94
26-JUL-12 03.02.14.000000 PM 141 157
6:如果要删除任務,則運作下列的指令,145代表user_jobs視圖中的job列
2 dbms_job.remove(145);
3 end;
三:使用nagios+fetion,定時去監控會話和程序數
1:建立監控腳本,該腳本放任務計劃中運作,每2分鐘自動執行
[root@server240 libexec]# cat session_oracle.sh
#!/bin/sh
rm -rf /tmp/session_oracle.log
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/db1
/u01/app/oracle/product/11.2.0/db1/bin/sqlplus hr/[email protected]:1521/orcl <<EOF
set echo off
set feedback off
spool /tmp/session_oracle.log
alter session set nls_date_format='YYYY-MM-DD:HH24:MI:SS';
select session_count from (select * from session_monitor order by time desc ) where rownum=1;
select process_count from (select * from session_monitor order by time desc ) where rownum=1;
spool off
set echo on
set feedback on EOF
2:建立第二腳本,用來處理前面監控腳本的日志輸出,将結果傳回給監控伺服器
[root@server240 ~]# cat /tmp/session_oracle.log
SQL> alter session set nls_date_format='YYYY-MM-DD:HH24:MI:SS';
SQL> select session_count from (select * from session_monitor order by time desc ) where rownum=1;
SESSION_COUNT
-------------
138
SQL> select process_count from (select * from session_monitor order by time desc ) where rownum=1;
PROCESS_COUNT
153
SQL> spool off
[root@server240 libexec]# cat check_oracle_session.sh
STATE_OK=0
STATE_CRITICAL=2
if [ -f /tmp/session_oracle.log ];then
SESSION=$(grep -A 2 'SESSION_COUNT' /tmp/session_oracle.log |tail -1|sed 's/[ ][ ]*//g')
PROCESS=$(grep -A 2 'PROCESS_COUNT' /tmp/session_oracle.log |tail -1|sed 's/[ ][ ]*//g')
else
echo "something wrong,please check monitor script"
exit $STATE_CRITICAL
fi
if [ $SESSION -gt 500 ] || [ $PROCESS -gt 500 ];then
echo "Current session is $SESSION,process is $PROCESS "
echo "Current session is $SESSION,process is $PROCESS "
exit $STATE_OK
fi
四:實際效果
[root@server198 ~]# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.240 -c check_oracle_session
Current session is 138,process is 153
<a target="_blank" href="http://blog.51cto.com/attachment/201207/172435717.jpg"></a>
五:後期觀察結果,估計和記憶體問題相關
[oracle@server240 ~]$ sar -r |grep 10:20
10:20:02 AM 3481492 21195164 85.89 675584 13661448 3683012 413552 10.10
[oracle@server240 ~]$ sar -r |grep 04:40
04:40:01 PM 2076748 22599908 91.58 734088 14581728 4048864 47700 1.16
<a target="_blank" href="http://blog.51cto.com/attachment/201207/172606316.jpg"></a>
本文轉自斬月部落格51CTO部落格,原文連結http://blog.51cto.com/ylw6006/943377如需轉載請自行聯系原作者
ylw6006