oracle資料庫的systemstat dump生成trace檔案雖然比較簡單,但是怎麼從trace檔案中浩如煙海的資訊中提煉有用資訊,并作出分析診斷是一件技術活,下面收集、整理如何分析解讀systemstat dump産生的trace檔案。
如果要人工去解讀systemstat
dump生成的trace檔案,真是一件體力活,因為這些trace檔案動不動就幾百m甚至更大,它産生的跟蹤檔案包含了系統中所有程序的程序狀态等信
息。每個程序對應跟蹤檔案中的一段内容,反映該程序的狀态資訊,包括程序資訊,會話資訊,enqueues資訊(主要是lock的資訊),緩沖區的資訊和
該程序在sga區中持有的(held)對象的狀态等資訊。dump
systemstate産生的跟蹤檔案是從dump那一刻開始到dump任務完成之間一段事件内的系統内所有程序的資訊。
我們需要的是找到導緻問題出現的程序的資訊,然後根據程序的資訊判斷導緻問題出現的root cause,并在分析問題後解決問題。
幸好網上有人寫了這個一個工具ass109.awk ,可以節約分析systemstat dump檔案或跟蹤檔案(trace
file)的時間,可以将trace檔案中關鍵資訊梳理、整理出來,當然如果了解詳細資訊,還是必須人工去解讀。下面貼上一個例子,是我在學習中的一個案
例,本人也是在學習、研究過程中,如有分析不對的地方敬請指出
從輸出資訊我們能判斷我當時做了3次系統狀态轉儲(實際也是執行了三次oradebug dump systemstate 266),從system state 2,我們可以看到有3個blocker,我們以其中部分資訊做分析
其中的一個blocker的latch是855675900,而且這個latch造成了程序20、17、16、19、14、21、15的waiting for 'sql*net message from client',從下面資訊可以看到hold住latch 855675900是oracle@xxxx (j000)程序,也就是job的程序。也就說,由于這個j000程序的異常,hold住了855675900 的latch。
to be able to read a systemstate, or navigate through a systemstate in order to identify what sessions are doing and ,
in the case of a waiting session, which session(s) hold the resource it requires
this document is intended for dbas.
how to use this document
each
wait scenario will be given , along with key points in the systemstate
which can be used to match to the corresponding entry
in
your own systemstates. it will then give you examples of matching
holders (ie what you need to find in the rest of the systemstate
to be able to identify who is blocking)
what is a systemstate ?
a
systemstate is made up of the processstate of each process in the
instance found at the time the systemstate was called for. each
processtate is made up of so (state objects) which hold details of the state of current objects owned by each process.
how to navigate through a systemstate
what you need to do is start by determining what most session are waiting for (or in the case of a session you know
is blocked, the process number of the process). so - you will now have either a process xx or a , for example, 'latch free'
which you need to begin with. what you then need to do is navigate (through vi or a windows editor) and find either
process xx or the first example of 'latch free'. if you are using process xx then you now need to find what the process
is waiting for. you will now have :-
process xx waits for yyyyyyy
what you then need to do is find, by using this guide, the so for the resource the session is waiting for and then find (by
searching back from that point) the process xx of that session. you now have:-
process xx waits for yyyyyyy
process yy holds yyyyyyy
you then begin again, finding the resource it is waiting
for (if any) and that resources holder. eventually you will come to a process which is on the cpu (last waited) or you will
have navigated back to a process you already know about. in the case of the process which is on the cpu you will then need
to
get an errorstack to determine why it is blocking. in the case of a
'deadlock' you will now have a dependency tree of the form:-
process yy holds yyyyyyy and waits for zzzzzzzz
process zz holds zzzzzzz ... etc etc
common wait scenarios and corresponding entries
1 - enqueues
process 41
...
waiting for 'enq: tx - row lock contention' blocking sess=0x39b3a5c90 seq=152 wait_time=0 seconds since wait
started=796
name|mode=54580006, usn< 54580006 is split into ascii 54 + ascii 58 (tx) + mode 0006 (x) ...
to find more details on the enqueue, simply do a search for the string 'req:' (searching down)
so: 39ad80d60, type: 5, owner: 393cb85e0, flag: init/-/-/0x00
(enqueue) tx-00020009-0001fa04 did: 0001-0029-00000090
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 flag: 0x6
res: 39aef20c8, req: x, prv: 39aef20e8, own: 39b383aa8, sess: 39b383aa8, proc: 39b7384f0
now you have the enqueue name as a string(tx-00020009-0001fa04) which you can use to search for the holder:-
(enqueue) tx-00020009-0001fa04 did: 0001-002e-00000014
res: 39aef20c8, mode: x, prv: 39aef20d8, own: 39b3a5c90, sess: 39b3a5c90, proc: 39b73ac78
we can see we hold the enqueue (mode:x) in a incompatible mode to the req:x request...
2 - rowcache locks
process 19:
waiting for 'row cache lock' blocking sess=0x0 seq=2174 wait_time=0
cache id=7, mode=0, request=3 *
we do not hold it currently (mode=0), but want it in shared (mode=3) ...
--------------------------------------------------------------------------------
so: 7000000c6de7678, type: 48, owner: 7000000a6c97cf8, flag: init/-/-/0x00
row cache enqueue: count=1 session=7000000a660b8b0 object=7000000eedc13a0, request=s*here we see the request is shared(s)
savepoint=2148
row cache parent object: address=7000000eedc13a0 cid=7(dc_users)*dc_users is the cache type indicated by 7
hash=2a057ebe typ=9 transaction=7000000c42297a0 flags=00000002
own=7000000eedc1480[7000000c6de8518,7000000c6de8518] wat=7000000eedc1490[7000000c6de7568,7000000c6deed98] mode=x *the holder has it in this mode
status=valid/-/-/-/-/-/-/-/-
request=n release=true flags=0
to find the holder, search for object,mode of holder ( ie object=7000000eedc13a0, mode=x):-
so: 7000000c6de84e8, type: 48, owner: 7000000c42297a0, flag: init/-/-/0x00
row cache enqueue: count=1 session=7000000a6702710 object=7000000eedc13a0, mode=x*this confirms the mode we thought the holder had (x)
savepoint=109
row cache parent object: address=7000000eedc13a0 cid=7(dc_users)
own=7000000eedc1480[7000000c6de8518,7000000c6de8518] wat=7000000eedc1490[7000000c6de7568,7000000c6df1b08] mode=x
instance lock id=qh 00000440 00000000
set=0, complete=false
set=1, complete=false
set=2, complete=false
data=
3 - library cache pins (10g - mutexes)
process 16:
waiting for 'cursor: pin s wait on x' blocking sess=0x0 seq=58849 wait_time=0 seconds since wait started=0
idn=535d1a6c, value=c1600000000, where|sleeps=5003f2428
to find more details use the idn=xxxxxx to search down in the systemstate (ie idn=535d1a6c)
kgx atomic operation log 7000002e5b9d160
mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper get_shrd *we can see (a) that sid 3094 holds it (3094,0) and (b) we want it in shared (get_shrd)
cursor pin uid 2489 efd 0 whr 5 slp 58733
opr=2 pso=70000028c47def0 flg=0
pcs=7000002b8e92268 nxt=0 flg=34 cld=3 hd=70000030d6c6eb0 par=7000002eefe64d0
ct=31 hsh=0 unp=0 unn=0 hvl=b825a4d0 nhv=1 ses=700000309b42600
hep=7000002b8e922e8 flg=80 ld=1 ob=7000002de49f8a0 ptr=70000022cf39db8 fex=70000022cf390c8
to find the holder, search for idn xxxxxxx oper until you find one which is held (ie not get_xxx)( ie idn 535d1a6c oper):-
kgx atomic operation log 7000002cd934270
mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper excl *we can see sid 3094 holds in exclusive (excl)
cursor pin uid 3094 efd 0 whr 7 slp 0
opr=3 pso=7000002a71c4180 flg=0
4 - library cache pins (pre 10g - non mutex)
process 20:
waiting for 'library cache pin' blocking sess=0x0 seq=575 wait_time=0
handle address=c00000006c0f8490, pin address=c0000000689b19a8, 10*mode+namespace=14
to find more details use the handle=xxxxxx to search down in the systemstate (ie handle=c00000006c0f8490) and you will see a 'request' line
so: c0000000689b19a8, type: 34, owner: c00000006cf85e80, flag: init/-/-/0x00
library object pin: pin=c0000000689b19a8 handle=c00000006c0f8490 request=s lock=c00000006d00e218 *we can see we want it in shared (s)
user=c00000005eeafeb0 session=c00000005eeafeb0 count=0 mask=0000 savepoint=17 flags=[00]
to find the holder, search for 'handle=xxxxxxxxx mode' oper until you find one which is held in an incompatible mode( ie handle=c00000006c0f8490 mode):-
so: c00000006b1f4780, type: 34, owner: c0000000699758e8, flag: init/-/-/0x00
library object pin: pin=c00000006b1f4780 handle=c00000006c0f8490 mode=x lock=c00000006b6c40a0 *we hold it in exclusive (x)
user=c00000005edf0f48 session=c00000005edf0f48 count=1 mask=0001 savepoint=49 flags=[00]
5 - library cache lock
process 35:
waiting for 'library cache lock' blocking sess=0x0 seq=35844 wait_time=0 seconds since wait started=14615
handle address=70000030de975a8, lock address=70000026947e190, 100*mode+namespace=12d
to find more details use the handle address in the form handle=address to search down in the systemstate (ie handle=70000030de975a8)
so: 70000026947e190, type: 53, owner: 700000308d726f0, flag: init/-/-/0x00
library object lock: lock=70000026947e190 handle=70000030de975a8 request=x *we want it in exclusive (x)
call pin=0 session pin=0 hpc=0000 hlc=0000
htl=70000026947e210[7000002b333ffe8,7000002b333ffe8] htb=7000002b333ffe8 ssga=7000002b333f2a0
user=700000307a7ca68 session=700000307a7ca68 count=0 flags=[0000] savepoint=0x23e411
library object handle: handle=70000030de975a8 mtx=70000030de976d8(0) cdp=0
name=acselp.poliza *this is the object we are trying to lock
to find the holder, search for 'handle=xxxxxxxxxx mode=' until you find one which is held (but not in null)( ie handle=70000030de975a8 mode=):-
so: 700000288b03ae0, type: 53, owner: 7000002cc697468, flag: init/-/-/0x00
library object lock: lock=700000288b03ae0 handle=70000030de975a8 mode=s *we hold in in shared (s)
htl=700000288b03b60[7000002a179a1a8,7000002b3800878] htb=7000002b3800878 ssga=7000002b37ffb30
user=70000030fafab00 session=70000030fafab00 count=1 flags=[0000] savepoint=0x417
name=acselp.poliza *this confirms the object
6 - latch free
process 8:
waiting for 'latch free' blocking sess=0x0 seq=4577 wait_time=0
address=99ff60018, number=9d, tries=0 *9d is the latch# from v$latchname in hex
if you look towards the top of the process dump you will see the exact latch we are waiting for and even who holds it:
waiting for 99ff60018 child library cache level=5 child#=3
location from where latch is held: kglic: child
context saved from call: 26
state=busy
possible holder pid = 127 ospid=23086 *this tell us process 127 (ospid:23086) holds it
wtr=99ff60018, next waiter 9993858b8
so - process 127 holds it. if we now go to process 127 we will see :-
holding 99ff60018 child library cache level=5 child#=3
other useful information
if
you wish to find what object a handle refers to then use the
handle=xxxxxxxxxx until you come across the library object handle. ie handle=c00000006c0f8490:-
library object handle: handle=c00000006c0f8490
name=select user from dual *this is the name of the handle
hash=cd1ceca0 timestamp=03-23-2007 09:00:00
namespace=crsr flags=ron/tim/pn0/sml/[12010000]*it is a cursor (crsr).. but we can tell that by the name!