天天看點

如何分析解讀systemstat dump産生的trc檔案

    oracle資料庫的systemstat dump生成trace檔案雖然比較簡單,但是怎麼從trace檔案中浩如煙海的資訊中提煉有用資訊,并作出分析診斷是一件技術活,下面收集、整理如何分析解讀systemstat dump産生的trace檔案。

如果要人工去解讀systemstat

dump生成的trace檔案,真是一件體力活,因為這些trace檔案動不動就幾百m甚至更大,它産生的跟蹤檔案包含了系統中所有程序的程序狀态等信

息。每個程序對應跟蹤檔案中的一段内容,反映該程序的狀态資訊,包括程序資訊,會話資訊,enqueues資訊(主要是lock的資訊),緩沖區的資訊和

該程序在sga區中持有的(held)對象的狀态等資訊。dump

systemstate産生的跟蹤檔案是從dump那一刻開始到dump任務完成之間一段事件内的系統内所有程序的資訊。

我們需要的是找到導緻問題出現的程序的資訊,然後根據程序的資訊判斷導緻問題出現的root cause,并在分析問題後解決問題。

幸好網上有人寫了這個一個工具ass109.awk ,可以節約分析systemstat dump檔案或跟蹤檔案(trace

file)的時間,可以将trace檔案中關鍵資訊梳理、整理出來,當然如果了解詳細資訊,還是必須人工去解讀。下面貼上一個例子,是我在學習中的一個案

例,本人也是在學習、研究過程中,如有分析不對的地方敬請指出

從輸出資訊我們能判斷我當時做了3次系統狀态轉儲(實際也是執行了三次oradebug dump systemstate 266),從system state 2,我們可以看到有3個blocker,我們以其中部分資訊做分析

如何分析解讀systemstat dump産生的trc檔案

其中的一個blocker的latch是855675900,而且這個latch造成了程序20、17、16、19、14、21、15的waiting for 'sql*net message from client',從下面資訊可以看到hold住latch 855675900是oracle@xxxx (j000)程序,也就是job的程序。也就說,由于這個j000程序的異常,hold住了855675900 的latch。

如何分析解讀systemstat dump産生的trc檔案

to be able to read a systemstate, or navigate through a systemstate in order to identify what sessions are doing and ,

in the case of a waiting session, which session(s) hold the resource it requires

this document is intended for dbas.

how to use this document 

each

wait scenario will be given , along with key points in the systemstate

which can be used to match to the corresponding entry 

in

your own systemstates. it will then give you examples of matching

holders (ie what you need to find in the rest of the systemstate 

to be able to identify who is blocking) 

what is a systemstate ? 

a

systemstate is made up of the processstate of each process in the

instance found at the time the systemstate was called for. each 

processtate is made up of so (state objects) which hold details of the state of current objects owned by each process. 

how to navigate through a systemstate

what you need to do is start by determining what most session are waiting for (or in the case of a session you know 

is blocked, the process number of the process). so - you will now have either a process xx or a , for example, 'latch free' 

which you need to begin with. what you then need to do is navigate (through vi or a windows editor) and find either 

process xx or the first example of 'latch free'. if you are using process xx then you now need to find what the process 

is waiting for. you will now have :-

process xx waits for yyyyyyy

what you then need to do is find, by using this guide, the so for the resource the session is waiting for and then find (by 

searching back from that point) the process xx of that session. you now have:- 

process xx waits for yyyyyyy 

process yy holds yyyyyyy 

you then begin again, finding the resource it is waiting 

for (if any) and that resources holder. eventually you will come to a process which is on the cpu (last waited) or you will 

have navigated back to a process you already know about. in the case of the process which is on the cpu you will then need 

to

get an errorstack to determine why it is blocking. in the case of a

'deadlock' you will now have a dependency tree of the form:- 

process yy holds yyyyyyy and waits for zzzzzzzz 

process zz holds zzzzzzz ... etc etc 

common wait scenarios and corresponding entries

1 - enqueues 

process 41

... 

waiting for 'enq: tx - row lock contention' blocking sess=0x39b3a5c90 seq=152 wait_time=0 seconds since wait

started=796

name|mode=54580006, usn< 54580006 is split into ascii 54 + ascii 58 (tx) + mode 0006 (x) ...

to find more details on the enqueue, simply do a search for the string 'req:' (searching down) 

so: 39ad80d60, type: 5, owner: 393cb85e0, flag: init/-/-/0x00

(enqueue) tx-00020009-0001fa04 did: 0001-0029-00000090

lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 flag: 0x6

res: 39aef20c8, req: x, prv: 39aef20e8, own: 39b383aa8, sess: 39b383aa8, proc: 39b7384f0

now you have the enqueue name as a string(tx-00020009-0001fa04) which you can use to search for the holder:-

(enqueue) tx-00020009-0001fa04 did: 0001-002e-00000014

res: 39aef20c8, mode: x, prv: 39aef20d8, own: 39b3a5c90, sess: 39b3a5c90, proc: 39b73ac78

we can see we hold the enqueue (mode:x) in a incompatible mode to the req:x request... 

2 - rowcache locks

process 19:

waiting for 'row cache lock' blocking sess=0x0 seq=2174 wait_time=0

cache id=7, mode=0, request=3 *

we do not hold it currently (mode=0), but want it in shared (mode=3) ... 

--------------------------------------------------------------------------------

so: 7000000c6de7678, type: 48, owner: 7000000a6c97cf8, flag: init/-/-/0x00

row cache enqueue: count=1 session=7000000a660b8b0 object=7000000eedc13a0, request=s*here we see the request is shared(s) 

savepoint=2148

row cache parent object: address=7000000eedc13a0 cid=7(dc_users)*dc_users is the cache type indicated by 7 

hash=2a057ebe typ=9 transaction=7000000c42297a0 flags=00000002

own=7000000eedc1480[7000000c6de8518,7000000c6de8518] wat=7000000eedc1490[7000000c6de7568,7000000c6deed98] mode=x *the holder has it in this mode

status=valid/-/-/-/-/-/-/-/-

request=n release=true flags=0

to find the holder, search for object,mode of holder ( ie object=7000000eedc13a0, mode=x):- 

so: 7000000c6de84e8, type: 48, owner: 7000000c42297a0, flag: init/-/-/0x00

row cache enqueue: count=1 session=7000000a6702710 object=7000000eedc13a0, mode=x*this confirms the mode we thought the holder had (x)

savepoint=109

row cache parent object: address=7000000eedc13a0 cid=7(dc_users)

own=7000000eedc1480[7000000c6de8518,7000000c6de8518] wat=7000000eedc1490[7000000c6de7568,7000000c6df1b08] mode=x

instance lock id=qh 00000440 00000000

set=0, complete=false

set=1, complete=false

set=2, complete=false

data=

3 - library cache pins (10g - mutexes)

process 16:

waiting for 'cursor: pin s wait on x' blocking sess=0x0 seq=58849 wait_time=0 seconds since wait started=0

idn=535d1a6c, value=c1600000000, where|sleeps=5003f2428

to find more details use the idn=xxxxxx to search down in the systemstate (ie idn=535d1a6c)

kgx atomic operation log 7000002e5b9d160

mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper get_shrd *we can see (a) that sid 3094 holds it (3094,0) and (b) we want it in shared (get_shrd)

cursor pin uid 2489 efd 0 whr 5 slp 58733

opr=2 pso=70000028c47def0 flg=0

pcs=7000002b8e92268 nxt=0 flg=34 cld=3 hd=70000030d6c6eb0 par=7000002eefe64d0

ct=31 hsh=0 unp=0 unn=0 hvl=b825a4d0 nhv=1 ses=700000309b42600

hep=7000002b8e922e8 flg=80 ld=1 ob=7000002de49f8a0 ptr=70000022cf39db8 fex=70000022cf390c8

to find the holder, search for idn xxxxxxx oper until you find one which is held (ie not get_xxx)( ie idn 535d1a6c oper):- 

kgx atomic operation log 7000002cd934270

mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper excl *we can see sid 3094 holds in exclusive (excl)

cursor pin uid 3094 efd 0 whr 7 slp 0

opr=3 pso=7000002a71c4180 flg=0

4 - library cache pins (pre 10g - non mutex)

process 20:

waiting for 'library cache pin' blocking sess=0x0 seq=575 wait_time=0

handle address=c00000006c0f8490, pin address=c0000000689b19a8, 10*mode+namespace=14

to find more details use the handle=xxxxxx to search down in the systemstate (ie handle=c00000006c0f8490) and you will see a 'request' line

so: c0000000689b19a8, type: 34, owner: c00000006cf85e80, flag: init/-/-/0x00 

library object pin: pin=c0000000689b19a8 handle=c00000006c0f8490 request=s lock=c00000006d00e218 *we can see we want it in shared (s)

user=c00000005eeafeb0 session=c00000005eeafeb0 count=0 mask=0000 savepoint=17 flags=[00]

to find the holder, search for 'handle=xxxxxxxxx mode' oper until you find one which is held in an incompatible mode( ie handle=c00000006c0f8490 mode):- 

so: c00000006b1f4780, type: 34, owner: c0000000699758e8, flag: init/-/-/0x00

library object pin: pin=c00000006b1f4780 handle=c00000006c0f8490 mode=x lock=c00000006b6c40a0 *we hold it in exclusive (x)

user=c00000005edf0f48 session=c00000005edf0f48 count=1 mask=0001 savepoint=49 flags=[00]

5 - library cache lock

process 35:

waiting for 'library cache lock' blocking sess=0x0 seq=35844 wait_time=0 seconds since wait started=14615

handle address=70000030de975a8, lock address=70000026947e190, 100*mode+namespace=12d

to find more details use the handle address in the form handle=address to search down in the systemstate (ie handle=70000030de975a8)

so: 70000026947e190, type: 53, owner: 700000308d726f0, flag: init/-/-/0x00

library object lock: lock=70000026947e190 handle=70000030de975a8 request=x *we want it in exclusive (x)

call pin=0 session pin=0 hpc=0000 hlc=0000

htl=70000026947e210[7000002b333ffe8,7000002b333ffe8] htb=7000002b333ffe8 ssga=7000002b333f2a0

user=700000307a7ca68 session=700000307a7ca68 count=0 flags=[0000] savepoint=0x23e411

library object handle: handle=70000030de975a8 mtx=70000030de976d8(0) cdp=0

name=acselp.poliza *this is the object we are trying to lock

to find the holder, search for 'handle=xxxxxxxxxx mode=' until you find one which is held (but not in null)( ie handle=70000030de975a8 mode=):- 

so: 700000288b03ae0, type: 53, owner: 7000002cc697468, flag: init/-/-/0x00

library object lock: lock=700000288b03ae0 handle=70000030de975a8 mode=s *we hold in in shared (s)

htl=700000288b03b60[7000002a179a1a8,7000002b3800878] htb=7000002b3800878 ssga=7000002b37ffb30

user=70000030fafab00 session=70000030fafab00 count=1 flags=[0000] savepoint=0x417

name=acselp.poliza *this confirms the object

6 - latch free

process 8: 

waiting for 'latch free' blocking sess=0x0 seq=4577 wait_time=0

address=99ff60018, number=9d, tries=0 *9d is the latch# from v$latchname in hex

if you look towards the top of the process dump you will see the exact latch we are waiting for and even who holds it:

waiting for 99ff60018 child library cache level=5 child#=3 

location from where latch is held: kglic: child

context saved from call: 26

state=busy

possible holder pid = 127 ospid=23086 *this tell us process 127 (ospid:23086) holds it

wtr=99ff60018, next waiter 9993858b8

so - process 127 holds it. if we now go to process 127 we will see :-

holding 99ff60018 child library cache level=5 child#=3 

other useful information

if

you wish to find what object a handle refers to then use the

handle=xxxxxxxxxx until you come across the library object handle. ie handle=c00000006c0f8490:-

library object handle: handle=c00000006c0f8490

name=select user from dual *this is the name of the handle

hash=cd1ceca0 timestamp=03-23-2007 09:00:00

namespace=crsr flags=ron/tim/pn0/sml/[12010000]*it is a cursor (crsr).. but we can tell that by the name!