天天看點

oracle IO性能分析

我們在IO問題發生時間段:

1,可以查dba_hist_system_event 來看該IO相關等待時間的平均響應時間的變化趨勢,

2,直接查dba_hist_event_histogram或者sys.WRH$_EVENT_HISTOGRAM , 來看該IO相關等待事件的等待時間直方圖分布。

3, 我們也可以從整體資料庫的IO變化量上(dba_hist_sysstat), 看出問題時段的總體IO量, 比正常時候的IO量到底是否有增加。

4, 我們也可以從某個IO function 即某個程序上面(dba_hist_iostat_function),看問題時間段某個程序的IO request次數、IO 的量。

比如說LGWR程序的寫request次數、 寫的量。

說明:

1, 如上視圖均可關聯dba_hist_snapshot

2, 所有dba_hist 相關視圖都是由對應的v$ 動态性能視圖彙聚而來, 當時的資料可以查對應的v$視圖。

福利: 後附sql。

例2:

直方圖分布可能更加精确, 因為檢視平均響應時間可能會因為粒度太粗、時間跨度長而被平均了。

正常情況下,應該<8ms的占據了90%甚至更多, > 32ms 的應該占比接近為0。

可以看到, 昨天smscp異常時候,在16 – 32 ms 之間的已經占比很多了。

END_INTERVAL_TIME    SNAP_ID    TIME_HISTG    TOTAL_WAIT    PER_OF_WAITS           

1 22-10月-14 03.00.49.801 下午 20536 <1ms 89464 0.71

2 22-10月-14 03.00.49.801 下午 20536 <2ms 89464 11.32

3 22-10月-14 03.00.49.801 下午 20536 <4ms 89464 26.39

4 22-10月-14 03.00.49.801 下午 20536 <8ms 89464 18.28

5 22-10月-14 03.00.49.801 下午 20536 <16ms 89464 13.16

6 22-10月-14 03.00.49.801 下午 20536 <32ms 89464 8.21

7 22-10月-14 03.00.49.801 下午 20536 <64ms 89464 7.56

8 22-10月-14 03.00.49.801 下午 20536 <128ms 89464 6.06

9 22-10月-14 03.00.49.801 下午 20536 <256ms 89464 4.72

10 22-10月-14 03.00.49.801 下午 20536 <512ms 89464 2.72

11 22-10月-14 03.00.49.801 下午 20536 <1s 89464 0.83

12 22-10月-14 03.00.49.801 下午 20536 <2s 89464 0.03

13 22-10月-14 03.00.49.801 下午 20536 <4s 89464 0

14 22-10月-14 03.00.49.801 下午 20536 >=4s 89464 0

正常情況下是這樣的,8ms以上的占比很少:

END_INTERVAL_TIME    SNAP_ID    TIME_HISTG    TOTAL_WAIT    PER_OF_WAITS           

1 22-10月-14 07.00.49.701 下午 20540 <1ms 1091640 38.28

2 22-10月-14 07.00.49.701 下午 20540 <2ms 1091640 51.09

3 22-10月-14 07.00.49.701 下午 20540 <4ms 1091640 9.03

4 22-10月-14 07.00.49.701 下午 20540 <8ms 1091640 1.37

5 22-10月-14 07.00.49.701 下午 20540 <16ms 1091640 0.17

6 22-10月-14 07.00.49.701 下午 20540 <32ms 1091640 0.04

7 22-10月-14 07.00.49.701 下午 20540 <64ms 1091640 0.01

8 22-10月-14 07.00.49.701 下午 20540 <128ms 1091640 0

9 22-10月-14 07.00.49.701 下午 20540 <256ms 1091640 0

10 22-10月-14 07.00.49.701 下午 20540 <512ms 1091640 0

11 22-10月-14 07.00.49.701 下午 20540 <1s 1091640 0

12 22-10月-14 07.00.49.701 下午 20540 <2s 1091640 0

13 22-10月-14 07.00.49.701 下午 20540 <4s 1091640 0

14 22-10月-14 07.00.49.701 下午 20540 >=4s 1091640 0

例3:

可以看到昨天下午的整庫的physical read total IO requests 并沒有明顯增加, 但是physical read total bytes 增加到了3000GB 一個小時。

其它時候也有這麼高的,但是都是在晚上, 隻有昨天是在下午15點這個生産時間的snapshot,開發有大量direct path read 是以對應用産生了較大影響。

例4:可以看到問題發生時段,10.22 14點-15點, LGWR的large_write_megabytes 和requests有明顯增加, 而 平時有如此高峰的是有晚上和淩晨。

Smll read 的bytes和request 都沒有增加。

附sql:

--- 查等待事件的等待時間直方圖分布:

select b.end_interval_time,a.*

from

(select snap_id,

decode(wait_time_milli,
          1,
          '<1ms',
          2,
          '<2ms',
          4,
          '<4ms',
          8,
         '<8ms',
         16,
         '<16ms',
         32,
         '<32ms',
         64,
         '<64ms',
         128,
         '<128ms',
         256,
         '<256ms',
         512,
         '<512ms',
         1024,
         '<1s',
         2048,
         '<2s',
         4096,
         '<4s',
         8192,
         '>=4s',
         wait_time_milli) time_histg,
  total_wait,
  round(delta_wait_count / total_wait * 100, 2) as per_of_waits           

from (select snap_id,

wait_time_milli,
          delta_wait_count,
          sum(delta_wait_count) over(partition by snap_id order by snap_id) as total_wait
     from (select xx.snap_id,
                  xx.wait_time_milli,
                  xx.wait_count - yy.wait_count as delta_wait_count
             from dba_hist_event_histogram xx,
                  dba_hist_event_histogram yy
            where xx.event_name = 'log file parallel write'
              and xx.event_id = yy.event_id
              and xx.snap_id = yy.snap_id + 1
              and xx.instance_number = yy.instance_number
              and xx.wait_time_milli = yy.wait_time_milli
              and xx.snap_id in (20540)))           

order by snap_id, wait_time_milli) a , dba_hist_snapshot b

where a.snap_id=b.snap_id;

--- 查等待事件的平均等待時間的變化趨勢:

with t as (

select s.snap_id, s.instance_number, s.end_interval_time ,

total_waits - lag(total_waits, 1) over (partition by s.instance_number order by s.snap_id) waits,
     (time_waited_micro - lag(time_waited_micro, 1) over (partition by s.instance_number order by s.snap_id))/1000 twt
from dba_hist_system_event ev, dba_hist_snapshot s           

where ev.instance_number = s.instance_number and ev.snap_id = s.snap_id

and event_name = 'log file parallel write'
 and s.end_interval_time BETWEEN TO_DATE('20140401 10:00:00', 'yyyymmdd hh24:mi:ss') AND  TO_DATE('20140414 11:00:00', 'yyyymmdd hh24:mi:ss') )           

select to_char(end_interval_time,'YYYYMMDD HH24:MI'), instance_number, sum(waits), sum(twt), round(sum(twt)/sum(waits), 2) wt

from t

where (to_char(end_interval_time, 'hh24') between 9 and 12 or to_char(end_interval_time, 'hh24') between 14 and 17)

group by to_char(end_interval_time,'YYYYMMDD HH24:MI'), instance_number

order by 1, instance_number;

--- 查整庫IO相關名額:

--get_stat_10g.sql

with s as (

select sn.instance_number, sn.snap_id, sn.end_interval_time stime, (s.value - lag(s.value, 1) over (partition by sn.instance_number order by sn.snap_id)) value

--(sn.end_interval_time - lag(sn.end_interval_time, 1) over (order by sn.snap_id))2460*60 elap_time,

--round((s.value - lag(s.value, 1) over (order by sn.snap_id))/((sn.end_interval_time - lag(sn.end_interval_time, 1) over (order by sn.snap_id))2460*60)) value_s

from dba_hist_snapshot sn, dba_hist_sysstat s

where s.snap_id = sn.snap_id and s.stat_name = '&1' and s.instance_number = sn.instance_number

)

select instance_number , snap_id, stime, value ,

rank() over (partition by instance_number order by value desc) ranking, min(value) over (partition by instance_number) min_phy,

max(value) over (partition by instance_number) max_phy, round(avg(value) over (partition by instance_number)) avg_phy

from s

where (to_char(stime, 'hh24') between 9 and 12 or to_char(stime, 'hh24') between 14 and 17)

and to_char(stime, 'D') not in ('1', '7')

and stime between to_date(&2, 'yyyymmdd') and to_date(&3, 'yyyymmdd')

order by instance_number, snap_id

/

---eg: @get_stat_10g.sql 'parse count (hard)' 20100716 20100726

---physical read total IO requests

---physical read total bytes

---physical write total IO requests

---physical write total bytes

---以上四個名額即IOPS(requests)、IO吞吐量(bytes)

--- 查整庫IO相關名額2:

select e.snap_id,

to_char(e.startup_time, 'yyyy-mm-dd hh24:mi:ss') instance_startup_time,
   to_char(e.end_interval_time, 'yyyy-mm-dd hh24:mi:ss') snapshot_end_time,
   b.value - a.value IO_requests,
   round((d.value - c.value)/1024/1024/1024) IO_GB           

from dba_hist_sysstat a,

dba_hist_sysstat  b,
   dba_hist_sysstat  c,
   dba_hist_sysstat  d,
   dba_hist_snapshot e           

where a.stat_name = 'physical read total IO requests'

and b.stat_name = 'physical read total IO requests'

and c.stat_name = 'physical read total bytes'

and d.stat_name = 'physical read total bytes'

and a.snap_id = e.snap_id - 1

and b.snap_id = e.snap_id

and c.snap_id = e.snap_id - 1

and d.snap_id = e.snap_id

and e.end_interval_time BETWEEN

TO_DATE('20141021 09:00:00', 'yyyymmdd hh24:mi:ss') AND
   TO_DATE('20141023 18:00:00', 'yyyymmdd hh24:mi:ss')           

order by e.begin_interval_time;

--- 查某個IO 程序的讀寫名額, 例如LGWR:

SQL> select * from dba_hist_iostat_function_name;

DBID FUNCTION_ID FUNCTION_NAME           

---------- ----------- ------------------------------

1154030563 0 RMAN

1154030563 1 DBWR

1154030563 2 LGWR

1154030563 3 ARCH

1154030563 4 XDB

1154030563 5 Streams AQ

1154030563 6 Data Pump

1154030563 7 Recovery

1154030563 8 Buffer Cache Reads

1154030563 9 Direct Reads

1154030563 10 Direct Writes

1154030563 11 Smart Scan

1154030563 12 Archive Manager

1154030563 13 Others

14 rows selected

SQL>

--- 查某個IO程序例如LGWR的IO requset和IO bytes dba_hist_iostat_function:

to_char(e.end_interval_time, 'yyyy-mm-dd hh24:mi:ss') snapshot_end_time,
   b.SMALL_READ_MEGABYTES  - a.SMALL_READ_MEGABYTES     SMALL_READ_MEGABYTES   ,                                                                                                
   b.SMALL_WRITE_MEGABYTES - a.SMALL_WRITE_MEGABYTES    SMALL_WRITE_MEGABYTES  ,                                                                                              
   b.LARGE_READ_MEGABYTES  - a.LARGE_READ_MEGABYTES     LARGE_READ_MEGABYTES   ,                                                                                             
   b.LARGE_WRITE_MEGABYTES - a.LARGE_WRITE_MEGABYTES    LARGE_WRITE_MEGABYTES  ,                                                                                            
   b.SMALL_READ_REQS       - a.SMALL_READ_REQS          SMALL_READ_REQS        ,                                                                                           
   b.SMALL_WRITE_REQS      - a.SMALL_WRITE_REQS         SMALL_WRITE_REQS       ,                                                                                          
   b.LARGE_READ_REQS       - a.LARGE_READ_REQS          LARGE_READ_REQS        ,                                                                                         
   b.LARGE_WRITE_REQS      - a.LARGE_WRITE_REQS         LARGE_WRITE_REQS       ,                                                                                        
   b.NUMBER_OF_WAITS       - a.NUMBER_OF_WAITS          NUMBER_OF_WAITS        ,                                                                                       
   b.WAIT_TIME             - a.WAIT_TIME                WAIT_TIME                                                                                   

from dba_hist_iostat_function a,

dba_hist_iostat_function  b,
   dba_hist_snapshot e           

where a.FUNCTION_NAME = 'LGWR'

and b.FUNCTION_NAME = 'LGWR'

TO_DATE('20141021 09:00:00', 'yyyymmdd hh24:mi:ss') AND
   TO_DATE('20141023 18:00:00', 'yyyymmdd hh24:mi:ss')