Recently I ran into a problem: an ASM disk group could not be mounted. It had been working normally, but after some unrelated operations the database failed to start, and once that issue was fixed, the disk-group mount step still failed during overall database startup.
Environment
#########################################
Hardware: VMware ESX virtual machine
OS: Red Hat Linux 5
Oracle version: 11.2.0.2
ASM disks are managed through ASMLib.
The disk group contains a single virtualized disk, /dev/sdb1.
The full analysis process follows.
1. The ASM alert.log showed the following errors: the mount failed because the disk group's disks could not be found.
SQL> alter diskgroup data mount
NOTE: cache registered group DATA number=1 incarn=0xc28a1e2d
NOTE: cache began mount (first) of group DATA number=1 incarn=0xc28a1e2d
Tue Dec 11 18:06:55 2012
ERROR: no PST quorum in group: required 2, found 0 <<<<<<<<<<<
NOTE: cache dismounting (clean) group 1/0xc28a1e2d (DATA)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xc28a1e2d (DATA)
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0xc28a1e2d
NOTE: cache deleting context for group DATA 1/0xc28a1e2d
GMON dismounting group 1 at 8 for pid 17, osid 32163
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
ERROR: alter diskgroup data mount
2. Checking the ASM pfile first showed nothing abnormal:
asm_diskgroups='data'
instance_type='asm'
large_pool_size=12m
remote_login_passwordfile='exclusive'
3. The following commands were used to check whether the disks physically exist and how they map to physical devices; no ASM disks were found:
[grid@lgto_test ~]$ kfod disks=all
---- no output ----
[grid@lgto_test peer]$ cd /dev/oracleasm/disks/
[grid@lgto_test disks]$ ls
[grid@lgto_test disks]$ /etc/init.d/oracleasm listdisks
4. However, checking the physical device directly shows that /dev/sdb1 does exist, so the OS has already recognized the disk; only ASMLib fails to see it.
Mapping the ASMLib disk back to its physical device:
[oracle@oel ~]$ /etc/init.d/oracleasm querydisk -d disk1
disk "disk1" is a valid asm disk on device [8,17]
[oracle@oel ~]$ ls -l /dev/ | grep 8 | grep 17
brw-r----- 1 root disk 8, 17 oct 16 14:01 sdb1
[root@lgto_test ~]# ls -lst /dev/sd*
0 brw-r----- 1 root disk 8, 0 dec 11 19:29 /dev/sda
0 brw-r----- 1 root disk 8, 2 dec 11 19:29 /dev/sda2
0 brw-r----- 1 root disk 8, 16 dec 11 19:29 /dev/sdb
0 brw-r----- 1 root disk 8, 17 dec 11 19:29 /dev/sdb1 <<<<<<< this is the missing disk
0 brw-r----- 1 root disk 8, 1 dec 11 11:29 /dev/sda1
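The [8,17] pair printed by querydisk is the device's major,minor number, which is how it ties back to /dev/sdb1 in the listing above. A helper like the following can automate that lookup from `ls -l` output (the function name is my own, not an ASMLib tool):

```shell
#!/bin/sh
# find_dev_by_majmin: scan `ls -l` style output for the block device whose
# major,minor pair matches the one reported by `oracleasm querydisk -d`.
# Hypothetical helper, not part of ASMLib.
find_dev_by_majmin() {
    maj="$1"; min="$2"
    awk -v maj="$maj" -v min="$min" \
        '$1 ~ /^b/ && $5 == maj"," && $6 == min { print $NF }'
}

# Sample listing taken from the article; on a live system pipe in
# `ls -l /dev/sd*` instead.
sample='brw-r----- 1 root disk 8, 16 dec 11 19:29 /dev/sdb
brw-r----- 1 root disk 8, 17 dec 11 19:29 /dev/sdb1'

printf '%s\n' "$sample" | find_dev_by_majmin 8 17   # prints /dev/sdb1
```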
5. The initial suspicion was a damaged disk header preventing ASMLib from recognizing the disk, but dumping the header showed no problem:
#od -c /dev/sdb1
……
0000040 o r c l d i s k d a t a d g 0 1
7760040 o r c l d i s k d a t a d g 0 1
As a side note, if the header label information were lost, the dump would instead show:
0000040 o r c l d i s k \0 \0 \0 \0 \0 \0 \0 \0
In that case the disk must be re-labeled with renamedisk; see the note oracleasm listdisks cannot see disks (Doc ID 392527.1):
use the "oracleasm renamedisk" utility to add an asmlib label to the disk:
/etc/init.d/oracleasm renamedisk /dev/<device> <asmlib_label>
if it fails, use the "-f" switch:
/etc/init.d/oracleasm renamedisk -f /dev/<device> <asmlib_label>
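The `od` check above relies on ASMLib's on-disk layout: the provision string (the ORCLDISK tag immediately followed by the label) sits at byte offset 32 of the device, which is octal offset 0000040 in the od listing. A minimal sketch of the same check; to keep it self-contained it reads a generated sample image rather than a real device:

```shell
#!/bin/sh
# Build a small sample image carrying an ASMLib-style label at byte 32;
# on a real system you would point DEV at /dev/sdb1 instead.
DEV=$(mktemp)
{ dd if=/dev/zero bs=32 count=1 2>/dev/null; printf 'ORCLDISKDATADG01'; } > "$DEV"

# Read the provision string: skip 32 bytes, take up to 24, drop NUL padding.
label=$(dd if="$DEV" bs=1 skip=32 count=24 2>/dev/null | tr -d '\000')
echo "$label"    # ORCLDISKDATADG01

rm -f "$DEV"
```

If the tag is present but the label bytes are NULs, you are in the "label lost" case described above and renamedisk is needed.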
6. Restart ASMLib to check whether ASMLib itself was the problem:
[root@lgto_test ~]# /etc/init.d/oracleasm restart
dropping oracle asmlib disks:
[ ok ]
shutting down the oracle asmlib driver: [failed]
The oracleasm filesystem itself was still mounted successfully:
[root@lgto_test ~]# df -ha
filesystem size used avail use% mounted on
oracleasmfs 0 0 0 - /dev/oracleasm
7. 檢查 /dev/sdb1狀态,檢視是否已經marked為asm disk,顯示已經标記成功
[root@lgto_test ~]# oracleasm querydisk /dev/sdb1
device "/dev/sdb1" is marked an asm disk with the label "datadg01"
[root@lgto_test ~]# /sbin/service oracleasm scandisks
scanning the system for oracle asmlib disks:
[root@lgto_test ~]# /etc/init.d/oracleasm listdisks
---- no output ----
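When listdisks is silent, it helps to distinguish "ASMLib's mount point is empty" from "the mount point is missing". A small check along these lines (the path is passed in as a parameter so the sketch can be exercised against any directory; on a live system it is /dev/oracleasm/disks):

```shell
#!/bin/sh
# list_asm_disks: show what ASMLib has instantiated under its disks
# directory, or state explicitly that nothing is there.
# Hypothetical helper, not part of the oracleasm tooling.
list_asm_disks() {
    root="$1"
    count=$(ls -A "$root" 2>/dev/null | wc -l)
    if [ "$count" -eq 0 ]; then
        echo "no ASMLib disks instantiated under $root"
    else
        ls "$root"
    fi
}

list_asm_disks /dev/oracleasm/disks
```

On a healthy system this prints the disk labels (here that would be DATADG01); in the article's broken state it reports the directory as empty, even though querydisk confirms the device is labeled.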
8. The RPM packages were also fine:
[grid@lgto_test ~]$ rpm -qa|grep oracleasm
oracleasmlib-2.0.4-1.el5
oracleasm-support-2.1.7-1.el5
oracleasm-2.6.18-308.el5-2.0.5-1.el5
9. Collecting kfed output revealed no new anomalies:
[root@lgto_test ~]# /oracle/ora11g/product/app/grid/bin/kfed read /dev/sdb1
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: kfbtyp_diskhead
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: t=0 numb=0x0
kfbh.block.obj: 2147483648 ; 0x008: type=0x8 numb=0x0
kfbh.check: 3351358462 ; 0x00c: 0xc7c1abfe
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: orcldiskdatadg01 ; 0x000: length=16
kfdhdb.driver.reserved[0]: 1096040772 ; 0x008: 0x41544144
kfdhdb.driver.reserved[1]: 825247556 ; 0x00c: 0x31304744
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: kfdgtp_external
kfdhdb.hdrsts: 3 ; 0x027: kfdhdr_member
kfdhdb.dskname: datadg01 ; 0x028: length=8
kfdhdb.grpname: data ; 0x048: length=4
kfdhdb.fgname: datadg01 ; 0x068: length=8
kfdhdb.capname: ; 0x088: length=0
kfdhdb.crestmp.hi: 32977140 ; 0x0a8: hour=0x14 days=0x7 mnth=0xc year=0x7dc
kfdhdb.crestmp.lo: 1642529792 ; 0x0ac: usec=0x0 msec=0x1c1 secs=0x1e mins=0x18
kfdhdb.mntstmp.hi: 32977140 ; 0x0b0: hour=0x14 days=0x7 mnth=0xc year=0x7dc
kfdhdb.mntstmp.lo: 1664549888 ; 0x0b4: usec=0x0 msec=0x1c1 secs=0x33 mins=0x18
kfdhdb.secsize: 512 ; 0x0b8: 0x0200
kfdhdb.blksize: 4096 ; 0x0ba: 0x1000
kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact: 113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize: 204797 ; 0x0c4: 0x00031ffd
kfdhdb.pmcnt: 3 ; 0x0c8: 0x00000003
kfdhdb.fstlocn: 1 ; 0x0cc: 0x00000001
kfdhdb.altlocn: 2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002
kfdhdb.redomirrors[0]: 0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]: 0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]: 0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]: 0 ; 0x0de: 0x0000
kfdhdb.dbcompat: 168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi: 32977140 ; 0x0e4: hour=0x14 days=0x7 mnth=0xc year=0x7dc
kfdhdb.grpstmp.lo: 1642390528 ; 0x0e8: usec=0x0 msec=0x139 secs=0x1e mins=0x18
kfdhdb.vfstart: 0 ; 0x0ec: 0x00000000
kfdhdb.vfend: 0 ; 0x0f0: 0x00000000
kfdhdb.spfile: 58 ; 0x0f4: 0x0000003a
kfdhdb.spfflg: 1 ; 0x0f8: 0x00000001
kfdhdb.ub4spare[0]: 0 ; 0x0fc: 0x00000000
kfdhdb.ub4spare[1]: 0 ; 0x100: 0x00000000
kfdhdb.ub4spare[2]: 0 ; 0x104: 0x00000000
kfdhdb.ub4spare[3]: 0 ; 0x108: 0x00000000
kfdhdb.ub4spare[4]: 0 ; 0x10c: 0x00000000
kfdhdb.ub4spare[5]: 0 ; 0x110: 0x00000000
kfdhdb.ub4spare[6]: 0 ; 0x114: 0x00000000
kfdhdb.ub4spare[7]: 0 ; 0x118: 0x00000000
kfdhdb.ub4spare[8]: 0 ; 0x11c: 0x00000000
kfdhdb.ub4spare[9]: 0 ; 0x120: 0x00000000
kfdhdb.ub4spare[10]: 0 ; 0x124: 0x00000000
kfdhdb.ub4spare[11]: 0 ; 0x128: 0x00000000
kfdhdb.ub4spare[12]: 0 ; 0x12c: 0x00000000
kfdhdb.ub4spare[13]: 0 ; 0x130: 0x00000000
kfdhdb.ub4spare[14]: 0 ; 0x134: 0x00000000
kfdhdb.ub4spare[15]: 0 ; 0x138: 0x00000000
kfdhdb.ub4spare[16]: 0 ; 0x13c: 0x00000000
kfdhdb.ub4spare[17]: 0 ; 0x140: 0x00000000
kfdhdb.ub4spare[18]: 0 ; 0x144: 0x00000000
kfdhdb.ub4spare[19]: 0 ; 0x148: 0x00000000
kfdhdb.ub4spare[20]: 0 ; 0x14c: 0x00000000
kfdhdb.ub4spare[21]: 0 ; 0x150: 0x00000000
kfdhdb.ub4spare[22]: 0 ; 0x154: 0x00000000
kfdhdb.ub4spare[23]: 0 ; 0x158: 0x00000000
kfdhdb.ub4spare[24]: 0 ; 0x15c: 0x00000000
kfdhdb.ub4spare[25]: 0 ; 0x160: 0x00000000
kfdhdb.ub4spare[26]: 0 ; 0x164: 0x00000000
kfdhdb.ub4spare[27]: 0 ; 0x168: 0x00000000
kfdhdb.ub4spare[28]: 0 ; 0x16c: 0x00000000
kfdhdb.ub4spare[29]: 0 ; 0x170: 0x00000000
kfdhdb.ub4spare[30]: 0 ; 0x174: 0x00000000
kfdhdb.ub4spare[31]: 0 ; 0x178: 0x00000000
kfdhdb.ub4spare[32]: 0 ; 0x17c: 0x00000000
kfdhdb.ub4spare[33]: 0 ; 0x180: 0x00000000
kfdhdb.ub4spare[34]: 0 ; 0x184: 0x00000000
kfdhdb.ub4spare[35]: 0 ; 0x188: 0x00000000
kfdhdb.ub4spare[36]: 0 ; 0x18c: 0x00000000
kfdhdb.ub4spare[37]: 0 ; 0x190: 0x00000000
kfdhdb.ub4spare[38]: 0 ; 0x194: 0x00000000
kfdhdb.ub4spare[39]: 0 ; 0x198: 0x00000000
kfdhdb.ub4spare[40]: 0 ; 0x19c: 0x00000000
kfdhdb.ub4spare[41]: 0 ; 0x1a0: 0x00000000
kfdhdb.ub4spare[42]: 0 ; 0x1a4: 0x00000000
kfdhdb.ub4spare[43]: 0 ; 0x1a8: 0x00000000
kfdhdb.ub4spare[44]: 0 ; 0x1ac: 0x00000000
kfdhdb.ub4spare[45]: 0 ; 0x1b0: 0x00000000
kfdhdb.ub4spare[46]: 0 ; 0x1b4: 0x00000000
kfdhdb.ub4spare[47]: 0 ; 0x1b8: 0x00000000
kfdhdb.ub4spare[48]: 0 ; 0x1bc: 0x00000000
kfdhdb.ub4spare[49]: 0 ; 0x1c0: 0x00000000
kfdhdb.ub4spare[50]: 0 ; 0x1c4: 0x00000000
kfdhdb.ub4spare[51]: 0 ; 0x1c8: 0x00000000
kfdhdb.ub4spare[52]: 0 ; 0x1cc: 0x00000000
kfdhdb.ub4spare[53]: 0 ; 0x1d0: 0x00000000
kfdhdb.acdb.aba.seq: 0 ; 0x1d4: 0x00000000
kfdhdb.acdb.aba.blk: 0 ; 0x1d8: 0x00000000
kfdhdb.acdb.ents: 0 ; 0x1dc: 0x0000
kfdhdb.acdb.ub2spare: 0 ; 0x1de: 0x0000
10. Interim summary. The analysis so far established the following:
1. ASMLib itself is working.
2. The RPM packages are fine.
3. The disk header is intact; no label information is lost.
4. The OS recognizes the disk device.
The remaining question was why ASMLib could not scan and discover this disk.
11. Finally, examining /etc/sysconfig/oracleasm exposed the problem: the disk we need scanned is /dev/sdb1, yet this configuration file excludes sdb* devices from the scan, the exact opposite of what we want. Setting oracleasm_scanexclude="" back to empty and restarting ASMLib resolved the problem.
[grid@lgto_test disks]$ more /etc/sysconfig/oracleasm
# oracleasm_enabled: 'true' means to load the driver on boot.
oracleasm_enabled=true
# oracleasm_uid: default user owning the /dev/oracleasm mount point.
oracleasm_uid=grid
# oracleasm_gid: default group owning the /dev/oracleasm mount point.
oracleasm_gid=asmadmin
# oracleasm_scanboot: 'true' means scan for asm disks on boot.
oracleasm_scanboot=true
# oracleasm_scanorder: matching patterns to order disk scanning
oracleasm_scanorder="mapper mpath"
# oracleasm_scanexclude: matching patterns to exclude disks from scan
oracleasm_scanexclude="sdb" <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
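The exclusion bites because ASMLib matches each scanexclude entry as a device-name prefix, so excluding "sdb" also hides sdb1. The helper below (hypothetical names and messages, not an ASMLib tool) tests a device name against the exclude list of a given configuration file:

```shell
#!/bin/sh
# check_excluded: report whether a device name is shadowed by the
# oracleasm_scanexclude setting in an oracleasm configuration file.
# ASMLib treats the entries as name prefixes: "sdb" hides sdb1, sdb2, ...
check_excluded() {
    conf="$1"; dev="$2"
    excl=$(grep -i '^oracleasm_scanexclude=' "$conf" | sed 's/.*="\(.*\)".*/\1/')
    for pat in $excl; do
        case "$dev" in
            "$pat"*) echo "EXCLUDED by '$pat'"; return 1 ;;
        esac
    done
    echo "OK"
}

# Demonstrate against a copy of the broken setting from the article.
conf=$(mktemp)
echo 'oracleasm_scanexclude="sdb"' > "$conf"
check_excluded "$conf" sdb1      # prints: EXCLUDED by 'sdb'
rm -f "$conf"
```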
12. Restart ASMLib and confirm the disk status:
# /etc/init.d/oracleasm restart
# /sbin/service oracleasm scandisks