Oracle RAC Interconnect: HAIP

0. Background

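Starting with 11.2.0.2, Oracle introduced a new feature called Redundant Interconnect, usually referred to as HAIP. HAIP is intended to replace operating-system-level NIC bonding and to carry interconnect traffic in Active-Active mode. It provides the failover that traditional OS bonding offers, and its load balancing makes fuller use of the available links, which helps reduce performance problems caused by gc waits.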

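HAIP's history goes back to Oracle 10g, when CRS already contained a rudimentary form of it. The 10g CRS installer allowed several private NICs to be selected for the private network. The official documentation never mentioned it, but many Oracle sales staff and even engineers claimed this provided high availability; in practice, test results were often disappointing.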

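The feature matured in 11.2.0.2 and became HAIP. After GI is upgraded to 11.2.0.2, a resource named ora.cluster_interconnect.haip is created automatically (this is where the name HAIP comes from), and it is managed by the ohasd.bin process. Its logs are written to $GRID_HOME/log//ohasd/ohasd.log and $GRID_HOME/log//agent/ohasd/orarootagent_root/orarootagent_root.log. Once the HAIP resource is online, the OS command ifconfig -a shows an additional virtual interface such as eth0:1 carrying a 169.254.X.X HAIP address; the address can also be seen from the database in the V_$CLUSTER_INTERCONNECTS view. HAIP addresses are assigned automatically and cannot be specified by the user.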

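Because HAIP uses the 169.254.X.X range, check whether that range is already in use before upgrading GI to 11.2.0.2 or later. Otherwise you may hit unexpected problems that leave GI unable to start and cause the whole upgrade to fail. A typical case is the IBM IMM web interface, which uses this range. Why did Oracle choose this range in particular? The short answer is that 169.254.X.X is a reserved range: it is used for IP address autoconfiguration and link-local addresses, and was standardized long ago in RFC 3927. Interested readers can look up Zero configuration networking, Link local address, and RFC3927 for more information.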
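The pre-upgrade check described above can be sketched with Python's standard ipaddress module. This is only an illustration, not an Oracle-provided tool: the function name conflicts_with_haip and the sample addresses are my own assumptions, and collecting the addresses actually configured on a host is left out.

```python
import ipaddress

# RFC 3927 link-local range that HAIP draws its addresses from.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def conflicts_with_haip(cidr: str) -> bool:
    """Return True if an address or subnet overlaps 169.254.0.0/16.

    Accepts a bare address ("169.254.1.1") or CIDR notation
    ("169.254.95.118/16"); host bits are ignored.
    """
    net = ipaddress.ip_network(cidr, strict=False)
    return net.overlaps(LINK_LOCAL)


if __name__ == "__main__":
    # Hypothetical addresses found on a host before the upgrade.
    for cidr in ["192.168.1.205/24", "169.254.95.118/16", "172.19.17.205/24"]:
        status = "conflict" if conflicts_with_haip(cidr) else "ok"
        print(f"{cidr}: {status}")
```

Any address reported as a conflict (for example a management interface such as the IMM mentioned above) would need to be investigated before the upgrade.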

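HAIP places strict requirements on both Oracle Clusterware and the RDBMS: both must be at version 11.2.0.2 or later. Take, for example, GI 11.2.0.3 combined with RDBMS 10.2.0.5. A 169.254.X.X virtual address is still created on the interconnect NIC, but an RDBMS older than 11.2.0.2 is not aware of it and therefore cannot use the high availability that HAIP provides. Note also that early HAIP documentation did not require the private NIC addresses used by HAIP to be in the same subnet, which caused many problems, especially on AIX. Newer versions of the documentation and MOS now state this requirement explicitly.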

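It is worth noting that early HAIP had quite a few bugs. MOS note 11gR2 Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip [ID 1210883.1] lists most of the known HAIP bugs; please read it yourself. For readers who cannot log in to MOS for the moment, the bug numbers are listed here (two of the entries are notes rather than HAIP bugs, but they are HAIP-related):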

Bug 12674817

Bug 10332426

Bug 10363902

Bug 10357258

Bug 10397652

Bug 10253028

Bug 9795321

Bug 11077756

Bug 12546712

Note 1366211.1

Bug 10114953

Note 1447517.1

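HAIP cannot be disabled, except on platforms that do not support it, such as Microsoft Windows. If you use OS-level bonding, or do not bond the private network at all, you can override HAIP by setting CLUSTER_INTERCONNECTS in the init.ora/spfile of the RDBMS and ASM instances to the private network address (separate multiple addresses with a colon, ':'). HAIP itself still exists, but the ASM and RDBMS instances will no longer use it.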
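As a rough illustration of the colon-separated format, the sketch below builds a CLUSTER_INTERCONNECTS value and wraps it in an ALTER SYSTEM statement. The helper names, the SID racdb3, and the second address 192.168.2.205 are hypothetical, and I am assuming the change is written to the spfile and picked up at the next instance restart. Check the statement against your own version before running it.

```python
import ipaddress


def cluster_interconnects_value(addresses):
    """Join private IPv4 addresses with ':' after validating each one."""
    validated = [str(ipaddress.IPv4Address(a)) for a in addresses]
    if not validated:
        raise ValueError("at least one private address is required")
    return ":".join(validated)


def alter_system_statement(sid, addresses):
    """Build the statement for one instance (sid is a hypothetical SID)."""
    value = cluster_interconnects_value(addresses)
    return (
        f"ALTER SYSTEM SET cluster_interconnects='{value}' "
        f"SCOPE=SPFILE SID='{sid}';"
    )


if __name__ == "__main__":
    # Two private addresses for one instance; the second is made up.
    print(alter_system_statement("racdb3", ["192.168.1.205", "192.168.2.205"]))
```

Each instance needs its own value because the addresses are node-specific, which is why the sketch takes the SID as a parameter.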

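In my view HAIP is still too new and so far only "looks good on paper". Oracle keeps dreaming of built-in high availability that does not depend on the operating system, but as with most of its attractive-looking new features, the idea is good and the implementation is poor. Once 11.2.0.4 is released, most HAIP bugs should be fixed, and it will not be too late to adopt HAIP then. For now I recommend OS-level bonding for one reason only: it is more stable and reliable. The next article will mainly cover OS-level failover.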

1. Introduction to HAIP

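Starting with 11.2.0.2, Oracle introduced HAIP, a network redundancy feature. HAIP is intended to replace OS-level NIC bonding and to carry interconnect traffic in Active-Active mode. It provides the failover of traditional OS bonding, and its load balancing makes fuller use of the available links, reducing performance problems caused by gc waits.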

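If more network adapters are specified, Clusterware can activate at most four private network adapters at a time. ora.cluster_interconnect.haip enables one to four link-local HAIP addresses on the interconnect adapters for Oracle RAC, Oracle ASM, Oracle ACFS, and so on. Note that if Sun Cluster is present, the HAIP feature is disabled in 11.2.0.2.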

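Grid automatically picks the reserved link-local range 169.254.*.* as the HAIP subnet, and it will not attempt to use any 169.254.*.* address that is already in use for another purpose. With HAIP, interconnect traffic is by default load balanced across all active interfaces, and if one of them fails or becomes unreachable, its HAIP address is transparently moved to one of the other adapters.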

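When Grid starts on the first node of the cluster, the number of HAIP addresses is determined by how many private network adapters are active at that time. With one active private network, Grid creates one HAIP; with two, it creates two; with more than two, it creates four. The number of HAIPs does not change even if more private adapters are activated later; for a new adapter to become active, the cluster must be restarted on all nodes.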
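The rule above is easy to misread, so here it is as a tiny function. This only restates the rule as described in this article; the function name haip_count is mine and nothing here queries a real cluster.

```python
def haip_count(active_private_interfaces: int) -> int:
    """Number of HAIP addresses created when the first node starts.

    One active private interface gives 1 HAIP, two give 2,
    and more than two give 4, following the rule described above.
    """
    if active_private_interfaces < 1:
        raise ValueError("at least one active private interface is required")
    if active_private_interfaces <= 2:
        return active_private_interfaces
    return 4


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(f"{n} active private interface(s) -> {haip_count(n)} HAIP(s)")
```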

2. The HAIP Service

Because HAIP is enabled by default and uses 169.254.*.* private addresses, it can be inspected with oifcfg and ifconfig:

[grid@racdb01 ~]$ oifcfg iflist

eth0  172.19.17.0

eth1  192.168.1.0

eth1  169.254.0.0

[grid@racdb01 ~]$ oifcfg getif

eth1  192.168.1.0  global  cluster_interconnect

eth0  172.19.17.0  global  public

[oracle@racdb03 ~]$ ifconfig

eth0      Link encap:Ethernet  HWaddr 00:50:56:B2:19:64  

          inet addr:172.19.17.205  Bcast:172.19.17.255  Mask:255.255.255.0

          inet6 addr: fe80::250:56ff:feb2:1964/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:1994 errors:0 dropped:0 overruns:0 frame:0

          TX packets:477 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:183143 (178.8 KiB)  TX bytes:72031 (70.3 KiB)

eth1      Link encap:Ethernet  HWaddr 00:50:56:B2:2F:EE  

          inet addr:192.168.1.205  Bcast:192.168.1.255  Mask:255.255.255.0

          inet6 addr: fe80::250:56ff:feb2:2fee/64 Scope:Link

          RX packets:55266 errors:0 dropped:0 overruns:0 frame:0

          TX packets:55911 errors:0 dropped:0 overruns:0 carrier:0

          RX bytes:25402212 (24.2 MiB)  TX bytes:28256359 (26.9 MiB)

eth1:1    Link encap:Ethernet  HWaddr 00:50:56:B2:2F:EE  

          inet addr:169.254.151.97  Bcast:169.254.255.255  Mask:255.255.0.0

eth2      Link encap:Ethernet  HWaddr 00:50:56:B2:1F:AE  

          inet addr:192.168.1.215  Bcast:192.168.1.255  Mask:255.255.255.0

          inet6 addr: fe80::250:56ff:feb2:1fae/64 Scope:Link

          RX packets:327 errors:0 dropped:0 overruns:0 frame:0

          TX packets:33 errors:0 dropped:0 overruns:0 carrier:0

          RX bytes:70358 (68.7 KiB)  TX bytes:4841 (4.7 KiB)

lo        Link encap:Local Loopback  

          inet addr:127.0.0.1  Mask:255.0.0.0

          inet6 addr: ::1/128 Scope:Host

          UP LOOPBACK RUNNING  MTU:16436  Metric:1

          RX packets:6763 errors:0 dropped:0 overruns:0 frame:0

          TX packets:6763 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0

          RX bytes:4258261 (4.0 MiB)  TX bytes:4258261 (4.0 MiB)
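To pick out the HAIP-carrying interfaces from oifcfg iflist output like the listing above, a short parser is enough. This is a sketch under the assumption that each line has the form "interface subnet", as in the sample; the function name haip_interfaces is my own.

```python
import ipaddress

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")

# Sample copied from the oifcfg iflist output shown above.
SAMPLE = """eth0  172.19.17.0
eth1  192.168.1.0
eth1  169.254.0.0
"""


def haip_interfaces(iflist_output: str):
    """Return interface names whose listed subnet lies in 169.254.0.0/16."""
    names = []
    for line in iflist_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name, subnet = parts[0], parts[1]
        try:
            addr = ipaddress.ip_address(subnet)
        except ValueError:
            continue  # second column is not an IP address
        if addr in LINK_LOCAL and name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    print(haip_interfaces(SAMPLE))
```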

During startup of the database and ASM instances, the following can be found in alert_.log:

Cluster communication is configured to use the following interface(s) for this instance

169.254.151.97

In the database, the gv$cluster_interconnects view shows the HAIP information:

SQL> select * from gv$cluster_interconnects;  

   INST_ID NAME            IP_ADDRESS       IS_ SOURCE

---------- --------------- ---------------- --- -------------------------------

         1 eth1:1          169.254.134.108  NO

         3 eth1:1          169.254.151.97   NO

         2 eth1:1          169.254.31.191   NO

The related resource in the cluster stack:

[grid@racdb01 ~]$ crsctl stat res -t -init | grep -1 ha

      1        ONLINE  ONLINE       racdb01                  Started            

ora.cluster_interconnect.haip

3. Adding a New Private Network Adapter

Next, add another NIC, eth2, to each of the three nodes racdb01, racdb02, and racdb03 (with addresses 192.168.1.211, 192.168.1.213, and 192.168.1.215), then register it in the OCR with oifcfg.

[root@racdb03 ~]# oifcfg setif -global eth2/192.168.1.0:cluster_interconnect

[root@racdb03 ~]# oifcfg getif

eth2  192.168.1.0  global  cluster_interconnect

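After the configuration is done on all three nodes, restart the cluster services on each node. HAIP then takes effect automatically, and each instance now reports two HAIP addresses: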

         3 eth1:1          169.254.79.164   NO

         3 eth2:1          169.254.135.126  NO

         2 eth1:1          169.254.42.101   NO

         2 eth2:1          169.254.165.198  NO

         1 eth1:1          169.254.30.165   NO

         1 eth2:1          169.254.208.93   NO
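To confirm that every instance ended up with the expected number of HAIPs, the rows above can be grouped by INST_ID. The sketch assumes whitespace-separated columns in the order INST_ID, NAME, IP_ADDRESS, as in the output shown; the function name is mine.

```python
from collections import defaultdict

# Rows copied from the gv$cluster_interconnects output above.
ROWS = """         3 eth1:1          169.254.79.164   NO
         3 eth2:1          169.254.135.126  NO
         2 eth1:1          169.254.42.101   NO
         2 eth2:1          169.254.165.198  NO
         1 eth1:1          169.254.30.165   NO
         1 eth2:1          169.254.208.93   NO
"""


def haips_by_instance(text: str):
    """Map INST_ID to a list of (interface name, IP address) pairs."""
    result = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a numeric INST_ID; skip headers and rules.
        if len(parts) < 3 or not parts[0].isdigit():
            continue
        result[int(parts[0])].append((parts[1], parts[2]))
    return dict(result)


if __name__ == "__main__":
    for inst_id, haips in sorted(haips_by_instance(ROWS).items()):
        print(inst_id, haips)
```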

4. Simulating a Network Failure

Take down eth1 on racdb03:

# ifconfig eth1 down

No errors appear in the database or ASM alert logs, which indicates that the IP address was moved transparently.

[Reference: http://www.luocs.com/archives/281.html]

[root@racdb03 log]# oifcfg iflist

eth2  192.168.1.0

eth2  169.254.128.0

eth2  169.254.0.0

#ip a

4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

    link/ether 00:50:56:b2:1f:ae brd ff:ff:ff:ff:ff:ff

    inet 192.168.1.215/24 brd 192.168.1.255 scope global eth2

    inet 169.254.135.126/17 brd 169.254.255.255 scope global eth2:1

    inet 169.254.79.164/17 brd 169.254.127.255 scope global eth2:2

    inet6 fe80::250:56ff:feb2:1fae/64 scope link

       valid_lft forever preferred_lft forever
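Note the /17 masks in the output above: with two HAIPs, the 169.254.0.0/16 range appears split into two halves, and after the failover both halves sit on eth2. The sketch below reproduces that split with the ipaddress module. It only illustrates the arithmetic seen in this output and is not a statement of how Grid computes its subnets internally.

```python
import ipaddress

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")

# Splitting the /16 once yields the two /17 halves seen in `ip a`.
HALVES = list(LINK_LOCAL.subnets(prefixlen_diff=1))


def subnet_of(ip: str):
    """Return the /17 half (as a string) that contains the address, or None."""
    addr = ipaddress.ip_address(ip)
    for net in HALVES:
        if addr in net:
            return str(net)
    return None


if __name__ == "__main__":
    for ip in ("169.254.79.164", "169.254.135.126"):
        net = ipaddress.ip_network(subnet_of(ip))
        print(f"{ip} is in {net}, broadcast {net.broadcast_address}")
```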

Check the ohasd log, $GRID_HOME/log/ohasd/ohasd.log:

2013-08-12 13:45:34.441: [GIPCHGEN][2995967744]gipchaInterfaceFail: marking interface failing 0x7f56701b7b50 { host '', haName 'CLSFRAME_racdb-cluster', local (nil), ip '192.168.1.205:46662', subnet '192.168.1.0', mask '255.255.255.0', mac '00-50-56-b2-2f-ee', ifname 'eth1', numRef 0, numFail 0, idxBoot 0, flags 0x184d }

2013-08-12 13:45:34.714: [GIPCHGEN][3627513600]gipchaInterfaceDisable: disabling interface 0x7f56701b7b50 { host '', haName 'CLSFRAME_racdb-cluster', local (nil), ip '192.168.1.205:46662', subnet '192.168.1.0', mask '255.255.255.0', mac '00-50-56-b2-2f-ee', ifname 'eth1', numRef 0, numFail 0, idxBoot 0, flags 0x19cd }

2013-08-12 13:45:34.714: [GIPCHDEM][3627513600]gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x7f56701b7b50 { host '', haName 'CLSFRAME_racdb-cluster', local (nil), ip '192.168.1.205:46662', subnet '192.168.1.0', mask '255.255.255.0', mac '00-50-56-b2-2f-ee', ifname 'eth1', numRef 0, numFail 0, idxBoot 0, flags 0x19ed }

Check the agent log under $GRID_HOME/log/racdb03/agent/ohasd/orarootagent_root, where you can see Moving ip '169.254.79.164' from inf 'eth1' to inf 'eth2':

2013-08-12 13:45:29.815: [ USRTHRD][3418347264]{0:0:2} failed to receive ARP request

2013-08-12 13:45:31.577: [ USRTHRD][3420448512]{0:0:2} HAIP:  Updating member info HAIP1;192.168.1.0#0

2013-08-12 13:45:31.579: [ USRTHRD][3420448512]{0:0:2} HAIP:  Moving ip '169.254.79.164' from inf 'eth1' to inf 'eth2'

2013-08-12 13:45:31.579: [ USRTHRD][3420448512]{0:0:2} pausing thread

2013-08-12 13:45:31.579: [ USRTHRD][3420448512]{0:0:2} posting thread

2013-08-12 13:45:31.579: [ USRTHRD][3420448512]{0:0:2} Thread:[NetHAWork]start {

2013-08-12 13:45:31.579: [ USRTHRD][3420448512]{0:0:2} Thread:[NetHAWork]start }

2013-08-12 13:45:31.579: [ USRTHRD][3395221248]{0:0:2} [NetHAWork] thread started

2013-08-12 13:45:31.579: [ USRTHRD][3395221248]{0:0:2}  Arp::sCreateSocket {

2013-08-12 13:45:31.597: [ USRTHRD][3395221248]{0:0:2}  Arp::sCreateSocket }

2013-08-12 13:45:32.097: [ USRTHRD][3395221248]{0:0:2} Starting Probe for ip 169.254.79.164

2013-08-12 13:45:32.097: [ USRTHRD][3395221248]{0:0:2} Transitioning to Probe State

2013-08-12 13:45:32.097: [ USRTHRD][3395221248]{0:0:2}  Arp::sProbe {

2013-08-12 13:45:32.097: [ USRTHRD][3395221248]{0:0:2} Arp::sSend:  sending type 1

2013-08-12 13:45:32.097: [ USRTHRD][3395221248]{0:0:2}  Arp::sProbe }

2013-08-12 13:45:33.135: [ USRTHRD][3395221248]{0:0:2}  Arp::sProbe {

2013-08-12 13:45:33.135: [ USRTHRD][3395221248]{0:0:2} Arp::sSend:  sending type 1

2013-08-12 13:45:33.135: [ USRTHRD][3395221248]{0:0:2}  Arp::sProbe }

2013-08-12 13:45:34.428: [ USRTHRD][3395221248]{0:0:2}  Arp::sProbe {

2013-08-12 13:45:34.428: [ USRTHRD][3395221248]{0:0:2} Arp::sSend:  sending type 1

2013-08-12 13:45:34.429: [ USRTHRD][3395221248]{0:0:2}  Arp::sProbe }

2013-08-12 13:45:34.429: [ USRTHRD][3395221248]{0:0:2} Transitioning to Announce State

2013-08-12 13:45:36.425: [ USRTHRD][3395221248]{0:0:2}  Arp::sAnnounce {

2013-08-12 13:45:36.425: [ USRTHRD][3395221248]{0:0:2} Arp::sSend:  sending type 1

2013-08-12 13:45:36.425: [ USRTHRD][3395221248]{0:0:2}  Arp::sAnnounce } :
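When the agent log is large, the relocation events can be pulled out with a regular expression. The pattern below is written against the "HAIP:  Moving ip" lines shown above; I am assuming the message format stays the same across versions, which may not hold.

```python
import re

# Matches lines such as:
# 2013-08-12 13:45:31.579: [...] HAIP:  Moving ip 'X' from inf 'a' to inf 'b'
MOVE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): .*"
    r"HAIP:\s+Moving ip '(?P<ip>[\d.]+)' "
    r"from inf '(?P<src>\w+)' to inf '(?P<dst>\w+)'"
)

# Two lines copied from the orarootagent log above.
SAMPLE = (
    "2013-08-12 13:45:31.577: [ USRTHRD][3420448512]{0:0:2} HAIP:  "
    "Updating member info HAIP1;192.168.1.0#0\n"
    "2013-08-12 13:45:31.579: [ USRTHRD][3420448512]{0:0:2} HAIP:  "
    "Moving ip '169.254.79.164' from inf 'eth1' to inf 'eth2'\n"
)


def find_moves(log_text: str):
    """Return (timestamp, ip, source interface, target interface) tuples."""
    moves = []
    for line in log_text.splitlines():
        m = MOVE_RE.search(line)
        if m:
            moves.append((m["ts"], m["ip"], m["src"], m["dst"]))
    return moves


if __name__ == "__main__":
    for move in find_moves(SAMPLE):
        print(move)
```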

Now bring eth1 back up to simulate recovery:

ifconfig eth1 up

Check the log under $GRID_HOME/log/racdb03/agent/ohasd/orarootagent_root again; the IP address moves back: Moving ip '169.254.79.164' from inf 'eth2' to inf 'eth1'

2013-08-12 14:19:49.675: [ USRTHRD][3420448512]{0:0:2} HAIP:  Updating member info HAIP1;192.168.1.0#0;192.168.1.0#1

2013-08-12 14:19:49.676: [ USRTHRD][3420448512]{0:0:2} HAIP:  Moving ip '169.254.79.164' from inf 'eth2' to inf 'eth1'

2013-08-12 14:19:49.676: [ USRTHRD][3420448512]{0:0:2} pausing thread

2013-08-12 14:19:49.676: [ USRTHRD][3420448512]{0:0:2} posting thread

2013-08-12 14:19:49.676: [ USRTHRD][3420448512]{0:0:2} Thread:[NetHAWork]start {

2013-08-12 14:19:49.677: [ USRTHRD][3420448512]{0:0:2} Thread:[NetHAWork]start }

2013-08-12 14:19:49.677: [ USRTHRD][3418347264]{0:0:2} [NetHAWork] thread started

2013-08-12 14:19:49.677: [ USRTHRD][3418347264]{0:0:2}  Arp::sCreateSocket {

2013-08-12 14:19:49.692: [ USRTHRD][3418347264]{0:0:2}  Arp::sCreateSocket }

2013-08-12 14:19:50.192: [ USRTHRD][3418347264]{0:0:2} Starting Probe for ip 169.254.79.164

2013-08-12 14:19:50.192: [ USRTHRD][3418347264]{0:0:2} Transitioning to Probe State

2013-08-12 14:19:50.192: [ USRTHRD][3418347264]{0:0:2}  Arp::sProbe {

2013-08-12 14:19:50.192: [ USRTHRD][3418347264]{0:0:2} Arp::sSend:  sending type 1

2013-08-12 14:19:50.192: [ USRTHRD][3418347264]{0:0:2}  Arp::sProbe }

2013-08-12 14:19:51.688: [ USRTHRD][3418347264]{0:0:2}  Arp::sProbe {

2013-08-12 14:19:51.688: [ USRTHRD][3418347264]{0:0:2} Arp::sSend:  sending type 1

2013-08-12 14:19:51.688: [ USRTHRD][3418347264]{0:0:2}  Arp::sProbe }

2013-08-12 14:19:52.339: [ora.crf][3899438848]{0:0:2} [check] clsdmc_respget return: status=0, ecode=0

2013-08-12 14:19:52.339: [ora.crf][3899438848]{0:0:2} [check] Check return = 0, state detail = NULL

2013-08-12 14:19:52.713: [ USRTHRD][3418347264]{0:0:2}  Arp::sProbe {

2013-08-12 14:19:52.713: [ USRTHRD][3418347264]{0:0:2} Arp::sSend:  sending type 1

2013-08-12 14:19:52.713: [ USRTHRD][3418347264]{0:0:2}  Arp::sProbe }

2013-08-12 14:19:52.713: [ USRTHRD][3418347264]{0:0:2} Transitioning to Announce State

2013-08-12 14:19:54.718: [ USRTHRD][3418347264]{0:0:2}  Arp::sAnnounce {

2013-08-12 14:19:54.718: [ USRTHRD][3418347264]{0:0:2} Arp::sSend:  sending type 1

2013-08-12 14:19:54.718: [ USRTHRD][3418347264]{0:0:2}  Arp::sAnnounce }

[  clsdmc][3406784256]CLSDMC.C returnbuflen=8, extraDataBuf=E6, returnbuf=9801CEA0

2013-08-12 14:19:56.632: [ora.ctssd][3406784256]{0:0:2} [check] clsdmc_respget return: status=0, ecode=0, returnbuf=[0x7f8a9801cea0], buflen=8

2013-08-12 14:19:56.632: [ora.ctssd][3406784256]{0:0:2} [check] translateReturnCodes, return = 0, state detail = OBSERVERCheckcb data [0x7f8a9801cea0]: mode[0xe6] offset[2 ms].

2013-08-12 14:19:56.720: [ USRTHRD][3418347264]{0:0:2}  Arp::sAnnounce {

2013-08-12 14:19:56.720: [ USRTHRD][3418347264]{0:0:2} Arp::sSend:  sending type 1

2013-08-12 14:19:56.720: [ USRTHRD][3418347264]{0:0:2}  Arp::sAnnounce }

2013-08-12 14:19:56.720: [ USRTHRD][3418347264]{0:0:2} Transitioning to Defend State

2013-08-12 14:19:57.220: [ USRTHRD][3418347264]{0:0:2} VipActions::startIp {

2013-08-12 14:19:57.220: [ USRTHRD][3418347264]{0:0:2} Adding 169.254.79.164 on eth1:1

2013-08-12 14:19:57.220: [ USRTHRD][3418347264]{0:0:2} VipActions::startIp }

2013-08-12 14:19:57.221: [ USRTHRD][3418347264]{0:0:2} Assigned IP:  169.254.79.164 on interface eth1

2013-08-12 14:19:57.677: [ USRTHRD][3420448512]{0:0:2} Thread:[NetHAWork]stop {

2013-08-12 14:19:57.695: [ USRTHRD][3395221248]{0:0:2} [NetHAWork] thread stopping

2013-08-12 14:19:57.695: [ USRTHRD][3395221248]{0:0:2} Thread:[NetHAWork]isRunning is reset to false here

2013-08-12 14:19:57.695: [ USRTHRD][3420448512]{0:0:2} Thread:[NetHAWork]stop }

2013-08-12 14:19:57.695: [ USRTHRD][3420448512]{0:0:2} VipActions::stopIp {

2013-08-12 14:19:57.695: [ USRTHRD][3420448512]{0:0:2} NetInterface::sStopIp {

2013-08-12 14:19:57.695: [ USRTHRD][3420448512]{0:0:2} Stopping ip '169.254.79.164', inf 'eth2', mask '192.168.1.0'

2013-08-12 14:19:57.695: [ USRTHRD][3420448512]{0:0:2} Stopping ip 169.254.79.164 on inf eth2:2

2013-08-12 14:19:57.695: [ USRTHRD][3420448512]{0:0:2} NetInterface::sStopIp }

2013-08-12 14:19:57.695: [ USRTHRD][3420448512]{0:0:2} VipActions::stopIp }

2013-08-12 14:19:57.710: [ USRTHRD][3420448512]{0:0:2} IptoClean '169.254.0.1', ip '169.254.0.0', mask '255.255.128.0'

2013-08-12 14:19:57.710: [ USRTHRD][3420448512]{0:0:2} USING HAIP[  0 ]:  eth1 - 169.254.79.164

2013-08-12 14:19:57.710: [ USRTHRD][3420448512]{0:0:2} USING HAIP[  1 ]:  eth2 - 169.254.135.126
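The log above walks through Probe, Announce, and Defend states before the address is assigned, which resembles the address-claiming sequence in RFC 3927, and it ends with the final HAIP-to-interface mapping. A small sketch can extract both. The patterns are written against the lines shown here and are my own assumption about the format, not a documented one.

```python
import re

STATE_RE = re.compile(r"Transitioning to (\w+) State")
USING_RE = re.compile(r"USING HAIP\[\s*(\d+)\s*\]:\s+(\w+) - ([\d.]+)")

# Message parts copied from the log above (timestamps trimmed for brevity).
SAMPLE = """{0:0:2} Transitioning to Probe State
{0:0:2} Transitioning to Announce State
{0:0:2} Transitioning to Defend State
{0:0:2} USING HAIP[  0 ]:  eth1 - 169.254.79.164
{0:0:2} USING HAIP[  1 ]:  eth2 - 169.254.135.126
"""


def summarize(log_text: str):
    """Return the state sequence and the final HAIP index -> (inf, ip) map."""
    states = STATE_RE.findall(log_text)
    assignments = {
        int(index): (inf, ip)
        for index, inf, ip in USING_RE.findall(log_text)
    }
    return states, assignments


if __name__ == "__main__":
    states, assignments = summarize(SAMPLE)
    print(" -> ".join(states))
    for index, (inf, ip) in sorted(assignments.items()):
        print(f"HAIP[{index}] on {inf}: {ip}")
```

If a later "USING HAIP" line repeats an index, the dictionary keeps the most recent mapping, which is what you want when reading a log that spans several failovers.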