
Troubleshooting vrrp_script weight changes in a Keepalived dual-master setup

Reproducing the failure

The keepalived configuration is as follows:


```
# vi /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from [email protected]
    smtp_connect_timeout 3
    smtp_server 127.0.0.1
    router_id LVS_DEVEL
}

vrrp_script chk_maintaince_down {
    script "[[ -f /etc/keepalived/down ]] && exit 1 || exit 0"
    interval 1
    weight 2
}

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 1
    weight 2
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    priority 100
    virtual_router_id 125
    garp_master_delay 1
    authentication {
        auth_type PASS
        auth_pass 1e3459f77aba4ded
    }
    track_interface {
        eth0
    }
    virtual_ipaddress {
        172.16.25.10/16 dev eth0 label eth0:0
    }
    track_script {
        chk_haproxy
        chk_maintaince_down
    }
    notify_master "/etc/keepalived/notify.sh master 172.16.25.10"
    notify_backup "/etc/keepalived/notify.sh backup 172.16.25.10"
    notify_fault "/etc/keepalived/notify.sh fault 172.16.25.10"
}

vrrp_instance VI_2 {
    interface eth0
    state BACKUP
    priority 99
    virtual_router_id 126
    garp_master_delay 1
    authentication {
        auth_type PASS
        auth_pass 7615c4b7f518cede
    }
    track_interface {
        eth0
    }
    virtual_ipaddress {
        172.16.25.11/16 dev eth0 label eth0:1
    }
    track_script {
        chk_haproxy
        chk_maintaince_down
    }
    notify_master "/etc/keepalived/notify.sh master 172.16.25.11"
    notify_backup "/etc/keepalived/notify.sh backup 172.16.25.11"
    notify_fault "/etc/keepalived/notify.sh fault 172.16.25.11"
}
```

```bash
# vi /etc/keepalived/notify.sh
#!/bin/bash
# Author: Jason.Yu <[email protected]>
# description: An example of notify script
#
contact='root@localhost'

notify() {
    mailsubject="$(hostname) to be $1: $2 floating"
    mailbody="$(date '+%F %H:%M:%S'): vrrp transition, $(hostname) changed to be $1"
    echo "$mailbody" | mail -s "$mailsubject" "$contact"
}

case "$1" in
    master)
        notify master "$2"
        /etc/rc.d/init.d/haproxy start
        exit 0
    ;;
    backup)
        notify backup "$2"
        /etc/rc.d/init.d/haproxy stop
        exit 0
    ;;
    fault)
        notify fault "$2"
        /etc/rc.d/init.d/haproxy stop
        exit 0
    ;;
    *)
        # Double quotes so that $(basename "$0") is actually expanded
        echo "Usage: $(basename "$0") {master|backup|fault}"
        exit 1
    ;;
esac
```

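Failure 1: after keepalived goes down and recovers, all VIPs float over together.

Failure 2: after the haproxy service is stopped and restarted, all VIPs float over together.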

[Figure: http://s3.51cto.com/wyfs02/M02/25/8B/wKiom1NjiNXxYZ09AAqSu-zGRV0109.jpg]

Cause

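Every master/backup transition fires notify_backup on the node that becomes BACKUP, and the backup branch of notify.sh runs /etc/rc.d/init.d/haproxy stop. In a dual-master setup each node is BACKUP for one of the two instances. Both VI_1 and VI_2 track the same chk_haproxy script, so stopping the single haproxy process on a node removes the weight bonus from every instance on that node at once. The weights therefore change once on each of the two nodes. As a result, one node holds the highest (or the lowest) priority for all instances at the same time, so it is no surprise that all VIPs float over together.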
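To make the coupling concrete, here is a minimal, hypothetical simulation that replays notify events on node A through the original notify.sh logic. It only tracks haproxy's state in a variable; it does not run keepalived or touch a real haproxy. The helper names `on_notify` and `priority` are illustrative. The numbers come from the config above: VI_1 at 100, VI_2 at 99, and weight 2 for each tracked script. The sketch assumes the maintenance check passes and that the VI_1 MASTER event arrives before the VI_2 BACKUP event.

```shell
#!/bin/sh
# Hypothetical replay of notify events on node A through the ORIGINAL notify.sh
# logic; haproxy's state is only a shell variable here.
haproxy=running

# Original notify.sh: the master branch starts haproxy, the backup branch stops it.
on_notify() {
    case "$1" in
        master) haproxy=running ;;
        backup) haproxy=stopped ;;
    esac
}

# Effective priority of one instance on node A (positive weight 2 per passing check).
# Assumes chk_maintaince_down passes; chk_haproxy passes only while haproxy runs.
priority() {
    local p=$(( $1 + 2 ))
    if [ "$haproxy" = running ]; then
        p=$((p + 2))
    fi
    echo "$p"
}

echo "before: VI_1=$(priority 100) VI_2=$(priority 99)"
on_notify master    # A becomes MASTER for VI_1
on_notify backup    # A becomes BACKUP for VI_2 -> haproxy stop
echo "after:  haproxy $haproxy, VI_1=$(priority 100) VI_2=$(priority 99)"
```

A is still MASTER for VI_1, yet the BACKUP event for VI_2 stops the one haproxy process that both instances track. Both of A's priorities therefore drop from 104/103 to 102/101 at the same moment. Suppose the peer mirrors A (an assumption: VI_1 at 99, VI_2 at 100, so 103 and 104 with both checks passing). The peer then outranks A for both instances, and both VIPs move together. With the reverse event order, haproxy would be left running.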

[Figure: http://s3.51cto.com/wyfs02/M00/25/8C/wKioL1NjgxGDtI_tAANRGF5oM7E741.jpg]

[Figure: http://s3.51cto.com/wyfs02/M01/25/8C/wKioL1NjgzeAEzh0AAPRB17UL2E301.jpg]

Solution

Modify notify.sh so that the backup branch only sends the notification mail and no longer explicitly stops the haproxy service:

```bash
# /etc/rc.d/init.d/haproxy stop   # comment out or delete this line
# /etc/rc.d/init.d/haproxy stop   # same as above
```

The normal weight-change flow after the fix

[Figure: http://s3.51cto.com/wyfs02/M00/25/8B/wKiom1Njg52TKUfZAAJwmR7i0yw788.jpg]

How vrrp_script changes a node's priority

A vrrp_script check counts as successful when its script exits with status 0; any other exit status counts as a failure.
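To see the exit-status rule in action, here is a minimal sketch that runs the article's maintenance check against a throwaway flag file instead of /etc/keepalived/down. The temporary directory is an assumption so the sketch runs without root. The function name `maint_check` is hypothetical. It uses POSIX `[ -f ]` in place of the article's `[[ -f ]]`, which not every /bin/sh provides. The other check, `killall -0 haproxy`, follows the same rule: signal 0 only tests whether a haproxy process exists, so the command exits 0 while haproxy is running.

```shell
#!/bin/sh
# Sketch: the exit status of the maintenance check with and without the flag file.
# Assumption: a temporary directory stands in for /etc/keepalived.
dir=$(mktemp -d)
flag="$dir/down"

# POSIX version of "[[ -f /etc/keepalived/down ]] && exit 1 || exit 0";
# the subshell keeps `exit` from ending this script.
maint_check() {
    ( [ -f "$1" ] && exit 1 || exit 0 )
}

status=0; maint_check "$flag" || status=$?
echo "flag absent:  exit $status (check succeeds)"

touch "$flag"
status=0; maint_check "$flag" || status=$?
echo "flag present: exit $status (check fails)"

rm -rf "$dir"
```

It prints exit 0 while the flag is absent and exit 1 once it exists. In other words, creating /etc/keepalived/down makes chk_maintaince_down fail, which is how the config lets an operator force a node out of the MASTER role.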

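When weight is positive, the weight is added to priority when the check succeeds, and nothing is added when it fails.

If the master's check fails:

A switchover happens when master priority < backup priority + weight (assuming the backup's check succeeds).

If the master's check succeeds:

When master priority + weight > backup priority + weight, the master remains master.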
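The positive-weight rule can be checked with a small shell sketch. It plugs in the article's numbers: base priorities 100 and 99, and two tracked scripts with weight 2 each. It assumes both of the backup's checks pass. The helper names (`effective_priority_pos`, `master_stays`, `compare`) are hypothetical. This is plain arithmetic following the rule as stated above, not keepalived's own code, and it ignores how equal priorities are resolved.

```shell
#!/bin/sh
# Sketch of the positive-weight rule (weight > 0): each check that exits 0 adds
# WEIGHT to the base priority; a failed check adds nothing.
# Usage: effective_priority_pos BASE WEIGHT RC...   (RC 0 = check succeeded)
effective_priority_pos() {
    local prio=$1 weight=$2 rc
    shift 2
    for rc in "$@"; do
        if [ "$rc" -eq 0 ]; then
            prio=$((prio + weight))
        fi
    done
    echo "$prio"
}

# The master keeps its role only while its priority is strictly higher.
master_stays() {
    [ "$1" -gt "$2" ]
}

# compare MASTER BACKUP: print the outcome for one pair of effective priorities
compare() {
    if master_stays "$1" "$2"; then
        echo "master $1 vs backup $2: master stays"
    else
        echo "master $1 vs backup $2: switchover"
    fi
}

# Article's config: base 100 / 99, two scripts with weight 2; backup checks pass.
backup=$(effective_priority_pos 99 2 0 0)
compare "$(effective_priority_pos 100 2 0 0)" "$backup"   # all master checks pass
compare "$(effective_priority_pos 100 2 1 0)" "$backup"   # chk_haproxy fails on master
```

With these numbers the master wins 104 to 103 while everything passes, and loses 102 to 103 once chk_haproxy fails. This is the same switchover comparison as above, extended to two tracked scripts.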

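When weight is negative, a successful check leaves priority unchanged, and a failed check lowers it to priority - abs(weight).

If the master's check fails:

A switchover happens when master priority - abs(weight) < backup priority.

If the master's check succeeds:

When master priority > backup priority, the master remains master.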
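For comparison, here is the negative-weight rule as a similar sketch. The weight of -2 is hypothetical, since the article's config uses +2. The base priorities 100 and 99 are reused, and the backup's check is assumed to pass. The helper names `effective_priority_neg` and `compare` are illustrative only.

```shell
#!/bin/sh
# Sketch of the negative-weight rule (weight < 0): a passing check leaves the
# priority alone; each failing check subtracts abs(WEIGHT).
# Usage: effective_priority_neg BASE WEIGHT RC...   (RC 0 = check succeeded)
effective_priority_neg() {
    local prio=$1 abs_weight=${2#-} rc    # ${2#-} drops a leading "-", i.e. abs(weight)
    shift 2
    for rc in "$@"; do
        if [ "$rc" -ne 0 ]; then
            prio=$((prio - abs_weight))
        fi
    done
    echo "$prio"
}

# compare MASTER BACKUP: the master keeps its role only while strictly higher
compare() {
    if [ "$1" -gt "$2" ]; then
        echo "master $1 vs backup $2: master stays"
    else
        echo "master $1 vs backup $2: switchover"
    fi
}

# Hypothetical weight -2; the backup's check passes, so it stays at 99.
backup=$(effective_priority_neg 99 -2 0)
compare "$(effective_priority_neg 100 -2 0)" "$backup"   # master check passes
compare "$(effective_priority_neg 100 -2 1)" "$backup"   # master check fails
```

Here the master stays at 100 against 99 while its check passes. It drops to 98 and loses once the check fails, matching master priority - abs(weight) < backup priority.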

This article is reposted from the 51CTO blog of xxrenzhe11. Original link: http://blog.51cto.com/xxrenzhe/1405571. To republish it, please contact the original author.
