
MySQL high availability with corosync + pacemaker + DRBD

The DRBD kernel module is only included in Linux kernels 2.6.33 and later, so on this system we have to install both the kernel module package and the userland management tools:

[root@node1 ~]# uname -r

2.6.18-164.el5
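A quick sanity check, not part of the original walkthrough, confirms that this stock 2.6.18 kernel ships no drbd module, which is why the separate kmod-drbd83 package is installed later:

[root@node1 ~]# find /lib/modules/$(uname -r) -name 'drbd*'     # no output expected yet

[root@node1 ~]# modinfo drbd                                    # fails until kmod-drbd83 is installed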

Topology:

(topology diagram: http://5493845.blog.51cto.com/attachment/201211/2/5493845_1351848027m4A0.png)

IP address plan:

node1.zzdx.com: 192.168.1.4

node2.zzdx.com: 192.168.1.5

VIP:192.168.1.6

Part 1: Environment preparation

Configuration on node1.zzdx.com

1. Configure the IP address

2. Change the hostname:

[root@node1 ~]# vim /etc/sysconfig/network

NETWORKING=yes

NETWORKING_IPV6=no

HOSTNAME=node1.zzdx.com

3. Set the hostname for the current session

[root@node1 ~]# hostname node1.zzdx.com

4. Sync the system clock

[root@node1 ~]# hwclock -s
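hwclock -s only copies the hardware clock into the system clock. If the two nodes' clocks drift apart, the cluster and DRBD logs become hard to correlate; an optional alternative, assuming the ntp package is installed and an NTP server is reachable (the server name here is only an example), is:

[root@node1 ~]# ntpdate 0.centos.pool.ntp.org

[root@node2 ~]# ntpdate 0.centos.pool.ntp.org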

5. Edit the hosts file

[root@node1 ~]# vim /etc/hosts

5 192.168.1.4     node1.zzdx.com 

6 192.168.1.5     node2.zzdx.com

Configuration on node2.zzdx.com

2. Change the hostname

[root@localhost ~]# vim /etc/sysconfig/network

  NETWORKING=yes

  NETWORKING_IPV6=no

  HOSTNAME=node2.zzdx.com

[root@localhost ~]# hostname node2.zzdx.com

[root@node2 ~]# hwclock -s

5 192.168.1.4 node1.zzdx.com 

6 192.168.1.5 node2.zzdx.com

Set up passwordless SSH between node1 and node2:

Node1:

[root@node1 ~]# ssh-keygen -t rsa

[root@node1 ~]# ssh-copy-id -i .ssh/id_rsa.pub node2.zzdx.com

Node2:

[root@node2 ~]# ssh-keygen -t rsa

[root@node2 ~]# ssh-copy-id -i .ssh/id_rsa.pub node1.zzdx.com
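An optional check that the passwordless login works in both directions before continuing:

[root@node1 ~]# ssh node2.zzdx.com 'uname -n'     # should print node2.zzdx.com without a password prompt

[root@node2 ~]# ssh node1.zzdx.com 'uname -n'     # should print node1.zzdx.com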

Part 2: Download the required packages

1. The packages are in /root

[root@node1 ~]# ll

total 162468

-rw-r--r-- 1 root root    271360 Oct 30 19:44 cluster-glue-1.0.6-1.6.el5.i386.rpm

-rw-r--r-- 1 root root    133254 Oct 30 19:44 cluster-glue-libs-1.0.6-1.6.el5.i386.rpm

-rw-r--r-- 1 root root    170052 Oct 30 19:44 corosync-1.2.7-1.1.el5.i386.rpm

-rw-r--r-- 1 root root    158502 Oct 30 19:44 corosynclib-1.2.7-1.1.el5.i386.rpm

-rw-r--r-- 1 root root    221868 Oct 30 19:46 drbd83-8.3.8-1.el5.centos.i386.rpm

-rw-r--r-- 1 root root    165591 Oct 30 19:44 heartbeat-3.0.3-2.3.el5.i386.rpm

-rw-r--r-- 1 root root    289600 Oct 30 19:44 heartbeat-libs-3.0.3-2.3.el5.i386.rpm

-rw-r--r-- 1 root root    125974 Oct 30 19:45 kmod-drbd83-8.3.8-1.el5.centos.i686.rpm

-rw-r--r-- 1 root root     60458 Oct 30 19:44 libesmtp-1.0.4-5.el5.i386.rpm

-rw-r--r-- 1 root root 162247449 Oct 30 19:47 mysql-5.5.15-linux2.6-i686.tar.gz

-rw-r--r-- 1 root root    207085 Oct 30 19:44 openais-1.1.3-1.6.el5.i386.rpm

-rw-r--r-- 1 root root     94614 Oct 30 19:45 openaislib-1.1.3-1.6.el5.i386.rpm

-rw-r--r-- 1 root root    796813 Oct 30 19:45 pacemaker-1.1.5-1.1.el5.i386.rpm

-rw-r--r-- 1 root root    207925 Oct 30 19:45 pacemaker-cts-1.1.5-1.1.el5.i386.rpm

-rw-r--r-- 1 root root    332026 Oct 30 19:45 pacemaker-libs-1.1.5-1.1.el5.i386.rpm

-rw-r--r-- 1 root root     32818 Oct 30 19:45 perl-TimeDate-1.16-5.el5.noarch.rpm

-rw-r--r-- 1 root root    388632 Oct 30 19:45 resource-agents-1.0.4-1.1.el5.i386.rpm

Copy the packages to node2:

[root@node1 ~]# scp *.rpm  node2.zzdx.com:/root

[root@node1 ~]# scp mysql-5.5.15-linux2.6-i686.tar.gz node2.zzdx.com:/root/

2. Configure a local yum repository

[root@node1 ~]# vim /etc/yum.repos.d/rhel-debuginfo.repo

  1 [rhel-server]

  2 name=Red Hat Enterprise Linux server

  3 baseurl=file:///mnt/cdrom/Server/

  4 enabled=1

  5 gpgcheck=1

  6 gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release

  7 [rhel-Cluster]

  8 name=Red Hat Enterprise Linux Cluster

  9 baseurl=file:///mnt/cdrom/Cluster

10 enabled=1

11 gpgcheck=1

12 gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release

13 [rhel-ClusterStorage]

14 name=Red Hat Enterprise Linux ClusterStorage

15 baseurl=file:///mnt/cdrom/ClusterStorage

16 enabled=1

17 gpgcheck=1

18 gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release

Copy the yum repo file to node2:

[root@node1 ~]# scp /etc/yum.repos.d/rhel-debuginfo.repo node2.zzdx.com:/etc/yum.repos.d/

Create the mount points and mount the installation media:

[root@node1 ~]# mkdir /mnt/cdrom

[root@node1 ~]# mount /dev/cdrom /mnt/cdrom/

[root@node1 ~]# ssh node2.zzdx.com 'mkdir /mnt/cdrom '

[root@node1 ~]# ssh node2.zzdx.com 'mount /dev/cdrom /mnt/cdrom '

Part 3: Create the disk partitions

[root@node1 ~]# fdisk /dev/sdb

Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel

Building a new DOS disklabel. Changes will remain in memory only,

until you decide to write them. After that, of course, the previous

content won't be recoverable.

The number of cylinders for this disk is set to 2610.

There is nothing wrong with that, but this is larger than 1024,

and could in certain setups cause problems with:

1) software that runs at boot time (e.g., old versions of LILO)

2) booting and partitioning software from other OSs

   (e.g., DOS FDISK, OS/2 FDISK)

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n    # create a new partition

Command action

   e   extended

   p   primary partition (1-4)

P     # primary partition

Partition number (1-4): 1

First cylinder (1-2610, default 1):

Using default value 1

Last cylinder or +size or +sizeM or +sizeK (1-2610, default 2610): +1G

Command (m for help): w   # write the table and exit

The partition table has been altered!

Calling ioctl() to re-read partition table.

Syncing disks.

Reload the partition table:

[root@node1 ~]# partprobe /dev/sdb

[root@node1 ~]# cat /proc/partitions

[root@node2 ~]# fdisk /dev/sdb

Command (m for help): n

p

Command (m for help): w

Reload the partition table:

[root@node2 ~]# partprobe /dev/sdb

[root@node2 ~]# cat /proc/partitions

Part 4: Install and configure DRBD

1. Install DRBD

[root@node1 ~]# yum localinstall -y drbd83-8.3.8-1.el5.centos.i386.rpm kmod-drbd83-8.3.8-1.el5.centos.i686.rpm --nogpgcheck

[root@node2 ~]# yum localinstall drbd83-8.3.8-1.el5.centos.i386.rpm kmod-drbd83-8.3.8-1.el5.centos.i686.rpm -y --nogpgcheck

2. Load the drbd module:

[root@node1 ~]# modprobe drbd

[root@node1 ~]# lsmod |grep drbd

[root@node2 ~]# modprobe drbd

[root@node2 ~]# lsmod |grep drbd

3. Edit the configuration files

[root@node1 ~]# cp -p /usr/share/doc/drbd83-8.3.8/drbd.conf /etc/

[root@node1 ~]# cd /etc/drbd.d/

[root@node1 drbd.d]# cp -p global_common.conf global_common.conf.bak

[root@node1 drbd.d]# vim global_common.conf

  1 global {

  2         usage-count yes;

  3         # minor-count dialog-refresh disable-ip-verification

  4 }

  5

  6 common {

  7         protocol C;

  8

  9         handlers {

11                 pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";

12                 local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";

13         }

14

15         startup {

16                 wfc-timeout 120;

17                 degr-wfc-timeout 100;

18         }

19

20         disk {

21                 on-io-error detach;

22         }

23

24         net {

25                 cram-hmac-alg "sha1";

26                 shared-secret "mydrbd123";

27         }

28

29         syncer {

30                 rate 100M;

31         }

32 }

[root@node1 drbd.d]# vim /etc/drbd.d/mysql.res

  1 resource mysql {

  2 on node1.zzdx.com {

  3 device /dev/drbd0;

  4 disk /dev/sdb1;

  5 address 192.168.1.4:7898;

  6 meta-disk internal;

  7 }

  8 on node2.zzdx.com {

  9 device /dev/drbd0;

10 disk /dev/sdb1;

11 address 192.168.1.5:7898;

12 meta-disk internal;

13 }

14 }

Copy the files to node2:

[root@node1 drbd.d]# scp /etc/drbd.conf node2.zzdx.com:/etc/

[root@node1 drbd.d]# scp /etc/drbd.d/* node2.zzdx.com:/etc/drbd.d/

4. Check the configuration and create the metadata for the mysql resource

// Initialize the defined mysql resource on node1 and node2 separately

// Check the configuration (run the commands below on each node)

[root@node1 drbd.d]# drbdadm adjust mysql

  --==  Thank you for participating in the global usage survey  ==--

The server's response is:

0: Failure: (119) No valid meta-data signature found.

==> Use 'drbdadm create-md res' to initialize meta-data area. <==

Command 'drbdsetup 0 disk /dev/sdb1 /dev/sdb1 internal --set-defaults --create-device --on-io-error=detach' terminated with exit code 10

drbdsetup 0 show:5: delay-probe-volume 0k => 0k out of range [4..1048576]k.

[root@node1 drbd.d]# drbdadm create-md mysql

Writing meta data...

initializing activity log

NOT initialized bitmap

New drbd meta data block successfully created.

[root@node2 ~]#  drbdadm create-md mysql

[root@node2 ~]#  ll /dev/drbd0

brw-rw---- 1 root root 147, 0 Oct 30 20:17 /dev/drbd0

5. Start the DRBD service

[root@node1 drbd.d]# service drbd start

[root@node2 ~]# service drbd start

6. Check the DRBD status

[root@node1 ~]# service drbd status

drbd driver loaded OK; device status:

version: 8.3.8 (api:88/proto:86-94)

GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16

m:res    cs         ro                   ds                         p  mounted  fstype

0:mysql  Connected  Secondary/Secondary  Inconsistent/Inconsistent  C

[root@node1 ~]# drbd-overview

0:mysql  Connected Secondary/Secondary Inconsistent/Inconsistent C r----

[root@node2 ~]# service drbd status

0:mysql  Connected  Secondary/Secondary  Inconsistent/Inconsistent  C

[root@node2 ~]# drbd-overview

  0:mysql  Connected Secondary/Secondary Inconsistent/Inconsistent C r----

7. Set the DRBD primary node

The output above shows that both nodes are currently in the Secondary state, so we next need to promote one of them to Primary. Here node1 becomes the primary node, so run the following command on node1; the data then starts synchronizing (a way to watch the progress is sketched after the output below).

[root@node1 ~]# drbdadm -- --overwrite-data-of-peer primary mysql

  0:mysql  Connected Primary/Secondary UpToDate/UpToDate C r----

  0:mysql  Connected Secondary/Primary UpToDate/UpToDate C r----
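The initial full synchronization takes a while; a simple way to follow its progress (not shown in the original output) is to poll the DRBD status on either node:

[root@node1 ~]# watch -n 1 'cat /proc/drbd'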

8. Create the filesystem (this can only be done on the primary node, which is node1 here)

[root@node1 ~]# mkfs -t ext3 /dev/drbd0

[root@node1 ~]# mkdir -pv /mnt/mysqldata

[root@node1 ~]# mount /dev/drbd0 /mnt/mysqldata/

[root@node1 ~]# cd /mnt/mysqldata/

[root@node1 mysqldata]# echo "123">f1

[root@node1 mysqldata]# cd

[root@node1 ~]# umount /mnt/mysqldata/

[root@node1 ~]# drbdadm secondary mysql

  0:mysql  Connected Secondary/Secondary UpToDate/UpToDate C r----

9.将node2設定為primary節點

[root@node2 ~]# mkdir -pv /mnt/mysqldata

[root@node2 ~]# drbdadm primary mysql

[root@node2 ~]# mount /dev/drbd0 /mnt/mysqldata/

[root@node2 ~]# cd /mnt/mysqldata/

[root@node2 mysqldata]# ll

total 20

-rw-r--r-- 1 root root     4 Oct 30  2012 f1

drwx------ 2 root root 16384 Oct 30  2012 lost+found

[root@node2 mysqldata]# cd

[root@node2 ~]# umount /mnt/mysqldata/

At this point, DRBD is set up and working correctly.

Part 5: Install and configure MySQL

1.将node1設定為primary節點

[root@node2 ~]# drbdadm secondary mysql

[root@node1 ~]#  drbdadm primary mysql

2. Install MySQL on node1

[root@node1 ~]# groupadd -r mysql

[root@node1 ~]# useradd -g mysql -r mysql

[root@node1 ~]# mkdir -pv /mnt/mysqldata/data

[root@node1 ~]# chown -R mysql.mysql /mnt/mysqldata/data/

[root@node1 ~]# ll /mnt/mysqldata/

total 24

drwxr-xr-x 2 mysql mysql  4096 Oct 30 21:33 data

-rw-r--r-- 1 root  root      4 Oct 30 21:20 f1

drwx------ 2 root  root  16384 Oct 30 21:19 lost+found

[root@node1 ~]# tar -zxvf mysql-5.5.15-linux2.6-i686.tar.gz -C /usr/local/

[root@node1 ~]# cd /usr/local/

[root@node1 local]# ln -sv mysql-5.5.15-linux2.6-i686/ mysql

[root@node1 local]# cd mysql

[root@node1 mysql]# chown -R mysql:mysql .

[root@node1 mysql]# scripts/mysql_install_db --user=mysql --datadir=/mnt/mysqldata/data/

[root@node1 mysql]# chown -R root .

[root@node1 mysql]# cp support-files/my-large.cnf /etc/my.cnf

[root@node1 mysql]# vim /etc/my.cnf

39 thread_concurrency = 2

40 datadir = /mnt/mysqldata/data/

[root@node1 mysql]# cp support-files/mysql.server /etc/rc.d/init.d/mysqld

[root@node1 mysql]# scp /etc/my.cnf node2.zzdx.com:/etc/

[root@node1 mysql]# scp /etc/rc.d/init.d/mysqld node2.zzdx.com:/etc/rc.d/init.d/

[root@node1 mysql]# chkconfig --add mysqld

[root@node1 mysql]# chkconfig mysqld off

[root@node1 mysql]# chkconfig --list mysqld

mysqld          0:off 1:off 2:off 3:off 4:off 5:off 6:off

[root@node1 mysql]# service mysqld start

[root@node1 mysql]# ll /mnt/mysqldata/data/

total 28748

-rw-rw---- 1 mysql mysql  5242880 Oct 30 21:43 ib_logfile0

-rw-rw---- 1 mysql mysql  5242880 Oct 30 21:43 ib_logfile1

-rw-rw---- 1 mysql mysql 18874368 Oct 30 21:43 ibdata1

drwx------ 2 mysql root      4096 Oct 30 21:36 mysql

-rw-rw---- 1 mysql mysql      107 Oct 30 21:43 mysql-bin.000001

-rw-rw---- 1 mysql mysql       19 Oct 30 21:43 mysql-bin.index

-rw-rw---- 1 mysql root      1703 Oct 30 21:43 node1.zzdx.com.err

-rw-rw---- 1 mysql mysql        5 Oct 30 21:43 node1.zzdx.com.pid

drwx------ 2 mysql mysql     4096 Oct 30 21:36 performance_schema

drwx------ 2 mysql root      4096 Oct 30 21:36 test

[root@node1 mysql]# service mysqld stop

To make the MySQL installation conform to system conventions and export its development components for system-wide use, a few more steps are needed. First, add MySQL's man pages to the man command's search path by adding the following line:

[root@node1 mysql]# vim /etc/man.config

 48 MANPATH /usr/local/mysql/man

Export MySQL's header files to the system include path /usr/include; a simple symlink is enough:

[root@node1 mysql]# ln -sv /usr/local/mysql/include /usr/include/mysql

Add MySQL's library files to the system library search path (any file under /etc/ld.so.conf.d/ with a .conf suffix works), then have the system reload its library cache:

[root@node1 mysql]# echo '/usr/local/mysql/lib' >> /etc/ld.so.conf.d/mysql.conf

[root@node1 mysql]# ldconfig -v |grep mysql

/usr/local/mysql/lib:

libmysqlclient.so.18 -> libmysqlclient_r.so.18.0.0

Modify the PATH environment variable so that every user on the system can run the MySQL commands directly:

[root@node1 mysql]# vim /etc/profile

 58 PATH=$PATH:/usr/local/mysql/bin

[root@node1 mysql]#  . /etc/profile

[root@node1 mysql]# echo $PATH

/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/mysql/bin:/usr/local/mysql/bin

[root@node1 mysql]# umount /mnt/mysqldata/

3.将node2設定為primary節點,node1設定為secondary節點

[root@node1 mysql]# drbdadm secondary mysql

[root@node1 mysql]# drbd-overview

4. Install MySQL on node2

[root@node2 ~]# groupadd -r mysql

[root@node2 ~]# useradd -g mysql -r mysql

[root@node2 ~]# ll /mnt/mysqldata/

drwxr-xr-x 5 mysql mysql  4096 Oct 30  2012 data

[root@node2 ~]# tar -zxvf mysql-5.5.15-linux2.6-i686.tar.gz -C /usr/local/

[root@node2 ~]# cd /usr/local/

[root@node2 local]# ln -sv mysql-5.5.15-linux2.6-i686/ mysql

[root@node2 local]# cd mysql

Do not initialize the database here, because it was already initialized on node1:

[root@node2 mysql]# chown -R root:mysql .

The MySQL main configuration file and the SysV init script have already been copied over from node1, so there is no need to add them again.

[root@node2 mysql]# chkconfig --add mysqld

[root@node2 mysql]# chkconfig mysqld off

[root@node2 mysql]# chkconfig --list mysqld

[root@node2 mysql]# service mysqld start

Starting MySQL....                                         [  OK  ]

[root@node2 mysql]# ll /mnt/mysqldata/data/

total 28756

-rw-rw---- 1 mysql mysql  5242880 Oct 30 21:45 ib_logfile0

-rw-rw---- 1 mysql mysql 18874368 Oct 30 21:44 ibdata1

-rw-rw---- 1 mysql mysql      126 Oct 30 21:44 mysql-bin.000001

-rw-rw---- 1 mysql mysql      107 Oct 30 21:45 mysql-bin.000002

-rw-rw---- 1 mysql mysql       38 Oct 30 21:45 mysql-bin.index

-rw-rw---- 1 mysql root      2125 Oct 30 21:44 node1.zzdx.com.err

-rw-rw---- 1 mysql root       941 Oct 30 21:45 node2.zzdx.com.err

-rw-rw---- 1 mysql mysql        5 Oct 30 21:45 node2.zzdx.com.pid

[root@node2 mysql]# service mysqld stop      # stop MySQL after testing

As on node1, make the installation conform to system conventions and export its development components: add the man pages to the man search path, link the header files, register the libraries, and extend PATH:

[root@node2 mysql]# vim /etc/man.config

[root@node2 mysql]# ln -sv /usr/local/mysql/include /usr/include/mysql

[root@node2 mysql]# echo '/usr/local/mysql/lib' &gt;&gt; /etc/ld.so.conf.d/mysql.conf

[root@node2 mysql]# ldconfig -v |grep mysql

[root@node2 mysql]#  vim /etc/profile

 59 PATH=$PATH:/usr/local/mysql/bin

[root@node2 mysql]#  . /etc/profile

[root@node2 mysql]# echo $PATH

/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/mysql/bin

[root@node2 mysql]# umount /mnt/mysqldata/

Part 6: Install and configure corosync + pacemaker

1. Install the packages

[root@node1 mysql]# cd

[root@node1 ~]# yum install -y *.rpm --nogpgcheck

[root@node2 mysql]# cd

[root@node2 ~]# yum install -y *.rpm --nogpgcheck

2. Configure node1 and node2

[root@node1 ~]# cd /etc/corosync/

[root@node1 corosync]# cp corosync.conf.example corosync.conf   # create the configuration file

[root@node1 corosync]# vim corosync.conf

10                 bindnetaddr: 192.168.1.0

33 service {

34 ver: 0

35 name: pacemaker

36 use_mgmtd: yes

37 }

38 aisexec {

39 user: root

40 group: root

41 }

[root@node1 corosync]# mkdir -pv /var/log/cluster

[root@node1 corosync]# corosync-keygen

Corosync Cluster Engine Authentication key generator.

Gathering 1024 bits for key from /dev/random.

Press keys on your keyboard to generate entropy.

Writing corosync key to /etc/corosync/authkey.
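corosync-keygen reads from /dev/random, so on an idle machine it may pause until enough entropy has been gathered. The resulting key must remain readable by root only; a quick check:

[root@node1 corosync]# ls -l /etc/corosync/authkey     # should be mode 0400, owned by root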

Copy the files to node2 (be sure to use -p when copying):

[root@node1 corosync]# scp -p authkey corosync.conf node2.zzdx.com:/etc/corosync/

[root@node1 corosync]# ssh node2.zzdx.com  'mkdir -pv /var/log/cluster'

3. Verify on node1 and node2

1) Start the corosync service on node1 and node2

[root@node1 corosync]# service corosync start

Starting Corosync Cluster Engine (corosync):               [  OK  ]

[root@node2 corosync]# service corosync start

2) On node1, verify that the corosync engine started correctly

[root@node1 corosync]# grep -i -e "corosync cluster engine" -e "configuration file" /var/log/messages

Oct 30 23:37:33 node1 corosync[1317]:   [MAIN  ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.

Oct 30 23:37:33 node1 corosync[1317]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'

3) On node1, check that the initial membership notifications were sent

[root@node1 corosync]# grep -i totem /var/log/messages

Oct 30 23:37:33 node1 corosync[1317]:   [TOTEM ] Initializing transport (UDP/IP).

Oct 30 23:37:33 node1 corosync[1317]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Oct 30 23:37:34 node1 corosync[1317]:   [TOTEM ] The network interface [192.168.1.4] is now up.

Oct 30 23:37:34 node1 corosync[1317]:   [TOTEM ] Process pause detected for 524 ms, flushing membership messages.

Oct 30 23:37:34 node1 corosync[1317]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.

Oct 30 23:38:40 node1 corosync[1317]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.

4) On node1, check whether any errors occurred during startup (watch out for the STONITH error)

[root@node1 corosync]#  grep -i error: /var/log/messages |grep -v unpack_resources

The following error appears:

Feb 7 22:51:43 node1 corosync[5149]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process mgmtd exited (pid=5161, rc=100)

Resolution:

Looking carefully at /var/log/messages, or checking the errors with crm_verify -L, shows that there is no need to uninstall and reinstall anything. The error is caused by the missing STONITH device and does not affect corosync's operation, so it can safely be ignored.

[root@node1 corosync]# crm_verify -L

crm_verify[1359]: 2012/10/30_23:43:28 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined

crm_verify[1359]: 2012/10/30_23:43:28 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option

crm_verify[1359]: 2012/10/30_23:43:28 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity

Errors found during check: config not valid

  -V may provide more details

5) On node1, check whether pacemaker has started (the output below shows a normal startup)

[root@node1 corosync]# grep -i pcmk_startup /var/log/messages

Oct 30 23:37:34 node1 corosync[1317]:   [pcmk  ] info: pcmk_startup: CRM: Initialized

Oct 30 23:37:34 node1 corosync[1317]:   [pcmk  ] Logging: Initialized pcmk_startup

Oct 30 23:37:34 node1 corosync[1317]:   [pcmk  ] info: pcmk_startup: Maximum core file size is: 4294967295

Oct 30 23:37:34 node1 corosync[1317]:   [pcmk  ] info: pcmk_startup: Service: 9

Oct 30 23:37:34 node1 corosync[1317]:   [pcmk  ] info: pcmk_startup: Local hostname: node1.zzdx.com

1) On node2, verify that the corosync engine started correctly

[root@node2 corosync]# grep -i -e "corosync cluster engine" -e "configuration file" /var/log/messages

Oct 30 23:27:32 node2 corosync[1242]:   [MAIN  ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.

Oct 30 23:27:32 node2 corosync[1242]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'

2) On node2, check that the initial membership notifications were sent

[root@node2 corosync]#  grep -i totem /var/log/messages

Oct 30 23:27:32 node2 corosync[1242]:   [TOTEM ] Initializing transport (UDP/IP).

Oct 30 23:27:32 node2 corosync[1242]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Oct 30 23:27:32 node2 corosync[1242]:   [TOTEM ] The network interface [192.168.1.5] is now up.

Oct 30 23:27:33 node2 corosync[1242]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.

3) On node2, check whether any errors occurred (only the STONITH-related error appears, which can be ignored)

[root@node2 corosync]# grep -i error: /var/log/messages |grep -v unpack_resources

Oct 30 23:27:33 node2 corosync[1242]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process mgmtd exited (pid=1254, rc=100)

4) On node2, check whether pacemaker has started

[root@node2 corosync]# grep -i pcmk_startup /var/log/messages

Oct 30 23:27:32 node2 corosync[1242]:   [pcmk  ] info: pcmk_startup: CRM: Initialized

Oct 30 23:27:32 node2 corosync[1242]:   [pcmk  ] Logging: Initialized pcmk_startup

Oct 30 23:27:32 node2 corosync[1242]:   [pcmk  ] info: pcmk_startup: Maximum core file size is: 4294967295

Oct 30 23:27:32 node2 corosync[1242]:   [pcmk  ] info: pcmk_startup: Service: 9

Oct 30 23:27:32 node2 corosync[1242]:   [pcmk  ] info: pcmk_startup: Local hostname: node2.zzdx.com

5. Check the cluster status on node1 and node2

[root@node1 corosync]#  crm status

============

Last updated: Tue Oct 30 23:49:33 2012

Stack: openais

Current DC: node1.zzdx.com - partition with quorum

Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f

2 Nodes configured, 2 expected votes

0 Resources configured.

Online: [ node1.zzdx.com node2.zzdx.com ]

[root@node2 corosync]# crm status

Last updated: Tue Oct 30 23:38:57 2012

Part 7: Cluster management

1. Configure the cluster's global properties

corosync enables STONITH by default, but this cluster has no STONITH device, so the default configuration is not yet usable. Disable STONITH first with the following commands:

[root@node1 corosync]# cd

[root@node1 ~]# crm configure property stonith-enabled=false

[root@node2 corosync]# cd

[root@node2 ~]# crm configure property stonith-enabled=false
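With STONITH disabled, the configuration check that failed earlier should now come back clean:

[root@node1 ~]# crm_verify -L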

For a two-node cluster we also need to configure this option to ignore quorum, so that the vote count no longer matters and a single node can keep running:

[root@node1 ~]# crm configure property no-quorum-policy=ignore

[root@node2 ~]# crm configure property no-quorum-policy=ignore

Define a resource stickiness value so that resources do not switch between nodes arbitrarily, since that would waste system resources.

Resource stickiness values and their effect:

0: the default. The resource is placed at the most suitable location in the system, which means it is moved when a better- or worse-loaded node becomes available. This is essentially equivalent to automatic failback, except that the resource may move to a node other than the one it was previously active on.

Greater than 0: the resource prefers to stay where it is, but will move if a more suitable node becomes available. The higher the value, the stronger the preference to stay.

Less than 0: the resource prefers to move away from its current location. The higher the absolute value, the stronger the preference to leave.

INFINITY: unless the resource is forced to move because the node can no longer run it (node shutdown, node standby, migration-threshold reached, or configuration change), it always stays where it is. This is almost equivalent to disabling automatic failback entirely.

-INFINITY: the resource always moves away from its current location.

Here we assign a default stickiness value to resources as follows:

[root@node1 ~]# crm configure rsc_defaults resource-stickiness=100

[root@node2 ~]# crm configure rsc_defaults resource-stickiness=100

2. Check the cluster status and make node1 the DRBD primary

3. Configure DRBD as a cluster resource

1) Review the current cluster configuration and confirm that the global properties needed for a two-node cluster are in place

  0:mysql  Connected Primary/Secondary UpToDate/UpToDate C r----

[root@node1 ~]# crm configure show

node node1.zzdx.com

node node2.zzdx.com

property $id="cib-bootstrap-options" \

dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \

cluster-infrastructure="openais" \

expected-quorum-votes="2" \

stonith-enabled="false" \

no-quorum-policy="ignore"

rsc_defaults $id="rsc-options" \

resource-stickiness="100"

2.将drbd設定為群集資源

[root@node1 ~]# service drbd stop  

Stopping all DRBD resources: .

[root@node1 ~]# chkconfig drbd off

[root@node1 ~]# ssh node2.zzdx.com 'service drbd stop'

[root@node1 ~]# ssh node2.zzdx.com  'chkconfig drbd off'

drbd not loaded

[root@node1 ~]# ssh node2.zzdx.com 'drbd-overview'

The resource agent (RA) providing DRBD is currently classified by OCF under the linbit provider, at /usr/lib/ocf/resource.d/linbit/drbd. We can list this RA and view its metadata with the following commands:

[root@node1 ~]# crm ra classes

heartbeat

lsb

ocf / heartbeat linbit pacemaker

stonith

[root@node1 ~]# crm ra list ocf linbit

drbd

View the details of the DRBD resource agent:

[root@node1 ~]# crm ra info ocf:linbit:drbd

This resource agent manages a DRBD resource

as a master/slave resource. DRBD is a shared-nothing replicated storage

device. (ocf:linbit:drbd)

Master/Slave OCF Resource Agent for DRBD

Parameters (* denotes required, [] the default):

drbd_resource* (string): drbd resource name

    The name of the drbd resource from the drbd.conf file.

drbdconf (string, [/etc/drbd.conf]): Path to drbd.conf

    Full path to the drbd.conf file.

Operations' defaults (advisory minimum):

    start         timeout=240

    promote       timeout=90

    demote        timeout=90

    notify        timeout=90

    stop          timeout=100

    monitor_Slave interval=20 timeout=20 start-delay=1m

    monitor_Master interval=10 timeout=20 start-delay=1m

DRBD must run on both nodes at the same time, but only one node (in the primary/secondary model) can be Master while the other is Slave. It is therefore a rather special cluster resource: a multi-state clone, in which the nodes are divided into Master and Slave roles, and both nodes are required to be in the Slave state when the service first starts.

[root@node1 ~]# crm

crm(live)# configure

crm(live)configure#  primitive mysqldrbd ocf:heartbeat:drbd params drbd_resource="mysql" op monitor role="Master" interval="30s"  op monitor role="Slave" interval="31s" op start timeout="240s" op stop timeout="100s"

crm(live)configure# ms MS_mysqldrbd mysqldrbd meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify="true"

crm(live)configure#  verify

crm(live)configure# commit

crm(live)configure# exit

Bye

View the current cluster status:

[root@node1 ~]# crm status

Last updated: Wed Oct 31 00:21:56 2012

1 Resources configured.

Master/Slave Set: MS_mysqldrbd [mysqldrbd]

     Masters: [ node1.zzdx.com ]

     Slaves: [ node2.zzdx.com ]

The output above shows that the DRBD Primary node is now node1.zzdx.com and the Secondary node is node2.zzdx.com. We can also verify on node1 whether it has become the Primary for the mysql resource with the following command:

[root@node1 ~]# drbdadm role mysql

Primary/Secondary

我們實作将drbd設定自動挂載至/mysqldata目錄。此外,此自動挂載的叢集資源需要運作于drbd服務的Master節點上,并且隻能在drbd服務将某節點設定為Primary以後方可啟動。

確定兩個節點上的裝置已經解除安裝

[root@node1 ~]# umount /dev/drbd0

umount: /dev/drbd0: not mounted

[root@node2 ~]# umount /dev/drbd0

The following is still done on node1:

crm(live)configure# primitive MysqlFS ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/mnt/mysqldata" fstype="ext3"  op start timeout=60s op stop timeout=60s

crm(live)configure# verify

4. Define the MySQL resources

[root@node1 ~]# crm configure primitive myip ocf:heartbeat:IPaddr params ip=192.168.1.6

[root@node1 ~]#  crm configure primitive mysqlserver lsb:mysqld

5. Configure the resource constraints

The cluster now has all the required resources, but it may still not handle them correctly. Resource constraints specify on which cluster nodes resources may run, in what order resources are started, and which other resources a particular resource depends on. Pacemaker provides three kinds of resource constraints:

1) Resource Location: defines on which nodes a resource may, may not, or preferably should run.

2) Resource Colocation: defines which cluster resources may or may not run together on the same node.

3) Resource Order: defines the order in which cluster resources are started on a node.

When defining constraints, you also need to specify scores. Scores of all kinds are an essential part of how the cluster works: everything from migrating resources to deciding which resources to stop in a degraded cluster is achieved by manipulating scores in some way. Scores are calculated per resource, and any node whose score for a resource is negative cannot run that resource. After the scores are calculated, the cluster picks the node with the highest score. INFINITY is currently defined as 1,000,000, and adding or subtracting infinity follows three basic rules:

1) Any value + INFINITY = INFINITY

2) Any value - INFINITY = -INFINITY

3) INFINITY - INFINITY = -INFINITY

Each constraint can also be given its own score, which is the value assigned to that constraint. Constraints with higher scores are applied before constraints with lower scores. By creating additional location constraints with different scores for a given resource, you can specify the order of the nodes that the resource fails over to (a hypothetical example is sketched below).
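For illustration only: a score-based location constraint is not used in this setup, but one preferring node1 for the MysqlFS resource could look like the following sketch (the constraint name is hypothetical):

crm(live)configure# location MysqlFS_prefer_node1 MysqlFS 50: node1.zzdx.com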

The constraints we need are the following:

crm(live)configure# show

crm(live)configure# colocation MysqlFS_with_mysqldrbd inf: MysqlFS MS_mysqldrbd:Master myip mysqlserver

crm(live)configure# order MysqlFS_after_mysqldrbd inf: MS_mysqldrbd:promote MysqlFS:start

crm(live)configure# order myip_after_MysqlFS mandatory: MysqlFS myip

crm(live)configure# order mysqlserver_after_myip  mandatory: myip mysqlserver
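The interactive session still has to be verified and committed before the MysqlFS primitive defined above and these constraints take effect; the closing steps are not shown in the original output but are implied by the configuration dump in the next step:

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure# exit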

6. View the resources and cluster status

primitive MysqlFS ocf:heartbeat:Filesystem \

params device="/dev/drbd0" directory="/mnt/mysqldata" fstype="ext3" \

op start interval="0" timeout="60s" \

op stop interval="0" timeout="60s"

primitive myip ocf:heartbeat:IPaddr \

params ip="192.168.1.6"

primitive mysqldrbd ocf:heartbeat:drbd \

params drbd_resource="mysql" \

op monitor interval="30s" role="Master" \

op monitor interval="31s" role="Slave" \

op start interval="0" timeout="240s" \

op stop interval="0" timeout="100s"

primitive mysqlserver lsb:mysqld

ms MS_mysqldrbd mysqldrbd \

meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"

colocation MysqlFS_with_mysqldrbd inf: MysqlFS MS_mysqldrbd:Master myip mysqlserver

order MysqlFS_after_mysqldrbd inf: MS_mysqldrbd:promote MysqlFS:start

order myip_after_MysqlFS inf: MysqlFS myip

order mysqlserver_after_myip inf: myip mysqlserver

Last updated: Wed Oct 31 00:44:42 2012

Current DC: node2.zzdx.com - partition with quorum

4 Resources configured.

 Masters: [ node1.zzdx.com ]

MysqlFS (ocf::heartbeat:Filesystem): Started node1.zzdx.com

myip (ocf::heartbeat:IPaddr): Started node1.zzdx.com

mysqlserver (lsb:mysqld): Started node1.zzdx.com

As you can see, the services are running normally on node1.

7. Verify the running services

[root@node1 ~]# service mysqld status

MySQL running (8773)                                       [  OK  ]

[root@node1 ~]# ifconfig |less

eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:08:30:09 

          inet addr:192.168.1.6  Bcast:192.168.1.255  Mask:255.255.255.0

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          Interrupt:67 Base address:0x2000

[root@node1 ~]# mount

/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)

proc on /proc type proc (rw)

sysfs on /sys type sysfs (rw)

devpts on /dev/pts type devpts (rw,gid=5,mode=620)

/dev/sda1 on /boot type ext3 (rw)

tmpfs on /dev/shm type tmpfs (rw)

none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)

sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

/dev/hdc on /mnt/cdrom type iso9660 (ro)

/dev/drbd0 on /mnt/mysqldata type ext3 (rw)

[root@node2 ~]# service mysqld status

MySQL is not running                                       [FAILED]

[root@node2 ~]# mount

8. Continue testing the cluster:

On node1, put node1 into standby:

[root@node1 ~]# crm node standby

Last updated: Wed Oct 31 00:57:58 2012

Node node1.zzdx.com: standby

Online: [ node2.zzdx.com ]

    Masters: [ node2.zzdx.com ]

     Stopped: [ mysqldrbd:0 ]

MysqlFS (ocf::heartbeat:Filesystem): Started node2.zzdx.com

myip (ocf::heartbeat:IPaddr): Started node2.zzdx.com

mysqlserver (lsb:mysqld): Started node2.zzdx.com

Check the service on node2:

[root@node2 ~]# service mysqld status

MySQL running (7952)                                       [  OK  ]

[root@node2 ~]# ifconfig |less

eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:E8:F5:BD 
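After confirming the failover, node1 can be brought back into the cluster; thanks to resource-stickiness=100 the resources stay on node2. A sketch:

[root@node1 ~]# crm node online

[root@node1 ~]# crm status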

Everything is working now, so we can verify that the MySQL service can be accessed normally:

First, create a user user1 with password 123456 on node2.

MySQL is accessed through the VIP 192.168.1.6, so create an account on node2 that hosts in a given subnet can use (this data is replicated to node1 through the DRBD device):

[root@node2 ~]# mysql

Welcome to the MySQL monitor.  Commands end with ; or \g.

Your MySQL connection id is 1

Server version: 5.5.15-log MySQL Community Server (GPL)

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> grant all on *.* to user1@'192.168.%.%' identified by '123456';

Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;

mysql> quit;

Test from a client machine:

From host 192.168.1.66, ping the VIP 192.168.1.6.

Use client 192.168.1.66 to access the MySQL database through the VIP 192.168.1.6.

First install the mysql client tools on the client:

[root@node1 ~]# mkdir /mnt/cdrom/

[root@node1 ~]# cd /mnt/cdrom/Server/

[root@node1 Server]# vim /etc/yum.repos.d/rhel-debuginfo.repo

  5 gpgcheck=0

[root@node1 Server]# yum install mysql  -y

Log in to MySQL:

[root@node1 Server]# mysql -u user1 -p -h 192.168.1.6  
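Once logged in through the VIP, any simple statement confirms that the grant works end to end, for example:

mysql> show databases;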

This article was reposted from the 51CTO blog of liuyatao666. Original link: http://blog.51cto.com/5503845/1048455. Please contact the original author before republishing.
