
High-availability clustering on Linux with corosync + openais + pacemaker + web + drbd

Project topology diagram: (image omitted)

Detailed corosync configuration:

1. Configure the IP addresses (for example with the setup utility). (screenshots omitted)

2. Make sure the node names can resolve each other, and that the kernel release (uname -r) is identical on both nodes:

[root@www1 ~]# uname -rn

www1.gjp.com 2.6.18-164.el5

Configuration on www1.gjp.com:

[root@gjp99 ~]# cat /etc/sysconfig/network

NETWORKING=yes

NETWORKING_IPV6=yes

HOSTNAME=www1.gjp.com

[root@gjp99 ~]# hostname www1.gjp.com

[root@gjp99 ~]# hostname

www1.gjp.com

Log out and log back in for the new hostname to take effect.

3. Make sure the system clocks are consistent

[root@www1 ~]# hwclock -s

[root@www1 ~]# clock

Tue 23 Oct 2012 05:20:36 PM CST  -0.017990 seconds
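hwclock -s only copies the hardware clock into the system clock, so the nodes agree only if their hardware clocks already do. A more robust sketch, assuming both nodes can reach an NTP server (the server name below is an assumption):

```shell
# Sync both nodes against the same NTP server, then write the
# result back to the hardware clock (server name is an assumption).
ntpdate 0.rhel.pool.ntp.org && hwclock -w
ssh www2 'ntpdate 0.rhel.pool.ntp.org && hwclock -w'

# Quick check that the two clocks agree:
date; ssh www2 date
```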

4. Edit /etc/hosts (in place of DNS)

[root@www1 ~]# cat /etc/hosts

# Do not remove the following line, or various programs

# that require network functionality will fail.

127.0.0.1   localhost.localdomain  localhost

::1        localhost6.localdomain6 localhost6

192.168.2.1     www1.gjp.com    www1

192.168.2.2     www2.gjp.com    www2

[root@www1 ~]# ping www2.gjp.com

PING www2.gjp.com (192.168.2.2) 56(84) bytes of data.

64 bytes from www2.gjp.com (192.168.2.2): icmp_seq=1 ttl=64 time=3.45 ms

64 bytes from www2.gjp.com (192.168.2.2): icmp_seq=2 ttl=64 time=0.658 ms

The node names now resolve to each other.
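This resolution check can be scripted so it is easy to rerun before starting the cluster; a minimal sketch (the names and addresses are the ones used in this article):

```shell
#!/bin/sh
# check_hosts <hosts-file>: verify that both cluster nodes are
# listed in the given hosts file; prints "hosts ok" on success.
check_hosts() {
    file="$1"
    for entry in "192.168.2.1 www1.gjp.com" "192.168.2.2 www2.gjp.com"; do
        ip=${entry%% *}
        name=${entry##* }
        if ! grep -q "^$ip[[:space:]].*$name" "$file"; then
            echo "missing: $name"
            return 1
        fi
    done
    echo "hosts ok"
}

# Demonstrate against a sample file matching this article's /etc/hosts:
printf '192.168.2.1 www1.gjp.com www1\n192.168.2.2 www2.gjp.com www2\n' > /tmp/hosts.sample
check_hosts /tmp/hosts.sample    # prints: hosts ok
```

On the real nodes you would run `check_hosts /etc/hosts` on both machines.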

5. Mount the installation CD and install the packages corosync requires

[root@www1 ~]# mkdir /mnt/cdrom

[root@www1 ~]# mount /dev/cdrom /mnt/cdrom

mount: block device /dev/cdrom is write-protected, mounting read-only

Copy the uploaded RPM packages from www2 to www1's /root directory:

[root@www2 ~]# scp *.rpm www1:/root

[root@www1 ~]# yum localinstall -y *.rpm --nogpgcheck

6. Edit the corosync configuration file

[root@www1 ~]# cd /etc/corosync/

[root@www1 corosync]# ll

total 20

-rw-r--r-- 1 root root 5384 Jul 28  2010 amf.conf.example

-rw-r--r-- 1 root root  436 Jul 28  2010 corosync.conf.example

drwxr-xr-x 2 root root 4096 Jul 28  2010 service.d

drwxr-xr-x 2 root root 4096 Jul 28  2010 uidgid.d

[root@www1 corosync]# cp corosync.conf.example corosync.conf

[root@www1 corosync]# vim corosync.conf

compatibility: whitetank   # compatible with the 0.8x "whitetank" series; backward-compatible with older releases, but some newer features may then be unavailable

# totem: protocol settings used when the nodes exchange heartbeats
totem {
        version: 2                 # protocol version
        secauth: off               # whether to enable secure authentication
        threads: 0                 # number of authentication threads; 0 = no limit
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.2.0   # the network address used for cluster communication (here 192.168.2.0)
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

logging {
        fileline: off
        to_stderr: no              # whether to log to standard error
        to_logfile: yes            # log to a file
        to_syslog: yes             # log to syslog; it is advisable to disable one of the two, since logging to both hurts performance
        logfile: /var/log/cluster/corosync.log   # this directory must be created manually
        debug: off                 # turn on when troubleshooting
        timestamp: on              # whether to record timestamps in the log
        # the following subsystem belongs to openais and can be left as-is
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}

The settings above only cover the messaging layer; since pacemaker runs on top, append the following:

service {
        ver: 0
        name: pacemaker
}

openais itself is not used, but some of its sub-options are:

aisexec {
        user: root
        group: root
}

7. To keep unauthorized hosts from joining the cluster, authentication is required; generate an authkey:

[root@www1 corosync]# corosync-keygen

[root@www1 corosync]# ll

total 28

-r-------- 1 root root  128 Oct 24 13:59 authkey

-rw-r--r-- 1 root root  538 Oct 24 13:56 corosync.conf

drwxr-xr-x 2 root root 4096 Jul 28  2010 service.d

[root@www1 corosync]# scp -p authkey corosync.conf www2:/etc/corosync/

8. The log directory must be created in advance

[root@www1 ~]# mkdir /var/log/cluster

[root@www1 corosync]# ssh www2 'mkdir /var/log/cluster'

9. Start the corosync service

[root@www1 corosync]# service corosync start

Starting Corosync Cluster Engine (corosync):               [  OK  ]

[root@www1 corosync]# ssh www2 'service corosync start'

root@www2's password:

Starting Corosync Cluster Engine (corosync):

[  OK  ]

10. Verify that corosync is working correctly

Check that the corosync engine started properly:

[root@www1 corosync]# grep -i  -e "corosync cluster engine" -e "configuration file" /var/log/messages

Oct 24 11:09:04 www1 smartd[3260]: Opened configuration file /etc/smartd.conf

Oct 24 11:09:04 www1 smartd[3260]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices

Oct 24 17:08:33 www1 corosync[26362]:   [MAIN  ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.

Oct 24 17:08:33 www1 corosync[26362]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

Check whether the initial membership notifications were sent:

[root@www1 corosync]# grep -i totem /var/log/messages

Oct 24 17:08:33 www1 corosync[26362]:   [TOTEM ] Initializing transport (UDP/IP).

Oct 24 17:08:33 www1 corosync[26362]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Oct 24 17:08:33 www1 corosync[26362]:   [TOTEM ] The network interface is down.

Oct 24 17:08:34 www1 corosync[26362]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.

[root@www2 ~]# grep -i totem /var/log/messages

Oct 24 17:09:07 www2 corosync[28610]:   [TOTEM ] Initializing transport (UDP/IP).

Oct 24 17:09:07 www2 corosync[28610]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Oct 24 17:09:07 www2 corosync[28610]:   [TOTEM ] The network interface is down.

Oct 24 17:09:08 www2 corosync[28610]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.

Check whether any errors occurred during startup:

[root@www1 corosync]# grep -i error:  /var/log/messages  |grep -v unpack_resources

[root@www2 ~]# grep -i error:  /var/log/messages  |grep -v unpack_resources

No output means no errors.

Check whether pacemaker has started:

[root@www1 corosync]# grep -i pcmk_startup /var/log/messages

Oct 24 17:08:34 www1 corosync[26362]:   [pcmk  ] info: pcmk_startup: CRM: Initialized

Oct 24 17:08:34 www1 corosync[26362]:   [pcmk  ] Logging: Initialized pcmk_startup

Oct 24 17:08:34 www1 corosync[26362]:   [pcmk  ] info: pcmk_startup: Maximum core file size is: 4294967295

Oct 24 17:08:34 www1 corosync[26362]:   [pcmk  ] info: pcmk_startup: Service: 9

Oct 24 17:08:34 www1 corosync[26362]:   [pcmk  ] info: pcmk_startup: Local hostname: www1.gjp.com

[root@www2 ~]# grep -i pcmk_startup /var/log/messages

From a node already in the cluster, start the other node:

[root@www1 ~]# /etc/init.d/corosync start

[root@www1 ~]# ssh www2  '/etc/init.d/corosync start'

Starting Corosync Cluster Engine (corosync): [  OK  ]

[root@www2 corosync]# crm status

============

Last updated: Wed Oct 24 20:11:19 2012

Stack: openais

Current DC: www1.gjp.com - partition with quorum

Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f

2 Nodes configured, 2 expected votes

0 Resources configured.

============

Online: [ www1.gjp.com www2.gjp.com ]

Note: the clocks of the cluster nodes should be kept synchronized.

Providing the highly available service

In corosync, services can be defined through two kinds of interfaces:

1. a graphical interface (hb_gui)

2. crm (a shell provided by pacemaker)

(screenshot omitted) crm can be used to view the contents of the CIB. To check the configuration for syntax errors:

[root@www1 corosync]# crm_verify  -L

crm_verify[4329]: 2012/10/25_14:59:35 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined

crm_verify[4329]: 2012/10/25_14:59:35 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option

crm_verify[4329]: 2012/10/25_14:59:35 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity

Errors found during check: config not valid

  -V may provide more details

STONITH errors are reported: in a highly available environment, resource start-up stays disabled until STONITH is configured. Since no STONITH device is used here, disable STONITH:

[root@www1 corosync]# crm

crm(live)# configure

crm(live)configure#  property stonith-enabled=false

crm(live)configure# commit

crm(live)configure# show

node www1.gjp.com

node www2.gjp.com

property $id="cib-bootstrap-options" \

    dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \

    cluster-infrastructure="openais" \

    expected-quorum-votes="2" \

    stonith-enabled="false"

Check again:

[root@www1 corosync]# crm_verify  -L

No errors this time.

The system also ships a dedicated stonith command; stonith -L lists the supported STONITH device types.

crm can be used in interactive mode; run help to see the available commands. The configuration is stored in the CIB in XML format.

11. Configuring resources

The cluster knows four resource types:

primitive - a basic resource; runs on only one node at a time

group - bundles several resources into one group for easier management

clone - runs on several nodes at the same time (e.g. ocfs2, stonith; no master/slave distinction)

master - has master/slave roles, e.g. drbd
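To make the four types concrete, here is one hypothetical crm example per type (the resource names are illustrative, not part of this setup):

```
# primitive: a basic resource, active on one node at a time
crm configure primitive vip ocf:heartbeat:IPaddr params ip="192.168.2.66"

# group: several resources placed and moved together
crm configure group websvc vip webserver

# clone: the same resource active on every node
crm configure clone fencing_clone fencing_rsc

# ms (master/slave): a clone with roles, as used later for drbd
crm configure ms ms_drbd drbd_rsc meta master-max=1 clone-max=2
```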

The resources used here are an IP address, the httpd service, and the shared storage.

They are configured through resource agents (classes such as ocf and lsb); use list to view them:

crm(live)# help

This is the CRM command line interface program.

Available commands:

    cib              manage shadow CIBs

    resource         resources management

    configure        CRM cluster configuration

    node             nodes management

    options          user preferences

    ra               resource agents information center

    status           show cluster status

    quit,bye,exit    exit the program

    help             show help

    end,cd,up        go back one level

crm(live)# ra

(output omitted; the lsb class corresponds to the scripts under /etc/init.d)

crm(live)ra# list ocf heartbeat

Use info or meta to display a resource agent's details; the class, provider and type are separated by colons, e.g. meta ocf:heartbeat:IPaddr:

crm(live)ra# meta ocf:heartbeat:IPaddr 

(output omitted)

Resources are configured under configure. First name the resource; judging from the configuration shown later, the webip address resource was created with:

crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip="192.168.2.66"

crm(live)configure# commit

crm(live)configure# end

crm(live)# status

Last updated: Thu Oct 25 15:18:54 2012

1 Resources configured.

Online: [ www1.gjp.com www2.gjp.com ]

webip    (ocf::heartbeat:IPaddr):    Started www1.gjp.com

As the status shows, the resource started on www1.

[root@www1 corosync]# ifconfig |less

(output omitted; the cluster IP 192.168.2.66 now appears on www1's interface)

[root@www1 corosync]# mount /dev/cdrom /mnt/cdrom

mount: block device /dev/cdrom is write-protected, mounting read-only

[root@www1 corosync]# yum install httpd -y

[root@www1 corosync]# service httpd status

httpd is stopped

[root@www1 corosync]# chkconfig --list |grep httpd

httpd              0:off    1:off    2:off    3:off    4:off    5:off    6:off

[root@www1 corosync]# crm

crm(live)# ra

crm(live)ra# classes

heartbeat

lsb

ocf / heartbeat pacemaker

stonith

Defining the web service resource

httpd must be installed on both nodes. After installation, the lsb script for httpd can be listed:

[root@www1 corosync]# crm ra list lsb

or

crm(live)# ra

crm(live)ra# list lsb


crm(live)ra# end

crm(live)configure# primitive webserver lsb:httpd

The httpd resource is defined; the configuration now shows (abridged):

primitive webip ocf:heartbeat:IPaddr \

    params ip="192.168.2.66"

primitive webserver lsb:httpd

    stonith-enabled="false"

Last updated: Thu Oct 25 16:06:46 2012

2 Resources configured.

webip    (ocf::heartbeat:IPaddr):    Started www1.gjp.com

webserver    (lsb:httpd):    Started www1.gjp.com

Failed actions:

    webserver_monitor_0 (node=www2.gjp.com, call=3, rc=5, status=complete): not installed

The failed action appears because httpd is missing on www2.gjp.com. Note also that once httpd is installed there, the IP may run on www1 while the service runs on www2, since nothing ties them together yet!

[root@www1 ~]# service httpd status

httpd (pid  4897) is running...

[root@www1 ~]# echo "www1.gjp.com">/var/www/html/index.html

[root@www1 ~]# crm

crm(live)configure# help group

The `group` command creates a group of resources.

Usage:

...............

        group <name> <rsc> [<rsc>...]

          [meta attr_list]

          [params attr_list]

        attr_list :: [$id=<id>] <attr>=<val> [<attr>=<val>...] | $id-ref=<id>

Example:

        group internal_www disk0 fs0 internal_ip apache \

          meta target_role=stopped

...............

crm(live)configure# group web webip webserver

crm(live)configure# commit

Testing from a client machine (screenshots omitted):

[root@www1 ~]# crm status

Last updated: Thu Oct 25 16:34:28 2012

Resource Group: web

     webip    (ocf::heartbeat:IPaddr):    Started www1.gjp.com

     webserver    (lsb:httpd):    Started www1.gjp.com

Simulate the failure of www1:

[root@www1 ~]# service corosync stop

Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]

Waiting for corosync services to unload:.......            [  OK  ]

[root@www1 ~]# service httpd status

httpd is stopped

[root@www2 Server]# crm status

Last updated: Thu Oct 25 16:43:01 2012

Current DC: www2.gjp.com - partition WITHOUT quorum

Online: [ www2.gjp.com ]

OFFLINE: [ www1.gjp.com ]

The "not installed" error keeps being reported. After installing httpd on www2 as the message suggests, corosync must be restarted; otherwise the new service is not detected and the error remains:

[root@www2 Server]# service corosync stop

Waiting for corosync services to unload:......             [  OK  ]

[root@www2 Server]# service corosync start

[root@www2 Server]# crm status

Last updated: Thu Oct 25 16:47:18 2012

     webserver    (lsb:httpd):    Started www1.gjp.com

Solving the problem of www2 being unable to take over the service:

[root@www1 ~]# service corosync start

[root@www1 ~]# service httpd status

httpd (pid  5233) is running...


Create the web page on www2:

[root@www2 Server]# echo "www2.gjp.com " >/var/www/html/index.html

As soon as www1 dies:

[root@www1 ~]# service corosync stop


The site remains reachable:

[root@www2 Server]# service httpd status

httpd (pid  4656) is running...

Last updated: Thu Oct 25 17:12:16 2012

     webip    (ocf::heartbeat:IPaddr):    Started www2.gjp.com

     webserver    (lsb:httpd):    Started www2.gjp.com

After www1 recovers, it does not seize the resources back. Observe the following:

[root@www1 ~]# crm status

Last updated: Thu Oct 25 17:20:44 2012

Current DC: www2.gjp.com - partition with quorum

No matter how often the page is refreshed, it is served by www2, unless corosync on www2 itself dies. (screenshots omitted)
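This stay-put behavior can also be controlled explicitly. A sketch (the score values are illustrative):

```
# Make resources prefer the node they are currently on, so they do
# not migrate back automatically when a failed node returns:
crm configure rsc_defaults resource-stickiness=100

# Or express a (weaker) preference for a home node with a location
# constraint, which stickiness can then override after a failover:
crm configure location web_prefers_www1 web 50: www1.gjp.com
```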

Configuration on www2.gjp.com:

HOSTNAME=www2.gjp.com

[root@gjp99 ~]# hostname www2.gjp.com

[root@www2 ~]# hwclock -s

[root@www2 ~]# clock

Tue 23 Oct 2012 05:20:32 PM CST  -0.018132 seconds

[root@www2 .ssh]# cat /etc/hosts

192.168.2.1     www1.gjp.com      www1

192.168.2.2     www2.gjp.com      www2

[root@www2 ~]# ping www1.gjp.com

PING www1.gjp.com (192.168.2.1) 56(84) bytes of data.

64 bytes from www1.gjp.com (192.168.2.1): icmp_seq=1 ttl=64 time=1.11 ms

64 bytes from www1.gjp.com (192.168.2.1): icmp_seq=2 ttl=64 time=0.506 ms

[root@www2 ~]# cat /etc/yum.repos.d/rhel-debuginfo.repo

[rhel-server]

name=Red Hat Enterprise Linux Server

baseurl=file:///mnt/cdrom/Server

enabled=1

gpgcheck=1

gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release

[rhel-cluster]

name=Red Hat Enterprise Linux Cluster

baseurl=file:///mnt/cdrom/Cluster

Make sure the CD drive is connected (screenshot omitted):

[root@www2 ~]# mkdir /mnt/cdrom

[root@www2 ~]# mount /dev/cdrom /mnt/cdrom

[root@www2 ~]# yum grouplist all

Loaded plugins: rhnplugin, security

This system is not registered with RHN.

RHN support will be disabled.

Setting up Group Process

rhel-cluster                                                                | 1.3 kB     00:00    

rhel-cluster/primary                                                        | 6.5 kB     00:00    

rhel-server                                                                 | 1.3 kB     00:00    

rhel-server/primary                                                         | 732 kB     00:00    

rhel-cluster/group                                                          | 101 kB     00:00    

rhel-server/group                                                           | 1.0 MB     00:00    

Done

Set up key-based SSH so the nodes can manage each other without passwords:

[root@www2 ~]# ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

/root/.ssh/id_rsa already exists.

Overwrite (y/n)?

[root@www2 ~]# cd .ssh/

[root@www2 .ssh]# ls

id_rsa  id_rsa.pub

[root@www2 .ssh]# ssh-copy-id -i id_rsa.pub www1


The authenticity of host 'www1 (192.168.2.1)' can't be established.

RSA key fingerprint is 87:be:8b:a4:bd:11:11:10:c2:ec:2d:ef:02:68:f6:0e.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'www1,192.168.2.1' (RSA) to the list of known hosts.

root@www1's password:

Now try logging into the machine, with "ssh 'www1'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@www2 .ssh]# scp /etc/yum.repos.d/rhel-debuginfo.repo  www1:/etc/yum.repos.d/

rhel-debuginfo.repo                                              100%  318     0.3KB/s   00:00   

[root@www2 .ssh]# date

Wed Oct 24 11:30:30 CST 2012

[root@www2 .ssh]# ssh www1 'date'

Wed Oct 24 11:30:40 CST 2012

[root@www1 ~]# ssh-keygen -t rsa

[root@www1 .ssh]# ssh-copy-id -i id_rsa.pub www2

Upload the required packages (screenshot omitted):

[root@www2 ~]# mount /dev/cdrom /mnt/cdrom

[root@www2 ~]# yum localinstall -y *.rpm --nogpgcheck

[root@www2 ~]#  grep -i  -e "corosync cluster engine" -e "configuration file" /var/log/messages

Oct 24 11:09:03 www2 smartd[3259]: Opened configuration file /etc/smartd.conf

Oct 24 11:09:03 www2 smartd[3259]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices

Oct 24 17:09:07 www2 corosync[28610]:   [MAIN  ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.

Oct 24 17:09:07 www2 corosync[28610]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

[root@www2 Server]# yum install httpd -y

DRBD configuration:

Configuration on www1:

[root@www1 ~]# fdisk /dev/sda

The number of cylinders for this disk is set to 2610.

There is nothing wrong with that, but this is larger than 1024,

and could in certain setups cause problems with:

1) software that runs at boot time (e.g., old versions of LILO)

2) booting and partitioning software from other OSs

   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n

Command action

   e   extended

   p   primary partition (1-4)

e

Selected partition 4

First cylinder (1354-2610, default 1354):

Using default value 1354

Last cylinder or +size or +sizeM or +sizeK (1354-2610, default 2610):

Using default value 2610

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes

255 heads, 63 sectors/track, 2610 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System

/dev/sda1   *           1          13      104391   83  Linux

/dev/sda2              14        1288    10241437+  83  Linux

/dev/sda3            1289        1353      522112+  82  Linux swap / Solaris

/dev/sda4            1354        2610    10096852+   5  Extended

Command (m for help): n

First cylinder (1354-2610, default 1354):

Last cylinder or +size or +sizeM or +sizeK (1354-2610, default 2610): +2G

Command (m for help): p

(abridged)

/dev/sda4            1354        2610    10096852+   5  Extended

/dev/sda5            1354        1597     1959898+  83  Linux

Command (m for help): w

The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.

The kernel still uses the old table.

The new table will be used at the next reboot.

Syncing disks.

[root@www1 ~]# partprobe /dev/sda

[root@www1 ~]# cat /proc/partitions

major minor  #blocks  name

   8     0   20971520 sda

   8     1     104391 sda1

   8     2   10241437 sda2

   8     3     522112 sda3

   8     4          0 sda4

   8     5    1959898 sda5

Do the same partitioning on node 2.

Install drbd, which is used to build the distributed storage. Choose the packages that match your system; the ones used here are:

drbd83-8.3.8-1.el5.centos.i386.rpm

kmod-drbd83-8.3.8-1.el5.centos.i686.rpm


[root@www1 ~]# yum localinstall -y drbd83-8.3.8-1.el5.centos.i386.rpm --nogpgcheck

[root@www1 ~]# yum localinstall -y kmod-drbd83-8.3.8-1.el5.centos.i686.rpm --nogpgcheck

Do the same on node 2.

[root@www1 ~]# cp /usr/share/doc/drbd83-8.3.8/drbd.conf  /etc

cp: overwrite `/etc/drbd.conf'? y    (you must choose to overwrite)

[root@www1 ~]# scp /etc/drbd.conf  www2:/etc/

[root@www1 ~]# vim /etc/drbd.d/global_common.conf

[root@www1 ~]# cat /etc/drbd.d/global_common.conf

global {
        usage-count yes;
        # minor-count dialog-refresh disable-ip-verification
}

common {
        protocol C;

        startup {
                wfc-timeout  120;
                degr-wfc-timeout 120;
        }

        disk {
                on-io-error detach;
                fencing resource-only;
        }

        net {
                cram-hmac-alg "sha1";
                shared-secret  "mydrbdlab";
        }

        syncer {
                rate  100M;
        }
}

[root@www1 ~]# vim /etc/drbd.d/web.res

[root@www1 ~]# cat /etc/drbd.d/web.res

resource web {
        device    /dev/drbd0;
        disk      /dev/sda5;
        meta-disk internal;

        on www1.gjp.com {
                address 192.168.2.1:7789;
        }

        on www2.gjp.com {
                address 192.168.2.2:7789;
        }
}

Initialize the metadata. Run on both nodes:

drbdadm create-md web

Then start the service on both nodes and check the status:

service drbd start

[root@www1 ~]# drbdadm create-md web

[root@www1 ~]# service drbd start

The service must be started on both nodes at (roughly) the same time!
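Because `service drbd start` waits for its peer (the wfc-timeout 120 configured above), it is easiest to launch both nodes at once; a sketch:

```
# Start the peer in the background over ssh, then start locally;
# both sides come up within the wait-for-connection window.
ssh www2 'service drbd start' &
service drbd start
wait

cat /proc/drbd    # both nodes should now show cs:Connected
```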

Configuration on www2:

[root@www2 Server]# fdisk /dev/sda

/dev/sda3            1289        1353      522112+  82  Linux swap / Solaris

[root@www2 Server]# partprobe /dev/sda

[root@www2 Server]# cat /proc/partitions

[root@www1 ~]# scp drbd83-8.3.8-1.el5.centos.i386.rpm kmod-drbd83-8.3.8-1.el5.centos.i686.rpm www2:/root

drbd83-8.3.8-1.el5.centos.i386.rpm              100%  217KB 216.7KB/s   00:00   

kmod-drbd83-8.3.8-1.el5.centos.i686.rpm         100%  123KB 123.0KB/s   00:00 

[root@www2 ~]# rpm -ivh drbd83-8.3.8-1.el5.centos.i386.rpm

warning: drbd83-8.3.8-1.el5.centos.i386.rpm: Header V3 DSA signature: NOKEY, key ID e8562897

Preparing...                ########################################### [100%]

   1:drbd83                 warning: /etc/drbd.conf created as /etc/drbd.conf.rpmnew

########################################### [100%]

[root@www2 ~]# rpm -ivh kmod-drbd83-8.3.8-1.el5.centos.i686.rpm

warning: kmod-drbd83-8.3.8-1.el5.centos.i686.rpm: Header V3 DSA signature: NOKEY, key ID e8562897

   1:kmod-drbd83            ########################################### [100%]

[root@www2 ~]# cp /usr/share/doc/drbd83-8.3.8/drbd.conf   /etc/

cp: overwrite `/etc/drbd.conf'? y

[root@www2 ~]# scp www1:/etc/drbd.d/global_common.conf  /etc/drbd.d/global_common.conf

global_common.conf                              100%  505     0.5KB/s   00:00

[root@www2 ~]# scp www1:/etc/drbd.d/web.res  /etc/drbd.d/web.res

web.res                                         100%  348     0.3KB/s   00:00

[root@www2 ~]# drbdadm   create-md web

[root@www2 ~]# service drbd start

Starting DRBD resources: [

web

Found valid meta data in the expected location, 2006929408 bytes into /dev/sda5.

d(web) s(web) n(web) ].

[root@www1 ~]# service drbd start

Starting DRBD resources: [ ].

[root@www1 ~]# cat /proc/drbd

version: 8.3.8 (api:88/proto:86-94)

GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16

0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----

    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:195980

Both sides are in Secondary state and the data is not yet synchronized. The same information is available via drbd-overview.

[root@www1 ~]# drbdadm   -- --overwrite-data-of-peer primary web

[root@www1 ~]# vim /etc/drbd.d/global_common.conf

The synchronization rate can be tuned with the rate option.

[root@www1 ~]# cat /proc/drbd

0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----

    ns:259716 nr:0 dw:0 dr:267904 al:0 bm:15 lo:1 pe:31 ua:256 ap:0 ep:1 wo:b oos:1701048

    [=>..................] sync'ed: 13.4% (1701048/1959800)K delay_probe: 25

    finish: 0:00:37 speed: 45,120 (23,520) K/sec

[root@www1 ~]# cat /proc/drbd

0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----

    ns:1959800 nr:0 dw:0 dr:1959800 al:0 bm:120 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

[root@www1 ~]# drbd-overview

  0:web  Connected Primary/Secondary UpToDate/UpToDate C r----

[root@www2 ~]# cat /proc/drbd

    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1959800

[root@www2 ~]# cat /proc/drbd

0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----

    ns:0 nr:1959800 dw:1959800 dr:0 al:0 bm:120 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:

Create the filesystem (on the primary node):

mkfs -t ext3  -L drbdweb  /dev/drbd0

[root@www1 ~]# mkfs -t ext3  -L drbdweb  /dev/drbd0

mke2fs 1.39 (29-May-2006)

Filesystem label=drbdweb

OS type: Linux

Block size=4096 (log=2)

Fragment size=4096 (log=2)

245280 inodes, 489950 blocks

24497 blocks (5.00%) reserved for the super user

First data block=0

Maximum filesystem blocks=503316480

15 block groups

32768 blocks per group, 32768 fragments per group

16352 inodes per group

Superblock backups stored on blocks:

    32768, 98304, 163840, 229376, 294912

Writing inode tables: done                           

Creating journal (8192 blocks): done

Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 37 mounts or

180 days, whichever comes first.  Use tune2fs -c or -i to override.

[root@www1 ~]# mkdir /web

[root@www1 ~]# mount /dev/drbd0 /web/

[root@www1 ~]# mount

/dev/sda2 on / type ext3 (rw)

proc on /proc type proc (rw)

sysfs on /sys type sysfs (rw)

devpts on /dev/pts type devpts (rw,gid=5,mode=620)

/dev/sda1 on /boot type ext3 (rw)

tmpfs on /dev/shm type tmpfs (rw)

none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)

sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

/dev/hdc on /mnt/cdrom type iso9660 (ro)

/dev/drbd0 on /web type ext3 (rw)

[root@www1 ~]# cd /web

[root@www1 web]# echo "web1 " >index.html

[root@www1 web]# ll

-rw-r--r-- 1 root root     6 Oct 25 21:11 index.html

drwx------ 2 root root 16384 Oct 25 20:57 lost+found

[root@www2 ~]# mkdir /web2

[root@www2 ~]# mount /dev/drbd0 /web2

mount: block device /dev/drbd0 is write-protected, mounting read-only

mount: Wrong medium type

The secondary side has no access to the device!

[root@www1 ~]# umount /web

[root@www1 ~]# drbdadm secondary web

0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r----

    ns:2024140 nr:0 dw:64340 dr:1959937 al:24 bm:135 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

[root@www2 ~]# drbdadm primary web

[root@www2 ~]# ll /web2

[root@www2 ~]# cat /proc/drbd

    ns:40 nr:2024140 dw:2024180 dr:221 al:1 bm:120 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

[root@www2 ~]# cd /web2

[root@www2 web2]# touch gjp.txt

[root@www2 web2]# ll

-rw-r--r-- 1 root root     0 Oct 25 21:16 gjp.txt

Note: to make www1 primary and www2 secondary again, the mount point on www2 must be unmounted first; only then switch the roles.

    ns:2024140 nr:96 dw:64436 dr:1959937 al:24 bm:135 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
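The manual role switch described in the note can be written as one sequence (run the first half on the current primary www2, the second half on www1):

```
# On www2: release the device and give up the primary role
umount /web2
drbdadm secondary web

# On www1: take over the primary role and remount
drbdadm primary web
mount /dev/drbd0 /web
```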

[root@www1 ~]# cd /var/www/html

[root@www1 html]# ll

total 4

-rw-r--r-- 1 root root  0 Oct 25 16:14 gjp1

-rw-r--r-- 1 root root 13 Oct 25 16:23 index.html

[root@www1 html]# mv index.html /web/

mv: overwrite `/web/index.html'? y

It must be overwritten; the old index.html under /web was a throwaway file, not the real site.

[root@www1 html]# cd /web/

-rw-r--r-- 1 root root    13 Oct 25 16:23 index.html

[root@www1 web]# vim /etc/httpd/conf/httpd.conf

(screenshots omitted) Judging from the text below, DocumentRoot in httpd.conf is changed from the default /var/www/html to the DRBD mount point /web; make the same change on www2.

Next, make corosync drive drbd automatically: drbd mounts the device at the mount point, and since the website already lives under /web, everything else then works by itself.

Because both corosync nodes use the same configuration, the mount point must be identical on both machines: create /web on www2 as well and point DocumentRoot in its httpd.conf at /web.

How is corosync tied to drbd? Add drbd to the corosync-managed resources:

crm configure primitive drbd_web_FS ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/web" fstype="ext3"

crm configure primitive httpd_drbd_web ocf:heartbeat:drbd params drbd_resource="web" op monitor interval="60s" role="Master" timeout="40s" op monitor interval="70s" role="Slave" timeout="40s"

crm configure master MS_Webdrbd httpd_drbd_web meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"

crm configure colocation drbd_web_FS_on_MS_Webdrbd inf: drbd_web_FS MS_Webdrbd:Master

crm configure order drbd_web_FS_after_MS_Webdrbd inf: MS_Webdrbd:promote drbd_web_FS:start

crm configure property no-quorum-policy="ignore"

Reviewing the configuration (screenshot omitted):

[root@www1 ~]# cd /etc/drbd.d/

[root@www1 drbd.d]# vim global_common.conf

global {
        usage-count no;   # note: this is now no
        # minor-count dialog-refresh disable-ip-verification
}

common {
        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }

        startup {
                wfc-timeout  120;
                degr-wfc-timeout 120;
        }

        disk {
                on-io-error detach;
                fencing resource-only;
        }

        net {
                cram-hmac-alg "sha1";
                shared-secret  "mydrbdlab";
        }

        syncer {
                rate  100M;
        }
}

Because resource-level fencing and split-brain handling are enabled in /etc/drbd.d/global_common.conf, a location constraint appears automatically in the crm configuration (CIB): after the primary node fails, the former primary is barred from taking the resources, so that no split brain or resource contention occurs when it comes back. Here we only want to verify that the resources can fail over, so delete that constraint:

[root@www1 drbd.d]# crm configure edit

(screenshot omitted) The two generated constraint lines must be deleted, on both nodes!

node www1.gjp.com \

        attributes standby="on"  

node www2.gjp.com \

        attributes standby="off"

primitive drbd_web_FS ocf:heartbeat:Filesystem \

        params device="/dev/drbd0" directory="/web" fstype="ext3"

primitive httpd_drbd_web ocf:heartbeat:drbd \

        params drbd_resource="web" \

        op monitor interval="60s" role="Master" timeout="40s" \

        op monitor interval="70s" role="Slave" timeout="40s"

primitive webip ocf:heartbeat:IPaddr \

        params ip="192.168.2.66"

primitive webserver lsb:httpd

group web webip webserver

ms MS_Webdrbd httpd_drbd_web \

        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"

colocation drbd_web_FS_on_MS_Webdrbd inf: drbd_web_FS MS_Webdrbd:Master

order drbd_web_FS_after_MS_Webdrbd inf: MS_Webdrbd:promote drbd_web_FS:start

property $id="cib-bootstrap-options" \

        dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \

        cluster-infrastructure="openais" \

        expected-quorum-votes="2" \

        stonith-enabled="false" \

        no-quorum-policy="ignore"

[root@www1 drbd.d]# crm status

============

Last updated: Sun Oct 28 15:55:51 2012

Stack: openais

Current DC: www1.gjp.com - partition with quorum

Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f

2 Nodes configured, 2 expected votes

3 Resources configured.

Resource Group: web

     webip    (ocf::heartbeat:IPaddr):    Started www1.gjp.com

     webserver    (lsb:httpd):    Started www1.gjp.com

drbd_web_FS    (ocf::heartbeat:Filesystem):    Started www1.gjp.com

Master/Slave Set: MS_Webdrbd [httpd_drbd_web]

     Masters: [ www1.gjp.com ]

     Stopped: [ httpd_drbd_web:1 ]

[root@www1 drbd.d]# service drbd status

drbd driver loaded OK; device status:

version: 8.3.8 (api:88/proto:86-94)

GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16

m:res  cs            ro               ds                         p                 mounted  fstype

0:web  WFConnection  Primary/Unknown  UpToDate/Outdated  C  /web     ext3

[root@www1 drbd.d]# crm status

Last updated: Sun Oct 28 16:08:38 2012

    webip    (ocf::heartbeat:IPaddr):    Started www1.gjp.com

     webserver    (lsb:httpd):    Started www1.gjp.com

drbd_web_FS    (ocf::heartbeat:Filesystem):    Started www2.gjp.com

     Masters: [ www2.gjp.com ]

     Slaves: [ www1.gjp.com ]

A split brain has occurred. It was resolved as follows (screenshots omitted), after which the recovery can be watched:

[root@www1 drbd.d]# watch -n 1 'crm status'

(output omitted)
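The screenshots of the recovery are missing. For reference, the usual DRBD 8.3 split-brain recovery looks roughly like this, assuming (as the status above suggests) that www2 is the victim whose changes are discarded:

```
# On the node whose data is discarded (here www2):
drbdadm secondary web
drbdadm -- --discard-my-data connect web

# On the surviving node (here www1), only if it also shows StandAlone:
drbdadm connect web
```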

The nodes can synchronize again; check the mount points:

[root@www1 drbd.d]# mount

/dev/sda2 on / type ext3 (rw)

proc on /proc type proc (rw)

sysfs on /sys type sysfs (rw)

devpts on /dev/pts type devpts (rw,gid=5,mode=620)

/dev/sda1 on /boot type ext3 (rw)

tmpfs on /dev/shm type tmpfs (rw)

none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)

sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

(rest of the output omitted)

Checking the state on www2:

[root@www2 drbd.d]# mount

sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

[root@www2 drbd.d]# service httpd status

httpd is stopped

[root@www2 drbd.d]# service drbd status

drbd driver loaded OK; device status:

GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16

m:res  cs          ro                 ds                 p      mounted  fstype

0:web  StandAlone  Secondary/Unknown  UpToDate/Outdated  r----

Simulate the failure of www1 (it is put into standby; screenshot omitted):

[root@www2 drbd.d]# crm status

Last updated: Sun Oct 28 17:25:27 2012

Node www1.gjp.com: standby

Online: [ www2.gjp.com ]

     webip    (ocf::heartbeat:IPaddr):    Started www2.gjp.com

     webserver    (lsb:httpd):    Started www2.gjp.com

     Stopped: [ httpd_drbd_web:0 ]

drbd_web_FS    (ocf::heartbeat:Filesystem):    Started www2.gjp.com

[root@www2 drbd.d]# service httpd status

httpd (pid  8509) is running...

(screenshot omitted) The site is still reachable!

eth0      Link encap:Ethernet  HWaddr 00:0C:29:99:12:74 

          inet addr:192.168.2.2  Bcast:192.168.2.255  Mask:255.255.255.0

          inet6 addr: fe80::20c:29ff:fe99:1274/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:192191 errors:0 dropped:0 overruns:0 frame:0

          TX packets:103068 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:121841514 (116.1 MiB)  TX bytes:13390418 (12.7 MiB)

          Interrupt:67 Base address:0x2000