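Implementing Highly Available NFS Shared Directories

Requirements

New solutions come out of real needs. For a platform team, the key question is how to offer users cluster resources that are stable, secure, and efficient. This change brings the platform's software installation path and the users' home directories under unified management, so that users can log in to the platform around the clock (7x24).

The platform uses OpenLDAP for centralized user management. Since it replaced NIS it has not had a single outage, so it has proven fairly stable and reliable. Managing users centrally through OpenLDAP requires every node in the platform to share one /home directory.

In addition, all software on the platform is installed as modules, so versions can be loaded and swapped flexibly. This also requires every node to share the same storage path, /opt. Without shared storage, the directory would have to be synchronized periodically to keep its contents identical on all nodes, which is costly to manage and maintain.

Solution

Stability requires redundant nodes. On the hardware side, this is handled by a disk array with dual controllers and multipath configured, plus at least two front-end nodes that back each other up. On the software side, Corosync+Pacemaker provides the HA solution.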

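Getting to Work

Configuring Storage and Servers

1. Hardware configuration

One DELL MD-series disk array and two R720 servers

Array interface: 12Gb SAS

Disks: 12 x 4 TB 7.2k NL-SAS

RAID 6 in a 9+2 layout, with 1 hot spare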
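
As a rough sanity check on this layout: in a 9+2 RAID 6 group, two disks' worth of space goes to parity, so usable capacity is about 9 x 4 TB. The sketch below does that arithmetic, assuming the "4T" on the disk label means decimal terabytes (10^12 bytes) and converting to binary TiB, which is what most Linux tools report. The variable names are illustrative only.

```shell
#!/bin/sh
# Usable capacity of a 9+2 RAID 6 group built from 4 TB disks.
# Assumption: "4T" means 4 * 10^12 bytes; the result is printed in TiB (2^40 bytes).
data_disks=9
disk_tb=4
usable_tib=$(awk -v n="$data_disks" -v tb="$disk_tb" \
    'BEGIN { printf "%.2f", n * tb * 1e12 / (2 ^ 40) }')
echo "usable capacity: ${usable_tib} TiB"
```

The result is a little under 33 TiB, which lines up with the 5.0T and 28T virtual disks that appear in the multipath output in the storage mapping step.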

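2. IP address layout

The array has two management IPs: 192.168.242.38/192.168.242.39

Server IPs: 192.168.242.3/192.168.242.4

Hostnames: hpc-242-003/hpc-242-004

Virtual IP: 192.168.242.40

Heartbeat addresses: 10.0.0.3/10.0.0.4

A single network cable directly connects one NIC on each of the two servers.

3. Storage mapping

Two VDs were created on the RAID 6 group made up of 11 disks, one for /opt and one for /home. Both VDs must be mapped to both servers at the same time. The end result looks like this: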

# multipath -ll
3600a0980006e2e77000001a3563a2f05 dm-1 DELL,MD34xx
size=5.0T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 7:0:0:1 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 7:0:1:1 sde 8:64 active ready running
3600a0980006e2e77000001a6563a2f24 dm-0 DELL,MD34xx
size=28T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 7:0:0:0 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 7:0:1:0 sdd 8:48 active ready running

The same devices are visible on the other node:

# multipath -ll
3600a0980006e2e77000001a3563a2f05 dm-0 DELL,MD34xx
size=5.0T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 5:0:1:1 sde 8:64 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 5:0:0:1 sdc 8:32 active ready running
3600a0980006e2e77000001a6563a2f24 dm-1 DELL,MD34xx
size=28T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 5:0:1:0 sdd 8:48 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 5:0:0:0 sdb 8:16 active ready running

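That completes the storage mapping. The exact procedure is not covered in detail here, since it differs from one storage array to another. The general principle is to bind hosts, SAS ports, and VDs together in the array's management interface: here the two VDs form one group, the two hosts form another, and the two groups are mapped to each other over the SAS links.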
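
To confirm that both servers really see the same two virtual disks, it can help to reduce the multipath -ll output to WWID and size pairs and compare the result across nodes. The following is a minimal sketch; mp_summary is a hypothetical helper name, and it assumes the first field of each map line is the WWID, as in the output above.

```shell
#!/bin/sh
# Reduce "multipath -ll" output (read from stdin) to sorted "WWID size" pairs.
# Assumption: map lines look like "<wwid> dm-N <vendor,product>" and are
# followed by a line starting with "size=".
mp_summary() {
    awk '
        $2 ~ /^dm-[0-9]+$/ { wwid = $1; next }
        wwid != "" && $1 ~ /^size=/ {
            sub(/^size=/, "", $1)
            print wwid, $1
            wwid = ""
        }
    ' | sort
}

# On a server with multipath installed, print the summary for this node.
if command -v multipath >/dev/null 2>&1; then
    multipath -ll | mp_summary
fi
```

Running this on both servers and comparing the two outputs should give identical lines, even though the dm-N numbers and sdX names differ between the nodes.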

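4. Software installation and configuration

The software to install is the multipath package (device-mapper-multipath) and the HA components, Corosync+Pacemaker.

Installing multipath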

yum install device-mapper-multipath.x86_64

After installation, multipath has no configuration file by default. Copy /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf to /etc:

cp /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf  /etc/multipath.conf

After copying it, change one parameter:

defaults {
        user_friendly_names yes
}

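Change yes to no; otherwise the devices are given aliases (such as mpatha) instead of being named by WWID. Then start the multipath service: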

/etc/init.d/multipathd start

After the service starts, multipath -ll may still not show the disk devices. Rebooting the system resolves this.
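
Since the same edit has to be made on both servers, it may be convenient to script it. The sketch below is one way to do that with sed; disable_friendly_names is a hypothetical helper name, and it only touches an uncommented user_friendly_names line whose value is yes.

```shell
#!/bin/sh
# Switch user_friendly_names from yes to no in a multipath.conf-style file.
# A backup copy is kept next to the original with a .bak suffix.
disable_friendly_names() {
    conf=$1
    sed -i.bak \
        's/^\([[:space:]]*user_friendly_names[[:space:]][[:space:]]*\)yes/\1no/' \
        "$conf"
}

# Example (run as root on each node):
#   disable_friendly_names /etc/multipath.conf
```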

Installing Corosync+Pacemaker

yum install -y corosync

Configuring corosync

# cat /etc/corosync/corosync.conf
# Autoconfigured by Intel Manager for Lustre
# DO NOT EDIT -- CHANGES MADE HERE WILL BE LOST
compatibility: whitetank

totem {
    version: 2
    secauth: off
    threads: 0
    token: 5000
    token_retransmits_before_loss_const: 10
    max_messages: 20
    rrp_mode: active
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0
        mcastaddr: 226.94.0.1
        mcastport: 4870
        ttl: 1
    }

}

logging {
    fileline: off
    to_stderr: no
    to_logfile: no
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}

amf {
    mode: disabled
}

service {
    name: pacemaker
    ver: 1
}

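Generate the key file (it is only used for authentication when secauth is on; the configuration above has it set to off):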

corosync-keygen

Copy the key file and the configuration file to the other node:

scp /etc/corosync/authkey 192.168.242.4:/etc/corosync/
scp /etc/corosync/corosync.conf 192.168.242.4:/etc/corosync/corosync.conf

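That completes the corosync configuration. Next comes pacemaker.

Installing pacemaker

yum install -y pacemaker

After installation, pacemaker does not include the crm interactive shell by default; crmsh has to be installed separately.

Installing crmsh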

wget  http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/x86_64/crmsh-1.2.6-0.rc2.2.1.x86_64.rpm

rpm -ivh crmsh-1.2.6-0.rc2.2.1.x86_64.rpm

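This reports a number of missing dependencies, so install them first:

yum install -y python-dateutil python-lxml

Then run the install again:

rpm -ivh crmsh-1.2.6-0.rc2.2.1.x86_64.rpm

Starting the corosync and pacemaker services

The configuration file hooks pacemaker into corosync, so the expectation was that starting corosync would also start pacemaker. In my testing so far, however, the pacemaker service still has to be started by hand. That is consistent with ver: 1 in the service block above: with that setting corosync loads the plugin, but the pacemaker daemons are started by their own init script. When stopping, killing the corosync process is enough, because the pacemaker processes die with it.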

/etc/init.d/corosync start
/etc/init.d/pacemaker start

Once both nodes have been started, check that the cluster status is normal, then go on to add resources.
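
The text above does not say how the status was checked. One option is a one-shot crm_mon run (crm_mon -1) and a look at its Online: line. The helper below, online_nodes, is a hypothetical name; it assumes the "Online: [ node ... ]" line format that appears in the crm_mon output shown in the resource section of this article.

```shell
#!/bin/sh
# Print the node names listed on the "Online:" line of crm_mon output (stdin).
# Fields on that line are: "Online:", "[", node names..., "]".
online_nodes() {
    awk '$1 == "Online:" { for (i = 3; i < NF; i++) print $i }'
}

# On a cluster node, feed it a one-shot status report.
if command -v crm_mon >/dev/null 2>&1; then
    crm_mon -1 | online_nodes
fi
```

Both hpc-242-003 and hpc-242-004 should be listed before any resources are added.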

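Adding resources

The two VDs allocated from the array are added to pacemaker as resources. Because the storage is shared out over NFS, a virtual IP is also needed as a managed resource.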

crm configure primitive optdir ocf:heartbeat:Filesystem params device=/dev/mapper/3600a0980006e2e77000001a3563a2f05p1 directory=/opt fstype=xfs op start timeout=60 op stop timeout=60

crm configure primitive homedir ocf:heartbeat:Filesystem params device=/dev/mapper/3600a0980006e2e77000001a6563a2f24p1 directory=/home options=rw,usrquota,grpquota fstype=xfs op start timeout=60 op stop timeout=60

crm configure primitive vip ocf:heartbeat:IPaddr params ip=192.168.242.40 nic=eth0:0 cidr_netmask=24
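
The article shares the storage over NFS but does not show the export configuration itself. As a rough sketch of what that could look like, the helper below prints two /etc/exports entries. Everything in it is an assumption rather than the author's setup: the helper name print_exports, the client network 192.168.242.0/24, and the export options. The explicit fsid values are there because the two servers enumerate the same disks under different dm-N numbers, and a fixed fsid can help keep client file handles valid across a failover.

```shell
#!/bin/sh
# Emit /etc/exports entries for the two shared directories.
# Assumptions (not from the article): clients live in 192.168.242.0/24,
# and the export options shown here suit the site.
print_exports() {
    net=${1:-192.168.242.0/24}
    printf '%s %s(rw,sync,fsid=1)\n' /home "$net"
    printf '%s %s(rw,sync,fsid=2)\n' /opt "$net"
}

print_exports
# To apply on both servers (as root):
#   print_exports > /etc/exports && exportfs -ra
# Clients would then mount through the virtual IP, for example:
#   mount -t nfs 192.168.242.40:/home /home
```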

Check the resource status:

[root@hpc-242-004 corosync]# crm resource list
 Resource Group: webservice
     homedir    (ocf::heartbeat:Filesystem):    Started 
     vip    (ocf::heartbeat:IPaddr):    Started 
     optdir (ocf::heartbeat:Filesystem):    Started

Or with crm_mon:

Last updated: Fri Nov  6 17:19:59 2015
Last change: Fri Nov  6 10:16:49 2015
Stack: classic openais (with plugin)
Current DC: hpc-242-003 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured


Online: [ hpc-242-003 hpc-242-004 ]

 Resource Group: webservice
     homedir    (ocf::heartbeat:Filesystem):    Started hpc-242-003
     vip        (ocf::heartbeat:IPaddr):        Started hpc-242-003
     optdir     (ocf::heartbeat:Filesystem):    Started hpc-242-003

The key point here is how resources are added. First, list the resource classes the cluster supports:

[root@hpc-242-003 ~]# crm
crm(live)# ra
crm(live)ra# classes
lsb
ocf / heartbeat pacemaker
service
stonith
crm(live)ra#

List all resource agents in a given class:

crm(live)ra# list lsb
auditd                blk-availability      corosync              corosync-notifyd      crond                 dkms_autoinstaller    functions
gmond                 halt                  htcacheclean          httpd                 ip6tables             iptables              iscsi
iscsid                kdump                 killall               lvm2-lvmetad          lvm2-monitor          mdmonitor             multipathd
netconsole            netfs                 network               nfs                   nfslock               nscd                  nslcd
pacemaker             postfix               quota_nld             rdisc                 restorecond           rpcbind               rpcgssd
rpcidmapd             rpcsvcgssd            rsyslog               salt-minion           sandbox               saslauthd             single
sshd                  sysstat               udev-post             winbind               zabbix_agentd         
crm(live)ra# list ocf heartbeat
CTDB               Delay              Dummy              Filesystem         IPaddr             IPaddr2            IPsrcaddr          LVM
MailTo             Route              SendArp            Squid              VirtualDomain      Xinetd             apache             conntrackd
db2                dhcpd              ethmonitor         exportfs           iSCSILogicalUnit   mysql              named              nfsnotify
nfsserver          pgsql              postfix            rsyncd             symlink            tomcat             
crm(live)ra#

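To see how a particular resource agent is configured, use the info command, which shows in detail the parameters available when adding the resource: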

crm(live)ra# info ocf:heartbeat:Filesystem
Manages filesystem mounts (ocf:heartbeat:Filesystem)

Resource script for Filesystem. It manages a Filesystem on a
shared storage medium. 

The standard monitor operation of depth 0 (also known as probe)
checks if the filesystem is mounted. If you want deeper tests,
set OCF_CHECK_LEVEL to one of the following values:

10: read first 16 blocks of the device (raw read)

This doesn't exercise the filesystem at all, but the device on
which the filesystem lives. This is noop for non-block devices
such as NFS, SMBFS, or bind mounts.

20: test if a status file can be written and read

The status file must be writable by root. This is not always the
case with an NFS mount, as NFS exports usually have the
"root_squash" option set. In such a setup, you must either use
read-only monitoring (depth=10), export with "no_root_squash" on
your NFS server, or grant world write permissions on the
directory where the status file is to be placed.

Parameters (*: required, []: default):

device* (string): block device
    The name of block device for the filesystem, or -U, -L options for mount, or NFS mount specification.

directory* (string): mount point
    The mount point for the filesystem.

fstype* (string): filesystem type
    The type of filesystem to be mounted.

options (string): 
    Any extra options to be given as -o options to mount.

    For bind mounts, add "bind" here and set fstype to "none".

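Next, the three resources are placed in one resource group, webservice, so that all three always start together on the same node. The group is created like this: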

[root@hpc-242-004 corosync]# crm
crm(live)# configure
crm(live)configure# group webservice vip homedir optdir

This defines the webservice resource group and adds the resources to it.

To make failover reliable, a few more parameters need to be set:

[root@hpc-242-004 corosync]# crm
crm(live)# configure
crm(live)configure# property no-quorum-policy=ignore 
crm(live)configure# verify
crm(live)configure# commit

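verify checks that the configuration is valid, and commit applies the changes. Resources configured in the interactive crm shell do not take effect until they are committed; once committed, they are written to the cib.xml configuration file.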
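
The reason no-quorum-policy=ignore matters in a two-node cluster is simple vote arithmetic: a partition has quorum only when it holds more than half of the expected votes, and the lone survivor of two nodes holds exactly half. The sketch below illustrates this; has_quorum is a hypothetical helper for the arithmetic, not a Pacemaker command.

```shell
#!/bin/sh
# Succeeds when "votes" is a strict majority of "expected" votes.
has_quorum() {
    votes=$1
    expected=$2
    [ $((votes * 2)) -gt "$expected" ]
}

for votes in 2 1; do
    if has_quorum "$votes" 2; then
        echo "$votes of 2 votes: quorum"
    else
        echo "$votes of 2 votes: no quorum"
    fi
done
```

With the default policy, a partition without quorum stops its resources, so the surviving node would not take over. Setting the policy to ignore tells Pacemaker to keep managing resources anyway.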

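One question needs some thought: after resources fail over from a failed node to a healthy one, should they move back once the failed node recovers? Here they should not, because every switch briefly interrupts the service and has some impact on users. This is what resource stickiness is for.

Resource stickiness expresses how strongly a resource prefers to stay on the node where it is currently running.

Stickiness values and their effects:

0: The default. The resource is placed at the most suitable location in the system, which means it is moved when a "better" or less loaded node becomes available. This is essentially equivalent to automatic failback, except that the resource may move to a node other than the one it was previously active on.

Greater than 0: The resource prefers to stay where it is, but will move if a more suitable node becomes available. Higher values indicate a stronger preference to stay.

Less than 0: The resource prefers to move away from its current location. Higher absolute values indicate a stronger preference to move.

INFINITY: The resource always stays where it is unless it is forced off because the node is no longer eligible to run it (node shutdown, node standby, reaching the migration-threshold, or a configuration change). This is almost equivalent to disabling automatic failback completely.

-INFINITY: The resource always moves away from its current location.

A default stickiness for all resources is set with rsc_defaults resource-stickiness. The configuration here uses 0; note that, according to the list above, 0 does not prevent failback, and it takes a positive value or INFINITY to keep resources where they are: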

crm(live)configure# rsc_defaults resource-stickiness=0
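
The effect of stickiness can be pictured as score arithmetic: the stickiness value is added to the score of the node where the resource currently runs, and the resource moves only if another node ends up with a higher total. The sketch below is a deliberately simplified model of that comparison, not Pacemaker's actual placement algorithm. placement and its arguments are hypothetical names, and INFINITY is written as 1000000, the numeric value Pacemaker uses for it.

```shell
#!/bin/sh
# placement <current-node score> <stickiness> <other-node score>
# Prints "stay", "move", or "tie" (a tie means the scores alone do not decide).
placement() {
    here=$(( $1 + $2 ))
    there=$3
    if [ "$here" -gt "$there" ]; then
        echo stay
    elif [ "$here" -lt "$there" ]; then
        echo move
    else
        echo tie
    fi
}

placement 0 0 0          # default stickiness, no node preference
placement 0 100 0        # positive stickiness
placement 0 100 200      # a stronger preference for the other node
placement 0 1000000 200  # INFINITY stickiness
```

In this model, a stickiness of 0 leaves the outcome open after the failed node returns, while any positive value tips it toward staying put unless some other preference outweighs it.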

With the configuration complete, view the full configuration:

[root@hpc-242-004 corosync]# crm
crm(live)# configure
crm(live)configure# show
node hpc-242-003
node hpc-242-004
primitive homedir Filesystem \
    params device="/dev/mapper/3600a0980006e2e77000001a6563a2f24p1" directory="/home" options="rw,usrquota,grpquota" fstype=xfs \
    op start timeout=10 interval=0 \
    op stop timeout=10 interval=0
primitive optdir Filesystem \
    params device="/dev/mapper/3600a0980006e2e77000001a3563a2f05p1" directory="/opt" fstype=xfs \
    op start timeout=10 interval=0 \
    op stop timeout=10 interval=0
primitive vip IPaddr \
    params ip=192.168.242.40 nic="eth0:0" cidr_netmask=24
group webservice homedir vip optdir \
    meta target-role=Started
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    expected-quorum-votes=2 \
    no-quorum-policy=ignore \
    symmetric-cluster=true \
    cluster-infrastructure="classic openais (with plugin)" \
    last-lrm-refresh=1446713502 \
    stonith-enabled=false
rsc_defaults rsc-options: \
    resource-stickiness=0
rsc_defaults rsc_defaults-options: \
    failure-timeout=20m \
    migration-threshold=3

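Summary

After configuration, the setup was tested by shutting down or rebooting the active machine. The webservice resource group started on the other node, and both disks were remounted on the new active node. Because clients mount through the virtual IP, they see a brief interruption while the resources move, but the overall impact is small.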
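
One way to put a number on that brief interruption is to poll the service from a client during the failover test and count how many attempts pass before it answers again. A minimal sketch follows; probe_until_up is a hypothetical helper, and the ping invocation in the comment assumes the Linux iputils flags -c (count) and -W (timeout in seconds).

```shell
#!/bin/sh
# probe_until_up <max_tries> <command...>
# Runs the command once per second until it succeeds, then prints the number
# of attempts used. Returns non-zero if it never succeeds within max_tries.
probe_until_up() {
    max=$1
    shift
    tries=0
    while [ "$tries" -lt "$max" ]; do
        tries=$((tries + 1))
        if "$@" >/dev/null 2>&1; then
            echo "$tries"
            return 0
        fi
        sleep 1
    done
    return 1
}

# Example, started on a client right after the active node is shut down:
#   probe_until_up 120 ping -c 1 -W 1 192.168.242.40
```

The attempt count is only a rough proxy: the virtual IP may answer before the filesystems are mounted and exported again on the new node.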

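Corosync+Pacemaker is very powerful. As mentioned earlier, it can manage a wide range of resources, such as httpd, mysql, and oracle, and many applications can be made highly available with it. In testing, however, some stability problems did show up: when the corosync process on the active node is killed directly, rather than the node being rebooted or shut down, failover does not go cleanly. This needs further investigation.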
