
A Summary of Deploying Permissions on a Hadoop Cluster 1. Before You Begin 2. Installing Kerberos 3. Integrating Hadoop with Kerberos 4. Installing LDAP 5. Integrating Sentry 6. How to Add a New User and Set Permissions?

This is a summary post describing the steps and caveats for quickly deploying permissions on a Hadoop cluster, including the process of integrating the individual Hadoop components with kerberos, openldap, and sentry. If you want the details of each procedure, see the other posts on this blog.

1. Before You Begin

The hadoop cluster has three nodes in total; each node's IP, hostname, and roles are as follows:

192.168.56.121 cdh1 NameNode, kerberos-server, ldap-server, sentry-store
192.168.56.122 cdh2 DataNode, yarn, hive, impala
192.168.56.123 cdh3 DataNode, yarn, hive, impala
           

A few notes:

  • The operating system is CentOS 6.2
  • The Hadoop version is CDH 5.2
  • Use lowercase hostnames: kerberos is case-sensitive, and hadoop substitutes the lowercased hostname for _HOST, while impala substitutes the hostname into _HOST as-is
  • Before you begin, make sure the hadoop cluster is deployed and installed successfully. Whether or not HA is configured, plan out the role of each node. For simplicity, I use a three-node cluster as the example here; you can follow this post and adapt it to your actual situation.
  • Make sure the firewall is disabled, and that clocks are synchronized within the cluster and with the kerberos and ldap servers.
  • cdh1 is the management node, so set up passwordless login from cdh1 to every node in the cluster, including cdh1 itself (see the sketch after this list).
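
A minimal sketch of that passwordless-login setup, assuming everything runs as root from cdh1:

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for ip in 121 122 123; do ssh-copy-id root@192.168.56.$ip; done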

The hosts file on every node in the cluster is as follows:

$ cat /etc/hosts
127.0.0.1       localhost

192.168.56.121 cdh1
192.168.56.122 cdh2
192.168.56.123 cdh3
           

To make the cluster easier to manage, cdh1 is used as the management node, and a few scripts live in the /opt/shell directory. /opt/shell/cmd.sh runs a command on every node:

$ cat /opt/shell/cmd.sh

#!/bin/sh

for node in 121 122 123;do
    echo "==============="192.168.56.$node"==============="
    ssh 192.168.56.$node "$1"
done
           

/opt/shell/syn.sh syncs a file or directory to every node:

$ cat /opt/shell/syn.sh

#!/bin/sh

for node in 121 122 123;do
    echo "==============="192.168.56.$node"==============="
    scp -r "$1" 192.168.56.$node:"$2"
done
           

/opt/shell/cluster.sh manages each cluster service in batch:

$ cat /opt/shell/cluster.sh
#!/bin/sh
for node in 121 122 123;do
    echo "==============="192.168.56.$node"==============="
    ssh 192.168.56.$node 'for src in `ls /etc/init.d|grep '$1'`;do service $src '$2'; done'
done
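
For example, a quick check that the three nodes' clocks agree, using cmd.sh as defined above:

sh /opt/shell/cmd.sh "date"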
           

2. Installing Kerberos

On the cdh1 node, modify /etc/krb5.conf as follows:

[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log

[libdefaults]
default_realm = JAVACHEN.COM
dns_lookup_realm = false
dns_lookup_kdc = false
clockskew = 120
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
renewable = true
udp_preference_limit = 1
default_tgs_enctypes = arcfour-hmac
default_tkt_enctypes = arcfour-hmac

[realms]
JAVACHEN.COM = {
 kdc = cdh1:88
 admin_server = cdh1:749
}

[domain_realm]
.javachen.com = JAVACHEN.COM
javachen.com = JAVACHEN.COM

[kdc]
profile=/var/kerberos/krb5kdc/kdc.conf
           

Modify /var/kerberos/krb5kdc/kdc.conf as follows:

[kdcdefaults]
 v4_mode = nopreauth
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 JAVACHEN.COM = {
  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes =  des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal des-cbc-crc:v4 des-cbc-crc:afs3
  max_life = 24h
  max_renewable_life = 7d
  default_principal_flags = +renewable, +forwardable
 }
           

Modify /var/kerberos/krb5kdc/kadm5.acl as follows:

*/admin@JAVACHEN.COM  *
           

Sync /etc/krb5.conf from cdh1 to every node in the cluster:

sh /opt/shell/syn.sh /etc/krb5.conf /etc/krb5.conf
           

On the kerberos server node, initialize kerberos with the following script:

yum install krb5-server krb5-libs krb5-auth-dialog krb5-workstation  -y

rm -rf /var/kerberos/krb5kdc/*.keytab /var/kerberos/krb5kdc/prin*

kdb5_util create -r JAVACHEN.COM -s

chkconfig --level 35 krb5kdc on
chkconfig --level 35 kadmin on
service krb5kdc restart
service kadmin restart

echo -e "root\nroot" | kadmin.local -q "addprinc root/admin"

DNS=JAVACHEN.COM
hostname=`hostname -i`

#read the hostnames whose IPs start with 192.168.56 from /etc/hosts, excluding this machine (the kerberos server)
for host in  `cat /etc/hosts|grep 192.168.56|grep -v $hostname|awk '{print $2}'` ;do
    for user in hdfs; do
        kadmin.local -q "addprinc -randkey $user/[email protected]$DNS"
        kadmin.local -q "xst -k /var/kerberos/krb5kdc/$user-un.keytab $user/[email protected]$DNS"
    done
    for user in HTTP hive yarn mapred impala zookeeper sentry llama zkcli ; do
        kadmin.local -q "addprinc -randkey $user/[email protected]$DNS"
        kadmin.local -q "xst -k /var/kerberos/krb5kdc/$user.keytab $user/[email protected]$DNS"
    done
done

cd /var/kerberos/krb5kdc/
echo -e "rkt hdfs-un.keytab\nrkt HTTP.keytab\nwkt hdfs.keytab" | ktutil

#after kerberos is re-initialized, the following is also needed for the ldap integration

kadmin.local -q "addprinc [email protected]"
kadmin.local -q "addprinc -randkey ldap/[email protected]"
kadmin.local -q "ktadd -k /etc/openldap/ldap.keytab ldap/[email protected]"

/etc/init.d/slapd restart

#test that ldap works
ldapsearch -x -b 'dc=javachen,dc=com'
           

Save it as /root/init_kerberos.sh, then run it:

sh /root/init_kerberos.sh
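
To verify the initialization, list the principals and inspect the merged hdfs keytab:

kadmin.local -q "listprincs"
klist -kt /var/kerberos/krb5kdc/hdfs.keytab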
           

Sync the keytab files generated above to the other nodes and set their permissions. Note that the sync commands below read from /opt/keytab while the init script writes to /var/kerberos/krb5kdc, so a staging copy comes first (an assumed step that the original omits):

# assumed staging step: the init script leaves the keytabs in /var/kerberos/krb5kdc
mkdir -p /opt/keytab && cp /var/kerberos/krb5kdc/*.keytab /opt/keytab/

sh /opt/shell/syn.sh /opt/keytab/hdfs.keytab /etc/hadoop/conf/
sh /opt/shell/syn.sh /opt/keytab/mapred.keytab /etc/hadoop/conf/
sh /opt/shell/syn.sh /opt/keytab/yarn.keytab /etc/hadoop/conf/
sh /opt/shell/syn.sh /opt/keytab/hive.keytab /etc/hive/conf/
sh /opt/shell/syn.sh /opt/keytab/impala.keytab /etc/impala/conf/
sh /opt/shell/syn.sh /opt/keytab/zookeeper.keytab /etc/zookeeper/conf/
sh /opt/shell/syn.sh /opt/keytab/zkcli.keytab /etc/zookeeper/conf/
sh /opt/shell/syn.sh /opt/keytab/sentry.keytab /etc/sentry/conf/

sh /opt/shell/cmd.sh "chown hdfs:hadoop /etc/hadoop/conf/hdfs.keytab ;chmod 400 /etc/hadoop/conf/*.keytab"
sh /opt/shell/cmd.sh "chown mapred:hadoop /etc/hadoop/conf/mapred.keytab ;chmod 400 /etc/hadoop/conf/*.keytab"
sh /opt/shell/cmd.sh "chown yarn:hadoop /etc/hadoop/conf/yarn.keytab ;chmod 400 /etc/hadoop/conf/*.keytab"
sh /opt/shell/cmd.sh "chown hive:hadoop /etc/hive/conf/hive.keytab ;chmod 400 /etc/hive/conf/*.keytab"
sh /opt/shell/cmd.sh "chown impala:hadoop /etc/impala/conf/impala.keytab ;chmod 400 /etc/impala/conf/*.keytab"
sh /opt/shell/cmd.sh "chown zookeeper:hadoop /etc/zookeeper/conf/*.keytab ;chmod 400 /etc/zookeeper/conf/*.keytab"

# sentry is installed only on the cdh1 node
chown sentry:hadoop /etc/sentry/conf/*.keytab ;chmod 400 /etc/sentry/conf/*.keytab
           

Install the kerberos client on every node in the cluster, then obtain a ticket for the root/admin user on all nodes in batch.
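
A sketch of both steps using the batch script from above (the client package names, and the root password set by the addprinc command earlier, are assumptions based on this setup):

sh /opt/shell/cmd.sh "yum install krb5-workstation krb5-libs -y"
sh /opt/shell/cmd.sh "echo root | kinit root/admin"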

3. Integrating Hadoop with Kerberos

Update the JCE policy files on every node, modify /etc/default/hadoop-hdfs-datanode, and update the configuration files for hdfs, yarn, mapred, and hive.
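
The common switch across these files is enabling Kerberos authentication and authorization in core-site.xml; a minimal sketch (the full per-service properties are covered in the detailed posts):

<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>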

If HA is configured, integrate zookeeper with kerberos first.

Sync the configuration files:

sh /opt/shell/syn.sh /etc/hadoop/conf /etc/hadoop
sh /opt/shell/syn.sh /etc/zookeeper/conf /etc/zookeeper

sh /opt/shell/cmd.sh "cd /etc/hadoop/conf/; chown root:yarn container-executor.cfg ; chmod 400 container-executor.cfg"

sh /opt/shell/syn.sh /etc/hive/conf /etc/hive
           

Next, obtain the ticket for each service in turn and start the corresponding service. I created a script, /opt/shell/manager_cluster.sh, to do this:

#!/bin/bash

role=$1
dir=$role
command=$2

if [ X"$role" == X"hdfs" ];then
    dir=hadoop
fi

if [ X"$role" == X"yarn" ];then
        dir=hadoop
fi

if [ X"$role" == X"mapred" ];then
        dir=hadoop
fi

echo $dir $role $command
for node in 121 122 123 ;do
    echo "========192.168.56.$node========"
    ssh 192.168.56.$node '
        host=`hostname -f| tr "[:upper:]" "[:lower:]"`
        path="'$role'/$host"
        #echo $path
        principal=`klist -k /etc/'$dir'/conf/'$role'.keytab | grep $path | head -n1 | cut -d " " -f5`
        echo $principal
        if [ X"$principal" == X ]; then
            principal=`klist -k /etc/'$dir'/conf/'$role'.keytab | grep $path | head -n1 | cut -d " " -f4`
            echo $principal
            if [ X"$principal" == X ]; then
                    echo "Failed to get hdfs Kerberos principal"
                    exit 1
            fi
        fi
        kinit -kt /etc/'$dir'/conf/'$role'.keytab $principal
        if [ $? -ne 0 ]; then
                echo "Failed to login as hdfs by kinit command"
                exit 1
        fi
        kinit -R
        for src in `ls /etc/init.d|grep '$role'`;do service $src '$command'; done

    '
done
           

Startup commands:

# start zookeeper
sh /opt/shell/manager_cluster.sh zookeeper restart

# obtain the ticket for the hdfs service
sh /opt/shell/manager_cluster.sh hdfs status

# use the plain cluster script to start hadoop-hdfs-zkfc, hadoop-hdfs-journalnode, hadoop-hdfs-namenode, and hadoop-hdfs-datanode in turn
sh /opt/shell/cluster.sh hadoop-hdfs-zkfc restart
sh /opt/shell/cluster.sh hadoop-hdfs-journalnode restart
sh /opt/shell/cluster.sh hadoop-hdfs-namenode restart
sh /opt/shell/cluster.sh hadoop-hdfs-datanode restart

sh /opt/shell/manager_cluster.sh yarn restart
sh /opt/shell/manager_cluster.sh mapred restart

sh /opt/shell/manager_cluster.sh hive restart
           

Modify the impala configuration files, sync them to the other nodes, then start the impala services:

\cp /etc/hadoop/conf/core-site.xml /etc/impala/conf/
\cp /etc/hadoop/conf/hdfs-site.xml /etc/impala/conf/
\cp /etc/hive/conf/hive-site.xml /etc/impala/conf/

sh /opt/shell/syn.sh /etc/impala/conf /etc/impala/
sh /opt/shell/syn.sh /etc/default/impala /etc/default/impala
sh /opt/shell/manager_cluster.sh impala restart
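
For reference, the Kerberos-related flags in /etc/default/impala look roughly like this (a sketch of just those two flags, not the complete variable; recall that impala substitutes the hostname into _HOST as-is):

IMPALA_SERVER_ARGS="--principal=impala/_HOST@JAVACHEN.COM --keytab_file=/etc/impala/conf/impala.keytab"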
           

At this point, the cluster should be up and running.

4. Installing LDAP

Use the following commands to quickly install ldap-server on the cdh1 node:

yum install db4 db4-utils db4-devel cyrus-sasl* krb5-server-ldap -y
yum install openldap openldap-servers openldap-clients openldap-devel compat-openldap -y

# rebuild the configuration database:
rm -rf /var/lib/ldap/*
cp /usr/share/openldap-servers/DB_CONFIG.example /var/lib/ldap/DB_CONFIG
chown -R ldap.ldap /var/lib/ldap

# back up the original slapd config
cp -rf /etc/openldap/slapd.d /etc/openldap/slapd.d.bak

cp /usr/share/doc/krb5-server-ldap-1.10.3/kerberos.schema /etc/openldap/schema/
touch /etc/openldap/slapd.conf

echo "include /etc/openldap/schema/corba.schema
include /etc/openldap/schema/core.schema
include /etc/openldap/schema/cosine.schema
include /etc/openldap/schema/duaconf.schema
include /etc/openldap/schema/dyngroup.schema
include /etc/openldap/schema/inetorgperson.schema
include /etc/openldap/schema/java.schema
include /etc/openldap/schema/misc.schema
include /etc/openldap/schema/nis.schema
include /etc/openldap/schema/openldap.schema
include /etc/openldap/schema/ppolicy.schema
include /etc/openldap/schema/collective.schema
include /etc/openldap/schema/kerberos.schema" > /etc/openldap/slapd.conf

echo -e "pidfile /var/run/openldap/slapd.pid\nargsfile /var/run/openldap/slapd.args" >> /etc/openldap/slapd.conf
slaptest -f /etc/openldap/slapd.conf -F /etc/openldap/slapd.d
chown -R ldap:ldap /etc/openldap/slapd.d && chmod -R 700 /etc/openldap/slapd.d

# enable and restart the service
chkconfig --add slapd
chkconfig --level 345 slapd on

/etc/init.d/slapd restart
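
If slapd came up, an anonymous root-DSE query is a quick sanity check:

/etc/init.d/slapd status
ldapsearch -x -b '' -s base '(objectclass=*)' namingContexts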
           

Integrate with kerberos:

# create the administrator user
kadmin.local -q "addprinc ldapadmin@JAVACHEN.COM"
kadmin.local -q "addprinc -randkey ldap/cdh1@JAVACHEN.COM"

rm -rf /etc/openldap/ldap.keytab
kadmin.local -q "ktadd -k /etc/openldap/ldap.keytab ldap/cdh1@JAVACHEN.COM"

chown -R ldap:ldap /etc/openldap/ldap.keytab
/etc/init.d/slapd restart
           

Create a modify.ldif file to update the database:

dn: olcDatabase={2}bdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=javachen,dc=com

dn: olcDatabase={2}bdb,cn=config
changetype: modify
replace: olcRootDN
# Temporary lines to allow initial setup
olcRootDN: uid=ldapadmin,ou=people,dc=javachen,dc=com

dn: olcDatabase={2}bdb,cn=config
changetype: modify
add: olcRootPW
olcRootPW: secret

dn: cn=config
changetype: modify
add: olcAuthzRegexp
olcAuthzRegexp: uid=([^,]*),cn=GSSAPI,cn=auth uid=$1,ou=people,dc=javachen,dc=com

dn: olcDatabase={2}bdb,cn=config
changetype: modify
add: olcAccess
# Everyone can read everything
olcAccess: {0}to dn.base="" by * read
# The ldapadm dn has full write access
olcAccess: {1}to * by dn="uid=ldapadmin,ou=people,dc=javachen,dc=com" write by * read
           

Run the following command to apply the changes:

ldapmodify -Y EXTERNAL -H ldapi:/// -f modify.ldif
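
To confirm the changes took effect, read the database entry back over the same EXTERNAL channel (a quick check, not a required step):

ldapsearch -Y EXTERNAL -H ldapi:/// -b "olcDatabase={2}bdb,cn=config"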
           

Add users and groups: create setup.ldif as follows:

dn: dc=javachen,dc=com
objectClass: top
objectClass: dcObject
objectclass: organization
o: javachen com
dc: javachen

dn: ou=people,dc=javachen,dc=com
objectclass: organizationalUnit
ou: people
description: Users

dn: ou=group,dc=javachen,dc=com
objectClass: organizationalUnit
ou: group

dn: uid=ldapadmin,ou=people,dc=javachen,dc=com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
cn: LDAP admin account
uid: ldapadmin
sn: ldapadmin
uidNumber: 1001
gidNumber: 100
homeDirectory: /home/ldap
loginShell: /bin/bash
           

Run the following command to import it into the database:

ldapadd -x -D "uid=ldapadmin,ou=people,dc=javachen,dc=com" -w secret -f setup.ldif
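
A quick check that the entries landed, via an anonymous search for the new account:

ldapsearch -x -b 'dc=javachen,dc=com' '(uid=ldapadmin)'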
           

Next, you can create some local system users on the ldap server and then import those users into the ldap service.

First install migrationtools, then change the default DNS domain and default base in /usr/share/migrationtools/migrate_common.ph; a scripted version of that edit follows.
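
The stock migrate_common.ph ships with padl.com as both defaults, so assuming an unmodified file the edit can be scripted:

sed -i 's/padl\.com/javachen.com/g' /usr/share/migrationtools/migrate_common.ph
sed -i 's/dc=padl,dc=com/dc=javachen,dc=com/g' /usr/share/migrationtools/migrate_common.ph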

# create the admin group
groupadd admin

# create the test and hive users, used later for testing sentry
useradd test
useradd hive
usermod -G admin test
usermod -G admin hive

# import the key users into ldap
grep -E "bi_|hive|test" /etc/passwd  >/opt/passwd.txt
/usr/share/migrationtools/migrate_passwd.pl /opt/passwd.txt /opt/passwd.ldif
ldapadd -x -D "uid=ldapadmin,ou=people,dc=javachen,dc=com" -w secret -f /opt/passwd.ldif

# import the admin group into ldap
grep -E "admin" /etc/group  >/opt/group.txt
/usr/share/migrationtools/migrate_group.pl /opt/group.txt /opt/group.ldif
ldapadd -x -D "uid=ldapadmin,ou=people,dc=javachen,dc=com" -w secret -f /opt/group.ldif
           

Then you can set a password for each user in turn, using the following command:

ldappasswd -x -D 'uid=ldapadmin,ou=people,dc=javachen,dc=com' -w secret "uid=hive,ou=people,dc=javachen,dc=com" -S
           

Also, these users and groups exist only on the ldap server; they need to be made visible on every hadoop node remotely, otherwise you have to create the matching users and groups locally on each node (the latter is what I did for testing).

5. Integrating Sentry

For this part, I recommend storing the rules in a database; the file-based storage is not recommended for production.

For the detailed configuration, see Integrating Impala and Hive with Sentry.

Connect to hive-server2 through beeline as hive/cdh1@JAVACHEN.COM and create some roles and groups:

CREATE ROLE admin_role;
GRANT ALL ON SERVER server1 TO ROLE admin_role;
GRANT ROLE admin_role TO GROUP admin;
GRANT ROLE admin_role TO GROUP hive;

CREATE ROLE test_role;
GRANT ALL ON DATABASE testdb TO ROLE test_role;
GRANT ALL ON DATABASE default TO ROLE test_role;
GRANT ROLE test_role TO GROUP test;
           

With the grants above, the admin and hive groups have administrator privileges on all databases, while the test group only has privileges on the testdb and default databases.

You can test the read/write permissions by passing different users to impala-shell via ldap; an example follows.
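
For example, connecting to the impalad on cdh2 as the test user (a sketch; --auth_creds_ok_in_clear is only needed when TLS is not enabled on impalad):

impala-shell -i cdh2 -l -u test --auth_creds_ok_in_clear -q "show databases"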

6. How to Add a New User and Set Permissions?

TODO

Enjoy it!
