Please credit the source when reposting: http://www.cnblogs.com/xiaodf/
My previous posts covered identity authentication and permission management for HiveServer2 via Kerberos + Sentry. This post describes how to implement authentication and permission management when operating on Hive databases through Spark SQL's JDBC interface.
The ThriftServer is a JDBC/ODBC endpoint: users connect to it over JDBC/ODBC to query data through Spark SQL. When the ThriftServer starts, it launches a single Spark SQL application, and all clients connecting over JDBC/ODBC share that application's resources, which means different users can share data. The ThriftServer also opens a listener that waits for JDBC clients to connect and submit queries. Configuring the ThriftServer therefore requires at least its host name and port, plus the hive metastore URIs if you want to access Hive data.
Prerequisites:
The experiments in this post assume the following setup:
(1) Kerberos authentication is enabled on CDH, and Sentry is installed;
(2) Hive permissions are controlled through the Sentry service;
(3) HDFS ACL / Sentry permission synchronization is enabled, so permission changes made to Hive tables via SQL statements are propagated to the corresponding HDFS files.
For how to configure the above, see my earlier post: http://www.cnblogs.com/xiaodf/p/5968248.html
1. Thrift Server Installation and Configuration
1.1. Download the Spark Package
The Spark bundled with CDH does not support the Thrift server, so you need to download a prebuilt Spark package yourself from http://spark.apache.org/downloads.html. This post uses Spark 1.5.2.
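One way to fetch and unpack the prebuilt package (a sketch; the Apache archive normally keeps old releases, but the mirror URL may vary):

wget https://archive.apache.org/dist/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz
tar -xzf spark-1.5.2-bin-hadoop2.6.tgz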
1.2. Add the Configuration File
Copy the cluster's hive-site.xml into the conf directory of the Spark installation.
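For example, assuming the Hive client configuration sits at the CDH default location:

cp /etc/hive/conf/hive-site.xml spark-1.5.2-bin-hadoop2.6/conf/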
[root@t162 spark-1.5.2-bin-hadoop2.6]# cd conf/
[root@t162 conf]# ll
total 52
-rw-r--r-- 1 root root 202 Oct 25 13:05 docker.properties.template
-rw-r--r-- 1 root root 303 Oct 25 13:05 fairscheduler.xml.template
-rw-r--r-- 1 root root 5708 Oct 25 13:08 hive-site.xml
-rw-r--r-- 1 root root 949 Oct 25 13:05 log4j.properties.template
-rw-r--r-- 1 root root 5886 Oct 25 13:05 metrics.properties.template
-rw-r--r-- 1 root root 80 Oct 25 13:05 slaves.template
-rw-r--r-- 1 root root 507 Oct 25 13:05 spark-defaults.conf.template
-rwxr-xr-x 1 root root 4299 Oct 25 13:08 spark-env.sh
-rw-r--r-- 1 root root 3418 Oct 25 13:05 spark-env.sh.template
-rwxr-xr-x 1 root root 119 Oct 25 13:09 stopjdbc.sh
In hive-site.xml, set hive.server2.enable.doAs to true. doAs must be true; otherwise per-user permission control over the Spark JDBC interface will not take effect.
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
Create spark-env.sh and add the following parameter:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
Here HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop.
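Putting this step together, a minimal sketch that generates spark-env.sh from the bundled template and appends the two variables (paths assume the CDH parcel layout used in this post):

cd spark-1.5.2-bin-hadoop2.6/conf
cp spark-env.sh.template spark-env.sh
cat >> spark-env.sh <<'EOF'
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
EOF
chmod +x spark-env.sh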
1.3. Create the Thrift Server Startup Script startjdbc.sh
The script calls start-thriftserver.sh to launch the Thrift server:
#!/bin/sh
#start Spark-thriftserver
export YARN_CONF_DIR=/etc/hadoop/conf
file="hive-site.xml"
dir=$(pwd)
cd conf/
if [ ! -e "$file" ]
then
cp /etc/hive/conf/hive-site.xml $dir/conf/
fi
cd ../sbin
./start-thriftserver.sh --name SparkJDBC --master yarn-client \
  --num-executors 10 --executor-memory 2g --executor-cores 4 \
  --driver-memory 10g --driver-cores 2 \
  --conf spark.storage.memoryFraction=0.2 \
  --conf spark.shuffle.memoryFraction=0.6 \
  --hiveconf hive.server2.thrift.port=10001 \
  --hiveconf hive.server2.logging.operation.enabled=true \
  --hiveconf hive.server2.authentication.kerberos.principal=hive/t162@HADOOP.COM \
  --hiveconf hive.server2.authentication.kerberos.keytab=/home/hive.keytab
The script above effectively submits a Spark job; the key parameters are:
master: submit in yarn-client mode
hive.server2.thrift.port: the port the Thrift server listens on
hive.server2.authentication.kerberos.principal: the superuser principal that starts the Thrift server, here hive
hive.server2.authentication.kerberos.keytab: the keytab for that superuser
startjdbc.sh must be executed after kinit-ing as the Hive superuser. With Sentry/HDFS permission sync enabled, this superuser must be granted privileges on the entire Hive warehouse, i.e. it must also have full permissions on the warehouse's HDFS directories.
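For example (a sketch; the keytab path and principal follow the values used in the startup script):

kinit -kt /home/hive.keytab hive/t162@HADOOP.COM
./startjdbc.sh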
1.4. Create the Thrift Server Shutdown Script stopjdbc.sh
#!/bin/sh
# Stop SparkJDBC
cd sbin
./spark-daemon.sh stop org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 1
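The trailing "1" is the instance number that spark-daemon.sh assigned when start-thriftserver.sh launched the service; the stop command needs the same class name and instance number to locate the right PID file.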
2. Authentication Test
The point of Spark SQL Thrift Server authentication is to let different users log in to beeline under their own identities. Kerberos does indeed provide both mutual service authentication and user authentication.
2.1. Start the Thrift Server
Start the service with the administrator account, as already configured in the startup script. The Thrift server is really a Spark job submitted to YARN via spark-submit, so this account needs access to YARN and HDFS; an ordinary account may fail to start the service due to insufficient HDFS permissions, because the job has to write to HDFS.
[root@t162 spark-1.5.2-bin-hadoop2.6]# ./startjdbc.sh
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /home/iie/spark-1.5.2-bin-hadoop2/spark-1.5.2-bin-hadoop2.6/sbin/../logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-t162.out
......
16/10/25 16:56:07 INFO thrift.ThriftCLIService: Starting ThriftBinaryCLIService on port 10001 with 5...500 worker threads
You can check the service startup status in the output log:
[root@t162 spark-1.5.2-bin-hadoop2.6]# tailf /home/iie/spark-1.5.2-bin-hadoop2/spark-1.5.2-bin-hadoop2.6/sbin/../logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-t162.out
2.2. Connect to the Thrift Server via beeline
Since the service has Kerberos authentication enabled, connecting without valid credentials fails, as shown below:
[root@t162 ~]# beeline -u "jdbc:hive2://t162:10001/;principal=hive/t162@HADOOP.COM"
16/10/25 16:59:04 WARN mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present. Continuing without it.
scan complete in 2ms
Connecting to jdbc:hive2://t162:10001/;principal=hive/t162@HADOOP.COM
16/10/25 16:59:06 [main]: ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
After authenticating as user1 we can connect. The user was created beforehand; see http://www.cnblogs.com/xiaodf/p/5968282.html for how to create it.
[root@t162 ~]# kinit user1
Password for user1@HADOOP.COM:
[root@t162 ~]# beeline -u "jdbc:hive2://t162:10001/;principal=hive/t162@HADOOP.COM"
16/10/25 17:01:46 WARN mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present. Continuing without it.
scan complete in 3ms
Connecting to jdbc:hive2://t162:10001/;principal=hive/t162@HADOOP.COM
Connected to: Spark SQL (version 1.5.2)
Driver: Hive JDBC (version 1.1.0-cdh5.7.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.7.2 by Apache Hive
0: jdbc:hive2://t162:10001/>
3. Authorization Test
Different users kinit with their own principal and password, obtain a TGT through the Kerberos AS, and can then log in to the Spark SQL Thrift Server to browse databases and tables.
However, the Spark Thrift Server does not yet support SQL-based authorization, so isolation can only be enforced at the underlying HDFS level, which is a pity; Hive is more complete in this respect, supporting SQL standard based authorization.
Because HDFS ACL / Sentry permission sync is already enabled, user permissions for Spark SQL JDBC are implemented through HiveServer2's permission settings: first log in to HiveServer2 over JDBC and set user permissions with Hive SQL statements; the table and database permissions are then synchronized to the corresponding HDFS directories and files, which gives the Spark SQL Thrift Server per-user isolation based on the underlying HDFS.
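A sketch of such a grant, connecting to HiveServer2 (port 10000) as a Sentry admin. The role name user1_role is a placeholder for illustration, and user1 is assumed to belong to an OS/LDAP group of the same name, since Sentry grants roles to groups:

beeline -u "jdbc:hive2://t162:10000/;principal=hive/t162@HADOOP.COM" -e "
CREATE ROLE user1_role;
USE test;
GRANT SELECT ON TABLE table1 TO ROLE user1_role;
GRANT ROLE user1_role TO GROUP user1;
"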
As shown below, user1 has privileges on table test.table1 but not on test.table2; reading table2 reports a missing HDFS permission, so the permission setup works:
0: jdbc:hive2://node1:10000/> select * from test.table1 limit 1;
+--------------+-------------+---------------------+----------+-----------+----------+---------------------------+-----------+-----------+------------------------+------------+---------------+-------------+--+
| cint | cbigint | cfloat | cdouble | cdecimal | cstring | cvarchar | cboolean | ctinyint | ctimestamp | csmallint | cipv4 | cdate |
+--------------+-------------+---------------------+----------+-----------+----------+---------------------------+-----------+-----------+------------------------+------------+---------------+-------------+--+
| 15000000001 | 1459107060 | 1.8990000486373901 | 1.7884 | 1.92482 | 中文測試1 | /browser/addBasicInfo.do | true | -127 | 2014-05-14 00:53:21.0 | -63 | 0 | 2014-05-14 |
+--------------+-------------+---------------------+----------+-----------+----------+---------------------------+-----------+-----------+------------------------+------------+---------------+-------------+--+
1 row selected (3.165 seconds)
0: jdbc:hive2://node1:10000/> select * from test.table2 limit 10;
Error: org.apache.hadoop.security.AccessControlException: Permission denied: user=user1, access=READ_EXECUTE, inode="/user/hive/warehouse/test.db/table2":hive:hive:drwxrwx--x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkAccessAcl(DefaultAuthorizationProvider.java:365)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:258)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:175)
at org.apache.sentry.hdfs.SentryAuthorizationProvider.checkPermission(SentryAuthorizationProvider.java:178)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6617)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6599)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6524)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:5061)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:5022)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:882)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getListing(AuthorizationProviderProxyClientProtocol.java:335)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:615)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) (state=,code=0)
For more authorization tests, see my earlier post http://www.cnblogs.com/xiaodf/p/5968282.html; details are omitted here.
4. Issues
4.1 Issue 1
With Spark 1.6.0, after starting the Thrift server, running "show databases" fails with:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException:
Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
This is reportedly a bug in the 1.6 release; after switching to 1.5.2 the problem disappeared. Related discussion: https://forums.databricks.com/questions/7207/spark-thrift-server-on-kerberos-enabled-hadoophive.html
4.2 Issue 2
Seven days after the Spark SQL Thrift Server started, users could no longer connect to it with the beeline command.
The service log reports the following errors:
17/01/18 13:46:08 INFO HiveMetaStore.audit: ugi=hive/t162@HADOOP.COM ip=unknown-ip-addr cmd=Metastore shutdown complete.
17/01/18 13:46:08 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:08 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:08 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:08 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:09 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:12 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:17 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:19 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:19 WARN ipc.Client: Couldn't setup connection for hive/t162@HADOOP.COM to t162/t161:8020
17/01/18 13:46:19 WARN thrift.ThriftCLIService: Error opening session:
org.apache.hive.service.cli.HiveSQLException: Failed to open new session: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException:
Failed on local exception: java.io.IOException: Couldn't setup connection for hive/t162@HADOOP.COM to t162/t161:8020; Host Details :
local host is: "t162/t161"; destination host is: "t162":8020;
at org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:264)
at org.apache.spark.sql.hive.thriftserver.SparkSQLSessionMa
Cause: when the Kerberos database was created, we set a ticket lifetime and a maximum renew lifetime for principals, as the /etc/krb5.conf contents show:
[libdefaults]
default_realm = HADOOP.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
renewable = true
After 7 days the ticket can no longer be renewed, service authentication fails, and users cannot connect. To solve this we need to periodically re-kinit the service principal, so we modify the startup script to add a timed re-authentication loop, as shown below:
#!/bin/sh
#start Spark-thriftserver
export YARN_CONF_DIR=/etc/hadoop/conf
file="hive-site.xml"
dir=$(pwd)
cd conf/
if [ ! -e "$file" ]
then
cp /etc/hive/conf/hive-site.xml $dir/conf/
fi
cd ../sbin
./start-thriftserver.sh --name SparkJDBC --master yarn-client \
  --num-executors 10 --executor-memory 2g --executor-cores 4 \
  --driver-memory 10g --driver-cores 2 \
  --conf spark.storage.memoryFraction=0.2 \
  --conf spark.shuffle.memoryFraction=0.6 \
  --hiveconf hive.server2.thrift.port=10001 \
  --hiveconf hive.server2.logging.operation.enabled=true \
  --hiveconf hive.server2.authentication.kerberos.principal=hive/t162@HADOOP.COM \
  --hiveconf hive.server2.authentication.kerberos.keytab=/home/hive.keytab
while true
do
    kinit -kt /home/hive.keytab hive/t162@HADOOP.COM
    sleep $((6*24*60*60))   # re-kinit every 6 days, inside the 7-day renew window
done &
Tested; problem solved!
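A quick way to check that the renewal loop keeps working (a sketch; klist output varies by Kerberos version):

klist -kt /home/hive.keytab   # confirm the keytab holds the hive service principal
klist                         # the TGT timestamps should move forward after each kinit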