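Introduction
Zeppelin is a web-based notebook that provides interactive data analysis and visualization. Its back end can plug into multiple data processing engines, such as Spark and Hive, and it supports several languages: Scala (Apache Spark), Python (Apache Spark), SparkSQL, Hive, Markdown, Shell, and others. This article mainly introduces how Interpreter and SparkInterpreter are implemented in Zeppelin.
- Official site: http://zeppelin.apache.org/
- Zeppelin download: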
wget https://mirror.bit.edu.cn/apache/zeppelin/zeppelin-0.8.2/zeppelin-0.8.2-bin-all.tgz
- Extract and install Zeppelin
# Extract
[hadoop@hadoop001 software]$ tar -zxvf zeppelin-0.8.2-bin-all.tgz -C ~/app/
# Configuration files
[hadoop@hadoop001 ~]$ cd app/zeppelin-0.8.2-bin-all/conf
[hadoop@hadoop001 conf]$ vi zeppelin-site.xml
# Change two settings; leave the rest at their defaults
<property>
<name>zeppelin.server.addr</name>
<value>hadoop001</value> <!-- your own host's IP, or 0.0.0.0 -->
<description>Server binding address</description>
</property>
<property>
<name>zeppelin.server.port</name>
<value>8084</value> <!-- make sure the port is not already in use; the default port is 8080 -->
<description>Server port.</description>
</property>
[hadoop@hadoop001 conf]$ vi zeppelin-env.sh
export JAVA_HOME=/root/apps/jdk1.8.0_221
export SPARK_HOME=/home/hadoop/app/spark-2.4.4-bin-2.6.0-cdh5.15.1
export SPARK_APP_NAME="ZeppelinAaron"
export HADOOP_CONF_DIR=/root/apps/hadoop/etc/hadoop
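Because zeppelin-site.xml is edited by hand, it can help to read the two changed values back before starting the server. The following is a minimal sketch and not part of Zeppelin: read_site_property is a hypothetical helper name, and it assumes each value sits on the same line as its name or on a later line, as in the snippet above.

```shell
# Hypothetical helper (not shipped with Zeppelin): print the <value> that
# follows <name>NAME</name> in a Hadoop-style *-site.xml file.
# Assumes the value appears on the same line as the name or on a later line.
read_site_property() {
  # $1 = path to the XML file, $2 = property name
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # strip the 7-character <value> tag and the 8-character </value> tag
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# Example:
#   read_site_property zeppelin-site.xml zeppelin.server.addr
#   read_site_property zeppelin-site.xml zeppelin.server.port
```

If a property is missing from the file, the helper prints nothing, which is itself a useful signal that the default is in effect.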
- Start Zeppelin
./zeppelin-daemon.sh start
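The port setting above warns about conflicts, so a quick probe before and after startup is useful. The sketch below assumes bash, since it relies on bash's /dev/tcp redirection; port_open is a hypothetical helper name, and hadoop001 and 8084 are the values configured earlier.

```shell
# Hypothetical helper: succeed if something accepts TCP connections on
# host $1, port $2. Relies on bash's built-in /dev/tcp pseudo-device.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Before starting, the port should be free:
#   port_open hadoop001 8084 && echo "port 8084 is already in use"
# After starting, the daemon script can report its own state, and the
# port should accept connections:
#   ./zeppelin-daemon.sh status
#   port_open hadoop001 8084 && echo "Zeppelin is listening on 8084"
```

The web UI can take a few seconds to come up after the start command returns, so a failed probe immediately after startup is not necessarily an error.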
-------------------------------------------------------Zeppelin installation is now complete-----------------------------------------------------------------------
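Hive on Zeppelin
- Configure the Hive Interpreter
The Interpreter is the most important concept in Zeppelin; each Interpreter corresponds to one engine. The Interpreter for Hive is the Jdbc Interpreter, because Zeppelin runs Hive SQL through Hive's JDBC interface.
Next, you can configure the Jdbc Interpreter on Zeppelin's Interpreter page to enable Hive. Note first that Zeppelin's Jdbc Interpreter supports any database that speaks the JDBC protocol, and that by default it connects to PostgreSQL.
There are two ways to enable Hive:
- Modify the settings of the default jdbc interpreter (with this setup, Hive paragraphs in a Note start with %jdbc)
- Create a new Jdbc interpreter named hive (with this setup, Hive paragraphs in a Note start with %hive)
Here I use the second method: create a new hive interpreter, then configure the following basic properties (adjust them to your own environment).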
Property | Value |
---|---|
default.driver | org.apache.hive.jdbc.HiveDriver (note: Zeppelin does not ship this jar; you must add the dependency yourself) |
default.url | jdbc:hive2://hadoop001:10000 (port 10000 is the HiveServer2 default) |
default.user | hadoop |
default.password | *********** |
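As the default.driver row notes, the Hive driver is not bundled with Zeppelin. One common way to supply it, sketched here under assumptions, is to copy the needed jars from Hive's lib directory into the JDBC interpreter's directory and then restart the interpreter. The helper name copy_hive_jars, the interpreter/jdbc location, and the jar list are assumptions to adapt; the list used in the example is the driver plus the three jars named in the troubleshooting section of this article.

```shell
# Hypothetical helper: copy versioned jars from one lib directory to another.
#   $1 = source directory (e.g. Hive's lib), $2 = destination directory,
#   remaining arguments = jar name prefixes such as hive-jdbc or guava
# Returns non-zero if any prefix matched no jar.
copy_hive_jars() {
  _src=$1
  _dst=$2
  shift 2
  _status=0
  for _prefix in "$@"; do
    _found=0
    # "-[0-9]" keeps hive-service from also matching hive-service-rpc
    for _jar in "$_src/$_prefix"-[0-9]*.jar; do
      [ -e "$_jar" ] || continue   # the glob matched nothing
      cp "$_jar" "$_dst"/ && _found=1
    done
    if [ "$_found" -eq 0 ]; then
      echo "no $_prefix-<version>.jar found in $_src" >&2
      _status=1
    fi
  done
  return "$_status"
}

# Example (paths are assumptions based on the layout used in this article):
#   copy_hive_jars ~/app/hive/lib ~/app/zeppelin-0.8.2-bin-all/interpreter/jdbc \
#     hive-jdbc hive-service hive-common guava
```

After copying, restart the hive interpreter from the Interpreter page so that the new jars are picked up.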
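- Add dependencies (note: the jar versions must match your Hive version)
The Hive URL (default.url in the table above) takes the form jdbc:hive2://host:port/<db_name>. Here host is the machine running your hiveserver2, and port is the hiveserver2 thrift port: if hiveserver2 runs in binary mode, the corresponding Hive setting is hive.server2.thrift.port (default 10000); if it runs in http mode, the setting is hive.server2.thrift.http.port (default 10001). db_name is the name of the Hive database to connect to; the default is default.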
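The URL rules above can be captured in a small function. This is a sketch with a hypothetical helper name, hive_jdbc_url; the transportMode and httpPath parameters in the http branch are the ones the Hive JDBC driver expects for http mode, with cliservice being the default value of hive.server2.thrift.http.path.

```shell
# Hypothetical helper: assemble a HiveServer2 JDBC URL.
#   $1 = host, $2 = port, $3 = database (defaults to "default"),
#   $4 = transport mode, "binary" (the default) or "http"
hive_jdbc_url() {
  _url="jdbc:hive2://$1:$2/${3:-default}"
  if [ "${4:-binary}" = "http" ]; then
    # http mode must be spelled out in the URL; cliservice is the
    # default value of hive.server2.thrift.http.path
    _url="$_url;transportMode=http;httpPath=cliservice"
  fi
  printf '%s\n' "$_url"
}

hive_jdbc_url hadoop001 10000                # binary mode, database "default"
hive_jdbc_url hadoop001 10001 default http   # http mode
```

The same URL can be tried outside Zeppelin with Hive's beeline client, for example beeline -u "$(hive_jdbc_url hadoop001 10000)" -n hadoop, which helps separate connection problems from interpreter configuration problems.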
- Start hiveserver2 on the Hive side
[hadoop@hadoop001 bin]$ ./hiveserver2 start
[hadoop@hadoop001 lib]$ jps -m
9984 RunJar /home/hadoop/app/hive/lib/hive-service-1.1.0-cdh5.15.1.jar org.apache.hive.service.server.HiveServer2 --hiveconf hive.aux.jars.path=file:///home/hadoop/app/hive/auxlib/hive-exec-1.1.0-cdh5.15.1-core.jar start
- Work with Hive tables from Zeppelin
Hive on Zeppelin troubleshooting:
- java.lang.ClassNotFoundException: org.apache.hive.service.rpc.thrift.TCLIService$Iface
- Solution: add the jar hive-service-1.1.0.jar
java.lang.ClassNotFoundException: org.apache.hive.service.rpc.thrift.TCLIService$Iface
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.commons.dbcp2.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:79)
at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:205)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at org.apache.commons.dbcp2.PoolingDriver.connect(PoolingDriver.java:129)
- java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.auth.HiveAuthUtils
- Solution: add the jar hive-common-1.1.0
java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.auth.HiveAuthUtils
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.hive.jdbc.HiveConnection.createUnderlyingTransport(HiveConnection.java:376)
at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:396)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:201)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:168)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.commons.dbcp2.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:79)
at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:205)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at org.apache.commons.dbcp2.PoolingDriver.connect(PoolingDriver.java:129)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnectionFromPool(JDBCInterpreter.java:425)
at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:443)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:692)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:820)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
- java.lang.NoClassDefFoundError: com/google/common/primitives/Ints
- Solution: add the jar guava-14.0.1.jar
java.lang.NoClassDefFoundError: com/google/common/primitives/Ints
at org.apache.hive.service.cli.Column.<init>(Column.java:150)
at org.apache.hive.service.cli.ColumnBasedSet.<init>(ColumnBasedSet.java:51)
at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37)
at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:367)
at org.apache.commons.dbcp2.DelegatingResultSet.next(DelegatingResultSet.java:191)
at org.apache.commons.dbcp2.DelegatingResultSet.next(DelegatingResultSet.java:191)
at org.apache.zeppelin.jdbc.JDBCInterpreter.getResults(JDBCInterpreter.java:567)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:749)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:820)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.google.common.primitives.Ints
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 20 more
- java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
- Solution: check the HiveServer2 log, which shows a permission problem
- Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=hive, access=EXECUTE, inode="/tmp/hadoop-yarn/staging":hadoop:supergroup:drwx------
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:295)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:737)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:820)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
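For the permission error, the log line itself names what to inspect: user hive lacks EXECUTE on /tmp/hadoop-yarn/staging, which is owned by hadoop with mode drwx------. A minimal sketch of checking and loosening that directory follows. The helper names and the HDFS_CMD override are hypothetical, and mode 777 in the example is an assumption that is only reasonable on a private test cluster.

```shell
# Hypothetical wrappers around the hdfs CLI. HDFS_CMD can be overridden,
# e.g. HDFS_CMD=echo, to print the command instead of running it.

# Show the owner and mode of the directory itself (-d), not its contents.
show_hdfs_dir() {
  "${HDFS_CMD:-hdfs}" dfs -ls -d "$1"
}

# Set a new mode on a single HDFS path (non-recursive).
set_hdfs_mode() {
  # $1 = mode (e.g. 777), $2 = HDFS path
  "${HDFS_CMD:-hdfs}" dfs -chmod "$1" "$2"
}

# Example, run as the HDFS user that owns the directory (hadoop here):
#   show_hdfs_dir /tmp/hadoop-yarn/staging
#   set_hdfs_mode 777 /tmp/hadoop-yarn/staging
```

Running with HDFS_CMD=echo first prints the exact hdfs command, which makes it easy to review a permission change before applying it.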