天天看点

Zeppelin的初体验--安装,hive on Zeppelin简介hive on Zeppelinhive on zeppelin 问题解决:

简介

  eppelin是一个基于Web的notebook,提供交互数据分析和可视化。后台支持接入多种数据处理引擎,如spark,hive等。支持多种语言: Scala(Apache Spark)、Python(Apache Spark)、SparkSQL、 Hive、 Markdown、Shell等。本文主要介绍Zeppelin中Interpreter和SparkInterpreter的实现原理。

  • 官方网址: http://zeppelin.apache.org/
  • Zeppelin 下载地址:

    wget https://mirror.bit.edu.cn/apache/zeppelin/zeppelin-0.8.2/zeppelin-0.8.2-bin-all.tgz

  • 解压 安装 Zeppelin
# 解压
[hadoop@hadoop001 software]$ tar -zxvf zeppelin-0.8.2-bin-all.tgz -C ~/app/
# 配置文件
[hadoop@hadoop001 ~]$ cd app/zeppelin-0.8.2-bin-all/conf

[hadoop@hadoop001 conf]$ vi zeppelin-site.xml 
# 修改两个配置 其他默认
<property>
  <name>zeppelin.server.addr</name>
  <value>hadoop001</value>  #自己主机的ip 或 0.0.0.0
  <description>Server binding address</description>
</property>

<property>
  <name>zeppelin.server.port</name>
  <value>8084</value>  #注意端口是否被占用 默认端口 8080
  <description>Server port.</description>
</property>

[hadoop@hadoop001 conf]$ vi zeppelin-env.sh
export JAVA_HOME=/root/apps/jdk1.8.0_221
export SPARK_HOME=/home/hadoop/app/spark-2.4.4-bin-2.6.0-cdh5.15.1
export SPARK_APP_NAME="ZeppelinAaron"
export HADOOP_CONF_DIR=/root/apps/hadoop/etc/hadoop

           
  • 启动 Zeppelin

    ./zeppelin-daemon.sh start

    -------------------------------------------------------至此 Zeppelin 已经完成安装-----------------------------------------------------------------------

hive on Zeppelin

  • 配置Hive Interpreter

      nterpreter 是Zeppelin里最重要的概念,每一种Interpreter对应一个引擎。Hive对应的Interpreter是Jdbc Interpreter, 因为Zeppelin是通过Hive的Jdbc接口来运行Hive SQL。

       接下来你可以在Zeppelin的Interpreter页面配置Jdbc Interpreter来启用Hive。首先我想说明的是Zeppelin的Jdbc Interpreter可以支持所有Jdbc协议的数据库,Zeppelin 的Jdbc Interpreter默认是连接Postgresql。

启动Hive,可以有2种选择:

  • 修改默认jdbc interpreter的配置项(这种配置下,在Note里用hive可以直接 %jdbc 开头)
  • 创建一个新的Jdbc interpreter,命名为hive (这种配置下,在Note里用hive可以直接 %hive 开头)

这里我会选用第2种方法。创建一个新的hive interpreter,然后配置以下基本的属性(你需要根据自己的环境做配置)

配置项
default.driver org.apache.hive.jdbc.HiveDriver (注: Zeppelin中无此jar,需要自己添加依赖)
default.url jdbc:hive2://hadoop001:10000 (端口号10000,是 hive 中 HiveServer2默认值)
default.user hadoop
default.password ***********
  • 添加依赖 (注: jar版本需要与hive的版本一致)
    Zeppelin的初体验--安装,hive on Zeppelin简介hive on Zeppelinhive on zeppelin 问题解决:
       hive.url的默认配置形式是 jdbc:hive2://host:port/<db_name>, 这里的host是你的hiveserver2的机器名,port是 hiveserver2的thrift 端口 (如果你的hiveserver2用的是binary模式,那么对应的hive配置是hive.server2.thrift.port (默认是10000),如果是http模式,那么对应的hive配置是hive.server2.thrift.http.port,(默认是10001) 。db_name是你要连的hive 数据库的名字,默认是default。
  • hive 开启hiveserver2
[[email protected] bin]$ ./hiveserver2 start
[[email protected] lib]$ jps -m
9984 RunJar /home/hadoop/app/hive/lib/hive-service-1.1.0-cdh5.15.1.jar org.apache.hive.service.server.HiveServer2 --hiveconf hive.aux.jars.path=file:///home/hadoop/app/hive/auxlib/hive-exec-1.1.0-cdh5.15.1-core.jar start

           
Zeppelin的初体验--安装,hive on Zeppelin简介hive on Zeppelinhive on zeppelin 问题解决:
  • Zeppelin 操作hive中的表
    Zeppelin的初体验--安装,hive on Zeppelin简介hive on Zeppelinhive on zeppelin 问题解决:

hive on zeppelin 问题解决:

  • java.lang.ClassNotFoundException: org.apache.hive.service.rpc.thrift.TCLIService$Iface
  • 导入jar–hive-service-1.1.0.jar
java.lang.ClassNotFoundException: org.apache.hive.service.rpc.thrift.TCLIService$Iface
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:208)
	at org.apache.commons.dbcp2.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:79)
	at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:205)
	at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
	at org.apache.commons.dbcp2.PoolingDriver.connect(PoolingDriver.java:129)

           
  • java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.auth.HiveAuthUtils
  • 导入hive-common-1.1.0
java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.auth.HiveAuthUtils
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.hive.jdbc.HiveConnection.createUnderlyingTransport(HiveConnection.java:376)
	at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:396)
	at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:201)
	at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:168)
	at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:208)
	at org.apache.commons.dbcp2.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:79)
	at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:205)
	at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
	at org.apache.commons.dbcp2.PoolingDriver.connect(PoolingDriver.java:129)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:270)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnectionFromPool(JDBCInterpreter.java:425)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:443)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:692)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:820)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
	at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
           
  • java.lang.NoClassDefFoundError: com/google/common/primitives/Ints
  • –导入jar guava-14.0.1.jar
java.lang.NoClassDefFoundError: com/google/common/primitives/Ints
	at org.apache.hive.service.cli.Column.<init>(Column.java:150)
	at org.apache.hive.service.cli.ColumnBasedSet.<init>(ColumnBasedSet.java:51)
	at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37)
	at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:367)
	at org.apache.commons.dbcp2.DelegatingResultSet.next(DelegatingResultSet.java:191)
	at org.apache.commons.dbcp2.DelegatingResultSet.next(DelegatingResultSet.java:191)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.getResults(JDBCInterpreter.java:567)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:749)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:820)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
	at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.google.common.primitives.Ints
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 20 more
           
  • java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
  • 解决: 查看HiveServer2日志 发现 权限问题
  • Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=hive, access=EXECUTE, inode="/tmp/hadoop-yarn/staging":hadoop:supergroup:drwx------
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:295)
	at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
	at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:737)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:820)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
	at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)