Integrating spark-sql with CDH 6.3.2 (Complete Guide)

  1. Download spark-2.4.0-bin-hadoop2.7.tgz and upload it to the gateway node

Download URL: https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
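
A minimal sketch of this step, assuming the archive is downloaded on a machine with internet access and then copied over; the gateway hostname (gateway01) and target path (/opt/) are placeholders, not values from the source:

wget https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
scp spark-2.4.0-bin-hadoop2.7.tgz root@gateway01:/opt/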

  2. Extract spark-2.4.0-bin-hadoop2.7.tgz into /opt/cloudera/parcels/CDH/lib/spark2
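
A sketch of the extraction, assuming the tarball was uploaded to /opt/ as in the placeholder above; --strip-components=1 drops the top-level spark-2.4.0-bin-hadoop2.7 directory so the contents land directly under spark2:

mkdir -p /opt/cloudera/parcels/CDH/lib/spark2
tar -zxf /opt/spark-2.4.0-bin-hadoop2.7.tgz -C /opt/cloudera/parcels/CDH/lib/spark2 --strip-components=1
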
  3. Replace the configuration files in conf

cp -r /etc/spark/conf/ /opt/cloudera/parcels/CDH/lib/spark2/conf/

Copy the hive-site.xml file into /opt/cloudera/parcels/CDH/lib/spark2/conf:

cp /etc/hadoop/conf/hive-site.xml /opt/cloudera/parcels/CDH/lib/spark2/conf

  4. Modify the spark-defaults.conf configuration file in conf

Configuration entries in spark-defaults.conf:

Upload the Spark runtime dependency jars to the corresponding directory on HDFS:
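
If the target directory does not exist yet, create it first; the path below is an assumption chosen to match the spark.yarn.jars value set further down:

hdfs dfs -mkdir -p /user/spark/jars/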

hdfs dfs -put /opt/cloudera/parcels/CDH/lib/spark2/jars/* /user/spark/jars/

Then modify the corresponding configuration entry:

spark.yarn.jars=hdfs://master1:8020/user/spark/jars/*
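
Beyond spark.yarn.jars, a working spark-defaults.conf for this setup typically also carries the entries sketched below; the event-log directory and history-server address are assumptions for illustration, not values from the source:

spark.master=yarn
spark.submit.deployMode=client
spark.yarn.jars=hdfs://master1:8020/user/spark/jars/*
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://master1:8020/user/spark/applicationHistory
spark.yarn.historyServer.address=master1:18088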

  5. Rename spark-env.sh to spark-env (a sketch of this rename follows right after this list)
  6. Configure the spark-sql command
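
A sketch of the rename from step 5, assuming the copied configuration files ended up directly under /opt/cloudera/parcels/CDH/lib/spark2/conf:

mv /opt/cloudera/parcels/CDH/lib/spark2/conf/spark-env.sh /opt/cloudera/parcels/CDH/lib/spark2/conf/spark-env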

cd /opt/cloudera/parcels/CDH/bin

cp spark-shell spark-sql

vim spark-sql

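The original post shows the edited spark-sql script only as a screenshot. The sketch below illustrates what the edited wrapper needs to do; the exact contents of the stock CDH wrapper vary by build, so treat this as an assumption rather than the verbatim file. The key change is that it must hand off to the spark-sql launcher under the manually installed spark2 directory:

#!/bin/bash
# Hypothetical wrapper sketch: locate the parcel's lib directory relative to
# this script, then exec the spark-sql launcher from the spark2 install.
SOURCE="${BASH_SOURCE[0]}"
BIN_DIR="$(cd "$(dirname "$SOURCE")" && pwd)"
LIB_DIR="$BIN_DIR/../lib"
export HADOOP_HOME="$LIB_DIR/hadoop"
# Reuse CDH's JAVA_HOME detection if bigtop-utils is present (assumption).
[ -f "$LIB_DIR/bigtop-utils/bigtop-detect-javahome" ] && . "$LIB_DIR/bigtop-utils/bigtop-detect-javahome"
exec "$LIB_DIR/spark2/bin/spark-sql" "$@"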

Register the wrapper with alternatives so that spark-sql is available on the PATH:

alternatives --install /usr/bin/spark-sql spark-sql /opt/cloudera/parcels/CDH/bin/spark-sql 1

  7. Copy the lineage jar into the new Spark's jars directory; it supports lineage analysis, and spark-sql reports an error if it is missing:

cp /opt/cloudera/parcels/CDH/lib/spark/jars/spark-lineage_2.11-2.4.0-cdh6.3.2.jar /opt/cloudera/parcels/CDH/lib/spark2/jars

  8. On a node where a YARN role is deployed, copy all files under /etc/hadoop/conf.cloudera.yarn to the corresponding location on the gateway node:

scp -r ./conf.cloudera.yarn/ [email protected]:/etc/hadoop/

  9. Switch to a user with the required permissions (bdp-dwh) and run the spark-sql command; a quick smoke test follows below.
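
A quick smoke test, assuming bdp-dwh is a local account that can be switched to with su; the query is only an illustration:

su - bdp-dwh
spark-sql -e "show databases;"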
