- Download spark-2.4.0-bin-hadoop2.7.tgz and upload it to the gateway node
URL: https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
- Extract spark-2.4.0-bin-hadoop2.7.tgz into /opt/cloudera/parcels/CDH/lib/spark2
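The download-and-extract step can be sketched as below. The archive name and the spark2 target directory follow the document; the stub archive built here only stands in for the real tarball so the commands can be tried locally.

```shell
# Stand-in for the real download: build a tiny stub archive with the same
# layout as spark-2.4.0-bin-hadoop2.7.tgz (top-level dir + bin/).
mkdir -p stage/spark-2.4.0-bin-hadoop2.7/bin
touch stage/spark-2.4.0-bin-hadoop2.7/bin/spark-sql
tar -C stage -zcf spark-2.4.0-bin-hadoop2.7.tgz spark-2.4.0-bin-hadoop2.7

# Extract so the contents land directly in the spark2 directory
# (on the cluster: /opt/cloudera/parcels/CDH/lib/spark2).
mkdir -p spark2
tar -zxf spark-2.4.0-bin-hadoop2.7.tgz -C spark2 --strip-components=1
ls spark2/bin/
```

`--strip-components=1` (GNU tar) drops the archive's top-level spark-2.4.0-bin-hadoop2.7/ directory so the files land directly in spark2/ instead of a nested subdirectory.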
- Replace the configuration files under conf
cp -r /etc/spark/conf/ /opt/cloudera/parcels/CDH/lib/spark2/conf/
Copy the hive-site.xml file into /opt/cloudera/parcels/CDH/lib/spark2/conf:
cp /etc/hadoop/conf/hive-site.xml /opt/cloudera/parcels/CDH/lib/spark2/conf
- Edit the spark-defaults.conf configuration file under conf
Upload the Spark runtime dependency jars to the corresponding HDFS directory (create /user/spark/jars first with hdfs dfs -mkdir -p if it does not already exist):
hdfs dfs -put /opt/cloudera/parcels/CDH/lib/spark2/jars/* /user/spark/jars/
Then set the corresponding configuration item:
spark.yarn.jars=hdfs://master1:8020/user/spark/jars/*
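A minimal spark-defaults.conf for this setup might look like the fragment below. Only spark.yarn.jars comes from the document; the other keys are common YARN-mode settings shown as assumptions to adapt to your cluster.

```properties
# Run on YARN in client mode (assumed; adjust to your cluster).
spark.master                yarn
spark.submit.deployMode     client
# Jars uploaded to HDFS in the previous step (from the document).
spark.yarn.jars             hdfs://master1:8020/user/spark/jars/*
```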
- Rename spark-env.sh to spark-env
- Configure the spark-sql command
cd /opt/cloudera/parcels/CDH/bin
cp spark-shell spark-sql
vim spark-sql
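The document does not show what to change inside the copied script. A plausible minimal wrapper (a hypothetical sketch, not the actual CDH script body) simply points SPARK_HOME at the manually installed Spark 2 and execs its spark-sql launcher:

```shell
# Hypothetical replacement body for the copied spark-sql wrapper.
# Written to ./spark-sql here; on the cluster the file lives in
# /opt/cloudera/parcels/CDH/bin.
cat > ./spark-sql <<'EOF'
#!/bin/bash
# Point at the manually installed Spark 2.4.0, not the CDH-bundled Spark.
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark2
exec "$SPARK_HOME/bin/spark-sql" "$@"
EOF
chmod +x ./spark-sql
```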
alternatives --install /usr/bin/spark-sql spark-sql /opt/cloudera/parcels/CDH/bin/spark-sql 1
- cp /opt/cloudera/parcels/CDH/lib/spark/jars/spark-lineage_2.11-2.4.0-cdh6.3.2.jar /opt/cloudera/parcels/CDH/lib/spark2/jars  # lineage-analysis jar; spark-sql reports an error without it
- From a node where the YARN role is deployed, copy all files under /etc/hadoop/conf.cloudera.yarn to the corresponding location on the gateway node
scp -r ./conf.cloudera.yarn/ [email protected]:/etc/hadoop/
- Switch to a user with the required permissions (bdp-dwh) and run the spark-sql command