When launching pyspark2 (the CDH build), the following error appears:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
A commonly suggested fix found online is to add the following line to Spark's spark-env.sh file:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
However, after adding this line Spark still failed to start, and the log reported that `hadoop` could not be found.
Further searching shows that the `hadoop` here is the command under the bin directory of the Hadoop installation. So, likewise, go to the CDH installation path, locate the `hadoop` binary there, and run `hadoop classpath`:
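The step above can be sketched as follows. Note that the CDH parcel path used here is an assumption (a common default); adjust it to wherever CDH is actually installed on your cluster:

```shell
# Assumed CDH parcel layout (path is an assumption, adjust as needed):
CDH_HOME=/opt/cloudera/parcels/CDH

# Invoke hadoop by its full path, since it may not be on $PATH
# for the account that launches Spark. This prints the Hadoop classpath.
"$CDH_HOME/bin/hadoop" classpath
```

The output is a colon-separated list of Hadoop jar and config directories, which is exactly what `SPARK_DIST_CLASSPATH` needs to contain.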
![Output of hadoop classpath](https://img.laitimes.com/img/_0nNw4CM6IyYiwiM6ICdiwiIwczX0xiRGZkRGZ0Xy9GbvNGL2EzXlpXazxSPNRVT1kEVOFzYE5kerJTYwR2MMBjVtJWd0ckW65UbM5WOHJWa5kHT20ESjBjUIF2X0hXZ0xCMx81dvRWYoNHLrdEZwZ1Rh5WNXp1bwNjW1ZUba9VZwlHdssmch1mclRXY39CXldWYtlWPzNXZj9mcw1ycz9WL49zZuBnL0QjNzQDO1AjM2EzMwkTMwIzLc52YucWbp5GZzNmLn9Gbi1yZtl2Lc9CX6MHc0RHaiojIsJye.png)
Put that path into spark-env.sh and pyspark2 starts normally.
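Concretely, there are two ways to write this into spark-env.sh: call `hadoop` by its full path so the classpath is computed at Spark startup, or paste the literal string printed by `hadoop classpath`. A minimal sketch, again assuming the default parcel path:

```shell
# spark-env.sh -- sketch, CDH path is an assumption

# Option 1: compute the classpath at startup via the full path to hadoop.
export SPARK_DIST_CLASSPATH=$(/opt/cloudera/parcels/CDH/bin/hadoop classpath)

# Option 2: paste the literal output of `hadoop classpath` instead, e.g.
# export SPARK_DIST_CLASSPATH="/etc/hadoop/conf:/opt/cloudera/parcels/CDH/lib/hadoop/*:..."
```

Option 1 is more robust, since the classpath stays correct after a CDH parcel upgrade; Option 2 avoids depending on the hadoop command being resolvable at startup.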