天天看點

Setting Up a PySpark Development Environment in PyCharm

1. Installing Spark

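  • Download Spark

    Download page: http://spark.apache.org/downloads.html (this guide uses the prebuilt package spark-2.4.5-bin-hadoop2.7). Extract the archive to a directory of your choice.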

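Set the environment variables (for example in your shell profile), replacing /path/to/spark with the directory you extracted Spark into:

export SPARK_HOME=/path/to/spark/spark-2.4.5-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin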
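To double-check the two exports above, here is a small Python 3 sketch that confirms SPARK_HOME points at an extracted Spark distribution and that its bin directory is on PATH. It assumes the standard layout of the prebuilt package (a bin/pyspark launcher under SPARK_HOME); the helper name check_spark_env is hypothetical, and it only inspects the environment and the file system without starting Spark.

```python
import os
from typing import Dict, List, Optional


def check_spark_env(env: Optional[Dict[str, str]] = None) -> List[str]:
    """Return a list of problems with SPARK_HOME/PATH (an empty list means OK).

    Hypothetical helper: it reads environment variables and checks files,
    it does not launch Spark.
    """
    env = dict(os.environ) if env is None else env
    problems: List[str] = []

    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return ["SPARK_HOME is not set"]
    if not os.path.isdir(spark_home):
        return [f"SPARK_HOME does not exist: {spark_home}"]

    # The prebuilt package ships its launchers (pyspark, spark-submit, ...) in bin/.
    spark_bin = os.path.join(spark_home, "bin")
    if not os.path.isfile(os.path.join(spark_bin, "pyspark")):
        problems.append(f"no bin/pyspark under {spark_home}")

    # Normalize PATH entries so a trailing slash does not cause a false alarm.
    path_entries = [
        os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p
    ]
    if os.path.normpath(spark_bin) not in path_entries:
        problems.append(f"{spark_bin} is not on PATH")

    return problems
```

Run it in a new terminal after reloading the profile, e.g. print(check_spark_env() or "Spark environment looks OK"); each returned string names one thing to fix.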

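Verify that Spark is installed correctly by running pyspark in a terminal. If the installation works, the PySpark shell starts and prints output like the following, ending at the >>> prompt: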

WARNING: Python 2.7 is not recommended.
This version is included in macOS for compatibility with legacy software.
Future versions of macOS will not include Python 2.7.
Instead, it is recommended that you transition to using 'python3' from within Terminal.

Python 2.7.16 (default, Oct 23 2019, 19:14:20)
[GCC 4.2.1 Compatible Apple LLVM 11.0.0 (clang-1100.0.32.4) (-macos10.15-objc-s on darwin
Type "help", "copyright", "credits" or "license" for more information.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/weijiankong/Apps/spark-2.4.5-bin-hadoop2.7/jars/spark-unsafe_2.11-2.4.5.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/03/03 22:20:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.5
      /_/

Using Python version 2.7.16 (default, Oct 23 2019 19:14:20)
SparkSession available as 'spark'.
>>> 

2. Setting up the PySpark development environment

  • Open PyCharm, create a new project, and create a new virtual environment for it.
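  • Configure the environment variables at the project level, so they don't have to be set again every time you create a new .py file (steps shown in the figure below):
    [Figure: setting the project's environment variables in PyCharm]
    SPARK_HOME = /path/to/spark/spark-2.4.5-bin-hadoop2.7 (the directory Spark was downloaded and extracted to)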
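If you also want a script to run outside PyCharm, or prefer not to depend on the IDE setting alone, one option is to fall back to a default SPARK_HOME in the script itself, before anything from pyspark is imported. This is a minimal sketch, not part of the PyCharm setup above; the function name and the DEFAULT_SPARK_HOME path are hypothetical placeholders to replace with your own Spark directory.

```python
import os

# Hypothetical placeholder: replace with the directory you extracted Spark into.
DEFAULT_SPARK_HOME = "/path/to/spark/spark-2.4.5-bin-hadoop2.7"


def ensure_spark_home(default: str = DEFAULT_SPARK_HOME) -> str:
    """Return SPARK_HOME, falling back to `default` only when it is unset.

    A value configured in PyCharm (or exported in the shell) always wins.
    The fallback is written into os.environ, so child processes started
    later (such as the JVM Spark launches) inherit it as well.
    """
    return os.environ.setdefault("SPARK_HOME", default)
```

Because setdefault never overwrites an existing value, the project-level variable configured in PyCharm keeps taking precedence whenever it is present.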

Add the two packages in the following directory (pyspark.zip and the py4j-*-src.zip archive) to the project:

/path/to/spark/spark-2.4.5-bin-hadoop2.7/python/lib

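Adding the two archives to the project has roughly the same effect as putting them on the interpreter's import path. For scripts run outside PyCharm, here is a minimal standard-library sketch of doing that in code. It assumes SPARK_HOME is set as above and that python/lib contains pyspark.zip plus one py4j-*-src.zip; the function name add_spark_python_libs is hypothetical.

```python
import glob
import os
import sys
from typing import List, Optional


def add_spark_python_libs(spark_home: Optional[str] = None) -> List[str]:
    """Prepend Spark's bundled Python archives to sys.path; return the paths added.

    Expects $SPARK_HOME/python/lib to contain pyspark.zip and a py4j-*-src.zip
    archive (the py4j version in the file name varies between Spark releases).
    """
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("SPARK_HOME is not set")

    lib_dir = os.path.join(spark_home, "python", "lib")
    pyspark_zip = os.path.join(lib_dir, "pyspark.zip")
    py4j_zips = sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    if not os.path.isfile(pyspark_zip) or not py4j_zips:
        raise FileNotFoundError(f"pyspark.zip or py4j-*-src.zip missing in {lib_dir}")

    added: List[str] = []
    # If several py4j archives exist, take the last one in sorted (lexicographic) order.
    for archive in (pyspark_zip, py4j_zips[-1]):
        if archive not in sys.path:
            # zipimport lets Python import packages directly from a .zip on sys.path.
            sys.path.insert(0, archive)
            added.append(archive)
    return added
```

Call add_spark_python_libs() once at the top of a script, before from pyspark.sql import SparkSession. Inserting at position 0 makes the archives bundled with your Spark installation take precedence over any separately installed pyspark, which keeps the Python side matched to the Spark version on disk.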
