Introduction to the Spark Interpreter
http://zeppelin.apache.org/docs/latest/interpreter/spark.html
I recommend reading the official documentation at the address above.
| Name | Class | Description |
| --- | --- | --- |
| %spark | SparkInterpreter | Creates a SparkContext and provides a Scala environment |
| %spark.pyspark | PySparkInterpreter | Provides a Python environment |
| %spark.r | SparkRInterpreter | Provides an R environment with SparkR support |
| %spark.sql | SparkSQLInterpreter | Provides a SQL environment |
| %spark.dep | DepInterpreter | Dependency loader |
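As a quick illustration of the last row, %spark.dep lets you pull Maven artifacts onto the Spark classpath; note that it must run before the SparkContext starts, so restart the interpreter first if it is already running. A minimal sketch (the spark-csv coordinate is just an example artifact):

```scala
%spark.dep
z.reset()                                      // clear any previously loaded artifacts
z.load("com.databricks:spark-csv_2.11:1.2.0")  // load a Maven artifact (illustrative coordinate)
```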
Zeppelin automatically creates the SparkContext, SQLContext, SparkSession, and ZeppelinContext for you; they are exposed as the variables `sc`, `sqlContext`, `spark`, and `z`. Note that the Scala, Python, and R environments share the same SparkContext, SQLContext, and ZeppelinContext instances.
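For example, a %spark paragraph can use these built-ins directly (the DataFrame and view name here are just illustrations):

```scala
%spark
// sc, sqlContext, spark and z are pre-created by Zeppelin; no setup is needed
println(sc.version)                    // the shared SparkContext
val df = spark.range(0, 10).toDF("n")  // the built-in SparkSession
df.createOrReplaceTempView("numbers")  // the view is also visible from %spark.sql
z.show(df)                             // ZeppelinContext renders the DataFrame as a table
```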
Spark interpreter configuration
Configuration can be set in several places, for example in the conf/zeppelin-env.sh file, or by adding properties to the interpreter in the web UI. My environment has Hive with Sentry simple authentication enabled, so there is also an identity setting (the HADOOP_USER_NAME property below).
export MASTER=yarn-client
export ZEPPELIN_JAVA_OPTS="-Dmaster=yarn-client -Dspark.executor.memory=1g -Dspark.cores.max=4 -Dspark.executorEnv.PYTHONHASHSEED=0 -Dspark.sql.crossJoin.enabled=true"
export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2
export SPARK_SUBMIT_OPTIONS="--driver-memory 512M --executor-memory 1G"
export SPARK_APP_NAME=zeppelin
export HADOOP_CONF_DIR=/bigdata/installer/zeppelin-0.8.2-bin-all/interpreter/spark/conf
The configuration files in /bigdata/installer/zeppelin-0.8.2-bin-all/interpreter/spark/conf were copied over from /etc/hadoop/conf, plus an additional /etc/hive/conf/hive-site.xml.
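With hive-site.xml in place and zeppelin.spark.useHiveContext set to true (see below), Hive tables can be queried directly; a minimal smoke test, assuming the metastore is reachable:

```scala
%spark
// Smoke test: list the databases known to the Hive metastore
spark.sql("SHOW DATABASES").show()
```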
The main Spark interpreter settings in the web UI are as follows:
| Property | Value |
| --- | --- |
| HADOOP_USER_NAME | hive |
| SPARK_HOME | /bigdata/cloudera/parcels/SPARK2/lib/spark2 |
| master | yarn-client |
| spark.app.name | zeppelin |
| spark.cores.max | 4 |
| spark.executor.memory | 1g |
| zeppelin.spark.useHiveContext | true |
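One way to confirm the interpreter actually picked these up is to read them back from the running context; a small sanity-check sketch:

```scala
%spark
// Read the settings back from the running context to confirm they took effect
println(sc.master)                                // expected: yarn-client
println(sc.getConf.get("spark.app.name"))         // expected: zeppelin
println(sc.getConf.get("spark.executor.memory"))  // expected: 1g
```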
The jar dependencies are as follows:
/opt/cloudera/parcels/SPARK2/lib/spark2/jars/jackson-databind-2.6.5.jar
/opt/cloudera/parcels/SPARK2/lib/spark2/jars/netty-all-4.0.42.Final.jar