
Spark setup process

Scala configuration

1. Download and extract the package

tar -xvf scala-2.10.4.tgz -C /usr/local/

2. Rename the extracted directory to scala: mv /usr/local/scala-2.10.4 /usr/local/scala

3. Configure the environment variables (append these lines to /etc/profile)

export SCALA_HOME=/usr/local/scala

export PATH=$PATH:/usr/local/scala/bin

4. Apply the changes: source /etc/profile
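You can script step 3 instead of editing /etc/profile by hand. A small idempotent helper keeps the file from collecting duplicate lines when the setup is run more than once. This is a minimal sketch under a few assumptions: add_export is a hypothetical helper name, the shell is bash, and writing /etc/profile requires root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append an export line to a profile file only if that
# exact line is not already present, so re-running the setup adds nothing twice.
# The profile defaults to /etc/profile (needs root); pass another file to try it safely.
add_export() {
  local line="$1"
  local profile="${2:-/etc/profile}"
  # -q quiet, -x whole-line match, -F fixed string (no regex), -- ends options
  if ! grep -qxF -- "$line" "$profile" 2>/dev/null; then
    printf '%s\n' "$line" >> "$profile"
  fi
}

# Usage (as root), matching the Scala settings above:
#   add_export 'export SCALA_HOME=/usr/local/scala'
#   add_export 'export PATH=$PATH:/usr/local/scala/bin'
#   source /etc/profile
```

The single quotes in the usage lines keep $PATH from being expanded, so the literal text is written to the profile.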

## Verify the configuration

Running scala -version should print:

Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL

If you see this line, congratulations: Scala is configured correctly!
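To make this check scriptable instead of reading the banner by eye, you can pull the version number out of it with a small filter. This is a sketch: scala_version_from is a hypothetical helper name, and the pattern assumes the banner format shown above.

```shell
# Hypothetical helper: read the banner printed by `scala -version` on stdin
# and print only the version number (e.g. 2.10.4); prints nothing otherwise.
scala_version_from() {
  sed -n 's/^Scala code runner version \([0-9][0-9.]*\).*/\1/p'
}

# Usage: redirect stderr as well, in case the banner is written there.
#   scala -version 2>&1 | scala_version_from
```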

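Maven configuration

1. Download and extract the package

tar -xvf apache-maven-3.3.9-bin.tar.gz -C /usr/local/

2. Rename the extracted directory to maven: mv /usr/local/apache-maven-3.3.9 /usr/local/maven

3. Configure the environment variables in /etc/profile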

export MAVEN_HOME=/usr/local/maven

export PATH=$PATH:/usr/local/maven/bin

export MAVEN_OPTS="-Xms256m -Xmx512m"

4. Apply the changes with source /etc/profile, then verify. mvn -v should print:

Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)

Maven home: /usr/local/maven

Java version: 1.7.0_55, vendor: Oracle Corporation

Java home: /usr/local/jdk/jre

Default locale: en_US, platform encoding: UTF-8

OS name: "linux", version: "2.6.32-642.el6.x86_64", arch: "i386", family: "unix"
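To confirm that Maven picked up the expected paths, you can read single fields out of the mvn -v output. This is a sketch: mvn_field is a hypothetical helper name, and it assumes the "Label: value" layout shown above.

```shell
# Hypothetical helper: read `mvn -v` output on stdin and print the value of one
# "Label: value" field, cut at the first comma (so "Java version" yields 1.7.0_55).
mvn_field() {
  awk -F': ' -v f="$1" '$1 == f { sub(/,.*/, "", $2); print $2; exit }'
}

# Usage, e.g. to check which JDK Maven found:
#   mvn -v | mvn_field 'Java home'
```

With the output above, mvn_field 'Java home' prints /usr/local/jdk/jre and mvn_field 'Maven home' prints /usr/local/maven.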

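Install Spark

1. Extract the package (spark-2.0.2-bin-hadoop2.7 is the prebuilt binary distribution for Hadoop 2.7, so nothing needs to be compiled): tar -zxvf spark-2.0.2-bin-hadoop2.7.tgz -C /usr/local/

cd /usr/local/

mv spark-2.0.2-bin-hadoop2.7 spark-2.0.2

source /etc/profile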

2. Copy the configuration templates

cd /usr/local/spark-2.0.2/conf

cp spark-env.sh.template spark-env.sh

cp slaves.template slaves

cp spark-defaults.conf.template spark-defaults.conf
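If you repeat these cp commands after you have edited the files, they silently overwrite your changes. A guarded variant avoids that. This is a sketch: copy_templates is a hypothetical helper name, and it defaults to the conf directory used above.

```shell
# Hypothetical helper: create spark-env.sh, slaves and spark-defaults.conf from
# their .template files, but never overwrite a file that already exists, so
# re-running it does not wipe out earlier edits.
copy_templates() {
  local conf_dir="${1:-/usr/local/spark-2.0.2/conf}"
  local name
  for name in spark-env.sh slaves spark-defaults.conf; do
    if [ ! -e "$conf_dir/$name" ]; then
      cp "$conf_dir/$name.template" "$conf_dir/$name"
    fi
  done
}

# Usage:
#   copy_templates /usr/local/spark-2.0.2/conf
```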

The main settings are JAVA_HOME, SCALA_HOME, HADOOP_HOME, HADOOP_CONF_DIR, SPARK_MASTER_IP, and so on:

vim spark-env.sh

export JAVA_HOME=/usr/local/jdk

export HADOOP_HOME=/usr/local/hadoop

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop 

export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

export SPARK_LAUNCH_WITH_SCALA=0

export SPARK_WORKER_MEMORY=1g

export SPARK_DRIVER_MEMORY=1g

export SPARK_MASTER_IP=192.168.1.114

export SPARK_LIBRARY_PATH=/usr/local/spark-2.0.2/lib

export SPARK_MASTER_WEBUI_PORT=18080

export SPARK_WORKER_DIR=/home/spark

export SPARK_MASTER_PORT=7077

export SPARK_WORKER_PORT=7078

export SPARK_LOG_DIR=/home/spark_log

export SPARK_PID_DIR='/home/spark/run'
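Before you start anything, you can run a quick static check to catch a mistyped or missing export in spark-env.sh. This is a sketch: missing_env_vars is a hypothetical helper name, and it only looks for lines of the form "export NAME=...". It does not source the file.

```shell
# Hypothetical helper: print each variable name (one per line) that is NOT
# exported in the given spark-env.sh.
missing_env_vars() {
  local env_file="$1"
  shift
  local var
  for var in "$@"; do
    # look for a line of the form "export VAR=..." (leading spaces allowed)
    grep -Eq "^[[:space:]]*export[[:space:]]+${var}=" "$env_file" || echo "$var"
  done
}

# Usage:
#   missing_env_vars /usr/local/spark-2.0.2/conf/spark-env.sh \
#     JAVA_HOME HADOOP_CONF_DIR SPARK_MASTER_IP SPARK_MASTER_PORT
```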

slaves (add every node, one hostname per line; the master node also acts as a worker)
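Spark's start scripts read this file line by line, so it helps to see exactly which hosts will be contacted. The filter below drops "#" comments and blank lines, roughly as Spark's own sbin scripts do. This is a sketch: list_workers is a hypothetical helper name, and the hostnames in the usage note are made up.

```shell
# Hypothetical helper: print the worker hosts listed in a slaves file, dropping
# "#" comments, trailing whitespace and blank lines.
list_workers() {
  sed 's/#.*$//; s/[[:space:]]*$//; /^$/d' "${1:-/usr/local/spark-2.0.2/conf/slaves}"
}

# Usage, for a slaves file containing the hypothetical hosts master, node2, node3:
#   list_workers /usr/local/spark-2.0.2/conf/slaves
```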

spark-defaults.conf

spark.master                     yarn-client

spark.home                       /root/spark-without-hive

spark.eventLog.enabled           true

spark.eventLog.dir               hdfs://Goblin01:8020/spark-log

spark.serializer                 org.apache.spark.serializer.KryoSerializer

spark.executor.memory            1g

spark.driver.memory              1g

spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

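spark.master sets the mode Spark runs in, e.g. yarn-client or yarn-cluster. (Since Spark 2.0 these two values are deprecated in favor of yarn combined with spark.submit.deployMode set to client or cluster. They still work but log a warning.)

spark.home sets the SPARK_HOME path.

spark.eventLog.enabled must be set to true so that event logs are written.

spark.eventLog.dir sets where the event logs go, here a path in HDFS on the master node. The port must match the port HDFS is configured with (8020 by default), otherwise Spark reports an error.

spark.executor.memory and spark.driver.memory set the memory for each executor and for the driver, typically 512m or 1g. If they are too small, jobs cannot run. If they are too large, they take memory away from the other services on the node.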
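Scripts sometimes need to read a value back out of spark-defaults.conf. A common case is creating the event-log directory, which as far as I know must already exist, because Spark refuses to start an application with event logging on a missing directory. This is a sketch: conf_get is a hypothetical helper name, and it assumes the "key, whitespace, value" layout used above.

```shell
# Hypothetical helper: print the value of one key from a spark-defaults.conf-style
# file. The value may itself contain spaces (as with spark.executor.extraJavaOptions),
# so everything after the key and its separating whitespace is kept.
# Lines starting with "#" never match, because their first field begins with "#".
conf_get() {
  local key="$1"
  local file="${2:-/usr/local/spark-2.0.2/conf/spark-defaults.conf}"
  awk -v k="$key" '$1 == k { sub(/^[ \t]*[^ \t]+[ \t]+/, ""); print; exit }' "$file"
}

# Usage: create the HDFS event-log directory named in the config.
#   hdfs dfs -mkdir -p "$(conf_get spark.eventLog.dir)"
```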

Configure yarn-site.xml, which sits in the same directory as hdfs-site.xml ($HADOOP_HOME/etc/hadoop):

ll /usr/local/hadoop/etc/hadoop/yarn-site.xml 

</property>

<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>haproxy:8030</value>
</property>

<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>haproxy:8035</value>
</property>

<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>mycat:8033</value>
</property>

<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>mycat:8088</value>
</property>

<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

</configuration>
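A quick way to double-check which ResourceManager addresses YARN will actually read is to list the file's properties as name=value pairs. This is a sketch: hadoop_props is a hypothetical helper name, and it is a plain text filter, not an XML parser. It assumes each name and value tag sits on its own line and every name is followed by its value.

```shell
# Hypothetical helper: print name=value for every property in a Hadoop-style
# XML config file, pairing each <name> line with the <value> line after it.
hadoop_props() {
  sed -n -e 's|.*<name>\(.*\)</name>.*|\1|p' -e 's|.*<value>\(.*\)</value>.*|\1|p' "$1" |
    paste -d '=' - -
}

# Usage:
#   hadoop_props /usr/local/hadoop/etc/hadoop/yarn-site.xml | grep resourcemanager
```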

Copy spark-2.0.2 to the same path (/usr/local/spark-2.0.2) on the other nodes
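The copy can be scripted over a list of hosts. This is a sketch under stated assumptions: sync_spark is a hypothetical helper name, rsync and passwordless SSH are assumed to be available on every node, and the host list holds only the other nodes, not the master itself. By default it only prints the commands.

```shell
# Hypothetical helper: copy the Spark directory to the same path on every host
# listed in a file (one per line, "#" comments allowed).
# Prints the rsync commands by default; pass "run" as the third argument to execute.
sync_spark() {
  local spark_dir="$1" hosts_file="$2" mode="${3:-dry-run}"
  local host
  sed 's/#.*$//; s/[[:space:]]*$//; /^$/d' "$hosts_file" |
    while read -r host; do
      if [ "$mode" = run ]; then
        # </dev/null keeps rsync/ssh from swallowing the rest of the host list
        rsync -a "$spark_dir/" "$host:$spark_dir/" </dev/null
      else
        echo "rsync -a $spark_dir/ $host:$spark_dir/"
      fi
    done
}

# Usage, with a hypothetical host list:
#   printf 'node2\nnode3\n' > /tmp/spark_hosts
#   sync_spark /usr/local/spark-2.0.2 /tmp/spark_hosts        # preview
#   sync_spark /usr/local/spark-2.0.2 /tmp/spark_hosts run    # copy
```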

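Start the cluster with Spark's own start-all.sh: /usr/local/spark-2.0.2/sbin/start-all.sh (call it by full path, since Hadoop also ships a script named start-all.sh)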
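After start-all.sh returns, jps on each node should list the Spark daemons, Master and Worker on the master node in this setup. The check can be scripted as below. This is a sketch: check_daemons is a hypothetical helper name, and it assumes the usual "PID ClassName" jps output.

```shell
# Hypothetical helper: read `jps` output on stdin and check that each named
# daemon is running. Prints what is missing; returns 1 if anything is missing.
check_daemons() {
  local running name status=0
  running=$(awk '{print $2}')
  for name in "$@"; do
    if ! printf '%s\n' "$running" | grep -qxF "$name"; then
      echo "missing: $name"
      status=1
    fi
  done
  return "$status"
}

# Usage on the master, which is also a worker here:
#   jps | check_daemons Master Worker
```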

7. Run

1) Prepare a text file at /logs/wordcount.log with the following contents:

2) Run spark-shell
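To sanity-check the word counts Spark produces, you can compute the same counts locally with standard tools. This is a sketch with two assumptions: local_wordcount is a hypothetical helper name, and words are split on whitespace. If the file lives on HDFS rather than the local disk, fetch a copy first, e.g. with hdfs dfs -cat /logs/wordcount.log > /tmp/wordcount.log.

```shell
# Hypothetical helper: count whitespace-separated words in a local text file,
# most frequent first, as a plain-shell cross-check for a Spark word count.
local_wordcount() {
  tr -s '[:space:]' '\n' < "$1" | sed '/^$/d' | sort | uniq -c | sort -k1,1nr -k2,2
}

# Usage:
#   local_wordcount /tmp/wordcount.log
```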

This article is reposted from the DBAspace blog on 51CTO. Original link: http://blog.51cto.com/dbaspace/1875951