Side note: I didn't write this up right after finishing the build; I had some free time today, so I'm filling it in now.
This is the original article from my own blog.
------
#Preparation
------
Environment: VMware 12, Ubuntu 14.04 x64
jdk-7u79-linux-x64.tar.gz
hadoop-2.6.4.tar.gz
spark-2.0.0-bin-hadoop2.6.tgz
scala-2.11.8.tgz
Create a dedicated user and add it to the sudo (admin) group.
Update the package sources and install vim.
Do everything below as that user; never switch into root!
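A minimal sketch of these prep steps, assuming the dedicated user is named spark (which matches the /home/spark paths used throughout this post); openssh-server is included here because the SSH setup in part III needs it:
sudo adduser spark                          # create the dedicated user
sudo adduser spark sudo                     # add it to the sudo group
su - spark                                  # switch to the new user
sudo apt-get update                         # refresh the package sources
sudo apt-get install -y vim openssh-server  # editor + SSH daemon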
#Setup
----------
I. First, configure the paths
cd into the working directory (/home/spark/work) and unpack every archive:
tar -zxvf jdk-7u79-linux-x64.tar.gz
tar -zxvf hadoop-2.6.4.tar.gz
tar -zxvf spark-2.0.0-bin-hadoop2.6.tgz
tar -zxvf scala-2.11.8.tgz
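The environment variables below point at short directory names, so the version-suffixed directories that tar produces presumably get renamed (or symlinked) first; a sketch, assuming the default extraction names of these four archives:
mv jdk1.7.0_79 java
mv hadoop-2.6.4 hadoop
mv spark-2.0.0-bin-hadoop2.6 spark
mv scala-2.11.8 scala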
Then edit the environment file with sudo vim /etc/profile and append at the end:
export JAVA_HOME=/home/spark/work/java
export JRE_HOME=$JAVA_HOME/jre
export SCALA_HOME=/home/spark/work/scala
export SPARK_HOME=/home/spark/work/spark
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$SPARK_HOME/bin:$SCALA_HOME/bin:$JAVA_HOME/bin:/home/spark/work/hadoop/bin:$PATH
After saving and closing, run source /etc/profile to make the changes take effect.
Verify the setup:
java -version
scala -version
II. Set the hostname and hosts file
Run ifconfig on each of the three VMs to find its IP, then on every machine edit the hosts file:
sudo vim /etc/hosts
10.1.1.100 master
10.1.1.101 slave1
10.1.1.102 slave2
Test that the names resolve, e.g.:
ping slave1
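The section title also covers the hostname itself; a sketch of setting it on each machine (assuming Ubuntu 14.04, where the name lives in /etc/hostname):
sudo vim /etc/hostname   # set the content to master, slave1, or slave2 respectively
sudo hostname master     # apply it immediately (or reboot); use the matching name on each machine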
III. Set up SSH and passwordless login
Generate a key pair on every host:
ssh-keygen -t rsa   # just press Enter through all the prompts
Then send the id_rsa.pub from slave1 and slave2 to master with scp:
scp ~/.ssh/id_rsa.pub [email protected]:~/.ssh/id_rsa.pub.slave1
scp ~/.ssh/id_rsa.pub [email protected]:~/.ssh/id_rsa.pub.slave2
On master, append all the public keys to the authentication file authorized_keys:
cat ~/.ssh/id_rsa.pub* >> ~/.ssh/authorized_keys
Distribute the authorized_keys file back to every slave:
scp ~/.ssh/authorized_keys [email protected]:~/.ssh/
scp ~/.ssh/authorized_keys [email protected]:~/.ssh/
Finally, ssh between every pair of machines once (ssh <hostname>) so the host keys get accepted.
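If passwordless login still asks for a password, it is usually file permissions: sshd rejects keys when ~/.ssh is too open. A fix to run on every machine:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
Alternatively, ssh-copy-id (shipped with Ubuntu 14.04) automates the whole key exchange; a sketch, again assuming the user spark, run on every machine once per target host:
ssh-copy-id spark@master
ssh-copy-id spark@slave1
ssh-copy-id spark@slave2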
IV. Install and configure Hadoop
All of the following files live under hadoop/etc/hadoop.
1. hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/home/spark/work/java
# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol. Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}
export HADOOP_CONF_DIR=/home/spark/work/hadoop/etc/hadoop
2. core-site.xml
Below the comments, add:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/spark/work/hadoop/tmp</value>
  </property>
</configuration>
Create the tmp folder under the hadoop directory:
mkdir tmp
3. hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/spark/work/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/spark/work/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
Under the hadoop directory, create the HDFS folders:
mkdir -p hdfs/data hdfs/name
4. mapred-site.xml (copy it from the template first with cp mapred-site.xml.template mapred-site.xml; note that mapred.job.tracker is a Hadoop 1.x property and is ignored once the framework is yarn, so it is harmless here)
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
5. yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
6. In the slaves file, list the worker nodes by IP or hostname (master is listed too, so it will also run a DataNode and NodeManager):
master
slave1
slave2
7. Distribute the configured hadoop folder to all slaves:
scp -r ~/work/hadoop [email protected]:~/work/
scp -r ~/work/hadoop [email protected]:~/work/
Format the namenode:
cd ~/work/hadoop           # enter the hadoop directory
bin/hdfs namenode -format  # format the namenode (bin/hadoop namenode -format also works but is deprecated in 2.x)
Start Hadoop:
sbin/start-all.sh
Run jps on each machine to check the processes.
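Roughly what jps should report when everything is up (a sketch; since master also appears in the slaves file above, it runs the worker daemons as well):
on master: NameNode, SecondaryNameNode, ResourceManager, DataNode, NodeManager, Jps
on each slave: DataNode, NodeManager, Jps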
Master process list:
![](/content/images/2016/10/V6G---JA--E-63WE6RS-GXT.png)
Slave process list:
![](/content/images/2016/10/-4FN--5JD8FT3-B--6-3FQS.png)
The Hadoop web UI is then reachable in a browser: the YARN UI at http://master:8088 (configured above), and the HDFS UI at its default address, http://master:50070.
V. Spark configuration
1. spark-env.sh
cd ~/work/spark/conf                   # enter the spark config directory
cp spark-env.sh.template spark-env.sh  # copy from the template
vim spark-env.sh                       # add the configuration
Append at the end:
export SCALA_HOME=/home/spark/work/scala
export JAVA_HOME=/home/spark/work/java
export HADOOP_HOME=/home/spark/work/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
SPARK_MASTER_IP=master
SPARK_LOCAL_DIRS=/home/spark/work/spark
SPARK_DRIVER_MEMORY=1G
2. slaves
Open the slaves file with vim and list the slave hostnames:
slave1
slave2
Distribute the configured spark folder to the slave VMs:
scp -r ~/work/spark [email protected]:~/work/
scp -r ~/work/spark [email protected]:~/work/
Start Spark from the spark root directory:
cd ~/work/spark
sbin/start-all.sh
Check with jps: the master machine should now show an extra Master process,
and each slave an extra Worker process.
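As a quick smoke test, you can submit the bundled SparkPi example to the new cluster (a sketch; the jar path matches the spark-2.0.0-bin-hadoop2.6 layout, and 7077 is the standalone master's default port):
bin/spark-submit --master spark://master:7077 \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.11-2.0.0.jar 10
The driver output should end with a line reporting the computed approximation of Pi.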
Spark web UI: http://master:8080 (the standalone master's default).
Done!