I. Environment Preparation
1. Three virtual machines (CentOS 7 + JDK 1.8)
1) 192.168.122.11 master, spec: 2 GB RAM + 32 GB disk
2) 192.168.122.12 slave1, spec: 1 GB RAM + 32 GB disk
3) 192.168.122.13 slave2, spec: 1 GB RAM + 32 GB disk
2. Hadoop 2.6 installation package
Download: https://pan.baidu.com/s/1bO6IB37b75Nb7hD2KO6ixA
3. Guides for installing and configuring CentOS and the JDK:
1) Virtual machine setup: https://blog.csdn.net/qq_34256296/article/details/81322243
2) JDK 1.8: https://blog.csdn.net/qq_34256296/article/details/81321110
4. The Hadoop config files used below can also be downloaded and copied directly into the etc/hadoop/ folder under the Hadoop installation directory:
Hadoop config files: https://github.com/Kara-Feite/hadoop-config
II. Environment Setup Before Installing Hadoop
1. Turn off the firewall
systemctl stop firewalld.service
systemctl disable firewalld.service
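A quick check before moving on: you can confirm the firewall really is off and will stay off across reboots (these are standard systemd queries on CentOS 7):

```shell
# Both commands should report the firewall as off:
# is-active prints "inactive" (or "unknown"), is-enabled prints "disabled".
systemctl is-active firewalld
systemctl is-enabled firewalld
```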
2. Set the hostname and configure hosts (repeat on each VM)
1) vim /etc/hosts (append the following)
192.168.122.11 master
192.168.122.12 slave1
192.168.122.13 slave2
2)vim /etc/sysconfig/network
# Created by anaconda
NETWORKING=yes
HOSTNAME=master # use the matching hostname: master, slave1, or slave2
3)vim /etc/hostname
master # use the matching hostname: master, slave1, or slave2
3. Set up passwordless SSH (every machine must finish each step before any machine moves on to the next)
1) ssh-keygen -t rsa
2) cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
3) On master, pull in the other two machines' keys:
ssh slave1 cat /root/.ssh/authorized_keys >> /root/.ssh/authorized_keys
ssh slave2 cat /root/.ssh/authorized_keys >> /root/.ssh/authorized_keys
4) Copy the combined file back to both slaves so every node trusts every other node:
scp /root/.ssh/authorized_keys root@slave1:/root/.ssh/
scp /root/.ssh/authorized_keys root@slave2:/root/.ssh/
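If the key exchange worked, each of the following should print the remote hostname without asking for a password (the hostnames come from the hosts file configured above):

```shell
# Run from master; the first connection may still prompt once to accept
# the remote host key, but never for a password.
ssh slave1 hostname
ssh slave2 hostname
```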
III. Installing Hadoop (all big-data software is kept under /usr/local/src here)
1. Unpack the archive
cd /usr/local/src
tar -xzvf hadoop-2.6.0-x64.tar.gz
2. Configure the Hadoop environment variables
vim /etc/profile # append the following at the end, then run: source /etc/profile
# set hadoop environment
export HADOOP_HOME=/usr/local/src/hadoop-2.6.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
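As a sanity check (a minimal sketch, assuming the install path used in this guide), you can confirm the new entries really lead PATH after re-reading the profile:

```shell
# Recreate the two profile lines and show the first PATH entries;
# the Hadoop bin and sbin directories should come first.
export HADOOP_HOME=/usr/local/src/hadoop-2.6.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
echo "$PATH" | tr ':' '\n' | head -2
# prints /usr/local/src/hadoop-2.6.0/bin then /usr/local/src/hadoop-2.6.0/sbin
```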
3. Create the directories needed later in the setup
mkdir -p /usr/local/src/hadoop-2.6.0/{tmp,var,dfs/name,dfs/data}
4. Edit hadoop-env.sh (cd /usr/local/src/hadoop-2.6.0/etc/hadoop/)
vim hadoop-env.sh
将:export JAVA_HOME=${JAVA_HOME}
修改為:export JAVA_HOME=/usr/local/src/jdk1.8.0_171 #修改為jdk目錄
5. Edit the slaves file
vim slaves # add the following
slave1
slave2
6. Edit core-site.xml
vim core-site.xml # contents as follows
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/src/hadoop-2.6.0/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
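One note on the property names: fs.default.name still works in Hadoop 2.x but is deprecated; the current name for the same setting is fs.defaultFS, so the equivalent modern form would be:

```xml
<!-- Equivalent to fs.default.name above, using the Hadoop 2.x name. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
```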
7. Edit hdfs-site.xml
vim hdfs-site.xml # contents as follows
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/src/hadoop-2.6.0/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/src/hadoop-2.6.0/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>Disable HDFS permission checking.</description>
  </property>
</configuration>
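A caveat on dfs.replication: this cluster has only two DataNodes (slave1 and slave2), so a replication factor of 3 leaves every block permanently under-replicated. A value matching the DataNode count would look like:

```xml
<!-- Suggested alternative, not from the original guide: replication can
     never exceed the number of live DataNodes. -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```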
8. Edit mapred-site.xml
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml # contents as follows
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:49001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/usr/local/src/hadoop-2.6.0/var</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
9. Edit yarn-site.xml
vim yarn-site.xml # contents as follows
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>12288</value>
    <description>Maximum allocation for a single container request, in MB (default 8192).</description>
  </property>
</configuration>
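Note that yarn.scheduler.maximum-allocation-mb only caps a single container request; the memory a node actually offers is governed by yarn.nodemanager.resource.memory-mb (default 8192 MB). Since the slaves in Part I have only 1 GB of RAM, a hypothetical matching setting would be:

```xml
<!-- Hypothetical addition for the 1 GB slave nodes in this guide;
     not part of the original configuration. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1024</value>
</property>
```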
10.将配置好的hadoop複制到其他節點
scp -rp /usr/local/src/hadoop-2.6.0/ [email protected]:/usr/local/src
scp -rp /usr/local/src/hadoop-2.6.0/ [email protected]:/usr/local/src
11. Initialize the Hadoop cluster (run on master only)
cd /usr/local/src/hadoop-2.6.0/bin
./hadoop namenode -format # look for "successfully formatted" in the output
12. Start the Hadoop cluster
cd /usr/local/src/hadoop-2.6.0/sbin
./start-all.sh
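To confirm the daemons came up, jps (bundled with the JDK) lists the running Java processes on each node. With the configuration in this guide, master should show NameNode, SecondaryNameNode, and ResourceManager, and each slave should show DataNode and NodeManager:

```shell
# Run on every node; each daemon appears as "<pid> <ClassName>".
jps
```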
13. Test that Hadoop is working
vim hadoop_test.txt # write anything into it
# if the upload succeeds, the cluster is up and running
hadoop fs -put hadoop_test.txt /
# remove the local file
rm hadoop_test.txt
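The test above can be collapsed into one sketch that also cleans up the copy left in HDFS:

```shell
# Create a file, push it to HDFS, confirm it is listed, then remove both copies.
echo "hello hadoop" > hadoop_test.txt
hadoop fs -put hadoop_test.txt /
hadoop fs -ls /            # hadoop_test.txt should appear in the listing
hadoop fs -rm /hadoop_test.txt
rm hadoop_test.txt
```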
14. Common web UIs and service ports
1) HDFS web UI: 50070
2) YARN ResourceManager UI: 8088
3) JobHistory Server UI: 19888
4) ZooKeeper service port: 2181
5) MySQL service port: 3306
6) Hive server port: 10000
7) Kafka service port: 9092
8) Azkaban web UI: 8443
9) HBase web UI: 16010 (60010 in older versions)
10) Spark web UI: 8080
11) Spark master URL port: 7077
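Of the ports above, only the first three belong to services set up in this guide, and 19888 answers only after mr-jobhistory-daemon.sh start historyserver has also been run. A quick reachability check for the two UIs started by start-all.sh:

```shell
# HTTP 200 means the daemon is up and serving its web UI
# (assumes the hosts entry for master from Part II).
for port in 50070 8088; do
  curl -s -o /dev/null -w "master:$port -> HTTP %{http_code}\n" "http://master:$port"
done
```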