<b>Installing Hadoop</b>
Prepare the machines: one master and several slaves. Configure /etc/hosts on every machine so that the machines can reach one another by hostname, for example:
172.16.200.4 node1 (master)
172.16.200.5 node2 (slave1)
172.16.200.6 node3 (slave2)
Host information:

Hostname          IP address      Role
node1 (master)    172.16.200.4    namenode, jobtracker
node2 (slave1)    172.16.200.5    datanode, tasktracker
node3 (slave2)    172.16.200.6    datanode, tasktracker
<b>I. Set the hostnames</b><b> (configure all three machines)</b>
Taking node1 as the example; apply the same configuration on the other two machines.
vim /etc/hosts
172.16.200.4 node1
172.16.200.5 node2
172.16.200.6 node3
vim /etc/sysconfig/network
HOSTNAME=node1
Log out and back in for the change to take effect, or apply it immediately with: hostname node1
Verify that every hostname resolves, e.g. ping node2
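For a faster check, the loop below pings every node once from the current machine (a minimal sketch; run it on each of the three nodes):

for h in node1 node2 node3; do
    ping -c 1 $h > /dev/null && echo "$h resolves OK"
done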
<b>II. Add a hadoop user and grant it root privileges</b><b> (configure all three machines)</b>
useradd hadoop
passwd hadoop (set the password to match the username)
Edit the /etc/sudoers file: find the root line below and add a matching line for hadoop underneath it, as shown:
root    ALL=(ALL) ALL
hadoop  ALL=(ALL) ALL
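Because a syntax error in /etc/sudoers can break sudo entirely, a safer version of this whole step lets visudo validate the file before saving (a sketch, assuming CentOS, where passwd accepts --stdin; run as root on each node):

useradd hadoop
echo hadoop | passwd --stdin hadoop   # password identical to the username, as above
visudo                                # then add the line: hadoop ALL=(ALL) ALL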
<b>III. Configure passwordless SSH login</b><b> (configure all three machines)</b>
Once Hadoop is running, the namenode starts and stops the daemons on each datanode over SSH (Secure Shell). Commands must therefore run between nodes without a password prompt, so we configure SSH to use passwordless public-key authentication.
With the three machines in this article, node1 is the master node and needs to connect to node2 and node3. Make sure SSH is installed on every machine and that the sshd service is running on the datanode machines.
(Note: [hadoop@hadoop ~]$ ssh-keygen -t rsa
This command generates a key pair for the hadoop user. Press Enter to accept the default save path, and press Enter again when prompted for a passphrase, leaving it empty. The resulting key pair id_rsa, id_rsa.pub is stored under /home/hadoop/.ssh by default. Then append the contents of id_rsa.pub to the /home/hadoop/.ssh/authorized_keys file on every machine, including this one. If authorized_keys already exists on a machine, add the contents of id_rsa.pub to the end of that file; if it does not exist, simply copy id_rsa.pub over as authorized_keys.)
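If ssh-copy-id is available, the whole distribution step can be compressed into a few commands (a sketch run as the hadoop user on node1; ssh-copy-id appends the public key to ~/.ssh/authorized_keys on the target and fixes its permissions):

ssh-keygen -t rsa                  # accept the default path; empty passphrase
for h in node1 node2 node3; do
    ssh-copy-id hadoop@$h          # includes the local machine, as described above
done
ssh node2 hostname                 # should print "node2" without asking for a password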
<b>IV. Install the JDK (install on all three machines)</b>
Add the following to /etc/profile, then reload it:
export JAVA_HOME=/usr/java/jdk1.7.0_67
export JRE_HOME=/usr/java/jdk1.7.0_67/jre
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
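To confirm the variables took effect (a minimal check):

echo $JAVA_HOME   # should print /usr/java/jdk1.7.0_67
java -version     # should report version 1.7.0_67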
V. Install Hadoop
This is the downloaded hadoop-2.6.4.tar.gz archive.
1. Unpack it: tar -xzvf hadoop-2.6.4.tar.gz
[hadoop@node1 hadoop-2.6.4]$ ls
bin data etc include lib libexec LICENSE.txt logs name NOTICE.txt README.txt sbin share var
[hadoop@node1 hadoop-2.6.4]$ pwd
/home/hadoop/hadoop-2.6.4
2. Before configuring anything, create the following folders on the local filesystem (a one-line equivalent follows the list):
/home/hadoop/hadoop-2.6.4/var
/home/hadoop/hadoop-2.6.4/data
/home/hadoop/hadoop-2.6.4/name
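The one-line equivalent mentioned above (run as the hadoop user; uses bash brace expansion):

mkdir -p /home/hadoop/hadoop-2.6.4/{var,data,name}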
3. Edit the configuration files. Seven files are involved, all under /home/hadoop/hadoop-2.6.4/etc/hadoop:
~/hadoop/etc/hadoop/hadoop-env.sh
~/hadoop/etc/hadoop/yarn-env.sh
~/hadoop/etc/hadoop/slaves
~/hadoop/etc/hadoop/core-site.xml
~/hadoop/etc/hadoop/hdfs-site.xml
~/hadoop/etc/hadoop/mapred-site.xml
~/hadoop/etc/hadoop/yarn-site.xml
4. Enter the Hadoop configuration directory.
4.1. Configure hadoop-env.sh --> set JAVA_HOME:
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.7.0_67
4.2. Configure yarn-env.sh --> set JAVA_HOME:
# some Java parameters
export JAVA_HOME=/usr/java/jdk1.7.0_67
4.3. Configure the slaves file --> add the slave nodes (node1 is listed as well, so the master also runs a datanode; this matches the three live datanodes reported below):
node1
node2
node3
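Equivalently, from inside /home/hadoop/hadoop-2.6.4 (a sketch):

cat > etc/hadoop/slaves <<'EOF'
node1
node2
node3
EOF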
4.4. Configure core-site.xml --> add the core Hadoop settings (the HDFS port is 9000; hadoop.tmp.dir is file:/home/hadoop/hadoop-2.6.4/var):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value><b>hdfs://node1:9000</b></value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value><b>file:/home/hadoop/hadoop-2.6.4/var</b></value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.groups</name>
    <value>*</value> <!-- value missing in the original; * assumed, mirroring the hosts setting -->
  </property>
</configuration>
4.5. Configure hdfs-site.xml --> add the HDFS settings (namenode/datanode addresses and directory locations):
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node1:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop-2.6.4/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop-2.6.4/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
4.6. Configure mapred-site.xml --> add the MapReduce settings (use the YARN framework; the JobHistory server address and its web address):
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node1:19888</value>
  </property>
</configuration>
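Note: the Hadoop 2.6.x tarball ships this file only as a template; if etc/hadoop/mapred-site.xml does not exist yet, create it from the template before editing:

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml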
4.7. Configure yarn-site.xml --> enable YARN:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>node1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>node1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>node1:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>node1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>node1:8088</value>
  </property>
</configuration>
5、将配置好的hadoop檔案copy到另兩台slave機器上
scp -r hadoop-2.6.4 hadoop@node2:/home/hadoop/
scp -r hadoop-2.6.4 hadoop@node3:/home/hadoop/
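A quick sanity check that the copy landed intact (a sketch; relies on the passwordless SSH set up earlier):

ssh hadoop@node2 'ls /home/hadoop/hadoop-2.6.4/etc/hadoop' | head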
VI. Verification
1. Format the namenode (the output should include a "successfully formatted" line near the end):
[spark@s1pa11 opt]$ cd hadoop-2.6.0/
[spark@s1pa11 hadoop-2.6.0]$ ls
bin dfs etc include input lib libexec license.txt logs notice.txt readme.txt sbin share tmp
[spark@s1pa11 hadoop-2.6.0]$ <b>./bin/hdfs namenode -format</b>
複制代碼
2. Start HDFS:
[spark@s1pa11 hadoop-2.6.0]$ ./sbin/start-dfs.sh
15/01/05 16:41:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [s1pa11]
s1pa11: starting namenode, logging to /home/spark/opt/hadoop-2.6.0/logs/hadoop-spark-namenode-s1pa11.out
s1pa222: starting datanode, logging to /home/spark/opt/hadoop-2.6.0/logs/hadoop-spark-datanode-s1pa222.out
Starting secondary namenodes [s1pa11]
s1pa11: starting secondarynamenode, logging to /home/spark/opt/hadoop-2.6.0/logs/hadoop-spark-secondarynamenode-s1pa11.out
15/01/05 16:41:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[spark@s1pa11 hadoop-2.6.0]$ jps
22230 Master
30889 Jps
22478 Worker
30498 NameNode
30733 SecondaryNameNode
19781 ResourceManager
(The Master and Worker processes belong to a Spark standalone deployment already running on this host; they are not Hadoop daemons.)
3. Stop HDFS:
[spark@s1pa11 hadoop-2.6.0]$ ./sbin/stop-dfs.sh
15/01/05 16:40:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [s1pa11]
s1pa11: stopping namenode
s1pa222: stopping datanode
Stopping secondary namenodes [s1pa11]
s1pa11: stopping secondarynamenode
15/01/05 16:40:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
30336 Jps
4. Start YARN:
[spark@s1pa11 hadoop-2.6.0]$ ./sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/spark/opt/hadoop-2.6.0/logs/yarn-spark-resourcemanager-s1pa11.out
s1pa222: starting nodemanager, logging to /home/spark/opt/hadoop-2.6.0/logs/yarn-spark-nodemanager-s1pa222.out
31233 ResourceManager
31503 Jps
5. Stop YARN:
[spark@s1pa11 hadoop-2.6.0]$ ./sbin/stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
s1pa222: stopping nodemanager
no proxyserver to stop
31167 Jps
6. Check the cluster status:
[hadoop@node1 hadoop-2.6.4]$ <b>./bin/hdfs dfsadmin -report</b>
16/05/26 10:51:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 56338194432 (52.47 GB)
Present Capacity: 42922237952 (39.97 GB)
DFS Remaining: 42922164224 (39.97 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (3):

Name: 172.16.200.4:50010 (node1)
Hostname: node1
Decommission Status : Normal
Configured Capacity: 18779398144 (17.49 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 4559396864 (4.25 GB)
DFS Remaining: 14219976704 (13.24 GB)
DFS Remaining%: 75.72%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu May 26 10:51:35 CST 2016

Name: 172.16.200.5:50010 (node2)
Hostname: node2
Non DFS Used: 4369121280 (4.07 GB)
DFS Remaining: 14410252288 (13.42 GB)
DFS Remaining%: 76.73%

Name: 172.16.200.6:50010 (node3)
Hostname: node3
Non DFS Used: 4487438336 (4.18 GB)
DFS Remaining: 14291935232 (13.31 GB)
DFS Remaining%: 76.10%
7. View the HDFS web UI: http://172.16.200.4:50070/
8. View the ResourceManager web UI: http://172.16.200.4:8088/
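If no browser is available on the LAN, both web UIs can be probed from the shell instead (a minimal check; an HTTP 200 means the UI is up):

curl -s -o /dev/null -w '%{http_code}\n' http://172.16.200.4:50070/   # HDFS namenode UI
curl -s -o /dev/null -w '%{http_code}\n' http://172.16.200.4:8088/    # YARN ResourceManager UI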