Hands-On Hadoop Installation

Installing Hadoop

Prepare the machines: one master and several slaves. Configure /etc/hosts on every machine so that all machines can reach one another by hostname, for example:

172.16.200.4  node1 (master)
172.16.200.5  node2 (slave1)
172.16.200.6  node3 (slave2)

Host information:

Hostname   IP address      Role
node1      172.16.200.4    namenode, jobtracker (master)
node2      172.16.200.5    datanode, tasktracker (slave)
node3      172.16.200.6    datanode, tasktracker (slave)

1. Modify the hostnames (configure on all three nodes)

Take node1 as an example; apply the same configuration on the other two machines.

vim /etc/hosts

172.16.200.4 node1 

172.16.200.5 node2 

172.16.200.6 node3

vim /etc/sysconfig/network

HOSTNAME=node1

Then run hostname node1 and log in again for the change to take effect.

Verify by pinging the hostnames from each node.
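For example, from node1:

ping -c 3 node2
ping -c 3 node3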

2. Add a hadoop user and grant it root privileges (configure on all three nodes)

useradd hadoop
passwd hadoop   (set the password the same as the username)

Edit /etc/sudoers (visudo is the safest way, since it validates the syntax before saving), find the following line, and add a line for hadoop below the root entry, as shown:

root    ALL=(ALL)     ALL
hadoop  ALL=(ALL)     ALL

3. Configure passwordless SSH login (configure on all three nodes)

Once Hadoop is running, the namenode starts and stops the daemons on each datanode over SSH (Secure Shell). Commands must therefore run between nodes without password prompts, so we configure SSH to use passwordless public-key authentication.

With the three machines in this article, node1 is the master and needs to connect to node2 and node3. Make sure SSH is installed on every machine and that the sshd service is running on the datanode machines.

(Note: [hadoop@node1 ~]$ ssh-keygen -t rsa

This command generates an RSA key pair for the hadoop user. Press Enter to accept the default save path, and press Enter again when prompted for a passphrase, leaving it empty. The resulting key pair, id_rsa and id_rsa.pub, is stored under /home/hadoop/.ssh by default. Then append the contents of id_rsa.pub to /home/hadoop/.ssh/authorized_keys on every machine, including this one. If authorized_keys already exists on a machine, add id_rsa.pub to the end of it; if it does not exist, simply copy id_rsa.pub over as authorized_keys.)

4. Install the JDK (install on all three nodes)

Append the following to /etc/profile, then reload it (HADOOP_HOME points at the Hadoop install directory used in the next section):

export JAVA_HOME=/usr/java/jdk1.7.0_67
export JRE_HOME=/usr/java/jdk1.7.0_67/jre
export HADOOP_HOME=/home/hadoop/hadoop-2.6.4
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

source /etc/profile
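To confirm the environment took effect:

java -version
echo $JAVA_HOME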

5. Install Hadoop

Download the hadoop-2.6.4.tar.gz tarball.

1. Unpack it: tar -xzvf hadoop-2.6.4.tar.gz

[hadoop@node1 hadoop-2.6.4]$ ls

bin  data  etc  include  lib  libexec  LICENSE.txt  logs  name  NOTICE.txt  README.txt  sbin  share  var

[hadoop@node1 hadoop-2.6.4]$ pwd

/home/hadoop/hadoop-2.6.4

2. Before configuring, create the following directories on the local filesystem:

/home/hadoop/hadoop-2.6.4/var

/home/hadoop/hadoop-2.6.4/data

/home/hadoop/hadoop-2.6.4/name
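For example:

mkdir -p /home/hadoop/hadoop-2.6.4/var
mkdir -p /home/hadoop/hadoop-2.6.4/data
mkdir -p /home/hadoop/hadoop-2.6.4/name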

3. Edit the configuration files. Seven files are involved, all under /home/hadoop/hadoop-2.6.4/etc/hadoop:

~/hadoop-2.6.4/etc/hadoop/hadoop-env.sh
~/hadoop-2.6.4/etc/hadoop/yarn-env.sh
~/hadoop-2.6.4/etc/hadoop/slaves
~/hadoop-2.6.4/etc/hadoop/core-site.xml
~/hadoop-2.6.4/etc/hadoop/hdfs-site.xml
~/hadoop-2.6.4/etc/hadoop/mapred-site.xml
~/hadoop-2.6.4/etc/hadoop/yarn-site.xml

4. Go into the Hadoop configuration directory and edit each file in turn.

4.1 Configure hadoop-env.sh: set JAVA_HOME.

# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.7.0_67

4.2 Configure yarn-env.sh: set JAVA_HOME.

# some Java parameters
export JAVA_HOME=/usr/java/jdk1.7.0_67

4.3 Configure the slaves file: add the slave nodes.

node1
node2
node3

4.4 Configure core-site.xml: add the Hadoop core settings (the HDFS port is 9000; the temporary directory is file:/home/hadoop/hadoop-2.6.4/var).

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/hadoop-2.6.4/var</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.groups</name>
    <value>*</value>
  </property>
</configuration>

4.5 Configure hdfs-site.xml: add the HDFS settings (namenode and datanode ports and directory locations).

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node1:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop-2.6.4/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop-2.6.4/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

4.6 Configure mapred-site.xml: add the MapReduce settings (use the YARN framework; jobhistory address and web address). In Hadoop 2.6 this file ships as mapred-site.xml.template, so copy it to mapred-site.xml first.

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node1:19888</value>
  </property>
</configuration>

4.7 Configure yarn-site.xml: enable YARN.

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>node1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>node1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>node1:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>node1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>node1:8088</value>
  </property>
</configuration>

5、将配置好的hadoop檔案copy到另兩台slave機器上

scp -r hadoop-2.6.4 hadoop@node2:/home/hadoop/

scp -r hadoop-2.6.4 hadoop@node3:/home/hadoop/

6. Verification

1. Format the namenode (run on the master only; a successful format prints a line containing "successfully formatted"):

[spark@s1pa11 opt]$ cd hadoop-2.6.0/
[spark@s1pa11 hadoop-2.6.0]$ ls
bin  dfs  etc  include  input  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share  tmp
[spark@s1pa11 hadoop-2.6.0]$ ./bin/hdfs namenode -format

2. Start HDFS:

[spark@s1pa11 hadoop-2.6.0]$ ./sbin/start-dfs.sh
15/01/05 16:41:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [s1pa11]
s1pa11: starting namenode, logging to /home/spark/opt/hadoop-2.6.0/logs/hadoop-spark-namenode-s1pa11.out
s1pa222: starting datanode, logging to /home/spark/opt/hadoop-2.6.0/logs/hadoop-spark-datanode-s1pa222.out
Starting secondary namenodes [s1pa11]
s1pa11: starting secondarynamenode, logging to /home/spark/opt/hadoop-2.6.0/logs/hadoop-spark-secondarynamenode-s1pa11.out
15/01/05 16:41:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

[spark@s1pa11 hadoop-2.6.0]$ jps
22230 Master
30889 Jps
22478 Worker
30498 NameNode
30733 SecondaryNameNode
19781 ResourceManager
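The datanode processes run on the slaves, so check them there as well (a quick sketch using the slave hostname from the transcript above; assumes jps is on the remote PATH):

ssh s1pa222 jps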

3. Stop HDFS:

[spark@s1pa11 hadoop-2.6.0]$ ./sbin/stop-dfs.sh
15/01/05 16:40:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [s1pa11]
s1pa11: stopping namenode
s1pa222: stopping datanode
Stopping secondary namenodes [s1pa11]
s1pa11: stopping secondarynamenode
15/01/05 16:40:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[spark@s1pa11 hadoop-2.6.0]$ jps
30336 Jps

4. Start YARN:

[spark@s1pa11 hadoop-2.6.0]$ ./sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/spark/opt/hadoop-2.6.0/logs/yarn-spark-resourcemanager-s1pa11.out
s1pa222: starting nodemanager, logging to /home/spark/opt/hadoop-2.6.0/logs/yarn-spark-nodemanager-s1pa222.out
[spark@s1pa11 hadoop-2.6.0]$ jps
31233 ResourceManager
31503 Jps

5. Stop YARN:

[spark@s1pa11 hadoop-2.6.0]$ ./sbin/stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
s1pa222: stopping nodemanager
no proxyserver to stop
[spark@s1pa11 hadoop-2.6.0]$ jps
31167 Jps

6、檢視叢集狀态:

[hadoop@node1 hadoop-2.6.4]$ ./bin/hdfs dfsadmin -report
16/05/26 10:51:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 56338194432 (52.47 GB)
Present Capacity: 42922237952 (39.97 GB)
DFS Remaining: 42922164224 (39.97 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 172.16.200.4:50010 (node1)
Hostname: node1
Decommission Status : Normal
Configured Capacity: 18779398144 (17.49 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 4559396864 (4.25 GB)
DFS Remaining: 14219976704 (13.24 GB)
DFS Remaining%: 75.72%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu May 26 10:51:35 CST 2016

Name: 172.16.200.5:50010 (node2)
Hostname: node2
Non DFS Used: 4369121280 (4.07 GB)
DFS Remaining: 14410252288 (13.42 GB)
DFS Remaining%: 76.73%

Name: 172.16.200.6:50010 (node3)
Hostname: node3
Non DFS Used: 4487438336 (4.18 GB)
DFS Remaining: 14291935232 (13.31 GB)
DFS Remaining%: 76.10%

7. View the HDFS web UI: http://172.16.200.4:50070/

8. View the ResourceManager web UI: http://172.16.200.4:8088/
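As a final sanity check, you can run one of the bundled example jobs (a sketch; the examples jar ships with the Hadoop 2.6.4 distribution under share/hadoop/mapreduce):

cd /home/hadoop/hadoop-2.6.4
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar pi 2 10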