The three Hadoop run modes:
Local mode: everything is simulated locally in a single process; the distributed filesystem is not used.
Pseudo-distributed mode: all 5 daemons run on a single host; typically used by developers to debug Hadoop programs.
Fully distributed mode: at least 3 nodes, for example JobTracker and NameNode on one host, the SecondaryNameNode on a second host, and the DataNode and TaskTracker on a third.
Environment for this exercise:
CentOS2.6.32-358.el6.x86_64
jdk-7u21-linux-x64.rpm
hadoop-0.20.2-cdh3u6.tar.gz
1. Configuring Hadoop in pseudo-distributed mode
[root@localhost ~]# rpm -ivh jdk-7u21-linux-x64.rpm
[root@localhost ~]# vim /etc/profile.d/java.sh
JAVA_HOME=/usr/java/latest
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME PATH
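The profile script only takes effect for new login shells; to pick it up immediately in the current shell (a quick check, assuming the RPM installed the JDK under /usr/java/latest):

```shell
# Load the new environment variables into the current shell
source /etc/profile.d/java.sh
# JAVA_HOME should now point at the JDK install
echo "$JAVA_HOME"
```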
[root@localhost ~]# tar xf hadoop-0.20.2-cdh3u6.tar.gz -C /usr/local/
[root@localhost ~]# cd /usr/local/
[root@localhost local]# ln -sv hadoop-0.20.2-cdh3u6/ hadoop
[root@localhost ~]# vim /etc/profile.d/hadoop.sh
HADOOP_HOME=/usr/local/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME PATH
Verify that the JDK and Hadoop were installed correctly:
[root@localhost ~]# java -version
[root@localhost ~]# hadoop version
Create a user and change the ownership of the Hadoop files:
[root@localhost ~]# useradd hduser
[root@localhost ~]# passwd hduser
[root@localhost ~]# chown -R hduser.hduser /usr/local/hadoop/
Create a directory for Hadoop's temporary data:
[root@localhost ~]# mkdir /hadoop/temp -pv
[root@localhost ~]# chown -R hduser.hduser /hadoop/
Main scripts:
/usr/local/hadoop/bin/start-dfs.sh      starts the namenode, datanode, and secondarynamenode daemons
/usr/local/hadoop/bin/start-mapred.sh   starts the jobtracker and tasktracker daemons
/usr/local/hadoop/bin/hadoop-daemon.sh  starts or stops a single daemon
/usr/local/hadoop/bin/start-all.sh      starts all daemons
/usr/local/hadoop/bin/stop-all.sh       stops all daemons
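hadoop-daemon.sh is handy when only one daemon needs attention; for example, to restart just the DataNode without touching the rest of the cluster (a sketch, run as hduser on the node in question):

```shell
# Stop and start a single daemon instead of the whole cluster
/usr/local/hadoop/bin/hadoop-daemon.sh stop datanode
/usr/local/hadoop/bin/hadoop-daemon.sh start datanode
```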
Main configuration files:
/usr/local/hadoop/conf/masters          lists the node that runs the SecondaryNameNode
/usr/local/hadoop/conf/slaves           lists the slave nodes (every node that runs a tasktracker and datanode)
/usr/local/hadoop/conf/core-site.xml    defines system-level parameters
/usr/local/hadoop/conf/hdfs-site.xml    HDFS-related settings
/usr/local/hadoop/conf/mapred-site.xml  MapReduce-related settings, such as the default number of reduce tasks and the default memory limits for tasks
/usr/local/hadoop/conf/hadoop-env.sh    defines settings for Hadoop's runtime environment
To get Hadoop running, only a few configuration files need to be edited:
[root@localhost conf]# vim core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/temp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
[root@localhost conf]# vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
</configuration>
[root@localhost conf]# vim hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Configure passwordless SSH access to the local host for hduser:
[hduser@localhost ~]$ ssh-keygen -t rsa -P ''
[hduser@localhost .ssh]$ ssh-copy-id -i id_rsa.pub hduser@localhost
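Before starting the daemons, it is worth confirming that key-based login works (BatchMode makes ssh fail instead of prompting, so a lingering password prompt shows up as an error):

```shell
# Should print the hostname with no password prompt if the key setup succeeded
ssh -o BatchMode=yes hduser@localhost hostname
```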
[hduser@localhost ~]$ hadoop namenode -format     format the namenode
[hduser@localhost ~]$ start-all.sh                start all daemons
[hduser@localhost ~]$ jps     list the running Java processes
NameNode
DataNode
JobTracker
TaskTracker
SecondaryNameNode
If all 5 of these daemons have started, Hadoop is configured correctly.
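A quick way to check is to count the expected daemon names in the jps output (a small sketch; it should print 5 when everything is up):

```shell
# Count the Hadoop daemons among the running Java processes
jps | grep -cE 'NameNode|DataNode|JobTracker|TaskTracker|SecondaryNameNode'
```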
Common Hadoop commands:
[hduser@localhost ~]$ hadoop                      show the help text
[hduser@localhost ~]$ hadoop fs
[hduser@localhost ~]$ hadoop fs -mkdir test       create a directory on HDFS
[hduser@localhost ~]$ hadoop fs -ls               list files and directories
[hduser@localhost ~]$ hadoop fs -put test.txt test    upload a local file to HDFS
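The commands above can be combined into a small round trip to verify that HDFS is usable (the file names here are examples):

```shell
# Create a local file, upload it, read it back from HDFS, and download a copy
echo "hello hadoop" > test.txt
hadoop fs -put test.txt test          # upload into the test directory
hadoop fs -cat test/test.txt          # print the file contents from HDFS
hadoop fs -get test/test.txt copy.txt # fetch it back to the local filesystem
```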
Test that Hadoop works using the bundled example jobs:
[hduser@localhost ~]$ hadoop jar /usr/local/hadoop/hadoop-examples-0.20.2-cdh3u6.jar     list the example programs in the jar
[hduser@localhost ~]$ hadoop jar /usr/local/hadoop/hadoop-examples-0.20.2-cdh3u6.jar wordcount     show the usage of wordcount
Usage: wordcount <in> <out>
in is the input location; out is where the results are stored (on HDFS; the output directory must not already exist)
[hduser@localhost ~]$ hadoop jar /usr/local/hadoop/hadoop-examples-0.20.2-cdh3u6.jar wordcount test wordcount-out
[hduser@localhost ~]$ hadoop job -list all        list the jobs that have run
[hduser@localhost ~]$ hadoop fs -ls wordcount-out     inspect the job's output files
[hduser@localhost ~]$ hadoop fs -cat wordcount-out/part-r-00000
Hadoop's built-in web interfaces for inspecting jobs and daemons (the firewall must be stopped, or these ports opened, for access):
JobTracker HTTP server address and port, default 0.0.0.0:50030
TaskTracker HTTP server address and port, default 0.0.0.0:50060
NameNode HTTP server address and port, default 0.0.0.0:50070
DataNode HTTP server address and port, default 0.0.0.0:50075
SecondaryNameNode HTTP server address and port, default 0.0.0.0:50090
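Each interface can also be checked from the command line with curl (a sketch; replace localhost with the node's address when checking a remote daemon):

```shell
# Print the HTTP status of the NameNode web UI; 200 means it is reachable
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/
```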
2. Fully distributed configuration:
NameNode and JobTracker on one node (lab201)
SecondaryNameNode (SNN) on one node (lab202)
DataNode and TaskTracker on one node (lab203)
Perform the following on all three nodes (note: keep the clocks of the three nodes synchronized):
[root@lab201 ~]# useradd hduser
[root@lab201 ~]# passwd hduser
[root@lab201 ~]# mkdir -pv /hadoop/temp
[root@lab201 ~]# chown -R hduser.hduser /hadoop
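If root can already ssh to every node, the per-node preparation can be scripted from one machine (a sketch; the hostnames match this lab, but the example password and root access are assumptions):

```shell
# Run the common preparation steps on each of the three nodes
for node in lab201 lab202 lab203; do
  ssh root@"$node" 'useradd hduser; echo hduser-pass | passwd --stdin hduser;
                    mkdir -pv /hadoop/temp; chown -R hduser.hduser /hadoop'
done
```

passwd --stdin is specific to RHEL/CentOS, which matches the environment used here.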
Master node configuration (lab201):
Configure passwordless SSH from hduser on the master to each node:
[root@lab201 ~]# su - hduser
[hduser@lab201 ~]$ ssh-keygen -t rsa -P ''
[hduser@lab201 ~]$ ssh-copy-id -i .ssh/id_rsa.pub hduser@lab202
[hduser@lab201 ~]$ ssh-copy-id -i .ssh/id_rsa.pub hduser@lab201
[hduser@lab201 ~]$ ssh-copy-id -i .ssh/id_rsa.pub hduser@lab203
[hduser@lab201 conf]$ vim masters     set the SecondaryNameNode node
lab202
[hduser@lab201 conf]$ vim slaves      set the slave nodes
lab203
[hduser@lab201 conf]$ vim core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/temp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://lab201:8020</value>
</property>
</configuration>
[hduser@lab201 conf]$ vim mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>lab201:8021</value>
</property>
</configuration>
[hduser@lab201 conf]$ vim hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>The actual number of replications can be specified when the file is created.</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/hadoop/data</value>
<final>true</final>
<description>The directories where the datanode stores blocks.</description>
</property>
<property>
<name>dfs.name.dir</name>
<value>/hadoop/name</value>
<final>true</final>
<description>The directories where the namenode stores its persistent metadata.</description>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/hadoop/namesecondary</value>
<final>true</final>
<description>The directories where the secondarynamenode stores checkpoints.</description>
</property>
</configuration>
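The write-up stops at the configuration files; the remaining steps mirror the pseudo-distributed case (a sketch, assuming the same Hadoop layout already exists on lab202 and lab203):

```shell
# Distribute the edited configuration from lab201 to the other nodes
scp /usr/local/hadoop/conf/* hduser@lab202:/usr/local/hadoop/conf/
scp /usr/local/hadoop/conf/* hduser@lab203:/usr/local/hadoop/conf/
# Format HDFS once on the master, then start every daemon across the cluster
hadoop namenode -format
start-all.sh
# jps on each node should now show only the daemons assigned to that node
```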