I. Hadoop installation (local mode and pseudo-distributed)
Archive of all historical Hadoop releases: http://archive.apache.org/dist/
Run modes:
local mode
yarn mode
Hadoop components:
common: base libraries and commands
hdfs: distributed file system; fault-tolerant through replication (configurable default replica count)
yarn: the data operating system (plays a role analogous to the Linux OS)
mapreduce: distributed computing framework
input -> map -> shuffle -> reduce -> output
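The pipeline above can be sketched with plain shell tools: `tr` plays the role of map (emit one record per token), `sort` plays shuffle (bring identical keys together), and `uniq -c` plays reduce (aggregate each group). This only illustrates the data flow on the sample words used later in this guide; it is not Hadoop itself.

```shell
# Simulate input -> map -> shuffle -> reduce for a tiny wordcount.
printf 'hadoop yarn\nhadoop mapreduce\nhadoop hdfs\n' |
  tr -s ' ' '\n' |      # map: one word per line
  sort |                # shuffle: group identical keys
  uniq -c |             # reduce: count each group
  awk '{print $2, $1}'  # format as "word count", like wordcount output
```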
1. Install and configure the JDK
[root@db01 mnt]# tar -zxvf jdk-7u67-linux-x64.tar.gz
[root@db01 mnt]# mkdir /usr/java
[root@db01 mnt]# mv jdk1.7.0_67/ /usr/java/
[root@db01 mnt]# vim /etc/profile
export JAVA_HOME=/usr/java/jdk1.7.0_67
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
[root@db01 mnt]# source /etc/profile
[root@db01 mnt]# java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
-----JDK configured successfully---------------------------
2. Install Hadoop
[root@db01 mnt]# tar -zxvf hadoop-2.5.0.tar.gz
[root@db01 mnt]# mv /mnt/hadoop-2.5.0 /usr/local/hadoop-2.5.0/
[root@db01 mnt]# chown -R hadoop:hadoop /usr/local/hadoop-2.5.0/
3. Test the examples in local mode
[hadoop@db01 hadoop-2.5.0]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar grep input output 'dfs[a-z.]+'
[hadoop@db01 hadoop-2.5.0]$ mkdir wcinput
[hadoop@db01 hadoop-2.5.0]$ cd wcinput/
[hadoop@db01 wcinput]$ touch wc.input
[hadoop@db01 wcinput]$ vim wc.input
hadoop yarn
hadoop mapreduce
hadoop hdfs
yarn nodemanager
hadoop resourcemanager
[hadoop@db01 wcinput]$ cd ../
[hadoop@db01 hadoop-2.5.0]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount wcinput wcoutput
4. Edit the configuration files to set up HDFS
[hadoop@db01 hadoop-2.5.0]$ vim etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67
[hadoop@db01 hadoop-2.5.0]$ mkdir -p data/tmp
[hadoop@db01 hadoop-2.5.0]$ vim etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://db01:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-2.5.0/data/tmp</value>
</property>
</configuration>
[hadoop@db01 hadoop-2.5.0]$ vim etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
5. Format the HDFS filesystem
[hadoop@db01 hadoop-2.5.0]$ bin/hdfs namenode -format
6. Start the namenode and datanode daemons individually
[hadoop@db01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh start namenode
[hadoop@db01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh start datanode
7. Access the HDFS web UI in a browser
URL: http://db01:50070/
8. Create a working directory in HDFS and test wordcount against HDFS
[hadoop@db01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir -p /user/hadoop/
[hadoop@db01 hadoop-2.5.0]$ bin/hdfs dfs -ls -R /
[hadoop@db01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir -p /user/hadoop/mapreduce/wordcount/input
[hadoop@db01 hadoop-2.5.0]$ bin/hdfs dfs -put wcinput/wc.input /user/hadoop/mapreduce/wordcount/input/
[hadoop@db01 hadoop-2.5.0]$ bin/hdfs dfs -cat /user/hadoop/mapreduce/wordcount/input/wc.input
[hadoop@db01 hadoop-2.5.0]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /user/hadoop/mapreduce/wordcount/input/ /user/hadoop/mapreduce/wordcount/output/
[hadoop@db01 hadoop-2.5.0]$ bin/hdfs dfs -cat /user/hadoop/mapreduce/wordcount/output/part-r-00000
9. Configure YARN
[hadoop@db01 hadoop-2.5.0]$ vim etc/hadoop/yarn-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67
[hadoop@db01 hadoop-2.5.0]$ vim etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>db01</value>
</property>
</configuration>
[hadoop@db01 hadoop-2.5.0]$ vim etc/hadoop/slaves
db01
10. Start YARN
[hadoop@db01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start resourcemanager
[hadoop@db01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start nodemanager
[hadoop@db01 hadoop-2.5.0]$ jps
14573 NodeManager
13490 DataNode
13400 NameNode
14685 Jps
14315 ResourceManager
11. Monitor YARN in a browser
http://db01:8088
12. Configure MapReduce
[hadoop@db01 hadoop-2.5.0]$ vim etc/hadoop/mapred-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67
[hadoop@db01 hadoop-2.5.0]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
[hadoop@db01 hadoop-2.5.0]$ vim etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
13. Test wordcount
[hadoop@db01 hadoop-2.5.0]$ bin/hdfs dfs -rm -R /user/hadoop/mapreduce/wordcount/output/
17/03/01 17:16:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/03/01 17:16:04 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/hadoop/mapreduce/wordcount/output
[hadoop@db01 hadoop-2.5.0]$ bin/hdfs dfs -ls -R /
17/03/01 17:16:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
drwxr-xr-x - hadoop supergroup 0 2017-03-01 16:04 /user
drwxr-xr-x - hadoop supergroup 0 2017-03-01 16:07 /user/hadoop
drwxr-xr-x - hadoop supergroup 0 2017-03-01 16:07 /user/hadoop/mapreduce
drwxr-xr-x - hadoop supergroup 0 2017-03-01 17:16 /user/hadoop/mapreduce/wordcount
drwxr-xr-x - hadoop supergroup 0 2017-03-01 16:08 /user/hadoop/mapreduce/wordcount/input
-rw-r--r-- 1 hadoop supergroup 81 2017-03-01 16:08 /user/hadoop/mapreduce/wordcount/input/wc.input
[hadoop@db01 hadoop-2.5.0]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /user/hadoop/mapreduce/wordcount/input/ /user/hadoop/mapreduce/wordcount/output/
17/03/01 17:18:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/03/01 17:18:09 INFO client.RMProxy: Connecting to ResourceManager at db01/192.168.100.231:8032
17/03/01 17:18:10 INFO input.FileInputFormat: Total input paths to process : 1
17/03/01 17:18:10 INFO mapreduce.JobSubmitter: number of splits:1
17/03/01 17:18:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1488358618376_0001
17/03/01 17:18:11 INFO impl.YarnClientImpl: Submitted application application_1488358618376_0001
17/03/01 17:18:11 INFO mapreduce.Job: The url to track the job: http://db01:8088/proxy/application_1488358618376_0001/
17/03/01 17:18:11 INFO mapreduce.Job: Running job: job_1488358618376_0001
17/03/01 17:18:19 INFO mapreduce.Job: Job job_1488358618376_0001 running in uber mode : false
17/03/01 17:18:19 INFO mapreduce.Job: map 0% reduce 0%
17/03/01 17:18:25 INFO mapreduce.Job: map 100% reduce 0%
17/03/01 17:18:31 INFO mapreduce.Job: map 100% reduce 100%
17/03/01 17:18:31 INFO mapreduce.Job: Job job_1488358618376_0001 completed successfully
17/03/01 17:18:31 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=97
FILE: Number of bytes written=194147
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=209
HDFS: Number of bytes written=67
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3516
Total time spent by all reduces in occupied slots (ms)=3823
Total time spent by all map tasks (ms)=3516
Total time spent by all reduce tasks (ms)=3823
Total vcore-seconds taken by all map tasks=3516
Total vcore-seconds taken by all reduce tasks=3823
Total megabyte-seconds taken by all map tasks=3600384
Total megabyte-seconds taken by all reduce tasks=3914752
Map-Reduce Framework
Map input records=5
Map output records=10
Map output bytes=121
Map output materialized bytes=97
Input split bytes=128
Combine input records=10
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=97
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=47
CPU time spent (ms)=1690
Physical memory (bytes) snapshot=411054080
Virtual memory (bytes) snapshot=1784795136
Total committed heap usage (bytes)=275251200
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=81
File Output Format Counters
Bytes Written=67
[hadoop@db01 hadoop-2.5.0]$ bin/hdfs dfs -ls -R /user/hadoop/mapreduce/wordcount/output/
17/03/01 17:19:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r-- 1 hadoop supergroup 0 2017-03-01 17:18 /user/hadoop/mapreduce/wordcount/output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 67 2017-03-01 17:18 /user/hadoop/mapreduce/wordcount/output/part-r-00000
[hadoop@db01 hadoop-2.5.0]$ bin/hdfs dfs -cat /user/hadoop/mapreduce/wordcount/output/part-r-00000
17/03/01 17:20:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop 4
hdfs 1
mapreduce 1
nodemanager 1
resourcemanager 1
yarn 2
14. Run wordcount with the yarn launcher (the output directory must not already exist, or the job fails)
[hadoop@db01 hadoop-2.5.0]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /user/hadoop/mapreduce/wordcount/input/ /user/hadoop/mapreduce/wordcount/output2/
17/03/01 17:43:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/03/01 17:43:09 INFO client.RMProxy: Connecting to ResourceManager at db01/192.168.100.231:8032
17/03/01 17:43:10 INFO input.FileInputFormat: Total input paths to process : 1
17/03/01 17:43:10 INFO mapreduce.JobSubmitter: number of splits:1
17/03/01 17:43:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1488358618376_0002
17/03/01 17:43:11 INFO impl.YarnClientImpl: Submitted application application_1488358618376_0002
17/03/01 17:43:11 INFO mapreduce.Job: The url to track the job: http://db01:8088/proxy/application_1488358618376_0002/
17/03/01 17:43:11 INFO mapreduce.Job: Running job: job_1488358618376_0002
17/03/01 17:43:18 INFO mapreduce.Job: Job job_1488358618376_0002 running in uber mode : false
17/03/01 17:43:18 INFO mapreduce.Job: map 0% reduce 0%
17/03/01 17:43:23 INFO mapreduce.Job: map 100% reduce 0%
17/03/01 17:43:29 INFO mapreduce.Job: map 100% reduce 100%
17/03/01 17:43:30 INFO mapreduce.Job: Job job_1488358618376_0002 completed successfully
17/03/01 17:43:30 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=97
FILE: Number of bytes written=194149
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=209
HDFS: Number of bytes written=67
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3315
Total time spent by all reduces in occupied slots (ms)=3460
Total time spent by all map tasks (ms)=3315
Total time spent by all reduce tasks (ms)=3460
Total vcore-seconds taken by all map tasks=3315
Total vcore-seconds taken by all reduce tasks=3460
Total megabyte-seconds taken by all map tasks=3394560
Total megabyte-seconds taken by all reduce tasks=3543040
Map-Reduce Framework
Map input records=5
Map output records=10
Map output bytes=121
Map output materialized bytes=97
Input split bytes=128
Combine input records=10
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=97
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=38
CPU time spent (ms)=1690
Physical memory (bytes) snapshot=400715776
Virtual memory (bytes) snapshot=1776209920
Total committed heap usage (bytes)=274202624
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=81
File Output Format Counters
Bytes Written=67
[hadoop@db01 hadoop-2.5.0]$ bin/hdfs dfs -cat /user/hadoop/mapreduce/wordcount/output2/
17/03/01 17:44:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
cat: `/user/hadoop/mapreduce/wordcount/output2': Is a directory
[hadoop@db01 hadoop-2.5.0]$ bin/hdfs dfs -cat /user/hadoop/mapreduce/wordcount/output2/part*
17/03/01 17:44:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop 4
hdfs 1
mapreduce 1
nodemanager 1
resourcemanager 1
yarn 2
[hadoop@db01 hadoop-2.5.0]$ bin/hdfs dfs -text /user/hadoop/mapreduce/wordcount/output2/part*
17/03/01 17:47:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop 4
hdfs 1
mapreduce 1
nodemanager 1
resourcemanager 1
yarn 2
Note: MapReduce sorts the output by key by default.
15. Start the MapReduce job history server
[hadoop@db01 hadoop-2.5.0]$ sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop-2.5.0/logs/mapred-hadoop-historyserver-db01.out
[hadoop@db01 hadoop-2.5.0]$ jps
14573 NodeManager
13490 DataNode
13400 NameNode
14315 ResourceManager
16366 Jps
16296 JobHistoryServer
16. Enable YARN log aggregation
Aggregation: after a MapReduce job finishes, its logs are uploaded to HDFS.
[hadoop@db01 hadoop-2.5.0]$ cat etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>db01</value>
</property>
## enable log aggregation
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
## log retention period in seconds (600000 s is just under 7 days; exactly 7 days would be 604800)
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>600000</value>
</property>
</configuration>
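Since yarn.log-aggregation.retain-seconds takes seconds, a retention target can be computed instead of approximated; for example, exactly 7 days works out to 604800:

```shell
# Compute common retention windows in seconds for
# yarn.log-aggregation.retain-seconds.
one_day=$((24 * 60 * 60))          # 86400
seven_days=$((7 * one_day))        # 604800
echo "1 day  = ${one_day} seconds"
echo "7 days = ${seven_days} seconds"
```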
----------Restart the YARN daemons and the historyserver:
[hadoop@db01 hadoop-2.5.0]$ sbin/yarn-daemon.sh stop resourcemanager
stopping resourcemanager
[hadoop@db01 hadoop-2.5.0]$ sbin/yarn-daemon.sh stop nodemanager
stopping nodemanager
nodemanager did not stop gracefully after 5 seconds: killing with kill -9
[hadoop@db01 hadoop-2.5.0]$ jps
13490 DataNode
13400 NameNode
16511 Jps
16296 JobHistoryServer
[hadoop@db01 hadoop-2.5.0]$ sbin/mr-jobhistory-daemon.sh stop historyserver
stopping historyserver
[hadoop@db01 hadoop-2.5.0]$ jps
13490 DataNode
13400 NameNode
16548 Jps
[hadoop@db01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /usr/local/hadoop-2.5.0/logs/yarn-hadoop-resourcemanager-db01.out
[hadoop@db01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /usr/local/hadoop-2.5.0/logs/yarn-hadoop-nodemanager-db01.out
[hadoop@db01 hadoop-2.5.0]$ sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop-2.5.0/logs/mapred-hadoop-historyserver-db01.out
[hadoop@db01 hadoop-2.5.0]$ jps
16584 ResourceManager
13490 DataNode
13400 NameNode
16834 NodeManager
16991 JobHistoryServer
17028 Jps
17. Re-run the wordcount job to verify log aggregation:
[hadoop@db01 hadoop-2.5.0]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /user/hadoop/mapreduce/wordcount/input/ /user/hadoop/mapreduce/wordcount/output3/
View the container logs in the browser (http://db01:8088/):
Log Type: stderr
Log Length: 0
Log Type: stdout
Log Length: 0
Log Type: syslog
Log Length: 3816
2017-03-01 18:36:45,873 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2017-03-01 18:36:45,911 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2017-03-01 18:36:46,130 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-03-01 18:36:46,239 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-03-01 18:36:46,319 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-03-01 18:36:46,319 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2017-03-01 18:36:46,335 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2017-03-01 18:36:46,335 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1488364479714_0001, Ident: (org.apa[email protected])
2017-03-01 18:36:46,427 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2017-03-01 18:36:46,732 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /usr/local/hadoop-2.5.0/data/tmp/nm-local-dir/usercache/hadoop/appcache/application_1488364479714_0001
2017-03-01 18:36:46,863 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2017-03-01 18:36:46,878 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2017-03-01 18:36:47,202 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2017-03-01 18:36:47,668 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2017-03-01 18:36:47,873 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://db01:9000/user/hadoop/mapreduce/wordcount/input/wc.input:0+81
2017-03-01 18:36:47,887 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2017-03-01 18:36:47,953 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2017-03-01 18:36:47,953 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 100
2017-03-01 18:36:47,953 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 83886080
2017-03-01 18:36:47,953 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 104857600
2017-03-01 18:36:47,953 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 26214396; length = 6553600
2017-03-01 18:36:47,989 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
2017-03-01 18:36:47,989 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
2017-03-01 18:36:47,990 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 121; bufvoid = 104857600
2017-03-01 18:36:47,990 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214360(104857440); length = 37/6553600
2017-03-01 18:36:48,002 INFO [main] org.apache.hadoop.mapred.MapTask: Finished spill 0
2017-03-01 18:36:48,008 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1488364479714_0001_m_000000_0 is done. And is in the process of committing
2017-03-01 18:36:48,106 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1488364479714_0001_m_000000_0' done.
18. Hadoop configuration files
Default configuration files (shipped inside the jar of each of the four modules):
*core-default.xml
*hdfs-default.xml
*yarn-default.xml
*mapred-default.xml
User-defined configuration files (in $HADOOP_HOME/etc/hadoop/):
*core-site.xml
*hdfs-site.xml
*yarn-site.xml
*mapred-site.xml
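A site file only needs the properties whose default values it overrides; everything else comes from the matching *-default.xml inside the jars. As a rough illustration of the site-file structure (plain shell against a scratch copy, not a Hadoop tool), one value can be pulled out like this:

```shell
# Extract one property value from a Hadoop *-site.xml file.
site=$(mktemp)
cat > "$site" <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://db01:9000</value>
</property>
</configuration>
EOF
# Naive extraction: print the <value> on the line after a given <name>.
get_prop() {
  grep -A1 "<name>$1</name>" "$2" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
get_prop fs.defaultFS "$site"
rm -f "$site"
```

On a live installation, `bin/hdfs getconf -confKey fs.defaultFS` reports the effective value after the default and site files have been merged.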
19. Enable the HDFS trash
[hadoop@db01 hadoop-2.5.0]$ cat etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://db01:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-2.5.0/data/tmp</value>
</property>
## enable the trash and keep deleted files for 7 days (fs.trash.interval is in minutes)
<property>
<name>fs.trash.interval</name>
<value>10080</value>
</property>
</configuration>
Restart HDFS for the change to take effect.
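fs.trash.interval is measured in minutes, so a 7-day retention works out to 10080:

```shell
# fs.trash.interval is expressed in minutes.
minutes_per_day=$((24 * 60))           # 1440
seven_days=$((7 * minutes_per_day))
echo "fs.trash.interval for 7 days: ${seven_days} minutes"
```

With the trash enabled, `bin/hdfs dfs -rm` moves files under /user/&lt;username&gt;/.Trash instead of deleting them immediately; adding `-skipTrash` bypasses the trash.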
20. The three ways to start/stop Hadoop
*Start each daemon individually (most common; typically wrapped in a shell script)
hdfs:
sbin/hadoop-daemon.sh start|stop namenode
sbin/hadoop-daemon.sh start|stop datanode
sbin/hadoop-daemon.sh start|stop secondarynamenode
yarn:
sbin/yarn-daemon.sh start|stop resourcemanager
sbin/yarn-daemon.sh start|stop nodemanager
mapreduce:
sbin/mr-jobhistory-daemon.sh start|stop historyserver
*Start each module as a whole: requires passwordless ssh and must be run on the namenode
hdfs:
sbin/start-dfs.sh
sbin/stop-dfs.sh
yarn:
sbin/start-yarn.sh
sbin/stop-yarn.sh
*Start everything at once: not recommended; the command must be run on the namenode, and it also brings up the secondarynamenode on the namenode host
sbin/start-all.sh
sbin/stop-all.sh
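The per-daemon commands above are commonly collected into a small wrapper script. A minimal sketch, defaulting to a dry run so the commands are printed rather than executed (the DRY_RUN switch is an illustrative convenience, not part of the Hadoop distribution; set DRY_RUN=0 on a real node):

```shell
#!/bin/sh
# Start or stop all pseudo-distributed daemons one by one.
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop-2.5.0}
action=${1:-start}        # "start" or "stop"
DRY_RUN=${DRY_RUN:-1}     # 1 = only print the commands

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

run "$HADOOP_HOME/sbin/hadoop-daemon.sh" "$action" namenode
run "$HADOOP_HOME/sbin/hadoop-daemon.sh" "$action" datanode
run "$HADOOP_HOME/sbin/yarn-daemon.sh" "$action" resourcemanager
run "$HADOOP_HOME/sbin/yarn-daemon.sh" "$action" nodemanager
run "$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh" "$action" historyserver
```

For a stop, the reverse order (historyserver, yarn, hdfs) is the safer sequence; the sketch keeps one fixed order for brevity.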
Appendix: configure passwordless ssh
[hadoop@db01 ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
[hadoop@db01 ~]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
[hadoop@db01 ~]$ scp .ssh/authorized_keys db02:/home/hadoop/.ssh/authorized_keys
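sshd refuses key-based logins when the key files are too permissive, so after copying it is worth confirming that ~/.ssh is mode 700 and authorized_keys is mode 600. A small sketch, exercised here against a scratch directory instead of the real ~/.ssh (note `stat -c` is GNU coreutils syntax):

```shell
# Enforce and report the permissions key-based ssh requires.
check_ssh_perms() {
  dir=$1
  chmod 700 "$dir"
  chmod 600 "$dir/authorized_keys"
  printf '%s %s\n' "$(stat -c %a "$dir")" "$(stat -c %a "$dir/authorized_keys")"
}

# Illustration on a scratch directory (a real run would pass ~/.ssh):
scratch=$(mktemp -d)
touch "$scratch/authorized_keys"
check_ssh_perms "$scratch"
rm -rf "$scratch"
```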
21. Hadoop roles and what determines where they run
namenode: determined by the fs.defaultFS value hdfs://db01:9000
core-site.xml
## the following property fixes the namenode host
<property>
<name>fs.defaultFS</name>
<value>hdfs://db01:9000</value>
</property>
datanode: determined by the contents of the slaves file
[hadoop@db01 hadoop-2.5.0]$ cat etc/hadoop/slaves
db01
secondarynamenode: determined by the dfs.namenode.secondary.http-address parameter
hdfs-site.xml
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>db01:50090</value>
</property>
resourcemanager: determined by the yarn.resourcemanager.hostname parameter
yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>db01</value>
</property>
nodemanager: determined by the contents of the slaves file
[hadoop@db01 hadoop-2.5.0]$ cat etc/hadoop/slaves
db01
jobhistoryserver: determined by mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address
mapred-site.xml
<property>
<name>mapreduce.jobhistory.address</name>
<value>db01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>db01:19888</value>
</property>
22. A known warning
[hadoop@db01 hadoop-2.5.0]$ bin/hdfs dfs -ls
17/03/01 21:50:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items --------------------------------------------------------------> this warning only goes away after replacing lib/native with native libraries built from source for your platform
drwxr-xr-x - hadoop supergroup 0 2017-03-01 16:07 mapreduce
23. Appendix: the full configuration files
[hadoop@db01 hadoop-2.5.0]$ cat etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://db01:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-2.5.0/data/tmp</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>7000</value>
</property>
</configuration>
[hadoop@db01 hadoop-2.5.0]$ cat etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>db01:50090</value>
</property>
</configuration>
[hadoop@db01 hadoop-2.5.0]$ cat etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>db01</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>600000</value>
</property>
</configuration>
[hadoop@db01 hadoop-2.5.0]$ cat etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>db01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>db01:19888</value>
</property>
</configuration>
One final note: JAVA_HOME must be defined explicitly in each of the env scripts (hadoop-env.sh, yarn-env.sh, mapred-env.sh).