The Hadoop release can be downloaded from Apache at http://archive.apache.org/dist/hadoop/core/hadoop-0.19.0/. The Linux machine I used runs Ubuntu 12.10, with Java 1.7.0_51 installed and JAVA_HOME=/usr/java/jdk1.7.0_51.
Walkthrough
1、Passwordless ssh login to localhost
Make sure the ssh service on the Linux system is running and that you can log in to the local machine over ssh without a password. If you cannot, follow these steps:
(1) Open a terminal and run:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
(2) ssh to localhost:
$ ssh localhost
On the first login you will be told that the authenticity of 127.0.0.1 cannot be established and asked whether to continue connecting; type yes. A successful passwordless login looks like this:
[root@localhost hadoop-0.19.0]# ssh localhost
Last login: Sun Aug 1 18:35:37 2010 from 192.168.0.104
[root@localhost ~]#
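If passwordless login still fails after generating the key, the most common cause is overly permissive file modes, which sshd silently rejects under its default StrictModes setting. A minimal sketch of the usual fix (paths are the standard OpenSSH defaults):

```shell
# sshd refuses key-based auth if ~/.ssh or authorized_keys is group/world writable.
mkdir -p ~/.ssh
chmod 700 ~/.ssh
touch ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
stat -c '%a %n' ~/.ssh ~/.ssh/authorized_keys
```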
2、Hadoop-0.19.0 configuration
Download hadoop-0.19.0.tar.gz (about 40.3 MB) and extract it to a directory on the Linux system; here I use /root/hadoop-0.19.0.
The configuration steps, in order:
(1) Edit hadoop-env.sh
Uncomment the JAVA_HOME line (remove the leading "#") and point it at the local JDK; after editing, the line reads:
export JAVA_HOME=/usr/java/jdk1.7.0_51
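One way to make that edit non-interactively is with sed. The sketch below runs against a scratch file so it is safe to try anywhere; on a real install, apply the same sed line to conf/hadoop-env.sh (the commented-out default shown here is a stand-in, and the JDK path is the one from my machine — adjust to yours):

```shell
# Scratch file standing in for conf/hadoop-env.sh, which ships with JAVA_HOME commented out.
f=$(mktemp)
echo '# export JAVA_HOME=/path/to/jdk' > "$f"
# Uncomment the line and point it at the installed JDK.
sed -i 's|^# *export JAVA_HOME=.*|export JAVA_HOME=/usr/java/jdk1.7.0_51|' "$f"
cat "$f"
```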
(2) Add the following three properties (one per file) between <configuration> and </configuration>. The edited configuration files are shown below.
1、The core-site.xml configuration file
Its contents:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
2、The hdfs-site.xml configuration file
Its contents:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
3、The mapred-site.xml configuration file
Its contents:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
3、運作wordcount執行個體
wordcount例子是hadoop發行包中自帶的執行個體,通過運作執行個體可以感受并嘗試了解hadoop在執行MapReduce任務時的執行過程。按照官方的“HadoopQuick Start”教程基本可以容易地實作,下面簡單說一下我的練習過程。
導航到hadoop目錄下面,我的是/root/hadoop-0.19.0。
(1)格式化HDFS
執行格式化HDFS的指令行:
[root@localhost hadoop-0.19.0]# bin/hadoop namenode -format
The format output looks like this. Note that the confirmation prompt expects an uppercase Y; as the log shows, answering with a lowercase y aborts the format:
10/08/01 19:04:02 INFO namenode.NameNode: STARTUP_MSG:
...
Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) y
Format aborted in /tmp/hadoop-root/dfs/name
10/08/01 19:04:05 INFO namenode.NameNode: SHUTDOWN_MSG:
...
(2) Start the Hadoop daemons
Run:
[root@localhost hadoop-0.19.0]# bin/start-all.sh
The startup output looks like this:
starting namenode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-namenode-localhost.out
localhost: starting datanode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-datanode-localhost.out
localhost: starting secondarynamenode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-secondarynamenode-localhost.out
starting jobtracker, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-jobtracker-localhost.out
localhost: starting tasktracker, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-tasktracker-localhost.out
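Once start-all.sh returns, a quick way to confirm that all five daemons actually came up is to inspect the JVM process list printed by jps (shipped with the JDK). Below is a small sketch; check_daemons is a hypothetical helper written for this post, not part of Hadoop:

```shell
# Verify that all five pseudo-distributed daemons appear in a `jps` listing.
check_daemons() {
  listing="$1"
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    echo "$listing" | grep -qw "$d" || { echo "missing: $d"; return 1; }
  done
  echo "all 5 daemons running"
}
# On a live node you would run: check_daemons "$(jps)"
```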
(3) Prepare input data for the wordcount job
First, create a local data directory named input and copy some files into it:
[root@localhost hadoop-0.19.0]# mkdir input
[root@localhost hadoop-0.19.0]# cp CHANGES.txt LICENSE.txt NOTICE.txt README.txt input/
Then upload the local input directory to HDFS:
[root@localhost hadoop-0.19.0]# bin/hadoop fs -put input/ input
(4) Start the wordcount job
Run:
[root@localhost hadoop-0.19.0]# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output
The input directory is input and the output directory is output.
The job execution output looks like this:
10/08/01 19:06:15 INFO mapred.FileInputFormat: Total input paths to process : 4
10/08/01 19:06:15 INFO mapred.JobClient: Running job: job_201008011904_0002
10/08/01 19:06:16 INFO mapred.JobClient: map 0% reduce 0%
10/08/01 19:06:22 INFO mapred.JobClient: map 20% reduce 0%
10/08/01 19:06:24 INFO mapred.JobClient: map 40% reduce 0%
10/08/01 19:06:25 INFO mapred.JobClient: map 60% reduce 0%
10/08/01 19:06:27 INFO mapred.JobClient: map 80% reduce 0%
10/08/01 19:06:28 INFO mapred.JobClient: map 100% reduce 0%
10/08/01 19:06:38 INFO mapred.JobClient: map 100% reduce 26%
10/08/01 19:06:40 INFO mapred.JobClient: map 100% reduce 100%
10/08/01 19:06:41 INFO mapred.JobClient: Job complete: job_201008011904_0002
10/08/01 19:06:41 INFO mapred.JobClient: Counters: 16
10/08/01 19:06:41 INFO mapred.JobClient:   File Systems
10/08/01 19:06:41 INFO mapred.JobClient:     HDFS bytes read=301489
10/08/01 19:06:41 INFO mapred.JobClient:     HDFS bytes written=113098
10/08/01 19:06:41 INFO mapred.JobClient:     Local bytes read=174004
10/08/01 19:06:41 INFO mapred.JobClient:     Local bytes written=348172
10/08/01 19:06:41 INFO mapred.JobClient:   Job Counters
10/08/01 19:06:41 INFO mapred.JobClient:     Launched reduce tasks=1
10/08/01 19:06:41 INFO mapred.JobClient:     Launched map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient:     Data-local map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient:   Map-Reduce Framework
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce input groups=8997
10/08/01 19:06:41 INFO mapred.JobClient:     Combine output records=10860
10/08/01 19:06:41 INFO mapred.JobClient:     Map input records=7363
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce output records=8997
10/08/01 19:06:41 INFO mapred.JobClient:     Map output bytes=434077
10/08/01 19:06:41 INFO mapred.JobClient:     Map input bytes=299871
10/08/01 19:06:41 INFO mapred.JobClient:     Combine input records=39193
10/08/01 19:06:41 INFO mapred.JobClient:     Map output records=39193
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce input records=10860
(5) Check the job results
Use the following command:
bin/hadoop fs -cat output/*
A truncated portion of the results:
vijayarenu 20
violations. 1
virtual 3
vis-a-vis 1
visible 1
visit 1
volume 1
volume, 1
volumes 2
volumes. 1
w.r.t 2
wait 9
waiting 6
waiting. 1
waits 3
want 1
warning 7
warning, 1
warnings 12
warnings. 3
warranties 1
warranty 1
warranty, 1
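Conceptually the job computes exactly what the classic shell pipeline below computes — tokenize, group, count — just distributed across map and reduce tasks. A toy illustration with coreutils:

```shell
# Tokenize, then count occurrences per word: the essence of wordcount.
printf 'hadoop runs wordcount\nhadoop counts words\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | awk '{print $2, $1}' \
  | sort
```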
(6) Stop the Hadoop daemons
Run:
[root@localhost hadoop-0.19.0]# bin/stop-all.sh
The output looks like this:
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
This stops the five processes listed above: jobtracker, tasktracker, namenode, datanode, and secondarynamenode.
Troubleshooting
You may run into exceptions while working through the steps above; a rough analysis of two common ones follows.
1、"Call to localhost/127.0.0.1:9000 failed on local exception"
(1) Description
This may appear when you run:
[root@localhost hadoop-0.19.0]# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output
The error output is:
10/08/01 19:50:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
10/08/01 19:50:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
10/08/01 19:50:57 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
10/08/01 19:50:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
10/08/01 19:50:59 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
10/08/01 19:51:00 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
10/08/01 19:51:01 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
10/08/01 19:51:02 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
10/08/01 19:51:03 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
10/08/01 19:51:04 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
java.lang.RuntimeException: java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: Connection refused
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:323)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:295)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:268)
    at org.apache.hadoop.examples.WordCount.run(WordCount.java:146)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:141)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:61)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: Connection refused
    at org.apache.hadoop.ipc.Client.call(Client.java:699)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:74)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1367)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:319)
    ... 21 more
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:299)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:685)
    ... 33 more
(2) Analysis
The key line in the output above is:
Retrying connect to server: localhost/127.0.0.1:9000.
It says that ten connection attempts to the server all failed, which means the communication path to the server is broken. We did configure the namenode address in core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
So the client clearly cannot connect to the server; most likely the namenode process was never started, in which case running a job is out of the question.
I reproduced this exception as follows:
I formatted HDFS but did not run bin/start-all.sh, and launched the wordcount job directly, which produced the exception above.
The fix: run bin/start-all.sh first, then start the wordcount job.
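A fast way to tell whether the namenode is actually listening, before submitting a job, is to probe port 9000 directly. The sketch below uses bash's /dev/tcp pseudo-device (a bash-specific feature), with the port taken from the fs.default.name setting above:

```shell
# Probe the NameNode RPC port; "closed" here usually means start-all.sh has not been run.
if (exec 3<>/dev/tcp/localhost/9000) 2>/dev/null; then
  echo "namenode port 9000 is open"
else
  echo "namenode port 9000 is closed: run bin/start-all.sh first"
fi
```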
2、"Input path does not exist"
(1) Description
Suppose you create an input directory under the current Hadoop directory, cp some files into it, and then run:
[root@localhost hadoop-0.19.0]# bin/hadoop namenode -format
[root@localhost hadoop-0.19.0]# bin/start-all.sh
At this point you assume input exists, so the wordcount job should run:
[root@localhost hadoop-0.19.0]# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output
Instead it throws a pile of exceptions:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/root/input
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:782)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1127)
    at org.apache.hadoop.examples.WordCount.run(WordCount.java:149)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:141)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:61)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
I reproduced this exception as follows:
[root@localhost hadoop-0.19.0]# bin/hadoop fs -rmr input
Deleted hdfs://localhost:9000/user/root/input
[root@localhost hadoop-0.19.0]# bin/hadoop fs -rmr output
Deleted hdfs://localhost:9000/user/root/output
(I could delete these because I had already run the job successfully once before.)
(2) Analysis
Not much needs saying here: the local input directory was never uploaded to HDFS, hence org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/root/input.
As I recall, with hadoop-0.16.4 the job would run as long as a local input directory existed, with no explicit upload; later versions no longer allow that.
The fix is simply to upload it:
[root@localhost hadoop-0.19.0]# bin/hadoop fs -put input/ input
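The hdfs://localhost:9000/user/root/input in the message also shows how the relative path input was resolved: HDFS paths that do not start with / are taken relative to the user's HDFS home directory, /user/<username>. A tiny sketch of that resolution rule (hdfs_abs is a hypothetical helper for illustration, not a Hadoop command):

```shell
# Mimic how the HDFS client turns a path argument into an absolute HDFS path.
hdfs_abs() {
  path=$1; user=$2
  case "$path" in
    /*) echo "$path" ;;              # already absolute: used as-is
    *)  echo "/user/$user/$path" ;;  # relative: resolved under the user's home dir
  esac
}
hdfs_abs input root     # -> /user/root/input
hdfs_abs /tmp/in root   # -> /tmp/in
```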