
Setting up Hadoop 0.19.0 on Linux

The Hadoop code can be downloaded from Apache at http://archive.apache.org/dist/hadoop/core/hadoop-0.19.0/. My Linux machine runs Ubuntu 12.10, with Java 1.7.0_51 installed and JAVA_HOME=/usr/java/jdk1.7.0_51.

Walkthrough

1. Passwordless SSH login to localhost

Make sure the ssh service is running on your Linux system and that you can log in to the local machine without a password. If you cannot, set it up as follows:

(1) Open a terminal and run:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

(2) SSH to localhost:

$ ssh localhost

On the first login you will be told that a connection to 127.0.0.1 cannot be established and asked whether to create one; type yes. A successful passwordless login looks like this:

[[email protected] hadoop-0.19.0]# ssh localhost

Last login: Sun Aug  1 18:35:37 2010 from 192.168.0.104

[[email protected] ~]#
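A note on the `cat >> authorized_keys` step above: running it twice appends the key twice, and sshd ignores an authorized_keys file with loose permissions. If you ever script this step, a guard like the following sketch helps (the function name `authorize_key` is my own invention, not part of any tool mentioned here):

```python
import os

def authorize_key(pubkey_path, auth_keys_path):
    """Append a public key to authorized_keys, skipping it if already present."""
    with open(pubkey_path) as f:
        key = f.read().strip()
    existing = ""
    if os.path.exists(auth_keys_path):
        with open(auth_keys_path) as f:
            existing = f.read()
    if key not in existing:           # avoid duplicate entries on re-runs
        with open(auth_keys_path, "a") as f:
            f.write(key + "\n")
    os.chmod(auth_keys_path, 0o600)   # sshd refuses keys with loose permissions
```

This is idempotent, so re-running your setup script cannot corrupt the file.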

2. Hadoop 0.19.0 configuration

Download hadoop-0.19.0.tar.gz (about 40.3 MB) and extract it to a directory of your choice on the Linux system; I used /root/hadoop-0.19.0.

The configuration steps, in order:

(1) Edit hadoop-env.sh

Uncomment the JAVA_HOME line (remove the leading "#") and point it at your Java installation, so that the line reads:

  export JAVA_HOME=/usr/java/jdk1.7.0_51

(2) Add the following property settings between <configuration> and </configuration> in the three site files.

1. core-site.xml

The configuration is:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
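All three site files share the same property format: a flat list of <property> entries holding <name>/<value> pairs. As a quick illustration of that format (a hypothetical helper of my own, not Hadoop code), it can be read with a few lines of Python:

```python
import xml.etree.ElementTree as ET

def parse_hadoop_conf(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {prop.findtext("name"): prop.findtext("value")
            for prop in root.findall("property")}

core_site = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>"""

print(parse_hadoop_conf(core_site))  # {'fs.default.name': 'hdfs://localhost:9000'}
```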

2. hdfs-site.xml

The configuration is:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

3. mapred-site.xml

The configuration is:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

3. Running the wordcount example

wordcount is one of the examples shipped with the Hadoop distribution; running it is a good way to get a feel for how Hadoop executes a MapReduce job. Following the official "Hadoop Quick Start" guide makes this straightforward; below is my own run-through.

Change into the Hadoop directory, in my case /root/hadoop-0.19.0.
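Conceptually, wordcount maps each input line to (word, 1) pairs and then reduces by summing the counts per word. A toy Python sketch of the same logic (purely illustrative, not the shipped Java example):

```python
from collections import Counter

def wordcount(lines):
    """Map: split each line into (word, 1) pairs; Reduce: sum counts per word."""
    # Map phase: emit one (word, 1) pair per token
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Shuffle + Reduce phase, collapsed into a single in-memory Counter
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

print(wordcount(["hello hadoop", "hello world"]))
# {'hello': 2, 'hadoop': 1, 'world': 1}
```

In the real job, the map phase runs across many tasks and the shuffle groups pairs by key before the reduce tasks sum them.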

(1) Format HDFS

Run the format command:

[[email protected] hadoop-0.19.0]# bin/hadoop namenode -format

The format output looks like this:

10/08/01 19:04:02 INFO namenode.NameNode: STARTUP_MSG:
...
Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) y
Format aborted in /tmp/hadoop-root/dfs/name
10/08/01 19:04:05 INFO namenode.NameNode: SHUTDOWN_MSG:
...

Note the "Format aborted" line: the re-format prompt is case-sensitive, so only an uppercase "Y" actually re-formats an existing filesystem; a lowercase "y" aborts.

(2) Start the Hadoop daemons

Run:

[[email protected] hadoop-0.19.0]# bin/start-all.sh

The startup output looks like this:

starting namenode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-namenode-localhost.out
localhost: starting datanode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-datanode-localhost.out
localhost: starting secondarynamenode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-secondarynamenode-localhost.out
starting jobtracker, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-jobtracker-localhost.out
localhost: starting tasktracker, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-tasktracker-localhost.out

(3) Prepare input data for the wordcount job

First, create a local directory named input and copy a few files into it:

[[email protected] hadoop-0.19.0]# mkdir input

[[email protected] hadoop-0.19.0]# cp CHANGES.txt LICENSE.txt NOTICE.txt README.txt input/

Then upload the local input directory to HDFS:

[[email protected] hadoop-0.19.0]# bin/hadoop fs -put input/ input

(4) Launch the wordcount job

Run:

[[email protected] hadoop-0.19.0]# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output

The input directory is input; the output directory is output.

The job output looks like this:

10/08/01 19:06:15 INFO mapred.FileInputFormat: Total input paths to process : 4
10/08/01 19:06:15 INFO mapred.JobClient: Running job: job_201008011904_0002
10/08/01 19:06:16 INFO mapred.JobClient:  map 0% reduce 0%
10/08/01 19:06:22 INFO mapred.JobClient:  map 20% reduce 0%
10/08/01 19:06:24 INFO mapred.JobClient:  map 40% reduce 0%
10/08/01 19:06:25 INFO mapred.JobClient:  map 60% reduce 0%
10/08/01 19:06:27 INFO mapred.JobClient:  map 80% reduce 0%
10/08/01 19:06:28 INFO mapred.JobClient:  map 100% reduce 0%
10/08/01 19:06:38 INFO mapred.JobClient:  map 100% reduce 26%
10/08/01 19:06:40 INFO mapred.JobClient:  map 100% reduce 100%
10/08/01 19:06:41 INFO mapred.JobClient: Job complete: job_201008011904_0002
10/08/01 19:06:41 INFO mapred.JobClient: Counters: 16
10/08/01 19:06:41 INFO mapred.JobClient:   File Systems
10/08/01 19:06:41 INFO mapred.JobClient:     HDFS bytes read=301489
10/08/01 19:06:41 INFO mapred.JobClient:     HDFS bytes written=113098
10/08/01 19:06:41 INFO mapred.JobClient:     Local bytes read=174004
10/08/01 19:06:41 INFO mapred.JobClient:     Local bytes written=348172
10/08/01 19:06:41 INFO mapred.JobClient:   Job Counters
10/08/01 19:06:41 INFO mapred.JobClient:     Launched reduce tasks=1
10/08/01 19:06:41 INFO mapred.JobClient:     Launched map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient:     Data-local map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient:   Map-Reduce Framework
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce input groups=8997
10/08/01 19:06:41 INFO mapred.JobClient:     Combine output records=10860
10/08/01 19:06:41 INFO mapred.JobClient:     Map input records=7363
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce output records=8997
10/08/01 19:06:41 INFO mapred.JobClient:     Map output bytes=434077
10/08/01 19:06:41 INFO mapred.JobClient:     Map input bytes=299871
10/08/01 19:06:41 INFO mapred.JobClient:     Combine input records=39193
10/08/01 19:06:41 INFO mapred.JobClient:     Map output records=39193
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce input records=10860

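The counters above are internally consistent: Map output records (39193) equals Combine input records, and Combine output records (10860) equals Reduce input records, because the combiner pre-aggregates each map task's output before the shuffle. A rough Python sketch of what the combiner stage does (the names here are mine, not Hadoop's):

```python
from collections import Counter

def combine(map_output):
    """Pre-aggregate one map task's (word, 1) pairs before the shuffle,
    the way the combiner shrank 39193 map output records to 10860."""
    combined = Counter()
    for word, n in map_output:
        combined[word] += n
    return list(combined.items())

pairs = [("wait", 1), ("wait", 1), ("warning", 1), ("wait", 1)]
print(combine(pairs))  # [('wait', 3), ('warning', 1)]
```

Less data crosses the network during the shuffle, which is the whole point of the combiner.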

(5) Check the job results

Use:

bin/hadoop fs -cat output/*

A partial excerpt of the results:

vijayarenu      20
violations.     1
virtual 3
vis-a-vis       1
visible 1
visit   1
volume  1
volume, 1
volumes 2
volumes.        1
w.r.t   2
wait    9
waiting 6
waiting.        1
waits   3
want    1
warning 7
warning,        1
warnings        12
warnings.       3
warranties      1
warranty        1
warranty,       1


(6) Stop the Hadoop daemons

Run:

[[email protected] hadoop-0.19.0]# bin/stop-all.sh

The output looks like this:

stopping jobtracker

localhost: stopping tasktracker

stopping namenode

localhost: stopping datanode

localhost: stopping secondarynamenode 

已經将上面列出的5個程序jobtracker、tasktracker、namenode、datanode、secondarynamenode終止。

Troubleshooting

You may hit exceptions while following the steps above. Here is an analysis of the common ones.

1. "Call to localhost/127.0.0.1:9000 failed on local exception"

(1) Symptom

This can occur when you run:

[[email protected] hadoop-0.19.0]# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output

The error output looks like this:

10/08/01 19:50:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
10/08/01 19:50:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
10/08/01 19:50:57 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
10/08/01 19:50:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
10/08/01 19:50:59 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
10/08/01 19:51:00 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
10/08/01 19:51:01 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
10/08/01 19:51:02 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
10/08/01 19:51:03 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
10/08/01 19:51:04 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
java.lang.RuntimeException: java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: Connection refused
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:323)
        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:295)
        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:268)
        at org.apache.hadoop.examples.WordCount.run(WordCount.java:146)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:141)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:61)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: Connection refused
        at org.apache.hadoop.ipc.Client.call(Client.java:699)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:74)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1367)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:319)
        ... 21 more
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:299)
        at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:772)
        at org.apache.hadoop.ipc.Client.call(Client.java:685)
        ... 33 more


(2) Analysis

The key line in the exception output is:

Retrying connect to server: localhost/127.0.0.1:9000.

The client tried to connect to the server 10 times and failed every time, which means the communication path to the server is down. We already configured the namenode address in core-site.xml:


<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

So the client simply cannot reach the server; most likely the namenode process was never started, let alone able to run a job.

I reproduced this exception as follows: I formatted HDFS but did not run bin/start-all.sh, and launched the wordcount job directly, which produced the exception above.

The fix is to run bin/start-all.sh before launching the wordcount job.
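The retry pattern in the log (ten attempts, then the underlying ConnectException surfaces) can be sketched as follows. This is a hypothetical Python stand-in for the Java ipc.Client, not its actual code:

```python
import time

def call_with_retries(connect, max_retries=10, delay=0.0):
    """Try to connect up to max_retries times; if every attempt is refused,
    surface the underlying connection error, as ipc.Client does."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return connect()
        except ConnectionRefusedError as e:
            # Mirrors the "Already tried N time(s)" log lines
            print(f"Retrying connect to server. Already tried {attempt} time(s).")
            last_error = e
            time.sleep(delay)
    raise IOError("Call to localhost/127.0.0.1:9000 failed on local exception") from last_error
```

When the namenode is down, every attempt hits "Connection refused", so the loop exhausts its retries and the job fails before it even starts.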

2. "Input path does not exist"

(1) Symptom

You create an input directory under the current Hadoop directory, cp some files into it, and run:

[[email protected] hadoop-0.19.0]# bin/hadoop namenode -format

[[email protected] hadoop-0.19.0]# bin/start-all.sh

Assuming input now exists, you launch the wordcount job:

[[email protected] hadoop-0.19.0]# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output

and it throws a pile of exceptions:


org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/root/input
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:782)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1127)
        at org.apache.hadoop.examples.WordCount.run(WordCount.java:149)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:141)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:61)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)


I reproduced this exception as follows (I had already run the job successfully once before):

[[email protected] hadoop-0.19.0]# bin/hadoop fs -rmr input

Deleted hdfs://localhost:9000/user/root/input

[[email protected] hadoop-0.19.0]# bin/hadoop fs -rmr output

Deleted hdfs://localhost:9000/user/root/output

(2) Analysis

This one is self-explanatory: the local input directory was never uploaded to HDFS, hence org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/root/input.

As I recall, with hadoop-0.16.4 the job would run as long as a local input directory existed, with no explicit upload; later versions require it.

The fix is simply to upload the directory:

[[email protected] hadoop-0.19.0]# bin/hadoop fs -put input/ input
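The error message also shows where Hadoop was looking: a relative path such as input is resolved against the user's HDFS home directory (/user/root when running as root), so the job searched hdfs://localhost:9000/user/root/input, which only exists after the upload. A hypothetical sketch of that resolution rule:

```python
def resolve_hdfs_path(path, fs_default="hdfs://localhost:9000", user="root"):
    """Resolve an HDFS path the way the client does: absolute paths are taken
    as-is, relative paths resolve against the user's home /user/<user>."""
    if path.startswith("/"):
        return fs_default + path
    return f"{fs_default}/user/{user}/{path}"

print(resolve_hdfs_path("input"))  # hdfs://localhost:9000/user/root/input
```

This is why the exception names /user/root/input even though the command only said input.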
