
MapReduce wordcount test stuck at Running job

After setting up the Hadoop environment, I wanted to verify it with the wordcount example that ships with MapReduce, but every run got stuck at Running job:

-- ::, INFO client.RMProxy: Connecting to ResourceManager at master/:
-- ::, INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hukun/.staging/job_1521944931433_0004
-- ::, INFO input.FileInputFormat: Total input files to process : 
-- ::, INFO mapreduce.JobSubmitter: number of splits:
-- ::, INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
-- ::, INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1521944931433_0004
-- ::, INFO mapreduce.JobSubmitter: Executing with tokens: []
-- ::, INFO conf.Configuration: resource-types.xml not found
-- ::, INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
-- ::, INFO impl.YarnClientImpl: Submitted application application_1521944931433_0004
-- ::, INFO mapreduce.Job: The url to track the job: http://master:/proxy/application_1521944931433_0004/
-- ::, INFO mapreduce.Job: Running job: job_1521944931433_0004

For how my Hadoop environment was set up, see hadoop 3.0 cluster configuration (Ubuntu environment).

My wordcount test steps were as follows.

Take any English txt file and push it to HDFS; here I simply used Hadoop's LICENSE file:

hdfs dfs -mkdir hdfs://master:9000/wordcount
hdfs dfs -put ~/LICENSE.txt hdfs://master:9000/wordcount/LICENSE.txt
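To double-check that the file actually landed in HDFS (an optional step, not in the original write-up), list the directory:

hdfs dfs -ls hdfs://master:9000/wordcount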

cd into Hadoop's mapreduce directory, which holds the example programs that Hadoop ships with:

cd ~/soft/hadoop-./share/hadoop/mapreduce/

Run the wordcount test program; the results will be placed under the hdfs://master:9000/wordcount/result directory.
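The exact command isn't shown above; a typical invocation looks like the following, where the examples jar name is illustrative and must match the version actually installed:

hadoop jar hadoop-mapreduce-examples-3.0.0.jar wordcount hdfs://master:9000/wordcount/LICENSE.txt hdfs://master:9000/wordcount/result

Note that wordcount refuses to start if the output directory already exists, so delete hdfs://master:9000/wordcount/result before retrying.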

The run got stuck at

INFO mapreduce.Job: Running job: job_1521902469523_0005

and never progressed.


Hadoop's default log level is INFO, which leaves a lot of information unprinted, so this time I raised Hadoop's log level to DEBUG.
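One common way to do this (assuming the stock Hadoop client scripts; see reference 1 below) is to set HADOOP_ROOT_LOGGER in the shell that launches the job:

# Print DEBUG-level logs to the console for commands run from this shell
export HADOOP_ROOT_LOGGER=DEBUG,console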

Then I ran the wordcount test program again.

The client just kept doing IPC communication with master; nothing else informative showed up:

-- ::, DEBUG ipc.Client: The ping interval is  ms.
-- ::, DEBUG ipc.Client: Connecting to master/.:
-- ::, DEBUG ipc.Client: IPC Client () connection to master/.: from hukun: starting, having connections 
-- ::, DEBUG ipc.Client: IPC Client () connection to master/.: from hukun sending #0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo
-- ::, DEBUG ipc.Client: IPC Client () connection to master/.: from hukun got value #0
-- ::, DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took ms
-- ::, DEBUG mapred.ResourceMgrDelegate: getStagingAreaDir: dir=/tmp/hadoop-yarn/staging/hukun/.staging
-- ::, DEBUG ipc.Client: IPC Client () connection to master/.: from hukun sending #1 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo
-- ::, DEBUG ipc.Client: IPC Client () connection to master/.: from hukun got value #1
-- ::, DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took ms
-- ::, DEBUG ipc.Client: The ping interval is  ms.
-- ::, DEBUG ipc.Client: Connecting to master/.:
-- ::, DEBUG ipc.Client: IPC Client () connection to master/.: from hukun sending #2 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getNewApplication
-- ::, DEBUG ipc.Client: IPC Client () connection to master/.: from hukun: starting, having connections 
-- ::, DEBUG ipc.Client: IPC Client () connection to master/.: from hukun got value #2
-- ::, DEBUG ipc.ProtobufRpcEngine: Call: getNewApplication took ms
-- ::, DEBUG mapreduce.JobSubmitter: Configuring job job_1521903779426_0001 with /tmp/hadoop-yarn/staging/hukun/.staging/job_1521903779426_0001 as the submit dir
-- ::, DEBUG mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:[hdfs://master:9000]
2018-03-24 08:03:28,830 DEBUG mapreduce.JobResourceUploader: default FileSystem: hdfs://master:9000
2018-03-24 08:03:28,833 DEBUG ipc.Client: IPC Client (658532887) connection to master/192.168.85.3:9000 from hukun sending #3 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo
2018-03-24 08:03:28,835 DEBUG ipc.Client: IPC Client (658532887) connection to master/192.168.85.3:9000 from hukun got value #3
2018-03-24 08:03:28,835 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 2ms
2018-03-24 08:03:28,838 DEBUG hdfs.DFSClient: /tmp/hadoop-yarn/staging/hukun/.staging/job_1521903779426_0001: masked={ masked: rwxr-xr-x, unmasked: rwxrwxrwx }
2018-03-24 08:03:28,852 DEBUG ipc.Client: IPC Client (658532887) connection to master/192.168.85.3:9000 from hukun sending #4 org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs
2018-03-24 08:03:28,884 DEBUG ipc.Client: IPC Client (658532887) connection to master/192.168.85.3:9000 from hukun got value #4

Restart Hadoop:

stop-all.sh
start-all.sh

Running wordcount again, a useful log message finally appeared:

-- ::, DEBUG retry.RetryInvocationHandler: Exception while invoking call #4 ClientNamenodeProtocolTranslatorPB.mkdirs over null. Not retrying because try once and fail.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hadoop-yarn/staging/hukun/.staging/job_1521903779426_0001. Name node is in safe mode.
The reported blocks  has reached the threshold  of total blocks  The number of live datanodes  has reached the minimum number  In safe mode extension. Safe mode will be turned off automatically in  seconds. NamenodeHostName:master
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:)

The message says:

Cannot create directory /tmp/hadoop-yarn/staging/hukun/.staging/job_1521903779426_0001. Name node is in safe mode.

# Leave safe mode
hdfs dfsadmin -safemode leave
stop-all.sh
start-all.sh
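You can confirm whether the NameNode is still in safe mode before and after with the standard dfsadmin subcommand:

hdfs dfsadmin -safemode get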

Running the wordcount program again, it was still stuck at the same IPC communication.

I later learned that the error

Cannot delete /tmp/hadoop-yarn/staging/hukun/.staging/job_1521903779426_0001. Name node is in safe mode.

shows up when a MapReduce job is force-killed with CTRL+C: some of the job's staging information is left behind under /tmp, and while the NameNode is in safe mode that leftover state cannot be deleted. (Safe mode is the read-only state the NameNode stays in at startup until enough block reports have arrived; writes and deletes are rejected until it exits.)
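Once safe mode is off, the stale staging directory can also be removed by hand (path taken from the log above):

hdfs dfs -rm -r /tmp/hadoop-yarn/staging/hukun/.staging/job_1521903779426_0001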

So it was some other problem after all. I tried all sorts of fixes from the internet without luck; in the end I raised the virtual machine's memory to 4 GB and gave it 2 processors with 2 cores each, and the next run went through fine. So the root cause was simply that the VM had too little memory.
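If giving the VM more memory isn't an option, the same symptom (the application is accepted but no containers are ever allocated, so the client hangs at Running job) can sometimes be worked around by shrinking YARN's memory expectations instead. A minimal sketch in yarn-site.xml, using standard YARN properties with illustrative values, not settings from the original post:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>  <!-- total memory this NodeManager may hand out -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>256</value>   <!-- smallest container the scheduler will grant -->
</property>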


References

1. Enabling and disabling Hadoop debug logging

2. How to fix the Hadoop "Name node is in safe mode" error

3. Fixing the "Name node is in safe mode" error
