(Applies to Hadoop 2.7 and later.)
ResourceManager REST APIs:
<a href="https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/resourcemanagerrest.html">https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/resourcemanagerrest.html</a>
WebHDFS REST API:
<a href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/webhdfs.html">https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/webhdfs.html</a>
MapReduce History Server REST APIs:
<a href="https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/historyserverrest.html">https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/historyserverrest.html</a>
Spark Monitoring and Instrumentation:
<a href="http://spark.apache.org/docs/latest/monitoring.html">http://spark.apache.org/docs/latest/monitoring.html</a>
URL for HDFS storage usage:
<a href="http://emr-header-1:50070/webhdfs/v1/?user.name=hadoop&op=getcontentsummary">http://emr-header-1:50070/webhdfs/v1/?user.name=hadoop&op=getcontentsummary</a>
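Response:
Notes on the response:
Pay attention to how length relates to spaceConsumed. length is the logical size of the data, while spaceConsumed counts every replica, so the relationship depends on the HDFS replication factor (with a replication factor of 3, spaceConsumed is about three times length).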
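A minimal Python 3 sketch of the request above. It assumes the NameNode web port is 50070, as in the example URL (the Hadoop 2.x default; Hadoop 3 moved it to 9870). The helper names fetch_content_summary and replication_ratio are hypothetical. The ratio spaceConsumed / length approximates the average replication factor of the data under the path.

```python
import json
import urllib.request


def fetch_content_summary(host, path="/", user="hadoop", port=50070):
    """Call WebHDFS GETCONTENTSUMMARY for `path` and return the ContentSummary dict."""
    url = (f"http://{host}:{port}/webhdfs/v1{path}"
           f"?user.name={user}&op=GETCONTENTSUMMARY")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["ContentSummary"]


def replication_ratio(summary):
    """spaceConsumed / length, i.e. roughly the average replication factor."""
    if summary["length"] == 0:
        return 0.0  # empty directory: no meaningful ratio
    return summary["spaceConsumed"] / summary["length"]
```

For example, replication_ratio(fetch_content_summary("emr-header-1")) reports the ratio for the whole file system, matching the root-path URL above.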
To measure the usage of each group's working directory, send a request like this one:
<a href="http://emr-header-1:50070/webhdfs/v1/user/feed_aliyun?user.name=hadoop&op=getcontentsummary">http://emr-header-1:50070/webhdfs/v1/user/feed_aliyun?user.name=hadoop&op=getcontentsummary</a>
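To collect usage for every group directory at once, one option is to list /user with the WebHDFS LISTSTATUS operation and then issue GETCONTENTSUMMARY for each subdirectory. This sketch assumes that group working directories sit directly under /user, as /user/feed_aliyun does. The function names and the injectable get parameter (there so the logic can be exercised without a cluster) are illustrative, not part of any Hadoop API.

```python
import json
import urllib.request

WEBHDFS = "http://emr-header-1:50070/webhdfs/v1"  # host/port taken from the URLs above


def webhdfs_get(path, op, user="hadoop"):
    """Issue a WebHDFS GET request and decode the JSON body."""
    url = f"{WEBHDFS}{path}?user.name={user}&op={op}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def usage_per_subdir(parent="/user", get=webhdfs_get):
    """Return {subdirectory: (length, spaceConsumed)} for each directory under parent."""
    listing = get(parent, "LISTSTATUS")["FileStatuses"]["FileStatus"]
    usage = {}
    for status in listing:
        if status["type"] != "DIRECTORY":
            continue  # plain files directly under parent are skipped
        child = f"{parent.rstrip('/')}/{status['pathSuffix']}"
        summary = get(child, "GETCONTENTSUMMARY")["ContentSummary"]
        usage[child] = (summary["length"], summary["spaceConsumed"])
    return usage
```

Calling usage_per_subdir() with no arguments queries the live cluster; sorting the result by spaceConsumed gives a quick per-group ranking.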
<a href="http://emr-header-1:8088/ws/v1/cluster">http://emr-header-1:8088/ws/v1/cluster</a>
Response:
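The /ws/v1/cluster endpoint returns a clusterInfo object. Below is a minimal sketch that fetches it and summarizes a few of its fields (state, haState, hadoopVersion). It assumes the ResourceManager address from the URL above, and the function names are hypothetical. The Accept header is set explicitly so the RM returns JSON rather than XML.

```python
import json
import urllib.request


def cluster_info(rm="http://emr-header-1:8088"):
    """GET /ws/v1/cluster and return the clusterInfo object."""
    req = urllib.request.Request(f"{rm}/ws/v1/cluster",
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["clusterInfo"]


def describe(info):
    """One-line summary built from a few clusterInfo fields."""
    return (f"state={info['state']}, haState={info.get('haState', 'n/a')}, "
            f"hadoopVersion={info['hadoopVersion']}")
```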
<a href="http://emr-header-1:8088/ws/v1/cluster/scheduler">http://emr-header-1:8088/ws/v1/cluster/scheduler</a>
For a description of each returned field, see:
<a href="https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/resourcemanagerrest.html#cluster_application_queue_api">https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/resourcemanagerrest.html#cluster_application_queue_api</a>
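With the CapacityScheduler, the scheduler response nests child queues under queues.queue, so a recursive walk can list every queue with its usedCapacity (a percentage, as reported by the RM). This sketch assumes the CapacityScheduler; the FairScheduler returns a differently shaped document and would need its own walker. As a precaution, it also accepts a single child object in place of a list. The function names are hypothetical.

```python
import json
import urllib.request


def fetch_scheduler_info(rm="http://emr-header-1:8088"):
    """GET /ws/v1/cluster/scheduler and return scheduler.schedulerInfo."""
    req = urllib.request.Request(f"{rm}/ws/v1/cluster/scheduler",
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["scheduler"]["schedulerInfo"]


def walk_queues(queue, depth=0):
    """Yield (depth, queueName, usedCapacity) for a CapacityScheduler queue tree."""
    yield depth, queue["queueName"], queue.get("usedCapacity", 0.0)
    children = (queue.get("queues") or {}).get("queue", [])
    if isinstance(children, dict):  # tolerate a single child serialized as an object
        children = [children]
    for child in children:
        yield from walk_queues(child, depth + 1)
```

Indenting each name by its depth when printing gives a readable queue tree.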
<a href="http://emr-header-1:8088/ws/v1/cluster/apps">http://emr-header-1:8088/ws/v1/cluster/apps</a>
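Each element of apps.app carries per-application fields such as user, queue, state, memorySeconds (MB-seconds allocated) and vcoreSeconds. This minimal sketch totals those per queue. The function names are hypothetical. The code also assumes the RM answers {"apps": null} when no application matches, and treats that case as an empty list.

```python
import json
import urllib.request
from collections import defaultdict


def fetch_apps(url="http://emr-header-1:8088/ws/v1/cluster/apps"):
    """GET the apps list; returns a (possibly empty) list of app dicts."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        apps = json.load(resp)["apps"]
    return (apps or {}).get("app", [])  # "apps" may be null when nothing matches


def usage_by_queue(apps):
    """Sum application count, memorySeconds and vcoreSeconds per queue."""
    totals = defaultdict(lambda: {"apps": 0, "memorySeconds": 0, "vcoreSeconds": 0})
    for app in apps:
        t = totals[app["queue"]]
        t["apps"] += 1
        t["memorySeconds"] += app.get("memorySeconds", 0)
        t["vcoreSeconds"] += app.get("vcoreSeconds", 0)
    return dict(totals)
```

Grouping by app["user"] instead of app["queue"] gives per-user totals in the same way.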
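To restrict the statistics to applications that finished within a fixed time window, add the "?finishedTimeBegin={timestamp}&finishedTimeEnd={timestamp}" parameters (timestamps in milliseconds since the epoch), for example:
<a href="http://emr-header-1:8088/ws/v1/cluster/apps?finishedTimeBegin=1496742124000&finishedTimeEnd=1496742134000">http://emr-header-1:8088/ws/v1/cluster/apps?finishedTimeBegin=1496742124000&finishedTimeEnd=1496742134000</a>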
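Building the time-window URL mostly comes down to converting datetimes to epoch milliseconds. Here is a standard-library sketch. The datetimes must be timezone-aware (subtracting a naive one raises TypeError), and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt):
    """Milliseconds since the Unix epoch for a timezone-aware datetime."""
    return (dt - EPOCH) // timedelta(milliseconds=1)  # exact integer arithmetic


def apps_finished_between(start, end, rm="http://emr-header-1:8088"):
    """URL listing applications whose finish time falls between start and end."""
    query = urlencode({"finishedTimeBegin": to_epoch_ms(start),
                       "finishedTimeEnd": to_epoch_ms(end)})
    return f"{rm}/ws/v1/cluster/apps?{query}"
```

For 2017-06-06 09:42:04 UTC and 09:42:14 UTC this yields 1496742124000 and 1496742134000, the 10-second window used in the example.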
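The amount of data a job scanned has to be queried through the history servers' REST APIs, and MapReduce and Spark differ here: MapReduce jobs are queried on the MapReduce History Server (port 19888 below), Spark applications on the Spark History Server (port 18080 below).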
<a href="http://emr-header-1:19888/ws/v1/history/mapreduce/jobs/job_1495123166259_0962/counters">http://emr-header-1:19888/ws/v1/history/mapreduce/jobs/job_1495123166259_0962/counters</a>
In the response, the BYTES_READ counter in the org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter group is the amount of data the job scanned.
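This sketch pulls BYTES_READ out of the counters response. It assumes the JSON layout jobCounters.counterGroup[].counter[], where each counter has name and totalCounterValue (the map and reduce parts are reported separately as mapCounterValue and reduceCounterValue). The function names are hypothetical.

```python
import json
import urllib.request

HS = "http://emr-header-1:19888/ws/v1/history/mapreduce/jobs"
INPUT_GROUP = "org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter"


def fetch_counters(job_id):
    """GET the counters document for one finished MapReduce job."""
    req = urllib.request.Request(f"{HS}/{job_id}/counters",
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def bytes_read(counters):
    """totalCounterValue of BYTES_READ in the FileInputFormatCounter group, else 0."""
    for group in counters["jobCounters"]["counterGroup"]:
        if group["counterGroupName"] == INPUT_GROUP:
            for counter in group["counter"]:
                if counter["name"] == "BYTES_READ":
                    return counter["totalCounterValue"]
    return 0  # group or counter absent, e.g. a job with no file input
```

For example, bytes_read(fetch_counters("job_1495123166259_0962")) queries the job from the URL above.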
<a href="http://emr-header-1:18080/api/v1/applications/application_1495123166259_1050/executors">http://emr-header-1:18080/api/v1/applications/application_1495123166259_1050/executors</a>
The sum of totalInputBytes over all executors is the amount of data scanned by the whole job.
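A sketch of the Spark side. The executors endpoint returns a JSON array with one entry per executor (typically including the driver), and summing totalInputBytes gives the job's scan volume. The function names are hypothetical. On newer Spark versions /executors lists only active executors, while /allexecutors also includes removed ones, so for finished applications it may be worth comparing the two.

```python
import json
import urllib.request

SHS = "http://emr-header-1:18080/api/v1/applications"


def fetch_executors(app_id, endpoint="executors"):
    """GET the executor list; pass endpoint="allexecutors" to include removed executors."""
    with urllib.request.urlopen(f"{SHS}/{app_id}/{endpoint}") as resp:
        return json.load(resp)


def total_input_bytes(executors):
    """Sum totalInputBytes across all executor entries."""
    return sum(e.get("totalInputBytes", 0) for e in executors)
```

For example, total_input_bytes(fetch_executors("application_1495123166259_1050")) queries the application from the URL above.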