
Collecting cluster statistics with the Hadoop RESTful APIs

(Applies to Hadoop 2.7 and later)

ResourceManager REST APIs:

<a href="https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/resourcemanagerrest.html">https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/resourcemanagerrest.html</a>

WebHDFS REST API:

<a href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/webhdfs.html">https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/webhdfs.html</a>

MapReduce History Server REST APIs:

<a href="https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/historyserverrest.html">https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/historyserverrest.html</a>

Spark Monitoring and Instrumentation:

<a href="http://spark.apache.org/docs/latest/monitoring.html">http://spark.apache.org/docs/latest/monitoring.html</a>

Request URL:

<a href="http://emr-header-1:50070/webhdfs/v1/?user.name=hadoop&amp;op=getcontentsummary">http://emr-header-1:50070/webhdfs/v1/?user.name=hadoop&amp;op=getcontentsummary</a>

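Response:

Notes on the response:

Note the relationship between length and spaceConsumed, which depends on the HDFS replication factor: length is the logical size of the files in bytes, while spaceConsumed is the disk space they occupy across all replicas. With a replication factor of 3, for example, spaceConsumed is about three times length.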
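The ratio between the two fields can be checked with a short calculation. The following is a minimal Python sketch, assuming the response body has the ContentSummary shape described in the WebHDFS documentation. The function name and the sample numbers are illustrative only and do not come from a real cluster.

```python
import json


def effective_replication(summary_json: str) -> float:
    """Return spaceConsumed / length for a GETCONTENTSUMMARY response body."""
    summary = json.loads(summary_json)["ContentSummary"]
    length = summary["length"]
    if length == 0:
        # An empty directory has no meaningful ratio.
        return 0.0
    return summary["spaceConsumed"] / length


# Illustrative response body (made-up numbers, not real cluster output).
sample = json.dumps({
    "ContentSummary": {
        "directoryCount": 2,
        "fileCount": 10,
        "length": 1000,
        "quota": -1,
        "spaceConsumed": 3000,
        "spaceQuota": -1,
    }
})

print(effective_replication(sample))  # 3.0
```

A ratio that differs from the configured replication factor usually just means that files under the path were written with different replication settings.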

To measure the usage of each team's working directory, send the same request against that directory:

<a href="http://emr-header-1:50070/webhdfs/v1/user/feed_aliyun?user.name=hadoop&amp;op=getcontentsummary">http://emr-header-1:50070/webhdfs/v1/user/feed_aliyun?user.name=hadoop&amp;op=getcontentsummary</a>
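When several team directories need to be checked, the request URLs can be generated rather than typed by hand. The sketch below is one possible way to do this in Python, assuming the NameNode address from the URLs above. The helper names and the second directory are hypothetical, and the fetch function is defined but not called here because it needs a reachable cluster.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# NameNode HTTP address, taken from the example URLs in this article.
NAMENODE = "http://emr-header-1:50070"


def content_summary_url(path: str, user: str = "hadoop") -> str:
    """Build the WebHDFS GETCONTENTSUMMARY URL for an HDFS path."""
    query = urlencode({"user.name": user, "op": "GETCONTENTSUMMARY"})
    return f"{NAMENODE}/webhdfs/v1{quote(path)}?{query}"


def fetch_content_summary(path: str) -> dict:
    """Call WebHDFS and return the ContentSummary object (needs network access)."""
    with urlopen(content_summary_url(path)) as resp:
        return json.load(resp)["ContentSummary"]


# "/user/another_group" is a hypothetical directory used for illustration.
for directory in ["/user/feed_aliyun", "/user/another_group"]:
    print(content_summary_url(directory))
```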

<a href="http://emr-header-1:8088/ws/v1/cluster">http://emr-header-1:8088/ws/v1/cluster</a>

Response:

<a href="http://emr-header-1:8088/ws/v1/cluster/scheduler">http://emr-header-1:8088/ws/v1/cluster/scheduler</a>

For a description of the individual fields, see:

<a href="https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/resourcemanagerrest.html#cluster_application_queue_api">https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/resourcemanagerrest.html#cluster_application_queue_api</a>
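The scheduler response is a nested tree of queues, so a small recursive walk helps when collecting per-queue usage. This is a rough Python sketch that assumes the Capacity Scheduler layout (a schedulerInfo root whose sub-queues sit under queues.queue). Other schedulers return a different structure, and the sample data below is made up.

```python
def walk_queues(queue_info: dict, depth: int = 0):
    """Yield (depth, queueName, usedCapacity) for a queue and all of its sub-queues."""
    yield depth, queue_info.get("queueName"), queue_info.get("usedCapacity")
    # Leaf queues may have no "queues" key, or it may be null.
    children = (queue_info.get("queues") or {}).get("queue", [])
    for child in children:
        yield from walk_queues(child, depth + 1)


# Made-up response fragment in the assumed Capacity Scheduler shape.
sample = {
    "scheduler": {
        "schedulerInfo": {
            "type": "capacityScheduler",
            "queueName": "root",
            "usedCapacity": 25.0,
            "queues": {
                "queue": [
                    {"queueName": "default", "usedCapacity": 50.0},
                    {"queueName": "etl", "usedCapacity": 0.0},
                ]
            },
        }
    }
}

rows = list(walk_queues(sample["scheduler"]["schedulerInfo"]))
for depth, name, used in rows:
    print("  " * depth + f"{name}: {used}% used")
```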

<a href="http://emr-header-1:8088/ws/v1/cluster/apps">http://emr-header-1:8088/ws/v1/cluster/apps</a>

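To restrict the statistics to a fixed time window, append the parameters "?finishedTimeBegin={timestamp}&amp;finishedTimeEnd={timestamp}", where each timestamp is in milliseconds since the epoch. The parameter names are case-sensitive. For example:

<a href="http://emr-header-1:8088/ws/v1/cluster/apps?finishedTimeBegin=1496742124000&amp;finishedTimeEnd=1496742134000">http://emr-header-1:8088/ws/v1/cluster/apps?finishedTimeBegin=1496742124000&amp;finishedTimeEnd=1496742134000</a>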
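The time-window query can also be built from ordinary datetime values. The following Python sketch is illustrative: it uses the camelCase parameter names finishedTimeBegin and finishedTimeEnd as documented for the ResourceManager Cluster Applications API, and assumes the apps.app list layout of the response. The helper names and sample applications are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone
from urllib.parse import urlencode

# ResourceManager address, taken from the example URLs in this article.
RESOURCE_MANAGER = "http://emr-header-1:8088"


def to_millis(moment: datetime) -> int:
    """Convert a datetime to milliseconds since the epoch."""
    return int(moment.timestamp() * 1000)


def apps_url(begin: datetime, end: datetime) -> str:
    """Build the apps URL limited to applications that finished in [begin, end]."""
    query = urlencode({
        "finishedTimeBegin": to_millis(begin),
        "finishedTimeEnd": to_millis(end),
    })
    return f"{RESOURCE_MANAGER}/ws/v1/cluster/apps?{query}"


def apps_per_queue(response: dict) -> Counter:
    """Count applications per queue; "apps" may be null when nothing matches."""
    apps = (response.get("apps") or {}).get("app", [])
    return Counter(app["queue"] for app in apps)


begin = datetime.fromtimestamp(1496742124, tz=timezone.utc)
end = datetime.fromtimestamp(1496742134, tz=timezone.utc)
print(apps_url(begin, end))

# Made-up response fragment for illustration.
sample = {"apps": {"app": [{"id": "a1", "queue": "default"},
                           {"id": "a2", "queue": "etl"},
                           {"id": "a3", "queue": "default"}]}}
print(apps_per_queue(sample))
```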

The amount of data a job scans has to be queried through the History Server RESTful API, and the procedure differs somewhat between MapReduce and Spark.

<a href="http://emr-header-1:19888/ws/v1/history/mapreduce/jobs/job_1495123166259_0962/counters">http://emr-header-1:19888/ws/v1/history/mapreduce/jobs/job_1495123166259_0962/counters</a>

In the response, the BYTES_READ counter in the org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter group is the amount of data scanned by the job.
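Pulling that counter out of the response takes only a few lines. This Python sketch assumes the jobCounters / counterGroup / counter layout described in the MapReduce History Server REST documentation. The function name and the sample values are invented for illustration.

```python
from typing import Optional

INPUT_GROUP = "org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter"


def bytes_read(response: dict) -> Optional[int]:
    """Return the job-wide BYTES_READ value, or None if the counter is absent."""
    for group in response["jobCounters"].get("counterGroup", []):
        if group["counterGroupName"] != INPUT_GROUP:
            continue
        for counter in group["counter"]:
            if counter["name"] == "BYTES_READ":
                return counter["totalCounterValue"]
    return None


# Made-up response fragment in the assumed shape.
sample = {
    "jobCounters": {
        "id": "job_1495123166259_0962",
        "counterGroup": [
            {
                "counterGroupName": INPUT_GROUP,
                "counter": [
                    {"name": "BYTES_READ", "totalCounterValue": 123456,
                     "mapCounterValue": 123456, "reduceCounterValue": 0}
                ],
            }
        ],
    }
}

print(bytes_read(sample))  # 123456
```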

<a href="http://emr-header-1:18080/api/v1/applications/application_1495123166259_1050/executors">http://emr-header-1:18080/api/v1/applications/application_1495123166259_1050/executors</a>

The sum of totalInputBytes over all executors is the amount of data scanned by the whole job.
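The summation itself is simple once the executor list has been fetched. A minimal Python sketch, assuming the endpoint returns a JSON array of executor objects that each carry a totalInputBytes field; the sample list is made up.

```python
def total_input_bytes(executors: list) -> int:
    """Sum totalInputBytes across all executor entries."""
    return sum(executor.get("totalInputBytes", 0) for executor in executors)


# Made-up executor list; the driver entry normally reads no input itself.
sample = [
    {"id": "driver", "totalInputBytes": 0},
    {"id": "1", "totalInputBytes": 1024},
    {"id": "2", "totalInputBytes": 2048},
]

print(total_input_bytes(sample))  # 3072
```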