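Hadoop MapReduce Commands

Overview

All MapReduce commands are invoked through the bin/mapred script.

Running the mapred script without any arguments prints the description of all commands.

Usage: mapred [--config confdir] COMMAND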

[hadoop@hadoopcluster78 bin]$ mapred
Usage: mapred [--config confdir] COMMAND
       where COMMAND is one of:
  pipes                run a Pipes job
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  classpath            prints the class path needed for running
                       mapreduce subcommands
  historyserver        run job history servers as a standalone daemon
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  hsadmin              job history server admin interface

Most commands print help when invoked w/o parameters.

User Commands

Commands useful for users of a Hadoop cluster:

classpath

Prints the class path needed to get the Hadoop jars and the required libraries. The hdfs and yarn scripts provide the same command.

Usage: mapred classpath
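
The class path is printed as one long string, which is hard to scan. As a minimal sketch, assuming the entries are separated by `:` as is usual on Linux, the helper below (the name `split_classpath` is made up for this example) reads such a string from standard input and prints one entry per line:

```shell
# Hypothetical helper: print one class path entry per line.
# Assumes entries are separated by ':' (the Linux convention).
split_classpath() {
  tr ':' '\n'
}

# Usage on a cluster node (not executed here):
#   mapred classpath | split_classpath
```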

distcp

Copies files or directories recursively. For examples, see the article: Hadoop Commands Guide.

job

Interacts with MapReduce jobs.

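Option	Description
-submit job-file	Submits the job.
-status job-id	Prints the map and reduce completion percentages and all job counters.
-counter job-id group-name counter-name	Prints the counter value.
-kill job-id	Kills the job with the given job-id.
-events job-id from-event-# #-of-events	Prints the details of the events received by the jobtracker for the given range. (See the example below for usage.)
-history [all] jobOutputDir	Prints job details, plus details of failed and killed tasks and the reasons. More details about the job, such as successful tasks and the attempts made for each task, can be viewed by specifying the [all] option.
-list [all]	Prints the jobs that are currently running. With all, prints all jobs.
-kill-task task-id	Kills the task. Killed tasks are not counted against failed attempts.
-fail-task task-id	Fails the task. Failed tasks are counted against failed attempts. By default a task is attempted 4 times and is not retried after that, so failing the same task four times with -fail-task stops further attempts and causes the whole job to fail.
-set-priority job-id priority	Changes the priority of the job. Allowed priority values are VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW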

Examples:

[hadoop@hadoopcluster78 bin]$ mapred job -events job_1437364567082_0109 0 100
15/08/13 15:10:53 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
Task completion events for job_1437364567082_0109
Number of events (from 0) are: 1
SUCCEEDED attempt_1437364567082_0109_m_000016_0 http://hadoopcluster83:13562/tasklog?plaintext=true&attemptid=attempt_1437364567082_0109_m_000016_0

[hadoop@hadoopcluster78 bin]$ mapred job -kill-task attempt_1437364567082_0111_m_000000_4
15/08/13 15:51:25 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
Killed task attempt_1437364567082_0111_m_000000_4
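
The two examples use different kinds of ID: `-events` takes a job ID, while the `-kill-task` call is given a task attempt ID. In the IDs shown above, the attempt ID starts with the same timestamp and sequence number as its job. As a sketch under that assumption (the function name `job_id_of_attempt` is hypothetical), the owning job ID can be derived with `sed`:

```shell
# Hypothetical helper: derive the job ID from a task attempt ID.
# Assumes the layout seen in the examples above:
#   attempt_<timestamp>_<job seq>_<m|r>_<task>_<attempt>
# Prints nothing when the argument does not match that layout.
job_id_of_attempt() {
  printf '%s\n' "$1" |
    sed -n 's/^attempt_\([0-9][0-9]*_[0-9][0-9]*\)_[mr]_[0-9][0-9]*_[0-9][0-9]*$/job_\1/p'
}

# Usage on a cluster node (not executed here):
#   mapred job -status "$(job_id_of_attempt attempt_1437364567082_0111_m_000000_4)"
```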

pipes

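Runs a pipes job.

For more on pipes, see: Hadoop pipes programming

Hadoop pipes allows C++ programmers to write MapReduce programs.

It lets users mix C++ and Java implementations of five components: RecordReader, Mapper, Partitioner, Reducer, and RecordWriter.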

Usage: mapred pipes [-conf <path>] [-jobconf <key=value>, <key=value>, ...] [-input <path>] [-output <path>] [-jar <jar file>] [-inputformat <class>] [-map <class>] [-partitioner <class>] [-reduce <class>] [-writer <class>] [-program <executable>] [-reduces <num>]


-conf path	Path of the configuration file for the job.
-jobconf key=value, key=value, …	Adds or overrides configuration for the job.
-input path	Input path
-output path	Output path
-jar jar file	JAR file name
-inputformat class	InputFormat class
-map class	Java Map class
-partitioner class	Java Partitioner
-reduce class	Java Reduce class
-writer class	Java RecordWriter
-program executable	URI of the executable
-reduces num	Number of reduces
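
The options above can be combined in a small wrapper. This is a minimal sketch, not a complete job setup: the function name `run_pipes`, the `MAPRED` override variable, and any paths passed to it are assumptions made for this example, and a real job may need further settings through -conf or -jobconf.

```shell
# Hypothetical wrapper around "mapred pipes".
# Arguments: <input path> <output path> <executable URI> [number of reduces]
# MAPRED is an illustrative override; set MAPRED=echo for a dry run.
run_pipes() {
  "${MAPRED:-mapred}" pipes \
    -input "$1" \
    -output "$2" \
    -program "$3" \
    -reduces "${4:-1}"
}
```

With `MAPRED=echo`, calling `run_pipes in out bin/wordcount` only prints the arguments that would be passed to mapred, which is a cheap way to check the command before submitting it.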

queue

Interacts with Job Queues and displays information about them.

Usage: mapred queue [-list] | [-info <job-queue-name> [-showJobs]] | [-showacls]

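-list	Gets the list of Job Queues configured in the system, along with their scheduling information.
-info job-queue-name [-showJobs]	Displays the information and scheduling information of a particular Job Queue. If the -showJobs option is given, the list of currently running jobs is also displayed.
-showacls	Displays the queue names and the queue operations allowed for the current user. Only queues the current user can access are listed.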

[hadoop@hadoopcluster78 bin]$ mapred queue -list
15/08/13 14:25:30 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
======================
Queue Name : default 
Queue State : running 
Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0, CurrentCapacity: 47.5

[hadoop@hadoopcluster78 bin]$ mapred queue -info default
15/08/13 14:28:45 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
======================
Queue Name : default 
Queue State : running 
Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0, CurrentCapacity: 72.5

[hadoop@hadoopcluster78 bin]$ mapred queue -info default -showJobs
15/08/13 14:29:08 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
======================
Queue Name : default 
Queue State : running 
Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0, CurrentCapacity: 72.5 
Total jobs:1
                  JobId         State         StartTime        UserName           Queue      Priority     UsedContainers     RsvdContainers     UsedMem     RsvdMem     NeededMem       AM info
 job_1437364567082_0107       RUNNING     1439447102615            root         default        NORMAL                 28                  0      29696M          0M        29696M    http://hadoopcluster79:8088/proxy/application_1437364567082_0107/

[hadoop@hadoopcluster78 bin]$ mapred queue -showacls
15/08/13 14:31:44 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
Queue acls for user :  hadoop

Queue  Operations
=====================
root  ADMINISTER_QUEUE,SUBMIT_APPLICATIONS
default  ADMINISTER_QUEUE,SUBMIT_APPLICATIONS
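
The `mapred queue -list` and `mapred queue -info` output shown earlier spreads each queue over several lines. As a sketch that assumes exactly the line layout in those transcripts (the function name `queue_usage` is hypothetical), `awk` can condense each queue to a single line with its name, state, and current capacity:

```shell
# Hypothetical helper: summarize "mapred queue -list" output read from stdin.
# Assumes the "Queue Name :", "Queue State :" and "Scheduling Info :" lines
# look exactly like the transcripts above.
queue_usage() {
  awk '
    /^Queue Name :/  { name = $4 }
    /^Queue State :/ { state = $4 }
    /^Scheduling Info :/ {
      # Print the number that follows the "CurrentCapacity:" label.
      for (i = 1; i < NF; i++)
        if ($i == "CurrentCapacity:") print name, state, $(i + 1)
    }
  '
}

# Usage on a cluster node (not executed here):
#   mapred queue -list | queue_usage
```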

Administration Commands

The following commands are useful for super administrators of a Hadoop cluster.

historyserver

Starts the JobHistoryServer service.

Usage: mapred historyserver

hsadmin

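Option	Description
-refreshUserToGroupsMappings	Refreshes the user-to-groups mappings.
-refreshSuperUserGroupsConfiguration	Refreshes the superuser proxy groups mappings.
-refreshAdminAcls	Refreshes the ACLs for administration of the JobHistoryServer.
-refreshLoadedJobCache	Refreshes the loaded job cache of the JobHistoryServer.
-refreshJobRetentionSettings	Refreshes the job history period and the job cleaner settings.
-refreshLogRetentionSettings	Refreshes the log retention period and the log retention check interval.
-getGroups [username]	Gets the groups the given user belongs to.
-help [cmd]	Displays help.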

[hadoop@hadoopcluster78 bin]$ mapred hsadmin -getGroups hadoop
hadoop : clustergroup
