
Hadoop Single Node Setup (Hadoop local and pseudo-distributed mode installation, translated from the official 2.7.3 documentation)

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).


GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.

Windows is also a supported platform, but the following steps are for Linux only. To set up Hadoop on Windows, see the wiki page.


Required software for Linux includes:

1、Java™ must be installed. Recommended Java versions are described at Hadoop Java Versions.

2、ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
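A quick way to confirm both prerequisites are present (a minimal sketch; on some distributions `sshd` lives outside the regular PATH, so treat a "missing" result as a hint to also check your init system, not as proof):

```shell
# Check that the java and sshd commands are resolvable on this machine.
for cmd in java sshd; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: missing"
  fi
done
```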


If your cluster doesn’t have the requisite software you will need to install it.

For example on Ubuntu Linux:

  $ sudo apt-get install ssh

  $ sudo apt-get install rsync


To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.


Unpack the downloaded Hadoop distribution. In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

  # set to the root of your Java installation

  export JAVA_HOME=/usr/java/latest
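If you are unsure what to set `JAVA_HOME` to, one common way to locate the JVM root on Linux is to resolve the `java` binary on the PATH (a sketch; the resulting path varies by distribution, so verify it before using it):

```shell
# Resolve the real location of the java binary and strip the /bin/java suffix
# to get a JAVA_HOME candidate. Falls back to a message if java is absent.
if command -v java >/dev/null 2>&1; then
  java_bin=$(readlink -f "$(command -v java)")
  echo "JAVA_HOME candidate: ${java_bin%/bin/java}"
else
  echo "java not found on PATH"
fi
```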

Try the following command:

  $ bin/hadoop

This will display the usage documentation for the hadoop script.

Now you are ready to start your Hadoop cluster in one of the three supported modes:

Local (Standalone) Mode

Pseudo-Distributed Mode

Fully-Distributed Mode


By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.

  $ mkdir input

  $ cp etc/hadoop/*.xml input

  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'

  $ cat output/*
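What the example job computes can be approximated with ordinary shell tools: pull every match of the regular expression out of the input files and count occurrences. This is only a rough analogy run on a stand-in file, not the MapReduce job itself:

```shell
# Stand-in for the copied etc/hadoop/*.xml files.
mkdir -p input
printf '<name>dfs.replication</name>\n' > input/sample.xml
# Extract every match of the grep-example regex and count each distinct match,
# roughly what the example's map (grep) and reduce (count) phases do.
grep -ohE 'dfs[a-z.]+' input/*.xml | sort | uniq -c
```

Here the single match `dfs.replication` is reported with a count of 1.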


Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.


Use the following:


etc/hadoop/core-site.xml:

<configuration>

    <property>

        <name>fs.defaultFS</name>

        <value>hdfs://localhost:9000</value>

    </property>

</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>

    <property>

        <name>dfs.replication</name>

        <value>1</value>

    </property>

</configuration>

Now check that you can ssh to the localhost without a passphrase:

  $ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

  $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

  $ chmod 0600 ~/.ssh/authorized_keys
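The `chmod 0600` step matters: with the default `StrictModes` setting, sshd typically ignores an authorized_keys file that is group- or world-accessible. A scratch-file sketch of verifying the mode (GNU `stat` syntax shown; BSD `stat` differs, and the temp file here is only a stand-in for `~/.ssh/authorized_keys`):

```shell
# Create a scratch file, apply the same mode the guide requires,
# and print its octal permissions to confirm the result.
tmp=$(mktemp)
chmod 0600 "$tmp"
stat -c '%a' "$tmp"   # prints 600: owner read/write only
rm -f "$tmp"
```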


The following instructions are to run a MapReduce job locally. If you want to execute a job on YARN, see YARN on Single Node.


Format the filesystem:


  $ bin/hdfs namenode -format

Start NameNode daemon and DataNode daemon:


  $ sbin/start-dfs.sh

The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).


Browse the web interface for the NameNode; by default it is available at:


NameNode - http://localhost:50070/

Make the HDFS directories required to execute MapReduce jobs:


  $ bin/hdfs dfs -mkdir /user

  $ bin/hdfs dfs -mkdir /user/<username>
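The `<username>` placeholder is normally your login name. One way to build the path without typing it by hand (a sketch; it assumes `whoami` is available, as on virtually every Linux system):

```shell
# Derive the per-user HDFS home directory path from the current login name.
target="/user/$(whoami)"
echo "$target"
# Then create it with: bin/hdfs dfs -mkdir -p "$target"
```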

Copy the input files into the distributed filesystem:


  $ bin/hdfs dfs -put etc/hadoop input

Run some of the examples provided:

  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'

Examine the output files: Copy the output files from the distributed filesystem to the local filesystem and examine them:


  $ bin/hdfs dfs -get output output

Or

View the output files on the distributed filesystem:


  $ bin/hdfs dfs -cat output/*

When you’re done, stop the daemons with:


  $ sbin/stop-dfs.sh

You can run a MapReduce job on YARN in a pseudo-distributed mode by setting a few parameters and running ResourceManager daemon and NodeManager daemon in addition.


The following instructions assume that steps 1 to 4 of the above instructions have already been executed.


Configure parameters as follows:

etc/hadoop/mapred-site.xml:

<configuration>

    <property>

        <name>mapreduce.framework.name</name>

        <value>yarn</value>

    </property>

</configuration>

etc/hadoop/yarn-site.xml:

<configuration>

    <property>

        <name>yarn.nodemanager.aux-services</name>

        <value>mapreduce_shuffle</value>

    </property>

</configuration>

Start ResourceManager daemon and NodeManager daemon:


  $ sbin/start-yarn.sh

Browse the web interface for the ResourceManager; by default it is available at:


ResourceManager - http://localhost:8088/

Run a MapReduce job.


When you're done, stop the daemons with:

  $ sbin/stop-yarn.sh

For information on setting up fully-distributed, non-trivial clusters see Cluster Setup.

