Steps to Install Hadoop on Ubuntu (Single-Node Mode)

1. Install the JDK:

$ sudo apt-get install openjdk-6-jdk

2. Configure SSH:

Install SSH:

$ sudo apt-get install openssh-server

Generate an SSH key for the user that will run Hadoop:

$ ssh-keygen -t rsa -P ""

Enable login to the local machine with the newly generated key:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
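
You can then verify that passwordless login works (a quick sanity check, not part of the original steps):

$ ssh localhost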

3. Install Hadoop:

Download the Hadoop tar.gz package and extract it:

$ tar -zxvf hadoop-2.2.0.tar.gz
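
The environment variables in the next step assume Hadoop lives under /usr/local/hadoop, so move the extracted directory there first (the target path is an assumption, chosen to match HADOOP_HOME below):

$ sudo mv hadoop-2.2.0 /usr/local/hadoop

$ sudo chown -R $USER /usr/local/hadoop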

4. Configuration:

- Add the following to the ~/.bashrc file:

export HADOOP_HOME=/usr/local/hadoop

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64

export PATH=$PATH:$HADOOP_HOME/bin

Save the file after editing and log in again (or run source ~/.bashrc in the current shell); the environment variables will then be in effect.

- Configure hadoop-env.sh:
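
Typically the only edit needed here is to point JAVA_HOME at the JDK in $HADOOP_HOME/etc/hadoop/hadoop-env.sh (file location per the Hadoop 2.x layout), using the same path as in ~/.bashrc above:

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64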

- Configure core-site.xml (hadoop.tmp.dir and fs.default.name belong in this file):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
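
hadoop.tmp.dir points at /app/hadoop/tmp, which must exist and be writable by the Hadoop user; a plausible way to create it (commands assumed, not in the original):

$ sudo mkdir -p /app/hadoop/tmp

$ sudo chown $USER /app/hadoop/tmp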

- Configure mapred-site.xml:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>

- Configure hdfs-site.xml (dfs.replication belongs in this file):

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified at create time.</description>
</property>

5. Format the HDFS filesystem via the NameNode:

$ /usr/local/hadoop/bin/hadoop namenode -format
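
On Hadoop 2.x this command still works but prints a deprecation warning; the current equivalent is:

$ /usr/local/hadoop/bin/hdfs namenode -format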

6. Run Hadoop:

$ /usr/local/hadoop/sbin/start-all.sh
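
start-all.sh is likewise deprecated in Hadoop 2.x; the same daemons can be started with:

$ /usr/local/hadoop/sbin/start-dfs.sh

$ /usr/local/hadoop/sbin/start-yarn.sh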

7. Check that Hadoop is running:

- Use jps to check Hadoop's status:

$ jps
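
On a healthy Hadoop 2.x single-node setup, jps should list roughly the following daemons (PIDs will vary):

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps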

- Use the netstat command to check whether Hadoop's ports are listening:

$ sudo netstat -plten | grep java

8. Stop Hadoop:

$ /usr/local/hadoop/sbin/stop-all.sh