1. Install the JDK:
$ sudo apt-get install openjdk-6-jdk
2. Configure SSH:
Install the SSH server:
$ sudo apt-get install openssh-server
Generate an SSH key for the user that will run Hadoop:
$ ssh-keygen -t rsa -P ""
Add the new key so that user can log in to the local machine without a password (appending rather than copying avoids clobbering any existing keys):
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
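It is worth confirming that key-based login actually works before moving on; the first connection will ask you to accept the host key:

```shell
# Log in to the local machine over SSH; with the key in place,
# this should not prompt for a password.
ssh localhost
# If it connects, exit back to the original shell.
exit
```

If you are still asked for a password, check the permissions on ~/.ssh (700) and ~/.ssh/authorized_keys (600).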
3. Install Hadoop:
Download the Hadoop tar.gz package and unpack it:
$ tar -zxvf hadoop-2.2.0.tar.gz
Then move the unpacked directory to the location that HADOOP_HOME will point to in the next step:
$ sudo mv hadoop-2.2.0 /usr/local/hadoop
4. Configuration:
- Add the following to ~/.bashrc:
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
export PATH=$PATH:$HADOOP_HOME/bin
Save the file and log in again; the environment variables will then take effect.
- Configure hadoop-env.sh by setting JAVA_HOME in it:
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
- Configure core-site.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
- Configure mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>
- Configure hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is
  created. The default is used if replication is not specified
  in create time.</description>
</property>
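The directory named in hadoop.tmp.dir must exist and be writable by the Hadoop user before HDFS is formatted; a minimal sketch, assuming the current user is the one that will run Hadoop:

```shell
# Create the temp directory referenced by hadoop.tmp.dir
sudo mkdir -p /app/hadoop/tmp
# Hand ownership to the user that will run Hadoop (here: the current user)
sudo chown "$USER" /app/hadoop/tmp
```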
5. Format the HDFS filesystem via the NameNode:
$ /usr/local/hadoop/bin/hadoop namenode -format
6. Start Hadoop:
$ /usr/local/hadoop/sbin/start-all.sh
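Once the daemons are up, the web interfaces give a quick health view; the ports below are the Hadoop 2.x defaults:

```shell
# NameNode web UI (HDFS health) listens on port 50070 by default
curl -s http://localhost:50070/ >/dev/null && echo "NameNode UI up"
# ResourceManager web UI (YARN) listens on port 8088 by default
curl -s http://localhost:8088/ >/dev/null && echo "ResourceManager UI up"
```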
7. Check that Hadoop is running:
- Use jps to list the running Hadoop processes:
$ jps
- Use the netstat command to check that Hadoop is listening on its ports:
$ sudo netstat -plten | grep java
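On a healthy single-node Hadoop 2.2 setup started with start-all.sh, jps should list the HDFS and YARN daemons, roughly like this (process IDs will vary):

```shell
$ jps
# Sample output on a working pseudo-distributed node:
# 4825 NameNode
# 4946 DataNode
# 5068 SecondaryNameNode
# 5120 ResourceManager
# 5230 NodeManager
# 5391 Jps
```

If any daemon is missing, check its log under /usr/local/hadoop/logs for the reason.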
8. Stop Hadoop:
$ /usr/local/hadoop/sbin/stop-all.sh