
Building a Hadoop Docker Image and Running Single-Node Hadoop

1. Environment

Component information

Component   Version
CentOS      7.9.2009
java        1.8.0_161
hadoop      3.1.3
docker      20.10.8

Service layout

Machine   Service
node1     datanode
node1     namenode
node1     resourcemanager
node1     nodemanager
node1     secondarynamenode

2. Preparing the image

Use the latest CentOS image.

docker pull centos:latest
           

3. Downloading the packages

1. Download Hadoop; version 3.1.3 is used here.

2. Download the JDK (8u161, matching the version table above). A download sketch follows.
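
As a minimal sketch: the Apache archive URL below is the usual source for 3.1.3, while the JDK tarball must be fetched manually from Oracle's site after accepting their license, so no direct URL is given for it.

## Hadoop 3.1.3 from the Apache archive
curl -LO https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
## place it, together with the manually downloaded jdk-8u161-linux-x64.tar.gz,
## under /export/software on the host (the directory mounted in the next step)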

4. Starting the container

The image ships without wget and has no sshd preinstalled, so neither scp nor HTTP can be used to transfer files. Instead, start the container with a bind mount to get the packages inside.

The host directory /export/software is used here.

docker run -it --name hadoop -v /export/software:/usr/local/software centos:latest
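
As a quick sanity check inside the container (assuming both tarballs were already placed under /export/software on the host), the mounted directory should list them:

ls /usr/local/software
## expected: hadoop-3.1.3.tar.gz  jdk-8u161-linux-x64.tar.gz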
           

5. Installing the JDK and Hadoop

With the packages placed in /export/software on the host, they are visible inside the container under /usr/local/software.

First, plan the directory layout:

/usr/local/bigdata/jdk              JDK home
/usr/local/bigdata/hadoop           Hadoop home: jars, start scripts, configuration, etc.
/usr/local/bigdata/logs             log directory for easy inspection (created in a later step)
           

Unpack the packages:

## create the directory and copy the packages
mkdir /usr/local/bigdata
cp /usr/local/software/* /usr/local/bigdata
cd /usr/local/bigdata
## unpack and rename
tar -zxvf hadoop-3.1.3.tar.gz
tar -zxvf jdk-8u161-linux-x64.tar.gz
mv hadoop-3.1.3 hadoop
mv jdk1.8.0_161/ jdk
## remove the tarballs to keep the container small
rm -f hadoop-3.1.3.tar.gz
rm -f jdk-8u161-linux-x64.tar.gz
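
Before touching any environment variables, a quick check that the JDK unpacked correctly; the full path is used because PATH is not set up yet:

/usr/local/bigdata/jdk/bin/java -version
## should report java version "1.8.0_161"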
           

6. Installing sshd

Hadoop nodes talk to each other over ssh, but the stock image does not include the sshd service, so it has to be installed.

yum update
           

Answer Y to every prompt. Once yum finishes updating, install sshd:

yum install -y openssl openssh-server
yum install openssh*
           

Press Enter through every prompt to generate the keys, then authorize the rsa public key for passwordless login:

ssh-keygen -t rsa
ssh-keygen -t dsa
ssh-keygen -t ecdsa
ssh-keygen -t ed25519
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys 
           

Edit the sshd configuration file:

vi /etc/ssh/sshd_config
           

The modified section:

### original content
HostKey /etc/ssh/ssh_host_rsa_key
HostKey /etc/ssh/ssh_host_ecdsa_key
HostKey /etc/ssh/ssh_host_ed25519_key
### change to
HostKey /root/.ssh/id_rsa
HostKey /root/.ssh/id_ecdsa
HostKey /root/.ssh/id_ed25519
HostKey /root/.ssh/id_dsa
           

Allow remote login:

vi /etc/pam.d/sshd
# comment out this line with a leading #
# account    required     pam_nologin.so
           

啟動sshd服務并檢視狀态

/usr/sbin/sshd
ps -ef | grep sshd
           

Startup succeeded:

root       311     1  0 06:43 ?        00:00:00 /usr/sbin/sshd
root       332     1  0 06:44 pts/0    00:00:00 grep --color=auto sshd
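
The Hadoop start scripts log in to localhost over ssh, so it is worth confirming passwordless login works before moving on (the first connection would normally ask to accept the host key; -o StrictHostKeyChecking=no skips that prompt):

ssh -o StrictHostKeyChecking=no localhost "echo ssh ok"
## prints "ssh ok" without asking for a password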
           

7. Installing net-tools

yum install net-tools
           

8. Configuring environment variables

Continuing from the previous step, as the root user:

vi ~/.bashrc
           

Replace the contents with the following:

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

# User specific environment
if ! [[ "$PATH" =~ "$HOME/.local/bin:$HOME/bin:" ]]
then
    PATH="$HOME/.local/bin:$HOME/bin:$PATH"
fi

export JAVA_HOME=/usr/local/bigdata/jdk
export CLASSPATH=$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin

# hadoop env
export HADOOP_HOME=/usr/local/bigdata/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin

PATH=$PATH:$HOME/bin
export PATH

# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=

# User specific aliases and functions
           

Save with :wq, then reload the environment:

source ~/.bashrc
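
Both toolchains should now resolve through the updated PATH; a quick verification:

java -version
hadoop version
## both commands should print their version banners without errors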
           

Next, update the Hadoop configuration, starting with core-site.xml:

cd /usr/local/bigdata/hadoop/etc/hadoop
vi core-site.xml
           

Replace the contents with the following:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:8020</value>
    </property>
</configuration>
           

Save with :wq, then create the log directory:

mkdir /usr/local/bigdata/logs
           

Next, rewrite hadoop-env.sh:

## remove the shipped file and write a fresh one
rm -f hadoop-env.sh
vi hadoop-env.sh
           

Replace the contents with the following:

## set an explicit path: daemons launched over ssh do not reliably inherit JAVA_HOME
export JAVA_HOME=/usr/local/bigdata/jdk
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done
export HADOOP_HEAPSIZE=1024
export HADOOP_NAMENODE_INIT_HEAPSIZE=1024
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
export HDFS_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HDFS_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
export HDFS_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx1024m $HADOOP_PORTMAP_OPTS"
export HADOOP_CLIENT_OPTS="-Xmx1024m $HADOOP_CLIENT_OPTS"
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}
export HADOOP_SECURE_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_IDENT_STRING=hadoop
export HADOOP_LOG_DIR=/usr/local/bigdata/logs

Then edit hdfs-site.xml:

vi hdfs-site.xml
           

Replace the contents with the following. HOSTNAME is a placeholder; for this first manual run, substitute the container's actual hostname (the substitution is automated with sed in step 11):

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>HOSTNAME:9870</value>
    </property>
</configuration>
           

9. Initializing the namenode

Run the following command to format the namenode:

hdfs namenode -format
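
If the format succeeds, the log ends with a message along these lines (the storage path may differ; /tmp/hadoop-root is the default for root):

## look for a line like this near the end of the output
INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.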
           

10. Starting Hadoop

cd /usr/local/bigdata/hadoop/sbin
           

1. Add the launch users at the top of start-dfs.sh and stop-dfs.sh:

HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
           

2. Add the launch users at the top of start-yarn.sh and stop-yarn.sh:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
           

3. Start:

cd /usr/local/bigdata/hadoop/sbin
./start-dfs.sh 
./start-yarn.sh 
           

Check with jps; startup succeeded:

1122 SecondaryNameNode
900 DataNode
1399 ResourceManager
1849 Jps
779 NameNode
1517 NodeManager
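
With all daemons up, a small HDFS round trip confirms that reads and writes actually work (a minimal smoke test; the /tmp/smoke path is arbitrary):

hdfs dfs -mkdir -p /tmp/smoke
echo hello | hdfs dfs -put - /tmp/smoke/hello.txt
hdfs dfs -cat /tmp/smoke/hello.txt
## expected output: hello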
           

11. Stopping Hadoop and configuring a startup script

1. Stop the processes:

cd /usr/local/bigdata/hadoop/sbin
./stop-dfs.sh 
./stop-yarn.sh 
           

2. Create core-site.xml.template and hdfs-site.xml.template; the startup script will substitute the container's hostname for the HOSTNAME placeholder in each.

hdfs-site.xml.template

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>HOSTNAME:9870</value>
    </property>
</configuration>
           

core-site.xml.template

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://HOSTNAME:8020</value>
    </property>
</configuration>
           

3. Create the startup script:

vi /etc/bootstrap.sh

#!/bin/bash
source ~/.bash_profile
source /etc/profile

rm -rf /tmp/*
: ${HADOOP_PREFIX:=/usr/local/bigdata/hadoop}
$HADOOP_PREFIX/bin/hdfs namenode  -format

$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh


# installing libraries if any - (resource urls added comma separated to the ACP system variable)
cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == $cp; curl -LO $cp ; done; cd -

# altering the core-site configuration
sed s/HOSTNAME/$HOSTNAME/ /usr/local/bigdata/hadoop/etc/hadoop/core-site.xml.template > /usr/local/bigdata/hadoop/etc/hadoop/core-site.xml
sed s/HOSTNAME/$HOSTNAME/ /usr/local/bigdata/hadoop/etc/hadoop/hdfs-site.xml.template > /usr/local/bigdata/hadoop/etc/hadoop/hdfs-site.xml


/usr/sbin/sshd
$HADOOP_PREFIX/sbin/start-dfs.sh
$HADOOP_PREFIX/sbin/start-yarn.sh

if [[ $1 == "-d" ]]; then
  while true; do sleep 1000; done
fi

if [[ $1 == "-bash" ]]; then
  /bin/bash
fi
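
Save the script and make it executable, since the run command in step 13 invokes it directly. As written, -d keeps the container alive by sleeping in a loop, while -bash drops into an interactive shell after the daemons start:

chmod +x /etc/bootstrap.sh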
           

12. Exporting the container as an image

At this point Hadoop is installed, configured, and able to start. Next, export the container as an image; multiple container instances can later be started from that image to build a cluster on a single machine.

docker export hadoop > hadoop.tar
           

Import it as an image:

docker import hadoop.tar hadoop:3.1.3
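
Note that export/import flattens the filesystem and drops image metadata such as ENV and CMD, which is why the run command below passes /etc/bootstrap.sh explicitly. If keeping that metadata matters, docker commit is an alternative:

## alternative: preserves image metadata and layer history
docker commit hadoop hadoop:3.1.3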
           

13. Running the container and verifying startup

1. Run:

docker run --name hadoop3.1.3 -i -t -p 8020:8020 -p 9870:9870 -p 8088:8088 -p 8040:8040 -p 8042:8042 -p 49707:49707 -p 50010:50010 -p 50075:50075 -p 50090:50090 hadoop:3.1.3 /etc/bootstrap.sh -bash
           

2. Enter the container and check:

docker exec -it hadoop3.1.3 bash
           

jps

1041 NodeManager
914 ResourceManager
644 SecondaryNameNode
1431 Jps
408 DataNode
269 NameNode
           

Web access

host IP:9870 (HDFS NameNode UI)

host IP:8088 (YARN ResourceManager UI)
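
From the host, a quick check that both UIs respond (run against localhost on the Docker host; 200 means the page is being served):

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870/
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088/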
