Notes on installing Deepo on CentOS

Official Deepo documentation [https://github.com/ufoym/deepo?tdsourcetag=s_pctim_aiomsg]

Preliminary checks

1. System version

Every step of this installation follows the official documentation.

Get Docker CE for Centos[https://docs.docker.com/install/linux/docker-ce/centos/]

Prerequisites

Docker EE customers

To install Docker Enterprise Edition (Docker EE), go to Get Docker EE for CentOS instead of this topic.

To learn more about Docker EE, see Docker Enterprise Edition.

OS requirements

To install Docker CE, you need a maintained version of CentOS 7. Archived versions aren’t supported or tested.

The centos-extras repository must be enabled. This repository is enabled by default, but if you have disabled it, you need to re-enable it.

The overlay2 storage driver is recommended.

In short: the system must be CentOS 7; the centos-extras repository is enabled by default on CentOS 7; and overlay2 is the recommended storage driver.

Check the system version:

[[email protected] ~]# uname -r
3.10.0-957.1.3.el7.x86_64
[[email protected] ~]# cat /etc/redhat-release 
CentOS Linux release 7.6.1810 (Core) 
           

The system is CentOS 7 (release 7.6.1810) and the kernel version is 3.10.
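The kernel check above can also be scripted. The following is a minimal sketch, assuming a release string in the `major.minor...` form printed by `uname -r`; the helper name `kernel_at_least` is hypothetical and not part of any tool.

```shell
# Sketch: compare a kernel release string against a minimum "major minor".
# kernel_at_least is a hypothetical helper name, not part of any tool.
kernel_at_least() {
    local release="$1" want_major="$2" want_minor="$3"
    local major minor rest
    major="${release%%.*}"        # text before the first dot, e.g. "3"
    rest="${release#*.}"          # text after the first dot
    minor="${rest%%.*}"           # second component, e.g. "10"
    major="${major%%[!0-9]*}"     # drop anything after the leading digits
    minor="${minor%%[!0-9]*}"
    if [ "$major" -gt "$want_major" ]; then
        return 0
    fi
    if [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; then
        return 0
    fi
    return 1
}

# Check the running kernel against 3.10.
if kernel_at_least "$(uname -r)" 3 10; then
    echo "kernel OK: $(uname -r)"
else
    echo "kernel too old: $(uname -r)"
fi
```

The function only looks at the first two version components, which is enough for a 3.10 threshold.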

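2. Check whether Docker was installed before

Older Docker packages were named docker or docker-engine. Use the following command to remove any that were previously installed on the system: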

[[email protected] ~]# sudo yum remove docker \
>                   docker-client \
>                   docker-client-latest \
>                   docker-common \
>                   docker-latest \
>                   docker-latest-logrotate \
>                   docker-logrotate \
>                   docker-engine
Loaded plugins: fastestmirror
No Match for argument: docker
No Match for argument: docker-client
No Match for argument: docker-client-latest
No Match for argument: docker-common
No Match for argument: docker-latest
No Match for argument: docker-latest-logrotate
No Match for argument: docker-logrotate
No Match for argument: docker-engine
No Packages marked for removal
           

If yum reports "No Packages marked for removal", as above, Docker was never installed; otherwise yum uninstalls the old version.
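In a script, the same decision can be made by looking for that line in yum's output. This is a rough sketch: `nothing_to_remove` is a hypothetical helper, and the sample text is copied from the transcript above rather than produced by running yum.

```shell
# nothing_to_remove: hypothetical helper. Reads yum output on stdin and
# succeeds when it contains the "No Packages marked for removal" line.
nothing_to_remove() {
    grep -q "No Packages marked for removal"
}

# Sample text taken from the transcript above (yum is not run here).
sample_output='Loaded plugins: fastestmirror
No Match for argument: docker
No Match for argument: docker-engine
No Packages marked for removal'

if printf '%s\n' "$sample_output" | nothing_to_remove; then
    echo "no previous Docker installation found"
else
    echo "old Docker packages were removed"
fi
```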

3. Check the GPU and NVIDIA driver version

nvidia-docker installation page [https://github.com/NVIDIA/nvidia-docker]

The prerequisites listed on that page include:

GNU/Linux x86_64 with kernel version > 3.10 (maintained)

Docker >= 1.12 (will be)

NVIDIA GPU with Architecture > Fermi (2.1)

NVIDIA drivers ~= 361.93 (untested on older versions)

Next, use the following command to check whether the last two conditions are met:

[[email protected] ~]# nvidia-smi
Thu Jan 17 22:31:26 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87                 Driver Version: 390.87                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 23%   35C    P0    55W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:0B:00.0 Off |                  N/A |
|  0%   33C    P0    54W / 250W |      0MiB / 11178MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
           

The NVIDIA driver version is 390.87, which satisfies the requirement.
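The driver check can be automated by reading the version out of the nvidia-smi banner. The sketch below assumes the banner layout shown above; both helper names are hypothetical, and the comparison is deliberately coarse because it only looks at the major version number.

```shell
# driver_version_from_header: hypothetical helper. Reads nvidia-smi banner
# text on stdin and prints the value that follows "Driver Version:".
driver_version_from_header() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# driver_at_least VERSION MIN_MAJOR: coarse check on the major number only.
driver_at_least() {
    local major="${1%%.*}"
    [ "$major" -ge "$2" ]
}

# Banner line copied from the output above (nvidia-smi is not run here).
header='| NVIDIA-SMI 390.87                 Driver Version: 390.87                    |'
version="$(printf '%s\n' "$header" | driver_version_from_header)"

if driver_at_least "$version" 361; then
    echo "driver $version is new enough"
else
    echo "driver $version is too old"
fi

# On a machine with the driver installed, the version can also be queried
# directly:  nvidia-smi --query-gpu=driver_version --format=csv,noheader
```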

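Installing Docker

The official documentation gives three installation methods. I chose the first, which is also the most recommended: installing from the Docker repository.

I ran the installation as root, so the prompt is # and sudo is not needed. Add sudo yourself if you are not root.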

[[email protected] ~]# yum install -y yum-utils device-mapper-persistent-data lvm2
Complete!
[[email protected] ~]# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
repo saved to /etc/yum.repos.d/docker-ce.repo
[[email protected] ~]# yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
repo saved to /etc/yum.repos.d/docker-ce.repo
           

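Docker's own repository kept timing out, so I switched to the Aliyun mirror.

Check which Docker versions are available, then install a specific one: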

[[email protected] ~]#yum install docker-ce-18.09.0.ce-1.el7
Complete!
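
The available versions mentioned above are normally listed with `yum list docker-ce --showduplicates | sort -r`. As a sketch of how a line of that listing maps to the package name passed to `yum install`, the hypothetical helper below joins the name and version columns. The sample line is illustrative and not taken from this machine.

```shell
# pkg_from_listing: hypothetical helper. Reads lines in the three-column
# layout printed by `yum list` (name.arch, version, repo) and prints
# "<name>-<version>", dropping the architecture and any "epoch:" prefix.
pkg_from_listing() {
    awk 'NF == 3 && $2 ~ /^[0-9]/ {
        name = $1; sub(/\.[^.]*$/, "", name)   # drop ".x86_64"
        ver = $2;  sub(/^[0-9]+:/, "", ver)    # drop an epoch such as "3:"
        print name "-" ver
    }'
}

# Illustrative listing line (yum is not run here).
sample='docker-ce.x86_64            18.06.1.ce-3.el7                   docker-ce-stable'
printf '%s\n' "$sample" | pkg_from_listing
```

Header lines such as "Loaded plugins: ..." are skipped because their second field does not start with a digit.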
           

Docker is now installed. Start it and verify:

[[email protected] ~]# systemctl start docker
[[email protected] ~]# docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
1b930d010525: Pull complete 
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.
           

Installing nvidia-docker

1. Uninstall the old version of nvidia-docker

Following the steps in the installation documentation:

$ docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
$ sudo yum remove nvidia-docker
           

2. Install nvidia-docker

My system falls under CentOS 7 / RHEL 7.4, so I used:

CentOS 7 (docker-ce), RHEL 7.4/7.5 (docker-ce), Amazon Linux 1/2
# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo yum remove nvidia-docker

# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
  sudo tee /etc/yum.repos.d/nvidia-docker.repo
           

The next step is the following command:

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo yum install -y nvidia-docker2
           

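This command may fail with an error. If docker-ce is not installed, yum pulls in a matching version automatically. If docker-ce is already installed, however, yum picks the latest nvidia-docker2 build, which may be incompatible with the installed docker-ce version.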

Based on the error message, and after a good deal of trial and error, the install command that finally worked was:

# yum install -y nvidia-docker2-2.0.3-1.docker18.09.0.ce.noarch

Installed:
  nvidia-docker2.noarch 0:2.0.3-1.docker18.09.0.ce                                     

Dependency Installed:
  nvidia-container-runtime.x86_64 0:2.0.0-1.docker18.09.0                              
  nvidia-container-runtime-hook.x86_64 0:1.4.0-2                                       

Complete!
           

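This pins nvidia-docker2 to the build that matches the docker-ce version already installed. During installation, yum automatically pulls in two dependencies, nvidia-container-runtime and nvidia-container-runtime-hook, whose versions correspond to the docker-ce version.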
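
The pinned package name follows a visible pattern: the nvidia-docker2 version, then `.docker` plus the docker-ce version, then `.noarch`. A small sketch of assembling it, using the hypothetical helper `nvidia_docker2_pkg` and the version strings from the command above as inputs:

```shell
# nvidia_docker2_pkg: hypothetical helper that assembles a package name in
# the pattern seen above:
#   nvidia-docker2-<nvidia-docker2 version>.docker<docker-ce version>.noarch
nvidia_docker2_pkg() {
    local nd_version="$1" docker_version="$2"
    printf 'nvidia-docker2-%s.docker%s.noarch\n' "$nd_version" "$docker_version"
}

# Version strings taken from the install command above.
nvidia_docker2_pkg "2.0.3-1" "18.09.0.ce"

# On a real system the builds actually published in the repository can be
# listed first (not run here):
#   yum list nvidia-docker2 --showduplicates
```

Whether a given combination exists depends on what the repository publishes, so listing the available builds before pinning is the safer route.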

sudo pkill -SIGHUP dockerd

# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
           

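I ran into a problem here: nvidia-smi could no longer detect the GPU.

After investigating, I found that I had accidentally changed the GPU driver version during installation. Since nvidia-docker2 supports CUDA 10, I upgraded both the driver and CUDA to version 10. Only one GPU is attached now.

Current state: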

[[email protected] ~]# nvidia-smi
Mon Jan 21 07:21:35 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78       Driver Version: 410.78       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:0B:00.0 Off |                  N/A |
| 18%   35C    P0    55W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
           

With the driver fixed, running the test command again works as expected:

[[email protected] ~]# docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
Mon Jan 21 12:23:11 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78       Driver Version: 410.78       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:0B:00.0 Off |                  N/A |
| 18%   35C    P0    55W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
           

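Downloading the ufoym/deepo deep learning Docker image

Basic usage

Here we use the GPU version and follow the instructions on the GitHub page:

  1. Install docker and nvidia-docker (already done)
  2. Pull the image from Docker Hub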
docker pull ufoym/deepo
           

Once the image has been pulled, use

docker image list

to see the images available locally.

[[email protected] ~]# docker image list
REPOSITORY                   TAG                 IMAGE ID            CREATED             SIZE
ufoym/deepo                  latest              7df5ba2f4ed7        10 days ago         9.32GB
hello-world                  latest              fce289e99eb9        2 weeks ago         1.84kB
nvidia/cuda                  9.0-base            74f5aea45cf6        2 months ago        134MB
ufoym/deepo                  all-py36-jupyter    ca53b1635705        8 months ago        9.41GB
           

Running the usage command exactly as listed on the Deepo page produces:

/usr/bin/nvidia-docker: line 34: /usr/bin/docker: Permission denied
/usr/bin/nvidia-docker: line 34: /usr/bin/docker: Success
           

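I do not yet understand the exact cause.

For other problems, the command

journalctl -n -u nvidia-docker

shows the error messages.

Change the run command to

docker run --runtime=nvidia -it ufoym/deepo bash

That is:

nvidia-docker => docker run --runtime=nvidia

[option] -it (different options enable different features; -it gives an interactive terminal)

ufoym/deepo (image ID or name)

bash (the command to run)

This drops you into an interactive bash shell.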

[[email protected] ~]# docker run --runtime=nvidia -it ufoym/deepo bash
[email protected]:/# 
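
The substitution described above (replacing `nvidia-docker run` with `docker run --runtime=nvidia`) can be captured in a small wrapper. This sketch only prints the resulting command line and does not execute it; `gpu_run` is a hypothetical name.

```shell
# gpu_run: hypothetical wrapper that prints the docker command line used in
# place of the old `nvidia-docker run ...` form. It does not run docker.
gpu_run() {
    printf 'docker run --runtime=nvidia'
    printf ' %s' "$@"     # append each argument, separated by a space
    printf '\n'
}

gpu_run -it ufoym/deepo bash
```

To actually launch the container, the printed line would be run as is.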
           

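Once inside, the environment is ready to use: you can list the installed Python libraries or start ipython for interactive work.

Jupyter support

Another advantage of this image is its support for Jupyter Notebook. Pull the Jupyter variant from Docker Hub in the same way and start it with the commands below.

There is a version problem, though: in this image pip is at 10.0 and mxnet is at 1.0, so both need to be upgraded by hand. Alternatively, install Jupyter in the freshly pulled deepo image and commit the result.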

docker pull ufoym/deepo:all-py36-jupyter
docker run --runtime=nvidia -it -p 8888:8888 --ipc=host ufoym/deepo:all-py36-jupyter jupyter notebook --no-browser --ip=0.0.0.0 --allow-root --NotebookApp.token='0010' --notebook-dir='/root'
           

The notebook can then be reached remotely through the host's IP address: http://ip:8888/?token=
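
As a small illustration of how that address is put together, the sketch below builds the URL from a host, a port and a token. `notebook_url` is a hypothetical helper, 192.0.2.10 is a documentation-only placeholder address, and the token is the value passed to --NotebookApp.token in the run command.

```shell
# notebook_url: hypothetical helper that formats the Jupyter access URL.
# Arguments: host, port (defaults to 8888), token.
notebook_url() {
    local host="$1" port="${2:-8888}" token="$3"
    printf 'http://%s:%s/?token=%s\n' "$host" "$port" "$token"
}

# 192.0.2.10 is a placeholder address; substitute the real host IP.
notebook_url 192.0.2.10 8888 0010
```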

Deepo also supports generating your own Dockerfile.

If pip installs inside the container keep failing with a read timeout, try restarting the Docker service.
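
Before restarting the service, it can be worth retrying the install a few times with a longer timeout. This is a different technique from the restart suggested above, shown only as a sketch; `retry` is a hypothetical helper, and the pip line in the comment is an example that is not run here.

```shell
# retry: hypothetical helper. Runs a command up to N times and stops at the
# first success. Returns 1 if every attempt fails.
retry() {
    local attempts="$1"
    shift
    local i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $i of $attempts failed: $*" >&2
        i=$((i + 1))
    done
    return 1
}

# Example inside the container (not run here):
#   retry 3 pip install --default-timeout=100 <package>
# If that still fails, restart Docker on the host:
#   systemctl restart docker
retry 3 true && echo "command succeeded"
```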