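Notes on Installing Deepo on CentOS
Official Deepo documentation [https://github.com/ufoym/deepo?tdsourcetag=s_pctim_aiomsg]
Background checks
1. System version
Every step of this installation follows the official documentation.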
Get Docker CE for CentOS [https://docs.docker.com/install/linux/docker-ce/centos/]
Prerequisites
Docker EE customers
To install Docker Enterprise Edition (Docker EE), go to Get Docker EE for CentOS instead of this topic.
To learn more about Docker EE, see Docker Enterprise Edition.
OS requirements
To install Docker CE, you need a maintained version of CentOS 7. Archived versions aren’t supported or tested.
The centos-extras repository must be enabled. This repository is enabled by default, but if you have disabled it, you need to re-enable it.
The overlay2 storage driver is recommended.
In short: the system must be CentOS 7, the centos-extras repository is enabled by default on CentOS 7, and overlay2 is the recommended storage driver.
Check the system version:
[[email protected] ~]# uname -r
3.10.0-957.1.3.el7.x86_64
[[email protected] ~]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
The system is CentOS 7 and the kernel version is 3.10.
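The kernel check can also be scripted. The following is a minimal sketch, assuming a release string in the format printed by uname -r; the helper name kernel_at_least is hypothetical and not part of any tool mentioned here.

```shell
#!/usr/bin/env bash
# kernel_at_least is a hypothetical helper, not part of Docker or CentOS.
# Usage: kernel_at_least <release-string> <major> <minor>
# Returns 0 if the release is at least major.minor, 1 otherwise.
kernel_at_least() {
    local release=$1 want_major=$2 want_minor=$3
    local major minor rest
    # Split e.g. "3.10.0-957.1.3.el7.x86_64" into major=3, minor=10.
    IFS=. read -r major minor rest <<< "$release"
    # Drop any non-numeric suffix so the arithmetic below is safe.
    major=${major%%[!0-9]*}
    minor=${minor%%[!0-9]*}
    if (( major > want_major )); then
        return 0
    fi
    (( major == want_major && minor >= want_minor ))
}

# Example: compare the running kernel against 3.10.
if kernel_at_least "$(uname -r)" 3 10; then
    echo "kernel OK: $(uname -r)"
else
    echo "kernel too old: $(uname -r)"
fi
```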
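2. Check whether Docker packages were installed before
Older versions of Docker were installed under the package names docker or docker-engine. Use the following command to remove any previously installed Docker packages: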
[[email protected] ~]# sudo yum remove docker \
> docker-client \
> docker-client-latest \
> docker-common \
> docker-latest \
> docker-latest-logrotate \
> docker-logrotate \
> docker-engine
Loaded plugins: fastestmirror
No Match for argument: docker
No Match for argument: docker-client
No Match for argument: docker-client-latest
No Match for argument: docker-common
No Match for argument: docker-latest
No Match for argument: docker-latest-logrotate
No Match for argument: docker-logrotate
No Match for argument: docker-engine
No Packages marked for removal
If yum reports that no packages are marked for removal, Docker was never installed; otherwise, the old version is uninstalled.
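To see what would be removed before running yum remove, the installed package names can be filtered against the list used in the command. This is only a sketch; find_legacy_docker is a hypothetical helper name, and the rpm query in the comment assumes an RPM-based system.

```shell
#!/usr/bin/env bash
# find_legacy_docker is a hypothetical helper: it reads installed package
# names on stdin (one per line) and prints only those that belong to the
# old Docker packaging named in the yum remove command.
find_legacy_docker() {
    # -x matches whole lines only, so docker-ce is not reported.
    grep -x -E 'docker(-client|-client-latest|-common|-latest|-latest-logrotate|-logrotate|-engine)?' || true
}

# Example on a real system (requires rpm):
#   rpm -qa --qf '%{NAME}\n' | find_legacy_docker
```

Empty output means there is nothing old to remove.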
3. Check the GPU and NVIDIA driver version
nvidia-docker installation page [https://github.com/NVIDIA/nvidia-docker]
That page lists the following prerequisites:
GNU/Linux x86_64 with kernel version > 3.10 (maintained)
Docker >= 1.12 (will be)
NVIDIA GPU with Architecture > Fermi (2.1)
NVIDIA drivers ~= 361.93 (untested on older versions)
Next, use the following command to check whether the last two conditions are met:
[[email protected] ~]# nvidia-smi
Thu Jan 17 22:31:26 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87 Driver Version: 390.87 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:04:00.0 Off | N/A |
| 23% 35C P0 55W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:0B:00.0 Off | N/A |
| 0% 33C P0 54W / 250W | 0MiB / 11178MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
As shown, the NVIDIA driver version is 390.87, which meets the requirement.
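The driver comparison can be automated as well. The sketch below assumes GNU sort with the -V option is available; version_ge is a hypothetical helper name. The nvidia-smi query in the comment uses the --query-gpu=driver_version option and only works on a machine with the driver installed.

```shell
#!/usr/bin/env bash
# version_ge is a hypothetical helper: returns 0 when $1 >= $2, comparing
# dotted version numbers with GNU sort -V (assumed to be available).
version_ge() {
    # The smaller of the two versions sorts first; $1 >= $2 exactly when
    # that smallest entry equals $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example on a machine with the NVIDIA driver installed:
#   driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_ge "$driver" 361.93 && echo "driver $driver is new enough"
```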
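Installing Docker
The official documentation gives three methods. I chose the first and most recommended one: installing from the Docker repository.
Because I installed as root, the prompts below begin with # and sudo is not needed. Add sudo yourself if required.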
[[email protected] ~]# yum install -y yum-utils device-mapper-persistent-data lvm2
Complete!
[[email protected] ~]# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
repo saved to /etc/yum.repos.d/docker-ce.repo
[[email protected] ~]# yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
repo saved to /etc/yum.repos.d/docker-ce.repo
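Because Docker's own repository kept timing out, I switched to the Aliyun mirror.
Check which Docker versions are available, then install a specific one: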
[[email protected] ~]#yum install docker-ce-18.09.0.ce-1.el7
Complete!
The installation is now complete. Start Docker and verify it:
[[email protected] ~]# systemctl start docker
[[email protected] ~]# docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
1b930d010525: Pull complete
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
Installing nvidia-docker
1. Uninstall the old version of nvidia-docker
Following the steps in the installation documentation:
$ docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
$ sudo yum remove nvidia-docker
2. Install nvidia-docker
Because my system is CentOS 7 / Red Hat 7.4, I used the instructions under:
CentOS 7 (docker-ce), RHEL 7.4/7.5 (docker-ce), Amazon Linux 1/2
# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo yum remove nvidia-docker
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
sudo tee /etc/yum.repos.d/nvidia-docker.repo
When you run the following command:
# Install nvidia-docker2 and reload the Docker daemon configuration
sudo yum install -y nvidia-docker2
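an error may be reported. If docker-ce is not installed on your system, a matching version is downloaded automatically. If docker-ce is already installed, however, yum looks for the latest nvidia-docker2 build, which may be incompatible with your docker-ce version.
Following the error message, and after a good deal of trial and error, I changed the install command to: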
# yum install -y nvidia-docker2-2.0.3-1.docker18.09.0.ce.noarch
Installed:
nvidia-docker2.noarch 0:2.0.3-1.docker18.09.0.ce
Dependency Installed:
nvidia-container-runtime.x86_64 0:2.0.0-1.docker18.09.0
nvidia-container-runtime-hook.x86_64 0:1.4.0-2
Complete!
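This pins the package to the Docker version that is already installed. During installation, the two dependencies nvidia-container-runtime and nvidia-container-runtime-hook are downloaded automatically, in versions that correspond to the docker-ce version.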
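Finding the matching build by hand is tedious, so here is a minimal sketch of one way to filter for it. It assumes input lines in the three-column style of yum list nvidia-docker2 --showduplicates (package, version, repository); match_nvidia_docker2 is a hypothetical helper name, and the matching rule (the version column contains "docker" followed by the Docker version) is inferred from the package name used above, not taken from NVIDIA's documentation.

```shell
#!/usr/bin/env bash
# match_nvidia_docker2 is a hypothetical helper. It reads lines in the style
# of "yum list nvidia-docker2 --showduplicates" on stdin and prints the
# version column of builds whose version string contains "docker<version>".
# Usage: match_nvidia_docker2 <docker-version>
match_nvidia_docker2() {
    local docker_version=$1
    awk -v tag="docker${docker_version}" '
        $1 ~ /^nvidia-docker2\./ && index($2, tag) { print $2 }
    '
}

# Example on a real system (assumes yum and a running Docker daemon):
#   yum list nvidia-docker2 --showduplicates \
#       | match_nvidia_docker2 "$(docker version --format '{{.Server.Version}}')"
```

The printed version string can then be appended to nvidia-docker2- in the yum install command.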
sudo pkill -SIGHUP dockerd
# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
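I ran into a problem here: the nvidia-smi command could no longer detect the GPU.
Troubleshooting showed that I had accidentally changed the GPU driver version during installation. Since nvidia-docker2 supports CUDA 10, I upgraded both the driver and CUDA to version 10, and only one GPU is attached now.
Current state: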
[[email protected] ~]# nvidia-smi
Mon Jan 21 07:21:35 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78 Driver Version: 410.78 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:0B:00.0 Off | N/A |
| 18% 35C P0 55W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Running the command again now works as expected.
[[email protected] ~]# docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
Mon Jan 21 12:23:11 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78 Driver Version: 410.78 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:0B:00.0 Off | N/A |
| 18% 35C P0 55W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
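Downloading the ufoym/deepo deep learning Docker image
Basic usage
Here we use the GPU version, following the usage instructions on the GitHub page:
- Install docker and nvidia-docker (already done)
- Pull the image from Docker Hub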
docker pull ufoym/deepo
Once the pull completes, use docker image list to view the images available locally.
[[email protected] ~]# docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
ufoym/deepo latest 7df5ba2f4ed7 10 days ago 9.32GB
hello-world latest fce289e99eb9 2 weeks ago 1.84kB
nvidia/cuda 9.0-base 74f5aea45cf6 2 months ago 134MB
ufoym/deepo all-py36-jupyter ca53b1635705 8 months ago 9.41GB
Running the usage command listed on the Deepo page as instructed produces:
/usr/bin/nvidia-docker: line 34: /usr/bin/docker: Permission denied
/usr/bin/nvidia-docker: line 34: /usr/bin/docker: Success
I do not yet understand the exact cause.
For other problems, run journalctl -n -u nvidia-docker to view the error messages.
Change the command to:
docker run --runtime=nvidia -it ufoym/deepo bash
That is:
nvidia-docker => docker run --runtime=nvidia
[option] -it (different options enable different features)
ufoym/deepo (image ID or name)
bash (the command to run)
This opens an interactive bash session.
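The substitution can be wrapped in a small function so the long form does not have to be typed each time. This is only a sketch: gpu_run_cmd is a hypothetical name, and it prints the command instead of executing it so the result can be inspected first.

```shell
#!/usr/bin/env bash
# gpu_run_cmd is a hypothetical helper that prints (does not execute) the
# docker command that replaces the nvidia-docker wrapper call.
# Usage: gpu_run_cmd <image> [command...]
gpu_run_cmd() {
    local image=$1
    shift
    printf 'docker run --runtime=nvidia -it %s' "$image"
    if (( $# > 0 )); then
        # Append each remaining argument, separated by a space.
        printf ' %s' "$@"
    fi
    printf '\n'
}

gpu_run_cmd ufoym/deepo bash
```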
[[email protected] ~]# docker run --runtime=nvidia -it ufoym/deepo bash
[email protected]:/#
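Once inside, you can use the environment freely: list the installed Python libraries, or start ipython for interactive work.
Jupyter support
Another advantage of this image is its support for Jupyter Notebook. As before, pull the image from Docker Hub and run it with the commands below.
There is a version issue here, though: in this image pip is at 10.0 and mxnet at 1.0, so both need to be upgraded manually. Alternatively, install Jupyter in the freshly pulled deepo image and commit the result.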
docker pull ufoym/deepo:all-py36-jupyter
docker run --runtime=nvidia -it -p 8888:8888 --ipc=host ufoym/deepo:all-py36-jupyter jupyter notebook --no-browser --ip=0.0.0.0 --allow-root --NotebookApp.token='0010' --notebook-dir='/root'
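The notebook can then be reached remotely through the host's IP address at http://ip:8888/?token= where the token is the value passed to --NotebookApp.token.
Deepo also supports building your own Dockerfile.
If pip inside the container keeps failing with a read timeout, try restarting the Docker service.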