
Implementing SSH (Single Stage Headless) on a Server


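Preface: this is simply how I got it working in my own server environment. Treat it as a reference only, and adapt the paths and versions to your own system.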

==GitHub source code (https://github.com/mahyarnajibi/SSH)==

**Implementation references (https://blog.csdn.net/qq_14845119/article/details/79105360)**

(https://blog.csdn.net/zziahgf/article/details/72900948)

Preliminaries: install CUDA, cuDNN and NCCL

CUDA and cuDNN were already installed on the server, and I used CUDA 9.0 with cuDNN 7.0, so I only had to install NCCL.

Clone NCCL from GitHub, then build and install it:

git clone https://github.com/NVIDIA/nccl.git
cd nccl 
make clean && make PREFIX=$NCCL_ROOT_DIR install
           

$NCCL_ROOT_DIR is the install prefix of your choice. Mine is /home/lzm/data/nccl/install, so the command becomes:

make clean && make PREFIX=/home/lzm/data/nccl/install install
           

Wait for the NCCL build and installation to finish.
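To confirm that the install actually landed where the later steps expect it, a quick check like the one below can help. It is a minimal sketch that assumes the prefix ends up containing include/nccl.h and lib/libnccl.so* (the include/ and lib/ layout that the env file later points at); check_nccl_prefix is a hypothetical helper name, not part of NCCL.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check an NCCL install prefix after "make ... install".
# Assumes the layout <prefix>/include/nccl.h and <prefix>/lib/libnccl.so*.
check_nccl_prefix() {
  local prefix="$1" f found=0
  if [[ ! -f "$prefix/include/nccl.h" ]]; then
    echo "missing $prefix/include/nccl.h" >&2
    return 1
  fi
  # libnccl.so is usually a symlink to a versioned file; any match is enough.
  # Without nullglob an unmatched pattern stays literal, so -e rejects it.
  for f in "$prefix"/lib/libnccl.so*; do
    [[ -e "$f" ]] && found=1 && break
  done
  if (( found == 0 )); then
    echo "missing $prefix/lib/libnccl.so*" >&2
    return 1
  fi
  echo "NCCL looks installed under $prefix"
}

# Usage (the prefix from the example above):
# check_nccl_prefix /home/lzm/data/nccl/install
```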

Installing caffe-ssh

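1. Run everything inside a Python virtual environment created with conda. Here the environment is named caffetest and uses Python 2.7 (other Python versions seemed to cause errors):

conda create -n caffetest python=2.7 anaconda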
conda activate caffetest
           

2. Get the source code from GitHub (--recursive also fetches the bundled caffe-ssh submodule):

git clone --recursive https://github.com/mahyarnajibi/SSH.git
           

3. Enter the SSH directory and install the required Python modules:

cd SSH 
pip install -r requirements.txt
           

4. Create an env file that holds temporary environment variables

(1) Write the conda activation command and the NCCL paths into env (replace the NCCL paths with your own install prefix):

dlm-conda activate caffetest
export CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/home/lzm/data/caffe/caffe1.0_nccl/nccl/install/include
export C_INCLUDE_PATH=$C_INCLUDE_PATH:/home/lzm/data/caffe/caffe1.0_nccl/nccl/install/include
export LIBRARY_PATH=$LIBRARY_PATH:/home/lzm/data/caffe/caffe1.0_nccl/nccl/install/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/lzm/data/caffe/caffe1.0_nccl/nccl/install/lib
           

(2) Load the variables into the current shell:

source ./env
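Because the same NCCL prefix appears four times in env, it is easy to mistype one copy. The sketch below generates the file from a single prefix. write_env_file is a hypothetical helper, and it writes plain conda activate caffetest (the environment from step 1), so swap in whatever activation command your server uses.

```shell
#!/usr/bin/env bash
# Sketch: write the env file from a single NCCL prefix.
# write_env_file is hypothetical; "conda activate caffetest" assumes the
# environment from step 1 (replace it with your server's own command).
write_env_file() {
  local nccl_home="$1" out="${2:-./env}"
  # Unquoted EOF: $nccl_home is expanded now, while the escaped \$VAR
  # references stay literal and are expanded only when env is sourced.
  cat > "$out" <<EOF
conda activate caffetest
export CPLUS_INCLUDE_PATH=\$CPLUS_INCLUDE_PATH:$nccl_home/include
export C_INCLUDE_PATH=\$C_INCLUDE_PATH:$nccl_home/include
export LIBRARY_PATH=\$LIBRARY_PATH:$nccl_home/lib
export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:$nccl_home/lib
EOF
}

# Usage (the prefix used in this post):
# write_env_file /home/lzm/data/caffe/caffe1.0_nccl/nccl/install ./env
# source ./env
```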
           

5. Create the build configuration by copying Makefile.config.example to Makefile.config:

cd caffe-ssh
cp Makefile.config.example Makefile.config
           

Then edit Makefile.config:

(1) Point CUDA_DIR at your own CUDA directory:

CUDA_DIR := /usr/local/cuda
change to
CUDA_DIR := /usr/local/nvidia/cuda/9.0
           

(2) Uncomment the OpenCV version line:

#OPENCV_VERSION := 3
change to
OPENCV_VERSION := 3
           

(3) Add the HDF5 include and library paths:

INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
change to
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial/
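The three Makefile.config edits above can also be applied non-interactively. This is a sketch assuming GNU sed (sed -i without a suffix argument is GNU behaviour) and that the lines still read exactly as in Makefile.config.example; patch_makefile_config is a hypothetical helper name. The $ end anchors make it safe to run twice, because lines that were already patched no longer match.

```shell
#!/usr/bin/env bash
# Sketch: apply the CUDA_DIR, OPENCV_VERSION and HDF5 edits with GNU sed.
# patch_makefile_config is hypothetical; the CUDA path is passed in.
patch_makefile_config() {
  local cfg="$1" cuda_dir="$2"
  # 1) replace the active CUDA_DIR line (commented "# CUDA_DIR" lines are untouched)
  # 2) uncomment "# OPENCV_VERSION := 3" (with or without a space after #)
  # 3,4) append the serial HDF5 dirs only if the line is still the default
  sed -i \
    -e "s|^CUDA_DIR := .*|CUDA_DIR := ${cuda_dir}|" \
    -e 's|^#[[:space:]]*OPENCV_VERSION := 3|OPENCV_VERSION := 3|' \
    -e 's|^INCLUDE_DIRS := \$(PYTHON_INCLUDE) /usr/local/include$|& /usr/include/hdf5/serial/|' \
    -e 's|^LIBRARY_DIRS := \$(PYTHON_LIB) /usr/local/lib /usr/lib$|& /usr/lib/x86_64-linux-gnu/hdf5/serial/|' \
    "$cfg"
}

# Usage, inside caffe-ssh after copying the example file:
# patch_makefile_config Makefile.config /usr/local/nvidia/cuda/9.0
```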
           

6. Install the missing packages:

conda install -c conda-forge readline=6.2
conda install libgcc
           

7. Build (-j32 runs 32 parallel jobs; set it to match your CPU core count):

make all -j32
           

8. Build pycaffe to generate the Python interface:

make pycaffe
           

9. Build the code in lib (this runs setup.py):

cd ../lib/
make
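Steps 7 to 9 can be chained so that the build stops at the first failure. This sketch assumes the directory layout used above (SSH/caffe-ssh and SSH/lib) and defaults the job count to nproc instead of the hard-coded 32; build_ssh is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: run steps 7-9 in order and stop at the first failing make.
# Assumes <ssh_root>/caffe-ssh and <ssh_root>/lib as in the steps above.
build_ssh() {
  local ssh_root="$1" jobs="${2:-$(nproc)}"
  # Subshells keep the caller's working directory unchanged.
  ( cd "$ssh_root/caffe-ssh" && make all -j"$jobs" && make pycaffe ) || return 1
  ( cd "$ssh_root/lib" && make ) || return 1
}

# Usage, from the directory containing SSH/, after "source ./env":
# build_ssh ./SSH 32
```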
           

10. Download the models with the scripts in the scripts directory:

cd ..
bash scripts/download_ssh_model.sh
bash scripts/download_imgnet_model.sh
           

11. Run the demo:

python demo.py
           

The result looks like this:

(Screenshot: detection result of demo.py.)

Possible problems

(1)


Problem:

Unsupported gpu architecture 'compute_20'

Solution:

https://askubuntu.com/questions/960238/nvcc-fatal-unsupported-gpu-architecture-compute-20

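That is, drop the two compute_20 lines from CUDA_ARCH in Makefile.config, since CUDA 9 no longer supports compute_20 (Fermi) GPUs. I replaced the whole block: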

CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
        -gencode arch=compute_20,code=sm_21 \
        -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_50,code=compute_50
change to:
CUDA_ARCH := -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_52,code=sm_52 \
        -gencode arch=compute_60,code=sm_60 \
        -gencode arch=compute_62,code=sm_62 \
        -gencode arch=compute_61,code=compute_61
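Each -gencode arch=compute_XY,code=sm_XY entry asks nvcc to emit machine code for one GPU generation, and the final code=compute_XY entry embeds PTX that the driver can JIT-compile for newer GPUs. If your GPUs need a different set, a small generator like this sketch avoids typos in the backslash continuation lines; gen_cuda_arch is a hypothetical helper, and which capabilities to list depends on your own hardware.

```shell
#!/usr/bin/env bash
# Sketch: print a CUDA_ARCH block for Makefile.config.
# Usage: gen_cuda_arch <ptx_arch> <sass_arch>...
#   each <sass_arch> becomes  -gencode arch=compute_XY,code=sm_XY
#   <ptx_arch> becomes the last -gencode arch=compute_XY,code=compute_XY
gen_cuda_arch() {
  local ptx="$1"; shift
  local out="CUDA_ARCH :=" sep=" " a
  for a in "$@"; do
    out+="${sep}-gencode arch=compute_${a},code=sm_${a}"
    # After the first entry, separate entries with " \", newline, 8 spaces.
    sep=" \\"$'\n'"        "
  done
  out+="${sep}-gencode arch=compute_${ptx},code=compute_${ptx}"
  printf '%s\n' "$out"
}

# Usage: reproduces the replacement block above.
# gen_cuda_arch 61 50 52 60 62
```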
           

(2)

awk: symbol lookup error: /home/lzm/.conda/envs/lzm2/lib/libreadline.so.6: undefined symbol: PC

https://github.com/conda-forge/rpy2-feedstock/issues/1

https://github.com/bioconda/bioconda-recipes/issues/5350

That is, run:

conda install -c conda-forge readline=6.2
           

(3)

./include/caffe/util/hdf5.hpp:6:18: fatal error: hdf5.h: no such file or directory

https://github.com/BVLC/caffe/issues/2690

https://github.com/NVIDIA/DIGITS/issues/156

That is, change these two lines in Makefile.config:

INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
change to
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial/
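The hdf5/serial directories above are where Ubuntu's HDF5 packages put their files; on another distribution they may live elsewhere. This sketch locates the directory to add instead of guessing. It assumes GNU find (for -quit), and find_file_dir is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the directory of the first file named <name> found under
# the given search roots; returns 1 if nothing is found.
find_file_dir() {
  local name="$1" hit
  shift
  # -print -quit stops at the first match; errors from missing roots are hidden.
  hit="$(find "$@" -name "$name" -print -quit 2>/dev/null)"
  [[ -n "$hit" ]] || return 1
  dirname "$hit"
}

# Usage: the printed directories go into INCLUDE_DIRS / LIBRARY_DIRS.
# find_file_dir hdf5.h /usr/include /usr/local/include
# find_file_dir libhdf5.so /usr/lib /usr/local/lib
```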
           

(4)

./include/caffe/util/nccl.hpp:5:18: fatal error: nccl.h: No such file or directory

Create a file named env and put the paths of the NCCL already installed on the server into it:

export CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/home/lzm/data/caffe/caffe1.0_nccl/nccl/install/include
export C_INCLUDE_PATH=$C_INCLUDE_PATH:/home/lzm/data/caffe/caffe1.0_nccl/nccl/install/include
export LIBRARY_PATH=$LIBRARY_PATH:/home/lzm/data/caffe/caffe1.0_nccl/nccl/install/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/lzm/data/caffe/caffe1.0_nccl/nccl/install/lib
           

Source it in every new shell before building or running:

source  ./env
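gcc and g++ search the directories listed in C_INCLUDE_PATH and CPLUS_INCLUDE_PATH for headers, so after sourcing env you can check that nccl.h is actually reachable before starting a long build. A minimal sketch; header_on_path is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the first <dir>/<header> that exists among the
# colon-separated directories in <path_list>; return 1 if none does.
header_on_path() {
  local header="$1" path_list="$2" dir
  local IFS=':'   # split the list on colons (local to this function)
  for dir in $path_list; do
    # Skip empty entries, e.g. from a leading ":" when the variable was unset.
    if [[ -n "$dir" && -f "$dir/$header" ]]; then
      printf '%s\n' "$dir/$header"
      return 0
    fi
  done
  return 1
}

# Usage, after "source ./env":
# header_on_path nccl.h "$CPLUS_INCLUDE_PATH" || echo "nccl.h not found, check env"
```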
           

(5)

.build_release/lib/libcaffe.so: undefined reference to `cv::imdecode

Solution: https://github.com/BVLC/caffe/issues/4621

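Just uncomment OPENCV_VERSION := 3 in Makefile.config. With OpenCV 3, imdecode lives in the opencv_imgcodecs library, which Caffe's Makefile only links when OPENCV_VERSION is set to 3.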

(6)

/caffe/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by caffe-ssh/python/caffe/_caffe.so)

Solution (the libstdc++.so.6 in the conda environment is older than the one _caffe.so requires): https://github.com/BVLC/caffe/issues/4953

conda install libgcc
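Symbol version names such as GLIBCXX_3.4.21 are stored as plain strings inside libstdc++.so.6, so you can check whether a given library provides the required version before and after reinstalling. A sketch using grep on the binary; has_symbol_version is a hypothetical helper, and the path in the usage line assumes the library sits in your active conda environment's lib directory.

```shell
#!/usr/bin/env bash
# Sketch: succeed if <lib> contains the exact version string <ver>.
# -a treats the binary as text, -w requires whole-word matches (so
# GLIBCXX_3.4.2 does not match GLIBCXX_3.4.21), -F means literal text.
has_symbol_version() {
  local lib="$1" ver="$2"
  grep -aqwF "$ver" "$lib"
}

# Usage:
# has_symbol_version "$CONDA_PREFIX/lib/libstdc++.so.6" GLIBCXX_3.4.21 \
#   || echo "too old, try: conda install libgcc"
```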
           

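PS: I only worked out the problems above after a long time spent searching and troubleshooting. Don't be put off by the hassle; get good at using search engines, and everything will fall into place.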