
@ Bergen, Norway
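Installing CUDA for the first time nearly drove me crazy. I hit one baffling bug after another: after installing CUDA and rebooting I could not reach the desktop and got a black screen; the mouse and keyboard stopped working; CUDA installed fine but TensorFlow-GPU would not... After going through a round of online tutorials, I found that the official guide is still the best.
For my first post of the new year, here are my installation notes for Ubuntu 18.04 + CUDA 10.0 + cuDNN 7.6.5 + TensorFlow 2.0. I hope they help you avoid the same pitfalls.
The overall flow is roughly: install the graphics driver -> install CUDA[1] -> install cuDNN[2] -> install tensorflow-gpu and test it.
Contents:
- Installing and updating Ubuntu
- Installing the graphics driver
- Installing CUDA
- Installing cuDNN
- Installing TensorFlow 2.0 GPU and testing
1. Installing and updating Ubuntu
Start with some basic installs and updates on Ubuntu 18.04. I skip the OS installation itself; it is fairly easy and there are plenty of tutorials online.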
sudo apt-get update # refresh the package index
sudo apt-get upgrade # upgrade installed packages
sudo apt-get install vim
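2. Installing the graphics driver
2.1 Disabling the Nouveau driver
Note: there are two ways to install CUDA on Linux: Package Manager Installation (.deb) and Runfile Installation (.run). This post uses the first (also the officially recommended one). If you install CUDA from the deb package you can skip this step; that worked fine in my own test. If you install CUDA from the runfile, you need to disable the system's built-in Nouveau driver by hand: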
lsmod | grep nouveau # if this prints anything, Nouveau is loaded and must be disabled
sudo vim /etc/modprobe.d/blacklist-nouveau.conf
# add the following two lines:
#######################################################
blacklist nouveau
options nouveau modeset=0
#######################################################
# after saving, rebuild the initramfs and reboot:
sudo update-initramfs -u
sudo reboot
# run the check again; no output means Nouveau is disabled
lsmod | grep nouveau
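If you would rather script this check than eyeball the grep output, here is a minimal Python sketch. It assumes the usual lsmod layout with the module name in the first column; the SAMPLE text and the helper names module_loaded and read_lsmod are made up for illustration.

```python
import subprocess


def module_loaded(lsmod_output: str, name: str) -> bool:
    """Return True if `name` appears in the first column of `lsmod` output."""
    for line in lsmod_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def read_lsmod() -> str:
    """Run `lsmod` and return its text output (Linux only, not called here)."""
    result = subprocess.run(["lsmod"], capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical sample output, for illustration only.
SAMPLE = """Module                  Size  Used by
nouveau              1716224  1
video                  49152  1 nouveau
"""

print(module_loaded(SAMPLE, "nouveau"))  # True: nouveau is listed as a module
print(module_loaded(SAMPLE, "nvidia"))   # False: no nvidia module in the sample
```

On the real machine you would call module_loaded(read_lsmod(), "nouveau") and expect False after the reboot. Unlike grep, the exact match on the first column ignores lines that only mention nouveau in the "Used by" column.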
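2.2 Installing a suitable graphics driver[3]
# first purge any existing NVIDIA drivers and their dependencies, then reboot
sudo apt-get remove --purge 'nvidia*'
sudo apt autoremove
sudo reboot
# add the PPA and install the latest driver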
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
ubuntu-drivers devices
sudo apt install nvidia-driver-440
# to avoid compatibility problems caused by automatic driver updates, you can also pin the driver version:
sudo apt-mark hold nvidia-driver-440
# nvidia-driver-440 set on hold.
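The newly installed nvidia-driver-440 also shows up in the Additional Drivers list under Software & Updates; select it there, then reboot:
sudo reboot
After the reboot, run nvidia-smi. If it prints the driver version and the GPU table, the graphics driver is ready:
nvidia-smi
lsmod | grep nvidia # any output means the driver module is loaded; no output indicates a problem
You can also download the installer for your card manually from the official site[4].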
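Since the CUDA 10.0 package used later in this post carries driver version 410.48 in its file name, it is worth confirming that the installed driver is at least that new. The sketch below parses one line of `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader`. Treating 410.48 as the minimum is my assumption, and the GPU name in the example is a made-up sample.

```python
def parse_driver_version(text: str) -> tuple:
    """Turn a string such as '440.33.01' into (440, 33, 1) for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_new_enough(csv_line: str, minimum: tuple = (410, 48)) -> bool:
    """Check one 'name, driver_version' CSV line against a minimum driver.

    `minimum` defaults to 410.48, assumed here to be the floor for CUDA 10.0.
    """
    # Split from the right so a comma inside the GPU name cannot break parsing.
    _name, version = csv_line.rsplit(",", 1)
    return parse_driver_version(version) >= minimum


# Hypothetical sample line, for illustration only.
print(driver_new_enough("GeForce GTX 1080 Ti, 440.33.01"))  # True
print(driver_new_enough("GeForce GTX 1080 Ti, 396.26"))     # False
```

Comparing tuples of integers avoids the trap of comparing version strings, where "96.1" would sort after "410.48".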
# ways to monitor GPU usage live:
watch -n 1 nvidia-smi # refresh once per second
watch -n 0.1 nvidia-smi # 0.1 s is the shortest interval watch accepts
# gpustat works too
pip install gpustat
gpustat -i 1 -P
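If you want the numbers inside a script rather than on screen, nvidia-smi can also print CSV. The sketch below parses the output of `nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits`. The sample values and the dictionary keys are my own choices for illustration.

```python
def parse_gpu_stats(csv_text: str) -> list:
    """Parse 'index, util, mem_used, mem_total' lines into a list of dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, used, total = [int(field.strip()) for field in line.split(",")]
        stats.append({
            "index": index,
            "util_pct": util,
            "mem_used_mib": used,
            "mem_total_mib": total,
            "mem_pct": round(100 * used / total, 1),
        })
    return stats


# Hypothetical sample for a two-GPU machine, for illustration only.
SAMPLE = """0, 37, 2048, 8192
1, 0, 10, 8192
"""

for gpu in parse_gpu_stats(SAMPLE):
    print(gpu["index"], gpu["util_pct"], gpu["mem_pct"])
```

Calling this from a loop with time.sleep gives a small logger for training runs, which is handy when you want to keep the readings instead of just watching them.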
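3. Installing CUDA
From Baidu Baike: CUDA (Compute Unified Device Architecture) is a computing platform from the GPU vendor NVIDIA[5]. It is a general-purpose parallel computing[6] architecture that lets the GPU[7] solve complex computational problems.
There are two ways to install CUDA on Linux: Package Manager Installation (.deb) and Runfile Installation (.run). This post uses the first (also the officially recommended one).
CUDA also depends strictly on the system environment, and each release, CUDA 10.0 included, lists its own requirements. See the corresponding Online Documentation[8] for your version.
3.1 Before installing
Before installing CUDA, confirm that the environment is ready, so you do not end up with a pile of bugs and no idea where to start. Quoting the official guide directly: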
Some actions must be taken before the CUDA Toolkit and Driver can be installed on Linux:
- Verify the system has a CUDA-capable GPU.
- Verify the system is running a supported version of Linux.
- Verify the system has gcc installed.
- Verify the system has the correct kernel headers and development packages installed.
- Download the NVIDIA CUDA Toolkit.
- Handle conflicting installation methods.
3.1.1 Confirm you have a CUDA-capable GPU
lspci | grep -i nvidia | grep VGA
3.1.2 Confirm your Linux version
uname -m && cat /etc/*release
uname -a
# The x86_64 line indicates you are running on a 64-bit system.
3.1.3 Confirm the gcc version
gcc --version
# gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
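To compare the installed gcc against whatever the CUDA documentation lists for your release, it helps to turn the version line into numbers. This is a small sketch; the function name parse_gcc_version is mine, and the supported version itself is deliberately left out because it depends on the CUDA release you install.

```python
import re


def parse_gcc_version(first_line: str) -> tuple:
    """Extract (major, minor, patch) from the first line of `gcc --version`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)\s*$", first_line.strip())
    if match is None:
        raise ValueError(f"no version found in: {first_line!r}")
    return tuple(int(group) for group in match.groups())


line = "gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0"
print(parse_gcc_version(line))  # (7, 4, 0)
```

The pattern is anchored to the end of the line on purpose: the distribution string in parentheses also contains version-like numbers, and only the trailing one is the compiler version.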
3.1.4 Install the headers for your kernel version
Check the kernel version:
uname -r
# 5.0.0-37-generic
This is the version of the kernel headers and development packages that must be installed prior to installing the CUDA Drivers.
Install the headers that match the running kernel:
sudo apt-get install linux-headers-$(uname -r)
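A quick way to confirm the headers really landed is to look for their directory. On Ubuntu the linux-headers package unpacks under /usr/src, which is the assumption this sketch relies on; the helper names are hypothetical.

```python
import os
import platform


def headers_dir(release: str, root: str = "/usr/src") -> str:
    """Path where Ubuntu's linux-headers-<release> package is expected."""
    return os.path.join(root, f"linux-headers-{release}")


def headers_installed(release: str, root: str = "/usr/src") -> bool:
    """Return True if the header directory for `release` exists."""
    return os.path.isdir(headers_dir(release, root))


# The release string from this post, used to show the expected path.
print(headers_dir("5.0.0-37-generic"))

# platform.release() returns the same string as `uname -r`.
print(headers_installed(platform.release()))
```

The root parameter exists only so the check can be pointed at a different directory, for example when testing.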
3.1.5 Choose an installation method
Download the matching installer (using the officially recommended Deb packages method as the example)[9]
The CUDA Toolkit can be installed using either of two different installation mechanisms: distribution-specific packages (RPM and Deb packages), or a distribution-independent package (runfile packages).
(1) The distribution-independent package has the advantage of working across a wider set of Linux distributions, but does not update the distribution's native package management system.
(2) The distribution-specific packages interface with the distribution's native package management system. It is recommended to use the distribution-specific packages, where possible.
3.1.6 Fully uninstall any previous installation to avoid conflicts
On a fresh Ubuntu install you can skip this part and go straight to 3.2.
If it was installed from Deb packages:
sudo apt-get --purge remove <package_name>
sudo apt autoremove
If it was installed from a runfile:
sudo /usr/bin/nvidia-uninstall
sudo /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl
3.2 Installation
First make sure the matching .deb file has been downloaded, then run:
sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub # use the path printed by the previous step; in my case:
# sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda-toolkit-10-0 # not the cuda metapackage: the driver was already installed in section 2, so cuda-toolkit-10-0 is enough
3.3 After installing
A few manual settings are needed after installation before CUDA works properly.
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
nvcc -V # check that CUDA installed correctly
# OUTPUT:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
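The last line of that output is the one that matters. If you want a setup script to fail early when the wrong toolkit is on PATH, a sketch like the following can pull the release out of the text. The function name is hypothetical, and the pattern assumes the "release X.Y, VX.Y.Z" wording shown above.

```python
import re

# The output shown in this post.
NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
"""


def parse_nvcc_release(text: str) -> tuple:
    """Return ((major, minor), full_version) from `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+), V(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError("no 'release X.Y, VX.Y.Z' line found")
    release = (int(match.group(1)), int(match.group(2)))
    return release, match.group(3)


release, full = parse_nvcc_release(NVCC_OUTPUT)
print(release, full)  # (10, 0) 10.0.130
```

On a real machine the text would come from subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout, followed by a comparison of release against (10, 0).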
It is best to turn off automatic system updates so that a working environment does not break unexpectedly:
sudo vi /etc/apt/apt.conf.d/10periodic
# change it to:
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
You can also do this from the desktop: System Settings => Software & Updates => Updates
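To double-check that the edit took effect, the file can be read back and inspected. The sketch below parses APT::Periodic lines of the form shown above. It is a simple regex reader I wrote for illustration, not a full apt.conf parser, and it does not skip commented-out lines.

```python
import re


def parse_apt_periodic(text: str) -> dict:
    """Collect APT::Periodic::<Name> "<value>"; pairs into a dict."""
    settings = {}
    for match in re.finditer(r'APT::Periodic::([\w-]+)\s+"([^"]*)";', text):
        settings[match.group(1)] = match.group(2)
    return settings


def auto_update_disabled(settings: dict) -> bool:
    """True when list refresh and package download are both set to "0"."""
    keys = ("Update-Package-Lists", "Download-Upgradeable-Packages")
    return all(settings.get(key) == "0" for key in keys)


# The contents recommended in this post.
CONFIG = '''APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
'''

settings = parse_apt_periodic(CONFIG)
print(settings)
print(auto_update_disabled(settings))  # True
```

On the real system you would read /etc/apt/apt.conf.d/10periodic with open(...).read() and pass the text in.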
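4. Installing cuDNN[10]
NVIDIA cuDNN is a GPU-accelerated library for deep neural networks. You first need to register and download the cuDNN package that matches your CUDA version: link[11].
For CUDA 10.0, for example, I downloaded cudnn-10.0-linux-x64-v7.6.5.32.tgz:
tar -zxvf cudnn-10.0-linux-x64-v7.6.5.32.tgz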
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Verify the installation:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
# output
"""
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
"""
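Installing from the Debian files is the better option, because the bundled samples let you verify that cuDNN works. First download the following three files:
# download each one, then install them in this order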
sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.0_amd64.deb
# verify after installing:
cp -r /usr/src/cudnn_samples_v7/ $HOME
cd $HOME/cudnn_samples_v7/mnistCUDNN
make clean && make
./mnistCUDNN
# Test passed!
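The cudnn.h check from the tarball install earlier in this section can also be scripted. The sketch below reads the three defines and rebuilds the combined number the same way the CUDNN_VERSION macro does. The function name is hypothetical, and it assumes the defines sit on their own lines as in the output shown above.

```python
import re

# The defines printed by the grep command in this post.
HEADER_SNIPPET = """#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""


def cudnn_version(header_text: str) -> tuple:
    """Return (major, minor, patch) from the text of cudnn.h."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^#define\s+{key}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"{key} not found")
        parts.append(int(match.group(1)))
    return tuple(parts)


major, minor, patch = cudnn_version(HEADER_SNIPPET)
print((major, minor, patch))                 # (7, 6, 5)
print(major * 1000 + minor * 100 + patch)    # 7605
```

For the real file, pass in open("/usr/local/cuda/include/cudnn.h").read(); that path is the one used by the copy commands in this section.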
You can also install cudatoolkit and cuDNN with conda, as long as the driver is ready.
conda install cudatoolkit=10.0
conda install -c anaconda cudnn
5. Installing TensorFlow 2.0 GPU and testing
# install conda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
conda create -y -n tf2 python=3.7
conda activate tf2
pip install --upgrade pip
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
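pip install tensorflow-gpu==2.0.0 # pinned to the release that matches CUDA 10.0; an unpinned install may pull a build for a different CUDA version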
pip install catboost
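Each TensorFlow GPU wheel is built against one specific CUDA release, which is why this post pairs TensorFlow 2.0 with CUDA 10.0. The sketch below encodes that pairing as a small lookup. The two rows are what I recall from TensorFlow's tested build configurations table, so treat them as an assumption and check the official table; the rule that a newer cuDNN minor version within the same major version is acceptable is also an assumption.

```python
# Assumed rows; verify against TensorFlow's tested build configurations.
TESTED_BUILDS = {
    "2.0": {"cuda": (10, 0), "cudnn": (7, 4)},
    "2.1": {"cuda": (10, 1), "cudnn": (7, 6)},
}


def matches_tested_build(tf_version: str, cuda: tuple, cudnn: tuple) -> bool:
    """Check an installed CUDA/cuDNN pair against the assumed table above."""
    key = ".".join(tf_version.split(".")[:2])
    build = TESTED_BUILDS.get(key)
    if build is None:
        raise KeyError(f"no entry for TensorFlow {key}")
    same_cuda = tuple(cuda[:2]) == build["cuda"]
    # Same cuDNN major version, and at least the minor version built against.
    cudnn_ok = cudnn[0] == build["cudnn"][0] and tuple(cudnn[:2]) >= build["cudnn"]
    return same_cuda and cudnn_ok


# The versions installed in this post: CUDA 10.0 and cuDNN 7.6.5.
print(matches_tested_build("2.0.0", (10, 0), (7, 6, 5)))  # True
print(matches_tested_build("2.1.0", (10, 0), (7, 6, 5)))  # False
```

A mismatch here usually shows up at import time as a missing shared library, so checking before installing saves a round of debugging.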
Test:
import tensorflow as tf
print(tf.__version__)
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
"""
2.0.0
Num GPUs Available: 2
"""
"""
Test program:
Source: https://github.com/dragen1860/TensorFlow-2.x-Tutorials/blob/master/08-ResNet/main.py
"""
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1" # os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
import tensorflow as tf
import numpy as np
from tensorflow import keras
tf.random.set_seed(22)
np.random.seed(22)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
assert tf.__version__.startswith('2.')
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train.astype(np.float32) / 255., x_test.astype(np.float32) / 255.
# [b, 28, 28] => [b, 28, 28, 1]
x_train, x_test = np.expand_dims(x_train, axis=3), np.expand_dims(x_test, axis=3)
# one hot encode the labels. convert back to numpy as we cannot use a combination of numpy
# and tensors as input to keras
y_train_ohe = tf.one_hot(y_train, depth=10).numpy()
y_test_ohe = tf.one_hot(y_test, depth=10).numpy()
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
# 3x3 convolution
def conv3x3(channels, stride=1, kernel=(3, 3)):
    return keras.layers.Conv2D(
        channels,
        kernel,
        strides=stride,
        padding='same',
        use_bias=False,
        kernel_initializer=tf.random_normal_initializer())
class ResnetBlock(keras.Model):
    def __init__(self, channels, strides=1, residual_path=False):
        super(ResnetBlock, self).__init__()
        self.channels = channels
        self.strides = strides
        self.residual_path = residual_path
        self.conv1 = conv3x3(channels, strides)
        self.bn1 = keras.layers.BatchNormalization()
        self.conv2 = conv3x3(channels)
        self.bn2 = keras.layers.BatchNormalization()
        if residual_path:
            self.down_conv = conv3x3(channels, strides, kernel=(1, 1))
            self.down_bn = tf.keras.layers.BatchNormalization()

    def call(self, inputs, training=None):
        residual = inputs
        x = self.bn1(inputs, training=training)
        x = tf.nn.relu(x)
        x = self.conv1(x)
        x = self.bn2(x, training=training)
        x = tf.nn.relu(x)
        x = self.conv2(x)
        # this module can be added into self.
        # however, module in for can not be added.
        if self.residual_path:
            residual = self.down_bn(inputs, training=training)
            residual = tf.nn.relu(residual)
            residual = self.down_conv(residual)
        x = x + residual
        return x
class ResNet(keras.Model):
    def __init__(self, block_list, num_classes, initial_filters=16, **kwargs):
        super(ResNet, self).__init__(**kwargs)
        self.num_blocks = len(block_list)
        self.block_list = block_list
        self.in_channels = initial_filters
        self.out_channels = initial_filters
        self.conv_initial = conv3x3(self.out_channels)
        self.blocks = keras.models.Sequential(name='dynamic-blocks')
        # build all the blocks
        for block_id in range(len(block_list)):
            for layer_id in range(block_list[block_id]):
                if block_id != 0 and layer_id == 0:
                    block = ResnetBlock(self.out_channels,
                                        strides=2,
                                        residual_path=True)
                else:
                    if self.in_channels != self.out_channels:
                        residual_path = True
                    else:
                        residual_path = False
                    block = ResnetBlock(self.out_channels,
                                        residual_path=residual_path)
                self.in_channels = self.out_channels
                self.blocks.add(block)
            self.out_channels *= 2
        self.final_bn = keras.layers.BatchNormalization()
        self.avg_pool = keras.layers.GlobalAveragePooling2D()
        self.fc = keras.layers.Dense(num_classes)

    def call(self, inputs, training=None):
        out = self.conv_initial(inputs)
        out = self.blocks(out, training=training)
        out = self.final_bn(out, training=training)
        out = tf.nn.relu(out)
        out = self.avg_pool(out)
        out = self.fc(out)
        return out
def main():
    num_classes = 10
    batch_size = 128
    epochs = 2
    # build model and optimizer
    model = ResNet([2, 2, 2], num_classes)
    model.compile(optimizer=keras.optimizers.Adam(0.001),
                  loss=keras.losses.CategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    model.build(input_shape=(None, 28, 28, 1))
    print("Number of variables in the model :", len(model.variables))
    model.summary()
    # train
    model.fit(x_train,
              y_train_ohe,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test_ohe),
              verbose=1)
    # evaluate on test set
    scores = model.evaluate(x_test, y_test_ohe, batch_size, verbose=1)
    print("Final test loss and accuracy :", scores)


if __name__ == '__main__':
    main()
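The nested loop in the ResNet constructor is the least obvious part of the test program, so here is a TensorFlow-free sketch that only records which blocks it would create. It assumes the loop nesting of the linked source file (the channel count doubles once per stage, after the inner loop); plan_blocks is a hypothetical helper, not part of the tutorial code.

```python
def plan_blocks(block_list, initial_filters=16):
    """Return (channels, stride, residual_path) for every block, in order."""
    plan = []
    in_channels = out_channels = initial_filters
    for block_id, num_layers in enumerate(block_list):
        for layer_id in range(num_layers):
            if block_id != 0 and layer_id == 0:
                # First block of a later stage: downsample and project the shortcut.
                plan.append((out_channels, 2, True))
            else:
                plan.append((out_channels, 1, in_channels != out_channels))
            in_channels = out_channels
        out_channels *= 2  # each stage doubles the channel count
    return plan


for channels, stride, residual_path in plan_blocks([2, 2, 2]):
    print(channels, stride, residual_path)
```

For the [2, 2, 2] configuration used in main(), this yields six blocks with 16, 16, 32, 32, 64 and 64 channels, and only the first block of the second and third stage uses stride 2 with a projected shortcut.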
Monitor GPU usage:
watch -n 0.1 nvidia-smi
Testing catboost on the GPU:
from catboost.datasets import titanic
import numpy as np
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier, Pool, cv
from sklearn.metrics import accuracy_score
train_df, test_df = titanic()
null_value_stats = train_df.isnull().sum(axis=0)
null_value_stats[null_value_stats != 0]
train_df.fillna(-999, inplace=True)
test_df.fillna(-999, inplace=True)
X = train_df.drop('Survived', axis=1)
y = train_df.Survived
X_train, X_validation, y_train, y_validation = train_test_split(X, y, train_size=0.75, random_state=42)
X_test = test_df
# np.float was removed from recent NumPy releases; the builtin float is equivalent here
categorical_features_indices = np.where(X.dtypes != float)[0]
model = CatBoostClassifier(
    task_type="GPU",
    custom_metric=['Accuracy'],
    random_seed=666,
    logging_level='Silent'
)
model.fit(
    X_train, y_train,
    cat_features=categorical_features_indices,
    eval_set=(X_validation, y_validation),
    logging_level='Verbose',  # you can comment this for no text output
    plot=True  # the interactive plot renders only inside a Jupyter notebook
)
Monitor GPU usage:
watch -n 0.1 nvidia-smi
REFERENCE
[1] Install CUDA: https://developer.nvidia.com/cuda-toolkit-archive
[2] Install cuDNN: https://developer.nvidia.com/rdp/cudnn-download
[3] Installing a suitable graphics driver: http://www.linuxandubuntu.com/home/how-to-install-latest-nvidia-drivers-in-linux
[4] Manual driver download from the official site: https://www.geforce.cn/drivers
[5] NVIDIA: https://baike.baidu.com/item/NVIDIA
[6] Parallel computing: https://baike.baidu.com/item/并行計算/113443
[7] GPU: https://baike.baidu.com/item/GPU
[8] Online Documentation: https://developer.nvidia.com/cuda-toolkit-archive
[9] Download the matching installer (Deb packages method): https://developer.nvidia.com/cuda-10.0-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=deblocal
[10] Install cuDNN: https://developer.nvidia.com/rdp/cudnn-download
[11] Link: https://developer.nvidia.com/rdp/cudnn-download
[12] Official NVIDIA CUDA Installation Guide for Linux: https://docs.nvidia.com/cuda/archive/10.0/cuda-installation-guide-linux/index.html
[13] CUDA_Quick_Start_Guide-pdf: https://developer.download.nvidia.com/compute/cuda/10.0/Prod/docs/sidebar/CUDA_Quick_Start_Guide.pdf
[14] CUDA_Installation_Guide_Linux-pdf: https://developer.download.nvidia.com/compute/cuda/10.0/Prod/docs/sidebar/CUDA_Installation_Guide_Linux.pdf
[15] Official cuDNN installation guide: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#install-linux
[16] [How To] Install Latest NVIDIA Drivers In Linux: http://www.linuxandubuntu.com/home/how-to-install-latest-nvidia-drivers-in-linux