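Contents

Introduction
- GPU comparison
- Converting GPU floating-point capability
Installation
- Download
- Preparation
- Install CUDA
  - Disable the Nouveau driver
  - Enter text mode
  - Run the installer
  - Add environment variables
  - Load the new environment variables
  - Check the CUDA devices
- Install cuDNN
Using CUDA
- Switching CUDA versions
Problems and Solutions
- Login loop
- you appear to be running an x server please exit x before installing
- The driver installation is unable to locate the kernel
- NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.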

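Introduction

This post records how I installed CUDA, along with the problems I ran into and how I solved them. I prefer to work from the official documentation, so the installation here mainly follows the official NVIDIA CUDA documentation, which covers almost every CUDA development manual. For installing CUDA on Linux, see: CUDA Linux Installation.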

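According to that manual, CUDA can be installed on Linux in two ways: with the package manager (`.deb` files) or with the runfile installer (`.run` files). This post uses the `run` installer. For the `deb` method, see the official manual above, or my other post NVIDIA DIGITS Study Notes (NVIDIA DIGITS-2.0 + Ubuntu 14.04 + CUDA 7.0 + cuDNN 7.0 + Caffe 0.13.0), though it is somewhat dated.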

**Reference manuals:**

NVIDIA CUDA official documentation
CUDA Linux Installation

Software environment: Ubuntu 16.04 LTS

Hardware environment: GTX 1080 Ti

You can also refer to the article by 企鵝餓餓餓, Ubuntu 16.04 + GTX1070 Deep Learning Mini Rig.

GPU comparison

RTX 3090 Benchmarks for Deep Learning – NVIDIA RTX 3090 vs 2080 Ti vs TITAN RTX vs RTX 6000/8000

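Converting GPU floating-point capability

Reference:

Floating-point computing capability of Nvidia GPUs (FP64/FP32/FP16)

Theoretical peak = number of GPU chips × GPU Boost clock × number of cores × floating-point operations per core per clock cycle

On a GPU, the single-precision and double-precision peaks must be calculated separately, because each precision has its own core count. Taking the Tesla P100 as an example (a fused multiply-add counts as 2 floating-point operations, hence the factor 2):

Double-precision theoretical peak = FP64 cores × GPU Boost Clock × 2 = 1792 × 1.48 GHz × 2 ≈ 5.3 TFLOPS

Single-precision theoretical peak = FP32 cores × GPU Boost Clock × 2 = 3584 × 1.48 GHz × 2 ≈ 10.6 TFLOPS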
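The formula above can be checked with a small shell function. This is a minimal sketch that assumes awk is available. `peak_tflops` is a hypothetical helper name, and its default of 2 FLOPs per core per cycle (one fused multiply-add) matches the factor used in the P100 examples.

```shell
# Theoretical peak = chips × boost clock (GHz) × cores × FLOPs per core per cycle.
# Prints the result in TFLOPS with one decimal place.
# Usage: peak_tflops CORES BOOST_GHZ [FLOPS_PER_CYCLE] [CHIPS]
peak_tflops() {
    # FLOPS_PER_CYCLE defaults to 2 (one fused multiply-add per core per cycle);
    # CHIPS defaults to 1 (single-GPU card). cores × GHz gives GFLOPS, /1000 gives TFLOPS.
    awk -v cores="$1" -v ghz="$2" -v fpc="${3:-2}" -v chips="${4:-1}" \
        'BEGIN { printf "%.1f\n", chips * ghz * cores * fpc / 1000 }'
}
```

`peak_tflops 1792 1.48` prints 5.3 and `peak_tflops 3584 1.48` prints 10.6, matching the P100 figures. With the GTX 1080 Ti values from the deviceQuery output later in this post (3584 cores, 1.582 GHz), `peak_tflops 3584 1.582` prints 11.3.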

Installation

Before installing, please read the Problems and Solutions section at the end of this post.

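Download

Download links:

CUDA
cuDNN

From the CUDA page, select your operating system and download the `run` package of the CUDA toolkit. The graphics driver is bundled in the same file (for example `cuda_8.0.61_375.26_linux.run`), as shown in the figure below: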

[Figure: CUDA download page selection (Ubuntu 16.04 LTS + CUDA8.0 + cudnn6.0)]

Download cuDNN from here, e.g. `cudnn-8.0-linux-x64-v6.0.tgz`, and extract it to get the `cudnn-8.0-linux-x64-v6.0` folder, which contains only a `cuda` folder.

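Preparation

Following the pre-installation steps here, confirm that your PC has a CUDA-capable NVIDIA GPU, a supported Linux distribution, GCC, and so on. If you have already confirmed this, you can skip this step.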
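These checks can be scripted. NVIDIA's installation guide uses `lspci | grep -i nvidia`, `uname -m && cat /etc/*release` and `gcc --version` for this step. The helpers below are a minimal sketch; `has_nvidia_gpu` and `has_gcc` are hypothetical names wrapping the first and last of those checks.

```shell
# Reads `lspci` output on stdin; succeeds if any line mentions NVIDIA.
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Succeeds if gcc is on PATH; the installer compiles the driver's kernel
# module with it, and the CUDA Samples are built with it as well.
has_gcc() {
    command -v gcc >/dev/null 2>&1
}
```

For example, `lspci | has_nvidia_gpu && echo "NVIDIA GPU found"` and `has_gcc && gcc --version`, plus `uname -m && cat /etc/*release` to see the architecture and distribution.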

Install CUDA

Disable the Nouveau driver

Run

lsmod | grep nouveau

If this prints anything, the nouveau driver is loaded. On Ubuntu, disable it with the following steps.
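For scripting, the same check can be turned into a yes/no test. A minimal sketch: `nouveau_loaded` is a hypothetical helper that reads `lsmod` output and matches the module-name column exactly, rather than any line that happens to contain the string (for example other modules whose "Used by" column lists nouveau).

```shell
# Reads `lsmod` output on stdin; succeeds if the nouveau module is listed.
# lsmod prints one module per line with the module name in the first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

For example, `lsmod | nouveau_loaded && echo "nouveau is still loaded"`. After the blacklist and reboot below, this should print nothing.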

Create the file /etc/modprobe.d/blacklist-nouveau.conf with the following content and save it:

blacklist nouveau
options nouveau modeset=0

Regenerate the kernel initrd

In a terminal, run:
sudo update-initramfs -u

Once it reports success, continue below.
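The two steps above can be combined into one command. A minimal sketch: `write_nouveau_blacklist` and the `SUDO` variable are hypothetical names. The function writes the same two lines shown above, going through `tee` so that only the write itself needs root.

```shell
# Writes the nouveau blacklist to FILE (default: /etc/modprobe.d/blacklist-nouveau.conf).
# Set SUDO=sudo to write into /etc; afterwards run `sudo update-initramfs -u`.
write_nouveau_blacklist() {
    conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    # ${SUDO:-} expands to nothing when SUDO is unset, so tee runs unprivileged
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' | ${SUDO:-} tee "$conf" >/dev/null
}
```

For example, `SUDO=sudo write_nouveau_blacklist && sudo update-initramfs -u`. Passing a path as the first argument writes the content elsewhere, which is handy for inspecting it first.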

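Enter text mode

Switch to a text console with Ctrl + Alt + F1 and log in with your username and password. Check again that the nouveau driver is not loaded (if lsmod | grep nouveau prints nothing, it has been disabled).
Stop the X server: sudo service lightdm stop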

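Run the installer

In the terminal, change to the folder containing the downloaded CUDA run package and run it to install. Example commands (adjust the file name to your version):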

sudo sh cuda_<version>_linux.run
For example:
sudo sh cuda_8.0.61_375.26_linux.run
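To check which toolkit and driver versions a runfile bundles before running it, the version numbers can be read from the file name. A minimal sketch, assuming the `cuda_<toolkit>_<driver>_linux.run` naming of the example above; `runfile_versions` is a hypothetical helper, and file names that do not follow this pattern simply produce no output.

```shell
# Splits a CUDA runfile name such as cuda_8.0.61_375.26_linux.run into the
# toolkit version and the bundled driver version (naming pattern assumed).
runfile_versions() {
    basename "$1" | sed -n 's/^cuda_\([0-9.]*\)_\([0-9.]*\)_linux\.run$/toolkit=\1 driver=\2/p'
}
```

For example, `runfile_versions cuda_8.0.61_375.26_linux.run` prints `toolkit=8.0.61 driver=375.26`.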

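In the text-mode interactive screen that appears, exit the license document view, then type

accept

when prompted to accept the agreement.

Then configure the options as needed, such as the installation path; the defaults are usually fine. Note, however: do not choose to install OpenGL during installation, otherwise you will end up in a login loop. I chose to install the CUDA Samples (recommended, since they are used later to check whether the installation succeeded and to show GPU information) and installed them in the Documents folder.

Wait for the installation to finish.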
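The runfile can also be run without the interactive prompts. A minimal sketch, assuming the runfile options listed in the CUDA 8 installation guide (`--silent`, `--driver`, `--toolkit`, `--samples`, `--samplespath=`, `--no-opengl-libs`); check `sh <runfile> --help` for your version. `install_cuda_silent` and `DRY_RUN` are hypothetical names. `--no-opengl-libs` appears to correspond to declining the OpenGL libraries as advised above, and the samples go to `~/Documents` by default to match this post.

```shell
# Non-interactive CUDA runfile install: driver + toolkit + samples, no OpenGL libs.
# Usage: install_cuda_silent RUNFILE [SAMPLES_DIR]
# Set DRY_RUN=1 to print the command instead of running it.
install_cuda_silent() {
    runfile="$1"
    samples_dir="${2:-$HOME/Documents}"
    # Rebuild the positional parameters as the full command line
    set -- sudo sh "$runfile" --silent --driver --toolkit --samples \
        --samplespath="$samples_dir" --no-opengl-libs
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 install_cuda_silent cuda_8.0.61_375.26_linux.run` shows the command first; drop `DRY_RUN=1` to run it from the text console with X stopped.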

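Add environment variables

Note: these paths are for 64-bit systems; on 32-bit systems, change lib64 to lib.

Method 1: modify only the current user's variables

# setting the environment variables so CUDA will be found
# single quotes keep $PATH unexpanded, so it is evaluated each time a shell starts
echo 'export PATH=/usr/local/cuda-8.0/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc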

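Method 2: modify the variables for all users

Open the profile file with sudo gedit /etc/profile (this applies to all users), or open ~/.bashrc with

gedit ~/.bashrc

(this applies only to the current user and does not need sudo). Then add the following lines at the end of the opened file and save it:

# setting the environment variables so CUDA will be found
# After opening the profile file, add the following lines at the end of the file
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}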
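Appending with `>>` adds the lines again every time it is run. A minimal sketch of an idempotent variant: `add_cuda_env` and the `CUDA_DIR` variable are hypothetical names, and the default `/usr/local/cuda-8.0` is the path used in this post. It writes the PATH and LD_LIBRARY_PATH exports only if they are not already present, using `${LD_LIBRARY_PATH:+:...}` so that an empty LD_LIBRARY_PATH does not leave a trailing colon.

```shell
# Appends the CUDA exports to RC_FILE (default ~/.bashrc) unless already present.
# CUDA_DIR (hypothetical) defaults to /usr/local/cuda-8.0.
add_cuda_env() {
    rc="${1:-$HOME/.bashrc}"
    cuda_dir="${CUDA_DIR:-/usr/local/cuda-8.0}"
    # \$ keeps the variable references literal so they are evaluated at login
    path_line="export PATH=$cuda_dir/bin:\$PATH"
    ld_line="export LD_LIBRARY_PATH=$cuda_dir/lib64\${LD_LIBRARY_PATH:+:\${LD_LIBRARY_PATH}}"
    touch "$rc"
    # -x: whole-line match, -F: fixed string, so only an identical line counts
    grep -qxF "$path_line" "$rc" || printf '%s\n' "$path_line" >> "$rc"
    grep -qxF "$ld_line" "$rc" || printf '%s\n' "$ld_line" >> "$rc"
}
```

Running `add_cuda_env` twice leaves exactly one copy of each line in `~/.bashrc`.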

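Load the new environment variables

After adding the environment variables, you need to load them before they take effect.

To load the updated variables, run

source ~/.bashrc

(Method 1), or

source /etc/profile

or

source ~/.bashrc

(Method 2). Alternatively, open a new terminal, or simply reboot the system.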
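To confirm that the new PATH is active in the current shell, a minimal sketch: `path_contains` is a hypothetical helper that checks whether a directory is an exact entry of a colon-separated list (default `$PATH`), so `/usr/local/cuda-8.0` does not falsely match `/usr/local/cuda-8.0/bin`.

```shell
# Succeeds if DIR is one of the colon-separated entries of PATHLIST (default $PATH).
# Usage: path_contains DIR [PATHLIST]
path_contains() {
    # Surrounding colons let the first and last entries match like the others
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `path_contains /usr/local/cuda-8.0/bin && nvcc --version`, where `nvcc --version` should report the installed CUDA release.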

**Note:** If you did not reboot, you also need to restart the X server:

sudo service lightdm start

to return to the graphical desktop.

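You can now shut down and remove the spare display card (the card used only to drive the monitor during installation), if you used one.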

Check the CUDA devices

In a terminal, go to your CUDA Samples installation directory, then build and run

./deviceQuery

which prints information about the GPUs:

cd NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery
make
./deviceQuery

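If you see the following message, there is a problem with the NVIDIA display driver. Uninstall it with

sudo apt --purge remove 'nvidia*'

(the quotes stop the shell from expanding the pattern against files in the current directory), and choose to install the NVIDIA driver when you reinstall.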

./deviceQuery
./deviceQuery Starting...
 
 CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL

If you see output similar to the following, the driver and the toolkit were installed successfully. Next, install cuDNN:

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11172 MBytes (11715084288 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1582 MHz (1.58 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "GeForce GTX 1080"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 8111 MBytes (8504868864 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP:     2560 CUDA Cores
  GPU Max Clock rate:                            1734 MHz (1.73 GHz)
  Memory Clock rate:                             5005 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from GeForce GTX 1080 Ti (GPU0) -> GeForce GTX 1080 (GPU1) : No
> Peer access from GeForce GTX 1080 (GPU1) -> GeForce GTX 1080 Ti (GPU0) : No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 2, Device0 = GeForce GTX 1080 Ti, Device1 = GeForce GTX 1080
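The output above contains everything the peak-FLOPS formula from the GPU section needs. A minimal sketch, assuming the deviceQuery output format shown here: the hypothetical `devicequery_peaks` helper takes each device's "CUDA Cores" and "GPU Max Clock rate" lines and applies 2 FLOPs per core per cycle. The result is a theoretical FP32 peak; sustained clocks under load can differ from the reported maximum.

```shell
# Reads deviceQuery output on stdin and prints a theoretical FP32 peak per device.
devicequery_peaks() {
    awk '
        /^Device [0-9]+:/ {
            name = $0
            sub(/^Device [0-9]+: /, "", name)   # drop the "Device N: " prefix
            gsub(/"/, "", name)                 # drop the surrounding quotes
        }
        /CUDA Cores\/MP:/ { cores = $(NF - 2) }   # "... CUDA Cores/MP: 3584 CUDA Cores"
        /GPU Max Clock rate:/ {
            mhz = $(NF - 3)                       # "... 1582 MHz (1.58 GHz)"
            printf "%s: %.1f TFLOPS (FP32)\n", name, cores * mhz * 2 / 1e6
        }
    '
}
```

Piping the output above through `./deviceQuery | devicequery_peaks` would print `GeForce GTX 1080 Ti: 11.3 TFLOPS (FP32)` and `GeForce GTX 1080: 8.9 TFLOPS (FP32)`.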

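Install cuDNN

cuDNN is NVIDIA's CUDA-based library for accelerating deep neural network computation. The version installed here is 6.0; the latest at the time of writing is 7.0, but the installation steps are the same and very simple. You can also follow the official steps.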

1. Extract: go to the directory containing "cudnn-8.0-linux-x64-v6.0.tgz" and extract the archive:

# change this to your own directory
cd /home/liu/sfw
# extract
tar zxvf cudnn-8.0-linux-x64-v6.0.tgz

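2. Copy the files into the CUDA installation directory: after extraction, a "cuda" folder is created in your directory (for cuDNN 6.0, extraction produces "cudnn-8.0-linux-x64-v6.0"; run the commands below from the folder that contains "cuda"). Copy the files with the following commands. Note the -a flag in the second command: without it, the copied files lose their symbolic links.

sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp -a cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*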

cuDNN is now installed. ～-～
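To confirm which cuDNN version ended up in the CUDA directory, the version macros can be read from the copied header. A minimal sketch: `cudnn_version` is a hypothetical helper that assumes the `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` defines found in `cudnn.h` for cuDNN 6 and 7. cuDNN 8 moved them to `cudnn_version.h`, which can be passed as the argument instead.

```shell
# Prints the cuDNN version (MAJOR.MINOR.PATCH) defined in HEADER
# (default: /usr/local/cuda/include/cudnn.h, where the copy step put it).
cudnn_version() {
    header="${1:-/usr/local/cuda/include/cudnn.h}"
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major == "") exit 1; print major "." minor "." patch }
    ' "$header"
}
```

Running `cudnn_version` after the copy step prints the installed version in MAJOR.MINOR.PATCH form, or fails if the header has no such defines.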

Using CUDA

Switching CUDA versions

See my blog post: Computer Tips: Switching CUDA Versions
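One common way to switch versions, which may differ from the method in the linked post, is to repoint the `/usr/local/cuda` symlink that the runfile installer offers to create. A minimal sketch: `switch_cuda` and `SUDO` are hypothetical names. For the switch to take effect, the environment variables must point to `/usr/local/cuda/bin` and `/usr/local/cuda/lib64` instead of a versioned path such as `/usr/local/cuda-8.0`.

```shell
# Points PREFIX/cuda (default /usr/local/cuda) at PREFIX/cuda-VERSION.
# Usage: switch_cuda VERSION [PREFIX]; set SUDO=sudo for /usr/local.
switch_cuda() {
    version="$1"
    prefix="${2:-/usr/local}"
    target="$prefix/cuda-$version"
    if [ ! -d "$target" ]; then
        echo "switch_cuda: $target not found" >&2
        return 1
    fi
    # -n: replace the existing symlink itself instead of creating a link inside it
    ${SUDO:-} ln -sfn "$target" "$prefix/cuda"
}
```

For example, `SUDO=sudo switch_cuda 9.0` (assuming `/usr/local/cuda-9.0` is installed), followed by `nvcc --version` to confirm.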

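Problems and Solutions

Login loop

Some people find, after installing CUDA and rebooting, that the computer keeps looping back to the login screen. This happens because

OpenGL

was selected during installation; see here. Follow the steps above and do not install OpenGL. I also ran into this after backing up the system with a compression command and restoring it; in that case I ended up reinstalling the system.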

you appear to be running an x server please exit x before installing

The X server has not been shut down completely. Stop it and remove the leftover X files:

sudo /etc/init.d/lightdm stop
cd /tmp
sudo rm -rf .X*
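To see whether any X lock files are left before rerunning the installer, a minimal sketch: `x_lock_files` is a hypothetical helper. The `/tmp/.X<n>-lock` files it lists are among the files removed by the commands above, and the installer's X server check appears to rely on them.

```shell
# Lists leftover X server lock files (.X<display>-lock) in DIR (default /tmp).
x_lock_files() {
    dir="${1:-/tmp}"
    for f in "$dir"/.X*-lock; do
        # With no match the glob stays literal, so test that the file exists
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

If `x_lock_files` prints nothing after `sudo service lightdm stop`, no lock files remain.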

The driver installation is unable to locate the kernel

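If you have updated the kernel, you may not be able to reach the desktop, and reinstalling the CUDA driver then reports

The driver installation is unable to locate the kernel

This is because the Ubuntu kernel version does not match the version CUDA requires; the official CUDA 9 documentation lists Linux kernel 4.4.0 (see the CUDA manual).

Check the installed kernel version: uname -r
List the kernels available for installation: apt-cache search linux | grep linux-image
Choose a kernel version and install it: sudo apt-get install linux-image-X.X.X.XX-generic linux-headers-X.X.X.XX-generic
Update the GRUB boot menu: sudo update-grub
Reboot and choose the newly installed kernel under Advanced options. To boot a particular kernel by default, set GRUB_DEFAULT in /etc/default/grub (sudo gedit /etc/default/grub) and run sudo update-grub again; /boot/grub/grub.cfg is regenerated by update-grub, so do not edit it directly.

If the versions listed above include 4.4, you can open the GRUB advanced menu at boot and start the system with a 4.4 kernel. If not, you can downgrade the kernel with the steps above, or install CUDA 10 instead.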
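The two checks behind this error can be scripted. The full installer message refers to the kernel source, and NVIDIA's installation guide installs the headers for the running kernel with `sudo apt-get install linux-headers-$(uname -r)`. A minimal sketch: `kernel_matches` and `kernel_headers_present` are hypothetical helpers, and the second assumes the usual `/lib/modules/<release>/build` link to the installed headers.

```shell
# Succeeds if the kernel release (default: running kernel) is version REQUIRED,
# e.g. kernel_matches 4.4 accepts 4.4.0-98-generic but not 4.13.0-36-generic.
kernel_matches() {
    required="$1"
    release="${2:-$(uname -r)}"
    case "$release" in
        "$required"|"$required".*|"$required"-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if build files (headers) for the kernel release exist under ROOT;
# the driver installer compiles its kernel module against them.
kernel_headers_present() {
    release="${1:-$(uname -r)}"
    root="${2:-}"
    [ -d "$root/lib/modules/$release/build" ]
}
```

For example, `kernel_matches 4.4 || echo "not a 4.4 kernel"` and `kernel_headers_present || sudo apt-get install linux-headers-$(uname -r)`.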

If this error appears while installing CUDA, run

sudo apt-get --purge remove 'nvidia*'

to remove the previously installed driver, then reinstall with the run package.

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver.

Uninstall the driver with

sudo apt --purge remove 'nvidia*'

then reinstall the NVIDIA driver, for example

NVIDIA-Linux-x86_64-430.50.run
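After reinstalling, you can check whether the NVIDIA kernel module is actually loaded before running `nvidia-smi`. A minimal sketch: `loaded_driver_version` is a hypothetical helper that reads `/proc/driver/nvidia/version`, which the loaded module provides, and prints the version that follows "Kernel Module". It fails if the file is missing, which usually means the module is not loaded.

```shell
# Prints the version of the loaded NVIDIA kernel module from FILE
# (default /proc/driver/nvidia/version); fails if FILE is not readable.
loaded_driver_version() {
    file="${1:-/proc/driver/nvidia/version}"
    [ -r "$file" ] || return 1
    awk '/^NVRM version:/ {
        # The version is the field right after "Module"
        for (i = 1; i < NF; i++) if ($i == "Module") { print $(i + 1); exit }
    }' "$file"
}
```

After installing `NVIDIA-Linux-x86_64-430.50.run` and rebooting, `loaded_driver_version` should print `430.50`.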
