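This post is my own installation log. I needed C-level debugging, which made the whole process quite tedious, and the notes certainly skip some steps, so I do not recommend following it; treat it as a reference only. In most cases it is much easier to install the NVIDIA CUDA Toolkit through Anaconda.
If you do need a manual installation (for development work, for example), the notes below may help.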
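If the NVIDIA installer aborts with the following error, the Nouveau driver has to be disabled first:
The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver,
Reference: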
https://tutorials.technology/tutorials/85-How-to-remove-Nouveau-kernel-driver-Nvidia-install-error.html
How to remove Nouveau kernel driver (fix Nvidia install error)
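The usual fix is to blacklist the nouveau module and rebuild the initramfs. Below is a minimal sketch, assuming an Ubuntu-style system that provides update-initramfs. The file name blacklist-nouveau.conf and the CONF_DIR variable are illustrative choices of mine, and the script only stages the file in a scratch directory, so it changes nothing until you run the commented commands yourself.

```shell
#!/bin/sh
# Stage a modprobe blacklist for Nouveau in a scratch directory.
# CONF_DIR and the file name are illustrative, not taken from the post.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
CONF_FILE="$CONF_DIR/blacklist-nouveau.conf"

cat > "$CONF_FILE" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF

echo "Staged $CONF_FILE:"
cat "$CONF_FILE"

# To apply for real (as root), copy the file into /etc/modprobe.d/,
# rebuild the initramfs, and reboot:
#   sudo cp "$CONF_FILE" /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u
#   sudo reboot
# After the reboot, `lsmod | grep nouveau` should print nothing.
```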
After that, the CUDA toolkit can be installed:
$ sudo sh cuda_10.0.130_410.48_linux.run
blablabla......
-----------------
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: y
Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: y
Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: y
Install the CUDA 10.0 Toolkit?
Enter Toolkit Location
[ default is /usr/local/cuda-10.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
Install the CUDA 10.0 Samples?
Enter CUDA Samples Location
[ default is /home/matthew ]:
Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
Missing recommended library: libGLU.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so
Missing recommended library: libGL.so
Installing the CUDA Samples in /home/matthew ...
Copying samples to /home/matthew/NVIDIA_CUDA-10.0_Samples now...
Finished copying samples.
===========
= Summary =
Driver: Installed
Toolkit: Installed in /usr/local/cuda-10.0
Samples: Installed in /home/matthew, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-10.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.
Logfile is /tmp/cuda_install_1832.log
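The two "Please make sure that" items in the summary can be handled for the current shell session roughly as follows. This is a sketch: the helper name prepend_once and the CUDA_HOME variable are hypothetical names I introduce here, not something the installer sets.

```shell
#!/bin/sh
# CUDA_HOME is an illustrative variable; the path comes from the installer summary.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-10.0}"

# prepend_once DIR LIST: print LIST with DIR in front, unless DIR is already in it.
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

PATH="$(prepend_once "$CUDA_HOME/bin" "$PATH")"
LD_LIBRARY_PATH="$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH

echo "PATH=$PATH"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

Running it twice in the same shell does not add duplicate entries, which keeps the variables readable when the snippet ends up being sourced from several places.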
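If the installer reports missing libraries as it did above,
note that this mainly affects the samples: the summary says they were installed "but missing recommended libraries".
If you want to build the Samples, the missing packages have to be installed (the samples also need GLUT, i.e. -lglut).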
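A small sketch for checking which of these libraries the dynamic linker can already see, followed by the install command it suggests. The package list is the one I know from NVIDIA's Linux installation guide for Ubuntu; the post does not record which packages were actually installed, so treat the list as an assumption.

```shell
#!/bin/sh
# Report whether any version of each library named by the installer
# (plus GLUT) is present in the dynamic linker cache.
check_libs() {
    for lib in libGLU.so libXi.so libXmu.so libGL.so libglut.so; do
        if ldconfig -p 2>/dev/null | grep -qF "$lib"; then
            echo "found   $lib"
        else
            echo "missing $lib"
        fi
    done
}
check_libs

# Assumed package list (Ubuntu); not taken from the post itself.
PKGS="freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev"
echo "To install: sudo apt-get install $PKGS"
```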
Some guides say the library path must be registered as well (it seems to work without it), by adding a file named cuda.conf under /etc/ld.so.conf.d/.
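A sketch of what that file could contain, assuming the default toolkit location from the installer summary. The STAGE_DIR and CUDA_LIB_DIR variables are illustrative; the script writes to a scratch directory and leaves the privileged steps as comments.

```shell
#!/bin/sh
CUDA_LIB_DIR="${CUDA_LIB_DIR:-/usr/local/cuda-10.0/lib64}"
STAGE_DIR="${STAGE_DIR:-$(mktemp -d)}"

# An ld.so.conf.d file simply lists one library directory per line.
printf '%s\n' "$CUDA_LIB_DIR" > "$STAGE_DIR/cuda.conf"
cat "$STAGE_DIR/cuda.conf"

# Apply as root, then rebuild the linker cache:
#   sudo cp "$STAGE_DIR/cuda.conf" /etc/ld.so.conf.d/cuda.conf
#   sudo ldconfig
# Verify with: ldconfig -p | grep libcudart
```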
At this point the samples can be compiled.
Then run a test; if everything works, it reports success!
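One common way to do both steps is to build and run the deviceQuery sample, which ends its output with "Result = PASS" on a working setup. A minimal sketch, assuming the samples were copied to the default location under the home directory; SAMPLES_DIR is a variable I introduce for illustration.

```shell
#!/bin/sh
SAMPLES_DIR="${SAMPLES_DIR:-$HOME/NVIDIA_CUDA-10.0_Samples}"
DQ_DIR="$SAMPLES_DIR/1_Utilities/deviceQuery"

if [ -d "$DQ_DIR" ]; then
    # Build only this sample; running `make` in $SAMPLES_DIR builds all of them.
    ( cd "$DQ_DIR" && make && ./deviceQuery )
else
    echo "samples not found at $DQ_DIR"
fi
```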
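My system runs CUDA 10, so I downloaded cudnn-10.0-linux-x64-v7.4.2.24.tgz from the NVIDIA website. Note that the download requires a registered NVIDIA developer account.
Installation goes as follows. Extracting the archive yields a folder named cuda, whose header and library files are then copied into place.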
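A sketch of the extract-and-copy step. The files can go either into the toolkit tree (/usr/local/cuda/include and /usr/local/cuda/lib64) or into the system tree (/usr/local/include and /usr/local/lib). The variables CUDNN_TGZ, INC_DIR and LIB_DIR are illustrative names, and the script does nothing if the archive is not in the current directory.

```shell
#!/bin/sh
CUDNN_TGZ="${CUDNN_TGZ:-cudnn-10.0-linux-x64-v7.4.2.24.tgz}"

# Choose ONE destination:
#   toolkit tree : /usr/local/cuda/include and /usr/local/cuda/lib64
#   system tree  : /usr/local/include      and /usr/local/lib
INC_DIR="${INC_DIR:-/usr/local/cuda/include}"
LIB_DIR="${LIB_DIR:-/usr/local/cuda/lib64}"

if [ -f "$CUDNN_TGZ" ]; then
    tar -xzf "$CUDNN_TGZ"                     # unpacks into ./cuda
    sudo cp cuda/include/cudnn.h "$INC_DIR/"
    # Plain cp copies the files the symlinks point to, not the links
    # themselves, so the links are gone at the destination.
    sudo cp cuda/lib64/libcudnn* "$LIB_DIR/"
else
    echo "archive not found: $CUDNN_TGZ"
fi
```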
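The two locations probably behave the same in practice, but I generally recommend /usr/local/cuda/include and /usr/local/cuda/lib64. With PyTorch, for example, this saves some manual configuration, because PyTorch by default looks for cuDNN through LD_LIBRARY_PATH (see https://github.com/pytorch/pytorch/issues/573).
Next, link the cuDNN library files (required!). Pay special attention here: if you copied the files to /usr/local/cuda/lib64 and /usr/local/cuda/include earlier, the commands below must be adjusted accordingly. The examples below assume /usr/local/lib and /usr/local/include throughout, and I will not repeat this.
After linking, refresh the linker configuration (run ldconfig as root).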
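The linking step can be sketched as a small function that recreates the usual libcudnn.so -> libcudnn.so.7 -> libcudnn.so.7.4.2 chain. The function name link_cudnn is hypothetical, and the file name libcudnn.so.7.4.2 is my assumption based on the archive version; check the actual file name in your lib directory first.

```shell
#!/bin/sh
# link_cudnn DIR VERSION [SUDO]
# Recreate libcudnn.so -> libcudnn.so.MAJOR -> libcudnn.so.VERSION inside DIR.
# Pass "sudo" as the third argument when DIR is only writable by root.
link_cudnn() {
    dir="$1"; ver="$2"; sudo_cmd="${3:-}"
    major="${ver%%.*}"
    if [ ! -f "$dir/libcudnn.so.$ver" ]; then
        echo "libcudnn.so.$ver not found in $dir" >&2
        return 1
    fi
    $sudo_cmd ln -sf "libcudnn.so.$ver" "$dir/libcudnn.so.$major" &&
    $sudo_cmd ln -sf "libcudnn.so.$major" "$dir/libcudnn.so"
}

# Typical call on the real system, followed by the linker cache refresh:
#   link_cudnn /usr/local/lib 7.4.2 sudo && sudo ldconfig
```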
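This completes the CUDA and cuDNN installation.
If a Permission denied error appears when using the cuDNN libraries or cudnn.h, the copied files are not readable by the current user. In that case, grant read permission on the files before copying them.
Fix: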
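A minimal sketch of the permission fix, assuming the archive was unpacked into ./cuda in the current directory. The helper name make_readable is hypothetical.

```shell
#!/bin/sh
# make_readable SRC: give every user read access to the unpacked cuDNN files.
make_readable() {
    chmod a+r "$1"/include/cudnn.h "$1"/lib64/libcudnn*
}

if [ -d ./cuda ]; then
    make_readable ./cuda
    ls -l ./cuda/include/cudnn.h ./cuda/lib64/libcudnn*
else
    echo "run this from the directory that contains the unpacked ./cuda folder"
fi

# If the files were already copied, the same fix can be applied in place, e.g.:
#   sudo chmod a+r /usr/local/include/cudnn.h /usr/local/lib/libcudnn*
```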
[1] This reference covers NVIDIA driver installation on Ubuntu 18.04, 16.04, 14.04, and other releases
https://askubuntu.com/questions/1077061/how-do-i-install-nvidia-and-cuda-drivers-into-ubuntu
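Following reference [1], I found that it pulls in a large number of development packages, and because of my network the process was very slow. The procedure is as follows: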
Remove any CUDA PPAs that may be set up, and also remove nvidia-cuda-toolkit if installed:
Recommended to also remove all NVIDIA drivers before installing new drivers:
Then update the system:
Install the key:
Add the repo:
Update the system again:
Install CUDA 10.0.
It should install the nvidia-410 drivers along with it, as those are the ones listed in the repo. See: http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/
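The steps from removing old packages through installing CUDA 10.0 can be sketched as one dry-run script. Several things here are assumptions on my part: the Ubuntu release (ubuntu1804 by default), the key file name 7fa2af80.pub (the name I remember the repository using at the time; NVIDIA has since rotated its signing key, so check the repository for the current .pub file), and the helper names run and install_cuda_from_repo. By default the script only prints the commands.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"
# Assumption: Ubuntu 18.04. Use ubuntu1604 or ubuntu1404 for older releases.
UBUNTU_REL="${UBUNTU_REL:-ubuntu1804}"
REPO="http://developer.download.nvidia.com/compute/cuda/repos/$UBUNTU_REL/x86_64"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

install_cuda_from_repo() {
    # Remove old CUDA repo entries, the distro toolkit, and existing drivers.
    run sudo sh -c 'rm -f /etc/apt/sources.list.d/cuda*'
    run sudo apt remove --autoremove nvidia-cuda-toolkit
    run sudo apt remove --autoremove 'nvidia-*'
    run sudo apt update
    # Trust the repository key, register the repo, refresh, install.
    run sudo apt-key adv --fetch-keys "$REPO/7fa2af80.pub"
    run sudo sh -c "echo 'deb $REPO /' > /etc/apt/sources.list.d/cuda.list"
    run sudo apt update
    run sudo apt install cuda-10-0
}

install_cuda_from_repo
```

Reading the printed commands before setting DRY_RUN=0 is worthwhile, since the removal steps also uninstall a working driver.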
Add the following lines to your ~/.profile file for CUDA 10.0
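The lines in question are not preserved in this log. A sketch of what such a ~/.profile block typically looks like, assuming the default install prefix /usr/local/cuda-10.0:

```shell
# set PATH for the CUDA 10.0 installation (only if it is actually present)
if [ -d "/usr/local/cuda-10.0/bin" ]; then
    export PATH="/usr/local/cuda-10.0/bin${PATH:+:${PATH}}"
    export LD_LIBRARY_PATH="/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
fi
```

The directory test keeps the profile harmless on machines where the toolkit is missing or was later removed.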
Reboot the computer and check your settings when reboot is complete:
Check the NVIDIA CUDA compiler with nvcc --version:
Check the NVIDIA driver with nvidia-smi:
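Both checks can be wrapped so that a missing tool produces a readable message instead of a "command not found" error. A small sketch; the helper name report_tool is hypothetical.

```shell
#!/bin/sh
# report_tool CMD [ARGS...]: run CMD with ARGS if it is on PATH, else say so.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "$1: not found in PATH"
    fi
}

report_tool nvcc --version
report_tool nvidia-smi
```

If nvcc is reported as missing while nvidia-smi works, the driver is installed but the toolkit's bin directory is probably not on PATH yet.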