GPU-Accelerated NLP Tasks (Theano + CUDA)

Source: http://www.cnblogs.com/chenbjin/p/5021314.html

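I previously studied CNNs and came across Yoon Kim's (2014) paper, which uses a CNN for text classification. The network is structurally simple and performs well, but the paper gives no concrete training times, so that is worth looking into.

Yoon Kim's code: https://github.com/yoonkim/CNN_sentence

I worked from the author's source code and trained the model on my own machine. The average training time for one cross-validation (CV) fold is as follows, with the vertical axis in min/CV (for reference only):

Machine configuration: Intel(R) Core(TM) i3-4150 CPU @ 3.50GHz, 32 GB of memory, x64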

[Figure: average training time per CV fold on the CPU; vertical axis in min/CV]

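Clearly, training is very, very slow! On the CPU, 10 CV folds take more than 10 hours. A friend emailed Yoon Kim to check, and he confirmed that it really is that slow, which explains why the paper reports no training times ~.~

As for speeding it up, one option is multithreaded parallelism: the convolution layer can be parallelized, but that code is not easy to write :(, so I decided to use GPU acceleration instead.

Process: 1. install the NVIDIA driver; 2. install and configure CUDA; 3. modify the program to run on the GPU.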

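1. Install the NVIDIA driver

(0) Check whether you have a supported graphics card: lspci | grep -i nvidia (see the reference tutorial)

(1) Download the NVIDIA driver for your card: http://www.nvidia.com/Download/index.aspx?lang=en-us

My machine's GPU is a GeForce GTX 660 Ti, and the matching driver is NVIDIA-Linux-x86_64-352.63.run

(2) Make the installer executable: sudo chmod +x NVIDIA-Linux-x86_64-352.63.run

(3) Stop the X server: sudo service lightdm stop, then switch to tty1 with Ctrl+Alt+F1

(4) Install the driver: sudo ./NVIDIA-Linux-x86_64-352.63.run. Follow the installer's prompts; you may need to set compat32-libdir.

(5) Restart the X server: sudo service lightdm start

(6) Verify that the driver installed correctly: cat /proc/driver/nvidia/version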
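
The checks in steps (0) and (6) can also be scripted. The following is a minimal Python sketch and not part of the original procedure: the helper names find_nvidia_devices and parse_driver_version are made up for illustration, and the parser assumes the version file contains a line starting with "NVRM version:" that includes a dotted version number such as 352.63.

```python
import re
import subprocess


def find_nvidia_devices(lspci_output):
    """Return the lspci lines that mention NVIDIA (case-insensitive),
    mirroring `lspci | grep -i nvidia`."""
    return [line for line in lspci_output.splitlines()
            if "nvidia" in line.lower()]


def parse_driver_version(version_text):
    """Extract a dotted version such as '352.63' from the NVRM line
    (assumed format); return None if no such line or number is found."""
    for line in version_text.splitlines():
        if line.startswith("NVRM version:"):
            match = re.search(r"\b(\d+\.\d+(?:\.\d+)?)\b", line)
            return match.group(1) if match else None
    return None


if __name__ == "__main__":
    try:
        lspci = subprocess.run(["lspci"], capture_output=True, text=True).stdout
    except OSError:
        lspci = ""  # lspci is not available on this machine
    print("NVIDIA devices:", find_nvidia_devices(lspci) or "none found")
    try:
        with open("/proc/driver/nvidia/version") as f:
            print("Driver version:", parse_driver_version(f.read()))
    except OSError:
        print("Driver version file not readable; is the driver loaded?")
```

If the device list is empty or the version file is missing, the driver installation steps above are the first thing to revisit.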

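2. Install and configure CUDA

(1) Installation guide: http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#ubuntu-installation

(2) Download the CUDA Toolkit: https://developer.nvidia.com/cuda-downloads. Choose the package that matches your system; in my case: cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb

(3) The installation commands differ between systems; the ones below are for Ubuntu 14.04. If you run into problems, the guide above should resolve them.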

sudo dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install cuda

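(4) Verify that the toolkit installed correctly: nvcc -V (if nvcc is not found, configure the paths in step (5) first)

(5) Configure the paths: vim ~/.bashrc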

PATH=$PATH:/usr/local/cuda-7.5/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-7.5/lib64
export PATH
export LD_LIBRARY_PATH
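
After reloading ~/.bashrc (for example by opening a new shell), the path configuration can be sanity-checked from Python. This is a minimal sketch under stated assumptions: it assumes the toolkit from step (2) lives under /usr/local/cuda-7.5 (change cuda_root for a different install), and check_cuda_paths is a hypothetical helper, not part of CUDA or Theano.

```python
import os
import shutil


def check_cuda_paths(env, cuda_root="/usr/local/cuda-7.5"):
    """Report whether cuda_root's bin and lib64 directories appear in the
    PATH and LD_LIBRARY_PATH entries of the given environment mapping."""
    def entries(name):
        # Split on ':' (Linux), skip empty entries, strip trailing slashes.
        return [p.rstrip("/") for p in env.get(name, "").split(":") if p]

    return {
        "bin_on_PATH": (cuda_root + "/bin") in entries("PATH"),
        "lib64_on_LD_LIBRARY_PATH":
            (cuda_root + "/lib64") in entries("LD_LIBRARY_PATH"),
    }


if __name__ == "__main__":
    print(check_cuda_paths(os.environ))
    # shutil.which returns the full path of nvcc if it is reachable via PATH.
    print("nvcc found at:", shutil.which("nvcc"))
```

A mistyped directory name (for example a missing hyphen in the CUDA directory) shows up here as a False entry, which is easier to spot than a failed library load later on.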

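3. Modify the program to run on the GPU

Following the official Theano documentation: http://deeplearning.net/software/theano/tutorial/using_gpu.html

You can first use the test script given on that page to check whether CUDA is configured correctly and the GPU can be used.

Save that script as check1.py and run it with the commands below. The output shows whether the GPU is usable; if an error occurs, the path configuration above is a likely cause. The sample output shown here is the one from the Theano documentation, which is why it reports a GeForce GTX 580.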

$ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python check1.py
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.06635117531 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
  1.62323284]
Used the cpu

$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python check1.py
Using gpu device 0: GeForce GTX 580
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.638810873032 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu

Because current NVIDIA GPUs are optimized mainly for 32-bit floating-point (float32) computation, the data and variables in the code need to be set to float32.
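
To make the float32/float64 distinction concrete, here is a small sketch that uses only the Python standard library (no Theano or NumPy). It illustrates that a float32 value takes half the storage of a float64 and keeps fewer significant digits; to_float32 is a helper name introduced here for illustration.

```python
import struct
from array import array


def to_float32(x):
    """Round a Python float (a 64-bit double) to the nearest float32 value
    by packing it into 4 bytes and unpacking it again."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Per-element storage: 4 bytes for float32 ('f'), 8 bytes for float64 ('d').
print(array("f").itemsize, array("d").itemsize)

# A float32 round trip changes 0.1 slightly because fewer bits are kept.
x = 0.1
print(repr(x), repr(to_float32(x)))
print("round-trip error:", abs(x - to_float32(x)))
```

Values such as 0.5 that are exactly representable in both formats survive the round trip unchanged; most decimal fractions do not.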

Specifically, make the following changes to the code:

(1) process_data.py

line 55, W = np.zeros(shape=(vocab_size+1, k), dtype='float32')
line 56, W[0] = np.zeros(k, dtype='float32')

After the change, run the following command to obtain the word vector (float32) for each word.

python process_data.py GoogleNews-vectors-negative300.bin

(2) conv_net_sentence.py

Add allow_input_downcast=True to the theano.function calls, so that float64 values passed in as inputs are downcast to float32.

line 82, set_zero = theano.function([zero_vec_tensor], updates=[(Words, T.set_subtensor(Words[0,:], zero_vec_tensor))], allow_input_downcast=True)
line 131, val_model = theano.function([index], classifier.errors(y),
            givens={
                x: val_set_x[index * batch_size: (index + 1) * batch_size],
                y: val_set_y[index * batch_size: (index + 1) * batch_size]}, allow_input_downcast=True)
line 137, test_model = theano.function([index], classifier.errors(y),
            givens={
                x: train_set_x[index * batch_size: (index + 1) * batch_size],
                y: train_set_y[index * batch_size: (index + 1) * batch_size]}, allow_input_downcast=True)

line 141, train_model = theano.function([index], cost, updates=grad_updates,
            givens={
                x: train_set_x[index*batch_size:(index+1)*batch_size],
                y: train_set_y[index*batch_size:(index+1)*batch_size]}, allow_input_downcast=True)
line 155, test_model_all = theano.function([x,y], test_error, allow_input_downcast=True)

(3) Run the program

THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,warn_float64=raise python conv_net_sentence.py -static -word2vec
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,warn_float64=raise python conv_net_sentence.py -nonstatic -word2vec
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,warn_float64=raise python conv_net_sentence.py -nonstatic -rand

(4) The result is striking: training is about 20x faster than on the CPU.
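
The three commands in step (3) differ only in their trailing arguments, so they can be driven from a small launcher. This is a sketch, not part of the original repository: build_theano_flags and run_variant are hypothetical helper names, and the script assumes it is started from the directory containing conv_net_sentence.py with python on the PATH. The flag names and values are the ones used in the commands above.

```python
import os
import subprocess

# Flag names and values taken from the commands in step (3).
FLAGS = [
    ("mode", "FAST_RUN"),
    ("device", "gpu0"),
    ("floatX", "float32"),
    ("warn_float64", "raise"),
]

# The three argument combinations run in step (3).
VARIANTS = [
    ["-static", "-word2vec"],
    ["-nonstatic", "-word2vec"],
    ["-nonstatic", "-rand"],
]


def build_theano_flags(flags):
    """Join (name, value) pairs into the comma-separated THEANO_FLAGS format."""
    return ",".join("%s=%s" % (name, value) for name, value in flags)


def run_variant(args, script="conv_net_sentence.py"):
    """Run one training variant with THEANO_FLAGS set for the child process."""
    env = dict(os.environ, THEANO_FLAGS=build_theano_flags(FLAGS))
    return subprocess.call(["python", script] + args, env=env)


# Only launch training when the script is actually present.
if __name__ == "__main__" and os.path.exists("conv_net_sentence.py"):
    for variant in VARIANTS:
        run_variant(variant)
```

Keeping the flags in one list makes it easy to switch device back to cpu for a timing comparison without editing three separate command lines.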

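This was my first time running on a GPU; if I have overlooked anything in the steps above, corrections are welcome.

References:

1. Theano configuration: http://deeplearning.net/software/theano/library/config.html

2. Installing Theano + CUDA on Ubuntu: http://www.linuxidc.com/Linux/2014-10/107503.htm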
