Problems Encountered When First Running DQN Code

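　　I have been reading about DQN lately and wanted to try code that other people have released, but I ran into all sorts of problems. I am recording them here for future reference.

　　Question 1: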

wangxiao@GTX980:~/Documents/DRL/DQN-tensorflow-master$ python main.py --env_name=Breakout-v0 --is_train=True

I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally

I tensorflow/stream_executor/dso_loader.cc:99] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH: /home/wangxiao/torch/install/lib:/home/wangxiao/torch/install/lib:/home/wangxiao/torch/install/lib:/home/wangxiao/torch/install/lib:

I tensorflow/stream_executor/cuda/cuda_dnn.cc:1562] Unable to load cuDNN DSO

I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally

I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally

I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally

[*] GPU : 1.0000

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:

name: GeForce GTX 980

major: 5 minor: 2 memoryClockRate (GHz) 1.329

pciBusID 0000:01:00.0

Total memory: 4.00GiB

Free memory: 3.58GiB

I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0

I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y

I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980, pci bus id: 0000:01:00.0)

[2016-07-03 09:14:02,576] Making new env: Breakout-v0

{'_save_step': 50000,

'_test_step': 10000,

'action_repeat': 4,

'backend': 'tf',

'batch_size': 32,

'cnn_format': 'NCHW',

'discount': 0.99,

'display': False,

'env_name': 'Breakout-v0',

'env_type': 'simple',

'ep_end': 0.1,

'ep_end_t': 1000000,

'ep_start': 1.0,

'history_length': 4,

'learn_start': 50000.0,

'learning_rate': 0.00025,

'learning_rate_decay': 0.96,

'learning_rate_decay_step': 50000,

'learning_rate_minimum': 0.00025,

'max_delta': 1,

'max_reward': 1.0,

'max_step': 50000000,

'memory_size': 1000000,

'min_delta': -1,

'min_reward': -1.0,

'model': 'm2',

'random_start': 30,

'scale': 10000,

'screen_height': 84,

'screen_width': 84,

'target_q_update_step': 10000,

'train_frequency': 4}

E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 3.58G (3844833280 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

[*] Loading checkpoints...

[!] Load FAILED: checkpoints/Breakout-v0/min_delta--1/max_delta-1/history_length-4/train_frequency-4/target_q_update_step-10000/memory_size-1000000/action_repeat-4/ep_end_t-1000000/min_reward--1.0/backend-tf/random_start-30/scale-10000/env_type-simple/learning_rate_decay_step-50000/ep_start-1.0/screen_width-84/learn_start-50000.0/cnn_format-NCHW/learning_rate-0.00025/batch_size-32/discount-0.99/max_step-50000000/max_reward-1.0/learning_rate_decay-0.96/learning_rate_minimum-0.00025/env_name-Breakout-v0/ep_end-0.1/model-m2/screen_height-84/

0%| | 49970/50000000 [01:06<18:20:10, 756.70it/s]F tensorflow/stream_executor/cuda/cuda_dnn.cc:204] could not find cudnnCreate in cudnn DSO; dlerror: /usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: cudnnCreate

Aborted

A search turned up the following answer: http://stackoverflow.com/questions/35702403/tensorflow-0-7-1-with-cuda-toolkit-7-5-and-cudnn-7-0

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"

export CUDA_HOME=/usr/local/cuda

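Alternatively, copy the cuDNN libraries to /usr/local/cuda/lib64. I did both at the same time, so I don't know which one took effect. Running the program again did produce a different error, though: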

E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 3.58G (3844702208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

0%| | 49983/50000000 [01:04<18:01:04, 770.06it/s]F tensorflow/stream_executor/cuda/cuda_dnn.cc:220] could not find cudnnConvolutionBackwardFilter_v2 in cudnn DSO; dlerror: /usr/local/cuda/lib64/libcudnn.so: undefined symbol: cudnnConvolutionBackwardFilter_v2

wangxiao@GTX980:~/Documents/DRL/DQN-tensorflow-master$
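Both fatal errors above are "undefined symbol" failures, so it can help to check directly which symbols a given libcudnn.so exports before starting a long training run. The following is a minimal sketch, not part of the DQN-tensorflow code: the helper names `search_dirs`, `missing_symbols`, and `check_cudnn` are made up for illustration, and the two symbol names are simply copied from the error messages above.

```python
import ctypes
import os

# Symbol names copied from the two fatal errors in the logs above.
REQUIRED_SYMBOLS = ["cudnnCreate", "cudnnConvolutionBackwardFilter_v2"]


def search_dirs(ld_library_path):
    """Split an LD_LIBRARY_PATH string into unique, non-empty directories."""
    dirs = []
    for d in ld_library_path.split(":"):
        if d and d not in dirs:
            dirs.append(d)
    return dirs


def missing_symbols(lib, names):
    """Return the names that the loaded library object does not export."""
    # ctypes raises AttributeError for an unknown symbol, so hasattr works.
    return [n for n in names if not hasattr(lib, n)]


def check_cudnn(path, names=REQUIRED_SYMBOLS):
    """Load the library at `path` (hypothetical helper).

    Returns the list of missing symbols, or None if the file cannot be loaded.
    """
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    return missing_symbols(lib, names)


if __name__ == "__main__":
    # Look for libcudnn.so in every directory on the current search path.
    for d in search_dirs(os.environ.get("LD_LIBRARY_PATH", "")):
        candidate = os.path.join(d, "libcudnn.so")
        if os.path.exists(candidate):
            print(candidate, "->", check_cudnn(candidate))
```

An empty list means every listed symbol was found; a non-empty list suggests the installed cuDNN does not match what the TensorFlow build expects. This only covers the directories in LD_LIBRARY_PATH, not the system default locations.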

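　　At this point it was clear that cuDNN was the source of all these problems.

　　So I replaced cuDNN 6.5 with cuDNN 7.0, reconfigured everything, and ran it again:

　　It is still loading checkpoints; I don't know whether it will crash at some point.

　　Well, it seems to work now.

　　So, as you can see, the main cause was the cuDNN version.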
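Since the root cause was a version mismatch, one quick check is to read the version macros from the installed cuDNN header. The sketch below assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL`; older headers may not define them, and some releases keep them in a separate header, in which case the function returns None. The function name `parse_cudnn_version` and the header path in the usage note are assumptions, not something taken from the original setup.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn.h-style text.

    Returns None if any of the three version macros is absent.
    """
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#\s*define\s+%s\s+(\d+)" % macro
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)
```

For example, `parse_cudnn_version(open("/usr/local/cuda/include/cudnn.h").read())` would report the version of the header under the CUDA_HOME used above, assuming the header was copied into its include directory.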

Question 2:

　　When running the code from https://github.com/kuz/DeepMind-Atari-Deep-Q-Learner,

　　it reported an error: no module named AleWrap.

　　Don't worry, just run the following operation:

　　As you can see, the process above can only display images one at a time. How can multiple images be shown in a single window? See the figure below:
