
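How to upgrade cuDNN and fix a cuDNN version mismatch

1 Check the cuDNN version
2 Download cuDNN
3 Remove the old version
4 Install the new version
5 Create the symbolic links
6 Test and verify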

Running the code produced the following error:

  32/1109 [..............................] - ETA: 12:41 - loss: 3.4072 - accuracy: 0.0000e+00
2020-09-24 02:47:25.341531: E tensorflow/stream_executor/cuda/cuda_dnn.cc:319] Loaded runtime CuDNN library: 7.5.1 but source was compiled with: 7.6.4.  CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible w
  96/1109 [=>............................] - ETA: 12:31 - loss: 3.3774 - accuracy: 0.0312
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/rasa/__main__.py", line 103, in <module>
    main()
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/rasa/__main__.py", line 92, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/rasa/cli/train.py", line 76, in train
    additional_arguments=extract_additional_arguments(args),
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/rasa/train.py", line 50, in train
    additional_arguments=additional_arguments,
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/rasa/train.py", line 101, in train_async
    additional_arguments,
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/rasa/train.py", line 188, in _train_async_internal
    additional_arguments=additional_arguments,
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/rasa/train.py", line 223, in _do_training
    additional_arguments=additional_arguments,
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/rasa/train.py", line 361, in _train_core_with_validated_data
    additional_arguments=additional_arguments,
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/rasa/core/train.py", line 66, in train
    agent.train(training_data, **additional_arguments)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/rasa/core/agent.py", line 742, in train
    self.policy_ensemble.train(training_trackers, self.domain, **kwargs)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/rasa/core/policies/ensemble.py", line 124, in train
    policy.train(training_trackers, domain, **kwargs)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/rasa/core/policies/keras_policy.py", line 197, in train
    **self._train_params,
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 599, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 545, in call
    ctx=ctx)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError:  [_Derived_]  Fail to find the dnn implementation.
     [[{{node cond_29/then/_0/CudnnRNNV3}}]]
     [[sequential/lstm/StatefulPartitionedCall]] [Op:__inference_distributed_function_6721]

Function call stack:
distributed_function -> distributed_function -> distributed_function

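The error occurs because the installed cuDNN does not match the version TensorFlow was built against: TensorFlow expects cuDNN 7.6.4, but the version I had installed was 7.5.1, so cuDNN has to be upgraded. The upgrade is straightforward, does not disturb the rest of the installed environment, and TensorFlow keeps working afterwards.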
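
The rule quoted in the error message can be written as a small check. This is only an illustrative sketch: the function name cudnn_compatible is made up here, and it encodes nothing more than what the log states for cuDNN 7.0 and later (same major version, runtime minor version equal or higher).

```shell
# cudnn_compatible LOADED COMPILED
# Returns 0 when the loaded (runtime) cuDNN version satisfies the rule from
# the error message: same major version and an equal or higher minor version.
# Hypothetical helper for illustration; the patch level is not compared.
cudnn_compatible() {
    loaded_major=${1%%.*}          # "7.5.1" -> "7"
    loaded_rest=${1#*.}            # "7.5.1" -> "5.1"
    loaded_minor=${loaded_rest%%.*}
    compiled_major=${2%%.*}
    compiled_rest=${2#*.}
    compiled_minor=${compiled_rest%%.*}
    [ "$loaded_major" -eq "$compiled_major" ] &&
        [ "$loaded_minor" -ge "$compiled_minor" ]
}

# The two versions from the log above.
if cudnn_compatible 7.5.1 7.6.4; then
    echo "compatible"
else
    echo "upgrade needed"
fi
```

With the versions from the log, the minor version 5 is lower than 6, so the check takes the "upgrade needed" branch.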

1 Check the cuDNN version

First, check the currently installed cuDNN version with the following command:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
           

The output is:

#define CUDNN_MAJOR 7
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 1
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
           

This shows that version 7.5.1 is installed.
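
To get the version as a single string instead of reading the #define lines by eye, something like the following could be used. It is a sketch, not part of the original steps: the function name cudnn_header_version is invented here, and the header path is the one used above. In cuDNN 8 and later the version macros live in cudnn_version.h, so the path would need adjusting there.

```shell
# cudnn_header_version HEADER
# Prints MAJOR.MINOR.PATCH assembled from the #define lines of a cuDNN header.
# Hypothetical helper; exits non-zero if CUDNN_MAJOR is not defined in HEADER.
cudnn_header_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "") exit 1
            print major "." minor "." patch
        }
    ' "$1"
}

header=/usr/local/cuda/include/cudnn.h
if [ -r "$header" ]; then
    cudnn_header_version "$header" || echo "no version macros in $header" >&2
else
    echo "header not found: $header" >&2
fi
```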

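2 Download cuDNN

Download the build that matches your CUDA version and operating system from the official website. The tgz archive is recommended; do not download the Deb package.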


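Downloading from the official site requires a registered account. This version has been uploaded to Baidu Netdisk, so anyone who needs the same version can download it from there and skip the registration.

File: cudnn-10.1-linux-x64-v7.6.4.38.solitairetheme8

Link: https://pan.baidu.com/s/1ivxmaE_YUIaIaNsTTqskDQ

Extraction code: xcp1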

After downloading, extract the archive as follows:

# Rename to .tgz
mv cudnn-10.1-linux-x64-v7.6.4.38.solitairetheme8 cudnn-10.1-linux-x64-v7.6.4.38.tgz
# Extract
tar -zxvf cudnn-10.1-linux-x64-v7.6.4.38.tgz
           

Extraction produces a folder named cuda, which contains two subfolders: include and lib64.
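
Before touching the installed copy, it may be worth confirming that the extracted folder really has this layout. The check below is a sketch under that assumption; the function name check_cudnn_layout is hypothetical, and it looks only for include/cudnn.h and at least one lib64/libcudnn.so* file.

```shell
# check_cudnn_layout DIR
# Returns 0 if DIR contains include/cudnn.h and at least one
# lib64/libcudnn.so* file. Hypothetical helper for illustration.
check_cudnn_layout() {
    if [ ! -f "$1/include/cudnn.h" ]; then
        echo "missing $1/include/cudnn.h" >&2
        return 1
    fi
    for lib in "$1"/lib64/libcudnn.so*; do
        # If nothing matches, the pattern stays unexpanded and -e fails.
        [ -e "$lib" ] && return 0
    done
    echo "no libcudnn.so* found in $1/lib64" >&2
    return 1
}

# Run from the directory where the archive was extracted.
if check_cudnn_layout ./cuda; then
    echo "archive layout looks complete"
fi
```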

3 Remove the old version

sudo rm -rf /usr/local/cuda/include/cudnn.h
sudo rm -rf /usr/local/cuda/lib64/libcudnn*
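
The removal commands are not reversible, so one option is to copy the old files aside before running them. The following is a minimal sketch, assuming GNU cp on Linux; the function name backup_cudnn and the backup location are made up for illustration.

```shell
# backup_cudnn CUDA_DIR BACKUP_DIR
# Copies cudnn.h and every libcudnn* file from CUDA_DIR into BACKUP_DIR,
# keeping the include/ and lib64/ layout. Hypothetical helper.
# cp -P keeps symbolic links as links; -p preserves mode and timestamps.
backup_cudnn() {
    cuda_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir/include" "$backup_dir/lib64" || return 1
    if [ -f "$cuda_dir/include/cudnn.h" ]; then
        cp -p "$cuda_dir/include/cudnn.h" "$backup_dir/include/" || return 1
    fi
    for lib in "$cuda_dir"/lib64/libcudnn*; do
        # Skip the unexpanded pattern when no library is present.
        [ -e "$lib" ] || [ -L "$lib" ] || continue
        cp -pP "$lib" "$backup_dir/lib64/" || return 1
    done
}

# Example call (the backup path is an arbitrary choice):
#   backup_cudnn /usr/local/cuda "$HOME/cudnn-7.5.1-backup"
```

Restoring would then be a matter of copying the saved files back into /usr/local/cuda with sudo.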
           

4 Install the new version

cd into the cuda folder that was just extracted:

sudo cp include/cudnn.h /usr/local/cuda/include/
sudo cp lib64/lib* /usr/local/cuda/lib64/
           

5 Create the symbolic links

cd /usr/local/cuda/lib64/
sudo chmod +r libcudnn.so.7.6.4
sudo ln -sf libcudnn.so.7.6.4 libcudnn.so.7
sudo ln -sf libcudnn.so.7 libcudnn.so   
sudo ldconfig
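
The same link chain can be wrapped in a function that takes the version as a parameter, which avoids retyping 7.6.4 for a different release. This is a sketch only; the name link_cudnn is hypothetical, and on the real /usr/local/cuda/lib64 it would have to run with root privileges and be followed by sudo ldconfig as above.

```shell
# link_cudnn LIB_DIR VERSION
# Recreates libcudnn.so -> libcudnn.so.MAJOR -> libcudnn.so.VERSION inside
# LIB_DIR, using relative link targets as in the commands above.
# Hypothetical helper for illustration.
link_cudnn() {
    lib_dir=$1
    version=$2
    major=${version%%.*}           # "7.6.4" -> "7"
    if [ ! -f "$lib_dir/libcudnn.so.$version" ]; then
        echo "libcudnn.so.$version not found in $lib_dir" >&2
        return 1
    fi
    # Subshell so the caller's working directory is left unchanged.
    (
        cd "$lib_dir" &&
        ln -sf "libcudnn.so.$version" "libcudnn.so.$major" &&
        ln -sf "libcudnn.so.$major" libcudnn.so
    )
}

# Example call from a root shell:
#   link_cudnn /usr/local/cuda/lib64 7.6.4
```

Running readlink on libcudnn.so and libcudnn.so.7 afterwards shows whether each link points at the expected target.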
           

6 Test and verify

Run the command from step 1 again. The output is now:

#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 4
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
           

The upgrade succeeded, and the code now runs without the error.
