Implementing C++ Extensions in PyTorch

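Today's topic is extending PyTorch with C++.

Before starting, it helps to recall how a custom module is usually defined in PyTorch. The most common approach is to subclass torch.nn.Module in Python and assemble the module from operators that PyTorch already provides. This is simple to implement, but the result is not always the most efficient, and if the functionality is complex enough, the existing PyTorch functions may not cover it at all. In those cases, extending PyTorch with C, C++, or CUDA is the better choice.

Most deep learning frameworks in use today (TensorFlow, PyTorch, and others) have back ends written in C or C++, so they generally expose C or C++ extension interfaces. PyTorch was built on top of Torch, whose low-level code is written in C, so PyTorch has been compatible with C from the start and extending it in C is not hard. With the release of PyTorch 1.0, the developers began moving PyTorch's low-level code toward Caffe2, and as part of that they have been gradually refactoring ATen, the C++ library that PyTorch currently uses for extensions. Overall, C++ is where things are heading. As for CUDA, nearly every deep learning framework has relied on it from the beginning, so a CUDA extension interface is standard.

This article walks through the steps of writing a C++ extension using a simple example. It does not go deep into implementation details.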

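C, C++, and CUDA extensions for PyTorch

For C extensions, see the official tutorial or the blog post listed in the references. The procedure is not difficult: it relies on the C interfaces that the original Torch provides, together with PyTorch's torch.utils.ffi module. Note that this approach depends on the PyTorch version: torch.utils.ffi is deprecated as of PyTorch 1.0, so it may not work in newer releases.

This article focuses on C++ extensions (CUDA may be added later).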

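C++ extensions

First, the basic workflow. Extending PyTorch with C++/CUDA takes the following steps:

Install pybind11 (via pip, conda, or similar). It handles the bindings between Python and C++.

Write the custom layer in C++, including both the forward pass (forward) and the backward pass (backward).

Write a setup.py that uses Python's setuptools to compile and load the C++ code.

Build and install the extension, then call it from Python.

The rest of this article demonstrates these steps with a simple example, z = 2x + y.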
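Before moving to C++, it can help to pin down the math of the example in plain Python. The sketch below is not part of the extension and uses no PyTorch at all; forward, backward, and numeric_grad_x are hypothetical helper names chosen for illustration. It computes z = 2x + y on Python lists, returns the two gradients the chain rule predicts, and compares one of them against a finite-difference estimate.

```python
def forward(x, y):
    """Compute z = 2x + y element-wise on two equal-length lists."""
    if len(x) != len(y):
        raise ValueError("x must be the same size as y")
    return [2.0 * a + b for a, b in zip(x, y)]


def backward(grad_output):
    """Chain rule: dz/dx = 2 and dz/dy = 1, each scaled by the incoming gradient."""
    grad_x = [2.0 * g for g in grad_output]
    grad_y = [1.0 * g for g in grad_output]
    return grad_x, grad_y


def numeric_grad_x(x, y, i, eps=1e-6):
    """Central finite difference of sum(forward(x, y)) with respect to x[i]."""
    hi = list(x)
    hi[i] += eps
    lo = list(x)
    lo[i] -= eps
    return (sum(forward(hi, y)) - sum(forward(lo, y))) / (2 * eps)


x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
z = forward(x, y)
# The gradient of sum(z) with respect to z is all ones.
grad_x, grad_y = backward([1.0, 1.0, 1.0])
print(z)
print(grad_x, grad_y)
print(numeric_grad_x(x, y, 0))
```

The C++ functions in the following steps are meant to compute the same quantities on tensors, so a reference like this gives something to compare against once the extension is built.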

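Step 1

Installing pybind11 is straightforward, so it is skipped here. Start with the C++ files.

Header file test.h:

#include <torch/extension.h>
#include <vector>

// Forward pass
torch::Tensor Test_forward_cpu(const torch::Tensor& inputA,
                               const torch::Tensor& inputB);

// Backward pass
std::vector<torch::Tensor> Test_backward_cpu(const torch::Tensor& gradOutput);

Note that the header included here, torch/extension.h, is essential. It pulls in three things:

pybind11, for interaction between C++ and Python;

ATen, which contains Tensor and other important functions and classes;

Some helper headers that implement the interaction between ATen and pybind11.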

Source file test.cpp:

#include "test.h"

// Forward pass: computes z = 2x + y. The focus here is the extension workflow,
// so the implementation is kept trivial.
torch::Tensor Test_forward_cpu(const torch::Tensor& x,
                               const torch::Tensor& y) {
  // AT_ASSERTM is the assertion macro of the PyTorch 1.0 era;
  // newer releases provide TORCH_CHECK for the same purpose.
  AT_ASSERTM(x.sizes() == y.sizes(), "x must be the same size as y");
  torch::Tensor z = 2 * x + y;
  return z;
}

// Backward pass.
// In this example, dz/dx = 2 and dz/dy = 1.
// Why the backward interface (arguments and return value) looks like this
// is explained later.
std::vector<torch::Tensor> Test_backward_cpu(const torch::Tensor& gradOutput) {
  torch::Tensor gradOutputX = 2 * gradOutput * torch::ones(gradOutput.sizes());
  torch::Tensor gradOutputY = gradOutput * torch::ones(gradOutput.sizes());
  return {gradOutputX, gradOutputY};
}

// pybind11 bindings
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("forward", &Test_forward_cpu, "TEST forward");
  m.def("backward", &Test_backward_cpu, "TEST backward");
}

Step 2

Create setup.py, the configuration file for building and installing. The files are laid out as follows:

└── csrc
    ├── cpu
    │   ├── test.cpp
    │   └── test.h
    └── setup.py

The contents of setup.py:

from setuptools import setup
import os
import glob
from torch.utils.cpp_extension import BuildExtension, CppExtension

# Header directory (the directory that contains setup.py)
include_dirs = os.path.dirname(os.path.abspath(__file__))

# C++ source files
source_cpu = glob.glob(os.path.join(include_dirs, 'cpu', '*.cpp'))

setup(
    name='test_cpp',  # module name, used to import it from Python
    version="0.1",
    ext_modules=[
        CppExtension('test_cpp', sources=source_cpu, include_dirs=[include_dirs]),
    ],
    cmdclass={
        'build_ext': BuildExtension
    }
)

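Note that the C++ extension is named test_cpp, which means its C++ functions can be called from Python through the test_cpp module.

Step 3

In the csrc directory, where setup.py lives, run the following command to build and install the C++ code:

python setup.py install

The build prints a long log, and the C++ module is installed into Python's site-packages.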
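One quick way to confirm that the install step worked is to ask Python whether it can locate the module. The sketch below uses only importlib from the standard library; is_importable is a hypothetical helper name, and test_cpp is the extension name from setup.py. It only checks that the module can be found and does not load it.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# 'test_cpp' is the extension name used in setup.py.
if is_importable("test_cpp"):
    print("test_cpp is installed")
else:
    print("test_cpp not found; re-run the build and check the install log")
```

When actually importing the extension, the official C++ extension tutorial notes that torch should be imported first, so that the shared library can resolve the symbols it depends on.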

Once these steps are done, the C++ code can be called from Python. By convention in PyTorch, the C++ forward and backward passes are first wrapped into a function op (the following code goes in test.py):

from torch.autograd import Function
import test_cpp

class TestFunction(Function):

    @staticmethod
    def forward(ctx, x, y):
        return test_cpp.forward(x, y)

    @staticmethod
    def backward(ctx, gradOutput):
        gradX, gradY = test_cpp.backward(gradOutput)
        return gradX, gradY

This effectively embeds the C++ extension functions into PyTorch's own framework.

The source of the Function class turns out to be quite interesting:

class Function(with_metaclass(FunctionMeta, _C._FunctionBase, _ContextMethodMixin, _HookMixin)):
    ...

    @staticmethod
    def forward(ctx, *args, **kwargs):
        r"""Performs the operation.

        This function is to be overridden by all subclasses.

        It must accept a context ctx as the first argument, followed by any
        number of arguments (tensors or other types).

        The context can be used to store tensors that can be then retrieved
        during the backward pass.
        """
        raise NotImplementedError

    @staticmethod
    def backward(ctx, *grad_outputs):
        r"""Defines a formula for differentiating the operation.

        This function is to be overridden by all subclasses.

        It must accept a context :attr:`ctx` as the first argument, followed by
        as many outputs did :func:`forward` return, and it should return as many
        tensors, as there were inputs to :func:`forward`. Each argument is the
        gradient w.r.t the given output, and each returned value should be the
        gradient w.r.t. the corresponding input.

        The context can be used to retrieve tensors saved during the forward
        pass. It also has an attribute :attr:`ctx.needs_input_grad` as a tuple
        of booleans representing whether each input needs gradient. E.g.,
        :func:`backward` will have ``ctx.needs_input_grad[0] = True`` if the
        first input to :func:`forward` needs gradient computated w.r.t. the
        output.
        """
        raise NotImplementedError

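Note the rules for implementing backward. The interface takes two kinds of arguments. ctx is a helper context object, and grad_outputs is the list of gradients arriving from the layer that consumes this op's output. The number of gradients in that list equals the number of values forward returned, which matches the chain rule: the gradients coming from downstream must be multiplied with, or summed into, the local derivatives of the current layer. backward must in turn return one gradient for each input of forward, so if forward takes n inputs, backward returns n gradients in the same order. In the example above, backward therefore receives a single gradient (forward has one output) and returns two (forward takes two inputs from the previous layer).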
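The counting rule is easier to see with two ops chained by hand. The following plain-Python sketch works on scalars and does not use PyTorch; Affine and SumAndDiff are hypothetical classes invented for this illustration, and they only mimic the forward/backward contract described above. Each backward takes one gradient per forward output and returns one gradient per forward input.

```python
class Affine:
    """z = 2x + y: two inputs, one output."""

    @staticmethod
    def forward(x, y):
        return 2.0 * x + y

    @staticmethod
    def backward(grad_z):
        # One incoming gradient (one output), two returned gradients (two inputs).
        return 2.0 * grad_z, 1.0 * grad_z


class SumAndDiff:
    """(s, d) = (a + b, a - b): two inputs, two outputs."""

    @staticmethod
    def forward(a, b):
        return a + b, a - b

    @staticmethod
    def backward(grad_s, grad_d):
        # Two incoming gradients (two outputs), two returned gradients (two inputs).
        return grad_s + grad_d, grad_s - grad_d


# Forward: loss = s + 3 * d, where (s, d) = SumAndDiff(z, y) and z = Affine(x, y).
x, y = 1.0, 4.0
z = Affine.forward(x, y)
s, d = SumAndDiff.forward(z, y)
loss = s + 3.0 * d

# Backward: walk the ops in reverse, feeding each op the gradients of its outputs.
grad_z, grad_y_direct = SumAndDiff.backward(1.0, 3.0)
grad_x, grad_y_via_z = Affine.backward(grad_z)
grad_y = grad_y_direct + grad_y_via_z  # y feeds two ops, so its gradients add
print(loss, grad_x, grad_y)
```

Expanding the expression by hand gives loss = 4z - 2y = 8x + 2y, so the gradients should be 8 with respect to x and 2 with respect to y, which is what the reverse walk produces. In real PyTorch code, autograd performs this walk and the accumulation automatically; the sketch only makes the bookkeeping visible.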

With the Function defined, the custom op can be used inside a Module:

import torch

class Test(torch.nn.Module):

    def __init__(self):
        super(Test, self).__init__()

    def forward(self, inputA, inputB):
        return TestFunction.apply(inputA, inputB)

The directory now looks like this:

├── csrc
│   ├── cpu
│   │   ├── test.cpp
│   │   └── test.h
│   └── setup.py
└── test.py

From here on, test.py can be used like any other PyTorch module.

Testing

Now test the forward and backward passes:

import torch
from test import Test

# Variable is no longer needed since PyTorch 0.4; tensors carry requires_grad directly.
x = torch.tensor([1., 2., 3.], requires_grad=True)
y = torch.tensor([4., 5., 6.], requires_grad=True)

test = Test()
z = test(x, y)
z.sum().backward()

print('x: ', x)
print('y: ', y)
print('z: ', z)
print('x.grad: ', x.grad)
print('y.grad: ', y.grad)

The output:

x:  tensor([1., 2., 3.], requires_grad=True)
y:  tensor([4., 5., 6.], requires_grad=True)
z:  tensor([ 6.,  9., 12.], grad_fn=<TestFunctionBackward>)
x.grad:  tensor([2., 2., 2.])
y.grad:  tensor([1., 1., 1.])

The forward pass satisfies z = 2x + y, and the backward pass gives the expected gradients.

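CUDA extensions

Code written in C++ against ATen can run directly on the GPU, but its performance still falls short of code written directly in CUDA, since ATen has no way of knowing how to optimize a particular algorithm. I still know next to nothing about CUDA, though, so this part has to be skipped for now and filled in later.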

References

Custom C Extensions for PyTorch

Custom C++ and CUDA Extensions

Advanced PyTorch Extensions (1): Combining PyTorch with C and CUDA

Advanced PyTorch Extensions (2): Combining PyTorch with C++ and CUDA Extensions


Article title: Implementing C++ Extensions in PyTorch

Article URL: http://www.cppcns.com/jiaoben/python/305476.html
