C++ Engineering Workflow with libtorch

PyTorch C++ Inference

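Step 1:

Convert the model to a TorchScript module

There are two ways to convert a model:

Method 1: tracing. This builds a ScriptModule that has only a forward method.

**Usage:** Load the trained weights into the model, then run one forward pass on an example input; tracing records the entire data flow of that pass.

Suitable when:

① the forward pass has no control flow (if/else, for loops)

② the resulting script module is used only for inference (the trace records only the mode, eval or train, that the model was in when it was traced)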

import torch

# Here the object being traced is a torch.nn.Module, so check that first.
assert isinstance(net, torch.nn.Module)
# Before tracing, load the trained weights into net and move net to the target device (e.g. a GPU).
# The example input must be on the same device as net.
# The trace keeps the current mode, so switch to eval mode for inference.
net.eval()
save_script_module = torch.jit.trace(net, input)

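Method 2: scripting (explicitly annotating the model)

Usage:

1. PyTorch >= 1.2: define a torch.nn.Module subclass, instantiate it, and convert the instance into a ScriptModule with torch.jit.script.

2. PyTorch <= 1.1: derive the model from torch.jit.ScriptModule instead of torch.nn.Module, and decorate the original forward method with @torch.jit.script_method.

**Suitable when:** the forward pass contains control flow.

1. PyTorch >= 1.2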

import torch

# If a method uses a Python feature that TorchScript does not support,
# mark that method with @torch.jit.ignore.
class MyModule(torch.nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    def forward(self, input):
        # Data-dependent branch: scripting keeps both paths.
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output

my_module = MyModule(10, 20)
sm = torch.jit.script(my_module)

2. PyTorch <= 1.1

import torch
class MyModule(torch.jit.ScriptModule):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    @torch.jit.script_method
    def forward(self, input):
        return self.weight.mv(input)
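
To make the difference between the two methods concrete, here is a minimal sketch that traces and scripts the same module. The `Gate` class and its two branches are hypothetical and exist only for illustration; the point is that tracing records whichever branch the example input happened to take, while scripting keeps the condition.

```python
import warnings

import torch


class Gate(torch.nn.Module):  # hypothetical toy module, not from the original post
    def forward(self, x):
        # Data-dependent control flow: the branch depends on the input values.
        if x.sum() > 0:
            return x * 2
        return x - 10


net = Gate().eval()
positive = torch.ones(3)
negative = -torch.ones(3)

with warnings.catch_warnings():
    # Tracing warns that converting a tensor to a Python bool may make the trace incorrect.
    warnings.simplefilter("ignore")
    traced = torch.jit.trace(net, positive)  # records only the "x * 2" branch
scripted = torch.jit.script(net)             # compiles both branches

traced_out = traced(negative)      # still computes x * 2
scripted_out = scripted(negative)  # evaluates the condition and computes x - 10
print(traced_out, scripted_out)
```

For the negative input the traced module silently disagrees with the original Python module, which is why tracing is only recommended when forward has no control flow.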

Tip: if the forward pass uses Python features that TorchScript does not support, consider wrapping the affected layers on their own and recording them with trace; the other layers need no changes. Example:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MyScriptModule(torch.jit.ScriptModule):
    def __init__(self):
        super(MyScriptModule, self).__init__()
        # torch.jit.trace turns conv1 and conv2 into ScriptModules
        self.conv1 = torch.jit.trace(nn.Conv2d(1, 20, 5), torch.rand(1, 1, 16, 16))
        self.conv2 = torch.jit.trace(nn.Conv2d(20, 20, 5), torch.rand(1, 20, 16, 16))

    @torch.jit.script_method
    def forward(self, input):
        input = F.relu(self.conv1(input))
        input = F.relu(self.conv2(input))
        return input
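
The example above uses the older torch.jit.ScriptModule base class. With PyTorch >= 1.2 the same mix of tracing and scripting can be sketched, under the assumption that a plain torch.nn.Module is used and scripted afterwards, roughly as follows. The class name `MixedModule` is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixedModule(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        # The two conv layers are traced; everything else is scripted.
        self.conv1 = torch.jit.trace(nn.Conv2d(1, 20, 5), torch.rand(1, 1, 16, 16))
        # 12x12 is the spatial size conv1 produces for a 16x16 input.
        self.conv2 = torch.jit.trace(nn.Conv2d(20, 20, 5), torch.rand(1, 20, 12, 12))

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        return x


# Scripting the outer module keeps the traced submodules as they are.
mixed = torch.jit.script(MixedModule())
out = mixed(torch.rand(1, 1, 16, 16))
print(out.shape)
```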

Step 2:

Save the TorchScript module

# Save the script module
torch.jit.save(save_script_module,'path/scriptmodulename.pt')
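
Before moving to C++, it can be worth checking in Python that the saved file loads and reproduces the original outputs. The sketch below assumes a small torch.nn.Linear as a stand-in for the trained model and writes to a temporary directory; the file name simply mirrors the one above.

```python
import os
import tempfile

import torch

net = torch.nn.Linear(4, 2).eval()  # stand-in for the trained model (assumption)
example = torch.rand(1, 4)
save_script_module = torch.jit.trace(net, example)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scriptmodulename.pt")
    torch.jit.save(save_script_module, path)
    # map_location chooses the device the saved tensors are restored onto.
    reloaded = torch.jit.load(path, map_location="cpu")

with torch.no_grad():
    same = torch.allclose(reloaded(example), net(example))
print(same)
```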

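Step 3:

Load the .pt script module in a C++ project

Tips:

1. Pay attention to how the module is loaded. When the model was saved, the forward pass ran on a particular GPU, so the saved module carries that default location. If you load it directly without specifying a GPU device, you get the error "CUDA error: invalid device ordinal".

2. If the Python inference code uses torch.no_grad to reduce GPU memory use, add the equivalent to the C++ code. Otherwise C++ inference will use more GPU memory than Python inference does.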
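
What tip 2 refers to can be observed from Python. Without torch.no_grad, autograd records the operations of the forward pass so that a backward pass remains possible, and that bookkeeping is where the extra memory goes. The sketch below uses a scripted torch.nn.Linear purely as a stand-in model (an assumption for illustration).

```python
import torch

module = torch.jit.script(torch.nn.Linear(4, 2)).eval()  # stand-in model (assumption)
x = torch.rand(1, 4)

tracked = module(x)        # autograd records the forward pass
with torch.no_grad():
    untracked = module(x)  # no autograd graph is kept

print(tracked.requires_grad, untracked.requires_grad)
```

Note that eval() alone does not disable gradient tracking; it only changes the behavior of layers such as dropout and batch norm.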

Example of loading the model:

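#include <torch/script.h> // One-stop header.
#include <torch/torch.h>
#include <iostream>
#include <memory>
#include <string>

int main(int argc, const char* argv[]) {
  // Expect two arguments: the module path and the GPU id.
  if (argc != 3) {
    std::cerr << "usage: example-app <path-to-exported-script-module> <gpu-id>\n";
    return -1;
  }
  const int gpu_id = std::stoi(argv[2]);
  // Use the requested GPU when CUDA is available, otherwise fall back to the CPU.
  const torch::Device device = torch::cuda::is_available()
      ? torch::Device(torch::kCUDA, static_cast<c10::DeviceIndex>(gpu_id))
      : torch::Device(torch::kCPU);
  torch::jit::script::Module module;
  // Disable gradient tracking to save GPU memory (the C++ counterpart of torch.no_grad).
  torch::NoGradGuard no_grad;
  try {
    // Deserialize the ScriptModule from a file using torch::jit::load(),
    // placing its tensors on the chosen device.
    module = torch::jit::load(argv[1], device);

    // Put layers such as dropout and batch norm into inference mode.
    module.eval();
  }
  catch (const c10::Error& e) {
    std::cerr << "error loading the model\n";
    return -1;
  }

  std::cout << "ok\n";
}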

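Step 4: Download libtorch and build with CMake

Tip: the C++ libtorch version should be greater than or equal to the Python torch version used to export the model; otherwise errors will occur.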
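
The version rule in the tip can be turned into a small pre-flight check. This is only a sketch: the helper names `parse_version` and `libtorch_can_load` are hypothetical, and both version strings are assumed to be supplied by you (the Python side is available as torch.__version__).

```python
import re


def parse_version(version):
    """Return (major, minor, patch) from strings such as '1.13.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def libtorch_can_load(libtorch_version, torch_version):
    """True when libtorch is at least as new as the exporting torch."""
    return parse_version(libtorch_version) >= parse_version(torch_version)


print(libtorch_can_load("1.5.0", "1.4.0+cu100"))
```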

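Reference: https://pytorch.org/tutorials/advanced/cpp_export.html

Step 5: Run forward inference in the C++ code

Tip: the input in this step must also be sent to the device the module is on; otherwise an error reports that the data is a CPU variable while the module is a CUDA variable.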

// Create a vector of inputs.
at::Tensor input_tensor=torch::ones({1, 3, 224, 224});
input_tensor=input_tensor.to(device);
std::vector<torch::jit::IValue> inputs;
inputs.push_back(input_tensor);

// Execute the model and turn its output into a tensor.
at::Tensor output = module.forward(inputs).toTensor();
std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';
