To deploy a face recognition model trained in PyTorch on a Raspberry Pi, we use ncnn as the forward inference framework to speed up model inference.
pytorch -> onnx
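PyTorch 1.0 and later ships with ONNX export support, so the conversion is convenient: torch.onnx.export writes the .onnx file directly. To make sure the PyTorch and ONNX outputs agree, we feed the same input to both the PyTorch model and the ONNX model and compare their outputs. The program is as follows.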
import torch
import onnx
import onnxruntime
from model1 import MobileFaceNet
model = MobileFaceNet(512)
device = torch.device("cpu")
dummy_input = torch.randn(1, 3, 112, 112).to(device)
state_dict = torch.load('./model_mobilefacenet.pth', map_location=device)
model.load_state_dict(state_dict)
model.eval()
out = model(dummy_input)
print(out[0][:10])
torch.onnx.export(model,                      # model being run
                  dummy_input,                # model input (or a tuple for multiple inputs)
                  "my_mobileface.onnx",       # where to save the model (can be a file or file-like object)
                  export_params=True,         # store the trained parameter weights inside the model file
                  do_constant_folding=True,   # whether to execute constant folding for optimization
                  input_names=['input'],      # the model's input names
                  output_names=['output'],    # the model's output names
                  )
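onnx_model = onnx.load('./my_mobileface.onnx')  # load the ONNX model
onnx.checker.check_model(onnx_model)  # check that the exported graph is well-formed
session = onnxruntime.InferenceSession("./my_mobileface.onnx", None)
input_name = session.get_inputs()[0].name
# session.run returns a list with one array per model output
orig_result = session.run(None, {input_name: dummy_input.data.numpy()})
print(orig_result[0][0][:10])  # first 10 values, to compare with the PyTorch print above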
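Printing the first ten values only gives a visual check. Below is a minimal sketch of a numeric comparison that uses only the standard library. The helper name `compare_outputs` and the tolerance `atol=1e-4` are assumptions chosen for illustration and are not part of the original script. The toy vectors stand in for the real embeddings.

```python
import math

def compare_outputs(a, b, atol=1e-4):
    """Compare two equal-length float sequences.

    Returns (max_abs_diff, cosine_similarity, within_tolerance).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("outputs must be non-empty and have the same length")
    max_abs_diff = max(abs(x - y) for x, y in zip(a, b))
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Cosine similarity is undefined for a zero vector
    cosine = dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else float("nan")
    return max_abs_diff, cosine, max_abs_diff <= atol

# Toy vectors standing in for the PyTorch and ONNX Runtime embeddings
torch_vec = [0.10, -0.20, 0.30]
onnx_vec = [0.10001, -0.20001, 0.29999]
diff, cos, ok = compare_outputs(torch_vec, onnx_vec)
print(f"max abs diff = {diff:.2e}, cosine = {cos:.6f}, match = {ok}")
```

With the script above, the two arguments could be `out[0].detach().tolist()` and `orig_result[0][0].tolist()`. The same helper can be reused later to compare the values printed by the ncnn program against the PyTorch output.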
onnx -> ncnn
To install ncnn, follow the official instructions at https://github.com/Tencent/ncnn.
On Debian, Ubuntu or Raspberry Pi OS, you can install all required dependencies using:
sudo apt install build-essential git cmake libprotobuf-dev protobuf-compiler libvulkan-dev vulkan-utils libopencv-dev
Then git clone ncnn. Because we do not use the GPU, set -DNCNN_VULKAN=OFF:
$ cd ncnn
$ mkdir -p build
$ cd build
build$ cmake -DCMAKE_BUILD_TYPE=Release -DNCNN_VULKAN=OFF -DNCNN_SYSTEM_GLSLANG=ON -DNCNN_BUILD_EXAMPLES=ON ..
build$ make -j$(nproc)
Verify build by running some examples:
build$ cd ../examples
examples$ ../build/examples/squeezenet ../images/256-ncnn.png
[0 AMD RADV FIJI (LLVM 10.0.1)] queueC=1[4] queueG=0[1] queueT=0[1]
[0 AMD RADV FIJI (LLVM 10.0.1)] bugsbn1=0 buglbia=0 bugcopc=0 bugihfa=0
[0 AMD RADV FIJI (LLVM 10.0.1)] fp16p=1 fp16s=1 fp16a=0 int8s=1 int8a=1
532 = 0.163452
920 = 0.093140
716 = 0.061584
example$
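If the terminal prints output like this, the installation succeeded. The lines starting with [0 AMD RADV FIJI ...] are GPU information from a Vulkan-enabled build and should not appear when building with -DNCNN_VULKAN=OFF; the three "class index = score" lines are what matters.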
Before converting the ONNX model to an ncnn model, we need to simplify the ONNX model so that the conversion does not fail.
First, install onnx-simplifier:
pip install onnx-simplifier
Then simplify the ONNX model:
python3 -m onnxsim my_mobileface.onnx my_mobileface-sim.onnx
To convert ONNX to ncnn, use the onnx2ncnn tool built under ncnn/build/tools (typically ncnn/build/tools/onnx/onnx2ncnn):
./onnx2ncnn my_mobileface-sim.onnx my_mobileface.param my_mobileface.bin
The generated .bin and .param files are the ncnn model files we need on the Raspberry Pi.
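Before moving to C++, it can help to confirm which blob names the converted model expects. The sketch below reads the text .param format as I understand it: a first line with the magic number 7767517, a second line with the layer and blob counts, then one layer per line as "type name input_count output_count input_blobs... output_blobs... params...". The function name `summarize_param` and the three-layer sample text are made up for illustration; a real MobileFaceNet file is much longer.

```python
def summarize_param(text):
    """Summarize a text ncnn .param file: counts plus input and output blob names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2 or lines[0].strip() != "7767517":
        raise ValueError("not a text ncnn .param file (magic number mismatch)")
    layer_count, blob_count = (int(v) for v in lines[1].split())
    inputs, produced, consumed = [], [], set()
    for ln in lines[2:]:
        tok = ln.split()
        layer_type, n_in, n_out = tok[0], int(tok[2]), int(tok[3])
        in_blobs = tok[4:4 + n_in]
        out_blobs = tok[4 + n_in:4 + n_in + n_out]
        consumed.update(in_blobs)
        produced.extend(out_blobs)
        if layer_type == "Input":
            inputs.extend(out_blobs)
    # Blobs that no layer consumes are treated as the network outputs
    outputs = [b for b in produced if b not in consumed]
    return {"layers": layer_count, "blobs": blob_count,
            "inputs": inputs, "outputs": outputs}

# Hypothetical miniature .param content, for illustration only
sample = """7767517
3 3
Input            input   0 1 input
Convolution      conv1   1 1 input conv1_out 0=8 1=3
InnerProduct     fc      1 1 conv1_out output 0=512
"""
summary = summarize_param(sample)
print(summary)
```

For the real model, the text could be read with `open("my_mobileface.param").read()`. The names reported under "inputs" and "outputs" are the ones to pass to `extractor.input` and `extractor.extract` in the C++ program.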
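Finally, run the ncnn model in C++ and print the output. Note that the ncnn input must match the PyTorch input (size, channel order, normalization); otherwise the ncnn inference results will be badly affected.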
#include <iostream>
#include <fstream>
#include <stdio.h>
#include <algorithm>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include "opencv2/imgproc/imgproc.hpp"
#include "net.h"
using namespace std;
// This function is provided by the official ncnn docs for printing an output tensor
void pretty_print(const ncnn::Mat& m)
{
    for (int q=0; q<m.c; q++)
    {
        const float* ptr = m.channel(q);
        for (int y=0; y<m.h; y++)
        {
            for (int x=0; x<m.w; x++)
            {
                printf("%f ", ptr[x]);
            }
            ptr += m.w;
            printf("\n");
        }
        printf("------------------------\n");
    }
}
// main function template
int main(){
    string img_path = "xxx.jpg";
    cv::Mat img = cv::imread(img_path, cv::IMREAD_COLOR);
    if (img.empty())
    {
        fprintf(stderr, "failed to read %s\n", img_path.c_str());
        return -1;
    }
    cv::Mat img2;
    int input_width = 112; // input size used when exporting to ONNX (dummy_input is 1x3x112x112)
    int input_height = 112;
    // resize
    cv::resize(img, img2, cv::Size(input_width, input_height));
    // load the converted ncnn model
    ncnn::Net net;
    //net.opt.num_threads=1;
    net.load_param("xxx.param");
    net.load_model("xxx.bin");
    // convert the OpenCV Mat to an ncnn Mat (img2 is BGR; the channel order must match training)
    ncnn::Mat input = ncnn::Mat::from_pixels(img2.data, ncnn::Mat::PIXEL_BGR, img2.cols, img2.rows);
    // the normalization must match the PyTorch preprocessing: x = (x - mean) * norm
    const float mean_vals[3] = {0.f,0.f,0.f};
    const float norm_vals[3] = {1/255.f,1/255.f,1/255.f};
    input.substract_mean_normalize(mean_vals, norm_vals);
    // ncnn forward pass
    ncnn::Extractor extractor = net.create_extractor();
    extractor.input("input", input); // blob name set by input_names at export
    // one ncnn::Mat per model output; the model exported above has a single output named "output"
    ncnn::Mat output;
    extractor.extract("output", output);
    pretty_print(output);
    /*
    // or flatten the output before reading the values
    ncnn::Mat out_flattened = output.reshape(output.w * output.h * output.c);
    std::vector<float> scores;
    scores.resize(out_flattened.w);
    for (int j=0; j<out_flattened.w; j++)
    {
        scores[j] = out_flattened[j];
    }
    */
    cout<<"done"<<endl;
    return 0;
}
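The C++ code above applies ncnn's per-channel normalization x = (x - mean_vals) * norm_vals to raw 0..255 pixels. If the PyTorch side used the common "scale to [0, 1], then subtract mean and divide by std" preprocessing, the two sets of constants are related as sketched below. The helper names and the mean = std = 0.5 example are assumptions for illustration; the original post does not state which preprocessing the model was trained with.

```python
def to_ncnn_mean_norm(mean, std):
    """Convert torch-style (mean, std) on [0, 1] inputs to ncnn (mean_vals, norm_vals) on 0..255 pixels."""
    # (p / 255 - m) / s  ==  (p - 255 * m) * (1 / (255 * s))
    mean_vals = [255.0 * m for m in mean]
    norm_vals = [1.0 / (255.0 * s) for s in std]
    return mean_vals, norm_vals

def ncnn_normalize(pixel, mean_vals, norm_vals):
    """What ncnn computes per channel: (p - mean) * norm."""
    return [(p - m) * n for p, m, n in zip(pixel, mean_vals, norm_vals)]

def torch_style_normalize(pixel, mean, std):
    """Scale to [0, 1], then subtract mean and divide by std."""
    return [(p / 255.0 - m) / s for p, m, s in zip(pixel, mean, std)]

# Case used in the C++ code: only scale to [0, 1] (mean 0, std 1)
plain_mean, plain_norm = to_ncnn_mean_norm([0.0] * 3, [1.0] * 3)

# Hypothetical training preprocessing with mean = std = 0.5 per channel
pixel = [0, 128, 255]
mv, nv = to_ncnn_mean_norm([0.5] * 3, [0.5] * 3)
from_ncnn = ncnn_normalize(pixel, mv, nv)
from_torch = torch_style_normalize(pixel, [0.5] * 3, [0.5] * 3)
print(plain_mean, plain_norm)
print(from_ncnn, from_torch)
```

Channel order needs the same care: from_pixels with ncnn::Mat::PIXEL_BGR keeps OpenCV's BGR order, so if the model was trained on RGB images, ncnn::Mat::PIXEL_BGR2RGB would be the matching choice.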