Environment

OS: Ubuntu 14.04
Python version: 3.7
PyTorch version: 1.4.0
IDE: PyCharm
GPU: 3 × RTX 2080 Ti

Contents

0. Preface
1. Setting up the GPU
- 1.1 Selecting GPUs
- 1.2 Inspecting the available GPUs
2. Moving data and models between GPU and CPU
- 2.1 For torch.Tensor
- 2.2 For torch.nn.Module
3. Multi-GPU data parallelism

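0. Preface

Deep learning models usually have a very large number of parameters, so a working GPU is a basic requirement. These notes record some of PyTorch's GPU-related functions, together with training code that uses them.

For a concise explanation of how CPUs and GPUs differ, see the Zhihu question "What is the difference between a CPU and a GPU?".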

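1. Setting up the GPU

1.1 Selecting GPUs

Take three RTX 2080 Ti cards as an example; their physical indices are 0, 1, and 2. First, query how much free memory each card has: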

import os

# Write the "Free" memory line of every GPU to a temporary file
os.system('nvidia-smi -q -d Memory | grep -A4 GPU | grep Free > temp.txt')
with open('temp.txt', 'r') as f:
    gpu_memories = [int(line.split()[2]) for line in f]  # free memory in MiB
os.remove('temp.txt')
print('Free memory per GPU (MiB):', gpu_memories)
# Free memory per GPU (MiB): [11009, 11009, 11009]
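The same query can be made without a temporary file. The following is a minimal sketch, assuming nvidia-smi is on the PATH; the helper names parse_free_memory and query_free_memory are illustrative and not part of any library.

```python
import subprocess


def parse_free_memory(output):
    """Turn nvidia-smi's one-number-per-line output into a list of ints (MiB)."""
    return [int(line) for line in output.splitlines() if line.strip()]


def query_free_memory():
    """Ask nvidia-smi for the free memory of every GPU, in MiB."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_free_memory(output)


# Parsing can be checked without a GPU by feeding it sample output:
print(parse_free_memory("11009\n11009\n11009\n"))  # [11009, 11009, 11009]
```

On a machine with NVIDIA drivers installed, query_free_memory() returns one integer per GPU, which can then be used to decide which cards to expose to the script.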

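Next, make only the cards with physical indices 1 and 2 visible to the current Python script (typically by setting the CUDA_VISIBLE_DEVICES environment variable to "1,2" before CUDA is initialized).

After this, the physical GPUs with indices 1 and 2 correspond to the logical GPUs with indices 0 and 1, respectively.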
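One common way to restrict visibility is the CUDA_VISIBLE_DEVICES environment variable. The sketch below assumes it runs before CUDA is initialized (safest: before importing torch); logical_to_physical is a hypothetical helper that only illustrates the index remapping.

```python
import os

# Assumption: this runs before the first CUDA call, because the variable
# is read when CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"


def logical_to_physical(visible_devices):
    """Map each logical GPU index to the physical index it refers to."""
    physical = [int(idx) for idx in visible_devices.split(",")]
    return dict(enumerate(physical))


mapping = logical_to_physical(os.environ["CUDA_VISIBLE_DEVICES"])
print(mapping)  # {0: 1, 1: 2}
```

Inside the script, "cuda:0" then refers to physical card 1 and "cuda:1" to physical card 2; physical card 0 is not reachable at all.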

1.2 Inspecting the available GPUs

The torch.cuda module provides PyTorch's GPU-related functions.

torch.cuda.is_available() returns whether a usable GPU is present:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)  # cuda

torch.cuda.get_device_name() returns the name of a GPU device (the current device by default):

import torch

print(torch.cuda.get_device_name())  # GeForce RTX 2080 Ti

torch.cuda.device_count() returns the number of GPUs visible to the current program:

import torch

print(torch.cuda.device_count())  # 2
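The three functions above can be combined to list every visible GPU at once. This is a minimal sketch; describe_visible_gpus is a hypothetical helper name, and it additionally uses torch.cuda.get_device_properties() to read the total memory of each card.

```python
import torch


def describe_visible_gpus():
    """Collect the logical index, name and total memory of each visible GPU."""
    infos = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        infos.append({
            "index": index,  # logical index, as seen by this process
            "name": props.name,
            "total_memory_mib": props.total_memory // (1024 ** 2),
        })
    return infos


for info in describe_visible_gpus():
    print(info)
```

On a machine without a usable GPU the function returns an empty list, so the loop prints nothing.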

2. Moving data and models between GPU and CPU

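2.1 For torch.Tensor

The Tensor.to() method converts a tensor's data type or moves it to another device (CPU / GPU):

- Pass a device, such as torch.device("cuda:0"), to move the tensor to the logical GPU with index 0.
- Pass a data type, such as torch.float32, to convert the tensor to that type.

Note that this operation is not in-place: assign the result to a variable to keep it.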

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tensor_in_cpu = torch.randn((128, 3, 299, 299))
tensor_in_gpu = tensor_in_cpu.to(device)  # move the data to the GPU (cuda:0) if one is available

The Tensor.is_cuda attribute (an attribute, not a method) tells whether a tensor is stored on a GPU:

print(tensor_in_cpu.is_cuda)  # False
print(tensor_in_gpu.is_cuda)  # True
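The non-in-place behaviour of Tensor.to() can be checked on the CPU with a data type conversion. A small sketch, using only documented tensor behaviour; the variable names are illustrative.

```python
import torch

x = torch.zeros(2, 3, dtype=torch.float64)

y = x.to(torch.float32)  # returns a new tensor; x is left unchanged
print(x.dtype, y.dtype)  # torch.float64 torch.float32

same = x.to(torch.float64)  # already float64 on the same device: x itself is returned
print(same is x)  # True

x.to(torch.float32)  # result discarded, so this line has no lasting effect
print(x.dtype)  # torch.float64
```

The last two lines show the usual pitfall: calling tensor.to(device) without assigning the result leaves the original tensor where it was.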

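2.2 For torch.nn.Module

The Module.to() method converts the data type of a model's parameters or moves the model to another device (CPU / GPU):

- Pass a device, such as torch.device("cuda:0"), to move the model to the logical GPU with index 0.
- Pass a data type, such as torch.float32, to convert the model's parameters to that type.

Note that this operation is in-place.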

import torch
import torchvision

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torchvision.models.resnet50()
model.to(device)
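In contrast to tensors, Module.to() modifies the module itself and returns that same module. The sketch below demonstrates this on the CPU with a small nn.Linear layer (chosen only to keep the example light) and a data type conversion.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)  # parameters start as float32 on the CPU
returned = layer.to(torch.float64)  # converts the parameters of `layer` itself

print(returned is layer)  # True
print(layer.weight.dtype)  # torch.float64
```

Because the module is modified in place, both model.to(device) and model = model.to(device) leave model on the target device.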

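3. Multi-GPU data parallelism

For more detail, see the Zhihu article on PyTorch's nn.DataParallel.

The torch.nn.DataParallel class replicates the model on each GPU and splits every input batch across the GPUs, so the replicas process their chunks in parallel (data parallelism at the module level). This enables multi-GPU training.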

import torch
from torch.nn import DataParallel, Module, Sequential, Linear, ReLU

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


class FooNet(Module):
    def __init__(self, in_features, num_classes):
        super(FooNet, self).__init__()
        self.fc = Sequential(
            Linear(in_features, 100),
            ReLU(),
            Linear(100, num_classes)
        )

    def forward(self, x):
        print("Batch size of data in 'forward' method:", x.size(0))
        return self.fc(x)


# ========== test ==========
batch_size = 128
in_features = 100
num_classes = 100

# create data
inputs = torch.randn(batch_size, in_features)
labels = torch.randn(batch_size, num_classes)
inputs, labels = inputs.to(device), labels.to(device)

# modeling
foonet = FooNet(in_features, num_classes)
foonet = DataParallel(
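    module=foonet,  # the model to wrap and replicate
    device_ids=None,  # GPUs to use; the default None means all visible GPUs
    output_device=None,  # device that gathers the outputs; the default None means device_ids[0], here logical GPU 0
    dim=0  # dimension along which the input is split across GPUs (the documentation does not explain this parameter)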
)
foonet.to(device)

# forward
outputs = foonet(inputs)
print('Size of output:', outputs.size())

Running it prints:

Batch size of data in 'forward' method: 64
Batch size of data in 'forward' method: 64
Size of output: torch.Size([128, 100])

As shown, the input batch of 128 samples is split evenly across the two cards, so each replica of the model receives a batch of 64.
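The split can be imitated on the CPU with Tensor.chunk(). This is only an illustration, under the assumption that DataParallel divides the batch along dim the same way Tensor.chunk() does; it does not call DataParallel itself.

```python
import torch

batch = torch.randn(128, 100)
chunks = batch.chunk(2, dim=0)  # one chunk per GPU
print([c.size(0) for c in chunks])  # [64, 64]

# A batch that does not divide evenly: the last chunk is smaller
uneven = torch.randn(10, 100).chunk(3, dim=0)
print([c.size(0) for c in uneven])  # [4, 4, 2]
```

Under that assumption, a batch size that is a multiple of the number of GPUs keeps the load balanced across the cards.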

Note that a model wrapped by DataParallel is an instance of the torch.nn.parallel.data_parallel.DataParallel class; use its .module attribute to get the underlying model:

print(type(foonet))
# <class 'torch.nn.parallel.data_parallel.DataParallel'>

print(type(foonet.module))
# <class '__main__.FooNet'>
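One place where the .module attribute matters is the state dict, whose keys gain a "module." prefix once the model is wrapped. The sketch below shows the difference; unwrap is a hypothetical helper, and the tiny Sequential model is only a stand-in for a real network.

```python
import torch
from torch import nn


def unwrap(model):
    """Return the underlying model if `model` is wrapped in DataParallel."""
    return model.module if isinstance(model, nn.DataParallel) else model


net = nn.Sequential(nn.Linear(4, 2))
wrapped = nn.DataParallel(net)

print(list(wrapped.state_dict().keys()))  # ['module.0.weight', 'module.0.bias']
print(list(unwrap(wrapped).state_dict().keys()))  # ['0.weight', '0.bias']
```

Saving the state dict of unwrap(model) rather than of the wrapper yields a checkpoint that can be loaded into a plain, unwrapped model.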
