PyTorch Study Notes (5): Training Models with GPU

Environment

OS: Ubuntu 14.04
Python version: 3.7
PyTorch version: 1.4.0
IDE: PyCharm
GPU: 3 × RTX 2080 Ti

Contents

0. Preface
1. Setting up the GPU
- 1.1 Selecting GPUs
- 1.2 Querying information about available GPUs
2. Moving data and models between GPU and CPU
- 2.1 For torch.Tensor
- 2.2 For torch.nn.Module
3. Multi-GPU data parallelism

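0. Preface

Deep learning models often have a huge number of parameters, so a working GPU is a basic requirement. These notes record some of PyTorch's GPU-related functions and training code.

For a concise comparison of CPUs and GPUs, see the Zhihu question "CPU 和 GPU 的区别是什么？" (What is the difference between a CPU and a GPU?).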

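1. Setting up the GPU

1.1 Selecting GPUs

Take the three RTX 2080 Ti cards as an example: their physical indices are 0, 1, and 2. The free memory of each card can be queried with nvidia-smi: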

import os

# Dump each GPU's memory report and keep only the "Free : <N> MiB" lines
os.system('nvidia-smi -q -d Memory | grep -A4 GPU | grep Free > temp.txt')
with open('temp.txt', 'r') as f:
    gpu_memories = [int(x.split()[2]) for x in f.readlines()]  # free memory in MiB
os.remove('temp.txt')
print('Free memory of each GPU (MiB):', gpu_memories)
# Free memory of each GPU (MiB): [11009, 11009, 11009]
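As an alternative that avoids the temporary file, nvidia-smi's CSV query mode can report free memory directly. This is a minimal sketch, assuming a driver whose nvidia-smi supports `--query-gpu=memory.free --format=csv,noheader,nounits`; the helpers `parse_free_memory`, `query_free_memory` and `pick_freest_gpu` are hypothetical names, not part of any library.

```python
import subprocess
from typing import List


def parse_free_memory(output: str) -> List[int]:
    """Parse nvidia-smi CSV output: one integer (free MiB) per line."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def query_free_memory() -> List[int]:
    """Run nvidia-smi and return the free memory (MiB) of every physical GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout as text (works on Python 3.7)
        check=True,  # raise if nvidia-smi exits with an error
    )
    return parse_free_memory(result.stdout)


def pick_freest_gpu(free_memories: List[int]) -> int:
    """Return the physical index of the GPU with the most free memory (first on ties)."""
    return max(range(len(free_memories)), key=lambda i: free_memories[i])
```

On a GPU machine, `pick_freest_gpu(query_free_memory())` gives a physical index that could then be made visible to the script as described next.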

Next, make the two cards with indices 1 and 2 visible to the current Python script.

With this setting, physical GPUs 1 and 2 become logical GPUs 0 and 1, respectively.
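A minimal sketch of this step, assuming it is done through the standard `CUDA_VISIBLE_DEVICES` environment variable, which has to be set before PyTorch initializes CUDA (safest: before `import torch`). The helper `visible_mapping` is hypothetical and only illustrates the physical-to-logical renumbering described above.

```python
import os
from typing import Dict

# Number GPUs in PCI bus order so indices match nvidia-smi's numbering.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# Expose only physical GPUs 1 and 2 to this process.
# This must happen before CUDA is initialized, e.g. before `import torch`.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"


def visible_mapping(env_value: str) -> Dict[int, int]:
    """Map logical GPU indices (as seen by PyTorch) to physical indices."""
    physical = [int(x) for x in env_value.split(",") if x.strip()]
    return {logical: phys for logical, phys in enumerate(physical)}


print(visible_mapping(os.environ["CUDA_VISIBLE_DEVICES"]))  # {0: 1, 1: 2}
```

The same variables can also be set on the command line instead of in code, e.g. `CUDA_VISIBLE_DEVICES=1,2 python train.py` (where `train.py` stands for your own script).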

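1.2 Querying information about available GPUs

The torch.cuda module provides PyTorch's functions for working with GPUs.

torch.cuda.is_available() returns whether CUDA, i.e. at least one usable GPU, is available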

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)  # cuda

torch.cuda.get_device_name() returns the name of a GPU device (the current device by default)

import torch

print(torch.cuda.get_device_name())  # GeForce RTX 2080 Ti

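torch.cuda.device_count() returns the number of GPUs visible to the current program (2 here, since only physical GPUs 1 and 2 were made visible)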

import torch

print(torch.cuda.device_count())  # 2
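These functions can be combined to list every GPU the script can see. A minimal sketch: it additionally uses `torch.cuda.get_device_properties`, whose `total_memory` field is in bytes, and on a CPU-only machine it simply returns an empty list. The function name `describe_visible_gpus` is hypothetical.

```python
from typing import List, Tuple

import torch


def describe_visible_gpus() -> List[Tuple[int, str, float]]:
    """Return (logical index, name, total memory in GiB) for each visible GPU."""
    gpus = []
    if not torch.cuda.is_available():
        return gpus  # no usable GPU: nothing to describe
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        gpus.append((i, torch.cuda.get_device_name(i), props.total_memory / 1024 ** 3))
    return gpus


for index, name, mem_gib in describe_visible_gpus():
    print("cuda:{} {} ({:.1f} GiB)".format(index, name, mem_gib))
```

Note that the indices printed here are logical ones, so with only physical GPUs 1 and 2 visible they are `cuda:0` and `cuda:1`.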

2. Moving data and models between GPU and CPU

The Tensor.is_cuda attribute (a property, not a method) tells whether a tensor is stored on the GPU

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tensor_in_cpu = torch.randn((128, 3, 299, 299))
print(tensor_in_cpu.is_cuda)  # False
tensor_in_gpu = tensor_in_cpu.to(device)  # move the data to the current CUDA device (cuda:0 by default)
print(tensor_in_gpu.is_cuda)  # True when a GPU is available

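2.1 For torch.Tensor

The Tensor.to() method converts a tensor's data type or moves it to another device (CPU / GPU).

Given a device, such as torch.device("cuda:0"), it moves the tensor to the logical GPU with index 0.
Given a data type, such as torch.float32, it converts the tensor to that type.

Note that this operation is not in-place: it returns a new tensor, so assign the result to a variable.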
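The non-in-place behaviour can be checked on the CPU alone: converting the dtype yields a new tensor and leaves the original untouched. This sketch also passes a device and a dtype in a single `Tensor.to` call, and shows that when no conversion is needed the original tensor itself is returned.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(4, 3)  # float32 tensor on the CPU
y = x.to(device, torch.float64)  # device and dtype in one call; returns a new tensor

print(x.dtype, x.device)  # torch.float32 cpu  (x is unchanged)
print(y.dtype)  # torch.float64

x.to(torch.float64)  # result discarded, so this line has no lasting effect
print(x.dtype)  # torch.float32

z = x.to(torch.float32)  # nothing to convert: x itself is returned
print(z is x)  # True
```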

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tensor_in_cpu = torch.randn((128, 3, 299, 299))
tensor_in_gpu = tensor_in_cpu.to(device)  # move the data to the GPU cuda:0

The Tensor.is_cuda attribute shows whether the data is on the GPU

print(tensor_in_cpu.is_cuda)  # False
print(tensor_in_gpu.is_cuda)  # True
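The examples above only move data from the CPU to the GPU; the reverse direction uses `.cpu()` (equivalently `.to("cpu")`). This matters, for instance, before calling `.numpy()`, which only works on CPU tensors. A minimal sketch that also runs on a CPU-only machine (there the "GPU" tensor simply stays on the CPU):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tensor_in_gpu = torch.randn(2, 3).to(device)

# Tensor.numpy() requires a CPU tensor, so move the data back first.
tensor_back_in_cpu = tensor_in_gpu.cpu()  # same as tensor_in_gpu.to("cpu")
array = tensor_back_in_cpu.numpy()

print(tensor_back_in_cpu.is_cuda)  # False
print(array.shape)  # (2, 3)

# A single value can be read as a Python number with .item(),
# which copies it from the device to the host.
total = tensor_in_gpu.sum().item()
```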

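2.2 For torch.nn.Module

The Module.to() method converts the data type of a model's parameters or moves the model to another device (CPU / GPU).

Given a device, such as torch.device("cuda:0"), it moves the model to the logical GPU with index 0.
Given a data type, such as torch.float32, it converts the model's floating-point parameters and buffers to that type.

Note that, unlike Tensor.to(), this operation is in-place (it also returns the module itself).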

import torch
import torchvision

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torchvision.models.resnet50()
model.to(device)
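Because Module.to() modifies the module in place and returns that same object, `model.to(device)` and `model = model.to(device)` are equivalent, and the device and dtype can be checked on the parameters of the original object. A small CPU-friendly sketch using `torch.nn.Linear` in place of ResNet-50:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2)
returned = model.to(device, torch.float64)  # modifies model in place

print(returned is model)  # True: the same module object is returned
param = next(model.parameters())
print(param.device.type, param.dtype)  # e.g. "cuda torch.float64" on a GPU machine
```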

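3. Multi-GPU data parallelism

For more details, see the Zhihu article "Pytorch的nn.DataParallel".

The torch.nn.DataParallel class wraps a model for data parallelism: it replicates the model on each GPU, splits every input batch across the GPUs, runs the replicas in parallel, and gathers their outputs on one device, which enables multi-GPU training.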

import torch
from torch.nn import DataParallel, Module, Sequential, Linear, ReLU

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


class FooNet(Module):
    def __init__(self, in_features, num_classes):
        super(FooNet, self).__init__()
        self.fc = Sequential(
            Linear(in_features, 100),
            ReLU(),
            Linear(100, num_classes)
        )

    def forward(self, x):
        print("Batch size of data in 'forward' method:", x.size(0))
        return self.fc(x)


# ========== test ==========
batch_size = 128
in_features = 100
num_classes = 100

# create data
inputs = torch.randn(batch_size, in_features)
labels = torch.randn(batch_size, num_classes)
inputs, labels = inputs.to(device), labels.to(device)

# modeling
foonet = FooNet(in_features, num_classes)
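foonet = DataParallel(
    module=foonet,  # the model to wrap and replicate
    device_ids=None,  # GPUs to use; None (default) means all visible GPUs
    output_device=None,  # device for the gathered output; None (default) means device_ids[0], i.e. logical GPU 0
    dim=0  # dimension along which inputs are split across GPUs (default 0, the batch dimension)
)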
foonet.to(device)

# forward
outputs = foonet(inputs)
print('Size of output:', outputs.size())

Running this prints:

Batch size of data in 'forward' method: 64
Batch size of data in 'forward' method: 64
Size of output: torch.Size([128, 100])

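So the input batch of 128 samples is split evenly across the two visible GPUs, the model replica on each GPU receives a batch of 64, and the outputs are gathered back into a single tensor of size [128, 100].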
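When the batch does not divide evenly, the replicas get unequal batches. As far as I know, DataParallel's scatter step splits inputs along `dim` with the same rule as `torch.chunk`: each chunk has ceil(batch / n) samples except possibly the last, and fewer than n chunks may be produced. A pure-Python sketch of that assumed rule (the function `scatter_sizes` is hypothetical):

```python
from typing import List


def scatter_sizes(batch_size: int, num_gpus: int) -> List[int]:
    """Per-GPU batch sizes under torch.chunk-style splitting (assumed rule)."""
    if batch_size <= 0:
        return []
    chunk = -(-batch_size // num_gpus)  # ceiling division
    sizes = [chunk] * (batch_size // chunk)
    if batch_size % chunk:
        sizes.append(batch_size % chunk)  # smaller final chunk
    return sizes


print(scatter_sizes(128, 2))  # [64, 64]  (the run above)
print(scatter_sizes(128, 3))  # [43, 43, 42]
print(scatter_sizes(6, 4))  # [2, 2, 2]: only 3 of the 4 GPUs receive data
```

This is one reason the batch size should be comfortably larger than the number of GPUs.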

Note that the model wrapped by DataParallel is an instance of the torch.nn.parallel.data_parallel.DataParallel class; use its .module attribute to get the original model

print(type(foonet))
# <class 'torch.nn.parallel.data_parallel.DataParallel'>

print(type(foonet.module))
# <class '__main__.FooNet'>
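A practical consequence: the wrapper stores the original model under the attribute name `module`, so the keys of the wrapper's `state_dict()` carry a `module.` prefix, while `.module.state_dict()` does not. The sketch below works even without a GPU (DataParallel then just holds the module) and shows the prefix plus one common way to strip it before loading the weights into an unwrapped model; `strip_module_prefix` is a hypothetical helper.

```python
from collections import OrderedDict

from torch.nn import DataParallel, Linear

model = Linear(4, 2)
wrapped = DataParallel(model)

print(list(wrapped.state_dict().keys()))  # ['module.weight', 'module.bias']
print(list(wrapped.module.state_dict().keys()))  # ['weight', 'bias']


def strip_module_prefix(state_dict):
    """Return a copy of state_dict with a leading 'module.' removed from each key."""
    prefix = "module."
    return OrderedDict(
        (k[len(prefix):] if k.startswith(prefix) else k, v)
        for k, v in state_dict.items()
    )


plain_model = Linear(4, 2)
plain_model.load_state_dict(strip_module_prefix(wrapped.state_dict()))
```

Saving `wrapped.module.state_dict()` in the first place avoids the prefix altogether.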
