Example code:
import torch
import time
print(torch.__version__) # print the PyTorch version
print(torch.cuda.is_available()) # True when CUDA is available
a = torch.randn(10000, 1000) # random tensor with 10000 rows and 1000 columns
b = torch.randn(1000, 2000) # random tensor with 1000 rows and 2000 columns
t0 = time.time() # start timing
c = torch.matmul(a, b) # matrix multiplication
t1 = time.time() # stop timing
print(a.device, t1 - t0, c.norm(2)) # c.norm(2) is the 2-norm of c, used here as a checksum
device = torch.device('cuda') # run on the GPU
a = a.to(device)
b = b.to(device)
# The first GPU call is slow because it includes one-off CUDA initialization and data transfer
t0 = time.time()
c = torch.matmul(a, b)
t2 = time.time()
print(a.device, t2 - t0, c.norm(2))
# This second call is closer to the real GPU compute time; the larger the data, the greater the GPU's advantage
# (strictly, CUDA kernels run asynchronously, so torch.cuda.synchronize() should be called before reading the clock)
t0 = time.time()
c = torch.matmul(a, b)
t2 = time.time()
print(a.device, t2 - t0, c.norm(2))
Output:
1.0.0
True
cpu 0.20143413543701172 tensor(3496775.5000)
cuda:0 0.28623294830322266 tensor(141487., device='cuda:0')
cuda:0 0.007987499237060547 tensor(141487., device='cuda:0')
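Because CUDA kernels are launched asynchronously, reading the wall clock immediately after `torch.matmul` can misreport GPU time. A more careful pattern is to warm up first, then call `torch.cuda.synchronize()` before each clock read. A minimal sketch (the helper name `timed_matmul` and its warm-up/run counts are illustrative, not part of the original code):

```python
import time
import torch

def timed_matmul(a, b, n_warmup=1, n_runs=5):
    """Time a matmul; synchronizes when the inputs live on a GPU."""
    use_cuda = a.is_cuda
    for _ in range(n_warmup):
        torch.matmul(a, b)            # warm-up absorbs one-off setup costs
    if use_cuda:
        torch.cuda.synchronize()      # wait for queued kernels before timing
    t0 = time.time()
    for _ in range(n_runs):
        c = torch.matmul(a, b)
    if use_cuda:
        torch.cuda.synchronize()      # wait for the timed kernels to finish
    return c, (time.time() - t0) / n_runs

a = torch.randn(1000, 100)
b = torch.randn(100, 200)
c, dt = timed_matmul(a, b)
print(c.shape, f"{dt:.6f}s per matmul")
```

The same helper works on either device: move `a` and `b` with `.to(device)` first and the synchronization is applied automatically only when needed.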