Atomic Functions

2023-03-09 16:37:35

Consider two threads that both write to the same location in global or shared memory.


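If x starts at 10 and both threads update it, what is its final value? The answer cannot be determined, because the two accesses conflict: each update is a separate read, modify, and write, and the steps of the two threads can interleave.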

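Atomic functions perform a read-modify-write operation without interference from other threads. The address is effectively locked for the duration of the operation and is released only when the computation has finished.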
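As a minimal sketch of the idea, assume each thread simply adds 1 to x. The helper name `run_atomic_counter` and the kernel name `add_one_atomic` are made up for this illustration. Because every increment goes through `atomicAdd`, no update should be lost, so the final value is the initial value plus the number of threads.

```cuda
#include <cuda_runtime.h>

__device__ int x;  // the shared variable, in global memory

// Each thread adds 1 to x as one indivisible read-modify-write.
__global__ void add_one_atomic()
{
    atomicAdd(&x, 1);
}

// Hypothetical host helper: set x to `initial`, launch `threads` threads
// in a single block (so threads must be <= 1024), return the final x.
int run_atomic_counter(int initial, int threads)
{
    int result = -1;
    cudaMemcpyToSymbol(x, &initial, sizeof(int));
    add_one_atomic<<<1, threads>>>();
    cudaDeviceSynchronize();  // wait for the kernel to finish
    cudaMemcpyFromSymbol(&result, x, sizeof(int));
    return result;
}
```

Calling `run_atomic_counter(10, 2)` from `main` models the two-thread case above. Replacing `atomicAdd(&x, 1)` with a plain `x = x + 1` brings the race back.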

Atomic operations:

int atomicAdd(int* address, int val);
int atomicSub(int* address, int val);
int atomicExch(int* address, int val);
int atomicMin(int* address, int val);
int atomicMax(int* address, int val);
unsigned int atomicInc(unsigned int* address, unsigned int val);
unsigned int atomicDec(unsigned int* address, unsigned int val);
int atomicCAS(int* address, int compare, int val); // compare and swap
int atomicAnd(int* address, int val);
int atomicOr(int* address, int val);
int atomicXor(int* address, int val);
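Operations that are not in this list can usually be built on top of `atomicCAS` with a retry loop. The sketch below does this for integer multiplication. `atomic_mul_int`, `double_it`, and `run_atomic_mul` are hypothetical names chosen for this example and are not part of CUDA.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper, not a CUDA built-in: atomically does
// *address = *address * val and returns the previous value.
__device__ int atomic_mul_int(int* address, int val)
{
    int old = *address;
    int assumed;
    do {
        assumed = old;
        // Store assumed * val only if *address still equals assumed.
        // atomicCAS returns the value it actually found at address.
        old = atomicCAS(address, assumed, assumed * val);
    } while (assumed != old);  // another thread got in first: retry
    return old;
}

__device__ int product;

// Every thread doubles the shared value.
__global__ void double_it()
{
    atomic_mul_int(&product, 2);
}

// Hypothetical host helper: start from `initial`, let `threads` threads
// (one block, <= 1024) each double the value, return the result.
int run_atomic_mul(int initial, int threads)
{
    int result = -1;
    cudaMemcpyToSymbol(product, &initial, sizeof(int));
    double_it<<<1, threads>>>();
    cudaDeviceSynchronize();
    cudaMemcpyFromSymbol(&result, product, sizeof(int));
    return result;
}
```

With an initial value of 1 and 10 threads, each doubling is applied exactly once, so the result is 2 to the power of 10.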

Test example:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <cstdio>   // printf
#include <cstdlib>  // malloc, free, rand, srand

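__device__ int gpu_hist[10]; // one bin per value 0..9, in global memory

// Zero the bins; launch with at least 10 threads.
__global__ void init()
{
	int tid = blockIdx.x * blockDim.x + threadIdx.x;
	if (tid < 10) // guard against launches with more than 10 threads
		gpu_hist[tid] = 0;
}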

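// Each thread walks the input with a grid-wide stride and counts a[tid].
__global__ void gpu_histogram(int *a, int n)
{
	//int *ptr;
	int tid = blockIdx.x * blockDim.x + threadIdx.x;
	int numberThreads = blockDim.x * gridDim.x; // total threads in the grid
	while (tid < n)
	{
		// Atomic version: uncomment these two lines and the ptr declaration,
		// then remove the plain increment below.
		//ptr = &gpu_hist[a[tid]];
		//atomicAdd(ptr, 1);
		gpu_hist[a[tid]]++; // no atomic function: threads hitting the same bin race, so counts can be lost
		tid += numberThreads;
	}
}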

int main()
{
	int N = 32;
	int *a, *dev_a;
	int hist[10];
	int size = N * sizeof(int);
	a = (int *)malloc(size);
	srand(1);
	for (int i = 0; i < N; ++i)
	{
		a[i] = rand() % 10;
		printf("%d ", a[i]);
	}
	printf("\n");
	cudaMalloc((void**)&dev_a, size);
	cudaMemcpy(dev_a, a, size, cudaMemcpyHostToDevice);
	init<<<1, 10>>>(); // zero the 10 bins
	gpu_histogram<<<1, 32>>>(dev_a, N);
	cudaDeviceSynchronize(); // replaces the deprecated cudaThreadSynchronize()
	cudaMemcpyFromSymbol(hist, gpu_hist, 10 * sizeof(int));
	printf("Histogram as computed on GPU\n");
	for (int i = 0; i < 10; ++i)
	{
		printf("Number of %d s = %d\n", i, hist[i]);
	}
	free(a);
	cudaFree(dev_a);
}
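For comparison, here is a sketch of the same histogram with the increment done through `atomicAdd`. It is one possible variant, not the listing above: the bins are passed in as a device pointer instead of a `__device__` array, and `gpu_histogram_atomic`, `histogram_on_gpu`, and `NUM_BINS` are names introduced for this example. Input values are assumed to lie in the range 0..9.

```cuda
#include <cuda_runtime.h>

#define NUM_BINS 10

// Same grid-stride loop as before, but each bin update is atomic,
// so two threads that hit the same bin cannot lose a count.
__global__ void gpu_histogram_atomic(const int* a, int n, int* hist)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (; tid < n; tid += stride)
        atomicAdd(&hist[a[tid]], 1);
}

// Hypothetical host helper: histogram `n` host values (each in 0..9)
// into out[0..NUM_BINS-1]. Returns true if the CUDA calls succeeded.
bool histogram_on_gpu(const int* data, int n, int* out)
{
    int* dev_a = nullptr;
    int* dev_hist = nullptr;
    if (cudaMalloc((void**)&dev_a, n * sizeof(int)) != cudaSuccess)
        return false;
    if (cudaMalloc((void**)&dev_hist, NUM_BINS * sizeof(int)) != cudaSuccess) {
        cudaFree(dev_a);
        return false;
    }
    cudaMemcpy(dev_a, data, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dev_hist, 0, NUM_BINS * sizeof(int));  // zero the bins
    gpu_histogram_atomic<<<1, 32>>>(dev_a, n, dev_hist);
    // This blocking copy on the default stream waits for the kernel.
    cudaError_t err = cudaMemcpy(out, dev_hist, NUM_BINS * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_a);
    cudaFree(dev_hist);
    return err == cudaSuccess;
}
```

With atomic updates the bin counts always sum to n, which is a quick sanity check when comparing against the non-atomic version.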
