CUDA - OpenACC: memory management
I'm using the CAPS OpenACC compiler. I wonder: can I manage memory myself?
For example, the regular OpenACC code with CUDA is:
    #pragma acc kernels copyin(a, b) copy(c)
    for (i = 0; i < size; ++i)
        for (j = 0; j < size; ++j)
            for (k = 0; k < size; ++k)
                c[i][j] += a[i][k] * b[k][j];
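The snippet above can be expanded into a minimal self-contained sketch in which the compiler manages all device memory through the `copyin` and `copy` clauses. The matrix size `SIZE` and the function name `matmul` are illustrative assumptions, not part of the original question. With a compiler that does not support OpenACC the pragma is ignored and the loops simply run on the host.

```c
#define SIZE 4 /* illustrative matrix dimension (assumption) */

/* Statically sized 2D arrays, so c[i][j] indexing works directly and
   the data clauses can name the whole arrays. */
static float a[SIZE][SIZE], b[SIZE][SIZE], c[SIZE][SIZE];

/* c += a * b. For the kernels region the OpenACC runtime allocates device
   copies of a, b and c, copies a and b in, copies c in and back out,
   and releases the device copies when the region ends. */
void matmul(void)
{
    int i, j, k;
#pragma acc kernels copyin(a, b) copy(c)
    for (i = 0; i < SIZE; ++i)
        for (j = 0; j < SIZE; ++j)
            for (k = 0; k < SIZE; ++k)
                c[i][j] += a[i][k] * b[k][j];
}
```

Each call to `matmul()` is one accumulate step; because the data clauses sit on the `kernels` construct, the device memory only exists for the duration of that region, which is exactly what the question wants to take control of.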
I want to change it in this way:
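    // allocation
    cudaMalloc((void**)&a, num_bytes);
    cudaMalloc((void**)&b, num_bytes);
    cudaMalloc((void**)&c, num_bytes);

    // transfer-in (cudaMemcpy takes the destination first, then the source)
    cudaMemcpy(a, hostA, num_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b, hostB, num_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(c, hostC, num_bytes, cudaMemcpyHostToDevice); // c is accumulated into, as with copy(c)

    // computation
    // I think this will be a codelet generated by the CAPS OpenACC compiler.
    // a, b and c are plain device pointers, so flat row-major indexing is used.
    #pragma acc kernels
    for (i = 0; i < size; ++i)
        for (j = 0; j < size; ++j)
            for (k = 0; k < size; ++k)
                c[i * size + j] += a[i * size + k] * b[k * size + j];

    // transfer-out
    cudaMemcpy(hostC, c, num_bytes, cudaMemcpyDeviceToHost);
    cudaFree(a); cudaFree(b); cudaFree(c);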
Yes, you can allocate memory yourself. In your example it should be possible to achieve this using the deviceptr clause, like:
    cudaMalloc((void**)&a, num_bytes);
    cudaMalloc((void**)&b, num_bytes);
    cudaMalloc((void**)&c, num_bytes);

    cudaMemcpy(a, hostA, num_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b, hostB, num_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(c, hostC, num_bytes, cudaMemcpyHostToDevice);

    // a, b and c already hold device addresses, so the compiler
    // must not allocate or copy them itself.
    #pragma acc data deviceptr(a, b, c)
    #pragma acc kernels
    for (i = 0; i < size; ++i)
        for (j = 0; j < size; ++j)
            for (k = 0; k < size; ++k)
                c[i * size + j] += a[i * size + k] * b[k * size + j];

    cudaMemcpy(hostC, c, num_bytes, cudaMemcpyDeviceToHost);
    cudaFree(a); cudaFree(b); cudaFree(c);
[Disclaimer: written in a browser, never compiled or tested, use at your own risk]
This should declare a, b and c as pre-existing allocations to the compiler. You should also be able to use the OpenACC acc_malloc routine to allocate memory in place of cudaMalloc, if you wish.
Thanks to @user2054656 for pointing out my incorrect use of device_resident in the first version of this answer.