CUDA - OpenACC: memory management


I'm using the CAPS OpenACC compiler. I wonder whether I can manage memory myself?

For example, regular OpenACC code targeting CUDA:

    #pragma acc kernels copyin(a, b) copy(c)
    for (i = 0; i < size; ++i)
      for (j = 0; j < size; ++j)
        for (k = 0; k < size; ++k)
          c[i][j] += a[i][k] * b[k][j];

I want to change it like this:

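    float *a, *b, *c;  // device pointers; hostA, hostB, hostC are the host arrays

    // allocation
    cudaMalloc((void**)&a, num_bytes);
    cudaMalloc((void**)&b, num_bytes);
    cudaMalloc((void**)&c, num_bytes);

    // transfer in (cudaMemcpy takes the destination first)
    cudaMemcpy(a, hostA, num_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b, hostB, num_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(c, hostC, num_bytes, cudaMemcpyHostToDevice);  // c is accumulated into

    // computation
    // I think the codelet will be generated by the CAPS OpenACC compiler.
    #pragma acc kernels
    for (i = 0; i < size; ++i)
      for (j = 0; j < size; ++j)
        for (k = 0; k < size; ++k)
          c[i * size + j] += a[i * size + k] * b[k * size + j];  // flat row-major indexing

    // transfer out and release
    cudaMemcpy(hostC, c, num_bytes, cudaMemcpyDeviceToHost);
    cudaFree(a); cudaFree(b); cudaFree(c);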

Yes, you can allocate memory yourself. In your example it should be possible to achieve this using the deviceptr clause, like:

    cudaMalloc((void**)&a, num_bytes);
    cudaMalloc((void**)&b, num_bytes);
    cudaMalloc((void**)&c, num_bytes);

    cudaMemcpy(a, hostA, num_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b, hostB, num_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(c, hostC, num_bytes, cudaMemcpyHostToDevice);

    // a, b and c already hold device addresses: don't allocate or copy them
    #pragma acc data deviceptr(a, b, c)
    #pragma acc kernels
    for (i = 0; i < size; ++i)
      for (j = 0; j < size; ++j)
        for (k = 0; k < size; ++k)
          c[i * size + j] += a[i * size + k] * b[k * size + j];

    cudaMemcpy(hostC, c, num_bytes, cudaMemcpyDeviceToHost);
    cudaFree(a); cudaFree(b); cudaFree(c);

[Disclaimer: written in the browser, never compiled or tested, use at your own risk]

This should declare a, b and c to the compiler as pre-existing device allocations. You should also be able to use the OpenACC acc_malloc routine to allocate the memory in place of cudaMalloc, if you wish.

Thanks to @user2054656 for pointing out my incorrect use of device_resident in the first version of this answer.

