Will this CUDA code execute in-order and asynchronously?


  • Will the code below execute in order? (I cannot put the device-to-device copy done by cudaMemcpy2DArrayToArray() in a stream.)
  • Will the code below execute asynchronously? (cudaMemcpy2DArrayToArray() does not have an asynchronous counterpart.)

I know the code sample could be implemented more efficiently; it is merely intended as an example.

    for (i = 0; i < 10; i++) {
        cudaMemcpy2DArrayToArray(dst, src);                    // device-to-device copy
        cudaBindTextureToArray(texture_reference, dst, ...);   // bind dst to the texture
        kernel<<<dimgrid, dimblock, 0, stream>>>(out);         // compute the output array
        cudaMemcpy2DToArrayAsync(src_p, out, stream);          // copy the result back to src
    }

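Since the kernel calls and the cudaMemcpy2DToArrayAsync calls use the same stream, they are processed synchronously with respect to each other, that is, serialized in issue order. A single stream cannot do multiple things at the same time. However, if you wanted multiple streams to do the work, it would be of this form: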

    const unsigned int nstreams = 8;
    cudaStream_t streams[nstreams];

    // Create the streams.
    for (unsigned int ii = 0; ii < nstreams; ++ii)
        handle_error(cudaStreamCreate(&(streams[ii])));

    for (i = 0; i < 10; i++) {
        cudaMemcpy2DArrayToArray(dst, src);                             // device-to-device copy
        cudaBindTextureToArray(texture_reference, dst, ...);            // bind dst to the texture
        kernel<<<dimgrid, dimblock, 0, streams[i % nstreams]>>>(out);   // compute the output array
        cudaMemcpy2DToArrayAsync(src_p, out, streams[i % nstreams]);    // copy the result back to src
    }

    // Destroy the streams.
    for (unsigned int ii = 0; ii < nstreams; ++ii)
        handle_error(cudaStreamDestroy(streams[ii]));

However, this way you are still dependent on waiting for cudaMemcpy2DArrayToArray at every step, since that function shows synchronous behaviour.
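
To make the stream behaviour concrete, here is a minimal, self-contained sketch of the round-robin pattern. It is not the code from the question: it assumes plain linear device memory and cudaMemcpyAsync instead of CUDA arrays and a texture, so the synchronous cudaMemcpy2DArrayToArray step is left out entirely. The names add_one, run_round_robin_streams and CHECK, the stream count and the buffer sizes are all hypothetical. Work issued to one stream runs in issue order; work issued to different streams may overlap, which is why each iteration gets its own slice of the buffers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical error check for this sketch: print the error and return -1.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s\n", cudaGetErrorString(err_));  \
            return -1;                                               \
        }                                                            \
    } while (0)

// Hypothetical kernel: out[k] = in[k] + 1 for one slice of n elements.
__global__ void add_one(const float* in, float* out, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) out[idx] = in[idx] + 1.0f;
}

// Returns the number of wrong elements (0 on success), or -1 on a CUDA error.
int run_round_robin_streams() {
    const int kStreams = 4;        // assumed number of streams
    const int kIters = 10;         // same trip count as the loop in the question
    const int kN = 1 << 16;        // assumed elements per iteration
    const size_t sliceBytes = kN * sizeof(float);

    // Pinned host memory, so the async copies do not block the host.
    float *h_in = nullptr, *h_out = nullptr, *d_in = nullptr, *d_out = nullptr;
    CHECK(cudaMallocHost((void**)&h_in, kIters * sliceBytes));
    CHECK(cudaMallocHost((void**)&h_out, kIters * sliceBytes));
    CHECK(cudaMalloc((void**)&d_in, kIters * sliceBytes));
    CHECK(cudaMalloc((void**)&d_out, kIters * sliceBytes));
    for (int k = 0; k < kIters * kN; ++k) h_in[k] = (float)(k % 1000);

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) CHECK(cudaStreamCreate(&streams[s]));

    for (int i = 0; i < kIters; ++i) {
        cudaStream_t st = streams[i % kStreams];  // round-robin stream choice
        size_t off = (size_t)i * kN;              // private slice per iteration
        // These three operations share one stream, so they run in this order.
        CHECK(cudaMemcpyAsync(d_in + off, h_in + off, sliceBytes,
                              cudaMemcpyHostToDevice, st));
        add_one<<<(kN + 255) / 256, 256, 0, st>>>(d_in + off, d_out + off, kN);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h_out + off, d_out + off, sliceBytes,
                              cudaMemcpyDeviceToHost, st));
    }

    // Wait for every stream before reading the results on the host.
    CHECK(cudaDeviceSynchronize());

    int mismatches = 0;
    for (int k = 0; k < kIters * kN; ++k)
        if (h_out[k] != h_in[k] + 1.0f) ++mismatches;

    for (int s = 0; s < kStreams; ++s) CHECK(cudaStreamDestroy(streams[s]));
    CHECK(cudaFreeHost(h_in));
    CHECK(cudaFreeHost(h_out));
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_out));
    return mismatches;
}
```

Calling run_round_robin_streams() from a main() and checking that it returns 0 is enough to try it. If a synchronous call such as cudaMemcpy2DArrayToArray were placed at the top of the loop body, the host would wait there on every iteration, which is the limitation described above.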

