Will this CUDA code execute in-order and asynchronously?
- Will the code below execute in order? (I cannot put the device-to-device copy, cudaMemcpy2DArrayToArray(), in a stream.)
- Will the code below execute asynchronously? (cudaMemcpy2DArrayToArray() does not have an asynchronous counterpart.)

I know this code sample can be implemented more efficiently; it's merely intended as an example.

    for( i=0; i<10; i++ ) {
        cudaMemcpy2DArrayToArray( dst, src );                    // device-to-device copy
        cudaBindTextureToArray( texture_reference, dst, ... );   // bind dst to the texture
        kernel<<< dimGrid, dimBlock, 0, stream >>>( out );       // compute the array
        cudaMemcpy2DToArrayAsync( src_p, out, stream );          // copy the result to src
    }

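Since the kernel calls and the cudaMemcpy2DToArrayAsync calls use the same stream, they will be processed in order, one after the other (synchronously with respect to each other). A single stream can't do multiple things at the same time. However, if you wanted multiple streams to do the work, it would be something of the form: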

    const unsigned int nstreams = 8;
    cudaStream_t streams[nstreams];
    for (unsigned int ii = 0; ii < nstreams; ++ii)
        handle_error( cudaStreamCreate(&(streams[ii])) );

    for( i=0; i<10; i++ ) {
        cudaMemcpy2DArrayToArray( dst, src );                                // device-to-device copy
        cudaBindTextureToArray( texture_reference, dst, ... );               // bind dst to the texture
        // i % nstreams wraps around: 10 iterations, but only 8 streams.
        kernel<<< dimGrid, dimBlock, 0, streams[i % nstreams] >>>( out );    // compute the array
        cudaMemcpy2DToArrayAsync( src_p, out, streams[i % nstreams] );       // copy the result to src
    }

    for (unsigned int ii = 0; ii < nstreams; ++ii)
        handle_error( cudaStreamDestroy(streams[ii]) );

However, this way is still dependent on waiting for cudaMemcpy2DArrayToArray at every step, as that function shows synchronous behaviour.
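To make the in-order guarantee concrete, here is a minimal sketch, not the poster's actual code. It swaps the CUDA arrays and texture binding for plain linear buffers and cudaMemcpyAsync, and it gives each stream its own buffers so the streams do not race on shared data. The names fill, run_round_robin, CHECK, nStreams, nIter and n are all hypothetical, chosen only for illustration. The idea is that work is issued round-robin over a few streams, and each asynchronous copy should see the result of the kernel queued just before it in the same stream.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro for this sketch: report a CUDA error and bail out.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s\n",                    \
                         cudaGetErrorString(err_));                     \
            return false;                                               \
        }                                                               \
    } while (0)

// Hypothetical kernel: writes base + index into every element.
__global__ void fill(int *out, int n, int base)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) out[idx] = base + idx;
}

// Returns true if, in every stream, the last asynchronous copy saw the
// output of the kernel that was queued right before it in that stream.
bool run_round_robin()
{
    const int nStreams = 4;    // illustrative values, not from the original post
    const int nIter    = 10;
    const int n        = 1 << 16;
    const size_t bytes = n * sizeof(int);

    cudaStream_t streams[nStreams];
    int *d_out[nStreams];
    int *h_out[nStreams];

    for (int s = 0; s < nStreams; ++s) {
        CHECK(cudaStreamCreate(&streams[s]));
        CHECK(cudaMalloc((void **)&d_out[s], bytes));
        // Pinned host memory, so the async copy does not fall back to a staged copy.
        CHECK(cudaMallocHost((void **)&h_out[s], bytes));
    }

    for (int i = 0; i < nIter; ++i) {
        int s = i % nStreams;  // round-robin: wrap around when i >= nStreams
        // Kernel and copy go into the same stream, so they run in issue order.
        fill<<<(n + 255) / 256, 256, 0, streams[s]>>>(d_out[s], n, i);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h_out[s], d_out[s], bytes,
                              cudaMemcpyDeviceToHost, streams[s]));
    }

    // Both calls above return to the host immediately; wait for all streams here.
    CHECK(cudaDeviceSynchronize());

    bool ok = true;
    for (int s = 0; s < nStreams; ++s) {
        // Find the last iteration that was queued on stream s.
        int last = s;
        while (last + nStreams < nIter) last += nStreams;
        for (int k = 0; k < n; ++k)
            if (h_out[s][k] != last + k) ok = false;
    }

    for (int s = 0; s < nStreams; ++s) {
        CHECK(cudaFreeHost(h_out[s]));
        CHECK(cudaFree(d_out[s]));
        CHECK(cudaStreamDestroy(streams[s]));
    }
    return ok;
}
```

Call run_round_robin() from main and check the return value. Whether work in different streams actually overlaps depends on the device and on the resources each kernel needs; the sketch only checks ordering within a stream, not concurrency between streams.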