I have seen this kind of cudaMemcpy example: cudaMemcpyAsync(m_haParticleID + m_OutputParticleStart, m_daOutputParticleID + m_OutputParticleStart, size, cudaMemcpyDeviceToHost, m_CudaStream). I could not understand what cudaMemcpyDeviceToHost and cudaMemcpyAsync do in the above example. Can anyone explain how this line of code works?
cudaMemcpyAsync(m_haParticleID + m_OutputParticleStart,m_daOutputParticleID+ m_OutputParticleStart,size, cudaMemcpyDeviceToHost, m_CudaStream);
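cudaMemcpyAsync is a CUDA runtime API call used to transfer data, usually between the GPU and the host. It has the Async suffix because it is asynchronous with respect to the host thread: it takes a CUDA stream argument, and it may return control to the host thread before the transfer has completed (or even started). For a device-to-host copy like this one to actually be asynchronous, the host buffer must be page-locked (pinned), for example allocated with cudaMallocHost or cudaHostAlloc; with ordinary pageable host memory, a device-to-host cudaMemcpyAsync returns only after the copy has completed. The primary use of this function is to achieve concurrency, usually between data copying and compute operations on the GPU.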
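A minimal sketch of this asynchronous behaviour, assuming a single GPU and a CUDA C++ toolchain (nvcc). The names `CHECK`, `fill` and `download_overlapped` are hypothetical and not from the question. The host buffer is allocated with `cudaMallocHost` (page-locked), which is what lets the device-to-host copy return early; the host then does independent work and calls `cudaStreamSynchronize` before it reads the downloaded data.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel: d[i] = i for every i < n.
__global__ void fill(int* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = i;
}

// Launches fill, queues an async device-to-host copy, overlaps some host
// work with it, and returns the sum of the downloaded values.
long long download_overlapped(int n, long long* host_side_result) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int* d_buf = nullptr;
    int* h_buf = nullptr;
    cudaStream_t stream;
    CHECK(cudaMalloc(&d_buf, bytes));
    CHECK(cudaMallocHost(&h_buf, bytes));  // pinned: needed for a truly async D2H copy
    CHECK(cudaStreamCreate(&stream));

    fill<<<(n + 255) / 256, 256, 0, stream>>>(d_buf, n);
    CHECK(cudaGetLastError());
    // Queued behind fill in the same stream; with pinned memory this call
    // typically returns before the copy has completed.
    CHECK(cudaMemcpyAsync(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost, stream));

    // Independent host work that can run while the GPU is still busy.
    long long expected = 0;
    for (int i = 0; i < n; ++i) expected += i;
    *host_side_result = expected;

    // h_buf must not be read until the work in the stream has finished.
    CHECK(cudaStreamSynchronize(stream));
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += h_buf[i];

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFreeHost(h_buf));
    CHECK(cudaFree(d_buf));
    return sum;
}
```

If `h_buf` were ordinary `malloc` memory instead, the program would still be correct, but the `cudaMemcpyAsync` call would only return once the copy had completed, so the host loop would no longer overlap with the transfer.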
cudaMemcpyDeviceToHost specifies the direction of the transfer (it is a value of the cudaMemcpyKind enumeration). The same API call can transfer from host to device (cudaMemcpyHostToDevice) or from device to host (cudaMemcpyDeviceToHost), and it can also copy from one place in device memory to another (cudaMemcpyDeviceToDevice).
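To illustrate that only the cudaMemcpyKind argument changes between directions, here is a sketch, assuming a CUDA C++ toolchain, that moves an array from host to device, then to a second device buffer, then back to the host, all with cudaMemcpyAsync in one stream. `CHECK` and `round_trip` are hypothetical names.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Copies n ints: host src -> device d_a -> device d_b -> host dst.
// src/dst may be ordinary pageable memory: the copies are still correct,
// they are just not fully asynchronous with respect to the host.
void round_trip(const int* src, int* dst, int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int* d_a = nullptr;
    int* d_b = nullptr;
    cudaStream_t stream;
    CHECK(cudaMalloc(&d_a, bytes));
    CHECK(cudaMalloc(&d_b, bytes));
    CHECK(cudaStreamCreate(&stream));

    // Same function, three directions: only the cudaMemcpyKind differs.
    CHECK(cudaMemcpyAsync(d_a, src, bytes, cudaMemcpyHostToDevice, stream));
    CHECK(cudaMemcpyAsync(d_b, d_a, bytes, cudaMemcpyDeviceToDevice, stream));
    CHECK(cudaMemcpyAsync(dst, d_b, bytes, cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));  // wait for all three copies

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_b));
    CHECK(cudaFree(d_a));
}
```

Because the three copies are issued to the same stream, each one starts only after the previous one has finished, so no synchronization is needed between them.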
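- just like memcpy, the first parameter (m_haParticleID + m_OutputParticleStart) is the destination pointer, and the second parameter (m_daOutputParticleID + m_OutputParticleStart) is the source pointer. Adding m_OutputParticleStart to each base pointer is ordinary pointer arithmetic, so (assuming both are typed arrays of the same element type) the copy starts at the same element index in the host array and in the device array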
- the third parameter is the number of bytes (not elements) to be transferred; for n elements of type T it would typically be n * sizeof(T)
- the last parameter is the CUDA stream. The transfer begins only after all work previously issued to that stream has completed, and work issued to that stream afterwards does not begin until the transfer has finished; i.e., CUDA operations issued to a particular stream are serialized with respect to each other. Work in different streams is not ordered this way and may overlap, which is how copy/compute concurrency is obtained.
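The following sketch mirrors the shape of the call in the question, under stated assumptions: the particle IDs are `int`s, `h_ids`/`d_ids` stand in for `m_haParticleID`/`m_daOutputParticleID`, `start` and `count` stand in for `m_OutputParticleStart` and the number of elements, and `mark_ids`, `download_range` and `CHECK` are hypothetical. The kernel and the copy are issued to the same stream, so the copy sees the kernel's results without any explicit synchronization between them; only the host has to wait before reading `h_ids`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel: d_ids[i] = 100 + i for every i < n.
__global__ void mark_ids(int* d_ids, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_ids[i] = 100 + i;
}

// Downloads elements [start, start + count) of a device array of n ints
// into the same positions of h_ids (a host array of at least n ints; pin it
// with cudaMallocHost if the copy should overlap with host work).
void download_range(int* h_ids, int n, int start, int count) {
    int* d_ids = nullptr;
    cudaStream_t stream;
    CHECK(cudaMalloc(&d_ids, n * sizeof(int)));
    CHECK(cudaStreamCreate(&stream));

    mark_ids<<<(n + 255) / 256, 256, 0, stream>>>(d_ids, n);
    CHECK(cudaGetLastError());

    // Same shape as the question's call: the offsets are in elements
    // (pointer arithmetic on int*), while the size argument is in bytes.
    size_t size = static_cast<size_t>(count) * sizeof(int);
    CHECK(cudaMemcpyAsync(h_ids + start, d_ids + start, size,
                          cudaMemcpyDeviceToHost, stream));

    // Stream order guarantees mark_ids finished before the copy began;
    // the host still has to wait before it reads h_ids.
    CHECK(cudaStreamSynchronize(stream));

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_ids));
}
```

Host elements outside `[start, start + count)` are left untouched, which is why the same offset is added to both pointers.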