I am working on a board with integrated GPU and CPU memory, and I am using an external matrix library (Blitz++). I would like to grab the pointer to my data from the matrix object and pass it into a CUDA kernel. After doing some digging, it sounds like I want some form of zero-copy, obtained by calling
cudaHostGetDevicePointer. What I am unsure of is the allocation of the memory. Does the pointer have to have been created with
cudaHostAlloc? I would rather not rewrite Blitz++'s allocation to use
cudaHostAlloc if I can avoid it.
My code currently works, but it copies the matrix data on every call, which should be unnecessary on boards with integrated memory.
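For context, here is a minimal sketch of what I do now. The kernel name, matrix size, and element type are placeholders; I am assuming blitz::Array::data() is a valid way to get the raw pointer (it appears to return a pointer to the first element).

```cuda
#include <blitz/array.h>
#include <cuda_runtime.h>

__global__ void myKernel(double* d, int n)   // placeholder kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0;
}

int main()
{
    blitz::Array<double, 2> A(1024, 1024);   // Blitz++ matrix
    const int n = A.numElements();

    // Current approach: explicit copy to and from the device on every call,
    // even though the GPU and CPU share the same physical memory.
    double* dA;
    cudaMalloc(&dA, n * sizeof(double));
    cudaMemcpy(dA, A.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    myKernel<<<(n + 255) / 256, 256>>>(dA, n);
    cudaMemcpy(A.data(), dA, n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dA);

    // What I would like instead (unsure whether this is valid when the
    // host memory was NOT allocated with cudaHostAlloc):
    // double* dPtr;
    // cudaHostGetDevicePointer(&dPtr, A.data(), 0);
    // myKernel<<<(n + 255) / 256, 256>>>(dPtr, n);
    return 0;
}
```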