
GPUDirect peer-to-peer over the PCIe bus: if I need to access a lot of data on the other GPU, won't that result in deadlocks?

cuda,pci-e,multi-gpu

PCI Express is full duplex, so it runs at full speed in both directions. There should be no "deadlock" like you may experience in synchronous MPI communication, which needs handshaking before proceeding. As Robert mentioned in a comment, "accessing data over PCIE bus is a lot slower than accessing it from on-board memory". However, it...
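A minimal sketch, assuming two GPUs with IDs 0 and 1, of how peer-to-peer access is typically checked and enabled with the CUDA runtime API before one GPU reads the other's memory over PCIe:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int canAccess01 = 0, canAccess10 = 0;
    // Check whether each GPU can map the other's memory (over PCIe or NVLink).
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);
    if (!canAccess01 || !canAccess10) {
        printf("P2P not supported between devices 0 and 1\n");
        return 1;
    }

    // Enable peer access in both directions; kernels running on device 0 can then
    // dereference pointers allocated on device 1 (and vice versa), with the traffic
    // going over the PCIe bus.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    printf("P2P enabled between devices 0 and 1\n");
    return 0;
}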

Set each GPU for each thread

cuda,gpu,gpu-programming,multi-gpu

So is it possible to choose the first GPU from the first host thread and the second GPU from the second host thread with a cudaSetDevice() call? Yes, it's possible. An example of this type of usage is given in the cudaOpenMP sample code (excerpting): .... omp_set_num_threads(num_gpus); // create as...
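A minimal sketch of that pattern (not the cudaOpenMP sample itself), assuming OpenMP and the CUDA runtime are available: one OpenMP host thread is created per GPU, and each thread binds to its own device with cudaSetDevice().

#include <cstdio>
#include <omp.h>
#include <cuda_runtime.h>

__global__ void dummyKernel() {}

int main() {
    int num_gpus = 0;
    cudaGetDeviceCount(&num_gpus);

    // One host (OpenMP) thread per GPU; each thread selects its own device.
    omp_set_num_threads(num_gpus);
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        cudaSetDevice(tid);               // thread 0 -> GPU 0, thread 1 -> GPU 1, ...

        int dev = -1;
        cudaGetDevice(&dev);
        printf("host thread %d is using GPU %d\n", tid, dev);

        dummyKernel<<<1, 1>>>();          // launched on this thread's current device
        cudaDeviceSynchronize();
    }
    return 0;
}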

CUDA fails when trying to use both the onboard iGPU and an NVIDIA discrete card. How can I use both the discrete NVIDIA and the integrated (onboard) Intel GPU? [closed]

cuda,intel,nvidia,multi-gpu

You have to install the drivers for your integrated onboard GPU. This can be done by booting up with the iGPU selected in the BIOS settings; your PC should then be able to load the drivers it needs on its own. For my Ivy Bridge, the BIOS settings are these: Go...

Can I use in CUDA atomic-operations on remote GPU-RAM over GPUDirect 2.0 P2P?

cuda,gpgpu,nvidia,gpu-programming,multi-gpu

No. GPU atomics are only atomic with respect to the GPU performing the operation. They do not work on host memory or non-local device memory. I'm sure it is a roadmap item for NVIDIA to address these limitations on future platforms, especially with NVLink....

Launching asynchronous memory copy operations on multiple GPUs

c++,cuda,multi-gpu

http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__DEVICE_g418c299b069c4803bfb7cab4943da383.html

cudaError_t cudaSetDevice ( int device )

Sets device as the current device for the calling host thread. Any device memory subsequently allocated from this host thread using cudaMalloc(), cudaMallocPitch() or cudaMallocArray() will be physically resident on device. Any host memory allocated from this host thread using cudaMallocHost() or cudaHostAlloc()...
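A minimal sketch of how cudaSetDevice() combines with per-device streams to launch asynchronous copies on multiple GPUs; the buffer size, the cap of 8 devices, and the pinned host buffers are assumptions for illustration:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int num_gpus = 0;
    cudaGetDeviceCount(&num_gpus);

    const size_t N = 1 << 20;
    float *h_buf[8], *d_buf[8];
    cudaStream_t stream[8];

    // Set up one stream, one pinned host buffer, and one device buffer per GPU.
    for (int dev = 0; dev < num_gpus && dev < 8; ++dev) {
        cudaSetDevice(dev);   // subsequent allocations and stream are tied to this GPU
        cudaStreamCreate(&stream[dev]);
        cudaMallocHost((void**)&h_buf[dev], N * sizeof(float)); // pinned memory enables true async copies
        cudaMalloc((void**)&d_buf[dev], N * sizeof(float));
    }

    // Issue the host-to-device copies; each runs in its own stream on its own device
    // and returns control to the host immediately, so the transfers overlap.
    for (int dev = 0; dev < num_gpus && dev < 8; ++dev) {
        cudaSetDevice(dev);
        cudaMemcpyAsync(d_buf[dev], h_buf[dev], N * sizeof(float),
                        cudaMemcpyHostToDevice, stream[dev]);
    }

    // Wait for all copies to finish before using the data.
    for (int dev = 0; dev < num_gpus && dev < 8; ++dev) {
        cudaSetDevice(dev);
        cudaStreamSynchronize(stream[dev]);
    }

    printf("async copies issued on %d GPU(s)\n", num_gpus);
    return 0;
}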