Profiling usually implies that you care about performance. If you care about performance, you should profile the release build of your CUDA code. The debug build (compiled with -G) turns off most device-code optimizations, so the generated code is different and usually runs slower. There's little point in doing performance analysis (including execution time measurement, benchmarking, profiling, etc.) on...