I have written a serial version of a program that calculates a histogram, and I know the algorithm works. The problem is that when I do it in CUDA, the results I get back are all 0. I can copy the input array dev_x into the output variable h, and I am able to see the input values of x.
The input data is a list of x and y positions, each with a corresponding color (an int from 1 to 5).
The arguments are the input file name, output file name, cellWidth and cellHeight, where cellWidth and cellHeight are the number of regions the input is divided into. A 1000000 x 1000000 array is divided into 1000 x 1000 regions. I need to calculate the number of occurrences of each color in each region.
Best How To :
There are at least two gigantic, basic problems in this code, neither of which has anything to do with CUDA:
histSize = sizeof(unsigned int) * xMax/cellWidth * yMax/cellHeight * numColors;
h = (unsigned int*) malloc(histSize);
for(i=0; i<histSize; i++)
h[i]=0; // <-- buffer overflow: histSize is a size in bytes, not a number of elements
which is probably killing the program before it ever even gets to launch the kernel, and:
cudaMalloc( (void**) &dev_h, histSize );
cudaMemcpy(dev_h, h, size, cudaMemcpyHostToDevice); // buffer overflow
which would kill the CUDA context if the program ever got that far.
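One way to avoid both overflows is to keep the element count and the byte count in separate variables. The following is only a host-side sketch in plain C: the helper names hist_count, hist_bytes and alloc_hist are hypothetical, and it assumes xMax and yMax are meant to be divided by the cell sizes before multiplying by numColors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of histogram bins (elements, NOT bytes).
   Assumption: xMax and yMax are divided by the cell sizes first. */
size_t hist_count(size_t xMax, size_t yMax,
                  size_t cellWidth, size_t cellHeight, size_t numColors)
{
    assert(cellWidth > 0 && cellHeight > 0);   /* guard the divisions */
    return (xMax / cellWidth) * (yMax / cellHeight) * numColors;
}

/* Size of the histogram in bytes. This is the value that belongs in
   malloc, cudaMalloc and the cudaMemcpy calls that move the histogram. */
size_t hist_bytes(size_t count)
{
    return count * sizeof(unsigned int);
}

/* Allocate a zeroed host histogram with `count` bins; NULL on failure.
   calloc zeroes the buffer, so no hand-written clearing loop is needed. */
unsigned int *alloc_hist(size_t count)
{
    return calloc(count, sizeof(unsigned int));
}
```

With this split, any loop over h runs to the bin count, while cudaMalloc for dev_h and the cudaMemcpy from h to dev_h would both take the byte count, so the copy can never be larger than the device allocation.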
These are elementary mistakes, and you haven't detected them because your only use case is apparently a program which attempts to process a 150 MB input file and emit a large histogram from it, and your only method of detecting errors is looking at a file containing that histogram. That is a completely insane way to develop and debug code. If you had done any of the following:
- Hardcoded a trivially small test case you already knew the answers for
- Added CUDA API error checking
- Run valgrind
- Used cuda-memcheck
- Used a host debugger
- Run nvprof
you probably would have instantly detected the problems (there might well be more but I don't care enough to look for them, that is your job), and this Stack Overflow question wouldn't exist.
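As an illustration of the first item in that list, here is a rough sketch of a hardcoded test case small enough to check by hand. Everything in it is an assumption for the sake of the example: the Point struct, the histogram function and the bin layout are hypothetical and not taken from the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input record: a position plus a color in 1..numColors. */
typedef struct { unsigned int x, y, color; } Point;

/* Serial reference histogram. Assumed bin layout:
   bin = (cellY * cellsX + cellX) * numColors + (color - 1). */
void histogram(const Point *pts, size_t n,
               unsigned int cellWidth, unsigned int cellHeight,
               unsigned int cellsX, unsigned int numColors, unsigned int *h)
{
    for (size_t i = 0; i < n; i++) {
        assert(pts[i].color >= 1 && pts[i].color <= numColors);
        size_t cx = pts[i].x / cellWidth;    /* cell column */
        size_t cy = pts[i].y / cellHeight;   /* cell row */
        h[(cy * cellsX + cx) * numColors + (pts[i].color - 1)]++;
    }
}

/* 4x4 domain, 2x2 cells, 2 colors: 2 * 2 * 2 = 8 bins.
   The expected counts were worked out by hand. Returns 1 on a match. */
int small_case_ok(void)
{
    const Point pts[] = { {0,0,1}, {1,1,1}, {3,0,2}, {2,3,1}, {3,3,2} };
    const unsigned int expected[8] = { 2,0, 0,1, 0,0, 1,1 };
    unsigned int h[8] = {0};

    histogram(pts, 5, 2, 2, 2, 2, h);
    for (int i = 0; i < 8; i++)
        if (h[i] != expected[i])
            return 0;
    return 1;
}
```

Calling small_case_ok() at program start, and comparing the GPU result against the same eight expected values, would show an all-zero histogram immediately without the large input file.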