I need to atomically update a global array that stores clock64() values from different threads. All of the atomic functions in CUDA support only unsigned types at long long int size, but the return type of clock64() is signed. Is it safe to store the output from clock64() in an unsigned long long int?
Best How To :
There are various atomic functions that support atomic operations on unsigned long long int (i.e. a 64-bit unsigned integer), such as atomicAdd. And if you have a compute capability 3.5 or higher GPU, you have even more options.
Referring to the documentation on clock64():
long long int clock64(); when executed in device code, returns the value of a per-multiprocessor counter that is incremented every clock cycle.
So, since it is a 64-bit signed quantity, it is bit-wise identical to an unsigned long long int until it becomes negative. Let's assume the counter is reset to zero either at the start of your kernel, the start of the CUDA context, or machine power-on. This counter will not become negative until around:
2^63 (cycles) / 1,000,000,000 (cycles/s) = ~9.2 x 10^9 seconds = ~292 years after whichever of the above events is the actual reset point.
(I'm using 1 GHz here as an estimate of the GPU core clock.)
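If you want to redo that estimate for a different clock rate, here is a minimal sketch of the arithmetic. The helper name seconds_until_wrap is mine, not part of CUDA, and the 1 GHz figure is the same assumption as above.

```cuda
#include <cstdio>
#include <cmath>

// Hypothetical helper: seconds until a counter that starts at zero and ticks
// at clock_hz reaches 2^value_bits.
__host__ __device__ inline double seconds_until_wrap(int value_bits, double clock_hz)
{
    return ldexp(1.0, value_bits) / clock_hz;  // 2^value_bits / clock_hz
}

int main()
{
    const double clock_hz = 1.0e9;  // assumed 1 GHz core clock
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;

    // 63 bits: the point at which a signed 64-bit counter would go negative.
    const double s = seconds_until_wrap(63, clock_hz);
    printf("%.3e seconds, about %.0f years\n", s, s / seconds_per_year);
    return 0;
}
```

At the assumed 1 GHz this comes out at roughly 292 years; a faster clock shortens it proportionally, but not enough to matter in practice.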
So for the first 200-300 years (after machine power-on, let's say), the clock64() function will not return a negative value. So I'd say it's pretty safe to consider it as "always" non-negative, and therefore always identical in value to an unsigned long long int, meaning you can safely cast it to that type and use it in one of the atomic functions that support unsigned long long int.
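As a rough sketch of what that cast looks like in practice, the kernel below stores each thread's clock64() reading into a global array with atomicExch, which has an unsigned long long int overload. The names record_stamp, to_unsigned_stamp, and stamps, the one-slot-per-block layout, and the launch configuration are all made up for illustration; your array layout and choice of atomic will differ.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reinterpret a clock64() reading as unsigned.
// Value-preserving as long as the reading is non-negative.
__host__ __device__ inline unsigned long long int to_unsigned_stamp(long long int ticks)
{
    return static_cast<unsigned long long int>(ticks);
}

// Every thread atomically overwrites its block's slot with its own reading,
// so each slot ends up holding the reading of whichever thread wrote last.
__global__ void record_stamp(unsigned long long int *stamps)
{
    const unsigned long long int now = to_unsigned_stamp(clock64());
    atomicExch(&stamps[blockIdx.x], now);
}

int main()
{
    const int blocks = 4, threads = 64;  // arbitrary launch configuration
    unsigned long long int *d_stamps = nullptr;
    cudaMalloc(&d_stamps, blocks * sizeof(*d_stamps));
    cudaMemset(d_stamps, 0, blocks * sizeof(*d_stamps));

    record_stamp<<<blocks, threads>>>(d_stamps);

    unsigned long long int h_stamps[blocks];
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_stamps, d_stamps, sizeof(h_stamps),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < blocks; ++i)
        printf("block %d: %llu\n", i, h_stamps[i]);

    cudaFree(d_stamps);
    return 0;
}
```

The same cast works with the other atomics that take unsigned long long int, such as atomicAdd, if you want to accumulate readings rather than overwrite them.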
On the other hand, it's probably not safe to cast it into an unsigned (i.e. 32-bit unsigned int) quantity. That arithmetic would be:
2^32(cycles)/1,000,000,000(cycles/s) = ~4 seconds (after machine power on)
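To make the failure concrete, here is a small sketch of what the narrowing cast does to a counter value just past that point. The helper name truncate_to_32 and the 5-second sample value are only for illustration, again assuming a 1 GHz clock.

```cuda
#include <cstdio>

// Hypothetical helper: narrowing to unsigned int keeps only the low 32 bits.
__host__ __device__ inline unsigned int truncate_to_32(long long int ticks)
{
    return static_cast<unsigned int>(ticks);
}

int main()
{
    const long long int five_seconds = 5000000000LL;  // 5 s at an assumed 1 GHz
    // 5,000,000,000 - 2^32 = 705,032,704
    printf("%lld -> %u\n", five_seconds, truncate_to_32(five_seconds));
    return 0;
}
```

The stored value has silently wrapped around, so it no longer says anything useful about elapsed cycles.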
So in about 4 seconds, the
clock64() function will numerically exceed the value that can be safely recorded in an