I need to atomically update a global array that stores `clock64()` values from different threads. All of the atomic functions in CUDA support only `unsigned` for `long long int` sizes, but the return type of `clock64()` is signed. Is it safe to store the output of `clock64()` in an `unsigned`?

# Best How To :

There are various atomic functions that support atomic operations on `unsigned long long int` (i.e. a 64-bit unsigned integer), such as `atomicCAS`, `atomicExch` and `atomicAdd`. And if you have a GPU of compute capability 3.5 or higher, you have even more options.

Referring to the documentation on `clock64()`:

> `long long int clock64();` when executed in device code, returns the value of a per-multiprocessor counter that is incremented every clock cycle.

So, since it is a 64-bit signed quantity, it has the same value (and the same bit pattern) as an `unsigned long long int` as long as it is non-negative. Let's assume the counter is reset to zero either at the start of your kernel, the start of the CUDA context, or machine power-on. This counter will not become negative until around:

2^63 (cycles) / 1,000,000,000 (cycles/s) = ~9.2 × 10^9 seconds = ~292 years after whichever of the above events is the actual reset point.

(I'm using 1 GHz here as an estimate of the GPU core clock.)

So for the first 200-300 years (after machine power-on, let's say), the `clock64()` function will not return a negative value. So I'd say it's pretty safe to consider it as "always" positive, and therefore always identical to `unsigned long long int`, meaning you can safely cast it to that and use it in one of the atomic functions that support `unsigned long long int`.
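As a rough illustration of the cast-then-atomic approach described above, here is a minimal sketch. The kernel name `record_block_clock`, the host helper `latest_clock_per_block`, and the one-slot-per-block layout are all assumptions made up for this example, not something from the question. It uses `atomicMax` on `unsigned long long`, which is one of the "more options" on newer GPUs; `atomicExch` or `atomicCAS` could be substituted.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread reads clock64() and the largest reading
// seen within a block is kept in that block's slot of a global array.
__global__ void record_block_clock(unsigned long long *slots)
{
    // clock64() returns a signed long long. It is non-negative in practice,
    // so converting to unsigned long long preserves the value.
    unsigned long long t = static_cast<unsigned long long>(clock64());

    // 64-bit unsigned atomicMax. This overload is not available on the
    // oldest GPU architectures; atomicExch/atomicCAS are alternatives there.
    atomicMax(&slots[blockIdx.x], t);
}

// Hypothetical host helper: launches the kernel and returns one value per
// block, or an empty vector if any CUDA call fails (e.g. no GPU present).
std::vector<unsigned long long> latest_clock_per_block(int blocks, int threads)
{
    assert(blocks > 0 && threads > 0);

    std::vector<unsigned long long> host(blocks, 0ULL);
    const size_t bytes = blocks * sizeof(unsigned long long);

    unsigned long long *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return {};
    if (cudaMemset(dev, 0, bytes) != cudaSuccess) { cudaFree(dev); return {}; }

    record_block_clock<<<blocks, threads>>>(dev);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return {};
    return host;
}
```

Calling `latest_clock_per_block(4, 64)` from `main` should return four readings, one per block. Since the counter is per-multiprocessor, this sketch only compares readings taken within the same block, which runs on a single multiprocessor.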

On the other hand, it's probably not safe to cast it into a (32-bit) `unsigned` quantity. That arithmetic would be:

2^32(cycles)/1,000,000,000(cycles/s) = ~4 seconds (after machine power on)

So in about 4 seconds, the `clock64()` function will numerically exceed the largest value that can be recorded in a 32-bit `unsigned` quantity.
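The two estimates can be double-checked with a few lines of host-side code. This is only a sketch: the function names are invented for this example, the 1 GHz clock is the same rough assumption used above, and `std::uint32_t` stands in for `unsigned`, which is 32 bits wide on CUDA platforms. It is plain C++ and should also compile unchanged inside a `.cu` file.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Seconds until a counter that has `value_bits` usable bits overflows,
// given a clock rate in Hz: 2^value_bits / clock_hz.
double seconds_until_wrap(int value_bits, double clock_hz)
{
    assert(clock_hz > 0.0);
    return std::ldexp(1.0, value_bits) / clock_hz;
}

// Convert seconds to years (365.25-day years).
double seconds_to_years(double seconds)
{
    return seconds / (365.25 * 24.0 * 3600.0);
}

// What survives when a 64-bit reading is stored in a 32-bit unsigned:
// only the low 32 bits (the value modulo 2^32).
std::uint32_t truncate_to_unsigned(long long ticks)
{
    return static_cast<std::uint32_t>(ticks);
}

// The 64-bit conversion, by contrast, keeps any non-negative value intact.
unsigned long long widen_to_unsigned64(long long ticks)
{
    return static_cast<unsigned long long>(ticks);
}
```

With a 1 GHz clock, `seconds_until_wrap(32, 1e9)` comes out a little under 4.3 seconds and `seconds_to_years(seconds_until_wrap(63, 1e9))` a little over 292 years. A reading of 5,000,000,000 cycles (5 seconds in) survives `widen_to_unsigned64` unchanged but is reduced to 705,032,704 by `truncate_to_unsigned`.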