I wrote an OpenCL kernel function in which I declared an array of 10000 elements as a local variable inside the kernel body.
It seems that each work-item now has its own independent array of 10000 elements. I am a little confused here, since the private registers for each thread might not be able to hold a 10000-element array.
Does anyone have any idea about this?
Answer:
The OpenCL spec, § 6.5 "Address Space Qualifiers", says:
"The generic address space name for arguments to a function in a program, or local variables of a function is __private."
So within a kernel, an unqualified declaration such as float arr[10000] resides in the private address space. That is all the spec says.
In theory, what happens beyond that point is up to the implementation: it is not specified whether private address space should be physically stored in registers, a register file, some kind of off-chip memory, a combination of those, or something else.
In practice, some implementations will keep small arrays in registers, depending on a number of factors, while bigger arrays will be placed in off-chip memory.
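To put rough numbers on the concern from the question, here is a back-of-the-envelope sketch in plain C (host-side arithmetic, not kernel code). The function names and the register budget passed to `fits_in_registers` are hypothetical and chosen only for illustration; real limits depend on the device and the compiler, and the spec does not define them.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of private storage one work-item needs for an array of
   `elems` elements, each `elem_size` bytes wide. */
size_t private_bytes_per_item(size_t elems, size_t elem_size)
{
    assert(elem_size > 0);
    return elems * elem_size;
}

/* Illustrative check against an ASSUMED per-work-item register budget:
   `regs` registers of `reg_bytes` bytes each. Returns 1 if the array
   would fit entirely in that budget, 0 otherwise. */
int fits_in_registers(size_t bytes, size_t regs, size_t reg_bytes)
{
    return bytes <= regs * reg_bytes;
}

/* Total private storage when each of `work_items` work-items
   gets its own copy of the array. */
size_t private_bytes_total(size_t bytes_per_item, size_t work_items)
{
    return bytes_per_item * work_items;
}
```

With 4-byte floats, `private_bytes_per_item(10000, 4)` gives 40000 bytes per work-item. Under an assumed budget of 255 four-byte registers (1020 bytes), `fits_in_registers` returns 0, which is consistent with the point above that an array this large would likely end up outside the register file.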