python,cuda,pycuda,numba,numba-pro

NumbaPro supports numba.cuda.local.array(shape, type) for defining thread-local arrays. As with CUDA C, whether that array is placed in local memory or in registers is a compiler decision based on the usage patterns of the array. If the indexing pattern of the local array is statically defined and there is sufficient register...

python,design-patterns,scipy,numba

I can comment on the Numba portion of this question. As other users have mentioned, attribute access in Numba leads to some overhead. For example, you might be tempted to write code like this: class Foo(object): def __init__(self, x): self.x = x @numba.jit def dosomething(self, y): for i in range(len(self.x)):...

python,cython,cpython,numba,parakeet

I am used to coding in C/C++, and when I see the following array operation, I feel some CPU is being wasted: A feeling of wasted CPU is absolutely normal for C/C++ programmers facing Python code. Your code: version = '1.2.3.4.5-RC4' # the end can vary a lot api = '.'.join(version.split('.')[0:3])...
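
To make the operation quoted above concrete, here is a minimal sketch of what the split/slice/join chain produces. The second variant, which passes a `maxsplit` argument to `str.split`, is my own illustration and not part of the original answer; it simply stops splitting once the first three components are available.

```python
version = '1.2.3.4.5-RC4'  # the end can vary a lot

# Original form: split on every dot, keep the first three parts, re-join.
api = '.'.join(version.split('.')[0:3])

# Illustrative variant (an assumption, not from the answer): limit the
# number of splits so the tail of the string is left in one piece.
api_maxsplit = '.'.join(version.split('.', 3)[:3])

print(api)           # 1.2.3
print(api_maxsplit)  # 1.2.3
```

Both forms allocate a short temporary list, which is the overhead the question is asking about.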

I should've read this which is for the newer version of numba. http://numba.pydata.org/numba-doc/0.15.1/tutorial_firststeps.html#compiling-a-function-with-numba-jit-using-an-explicit-function-signature 2) jit(function) -> dispatcher Same as old autojit. Create a dispatcher function object that specialize at call site. Example: @jit def foo(x, y): return x + y ...

While numba supports such Python data structures as dicts and sets, it does so in object mode. From the numba glossary, object mode is defined as: A Numba compilation mode that generates code that handles all values as Python objects and uses the Python C API to perform all operations...

python,multithreading,numpy,numba

That code is used for calling a C function, in this case the functions PyEval_SaveThread and PyEval_RestoreThread. savethread = pythonapi.PyEval_SaveThread keeps a reference to the function pythonapi.PyEval_SaveThread in the variable savethread, so that when you call the function later using savethread() it's as if you called pythonapi.PyEval_SaveThread(). restype...
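
The binding pattern described above can be sketched with ctypes alone. This is only an illustration: the argtypes/restype values follow the documented C signatures (PyEval_SaveThread returns a thread-state pointer, PyEval_RestoreThread takes one), and Py_GetVersion is a harmless function I picked to show that calling through the stored reference is the same as calling through pythonapi. The two thread functions are deliberately not called here, because releasing the GIL from ordinary Python code is unsafe; they are meant to be called from compiled code.

```python
import sys
from ctypes import pythonapi, c_void_p, c_char_p

# Keep references to the C API functions and describe their signatures.
savethread = pythonapi.PyEval_SaveThread
savethread.argtypes = []           # takes no arguments
savethread.restype = c_void_p      # returns a PyThreadState* (opaque pointer)

restorethread = pythonapi.PyEval_RestoreThread
restorethread.argtypes = [c_void_p]  # takes the pointer returned above
restorethread.restype = None         # returns void

# Safe demonstration of the same pattern with a function that is
# harmless to call from Python (chosen for illustration only).
get_version = pythonapi.Py_GetVersion
get_version.argtypes = []
get_version.restype = c_char_p     # returns a C string, seen as bytes
version_bytes = get_version()

print(version_bytes.decode().split()[0] == sys.version.split()[0])
```

Without restype set, ctypes assumes a C int return value, which would truncate a pointer on 64-bit platforms.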

The solution below presents the 3 different methods to do the simple sum of sums and 4 different methods to do the sum of squares. sum of sums 3 methods - for loop, JIT for loops, einsum (none run into memory problems) sum of square of sums 4 methods -...

python,numpy,numba,numexpr,parakeet

As of the current release of Numba (which you are using in your tests), there is incomplete support for ufuncs with the @jit function. On the other hand, you can use @vectorize and it is faster: import numpy as np from numba import jit, vectorize import numexpr as ne def numpy_complex_expr(A,...

Simply, numba doesn't know how to convert np.arange into a low level native loop, so it defaults to the object layer which is much slower and usually the same speed as pure python. A nice trick is to pass the nopython=True keyword argument to jit to see if it can...

You're right in thinking that numba doesn't recognise fdot as a numba compiled function. I don't think you can make it recognise it as a function argument, but you can use this approach (using variable capture so fdot is known when the function is built) to build an ODE solver:...
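
A minimal sketch of the variable-capture idea, under several assumptions of mine: the solver is a plain forward Euler loop on a scalar, the names make_euler_solver and solve are hypothetical, and njit is used instead of jit. The ImportError fallback exists only so the sketch still runs where Numba is not installed; it is not part of the technique.

```python
import math

try:
    from numba import njit
except ImportError:
    # Assumption: plain-Python fallback so the sketch runs without Numba.
    def njit(func):
        return func

@njit
def fdot(t, y):
    # Right-hand side of dy/dt = -y (chosen only for illustration).
    return -y

def make_euler_solver(rhs):
    # rhs is captured by the closure, so it is already known when
    # solve is compiled; it is never passed as a call argument.
    @njit
    def solve(y0, t0, t1, n_steps):
        h = (t1 - t0) / n_steps
        t = t0
        y = y0
        for _ in range(n_steps):
            y = y + h * rhs(t, y)
            t = t + h
        return y
    return solve

solve = make_euler_solver(fdot)
y_end = solve(1.0, 0.0, 1.0, 1000)
print(abs(y_end - math.exp(-1.0)) < 1e-3)  # True
```

Each call to make_euler_solver produces a separately compiled solver, so the compilation cost is paid once per right-hand-side function.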

Here's my version of your code which is significantly faster: @jit(nopython=True) def dot(a,b): res = a[0]*b[0]+a[1]*b[1]+a[2]*b[2] return res @jit def compute_stuff2(array_to_compute): N = array_to_compute.shape[0] con_mat = np.zeros((N,N)) p0 = np.zeros(3) p1 = np.zeros(3) q0 = np.zeros(3) q1 = np.zeros(3) p0m1 = np.zeros(3) p1m0 = np.zeros(3) q0m1 = np.zeros(3) q1m0 =...

python,performance,numpy,scientific-computing,numba

This does almost the same thing as your (excellent!) self-answer, but with a bit less rigamarole. It seems marginally faster on my machine as well -- about 30ms based on a cursory test. def apply_indexed_fast(array, func_indices, func_table): func_argsort = func_indices.argsort() func_ranges = list(np.searchsorted(func_indices[func_argsort], range(len(func_table)))) func_ranges.append(None) out = np.zeros_like(array) for f,...

The last thing I heard about numba class support was that it has been temporarily removed since 0.12.

Answer can be found here: Numba 0.16 has changed from using llvmpy to llvmlite as our wrapper around the LLVM library. (We also upgraded from LLVM 3.3 to LLVM 3.5 at the same time.) The installation process is described here: https://github.com/numba/numba/blob/master/README.md#custom-python-environments New link below... (Also note that llvmlite (much like...

python,multithreading,jit,numba

Unless I'm missing something, your two examples are exactly equivalent, not because of anything numba does (or doesn't do), but because that's how decorators work. I'll explain the general principle and you can double-check whether that's the transformation you are asking about. This code: @d(arg) def f(x): ... is by...
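
The decorator equivalence can be checked with a small pure-Python sketch. The decorator d below is a hypothetical stand-in (it just adds arg to the result) and has nothing to do with Numba; it only shows that the `@d(arg)` syntax and the explicit call chain produce the same function.

```python
def d(arg):
    # d(arg) returns the actual decorator.
    def decorator(func):
        def wrapper(x):
            return func(x) + arg
        return wrapper
    return decorator

# Decorator syntax.
@d(10)
def f(x):
    return x * 2

# The same transformation written out by hand.
def g(x):
    return x * 2
g = d(10)(g)

print(f(3), g(3))  # 16 16
```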

You should use numba dtypes instead of numpy dtypes: import numba import numpy as np @numba.njit def f(): a = np.zeros(5, dtype=numba.int32) return a In [8]: f() Out[8]: array([0, 0, 0, 0, 0], dtype=int32) ...

The question was asked before the 0.18.x releases, and numba master had already switched to the latest llvmlite. For those interested in building numba master, there is a numba channel on binstar that hosts builds of numba and llvmlite. You can do conda install -c numba llvmlite to install/update llvmlite.

A different approach, but one that still has the advantage of containing the variables would be to use captured variables in a wrapped function: def make_numba_functions(): SOME_NAME = 0x1 SOME_OTHER_CONSTANT = 0x2 AND_ANOTHER_CONSTANT = 0x4 @jit def f1(x,y): useful code goes here @jit def f2(x,y,z): some more useful code goes...
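
A runnable sketch of the wrapped-function pattern, with placeholder bodies of my own in place of the "useful code" and with njit used instead of jit. The ImportError fallback is an assumption added so the example also runs without Numba installed.

```python
try:
    from numba import njit
except ImportError:
    # Assumption: plain-Python fallback so the sketch runs without Numba.
    def njit(func):
        return func

def make_numba_functions():
    # Constants live in the enclosing scope and are captured by the
    # compiled functions instead of being module-level globals.
    SOME_NAME = 0x1
    SOME_OTHER_CONSTANT = 0x2
    AND_ANOTHER_CONSTANT = 0x4

    @njit
    def f1(x, y):
        # Placeholder body (illustrative only).
        return (x | SOME_NAME) + (y & SOME_OTHER_CONSTANT)

    @njit
    def f2(x, y, z):
        # Placeholder body (illustrative only).
        return x + y + z + AND_ANOTHER_CONSTANT

    return f1, f2

f1, f2 = make_numba_functions()
print(f1(2, 3))     # 5
print(f2(1, 2, 3))  # 10
```

The captured values are fixed when each function is compiled, so they behave as constants inside f1 and f2.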

python,performance,loops,numpy,numba

Numba is very fast in nopython mode but with your code it has to fall back to object mode, which is a lot slower. You can see this happening if you pass nopython=True to the jit decorator. It does compile in nopython mode (at least in Numba version 0.18.2) if...

Here's what I think is happening with Numba: Numba works on Numpy arrays. Nothing else. Everything else has nothing to do with Numba. zip returns an iterator of arbitrary items, which Numba cannot see into. Thus Numba cannot do much compiling. Looping over the indexes with a for i in...