c,assembly,x86,dot-product,mmx

The problem is not in the assembly code but in main. int16_t *dot; declares an uninitialized pointer. It could point anywhere, which in practice usually means an address your program does not own. Hence the segfault at the store through it: movq [ecx], mm4 The quickest solution is to replace int16_t *dot; with: int16_t...

There's a race condition inside your local memory reduction, since different work-items read and write the same location on successive loop iterations. You need a barrier on each iteration, and you should also avoid rewriting values when no work is to be performed (since this will conflict with the other...

Why don't you just matrix multiply the whole thing? For example:

    set.seed(1)
    vec1 <- sample(1:10)
    vec2 <- sample(1:10)
    vec3 <- sample(1:10)
    rbind(vec1, vec2, vec3) %*% cbind(vec1, vec2, vec3)

produces:

         vec1 vec2 vec3
    vec1  385  298  284
    vec2  298  385  296
    vec3  284  296  385

Where each cell of a matrix...
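The matrix-multiply idea in this answer can be sketched in plain Python: stacking the vectors as rows and multiplying by the transpose gives a table whose entry (i, j) is the dot product of vector i and vector j. The helper name gram_matrix and the sample vectors are assumptions made for illustration; they are not the values from the R session.

```python
def gram_matrix(vectors):
    """Return G where G[i][j] is the dot product of vectors[i] and vectors[j]."""
    return [
        [sum(a * b for a, b in zip(u, v)) for v in vectors]
        for u in vectors
    ]


# Hypothetical sample data, chosen only to keep the arithmetic easy to check.
vecs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
G = gram_matrix(vecs)

for row in G:
    print(row)
```

The result is symmetric, and its diagonal holds each vector's dot product with itself, which matches the shape of the R output.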

java,vector,3d,plane,dot-product

Your average method changes the value of a, making it the same as the average point. So after you've called average, your cube is no longer a cube: three of the faces have rotated into new positions. Whatever happens in the loop over collider is therefore wrong.

c,visual-c++,simd,avx,dot-product

There are two big inefficiencies in your loop that are immediately apparent: (1) these two chunks of scalar code: __declspec(align(32)) double ar[4] = { xb[i].x, xb[i + 1].x, xb[i + 2].x, xb[i + 3].x }; ... __m256d y = _mm256_load_pd(ar); and __declspec(align(32)) double arr[4] = { xb[i].x, xb[i + 1].x,...

python,numpy,scipy,sparse-matrix,dot-product

Use the * operator: p * q. Note that for sparse matrices * uses matrix-like semantics rather than array-like semantics, so it computes a matrix product rather than an elementwise (broadcast) product....

vector,scheme,racket,dot-product

Shouldn't the following work? (I don't have an interpreter on hand to test it, give me a minute to check.) (define (dot a b) (apply + (vector->list (vector-map * a b))) ) ...

You don't allocate any memory here: matrix[i] = calloc(0,columns*sizeof(int)); The first parameter to calloc is the number of elements to allocate, and the second is the size of each element. In this case the count should be columns: matrix[i] = calloc(columns,sizeof(int)); Also make sure you validate your scanf input....

algorithm,python-3.x,numpy,sum,dot-product

That double loop is a time killer in numpy. If you use vectorized array operations, the evaluation is cut to under a second.

    In [1764]: sum_np=0
    In [1765]: for j in range(0,N):
          ...:     for k in range(0,N):
          ...:         sum_np += np.exp(-1j * np.inner(x_np,(r_np[j] - r_np[k])))
    In [1766]: sum_np
    Out[1766]: (2116.3316526447466-1.0796252780664872e-11j)
    In [1767]:...
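One way to see why the double loop is avoidable, sketched here with the standard library only: for real-valued x and r, the sum over all pairs (j, k) of exp(-i x·(r_j - r_k)) factors into the squared magnitude of a single sum over j. This is an algebraic identity and not necessarily the vectorization the answer goes on to use. The function names and sample data below are assumptions.

```python
import cmath


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def double_loop_sum(x, r):
    """O(N^2) reference: sum of exp(-i x.(r_j - r_k)) over all pairs (j, k)."""
    total = 0
    for rj in r:
        for rk in r:
            diff = [a - b for a, b in zip(rj, rk)]
            total += cmath.exp(-1j * dot(x, diff))
    return total


def factored_sum(x, r):
    """O(N) version: the pair sum equals |sum_j exp(-i x.r_j)|^2 for real x, r."""
    s = sum(cmath.exp(-1j * dot(x, rj)) for rj in r)
    return abs(s) ** 2


# Hypothetical sample data.
x = [0.3, -1.2, 0.7]
r = [[0.0, 1.0, 2.0], [1.5, -0.5, 0.25], [2.0, 2.0, -1.0], [-1.0, 0.5, 0.5]]

print(double_loop_sum(x, r))
print(factored_sum(x, r))
```

The two results agree up to floating-point rounding. The imaginary part of the double-loop result is pure rounding noise, consistent with the tiny imaginary component in the output quoted above.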

python,apache-spark,dot-product

That's not the dot product, that's the cartesian product. Use the cartesian method: def cartesian[U](other: spark.api.java.JavaRDDLike[U, _]): JavaPairRDD[T, U] Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in this and b is in other....
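To make the distinction concrete, here is a minimal sketch in plain Python rather than the Spark API: the dot product reduces two equal-length sequences to one number, while the Cartesian product yields every pair (a, b). The function names and sample lists are assumptions made for illustration.

```python
from itertools import product


def dot_product(a, b):
    """Sum of elementwise products; both inputs must have the same length."""
    if len(a) != len(b):
        raise ValueError("dot product needs sequences of equal length")
    return sum(x * y for x, y in zip(a, b))


def cartesian_product(a, b):
    """All pairs (x, y) with x taken from a and y taken from b."""
    return list(product(a, b))


# Hypothetical sample data.
a = [1, 2, 3]
b = [4, 5, 6]

print(dot_product(a, b))        # a single number
print(cartesian_product(a, b))  # len(a) * len(b) pairs
```

For inputs of length n and m, the Cartesian product has n * m elements, so on a large distributed dataset it can be far more expensive than it looks.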