c++,distributed-computing,blas,intel-mkl,scalapack

The issue may come from : MPI_Bcast(&lda, 1, MPI_INT, 0, MPI_COMM_WORLD); Before this line, lda is different on each process if the dimension of the matrix is odd. Two processes handle 2 rows and two processes handle 3 rows. But after the MPI_Bcast(), lda is the same everywhere (3). The...

c,mpi,intel-mkl,mpich,scalapack

This answer is courtesy of Ying from Intel, all the credits go to him! The int in C are supposed to be 32bit, you may try lp64 mode. mpicc -o test_lp64 ex1.c -I/opt/intel/mkl/include /opt/intel/mkl/lib/intel64/libmkl_scalapack_lp64.a -L/opt/intel/mkl/lib/intel64 -Wl,--start-group /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/mkl/lib/intel64/libmkl_core.a /opt/intel/mkl/lib/intel64/libmkl_sequential.a -Wl,--end-group /opt/intel/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.a -lpthread -lm -ldl [[email protected] scalapack]$ mpirun -n 4...

parallel-processing,fortran,lapack,blas,scalapack

Scalapack library uses naming conversion to declare single or double precision function. This declaration is done by the second letter of scalapack function The "PD*" function means a double precision function, while "PS*" means single. So, you should change DBLESZ = 8 to DBLESZ = 4 All DOUBLE PRECISION to...

matrix,mpi,distributed-computing,scalapack

Block decompositions of matrices as you've described in your question are a perfectly valid way of distributing a matrix, but it is not the only way to do it. In particular, block data distributions (breaking up the matrix into procrows x process sub-matrices) is a little inflexible. If the matrix...