That is a good start. You can save one more addition:

    // Perform complex multiplication on the input and accumulate with the output
    pDstR[i] += fmsub(fSrc1R, fSrc2R, fSrc1I*fSrc2I);
    pDstI[i] += fmadd(fSrc1R, fSrc2I, fSrc2R*fSrc1I);

Here you can make use of another fmadd in the calculation of the imaginary part:

    pDstI[i] =...