I'm trying to work with AVX instructions and windows 64bit. I'm comfortable with g++ compiler so I've been using that, however, there is a big bug described reported here and very rough solutions were presented here.
Basically, m256 variable can't be aligned on the stack to work properly with avx instructions, it needs 32 byte alignment.
The solutions presented at the other stack question I linked are really terrible, especially if you have performance in mind. A python program that you would have to run every time you want to debug that replaces instructions with their sub-optimal unaligned instructions, or over-allocating and doing a bunch of costly hacky pointer math in code to get proper alignment. If you do the pointer math solution, I think there is still even a chance for a seg fault because you can't control the allocation or r-values / temporaries.
I'm looking for an easier and cheaper solution. I don't mind switching compilers, would prefer not to, but if it's the best solution I will. However, my very poor understanding of the bug is that it is intrinsic to windows 64 bit, so would switching compilers help or do other compilers also have the same issue?