On the Pentium and subsequent x86 processors, there is a substantial performance penalty if double-precision variables are not stored 8-byte aligned; a slowdown by a factor of two or more is not unusual. Unfortunately, the stack (where local variables and subroutine arguments live) is not guaranteed by the Intel ABI to be 8-byte aligned.
Recent versions of gcc (as well as most other compilers, we are told, such as Intel's, Metrowerks', and Microsoft's) are able to keep the stack 8-byte aligned; gcc does this by default (see -mpreferred-stack-boundary in the gcc documentation). If you are not certain whether your compiler maintains stack alignment by default, it is a good idea to make sure.
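One rough way to make sure is to look at the address of a local variable at run time. The following is only a minimal sketch, not part of FFTW: the helper names misalignment and report_stack_alignment are hypothetical, and the check assumes that a local double whose address is taken is placed in the current stack frame, which is typical but not guaranteed.

```c
#include <stdint.h>
#include <stdio.h>

/* Offset of p from the previous align-byte boundary; 0 means aligned.
   align must be nonzero. */
static unsigned misalignment(const void *p, uintptr_t align)
{
    return (unsigned)((uintptr_t)p % align);
}

/* Hypothetical helper: inspect a local double. Taking its address
   forces the compiler to give it a memory location, normally in the
   current stack frame. Returns the offset from 8-byte alignment. */
static unsigned report_stack_alignment(void)
{
    double x = 0.0;
    unsigned off = misalignment(&x, 8);

    printf("local double at %p is %s (offset %u)\n",
           (void *)&x, off == 0 ? "8-byte aligned" : "misaligned", off);
    return off;
}
```

Calling report_stack_alignment() from main and from a few nested functions gives a quick indication: a nonzero offset anywhere suggests that the stack is not being kept 8-byte aligned. A zero offset is weaker evidence, since the compiler may have realigned that particular frame. On x86-64 the ABI requires 16-byte stack alignment at calls, so the check is mainly of interest for 32-bit x86 builds.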
Unfortunately, gcc only preserves the stack alignment; it does not establish it. As a result, if the stack starts off misaligned, it will always be misaligned, with a disastrous effect on performance (in double precision). To prevent this, FFTW includes hacks to align its own stack if necessary, so it should perform well even if you call it from a program with a misaligned stack. Currently, our hacks support gcc and the Intel C compiler; if you use another compiler you are on your own. Fortunately, recent versions of glibc (on GNU/Linux) provide a properly aligned starting stack, but this was not the case with a number of older versions, and we are not certain of the situation on other operating systems. Hopefully, as time goes by this will become less of a concern.