The year is almost over. Here is a benchmark of the compilers and FFT implementations tested recently for supercontinuum generation on an Intel(R) Core(TM)2 CPU @ 2.66 GHz:

1024 data points
Microsoft Compiler VS2008, no optimization, Numerical Recipes FFT: 68 sec
Microsoft Compiler VS2008, no optimization, template-based FFT: 68 sec
Microsoft Compiler VS2008, no optimization, FFTW 2.1.3: 65 sec
Microsoft Compiler VS2008, no optimization, Intel MKL FFT: 57 sec
Microsoft Compiler VS2008, full optimization, Numerical Recipes FFT: 47 sec
Microsoft Compiler VS2008, full optimization, template-based FFT: 46 sec
Microsoft Compiler VS2008, full optimization, FFTW 2.1.3: 44 sec
Microsoft Compiler VS2008, full optimization, Intel MKL FFT: 43 sec
Intel Compiler V11.0, Numerical Recipes FFT: 49 sec
Intel Compiler V11.0, FFTW 2.1.3: 42 sec
Intel Compiler V11.0, Intel MKL FFT: 42 sec

16384 data points
Microsoft Compiler VS2008, intermediate optimization, Numerical Recipes FFT: 966 sec
Microsoft Compiler VS2008, intermediate optimization, template-based FFT: 876 sec
Microsoft Compiler VS2008, intermediate optimization, FFTW 2.1.3: 803 sec
Microsoft Compiler VS2008, intermediate optimization, Intel MKL FFT: 830 sec

16384 data points, memory aligned
Microsoft Compiler VS2008, intermediate optimization, FFTW 3.3alpha1: 801 sec
Microsoft Compiler VS2008, intermediate optimization, Intel MKL FFT: 793 sec

In the commercial or demo version, the MS compiler is used with the template-based FFT. It is not too bad; however, there seems to be room for some speed-up in the next version, probably with Intel's MKL.
(Remark added: Intel's MKL later became the standard for fiberdesk.)
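
For readers who want to run a comparison of this kind themselves, here is a minimal C++ timing sketch. It is not the fiberdesk code and uses none of the libraries benchmarked above: `fft_radix2` is a plain textbook radix-2 FFT, and both it and `time_fft_seconds` are hypothetical names chosen for illustration. It also times only the transform itself, whereas the numbers above presumably cover complete simulation runs, so absolute values will not be comparable.

```cpp
#include <chrono>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical helper: in-place iterative radix-2 FFT (forward, unnormalized).
// Assumes data.size() is a power of two.
void fft_radix2(std::vector<Complex>& data) {
    const std::size_t n = data.size();

    // Reorder the input into bit-reversed index order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(data[i], data[j]);
    }

    // Butterfly stages for block lengths 2, 4, ..., n.
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double angle = -2.0 * pi / static_cast<double>(len);
        const Complex wlen(std::cos(angle), std::sin(angle));
        for (std::size_t i = 0; i < n; i += len) {
            Complex w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const Complex u = data[i + k];
                const Complex v = data[i + k + len / 2] * w;
                data[i + k] = u + v;
                data[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}

// Hypothetical helper: wall-clock seconds for `repetitions` transforms of size n.
// The input is copied before every transform (so values do not grow without
// bound); that copy is included in the measured time.
double time_fft_seconds(std::size_t n, int repetitions) {
    std::vector<Complex> input(n);
    for (std::size_t i = 0; i < n; ++i) {
        input[i] = Complex(std::sin(0.01 * static_cast<double>(i)), 0.0);
    }

    volatile double sink = 0.0;  // keeps the optimizer from discarding the work
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; ++r) {
        std::vector<Complex> work = input;
        fft_radix2(work);
        if (n > 0) sink = sink + work[0].real();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_fft_seconds(1024, 10000)` and `time_fft_seconds(16384, 10000)` from builds made with different compilers or optimization settings gives a rough per-transform comparison. To compare libraries, the call to `fft_radix2` would be swapped for the library's own transform.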