Easy to read C Fast Fourier Transform (FFT) signal processing code

Philippe Strauss

1996-01-01 00:00

I spent my youth on this code, guys. No laugh. Between age 17 and 23 I was fascinated by the time - frequency duality and the Fourier transform, spectrum analyzers. Having no internet at that time, no FFTW or existing code to tinker with, I rolled my own.

It's not as fast as FFTW for sure, but easily read, with good performances for a straight written code. If you're totally green on the topic, you should read either Oppenheim & Schaffer or Proakis & Manolakis chapter on the subject. There are 4 variants of the Cooley-Tukey algorithm in my code, radix 2 and 4, decimation in time or frequency. I found this fluency graph to depict one variant, it's a decimated in frequency radix 2 shown on this .PDF

/*

libdsp.c

Low-level library of signal processing functions. Memory is never allocated
inside one of these functions, the user must deal with the OS himself.
Calculus are done in place if possible.
The goal was to provide efficient routines in C to be used on a computer,
with high amount of memory available, few registers (intel crap processors),
low available memory bandwidth.

Those algorithm were hand optimized at the C level for GNU GCC by looking to
the compiled assembly code generated by gcc, but not further about TLB and
cache-line trashing, except for macroscopic view of global performance
degradation on high (>2048 points) number of points FFT.
The goal was to obtain good efficiency without rewriting the whole stuff
for each new CPU.

I.e, on a Pentium 166MHz, compiled with pgcc, efficiency reach 60 MFLOPS for
R4FFTdif at 256 pts.  At 4096 pts, performance become really bad (20 MFLOPS),
worse at higher number of points.

(c) Philippe Strauss <philou@philou.ch>, 1993, 1995, 1996, 1997, 2001.

This software library is licensed under the term of the
Lesser General Public License 2.1, which is included in the file LGPL21.

*/

#include <math.h>

#define DIRECT   (int) 1
#define REVERSE  (int) -1

typedef int tInt;
typedef tInt *ptInt;

typedef float tFloat;
typedef tFloat *ptFloat;

typedef int tError;

typedef struct {
    ptFloat pfReal, pfImag;
} tsCmplxRect;

typedef struct {
    ptFloat pfMagn, pfPhase;
} tsCmplxPol;

Aaarrfh a pentium 166MHz. And started the thing 3 years before that on a 486. 20 to 60 MFLOPS, Good ol'time.

Let start with the initialization of some array later looked up for reordering of coefficients, knows as bit reversal addressing in DSP litterature. My algorithm is far from beeing uninterestening I've later been told, as the bit or 2bits reversal is done entirely in radix 10.

All over the place you'll find some hungarian notation of mine at that time, I dislike it nowadays, still for beginners in C, not sure it is so bad. pi stand for pointer on integers, for example.

/*
InitR2SwapTable : Fill in an array piSwapTable with index for reordering a
                  shuffled input or output of a R2 FFT.
Input           : Array pointed by piSwapTable
Output          : Initialized array pointed by piSwapTable
Comment         : Quiet efficient algorithm (for processor without hardwired
                  bit reversed addressing), quiet original, too, there'not any
                  >> or << operator, only  arithmetic, a pure Philou design :)
*/

void InitR2SwapTable (ptInt piSwapTable, tInt iN)
{
    tInt iCnt1, iL, iM;

    iL = iN / 2;
    iM = 1;
    piSwapTable[0] = 0;

    while (iL >= 1)
    {
        for (iCnt1 = 0; iCnt1 < iM; ++iCnt1)
            piSwapTable[iCnt1 + iM] = piSwapTable[iCnt1] + iL;

        iL /= 2;
        iM *= 2;
    }
}

/*
InitR4SwapTable : Fill in an array piSwapTable with index for reordering a
                  shuffled input or output of a R4 FFT.
Input           : Array pointed by piSwapTable, base 4 log of N in iLog4N
Output          : Initialized array pointed by piSwapTable
Comment         : Quiet efficient algorithm (for processor without hardwired
                  bit reversed addressing), quiet original, too, there'not any
                  >> or << operator, only arithmetic. a pure Philou design :)
*/

void InitR4SwapTable (ptInt piSwapTable, tInt iLog4N, tInt iN)
{
    tInt iL, iM, iCnt1, iCnt2;

    iL = iN / 4;
    iM = 1;
    piSwapTable[0] = 0;

    for (iCnt1 = 0; iCnt1 < iLog4N; ++iCnt1)
    {
        for (iCnt2 = 0; iCnt2 < iM; ++iCnt2)
        {
            piSwapTable[    iM + iCnt2] = piSwapTable[iCnt2] +     iL;
            piSwapTable[2 * iM + iCnt2] = piSwapTable[iCnt2] + 2 * iL;
            piSwapTable[3 * iM + iCnt2] = piSwapTable[iCnt2] + 3 * iL;
        }
        iL /= 4;
        iM *= 4;
    }
}

The two doing the bit-revered reordering, having beforhand the swap table initialized, one for real-valued, the other for complex arrays.

/*
SwapFFTC : Do the (in place) swapping for reordering the in/out coeffs
           in bit or digit4 reversed order. piSwapTable [] must be
           initialized first.
*/

void SwapFFTC (tsCmplxRect sFFT, ptInt piSwapTable, tInt iN)
{
    tInt iSwapAddr, iCnt1;
    tFloat fRealOutSwap, fImagOutSwap;

    for (iCnt1 = 0; iCnt1 < iN; ++iCnt1)
    {
        fRealOutSwap = sFFT.pfReal[iCnt1];
        fImagOutSwap = sFFT.pfImag[iCnt1];
        iSwapAddr    = piSwapTable[iCnt1];
        if (iCnt1 < iSwapAddr)
        {
            sFFT.pfReal[iCnt1]     = sFFT.pfReal[iSwapAddr];
            sFFT.pfImag[iCnt1]     = sFFT.pfImag[iSwapAddr];
            sFFT.pfReal[iSwapAddr] = fRealOutSwap;
            sFFT.pfImag[iSwapAddr] = fImagOutSwap;
        }
    }
}

/*
SwapFFTR : Do the (in place) swapping for reordering the in/out coeffs
           in bit or digit4 reversed order. piSwapTable [] must be
           initialized first.
Comment  : This one is for real only coeffs.
*/

void SwapFFTR (ptFloat pfData, ptInt piSwapTable, tInt iN)
{
    tInt iSwapAddr, iCnt1;
    tFloat fOutSwap;

    for (iCnt1 = 0; iCnt1 < iN; ++iCnt1)
    {
        fOutSwap  = pfData     [iCnt1];
        iSwapAddr = piSwapTable[iCnt1];
        if (iCnt1 < iSwapAddr)
        {
            pfData[iCnt1]     = pfData[iSwapAddr];
            pfData[iSwapAddr] = fOutSwap;
        }
    }
}

The FFT twiddles factors coefficients initialization (sine & cosine):

/*
InitTwidTable : Initialize sin and cos table for later use in FFT routines.
Input         : Array pointed by sTwidTable.{piReal, piImag}
Output        : Initialized arrays
Comment       : Memory requirement is large
*/

void InitTwidTable (tsCmplxRect sTwidTable, int iDirectOrReverse, tInt iN)
{
    tInt iP;
    tFloat fTheta, fPhi;

    fTheta = (2 * M_PI) / iN;
    if (iDirectOrReverse == DIRECT)
    { 
        for (iP = 0; iP < iN; ++iP)
        {
            fPhi = fTheta * iP;
            sTwidTable.pfReal[iP] = cos (-fPhi);
            sTwidTable.pfImag[iP] = sin (-fPhi);
        }
    }
    if (iDirectOrReverse == REVERSE)
    {
        for (iP = 0; iP < iN; ++iP)
        {
            fPhi = fTheta * iP;
            sTwidTable.pfReal[iP] = cos (fPhi);
            sTwidTable.pfImag[iP] = sin (fPhi);
        }
    }
}

Common windows often used with the DFT/FFT: Blackman, Kaiser:

/*
InitWinBlackman : Initialize an array pfWindow with a Blackman window
*/

void InitWinBlackman (ptFloat pfWindow, tInt iN)
{
    tInt iCnt1;

    for (iCnt1 = 0; iCnt1 < iN; ++iCnt1)
    {
        pfWindow[iCnt1] = 0.42 - 0.50 * cos ((2 * M_PI * iCnt1) / (iN - 1))
                               + 0.08 * cos ((4 * M_PI * iCnt1) / (iN - 1));
    }
}

/*
InitWinKaiser : Initialize an array pfWindow with a Kaiser window
iBeta         : Low value: good spectral resolution, small gibbson effects
                attenuation.
Comment       : Due to the way Bessel_Io is calculated, this algorithm sucks
                performance wise.
*/

void InitWinKaiser (ptFloat pfWindow, tFloat iBeta, tInt iN)
{
    tInt iCnt1;

    for (iCnt1 = 0; iCnt1 < iN; ++iCnt1)
    {
        pfWindow[iCnt1] = Bessel_Io (2 * iBeta * sqrt (iCnt1 /
                                     (tFloat) (iN - 1) - pow (iCnt1 /
                                     (tFloat) (iN - 1), 2)), 10) /
                                     Bessel_Io (iBeta, 10);
    }
}

/*
Bessel_Io : Calculation of the zero order Bessel function.
Comment   : Not any performance optimisation here
*/

tFloat Bessel_Io (tFloat fX, tInt iAccuracy)
{
    tInt iCnt1;
    tFloat fOut = 0, fX_2 = fX / 2;

    for (iCnt1 = 1; iCnt1 < iAccuracy; ++iCnt1)
        fOut = fOut + pow ((pow (fX_2, iCnt1) / Fact (iCnt1)), 2);

    fOut += 1.0;
    return fOut;
}

/*
Fact : rather explicit
*/

tFloat Fact (tInt iX)
{
    tInt iCnt1;
    tFloat fFact = iX;

    for (iCnt1 = iX - 1; iCnt1 > 0; --iCnt1)
        fFact *= iCnt1;

    return fFact;
}

Plain simple, non-fast coding of the Discrete Fourier Transform: $X{(k)} = \sum_{n=0}^{N-1} x{(n)} \cdot e^{-j 2 \pi k n/N}$

/*
DFT   : Discrete Fourier Transform.
*/

void DFT (tsCmplxRect sIn, tsCmplxRect sOut, tsCmplxRect sTwidTable, tInt iN)
{
    tInt iIdx, iK, iKN;
    tFloat fRealSum = 0.0, fImagSum = 0.0;

    for (iK = 0; iK < iN; ++iK)
    {
        iKN = 0;
        for (iIdx = 0; iIdx < iN; ++iIdx)
        {
            fRealSum += sIn.pfReal[iIdx] * sTwidTable.pfReal[iKN]
                      - sIn.pfImag[iIdx] * sTwidTable.pfImag[iKN];
            fImagSum += sIn.pfReal[iIdx] * sTwidTable.pfImag[iKN]
                      + sIn.pfImag[iIdx] * sTwidTable.pfReal[iKN];
            iKN += iK;
            if (iKN >= iN)
                iKN = iKN - iN; /* circular addressing */
        }
        sOut.pfReal[iK] = fRealSum;
        sOut.pfImag[iK] = fImagSum;
        fRealSum = fImagSum = 0.0;
    }
}

Enough of initialization and preambles

Now the first fast algorithm: Radix 2, decimated in frequency:

/*

R2FFTdif : Radix 2, in place, complex in & out, decimation in frequency FFT
           algorithm.
Required : A tsCmplxRect array with data to transform, another tsCmplxRect
           array with data twiddle factors (sine & cosine table),
           base 2 log of N, N
*/

void R2FFTdif (tsCmplxRect sFFT, tsCmplxRect sTwidTable, tInt iLog2N, tInt iN)
{
  tInt
      iCnt1, iCnt2, iCnt3,
      iQ,    iL,    iM,
      iA,    iB;
  tFloat
      fRealTemp, fImagTemp,
      fReal_Wq,  fImag_Wq;

    iL = 1;
    iM = iN / 2;

    for (iCnt1 = 0; iCnt1 < iLog2N; ++iCnt1)
    {
        iQ = 0;
        for (iCnt2 = 0; iCnt2 < iM; ++iCnt2)
        {
            iA = iCnt2;
            fReal_Wq = sTwidTable.pfReal[iQ];
            fImag_Wq = sTwidTable.pfImag[iQ];

            for (iCnt3 = 0; iCnt3 < iL; ++iCnt3)
            {
                iB = iA + iM;

                /* Butterfly: 10 FOP, 4 FMUL, 6 FADD */

                fRealTemp        = sFFT.pfReal[iA] - sFFT.pfReal[iB];
                sFFT.pfReal[iA] +=                   sFFT.pfReal[iB];
                fImagTemp        = sFFT.pfImag[iA] - sFFT.pfImag[iB];
                sFFT.pfImag[iA] +=                   sFFT.pfImag[iB];

                sFFT.pfReal[iB]  = fRealTemp * fReal_Wq - fImagTemp * fImag_Wq;
                sFFT.pfImag[iB]  = fImagTemp * fReal_Wq + fRealTemp * fImag_Wq;

                iA = iA + 2 * iM;
            }
            iQ += iL;
        }
        iL *= 2;
        iM /= 2;
    }
}

Radix 2, decimated in time:

/*

R2FFTdit : Radix 2, in place, complex in & out, decimation in time FFT
           algorithm.
Required : A tsCmplxRect array with data to transform, another tsCmplxRect
           array with data twiddle factors (sine & cosine table),
           base 2 log of N, N
*/

void R2FFTdit (tsCmplxRect sFFT, tsCmplxRect sTwidTable, tInt iLog2N, tInt iN)
{
    tInt
        iCnt1, iCnt2,iCnt3,
        iQ,    iL,   iM,
        iA,    iB;
    tFloat
        fRealTemp, fImagTemp,
        fReal_Wq,  fImag_Wq;

    iL = iN / 2;
    iM = 1;

    for (iCnt1 = 0; iCnt1 < iLog2N; ++iCnt1)
    {
        iQ = 0;
        for (iCnt2 = 0; iCnt2 < iM; ++iCnt2)
        {
            iA = iCnt2;
            fReal_Wq = sTwidTable.pfReal[iQ];
            fImag_Wq = sTwidTable.pfImag[iQ];

            for (iCnt3 = 0; iCnt3 < iL; ++iCnt3)
            {
                iB = iA + iM;

                /* Butterfly: 10 FOP, 4 FMUL, 6 FADD */

                fRealTemp = sFFT.pfReal[iB] * fReal_Wq
                          - sFFT.pfImag[iB] * fImag_Wq;
                fImagTemp = sFFT.pfReal[iB] * fImag_Wq
                          + sFFT.pfImag[iB] * fReal_Wq;

                sFFT.pfReal[iB]  = sFFT.pfReal[iA] - fRealTemp;
                sFFT.pfReal[iA] +=                   fRealTemp;
                sFFT.pfImag[iB]  = sFFT.pfImag[iA] - fImagTemp;
                sFFT.pfImag[iA] +=                   fImagTemp;

                iA = iA + 2 * iM;
            }
            iQ += iL;
        }
        iL /= 2;
        iM *= 2;
    }
}

Radix 4, decimated in frequency:

/*

R4FFTdif : Radix 4, in place, complex in & out, decimation in frequency FFT
           algorithm.
Required : A tsCmplxRect array with data to transform, another tsCmplxRect
           array with data twiddle factors (sine & cosine table),
           base 4 log of N, N
*/

void R4FFTdif (tsCmplxRect sFFT, tsCmplxRect sTwidTable, tInt iLog4N, tInt iN)
{
    tInt
        iCnt1, iCnt2, iCnt3,
        iL,    iM,    iQ,
        iA,    iB,    iC,     iD;
    tFloat
        fRealA,   fRealB,    fRealC,    fRealD,
        fImagA,   fImagB,    fImagC,    fImagD,
        fReal_Wq, fReal_W2q, fReal_W3q,
        fImag_Wq, fImag_W2q, fImag_W3q;

    iL = 1;
    iM = iN / 4;

    for (iCnt1 = 0; iCnt1 < iLog4N; ++iCnt1)
    {
        iQ = 0;
        for (iCnt2 = 0; iCnt2 < iM; ++iCnt2)
        {
            iA = iCnt2;
            fReal_Wq  = sTwidTable.pfReal[    iQ];
            fImag_Wq  = sTwidTable.pfImag[    iQ];
            fReal_W2q = sTwidTable.pfReal[2 * iQ];
            fImag_W2q = sTwidTable.pfImag[2 * iQ];
            fReal_W3q = sTwidTable.pfReal[3 * iQ];
            fImag_W3q = sTwidTable.pfImag[3 * iQ];

            for (iCnt3 = 0; iCnt3 < iL; ++iCnt3)
            {
                iB = iA +     iM;
                iC = iA + 2 * iM;
                iD = iA + 3 * iM;

                /* Butterfly: 42 FOP, 12 FMUL, 30 FADD */

                fRealA = sFFT.pfReal[iA] + sFFT.pfReal[iB]
                       + sFFT.pfReal[iC] + sFFT.pfReal[iD];
                fImagA = sFFT.pfImag[iA] + sFFT.pfImag[iB]
                       + sFFT.pfImag[iC] + sFFT.pfImag[iD];
                fRealB = sFFT.pfReal[iA] + sFFT.pfImag[iB]
                       - sFFT.pfReal[iC] - sFFT.pfImag[iD];
                fImagB = sFFT.pfImag[iA] - sFFT.pfReal[iB]
                       - sFFT.pfImag[iC] + sFFT.pfReal[iD];
                fRealC = sFFT.pfReal[iA] - sFFT.pfReal[iB]
                       + sFFT.pfReal[iC] - sFFT.pfReal[iD];
                fImagC = sFFT.pfImag[iA] - sFFT.pfImag[iB]
                       + sFFT.pfImag[iC] - sFFT.pfImag[iD];
                fRealD = sFFT.pfReal[iA] - sFFT.pfImag[iB]
                       - sFFT.pfReal[iC] + sFFT.pfImag[iD];
                fImagD = sFFT.pfImag[iA] + sFFT.pfReal[iB]
                       - sFFT.pfImag[iC] - sFFT.pfReal[iD];

                sFFT.pfReal [iA] = fRealA;
                sFFT.pfImag [iA] = fImagA;
                sFFT.pfReal [iB] = fRealB * fReal_Wq  - fImagB * fImag_Wq;
                sFFT.pfImag [iB] = fRealB * fImag_Wq  + fImagB * fReal_Wq;
                sFFT.pfReal [iC] = fRealC * fReal_W2q - fImagC * fImag_W2q;
                sFFT.pfImag [iC] = fRealC * fImag_W2q + fImagC * fReal_W2q;
                sFFT.pfReal [iD] = fRealD * fReal_W3q - fImagD * fImag_W3q;
                sFFT.pfImag [iD] = fRealD * fImag_W3q + fImagD * fReal_W3q;

                iA = iA + 4 * iM;
            }
            iQ += iL;
        }
        iL *= 4;
        iM /= 4;
    }
}

Radix 4, decimated in time:

/*

R4FFTdit : Radix 4, in place, complex in & out, decimation in time FFT
           algorithm.
Required : A tsCmplxRect array with data to transform, another tsCmplxRect
           array with data twiddle factors (sine & cosine table),
           base 4 log of N, N
*/

void R4FFTdit (tsCmplxRect sFFT, tsCmplxRect sTwidTable, tInt iLog4N, tInt iN)
{
    tInt
        iCnt1, iCnt2, iCnt3,
        iL,    iM,    iQ,
        iA,    iB,    iC,     iD;
    tFloat
        fRealA,   fRealB,    fRealC,    fRealD,
        fImagA,   fImagB,    fImagC,    fImagD,
        fReal_Wq, fReal_W2q, fReal_W3q,
        fImag_Wq, fImag_W2q, fImag_W3q;

    iL = iN / 4;
    iM = 1;

    for (iCnt1 = 0; iCnt1 < iLog4N; ++iCnt1)
    {
        iQ = 0;
        for (iCnt2 = 0; iCnt2 < iM; ++iCnt2)
        {
            iA = iCnt2;
            fReal_Wq  = sTwidTable.pfReal[    iQ];
            fImag_Wq  = sTwidTable.pfImag[    iQ];
            fReal_W2q = sTwidTable.pfReal[2 * iQ];
            fImag_W2q = sTwidTable.pfImag[2 * iQ];
            fReal_W3q = sTwidTable.pfReal[3 * iQ];
            fImag_W3q = sTwidTable.pfImag[3 * iQ];

            for (iCnt3 = 0; iCnt3 < iL; ++iCnt3)
            {
                iB = iA +     iM;
                iC = iA + 2 * iM;
                iD = iA + 3 * iM;

                /* Butterfly: 42 FOP, 12 FMUL, 30 FADD */

                fRealA = sFFT.pfReal[iA];
                fImagA = sFFT.pfImag[iA];
                fRealB = sFFT.pfReal[iB] * fReal_Wq
                       - sFFT.pfImag[iB] * fImag_Wq;
                fImagB = sFFT.pfReal[iB] * fImag_Wq
                       + sFFT.pfImag[iB] * fReal_Wq;
                fRealC = sFFT.pfReal[iC] * fReal_W2q
                       - sFFT.pfImag[iC] * fImag_W2q;
                fImagC = sFFT.pfReal[iC] * fImag_W2q
                       + sFFT.pfImag[iC] * fReal_W2q;
                fRealD = sFFT.pfReal[iD] * fReal_W3q
                       - sFFT.pfImag[iD] * fImag_W3q;
                fImagD = sFFT.pfReal[iD] * fImag_W3q
                       + sFFT.pfImag[iD] * fReal_W3q;

                sFFT.pfReal[iA] = fRealA + fRealB + fRealC + fRealD;
                sFFT.pfImag[iA] = fImagA + fImagB + fImagC + fImagD;
                sFFT.pfReal[iB] = fRealA + fImagB - fRealC - fImagD;
                sFFT.pfImag[iB] = fImagA - fRealB - fImagC + fRealD;
                sFFT.pfReal[iC] = fRealA - fRealB + fRealC - fRealD;
                sFFT.pfImag[iC] = fImagA - fImagB + fImagC - fImagD;
                sFFT.pfReal[iD] = fRealA - fImagB - fRealC + fImagD;
                sFFT.pfImag[iD] = fImagA + fRealB - fImagC - fRealD;

                iA = iA + 4 * iM;
            }
            iQ += iL;
        }
        iL /= 4;
        iM *= 4;
    }
}

Philippe Strauss website

Enough of initialization and preambles

Comments