Zölzer-Boltze (ZB) peak/notch parametric EQ in C
For digital audio signal processing
I was looking for a good implementation of parametric EQ for digital audio, didn't find something valuable enough, so I needed to roll my own. I retained a paper by two german guys, Mrs. Zölzer and Boltze, and implemented it.
It has the best parameters independance of all the one I browsed through. Well, center frequency, Q factor and cut/boost gain are simply really independant in their work, contrary to a lot of such projects you will find while googling around.
You can't grasp any DSP code without having an article describing the process and a picture, so have your copy of the article. Unfortunately it doesn't seems freely available. The code I cut and paste here is TESTED and working flawlessly, except maybe for some omission as it is split in 4 or 5 files in my project.
First we need some helpers functions to convert from center frequency, Q factor and cut/boost gain to parameters suitable for an IIR biquad and ZB specific glue computation around it.
#include <math.h> struct zb_params { double hp; // gp-1, Vo-1 double ohmc; double ohmw; double d; double ax; }; typedef struct zb_params zb_params_t; struct biquad_coeffs { float b_num[3]; float a_denom[3]; }; typedef struct biquad_coeffs biquad_coeffs_t; double bw_from_q(double fc, double q) { return fc / q; } double q_from_bw(double fc, double fbw) { return fc / fbw; } double omega(double f, double fs) { return (2.0 * M_PI * f) / fs; } // Zölzer-Boltze (ZB) peak/notch parametric EQ // use with gp = 1 for boost case // ax: ZB paper use ab for boost, ac for cut, abc for both cases, hence ax // gp: Vo in ZB paper // ohmw: omega of bandWidth static double peq_zb_ax(double ohmw, double gp) { double tohm = tan(ohmw/2.0); return (tohm-gp) / (tohm+gp); } void peq_zb_params(double fc, double fbw, double fs, double gp, zb_params_t *params) { params->hp = gp-1.0; params->ohmc = omega(fc, fs); params->ohmw = omega(fbw, fs); params->d = -cos(params->ohmc); if (gp > 1.0) { params->ax = peq_zb_ax(params->ohmw, 1.0); } else { params->ax = peq_zb_ax(params->ohmw, gp); } }
For the last helper, beware of the sign reversal of the a coeffs if you'll do a tryout with something else than my iir2 C routine (i.e. using matlab, octave, ...), also see the comment after BEWARE in iir2 routine.
void peq_zb_coeffs(double ax, double hp, double d, biquad_coeffs_t *coeffs) { double cmid = d * (1.0 - ax); coeffs->b_num[0] = -ax ; coeffs->b_num[1] = cmid ; coeffs->b_num[2] = 1.0 ; // denominator coeffs->a_denom[0] = 1.0 ; // trivial & unused coeffs->a_denom[1] = -cmid ; coeffs->a_denom[2] = ax ; }
Down to here, we've been only preparing the writing of the next one: it takes as arguments a straightforward way of defining our center frequency, cut/boost dB gain, Q factor, and fill two structs with parameters suitable for a biquad IIR filter and the ZB computation around it. It should also be the only public, exported routine with peq_zb_peak_process.
void peq_zb_peak(double fc, double fs, double q, double gdb, zb_params_t *params, biquad_coeffs_t *coeffs) { double bw = bw_from_q(fc, q); double gp = pow(10.0, gdb / 20.0); peq_zb_params(fc, bw, fs, gp, params); peq_zb_coeffs(params->ax, params->hp, params->d, coeffs); }
Now the part in the signal path, the processing part. First a standard IIR biquad in plain C, not using intrinsics :
#ifndef INLINE # if __GNUC__ && !__GNUC_STDC_INLINE__ # define INLINE extern inline # define _INLINE extern inline # else # define INLINE inline # define _INLINE inline # endif #endif // IIR filtering : // // BEWARE: "a" coeffs sign need to be reversed before processing // see Proakis & Manolakis p. 492 // direct form 2 IIR filters INLINE void iir2(float *b, float *a, float *z, float *samples, int l) { int i; for (i = 0; i < l; ++i) { // delay z-1 z[2] = z[1]; z[1] = z[0]; // a indices are no longer offseted minus one rel. DSP literature // a[0] is assumed to be eq. to 1.0, even if never accessed // inplace (new 20110912) z[0] = samples[i] + a[1] * z[1] + a[2] * z[2]; samples[i] = b[0] * z[0] + b[1] * z[1] + b[2] * z[2]; } }
Some helpers, common vectors operations in intel SSE intrinsics, each on 16 samples at a time (supposed to be used on integer power of two buffer length greater or equal than 64 samples). First the simple copy:
#include <assert.h> #include <x86intrin.h> _INLINE void _copy16(float *a, float *b) { // gcc emit perfect asm w this (also for stride 4 variants) __m128 tmp0, tmp1, tmp2, tmp3; tmp0 = _mm_load_ps(a); tmp1 = _mm_load_ps(a+4); tmp2 = _mm_load_ps(a+8); tmp3 = _mm_load_ps(a+12); _mm_store_ps(b, tmp0); _mm_store_ps(b+4, tmp1); _mm_store_ps(b+8, tmp2); _mm_store_ps(b+12, tmp3); } INLINE void copy16(float *a, float *b, int l) { int i; assert(l % 16 == 0); for (i = 0; i < l; i += 16) { _copy16(a+i, b+i); } }
Here the element-wise addition of two vectors :
_INLINE void _vv_add16(float *a, float *b) { __m128 sum0, sum1, sum2, sum3; __m128 tmpb0 = _mm_load_ps(b); __m128 tmpb1 = _mm_load_ps(b+4); __m128 tmpb2 = _mm_load_ps(b+8); __m128 tmpb3 = _mm_load_ps(b+12); sum0 = _mm_add_ps(tmpb0, _mm_load_ps(a)); sum1 = _mm_add_ps(tmpb1, _mm_load_ps(a+4)); sum2 = _mm_add_ps(tmpb2, _mm_load_ps(a+8)); sum3 = _mm_add_ps(tmpb3, _mm_load_ps(a+12)); _mm_store_ps(b, sum0); _mm_store_ps(b+4, sum1); _mm_store_ps(b+8, sum2); _mm_store_ps(b+12, sum3); }
Substraction, store in A:
// second (b) - first (a) store in first (a) _INLINE void _vv_sub16_sta(float *a, float *b) { __m128 diff0, diff1, diff2, diff3; __m128 tmpb0 = _mm_load_ps(b); __m128 tmpb1 = _mm_load_ps(b+4); __m128 tmpb2 = _mm_load_ps(b+8); __m128 tmpb3 = _mm_load_ps(b+12); diff0 = _mm_sub_ps(tmpb0, _mm_load_ps(a)); diff1 = _mm_sub_ps(tmpb1, _mm_load_ps(a+4)); diff2 = _mm_sub_ps(tmpb2, _mm_load_ps(a+8)); diff3 = _mm_sub_ps(tmpb3, _mm_load_ps(a+12)); _mm_store_ps(a, diff0); _mm_store_ps(a+4, diff1); _mm_store_ps(a+8, diff2); _mm_store_ps(a+12, diff3); }
Constant * vector multiplication:
_INLINE void _cv_mul16(float a, float *b) { __m128 prod0, prod1, prod2, prod3; __m128 cst = _mm_set1_ps(a); // could be written with temps for _load_ps with no changes in generated asm prod0 = _mm_mul_ps(cst, _mm_load_ps(b)); prod1 = _mm_mul_ps(cst, _mm_load_ps(b+4)); prod2 = _mm_mul_ps(cst, _mm_load_ps(b+8)); prod3 = _mm_mul_ps(cst, _mm_load_ps(b+12)); _mm_store_ps(b, prod0); _mm_store_ps(b+4, prod1); _mm_store_ps(b+8, prod2); _mm_store_ps(b+12, prod3); }
And finally, the whole ZB process function around the IIR biquad:
/* block based IIR processing of peak/notch zolbol PEQ */ INLINE void peq_zb_peak_process(biquad_coeffs_t *coeffs, float hp, float *z, float *in, float *out, int l) { int i; assert(l % 16 == 0); copy16(in, out, l); iir2(coeffs->b_num, coeffs->a_denom, z, out, l); for (i = 0; i < l; i += 16) { // _sub16 : second minus first _vv_sub16_sta(out+i, in+i); _cv_mul16(0.5 * hp, out+i); _vv_add16(in+i, out+i); } }
I had some really nice feedback from two gentlemen following comp.dsp on usenet, first Robert Bristow-Johnson, about very low frequency usage of IIR biquad, especially in direct form II, on stability, accuracy, noise, distorsion:
Even with floating point, the problem with cos(w0) when w0 is so close to 0 is that cos(w0) is so close to 1 and all of the information about w0 is in the difference. so, if you want to solve the low freqs difficulty, you should rearrange your transfer function to be in terms of 1-cos(w0) and not in terms of cos(w0), and build the signal flow topology in terms of that redesigned transfer function.
And i dunno about ZB, but if their coefficients are expressed in terms of tan(w0/2) instead of cos(w0), i would suggest using a couple trig identities (that can be seen in the audio eq cookbook) to convert. ... and, i still haven't yet decided, if the signal is float, whether Direct Form 2 is as good as Direct Form 1 (for fixed-point signals, DF1 is far better for multiple reasons). but if you redefine the transfer function to be in terms of 1-cos(w0), it won't be either, precisely. maybe some other form would be better than either DF1 or DF2.
David Wooff replied:
In my experience, DF1 certainly is better than DF2 with floats but unfortunately requires more cycles (certainly on the SHARC in any case). DF2 32-bit float is hopeless. 40-bit float is just about acceptable in DF2 over the full audio range for a parametric EQ but doesn't really cut it for pro-audio use. 40-bit DF1 is also borderline depending on your requirements. Yes, a modified structure will help but again more costly. Like most things in life, a bit of a trade-off.
AssemblerGuy joined the discussion:
Direct form I is not canonic, which means that it will require more memory than necessary. It has the nice property of being immune to intermediate overflow if signed integer overflow has defined wraparound behavior in the programming language, which notably is not the case for C and C++. As long as the end result fits, intermediate overflows don't matter. Direct form I is also nice and straightforward. You can fire up the debugger and look at the values of the previous input and output samples used during the calculations. Direct form II is canonic. It will use the absolute minimum memory necessary. However, its internal buffer contents are not equivalent to x(n) or y(n), and it will not tolerate overflow.
Thanks to you guys, very instructive discussion, especially as I was thinking that 32bits IEEE-754 floats was much enough accuracy for all audio signal processing.
Commentaires