## Bandlimited synthesis = lots of xruns

Programming applications for making music on Linux.

Moderators: khz, MattKingUSA

Lyberta
Established Member
Posts: 681
Joined: Sat Nov 01, 2014 8:15 pm
Location: The Internet

### Bandlimited synthesis = lots of xruns

So I'm trying to make my chiptune synth bandlimited and I've made a straightforward implementation:

Code: Select all

`template <typename CalcType>CalcType BandlimitedSawtoothWave<CalcType>::operator()(){   CalcType sample = {};   for (std::size_t i = 1; i <= numharmonics; ++i)   {      sample += std::pow(CalcType{-1.0}, i) * std::sin(CalcType{2.0} * M_PI *         i * phase.GetValue()) / i;   }   sample = CalcType{-1.0} / M_PI * sample;   phase += phasedelta;   return sample;}`

For concert A at 44100 Hz this should be making 2210011 calls of std::sin per second. And during testing on my 4 GHz CPU I get 100% workload if I try to play anything below A3. Is there a way to optimize this?

skei
Established Member
Posts: 122
Joined: Sun May 18, 2014 4:24 pm

### Re: Bandlimited synthesis = lots of xruns

brute-force additive synthesis like this, especially when using functions like sin and pow, is not a very good idea.. you should look into wavetables or other alias-reducing algorithms like polyblep, dpw, etc)

for, example, a polyblep saw can be made like this.. pseudo-code-ish.. ripped right out of some of my sources and simplified (for clarity), untested, not sure if it compiles as-is..

Code: Select all

`float naive_saw(float t) {  return (t * 2.0f) - 1.0f;}float polyblep(float t, float dt) {  if (t < dt) {    t /= dt;    return t+t - t*t - 1.0f;  }  else if (t > (1-dt)) {    t = (t - 1.0f) / dt;    return t*t + t+t + 1.0f;  }  else return 0.0f;}// t = phase// dt = phase adder (pitch)void process(float* buffer, uint32 length) {  float* out = buffer;  for (uint32 i=0; i<length; i++) {    *out++ = naive_saw(t) - polyblep(t,dt);    t += dt;    if (t>=1.0f) t -= 1.0f;  }}`

so, just a couple of comparisons, muls and adds per sample..

sadko4u
Established Member
Posts: 629
Joined: Mon Sep 28, 2015 9:03 pm

### Re: Bandlimited synthesis = lots of xruns

I see you're still falling into the abyss of C++ metaprogramming. I still insist that using metaprogramming in signal processing is a bad idea. But I think I'll give some advices to this example. First of all, move out from the cycle set of constant calculations:

Code: Select all

`template <typename CalcType>CalcType BandlimitedSawtoothWave<CalcType>::operator()(){   CalcType sample = {};   CalcType k = CalcType{2.0} * M_PI * phase.GetValue();      for (std::size_t i = 1; i <= numharmonics; ++i)   {      sample += std::pow(CalcType{-1.0}, i) * std::sin(k * i) / i;   }   sample = CalcType{-1.0} / M_PI * sample;   phase += phasedelta;   return sample;}`

Now we see pretty expensive operation: std::pow that uses -1 as an argument.
Let's evaluate it's values:
std::pow(-1, 1) = -1
std::pow(-1, 2) = 1
std::pow(-1, 3) = -1
std::pow(-1, 4) = 1

Oh damn, we can use just multiplier -1 for odd i and +1 for even i:

Code: Select all

`template <typename CalcType>CalcType BandlimitedSawtoothWave<CalcType>::operator()(){   CalcType sample = {};   CalcType k = CalcType{2.0} * M_PI * phase.GetValue();      for (std::size_t i = 1; i <= numharmonics; ++i)   {      if (i & 1)        sample -= std::sin(k * i) / i;      else        sample += std::sin(k * i) / i;   }   sample = CalcType{-1.0} / M_PI * sample;   phase += phasedelta;   return sample;}`

Now we see that we can eliminate the (i & 1) condition by unrolling the cycle:

Code: Select all

`template <typename CalcType>CalcType BandlimitedSawtoothWave<CalcType>::operator()(){   CalcType sample = {};   CalcType k = CalcType{2.0} * M_PI * phase.GetValue();     std::size_t i=1;   size_t loops = numharmonics - 1;   while (i < loops)   {      sample -= std::sin(k * i) / i;      i++;      sample += std::sin(k * i) / i;      i++;   }   if (numharmonics & 1) // numharmonics is odd      sample -= std::sin(k * i) / i;      sample = CalcType{-1.0} / M_PI * sample;   phase += phasedelta;   return sample;}`

Then we can calculate sine and cosine once before cycle and then use complex vector rotation. That works faster because requires less math.
We can calculate the initial complex vector for i=1:

Code: Select all

`double v_re = std::cos(k);double v_im = std::sin(k);`

And the rotation vector will be the same because the rotation angle is k:

Code: Select all

`double r_re = v_re;double r_im = v_im;`

Now our code becomes looking like this:

Code: Select all

`template <typename CalcType>CalcType BandlimitedSawtoothWave<CalcType>::operator()(){   CalcType sample = {};   CalcType k = CalcType{2.0} * M_PI * phase.GetValue();      double v_re = std::cos(k), v_im = std::sin(k);   double r_re = v_re, r_im = v_im;   double t_re, t_im; // Temporary vector     std::size_t i=1;   size_t loops = numharmonics - 1;   while (i < loops)   {      sample -= v_im / i; // Using the imaginary part of the R vector because it's sine      // Rotate the vector { v_re, v_im } by multiplying on {r_re, r_im} and store into { t_re, t_im }      t_re = v_re * r_re - v_im * r_im;      t_im = v_re * r_im + v_im * r_re;      sample += t_im / (i + 1); // Using the imaginary part of the T vector because it's sine      // Rotate the vector { t_re, t_im } by multiplying on {r_re, r_im} and store into { v_re, v_im }      v_re = t_re * r_re - t_im * r_im;      v_im = t_re * r_im + t_im * r_re;            i += 2;   }   if (numharmonics & 1) // numharmonics is odd      sample -= v_im / i; // We don't need to rotate the vector because it's the last calculation      sample = CalcType{-1.0} / M_PI * sample;   phase += phasedelta;   return sample;}`
LSP (Linux Studio Plugins) Developer and Maintainer.

ssj71
Established Member
Posts: 1292
Joined: Tue Sep 25, 2012 6:36 pm

### Re: Bandlimited synthesis = lots of xruns

Wow, nice little example. I'd never thought of using a rotation vector like that. Its brilliant!
I still think this code will not be very suitable for RT since its still using libm's sin and cos for each sample. You've made a huge improvement but I think it should be further improved by only calling sin and cos if the freq changes, storing the vectors between calls. Also I wonder if you'd go unstable if you used a polynomial approximation for sin and cos.

I might have to actually play around with this a bit...
Thanks _ssj71

music: https://soundcloud.com/ssj71
My plugins are Infamous! http://ssj71.github.io/infamousPlugins
I just want to get back to making music!

Lyberta
Established Member
Posts: 681
Joined: Sat Nov 01, 2014 8:15 pm
Location: The Internet

### Re: Bandlimited synthesis = lots of xruns

skei wrote:brute-force additive synthesis like this, especially when using functions like sin and pow, is not a very good idea.. you should look into wavetables or other alias-reducing algorithms like polyblep, dpw, etc)

OK, I will look into other algorithms. I just only knew about bandlimited synthesis because I randomly stumbled on it.

sadko4u wrote:I see you're still falling into the abyss of C++ metaprogramming. I still insist that using metaprogramming in signal processing is a bad idea.

Templates have nothing to do with a math here. I use templates to allow the user to choose between float, double, long double or custom user-defined type. I will simplify my function a bit based on your suggestions.

sadko4u wrote:Then we can calculate sine and cosine once before cycle and then use complex vector rotation. That works faster because requires less math.

OK, this is where I'm lost. I've never worked with complex numbers. I guess I will look into it.

sadko4u
Established Member
Posts: 629
Joined: Mon Sep 28, 2015 9:03 pm

### Re: Bandlimited synthesis = lots of xruns

ssj71 wrote:Also I wonder if you'd go unstable if you used a polynomial approximation for sin and cos.

I'll open a small secret: the internal calculation of sin() and cos() funtions in <math.h> is currently based on Taylor's series calculation because it works faster than fsin instruction on modern CPUs. Moreover, on x86_64 architecture it's implemented with SSE instructions movss/movsd, mulss/mulsd and addss/addsd. So it's pretty realtime-compatible but requires more math operations than the complex vector multiplication.

Even more, this function can be vectorized with SSE/AVX and give additional about 3x performance improvement but you will fall into deep assembly.

Best from assembly guy.
LSP (Linux Studio Plugins) Developer and Maintainer.

CrocoDuck
Established Member
Posts: 1008
Joined: Sat May 05, 2012 6:12 pm
Contact:

### Re: Bandlimited synthesis = lots of xruns

FaTony wrote:OK, this is where I'm lost. I've never worked with complex numbers. I guess I will look into it.

In case you are searching for some source, the first few chapters of this are a good gentle introduction in my opinion.
Check my Linux audio experiments on my SoundCloud.
Browse my AUR packages.
Fancying a swim in the pond?

sadko4u
Established Member
Posts: 629
Joined: Mon Sep 28, 2015 9:03 pm

### Re: Bandlimited synthesis = lots of xruns

Also this instruction

Code: Select all

`sample = CalcType{-1.0} / M_PI * sample;`

Is a short way to the UB. Does it mean:

Code: Select all

`sample = (CalcType{-1.0} / M_PI) * sample;`

Or does it mean:

Code: Select all

`sample = CalcType{-1.0} / (M_PI * sample);`

Because multiplication and division are the operators of the same priority, compiler is free do decide in which order to evaluate them. So I would better rewrite this code into:

Code: Select all

`sample = (CalcType{-1.0} / M_PI) * sample;`
LSP (Linux Studio Plugins) Developer and Maintainer.

ssj71
Established Member
Posts: 1292
Joined: Tue Sep 25, 2012 6:36 pm

### Re: Bandlimited synthesis = lots of xruns

sadko4u wrote:Because multiplication and division are the operators of the same priority, compiler is free do decide in which order to evaluate them

Isn't the right to left associativity standard there? However your suggestion is definitely clearer code which makes it better IMHO.
_ssj71

music: https://soundcloud.com/ssj71
My plugins are Infamous! http://ssj71.github.io/infamousPlugins
I just want to get back to making music!

sadko4u
Established Member
Posts: 629
Joined: Mon Sep 28, 2015 9:03 pm

### Re: Bandlimited synthesis = lots of xruns

ssj71 wrote:Isn't the right to left associativity standard there? However your suggestion is definitely clearer code which makes it better IMHO.

IMHO it's not guaranteed that some compilers will do some 'optimization'. It's like trying to answer what the value of i will be after:

Code: Select all

`int i= 5;i = ++i + ++i;`

So I prefer to add braces even if I have just a little doubt about the priority of operators.
LSP (Linux Studio Plugins) Developer and Maintainer.

Lyberta
Established Member
Posts: 681
Joined: Sat Nov 01, 2014 8:15 pm
Location: The Internet

### Re: Bandlimited synthesis = lots of xruns

Do plugins have a signal that they are in realtime setting or not? Because I'd like to keep ideal algorithm for non-realtime render.

Return to “Developer's Section”

### Who is online

Users browsing this forum: No registered users and 1 guest