Bandlimited synthesis = lots of xruns

Programming applications for making music on Linux.

Moderators: MattKingUSA, khz

Post Reply
Lyberta
Established Member
Posts: 681
Joined: Sat Nov 01, 2014 8:15 pm
Location: The Internet
Been thanked: 1 time

Bandlimited synthesis = lots of xruns

Post by Lyberta »

So I'm trying to make my chiptune synth bandlimited and I've made a straightforward implementation:

Code: Select all

template <typename CalcType>
CalcType BandlimitedSawtoothWave<CalcType>::operator()()
{
	CalcType sample = {};
	for (std::size_t i = 1; i <= numharmonics; ++i)
	{
		sample += std::pow(CalcType{-1.0}, i) * std::sin(CalcType{2.0} * M_PI *
			i * phase.GetValue()) / i;
	}
	sample = CalcType{-1.0} / M_PI * sample;
	phase += phasedelta;
	return sample;
}
For concert A at 44100 Hz this should be making 2210011 calls of std::sin per second. And during testing on my 4 GHz CPU I get 100% workload if I try to play anything below A3. Is there a way to optimize this?
User avatar
skei
Established Member
Posts: 337
Joined: Sun May 18, 2014 4:24 pm
Has thanked: 8 times
Been thanked: 57 times
Contact:

Re: Bandlimited synthesis = lots of xruns

Post by skei »

brute-force additive synthesis like this, especially when using functions like sin and pow, is not a very good idea.. you should look into wavetables or other alias-reducing algorithms like polyblep, dpw, etc)

for, example, a polyblep saw can be made like this.. pseudo-code-ish.. ripped right out of some of my sources and simplified (for clarity), untested, not sure if it compiles as-is..

Code: Select all

float naive_saw(float t) {
  return (t * 2.0f) - 1.0f;
}

float polyblep(float t, float dt) {
  if (t < dt) {
    t /= dt;
    return t+t - t*t - 1.0f;
  }
  else if (t > (1-dt)) {
    t = (t - 1.0f) / dt;
    return t*t + t+t + 1.0f;
  }
  else return 0.0f;
}

// t = phase
// dt = phase adder (pitch)

void process(float* buffer, uint32 length) {
  float* out = buffer;
  for (uint32 i=0; i<length; i++) {
    *out++ = naive_saw(t) - polyblep(t,dt);
    t += dt;
    if (t>=1.0f) t -= 1.0f;
  }
}
so, just a couple of comparisons, muls and adds per sample..
User avatar
sadko4u
Established Member
Posts: 986
Joined: Mon Sep 28, 2015 9:03 pm
Has thanked: 2 times
Been thanked: 359 times

Re: Bandlimited synthesis = lots of xruns

Post by sadko4u »

I see you're still falling into the abyss of C++ metaprogramming. I still insist that using metaprogramming in signal processing is a bad idea. But I think I'll give some advices to this example. First of all, move out from the cycle set of constant calculations:

Code: Select all

template <typename CalcType>
CalcType BandlimitedSawtoothWave<CalcType>::operator()()
{
   CalcType sample = {};
   CalcType k = CalcType{2.0} * M_PI * phase.GetValue();
   
   for (std::size_t i = 1; i <= numharmonics; ++i)
   {
      sample += std::pow(CalcType{-1.0}, i) * std::sin(k * i) / i;
   }
   sample = CalcType{-1.0} / M_PI * sample;
   phase += phasedelta;
   return sample;
}
Now we see pretty expensive operation: std::pow that uses -1 as an argument.
Let's evaluate it's values:
std::pow(-1, 1) = -1
std::pow(-1, 2) = 1
std::pow(-1, 3) = -1
std::pow(-1, 4) = 1
Oh damn, we can use just multiplier -1 for odd i and +1 for even i:

Code: Select all

template <typename CalcType>
CalcType BandlimitedSawtoothWave<CalcType>::operator()()
{
   CalcType sample = {};
   CalcType k = CalcType{2.0} * M_PI * phase.GetValue();
   
   for (std::size_t i = 1; i <= numharmonics; ++i)
   {
      if (i & 1)
        sample -= std::sin(k * i) / i;
      else
        sample += std::sin(k * i) / i;
   }
   sample = CalcType{-1.0} / M_PI * sample;
   phase += phasedelta;
   return sample;
}
Now we see that we can eliminate the (i & 1) condition by unrolling the cycle:

Code: Select all

template <typename CalcType>
CalcType BandlimitedSawtoothWave<CalcType>::operator()()
{
   CalcType sample = {};
   CalcType k = CalcType{2.0} * M_PI * phase.GetValue();
  
   std::size_t i=1;
   size_t loops = numharmonics - 1;
   while (i < loops)
   {
      sample -= std::sin(k * i) / i;
      i++;
      sample += std::sin(k * i) / i;
      i++;
   }
   if (numharmonics & 1) // numharmonics is odd
      sample -= std::sin(k * i) / i;
   
   sample = CalcType{-1.0} / M_PI * sample;
   phase += phasedelta;
   return sample;
}
Then we can calculate sine and cosine once before cycle and then use complex vector rotation. That works faster because requires less math.
We can calculate the initial complex vector for i=1:

Code: Select all

double v_re = std::cos(k);
double v_im = std::sin(k);
And the rotation vector will be the same because the rotation angle is k:

Code: Select all

double r_re = v_re;
double r_im = v_im;
Now our code becomes looking like this:

Code: Select all

template <typename CalcType>
CalcType BandlimitedSawtoothWave<CalcType>::operator()()
{
   CalcType sample = {};
   CalcType k = CalcType{2.0} * M_PI * phase.GetValue();
   
   double v_re = std::cos(k), v_im = std::sin(k);
   double r_re = v_re, r_im = v_im;
   double t_re, t_im; // Temporary vector
  
   std::size_t i=1;
   size_t loops = numharmonics - 1;

   while (i < loops)
   {
      sample -= v_im / i; // Using the imaginary part of the R vector because it's sine
      // Rotate the vector { v_re, v_im } by multiplying on {r_re, r_im} and store into { t_re, t_im }
      t_re = v_re * r_re - v_im * r_im;
      t_im = v_re * r_im + v_im * r_re;

      sample += t_im / (i + 1); // Using the imaginary part of the T vector because it's sine
      // Rotate the vector { t_re, t_im } by multiplying on {r_re, r_im} and store into { v_re, v_im }
      v_re = t_re * r_re - t_im * r_im;
      v_im = t_re * r_im + t_im * r_re;
      
      i += 2;
   }
   if (numharmonics & 1) // numharmonics is odd
      sample -= v_im / i; // We don't need to rotate the vector because it's the last calculation
   
   sample = CalcType{-1.0} / M_PI * sample;
   phase += phasedelta;
   return sample;
}
LSP (Linux Studio Plugins) Developer and Maintainer.
ssj71
Established Member
Posts: 1294
Joined: Tue Sep 25, 2012 6:36 pm
Has thanked: 1 time

Re: Bandlimited synthesis = lots of xruns

Post by ssj71 »

Wow, nice little example. I'd never thought of using a rotation vector like that. Its brilliant!
I still think this code will not be very suitable for RT since its still using libm's sin and cos for each sample. You've made a huge improvement but I think it should be further improved by only calling sin and cos if the freq changes, storing the vectors between calls. Also I wonder if you'd go unstable if you used a polynomial approximation for sin and cos.

I might have to actually play around with this a bit...
Thanks :)
_ssj71

music: https://soundcloud.com/ssj71
My plugins are Infamous! http://ssj71.github.io/infamousPlugins
I just want to get back to making music!
Lyberta
Established Member
Posts: 681
Joined: Sat Nov 01, 2014 8:15 pm
Location: The Internet
Been thanked: 1 time

Re: Bandlimited synthesis = lots of xruns

Post by Lyberta »

skei wrote:brute-force additive synthesis like this, especially when using functions like sin and pow, is not a very good idea.. you should look into wavetables or other alias-reducing algorithms like polyblep, dpw, etc)
OK, I will look into other algorithms. I just only knew about bandlimited synthesis because I randomly stumbled on it.
sadko4u wrote:I see you're still falling into the abyss of C++ metaprogramming. I still insist that using metaprogramming in signal processing is a bad idea.
Templates have nothing to do with a math here. I use templates to allow the user to choose between float, double, long double or custom user-defined type. I will simplify my function a bit based on your suggestions.
sadko4u wrote:Then we can calculate sine and cosine once before cycle and then use complex vector rotation. That works faster because requires less math.
OK, this is where I'm lost. I've never worked with complex numbers. I guess I will look into it.
User avatar
sadko4u
Established Member
Posts: 986
Joined: Mon Sep 28, 2015 9:03 pm
Has thanked: 2 times
Been thanked: 359 times

Re: Bandlimited synthesis = lots of xruns

Post by sadko4u »

ssj71 wrote:Also I wonder if you'd go unstable if you used a polynomial approximation for sin and cos.
I'll open a small secret: the internal calculation of sin() and cos() funtions in <math.h> is currently based on Taylor's series calculation because it works faster than fsin instruction on modern CPUs. Moreover, on x86_64 architecture it's implemented with SSE instructions movss/movsd, mulss/mulsd and addss/addsd. So it's pretty realtime-compatible but requires more math operations than the complex vector multiplication.

Even more, this function can be vectorized with SSE/AVX and give additional about 3x performance improvement but you will fall into deep assembly.

Best from assembly guy.
LSP (Linux Studio Plugins) Developer and Maintainer.
CrocoDuck
Established Member
Posts: 1133
Joined: Sat May 05, 2012 6:12 pm
Been thanked: 17 times

Re: Bandlimited synthesis = lots of xruns

Post by CrocoDuck »

FaTony wrote: OK, this is where I'm lost. I've never worked with complex numbers. I guess I will look into it.
In case you are searching for some source, the first few chapters of this are a good gentle introduction in my opinion.
User avatar
sadko4u
Established Member
Posts: 986
Joined: Mon Sep 28, 2015 9:03 pm
Has thanked: 2 times
Been thanked: 359 times

Re: Bandlimited synthesis = lots of xruns

Post by sadko4u »

Also this instruction

Code: Select all

sample = CalcType{-1.0} / M_PI * sample;
Is a short way to the UB. Does it mean:

Code: Select all

sample = (CalcType{-1.0} / M_PI) * sample;
Or does it mean:

Code: Select all

sample = CalcType{-1.0} / (M_PI * sample);
Because multiplication and division are the operators of the same priority, compiler is free do decide in which order to evaluate them. So I would better rewrite this code into:

Code: Select all

sample = (CalcType{-1.0} / M_PI) * sample;
LSP (Linux Studio Plugins) Developer and Maintainer.
ssj71
Established Member
Posts: 1294
Joined: Tue Sep 25, 2012 6:36 pm
Has thanked: 1 time

Re: Bandlimited synthesis = lots of xruns

Post by ssj71 »

sadko4u wrote:Because multiplication and division are the operators of the same priority, compiler is free do decide in which order to evaluate them
Isn't the right to left associativity standard there? However your suggestion is definitely clearer code which makes it better IMHO.
_ssj71

music: https://soundcloud.com/ssj71
My plugins are Infamous! http://ssj71.github.io/infamousPlugins
I just want to get back to making music!
User avatar
sadko4u
Established Member
Posts: 986
Joined: Mon Sep 28, 2015 9:03 pm
Has thanked: 2 times
Been thanked: 359 times

Re: Bandlimited synthesis = lots of xruns

Post by sadko4u »

ssj71 wrote:Isn't the right to left associativity standard there? However your suggestion is definitely clearer code which makes it better IMHO.
IMHO it's not guaranteed that some compilers will do some 'optimization'. It's like trying to answer what the value of i will be after:

Code: Select all

int i= 5;
i = ++i + ++i;
So I prefer to add braces even if I have just a little doubt about the priority of operators.
LSP (Linux Studio Plugins) Developer and Maintainer.
Lyberta
Established Member
Posts: 681
Joined: Sat Nov 01, 2014 8:15 pm
Location: The Internet
Been thanked: 1 time

Re: Bandlimited synthesis = lots of xruns

Post by Lyberta »

Do plugins have a signal that they are in realtime setting or not? Because I'd like to keep ideal algorithm for non-realtime render.
Post Reply