KVR Audio

Post by **Mayae** » Wed Oct 01, 2014 2:27 pm

Say you want to generate a sequence of cosine/sine pairs using an IIR oscillator. To take advantage of SSE/AVX SIMD architectures, one way is to compute the next 8 values at once with 8 phase-offset oscillators, each running at 8x the frequency (equivalently, at samplerate / 8).
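The lane layout described above can be sketched in portable C++. Lane i starts at phase ω·i, and each step rotates every lane by 8ω, so lane i yields output samples i, i + 8, i + 16, and so on. This is a minimal sketch in double precision with plain arrays standing in for an `__m256`; `phaseOffsetSines` and `kLanes` are hypothetical names.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr int kLanes = 8; // one __m256 holds 8 floats

// Returns nBlocks * kLanes sequential samples of sin(omega * n).
// Lane i starts at phase omega * i; every step rotates all lanes by
// kLanes * omega, so lane i produces samples i, i + 8, i + 16, ...
std::vector<double> phaseOffsetSines(double omega, std::size_t nBlocks)
{
    std::array<double, kLanes> s{}, c{};
    for (int i = 0; i < kLanes; ++i) {
        s[i] = std::sin(omega * i);
        c[i] = std::cos(omega * i);
    }
    const double theta = kLanes * omega; // per-step rotation angle
    const double cr = std::cos(theta);
    const double sr = std::sin(theta);

    std::vector<double> out(nBlocks * kLanes);
    for (std::size_t b = 0; b < nBlocks; ++b) {
        for (int i = 0; i < kLanes; ++i) {
            out[b * kLanes + i] = s[i]; // store, then advance this lane by 8 samples
            const double cn = cr * c[i] - sr * s[i];
            const double sn = sr * c[i] + cr * s[i];
            c[i] = cn;
            s[i] = sn;
        }
    }
    return out;
}
```

Writing the lanes out block by block reproduces sin(ωn) in sample order, which is what makes this layout convenient for filling a buffer.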

Here's example code:

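```cpp
#include <immintrin.h> // AVX intrinsics
#include <cmath>       // M_PI is POSIX; MSVC needs _USE_MATH_DEFINES

// init:
auto omega = 2 * M_PI * freq / sampleRate;
auto t = std::tan(omega * 8 / 2.0);         // tan(theta/2), theta = 8*omega (8x speed)
auto c = 2.0 / (1.0 + t * t);
__m256 c0 = _mm256_set1_ps(float(c - 1.0)); // c - 1 = cos(8*omega)
__m256 c1 = _mm256_set1_ps(float(t * c));   // t * c = sin(8*omega)

// phase-offset oscillators by omega
alignas(32) float s[8], co[8];
for (int i = 0; i < 8; ++i)
{
	s[i] = float(std::sin(omega * i));
	co[i] = float(std::cos(omega * i));
}
__m256 sines = _mm256_load_ps(s);
__m256 cosines = _mm256_load_ps(co);

// loop: each iteration rotates all 8 lanes by 8*omega
while (1)
{
	__m256 t0 = _mm256_sub_ps(_mm256_mul_ps(c0, cosines), _mm256_mul_ps(c1, sines));
	__m256 t1 = _mm256_add_ps(_mm256_mul_ps(c1, cosines), _mm256_mul_ps(c0, sines));
	cosines = t0;
	sines = t1;
}
```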

This code obviously runs extremely fast, but it aliases somewhere around -120 dB, while the noise floor is around -190 dB:

(spectrum plot not preserved)

Running the very same algorithm non-vectorized, by contrast, generates this:

(spectrum plot not preserved)

Can anyone identify why? Mathematically, the vector version is exactly equivalent. The specific algorithm used here is the one Andy posted somewhere on this forum ("Efficient sine oscillator"), though the choice of algorithm doesn't change much.
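For reference, the identities behind the init coefficients, with θ = 8ω as the per-iteration angle. They show that c - 1 and t·c are exactly cos θ and sin θ (so the sine coefficient has to be built from t = tan(θ/2)), and that lane i at iteration n holds phase ω(8n + i), i.e. output sample 8n + i:

$$
\begin{aligned}
\theta &= 8\omega, \qquad t = \tan\tfrac{\theta}{2}, \qquad c = \frac{2}{1+t^2},\\
c - 1 &= \frac{1-t^2}{1+t^2} = \cos\theta, \qquad t\,c = \frac{2t}{1+t^2} = \sin\theta,\\
\begin{pmatrix}\cos\phi_{n+1,i}\\ \sin\phi_{n+1,i}\end{pmatrix} &= \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix} \begin{pmatrix}\cos\phi_{n,i}\\ \sin\phi_{n,i}\end{pmatrix}, \qquad \phi_{n,i} = \omega\,(8n+i).
\end{aligned}
$$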

Note that when I use double precision, the result is perfect, but this obviously halves throughput (an AVX register holds 4 doubles instead of 8 floats). If the issue came from limited precision, I would have expected a higher noise floor due to quantization or similar, but not aliasing... no? Also, if anyone has other ideas for generating similar sequential SIMD oscillators, please don't hold back.

Post by **mystran** » Wed Oct 01, 2014 6:10 pm

> Mayae wrote: Is anyone able to identify why? Mathematically, running the vector version is exactly the same. The specific algorithm used here is the one andy posted somewhere in here: Efficient sine oscillator. However, choice of algorithm doesn't change much.

The code you posted uses plain, inaccurate rotations in the inner loop. You can get there by a trivial arithmetic "optimization" of the more accurate version, but then you lose all of its numerical benefits. Do NOT try to optimize away the explicit accumulation by folding the unity term into the coefficients; that is exactly what makes plain rotations inaccurate.
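The difference between explicit accumulation and a folded unity term can be sketched as follows. This is a minimal sketch, not mystran's code: it assumes the "more accurate version" adds a small increment to the state, with k0 = cos θ - 1 computed as -2 sin²(θ/2) so it is not derived from a float cos θ rounded near 1. `DeltaRotation`, `FoldedRotation` and `amplitudeError` are hypothetical names; `FoldedRotation` is the same rotation with the unity term folded into c0 = cos θ, as in the loop of the first post.

```cpp
#include <cmath>

// Explicit accumulation:  c += k0*c - k1*s;  s += k1*c + k0*s
// with k0 = cos(theta) - 1 and k1 = sin(theta).
struct DeltaRotation {
    float k0, k1;
    explicit DeltaRotation(double theta)
    {
        const double h = std::sin(theta / 2.0);
        k0 = static_cast<float>(-2.0 * h * h); // cos(theta) - 1 without cancellation
        k1 = static_cast<float>(std::sin(theta));
    }
    void step(float& c, float& s) const
    {
        const float dc = k0 * c - k1 * s; // small increments...
        const float ds = k1 * c + k0 * s;
        c += dc;                          // ...added to the state
        s += ds;
    }
};

// Same rotation with the unity term folded in: c0 = 1 + k0 = cos(theta).
struct FoldedRotation {
    float c0, c1;
    explicit FoldedRotation(double theta)
        : c0(static_cast<float>(std::cos(theta))),
          c1(static_cast<float>(std::sin(theta))) {}
    void step(float& c, float& s) const
    {
        const float t0 = c0 * c - c1 * s;
        const float t1 = c1 * c + c0 * s;
        c = t0;
        s = t1;
    }
};

// Runs `steps` rotations from (1, 0) and returns |c^2 + s^2 - 1|.
template <class Rot>
float amplitudeError(const Rot& rot, int steps)
{
    float c = 1.0f, s = 0.0f;
    for (int n = 0; n < steps; ++n) rot.step(c, s);
    return std::fabs(c * c + s * s - 1.0f);
}
```

Both are the same rotation in exact arithmetic; the point of the first form is that the small coefficient k0 keeps its full float precision instead of surviving only as the low bits of a value close to 1.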

Post by **Mayae** » Sun Oct 12, 2014 10:52 am

> mystran wrote:
>> Mayae wrote: Is anyone able to identify why? Mathematically, running the vector version is exactly the same. The specific algorithm used here is the one andy posted somewhere in here: Efficient sine oscillator. However, choice of algorithm doesn't change much.
>
> The code you posted is regular inaccurate rotations in the inner loop. You can get that by trivial arithmetic "optimization" of the more accurate version, but then you lose all the numerical benefits. Do NOT attempt to optimize the explicit accumulation by combining the unity-term into the coefficients, that's exactly what makes the regular rotations inaccurate.

Perhaps... although I've had the same problem with every algorithm I've tried so far, even the more "precise" ones (unless you mean a specific variant?).
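One variant that keeps explicit accumulation is a rotation split into three shears: x -= t·y; y += s·x; x -= t·y, with t = tan(θ/2) and s = sin θ. The sketch below assumes the 8-lane phase-offset layout from the first post; it is not necessarily identical to the algorithm in the "Efficient sine oscillator" thread, and `shearLaneSines` is a hypothetical name. Lanes are plain `std::array<float, 8>` so it runs anywhere; each of the three update loops corresponds to one `__m256` multiply plus one add or subtract.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Three-shear rotation by theta = 8 * omega; one step advances each of the
// 8 phase-offset lanes by 8 samples. With t = tan(theta/2), s = sin(theta):
//   x -= t*y;  y += s*x;  x -= t*y;
// is an exact rotation in exact arithmetic.
std::vector<float> shearLaneSines(double omega, std::size_t nBlocks)
{
    constexpr int kLanes = 8;
    const double theta = kLanes * omega;
    const float t = static_cast<float>(std::tan(theta / 2.0));
    const float s = static_cast<float>(std::sin(theta));

    std::array<float, kLanes> x{}, y{}; // x = cosine, y = sine of each lane
    for (int i = 0; i < kLanes; ++i) {
        x[i] = static_cast<float>(std::cos(omega * i));
        y[i] = static_cast<float>(std::sin(omega * i));
    }

    std::vector<float> out(nBlocks * kLanes);
    for (std::size_t b = 0; b < nBlocks; ++b) {
        for (int i = 0; i < kLanes; ++i) out[b * kLanes + i] = y[i];
        for (int i = 0; i < kLanes; ++i) x[i] -= t * y[i]; // shear 1
        for (int i = 0; i < kLanes; ++i) y[i] += s * x[i]; // shear 2
        for (int i = 0; i < kLanes; ++i) x[i] -= t * y[i]; // shear 3
    }
    return out;
}
```

The design point is that each shear matrix has determinant exactly 1 whatever rounded values t and s end up with, so coefficient rounding alone cannot make the amplitude grow or decay; arithmetic rounding in the loop still adds noise.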

For now I've worked around the problem by running 8 oscillators in parallel instead, although it is inconvenient.
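A sketch of that workaround, assuming "in parallel" means one independent oscillator per SIMD lane, each with its own frequency and each stepping at the full sample rate (rather than 8 phase-offset copies of one oscillator). `LaneOscillators` is a hypothetical name, and the per-lane update reuses the three-shear rotation as an assumed choice of algorithm.

```cpp
#include <array>
#include <cmath>

constexpr int kLanes = 8;

// Hypothetical layout: 8 independent oscillators, one per lane,
// each advancing one sample per step().
struct LaneOscillators {
    std::array<float, kLanes> t{}, s{}, x{}, y{};

    LaneOscillators(const std::array<double, kLanes>& freqHz, double sampleRate)
    {
        constexpr double kPi = 3.14159265358979323846;
        for (int i = 0; i < kLanes; ++i) {
            const double w = 2.0 * kPi * freqHz[i] / sampleRate;
            t[i] = static_cast<float>(std::tan(w / 2.0));
            s[i] = static_cast<float>(std::sin(w));
            x[i] = 1.0f; // cos(0)
            y[i] = 0.0f; // sin(0)
        }
    }

    // Advance every lane by one sample; afterwards y[i] holds the next
    // sine sample of oscillator i.
    void step()
    {
        for (int i = 0; i < kLanes; ++i) x[i] -= t[i] * y[i];
        for (int i = 0; i < kLanes; ++i) y[i] += s[i] * x[i];
        for (int i = 0; i < kLanes; ++i) x[i] -= t[i] * y[i];
    }
};
```

Each lane then carries its own coefficients, so the per-step angle stays small, at the cost of the 8 outputs belonging to 8 different signals rather than 8 consecutive samples of one.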

Parallel oscillators at reduced samplerate generate aliasing - why?