Improved Taylor/Padé approximation of cos/sin functions

DSP, Plugin and Host development discussion.

Post

There's this approximation (in the Mathematica language) made to improve the Taylor/Padé approximation of the cos function.

Here's my C++ code trying to do the same (for better accuracy, just change all the math inside the function to double precision; slower then, of course):

Code: Select all

#include <cmath>

float cos_jp(float x)
{
    // range reduction: halve the argument until |z| <= 0.25
    float z = x * 0.5f;
    int k = 1;
    while (std::abs(z) > 0.25f) {
        z *= 0.5f;
        k++;
    }

    z *= z;

    // [4/4] Pade approximant of cos at the reduced argument
    float r = ((313.0f * z - 6900.0f) * z + 15120.0f)
            / ((13.0f * z + 660.0f) * z + 15120.0f);

    // undo the reduction with the double-angle identity cos(2a) = 2cos^2(a) - 1
    while (k > 0) {
        r = 2.0f * r * r - 1.0f;
        k--;
    }

    return r;
}
There's also a Desmos plot (not working correctly) attempting to demonstrate the responses.

cos_jp() (as it is written in the above code) is about 30% slower (EDIT; originally measured 20%) than std::cos, but maybe SSE/AVX could perform better (Q: how's the while loop done in SSE/AVX?).
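
For the SSE/AVX question, the usual first step is to make the reduction branch-free, since SIMD has no data-dependent loops. Here's a scalar sketch of that transformation (cos_jp_branchfree is a made-up name; the halving count is read off the float exponent with std::ilogb/std::ldexp, which in SIMD becomes integer operations on the float bit pattern):

Code: Select all

#include <cmath>

float cos_jp_branchfree(float x)
{
    float z = x * 0.5f;

    // number of halvings so that |z| / 2^h <= 0.25, read straight off the
    // float exponent; ilogb(0) is hugely negative, so the clamp also covers
    // tiny inputs. This may over-reduce by one step at exact powers of two,
    // which only costs one extra doubling.
    int h = std::ilogb(std::fabs(z)) + 3;
    if (h < 0) h = 0;
    z = std::ldexp(z, -h);   // z /= 2^h in one step

    z *= z;

    float r = ((313.0f * z - 6900.0f) * z + 15120.0f)
            / ((13.0f * z + 660.0f) * z + 15120.0f);

    // fixed-count unfold: h + 1 double-angle steps, matching cos_jp's k
    for (int i = 0; i <= h; ++i)
        r = 2.0f * r * r - 1.0f;

    return r;
}

In actual SSE/AVX you'd take the lane-wise maximum of h (or simply the worst case for your input range) so every lane runs the same number of double-angle steps; the identity compensates extra halvings with extra doublings, so the result stays correct, though each extra unfold step roughly doubles the approximation error.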

There's a 'shared' cos/sin approximation available later in the given Stack Exchange thread, but I would like to try a sine approximation using the same method as the cosine approximation there.

Q: What changes are needed to get this method to work for sine (there may be some hints given, but I'm not good at math, so ...)?
Last edited by juha_p on Thu Nov 25, 2021 12:31 pm, edited 5 times in total.

Post


When 0 ≤ x ≤ π/2, then sin(x) = cos(π/2 − x)

Pythagoras says a² + b² = c²

With c = 1 (the radius of the circle), a representing sine, and b representing cosine, it follows that a = √(1 − b²)
But take care of the correct sign, which the squaring removed. Such if's might cause a severe performance hit. Also you need a "fast" square root function.
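
A minimal sketch of that route, reusing cos_jp() from the first post (an assumption; any cosine approximation works). The wrap puts x into [-π, π], where sin(x) has the same sign as x, so std::copysign restores the sign without an if:

Code: Select all

#include <cmath>

float sin_from_cos(float x)
{
    // wrap x into [-pi, pi]; there sin(x) carries the same sign as x
    x -= 6.2831853f * std::round(x * 0.15915494f);

    float c = cos_jp(x);                                // cosine approximation
    float s = std::sqrt(std::fmax(0.0f, 1.0f - c * c)); // |sin| from Pythagoras
    return std::copysign(s, x);                         // branchless sign fix
}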

Also see viewtopic.php?t=186381
Last edited by BertKoor on Wed Nov 24, 2021 6:46 pm, edited 2 times in total.
We are the KVR collective. Resistance is futile. You will be assimilated.
My MusicCalc is served over https!!

Post

juha_p wrote: Wed Nov 24, 2021 5:44 pm cos_jp() (as it is written in the above code) is about 20% slower than std::cos, but maybe SSE/AVX could perform better (Q: how's the while loop done in SSE/AVX?).
Consider using a minimax polynomial instead. Taylor/Padé will lose you accuracy and the division will lose you performance. Cosines are also particularly easy to approximate with polynomials, as the symmetry makes half the coefficients zero.
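
For illustration, a sketch of that structure for cos on [-π/2, π/2]: an even polynomial evaluated in z = x², so only four coefficients do any work. The coefficients below are a widely circulated Hastings-style fit (roughly 1e-6 max error by a quick check), not a verified minimax set; Remez-fitted values for your exact range and precision drop into the same slots:

Code: Select all

float cos_poly(float x)   // |x| <= pi/2
{
    float z = x * x;      // even symmetry: a polynomial in x^2
    return 0.9999932946f
         + z * (-0.4999124376f
         + z * ( 0.0414877472f
         + z * (-0.0012712095f)));
}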

Post

Is there any special purpose for this sine? If sine-wave generation is assumed, then my first guess would be that either a Hermite (not necessarily cubic) or minimax (for high precision) approximation would give a better result. Although maybe those would not be the best results, I'd still doubt that Taylor or Padé will be better. It's of course arguable what the definition of "best" is here, and that is an interesting question by itself.

Post

BertKoor wrote: Wed Nov 24, 2021 6:14 pm With c = 1 (the radius of the circle), a representing sine, and b representing cosine, it follows that a = √(1 − b²)
But take care of the correct sign, which the squaring removed. Such if's might cause a severe performance hit. Also you need a "fast" square root function.
This strategy is likely to be worthwhile only if you need both the cosine and the sine at the same time (and even then it might be faster to just approximate both). Otherwise, approximate the sine with a minimax polynomial directly. Just like with cosines, half the coefficients can be zero due to symmetry.
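
And the sine counterpart, sketched the same way as the cosine above: an odd polynomial x·P(x²) on [-π/2, π/2]. These two coefficients are a classic low-order fit (roughly 1e-4 max error), shown only to illustrate the structure; again, Remez-fitted values would go in the same slots:

Code: Select all

float sin_poly(float x)   // |x| <= pi/2
{
    float z = x * x;      // odd symmetry: x times a polynomial in x^2
    return x * (1.0f + z * (-0.16605f + z * 0.00761f));
}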

Post

This actually raised an interesting question in my head. Given a polynomial approximation of a sine, is it solely the derivative discontinuities at the seams which are responsible for generating the fundamental and the harmonics? The rationale is that the polynomial itself doesn't contain any partials above 0 Hz. This question is probably also closely related to the ideas of BLEP and bandlimitedness in the wide sense.

Post

Z1202 wrote: Wed Nov 24, 2021 6:52 pm This actually raised an interesting question in my head. Given a polynomial approximation of a sine, is it solely the derivative discontinuities at the seams which are responsible for generating the fundamental and the harmonics? The rationale is that the polynomial itself doesn't contain any partials above 0 Hz. This question is probably also closely related to the ideas of BLEP and bandlimitedness in the wide sense.
Yeah. Higher order polys tend to run into numerical problems with BLEPs, but for something like a cubic-spline approx of a sine you can sweep your oscillator past Nyquist just fine and if your BLEP is good you'll get silence... and we've actually discussed this a few times in the past. :)

Post

I believe we discussed related topics but not specifically this one: it's the approximation's discontinuities which are responsible even for the fundamental itself. Or so it would seem, since the polynomial itself has only 0Hz partials. Sounds quite paradoxical, doesn't it?

You would think that discontinuities are bad and would try to get rid of them, since they create aliasing. But it turns out they also create the fundamental, which is what we actually want to have.

Post

Z1202 wrote: Wed Nov 24, 2021 7:23 pm I believe we discussed related topics but not specifically this one: it's the approximation's discontinuities which are responsible even for the fundamental itself. Or so it would seem, since the polynomial itself has only 0Hz partials. Sounds quite paradoxical, doesn't it?
This is the very same hypothesis (ie. that the spectrum of a piecewise polynomial signal is completely determined by the discontinuities) I was trying to advocate the last time we discussed this in private.

In any case.. I think it works out mathematically from a purely spectral point of view.. but I think in terms of practical implementation (eg. in the BLEP framework) it runs into problems with compact support of the BLEP kernels in practice.
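
One way to make the spectral claim precise: integrate the Fourier coefficient integral by parts until the polynomial pieces are exhausted. For a T-periodic piecewise-polynomial f, with ω_k = 2πk/T and k ≠ 0,

c_k = (1/T) · Σ_j Σ_{n≥0} Δ_n(x_j) · e^(−iω_k·x_j) / (iω_k)^(n+1)

where Δ_n(x_j) is the jump of the n-th derivative at seam x_j, and the inner sum terminates because derivatives above the polynomial degree vanish. So every k ≠ 0 partial, fundamental included, is carried entirely by the jumps. A quick sanity check with the sawtooth x/T − 1/2 (a single jump Δ_0 = −1 at x = 0) recovers the familiar c_k = −1/(2πik).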

Post

mystran wrote: Wed Nov 24, 2021 7:57 pm In any case.. I think it works out mathematically from a purely spectral point of view.. but I think in terms of practical implementation (eg. in the BLEP framework) it runs into problems with compact support of the BLEP kernels in practice.
I have been thinking about it since, and I have been drifting towards believing that this is a problem of the chosen windows rather than of compact support itself (IIRC I already expressed this conjecture when we discussed it). An additional rationale for this is e.g. that the Kaiser window has discontinuities and hence a 1/f rolloff, and might not be well suited for windowing BLEPs of higher orders. Similar considerations apply to other windows.

But again, in the context of the current thread this question gets a different focus. I'm not considering removing discontinuities by using BLEPs. Rather here they are usually removed by increasing the order of the polynomial, thus transferring the discontinuity to higher-order derivatives. So it's not exactly the same.

This also might have an interesting implication for the construction of the "optimal" sine approximation. I was wondering if, instead of e.g. trying to shift the discontinuity fully to higher derivatives, one should consider some kind of balance, leaving some of the discontinuity with lower derivatives, optimizing for "the best spectrum". Not sure whether better spectra can be achieved this way, but this is how I came to the realization that the discontinuities must be responsible for the fundamental frequency partial too.

Edit: and apologies if this feels like thread hijacking. We are still pretty much within a strongly related area and hopefully this discussion is interesting not only to mystran and me (otherwise we could take it to the PM channel).

Post

Z1202 wrote: Edit: and apologies if this feels like thread hijacking. We are still pretty much within a strongly related area and hopefully this discussion is interesting not only to mystran and me (otherwise we could take it to the PM channel).
No need to apologize, just continue here so that others who are interested can take part. Hopefully, application examples will also appear.

I did update the C++ source code to better follow the original Mathematica implementation of the cos approximation.
"Accuracy" can be set by adjusting the value the input is compared against. Here's a (Mathematica) plot for the value set to 1/8 (plot range is -pi...pi):
[Attachment: 1_8.png]
Also, here's the sin approximation (internally double precision) based on the cos approximation:

Code: Select all

#include <cmath>

float sin_jp(float x)
{
    // sin(x) = cos(pi/2 - x); reduce by halving until |z| <= 0.5
    // (this 0.5 is the "accuracy" threshold mentioned above)
    double z = (M_PI * 0.5 - x) * 0.5;
    int k = 1;
    while (std::abs(z) > 0.5) {
        z *= 0.5;
        k++;
    }

    z *= z;

    // [4/4] Pade approximant of cos at the reduced argument
    double r = ((313.0 * z - 6900.0) * z + 15120.0)
             / ((13.0 * z + 660.0) * z + 15120.0);

    // undo the reduction: cos(2a) = 2cos^2(a) - 1
    while (k > 0) {
        r = 2.0 * r * r - 1.0;
        k--;
    }

    return static_cast<float>(r);
}
mystran wrote: Wed Nov 24, 2021 6:43 pm ...
Taylor/Pade will lose you accuracy and the division will lose you performance.
...
Yes, performance is not very good because of the divisions and loops: https://godbolt.org/z/Yfe61a1Pa
Last edited by juha_p on Sat Mar 26, 2022 6:35 am, edited 6 times in total.

Post

Z1202 wrote: Wed Nov 24, 2021 8:59 pm
But again, in the context of the current thread this question gets a different focus. I'm not considering removing discontinuities by using BLEPs. Rather here they are usually removed by increasing the order of the polynomial, thus transferring the discontinuity to higher-order derivatives. So it's not exactly the same.
Bandlimiting second-order discontinuities might be a good option for emulating the sine approximations in analog synths, which are always a distorted triangle wave (and the triangle is often imperfect).

Post

urosh wrote: Wed Nov 24, 2021 11:06 pm
Z1202 wrote: Wed Nov 24, 2021 8:59 pm
But again, in the context of the current thread this question gets a different focus. I'm not considering removing discontinuities by using BLEPs. Rather here they are usually removed by increasing the order of the polynomial, thus transferring the discontinuity to higher-order derivatives. So it's not exactly the same.
bandlimiting second order discontinuities might be a good option for emulation of sine approximations in analog synths, which are always distorted triangle wave (which is often imperfect).
Your typical "analog sine" is a waveshaped triangle. You can actually do this with BLEPs by taking a triangle and putting it through a waveshaper such as the "smoothstep" function (i.e. 3x^2 - 2x^3), which gives you a C1-continuous two-piece cubic approximation. So, assuming "no further shenanigans" (e.g. not trying to emulate analog mismatch or anything like that), you only need BLEPs for the 2nd and 3rd derivatives, and like its analog counterpart the discontinuities are at the minimum/maximum values (which is great for those "further shenanigans").

edit: Beyond cubic, building the BLEP kernels becomes a little problematic.. but up to cubic works fine if you're careful with how you integrate them.
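
A naive (not yet bandlimited) sketch of that triangle-through-smoothstep construction, with the BLEP corrections left out and the names being placeholders:

Code: Select all

#include <cmath>

float tri_sine(float phase)   // phase in [0, 1), e.g. advanced by freq/samplerate
{
    float tri = 1.0f - 4.0f * std::fabs(phase - 0.5f); // triangle in [-1, 1], min at 0
    float u   = 0.5f * (tri + 1.0f);                   // map to [0, 1]
    float s   = u * u * (3.0f - 2.0f * u);             // smoothstep 3u^2 - 2u^3
    return 2.0f * s - 1.0f;                            // back to [-1, 1]
}

At the extremes u hits 0 or 1, where the smoothstep's slope is zero, so the output is C1 there exactly as described; only the 2nd and 3rd derivatives jump, which is where the BLEPs would go.
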
Last edited by mystran on Thu Nov 25, 2021 3:28 pm, edited 1 time in total.

Post

I think you should make your talk public, it sounds very interesting :)

Post

I have not managed to test the ULP error against the library sin() or cos() functions, but according to the Julia Function Accuracy Test tool these look like an "excellent" approximation method (whatever this ULP test means). Add the code below to the runtest.jl module, which belongs to the Julia test suite:

Code: Select all

function test_sine(x::AbstractFloat)
    f = 0.5                  # "accuracy" threshold for the range reduction
    z = x * 0.5
    k = 0
    while abs(z) > f         # halve until |z| <= f
        z *= 0.5
        k += 1
    end
    z2 = z^2
    # [5/4] Pade approximant of tan at the reduced argument
    r = z * (1 + (z2/105 - 1) * ((z/3)^2)) / (1 + (z2/7 - 4) * ((z/3)^2))
    while k > 0              # undo the halvings: tan(2a) = 2tan(a)/(1 - tan(a)^2)
        r = (2 * r) / (1 - r * r)
        k -= 1
    end
    return (2 * r) / (1 + r * r)      # sin(x) = 2t/(1 + t^2), t = tan(x/2)
end

function test_cosine(x::AbstractFloat)
    f = 0.5
    z = x * 0.5
    k = 0
    while abs(z) > f
        z *= 0.5
        k += 1
    end
    z2 = z^2
    r = z * (1 + (z2/105 - 1) * ((z/3)^2)) / (1 + (z2/7 - 4) * ((z/3)^2))
    while k > 0
        r = (2 * r) / (1 - r * r)
        k -= 1
    end
    return (1 - r * r) / (1 + r * r)  # cos(x) = (1 - t^2)/(1 + t^2), t = tan(x/2)
end
  
pii = 3.141592653589793238462643383279502884

MAX_SIN(n::Val{pii}, ::Type{Float16}) = 3.1415926535897932f0
MAX_SIN(n::Val{pii}, ::Type{Float32}) = 3.1415926535897932f0
#MAX_SIN(n::Val{pii}, ::Type{Float64}) = 3.141592653589793238462643383279502884
MIN_SIN(n::Val{pii}, ::Type{Float16}) = -3.1415926535897932f0
MIN_SIN(n::Val{pii}, ::Type{Float32}) = -3.1415926535897932f0
#MIN_SIN(n::Val{pii}, ::Type{Float64}) = -3.141592653589793238462643383279502884

for (func, base) in (sin=>Val(pii), test_sine=>Val(pii), cos=>Val(pii), test_cosine=>Val(pii))    
    for T in (Float16, Float32)
        xx = range(MIN_SIN(base,T),  MAX_SIN(base,T), length = 10^6);
        test_acc(func, xx)
    end
end
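
For reference, this Julia version takes a different route from the C++ code above: r is the [5/4] Padé approximant of tan evaluated at the reduced argument z = x/2^(k+1), the loop undoes the halvings with the tangent double-angle identity tan(2a) = 2·tan(a)/(1 − tan²(a)), and the final line converts via the half-angle identities sin(x) = 2t/(1 + t²) and cos(x) = (1 − t²)/(1 + t²) with t = tan(x/2). That is why test_sine and test_cosine share everything except the last line.
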
Results for the approximations and for sin() and cos() over the ranges [-pi/4, pi/4], [-pi, pi] and [-2pi, 2pi] (Float64 not tested):

Code: Select all

pi/4 ----------------------------------------------------------------

Tol debug failed 0.0% of the time.
sin
ULP max 0.5000696182250977 at x = 0.23337582
ULP mean 0.2499294509920759
Test Summary: | Pass  Total
Float16 sin   |    1      1
Tol debug failed 0.0% of the time.
sin
ULP max 0.5000696182250977 at x = 0.23337582
ULP mean 0.2499294509920759
Test Summary: | Pass  Total
Float32 sin   |    1      1
Tol debug failed 0.0% of the time.
test_sine
ULP max 7.437233939810994e-9 at x = -0.12752758
ULP mean 1.1088425131523126e-9
Test Summary:     | Pass  Total
Float16 test_sine |    1      1
Tol debug failed 0.0% of the time.
test_sine
ULP max 7.437233939810994e-9 at x = -0.12752758
ULP mean 1.1088425131523126e-9
Test Summary:     | Pass  Total
Float32 test_sine |    1      1

Tol debug failed 0.0% of the time.
cos
ULP max 0.5008963942527771 at x = 0.62692666
ULP mean 0.24963942989463997
Test Summary: | Pass  Total
Float16 cos   |    1      1
Tol debug failed 0.0% of the time.
cos
ULP max 0.5008963942527771 at x = 0.62692666
ULP mean 0.24963942989463997
Test Summary: | Pass  Total
Float32 cos   |    1      1
Tol debug failed 0.0% of the time.
test_cosine
ULP max 4.495216643363165e-9 at x = 0.76539093
ULP mean 9.939168841158904e-10
Test Summary:       | Pass  Total
Float16 test_cosine |    1      1
Tol debug failed 0.0% of the time.
test_cosine
ULP max 4.495216643363165e-9 at x = 0.76539093
ULP mean 9.939168841158904e-10
Test Summary:       | Pass  Total
Float32 test_cosine |    1      1

pi -----------------------------------------------------------------------

Tol debug failed 0.0% of the time.
sin
ULP max 0.5008857846260071 at x = 2.203355
ULP mean 0.24990503381476237
Test Summary: | Pass  Total
Float16 sin   |    1      1
Tol debug failed 0.0% of the time.
sin
ULP max 0.5008857846260071 at x = 2.203355
ULP mean 0.24990503381476237
Test Summary: | Pass  Total
Float32 sin   |    1      1
Tol debug failed 0.0% of the time.
test_sine
ULP max 0.001272978144697845 at x = 2.899093
ULP mean 1.179825295005716e-8
Test Summary:     | Pass  Total
Float16 test_sine |    1      1
Tol debug failed 0.0% of the time.
test_sine
ULP max 0.001272978144697845 at x = 2.899093
ULP mean 1.179825295005716e-8
Test Summary:     | Pass  Total
Float32 test_sine |    1      1

Tol debug failed 0.0% of the time.
cos
ULP max 0.5008531212806702 at x = 0.45568538
ULP mean 0.2499933592458589
Test Summary: | Pass  Total
Float16 cos   |    1      1
Tol debug failed 0.0% of the time.
cos
ULP max 0.5008531212806702 at x = 0.45568538
ULP mean 0.2499933592458589
Test Summary: | Pass  Total
Float32 cos   |    1      1
Tol debug failed 0.0% of the time.
test_cosine
ULP max 0.0011584102176129818 at x = 1.4495481
ULP mean 1.6793535615395134e-8
Test Summary:       | Pass  Total
Float16 test_cosine |    1      1
Tol debug failed 0.0% of the time.
test_cosine
ULP max 0.0011584102176129818 at x = 1.4495481
ULP mean 1.6793535615395134e-8
Test Summary:       | Pass  Total
Float32 test_cosine |    1      1

2pi -----------------------------------------------------------------

Tol debug failed 0.0% of the time.
sin
ULP max 0.5008665323257446 at x = 1.7611976
ULP mean 0.24982504733224614
Test Summary: | Pass  Total
Float16 sin   |    1      1
Tol debug failed 0.0% of the time.
sin
ULP max 0.5008665323257446 at x = 1.7611976
ULP mean 0.24982504733224614
Test Summary: | Pass  Total
Float32 sin   |    1      1
Tol debug failed 0.0% of the time.
test_sine
ULP max 0.001272976049222052 at x = 5.798186
ULP mean 2.6857659977250206e-8
Test Summary:     | Pass  Total
Float16 test_sine |    1      1
Tol debug failed 0.0% of the time.
test_sine
ULP max 0.001272976049222052 at x = 5.798186
ULP mean 2.6857659977250206e-8
Test Summary:     | Pass  Total
Float32 test_sine |    1      1

Tol debug failed 0.0% of the time.
cos
ULP max 0.500886857509613 at x = 0.45355228
ULP mean 0.25012252611713054
Test Summary: | Pass  Total
Float16 cos   |    1      1
Tol debug failed 0.0% of the time.
cos
ULP max 0.500886857509613 at x = 0.45355228
ULP mean 0.25012252611713054
Test Summary: | Pass  Total
Float32 cos   |    1      1
Tol debug failed 0.0% of the time.
test_cosine
ULP max 0.0017902530962601304 at x = 4.396544
ULP mean 3.316964910986199e-8
Test Summary:       | Pass  Total
Float16 test_cosine |    1      1
Tol debug failed 0.0% of the time.
test_cosine
ULP max 0.0017902530962601304 at x = 4.396544
ULP mean 3.316964910986199e-8
Test Summary:       | Pass  Total
Float32 test_cosine |    1      1
Last edited by juha_p on Tue Mar 29, 2022 6:41 am, edited 1 time in total.
