Fathom Synth Development Thread


Post

FathomSynth wrote: Sun Jun 09, 2019 5:39 pm Wow, thanks everyone for the reviews, we're back in the top 100 again!
You should see if you can get on Splice’s synth list. They have a pretty exhaustive one: https://splice.com/plugins/search?category=instrument

Post

Fathom Test Results Intel AVX Parallel Processing

FathomAvx.jpg

I basically have Fathom's parallel processor running now. Some additional work will be necessary over the next month to connect all GUI parameters to the new processor, but with the parallel processor running I can now share some initial test results. These test results use the debug build so the final release build will probably be even better than this.

This test uses 10 oscillators with 8 detune voices to push the normal CPU to exactly 100 percent.

The current version without any parallel processing runs all 80 voices at CPU 100 %.
The new parallel processor runs at 20 %.

This is a 5 X multiply using the debug build.
The release build will probably be between a 6 and 7 multiply.

This means essentially that presets which currently max at 5 notes will give you at least 25 notes.
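
As a rough check on the arithmetic above, here is a minimal Python sketch. The function names are made up for illustration and are not part of Fathom, and the 6X value is only the low end of the release-build guess quoted above.

```python
def speedup(baseline_cpu_percent: float, new_cpu_percent: float) -> float:
    """Ratio of CPU load before and after, for the same workload."""
    if baseline_cpu_percent <= 0 or new_cpu_percent <= 0:
        raise ValueError("CPU load must be positive")
    return baseline_cpu_percent / new_cpu_percent


def estimated_max_notes(current_max_notes: int, factor: float) -> int:
    """Scale a preset's note ceiling by the speedup factor, rounding down."""
    return int(current_max_notes * factor)


# Debug-build figures from the test: 100 % CPU before, 20 % after.
debug_speedup = speedup(100.0, 20.0)
notes_debug = estimated_max_notes(5, debug_speedup)

# Hypothetical release build at the low end of the 6X to 7X guess.
notes_release_low = estimated_max_notes(5, 6.0)
```

This only restates the numbers in the post. Real polyphony will also depend on everything outside the oscillators that was not part of this test.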

Notice the parallel processor takes about twice as much CPU as a single voice. The reason actual test results give a 5X instead of the theoretical 8X is that there is some overhead loading the AVX registers. Because of the extra register instructions, one AVX voice takes about twice the actual run time of a normal voice with no loop. However, AVX voices over two have no impact on the CPU because everything is running in parallel and requires no loop, whereas the normal processor requires 8 loops to run 8 detune voices. AVX crunches all 8 detune voices in one pass!

So basically running Fathom with Intel AVX will give you all detune voices above 2 for free!
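
One way to picture this is a toy cost model, sketched below in Python. It illustrates the reasoning in this post and is not Fathom's actual code: `pass_cost` (the cost of one AVX pass, measured in scalar voices) and the function names are assumptions. With the "about twice" figure the model gives 4X for 8 detune voices, and the measured 5X would correspond to a pass costing roughly 1.6 scalar voices.

```python
import math

LANES_AVX = 8  # 256-bit register divided by 32-bit floats


def scalar_cost(detune_voices: int) -> float:
    """The scalar path loops once per detune voice."""
    if detune_voices < 1:
        raise ValueError("need at least one detune voice")
    return float(detune_voices)


def avx_cost(detune_voices: int, pass_cost: float = 2.0) -> float:
    """One AVX pass covers up to 8 voices and costs `pass_cost` scalar voices."""
    if detune_voices < 1:
        raise ValueError("need at least one detune voice")
    passes = math.ceil(detune_voices / LANES_AVX)
    return passes * pass_cost


def model_speedup(detune_voices: int, pass_cost: float = 2.0) -> float:
    """Scalar cost divided by AVX cost under this toy model."""
    return scalar_cost(detune_voices) / avx_cost(detune_voices, pass_cost)
```

Under this model a single voice is actually slower on the AVX path (speedup 0.5), two voices break even, and everything from three to eight voices costs the same as two.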

Also, keep in mind I have not yet done any code tuning in the new AVX code so I will probably improve it further before the first release.

My testing a few months back with Fathom against Omnisphere and Sylenth produced results in both cases with Fathom lagging in CPU by almost exactly a factor of 4, probably because both of these other synths use Intel SSE, which is a 4X multiply.

Since the new Fathom will be using Intel AVX, which is an 8X register multiply (a generation beyond SSE), when the parallel version of Fathom comes out it will produce higher polyphony for similar presets than both Omnisphere and Sylenth.
Last edited by FathomSynth on Sun Jun 16, 2019 4:32 am, edited 2 times in total.

Post

Great news, As a hw engineer I appreciate you sharing the technical details. It helps me learn more about the sw world. It looks like you can differentiate fathom from the competition. I see some good marketing potential here. As a fathom user I'm happy to see your commitment and support for the product and user base. Looking forward to the months ahead. Fathom is one of the few synths I just enjoy using.

Post

Good news! And for those wondering if their CPU supports AVX, here is some more good news! It appears that all modern Intel Core CPUs with the i3/i5/i7 designation will comply.
Generally, CPUs with the commercial denomination "Core i3/i5/i7" support them, whereas "Pentium" and "Celeron" CPUs don't.
https://en.wikipedia.org/wiki/Advanced_ ... Extensions

So this will be maximum bang for the buck for Fathom Synth/Intel users!!! Excited, can't wait!!! :hyper:

Post

First AVX2 support. Good for me, I have a modern CPU and will upgrade this year to the latest Intel:
https://en.wikipedia.org/wiki/Advanced_ ... _with_AVX2

Post

FYI, some plugins with AVX do not work with pre-2013 Intel CPUs

Post

Michael L wrote: Mon Jun 17, 2019 7:58 am FYI, some plugins with AVX do not work with pre-2013 Intel CPUs
Damn, think I'm running a 2012 processor.

Post

Michael L wrote: Mon Jun 17, 2019 7:58 am FYI, some plugins with AVX do not work with pre-2013 Intel CPUs
As multiple posts pointed out (including some from Everett/FathomSynth): there is AVX and AVX2 (see earlier posted link)

Wikipedia on AVX (Advanced Vector Extensions)
https://en.wikipedia.org/wiki/Advanced_ ... Extensions
Advanced Vector Extensions (AVX, also known as Sandy Bridge New Extensions) are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge[1] processor shipping in Q1 2011 and later on by AMD with the Bulldozer[2] processor shipping in Q3 2011. AVX provides new features, new instructions and a new coding scheme.

AVX2 expands most integer commands to 256 bits and introduces fused multiply-accumulate (FMA) operations. AVX-512 expands AVX to 512-bit support using a new EVEX prefix encoding proposed by Intel in July 2013 and first supported by Intel with the Knights Landing processor, which shipped in 2016.[3][4]
So....on Intel AVX implementation(s)

AVX introduced with Intel Sandy Bridge (Q1 2011)
https://en.wikipedia.org/wiki/Sandy_Bridge

AVX2 introduced with Haswell (Q2 2013)
https://en.wikipedia.org/wiki/Haswell_( ... hitecture)
https://en.wikipedia.org/wiki/Advanced_ ... _with_AVX2

AVX2 is an EXPANSION of AVX, a superset with additional features, so AVX-based systems MAY still benefit from "AVX2" code, just NOT from the AVX2-specific extensions/extras.
Advanced Vector Extensions 2 (AVX2), also known as Haswell New Instructions,[5] is an expansion of the AVX instruction set introduced in Intel's Haswell microarchitecture. AVX2 makes the following additions:

- expansion of most vector integer SSE and AVX instructions to 256 bits
- three-operand general-purpose bit manipulation and multiply
- Gather support, enabling vector elements to be loaded from non-contiguous memory locations
- DWORD- and QWORD-granularity any-to-any permutes
- vector shifts.
There's a note though:
Note: Not all CPUs from the listed families support AVX. Generally, CPUs with the commercial denomination "Core i3/i5/i7" support them, whereas "Pentium" and "Celeron" CPUs don't.

Post

I am currently using the Intel AVX + AVX2 instruction sets to do the parallel processing, which will be for the first release.

However, the good news is that the vast majority of the instructions are AVX1. So supporting AVX1 only is relatively easy and will probably be the very next release if not the same one.

Creating a build with just the AVX instructions is not a big deal; it just requires a little more bit twiddling to replace the convenient AVX2 instructions, so that should not be a problem.

I will probably not go back to SSE unless there is a major revolt, but even that is not impossible.

Also, shortly I will be supporting AVX-512, which will be 16 detune channels with no extra CPU load. Systems with an AVX-512 ready processor will be detected by Fathom, and it will automatically enable 16 detune channels, which will run in parallel using the new AVX-512 registers. These are 16 floating-point values wide and can therefore process the 16 detune channels in one pass!
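
The detection logic described here might look roughly like the sketch below. It is written in Python purely for illustration and is not Fathom's code. The flag names follow the ones Linux reports in /proc/cpuinfo, while `select_simd_path` and `SIMD_PATHS` are hypothetical names. The lane count is simply the register width divided by the 32 bits of a single-precision float.

```python
from typing import Iterable, Tuple

FLOAT_BITS = 32

# (CPU feature flag, register width in bits), widest and newest first.
SIMD_PATHS = (
    ("avx512f", 512),
    ("avx2", 256),
    ("avx", 256),
    ("sse2", 128),
)


def select_simd_path(cpu_flags: Iterable[str]) -> Tuple[str, int]:
    """Return (path name, float lanes) for the best path the CPU reports."""
    flags = {flag.lower() for flag in cpu_flags}
    for name, width_bits in SIMD_PATHS:
        if name in flags:
            return name, width_bits // FLOAT_BITS
    # No known vector extension reported: fall back to one voice per pass.
    return "scalar", 1
```

For example, a CPU that reports `avx` but not `avx2` would get the AVX1 path with 8 lanes, and one that reports `avx512f` would get 16 lanes, matching the 16 detune channels mentioned above.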

Post

AMD cpus also use AVX (where available)

Post

FathomSynth wrote: Wed Jun 05, 2019 4:17 pm But it’s looking like the Fathom 3.0 CPU release will be sometime this summer maybe 6 weeks from now.
Now this is really good news. Wishing you good luck with the release plus hope you'll still have time to see some sunshine. :wink:

Post

FathomSynth wrote: Mon Jun 17, 2019 5:43 pm I am currently using the Intel AVX + AVX2 instruction sets to do the parallel processing, which will be for the first release.

However, the good news is that the vast majority of the instructions are AVX1. So supporting AVX1 only is relatively easy and will probably be the very next release if not the same one.

Creating a build with just the AVX instructions is not a big deal; it just requires a little more bit twiddling to replace the convenient AVX2 instructions, so that should not be a problem.

I will probably not go back to SSE unless there is a major revolt, but even that is not impossible.

Also, shortly I will be supporting AVX-512, which will be 16 detune channels with no extra CPU load. Systems with an AVX-512 ready processor will be detected by Fathom, and it will automatically enable 16 detune channels, which will run in parallel using the new AVX-512 registers. These are 16 floating-point values wide and can therefore process the 16 detune channels in one pass!
you'll probably find this out by yourself, but you should be careful around AVX-512, because the core frequency will drop massively each time you use it. using AVX-512 is only worth it when used (relatively) infrequently and in big batches, rather than interspersing AVX-512 with scalar code here and there. i wouldn't be surprised if you, like many other people, find that AVX-512 code is actually slower than AVX2 for your use case!

Post

OK, Thanks for letting me know.

That is very odd. I will have to do some testing. Are you saying the entire CPU speed drops, or just the core running the AVX instructions? AVX-512 is a 16 X multiply so the speed would have to drop dramatically to impact the advantage of being 16 times faster. However if it causes the speed to drop on other threads being used by the host workstation then that could indeed be a problem.

Post

FathomSynth wrote: Tue Jun 18, 2019 3:41 pm OK, Thanks for letting me know.

That is very odd. I will have to do some testing. Are you saying the entire CPU speed drops, or just the core running the AVX instructions? AVX-512 is a 16 X multiply so the speed would have to drop dramatically to impact the advantage of being 16 times faster. However if it causes the speed to drop on other threads being used by the host workstation then that could indeed be a problem.
i've since corrected my original message - yep, it's the core frequency that drops, not the entire CPU (with caveats).

however, 1) the frequency drop related to AVX-512 can be pretty dramatic (for example), 2) running AVX-512 can also have consequences on other things because of hyperthreading (which is generally enabled on desktop parts), and 3) as far as i know, there is a limited number of AVX-512 ports available, so you may reach a point where piling more AVX-512 work onto the CPU (such as running multiple instances) will slow everything to a crawl.

since AVX-512 drops the core frequency dramatically and each core frequency switch is a throttle + stall (i.e. after finishing your AVX-512 callout, you may find yourself running scalar code at AVX-512 core frequency for a while, followed by a complete core stall until the new frequency comes into effect), the overall effect on performance may be negative if you're not doing enough of AVX-512 to justify the stepping. i.e. if you're getting 16X speedup for 5% of your workload and end up running 95% of your workload at AVX-512 speeds while waiting for a frequency switch, it won't be worth it. doing bigger batches of AVX-512 and not interspersing scalar and vector code is extremely important when dealing with AVX-512.
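
To put rough numbers on that trade-off, here is a deliberately pessimistic Python sketch. It assumes the whole workload runs at the reduced clock, which is the worst case described above. `clock_ratio` (reduced clock divided by normal clock) is a made-up input for illustration, not a measurement of any real CPU, and the function names are hypothetical.

```python
def relative_runtime(vector_fraction: float, vector_speedup: float,
                     clock_ratio: float) -> float:
    """Runtime relative to all-scalar code at full clock (1.0 means no change).

    vector_fraction: share of the original runtime that gets vectorized.
    vector_speedup:  speedup of that share at equal clock (e.g. 16).
    clock_ratio:     reduced clock / normal clock, applied to ALL work here.
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    if vector_speedup <= 0 or clock_ratio <= 0:
        raise ValueError("vector_speedup and clock_ratio must be positive")
    cycles = vector_fraction / vector_speedup + (1.0 - vector_fraction)
    return cycles / clock_ratio


def break_even_clock_ratio(vector_fraction: float,
                           vector_speedup: float) -> float:
    """Clock ratio at which the vectorized version only ties the scalar one."""
    return relative_runtime(vector_fraction, vector_speedup, 1.0)
```

With 5% of the work vectorized at 16X, the break-even clock ratio is about 0.953, so under this model any clock drop of more than about 4.7% makes the AVX-512 version slower overall. With 95% of the work vectorized, the same drop is easily paid for.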

as always, benchmarking is key, but you seem to have that part down already :) good luck with the implementation!

Post

OK, I'll be sure to test it thoroughly when I implement 512.

I think it is safe to assume that this is only a temporary situation applying to the latest Intel processors with AVX-512. I cannot imagine Intel would allow something so stupid as a permanent situation for all their processors going forward with AVX. Ideally the substrate should be able to clock the 512-bit-wide registers at the exact same speed as 64-bit-wide registers if they are truly parallel.

It could simply be a temperature issue they have not worked out yet. AVX-512 registers are 16 times as big so theoretically they could be generating 16 times the heat running at the same speed.
