First steps on Vectorizing Audio Plugins: which Instruction Set do you use in 2018?

DSP, Plugin and Host development discussion.

Post

stratum wrote: Thu Nov 29, 2018 10:13 am I'd use InterlockedIncrement so that one branch of an if clause makes the single call to ippInit and the other branch waits for initialization to complete, but you can find something more elegant by searching for "thread safe lazy initialization".
Why not just put the call in the constructor of a "dummy" global object?

Those get constructed (from a single thread) as part of loading the dynamic library (or for applications, before "main" is called). The only thing you have to worry about with such global constructors is that their order can be quite unpredictable... but I don't see why that would really be an issue here.

Post

mystran wrote: Thu Nov 29, 2018 10:54 am
stratum wrote: Thu Nov 29, 2018 10:13 am I'd use InterlockedIncrement so that one branch of an if clause makes the single call to ippInit and the other branch waits for initialization to complete, but you can find something more elegant by searching for "thread safe lazy initialization".
Why not just put the call in the constructor of a "dummy" global object?

Those get constructed (from a single thread) as part of loading the dynamic library (or for applications, before "main" is called). The only thing you have to worry about with such global constructors is that their order can be quite unpredictable... but I don't see why that would really be an issue here.
For calling ippInit that would also work: it doesn't depend on anything else to work properly, so the unpredictable construction order is harmless.
~stratum~
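
For reference, one form that "thread safe lazy initialization" can take in C++11 and later is a function-local static. This is only a minimal sketch: `fakeLibraryInit` is a hypothetical stand-in for `ippInit()` so the snippet compiles without IPP, and `ensureInitialized` and `initCallCount` are names made up for illustration.

```cpp
// Hypothetical stand-in for ippInit(); counts how often it actually runs.
static int g_initCalls = 0;

static int fakeLibraryInit() {
    ++g_initCalls;
    return 0; // 0 plays the role of a "no error" status here
}

// Safe to call from any thread. Since C++11 the initializer of a
// function-local static runs exactly once; threads that arrive while it is
// still running block until it has completed.
int ensureInitialized() {
    static const int status = fakeLibraryInit();
    return status;
}

// Helper for inspecting how many times the init function was executed.
int initCallCount() {
    return g_initCalls;
}
```

Every entry point that needs the library would call `ensureInitialized()` first. MSVC has implemented this guarantee since Visual Studio 2015, unless it is switched off with `/Zc:threadSafeInit-`.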

Post

mystran wrote: Thu Nov 29, 2018 10:54 am Why not just put the call in the constructor of a "dummy" global object?

Those get constructed (from a single thread) as part of loading the dynamic library (or for applications, before "main" is called). The only thing you have to worry about with such global constructors is that their order can be quite unpredictable... but I don't see why that would really be an issue here.
It seems to work:

Code: Select all

class MainIPlugInitializer
{
public:
	MainIPlugInitializer() {
		ippInit();
	}
};
...
MainIPlugInitializer mainIPlugInitializer;
That would be thread-safe, right?

Post

Nowhk wrote: Thu Nov 29, 2018 1:12 pm It seems to work:

Code: Select all

class MainIPlugInitializer
{
public:
	MainIPlugInitializer() {
		ippInit();
	}
};
...
MainIPlugInitializer mainIPlugInitializer;
That would be thread-safe, right?
Assuming "mainIPlugInitializer" is a global, then yeah.

[well: strictly speaking it's not that it's "thread-safe" as such, rather it simply runs in a context where there can only ever be one thread, unless you specifically create more threads of your own with another such initializer; the application won't get the library handle (and hence can't call into the library from other threads) until after the initializers have run]

Actually in this case, since ippInit() returns a status code, using it to initialize a global variable should also work (using the same mechanism):

Code: Select all

// somewhere in global scope
IppStatus ippInitStatus = ippInit();
The "wrap it into an initializer-class" method has the advantage of being a whole lot more general though.
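
The two variants can be combined so that the initializer object also keeps the status code around. The following is a sketch under assumptions: `fakeLibraryInit` is a hypothetical stand-in for `ippInit()`, and `LibraryInitializer`, `libraryReady` and `libraryInitStatus` are illustrative names, not part of IPP.

```cpp
// Plain flag; it is constant-initialized to false before any constructor runs.
static bool g_libraryReady = false;

// Hypothetical stand-in for ippInit().
static int fakeLibraryInit() {
    g_libraryReady = true;
    return 0; // 0 plays the role of a "no error" status here
}

// The constructor runs during static initialization, i.e. while the dynamic
// library is being loaded (or before main() in an application).
class LibraryInitializer {
public:
    LibraryInitializer() : status_(fakeLibraryInit()) {}
    int status() const { return status_; }

private:
    int status_;
};

// The "dummy" global object whose construction triggers the init call.
static LibraryInitializer g_libraryInitializer;

bool libraryReady() { return g_libraryReady; }
int libraryInitStatus() { return g_libraryInitializer.status(); }
```

By the time the host can call into the module, `libraryReady()` is already true and `libraryInitStatus()` can be checked, for example when the first plugin instance is created.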

Post

mystran wrote: Thu Nov 29, 2018 2:49 pm [well: strictly speaking it's not that it's "thread-safe" as such, rather it simply runs in a context where there can only ever be one thread, unless you specifically create more threads of your own with another such initializer; the application won't get the library handle (and hence can't call into the library from other threads) until after the initializers have run]
Anyway, still I don't understand why I need to do it :)

It seems they call ippInit() (if not already initialized) at the first IPP call, blocking every other IPP function from executing until it has finished. Reasoning this way, it seems to be thread safe to me :? Any other (parallel) thread won't run any IPP function until ippInit() has completed.

The only problem I see is if I manually call ippInit() while another thread is running ippInit() (because an IPP function has run and automatically called it, for example). But again, if I never call ippInit() manually, it should already work out of the box...

Post

Nowhk wrote: Thu Nov 29, 2018 3:42 pm Anyway, still I don't understand why I need to do it :)
If you have looked at it carefully and haven't seen an explicit statement about the thread safety of automatic initialization of the IPP dispatcher, then I guess it's not your fault.

Given the lack of adequate information, you should assume the worst scenario, not the best, if you do not want a dispatcher "bug" - and if a data race occurs, you can't even call it a bug if they didn't explicitly say that it's thread safe.
~stratum~

Post

mystran wrote: Thu Nov 29, 2018 2:49 pm [well: strictly speaking it's not that it's "thread-safe" as such, rather it simply runs in a context where there can only ever be one thread, unless you specifically create more threads of your own with another such initializer; the application won't get the library handle (and hence can't call into the library from other threads) until after the initializers have run].
Uhm...

Since all functions are global within the IPP library, I believe that the flags initialized by IPP (i.e. which CPU, etc.) are globals as well, right? So the whole process can access them.

So theoretically, if I load two DLLs which both include this library, the second one loaded will overwrite the flags.

What if the second load occurs while the first is running?

Post

DLLs are loaded once per process, not once per dependent DLL. Therefore no concurrent loads of the same DLL will occur (if the module loader loads any DLL concurrently at all), but the flags you mention can be overwritten as they are shared.

P.S. In any case you probably shouldn't be linking the DLL version of IPP for a plugin.
~stratum~

Post

stratum wrote: Thu Nov 29, 2018 8:14 pm DLLs are loaded once per process, not once per dependent DLL. Therefore no concurrent loads of the same DLL will occur (if the module loader loads any DLL concurrently at all), but the flags you mention can be overwritten as they are shared.

P.S. In any case you probably shouldn't be linking the DLL version of IPP for a plugin.
For the dispatch table, it should use some sort of static or singleton class that stores the CPU flags, computed once.

Globals are visible within the whole process. What if the DAW loads two DLLs (e.g. inst+fx) which both use IPP, in the same process?

They will share the same global object/flags...

Post

Nowhk wrote: Thu Nov 29, 2018 9:06 pm
They will share the same global object/flags...
That's correct, but nobody wants a copy of the IPP DLLs in the user's plugin folder anyway, so in practice no such problem will occur: anybody who happens to be using IPP in a plugin will be linking it statically.
~stratum~

Post

Don't want to derail your thread, but isn't IPP and the like going to be a lot slower if you want to have multiple operations in your loop?

In my understanding IPP has to read the whole buffer and store the results back to memory for each operation, while if you used intrinsics (ideally wrapped nicely in a class) you only have to store the results to memory once. Maybe I'm mistaken and IPP combines the operations somehow.
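
To make the "store once" idea concrete, here is a rough sketch of an add followed by a clamp to [0, 1], done in a single pass with SSE2 intrinsics. The function name `addClamp01` and the guard macro `ADDCLAMP_HAVE_SSE2` are made up for illustration, unaligned loads are assumed so any pointer works, and a scalar loop handles the tail (and the whole buffer on non-SSE2 targets).

```cpp
#include <algorithm>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define ADDCLAMP_HAVE_SSE2 1
#endif

// out[i] = clamp(a[i] + b[i], 0.0, 1.0), reading and writing memory once.
void addClamp01(const double* a, const double* b, double* out, std::size_t n) {
    std::size_t i = 0;
#ifdef ADDCLAMP_HAVE_SSE2
    const __m128d lo = _mm_set1_pd(0.0);
    const __m128d hi = _mm_set1_pd(1.0);
    // Two doubles per iteration; the sum stays in a register between the
    // add and the two clamp steps.
    for (; i + 2 <= n; i += 2) {
        __m128d sum = _mm_add_pd(_mm_loadu_pd(a + i), _mm_loadu_pd(b + i));
        sum = _mm_min_pd(_mm_max_pd(sum, lo), hi);
        _mm_storeu_pd(out + i, sum);
    }
#endif
    // Scalar tail (or the full loop when SSE2 is not available).
    for (; i < n; ++i)
        out[i] = std::min(std::max(a[i] + b[i], 0.0), 1.0);
}
```

Whether this actually beats several library calls depends on the block size, the compiler and whether the buffers fit in cache, so it is worth measuring rather than assuming.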

Post

Nowhk wrote: Thu Nov 29, 2018 9:06 pm The visibility of globals is within the whole process. What if the DAW load two DLLs (i.e. inst+fx) which use both IPP, on the same process?
Globals are only visible inside a module (e.g. a binary or dynamic library) and in other modules that link to that module. If you export some symbol, then the host can query for it at run-time, but other than that everything is private.

This means that the problem of "shared globals" (and a whole lot of other problems, like potential version mismatches and all that) only happens when two modules link to the same dynamic library. The solution to all these problems is to use static versions of any libraries in your plugins; this way you get your own private copy and never have to worry about it again.

Post

mtytel wrote: Thu Nov 29, 2018 9:26 pm Don't want to derail your thread, but isn't IPP and the like going to be a lot slower if you want to have multiple operations in your loop?

In my understanding IPP has to read the whole buffer and store the results back to memory for each operation, while if you used intrinsics (ideally wrapped nicely in a class) you only have to store the results to memory once. Maybe I'm mistaken and IPP combines the operations somehow.
It provides the loop implementation as well.
I don't think he will gain anything by simply replacing arithmetic operations with IPP functions; it will rather be slower, because you trade a CPU instruction for a library call (including all the overhead).

He will gain performance, however, if he uses the IPP DSP functions ( https://software.intel.com/en-us/ipp-de ... D6E834CC64 ).
Don't take it personally, but I don't see how Nowhk + intrinsics will ever be able to beat the performance of an IPP IIR or FIR or FFT or whatever :D :P

Post

PurpleSunray wrote: Fri Nov 30, 2018 2:09 am Don't take it personally, but I don't see how Nowhk + intrinsics will ever be able to beat the performance of an IPP IIR or FIR or FFT or whatever :D :P
Lol :D I agree. Maybe in 10 years, who knows... I'm learning :wink:
PurpleSunray wrote: Fri Nov 30, 2018 2:09 am I don't think he will gain anything by simply replacing arithmetic operations with IPP functions; it will rather be slower, because you trade a CPU instruction for a library call (including all the overhead).
That's not really true.
I replaced this:

Code: Select all

for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) {
	pValue[sampleIndex] = std::clamp(pStart[sampleIndex] + pMod[sampleIndex], 0.0, 1.0);
}
With this:

Code: Select all

ippsAdd_64f(pStart, pMod, pValue, blockSize);
ippsThreshold_64f_I(pValue, blockSize, 0.0, ippCmpLess);
ippsThreshold_64f_I(pValue, blockSize, 1.0, ippCmpGreater);
It's more than 2x faster. Probably because MSVC is not so smart...

Post

Nowhk wrote: Fri Nov 30, 2018 6:38 am I replaced this:

Code: Select all

for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) {
	pValue[sampleIndex] = std::clamp(pStart[sampleIndex] + pMod[sampleIndex], 0.0, 1.0);
}
With this:

Code: Select all

ippsAdd_64f(pStart, pMod, pValue, blockSize);
ippsThreshold_64f_I(pValue, blockSize, 0.0, ippCmpLess);
ippsThreshold_64f_I(pValue, blockSize, 1.0, ippCmpGreater);
It's more than 2x faster. Probably because MSVC is not so smart...
That depends on the value of blockSize. For very small values the IPP function calls will have considerable overhead, and intrinsics or custom assembly code could be better.

On the other hand, I hope you are not comparing the execution speed of an MSVC debug build against IPP and concluding that there is a performance improvement!
~stratum~
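
A small timing harness along these lines can be used to check how the comparison changes with blockSize. It is only a sketch: both kernels are plain C++ loops (the three-pass one merely mirrors the shape of the three library calls, it is not IPP), and all names here (`addClampFused`, `addClampThreePass`, `Kernel`, `nanosPerSample`) are invented for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// One pass: add and clamp each sample in the same loop.
void addClampFused(const double* a, const double* b, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::min(std::max(a[i] + b[i], 0.0), 1.0);
}

// Three passes over the buffer: add, lower threshold, upper threshold.
void addClampThreePass(const double* a, const double* b, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
    for (std::size_t i = 0; i < n; ++i) if (out[i] < 0.0) out[i] = 0.0;
    for (std::size_t i = 0; i < n; ++i) if (out[i] > 1.0) out[i] = 1.0;
}

using Kernel = void (*)(const double*, const double*, double*, std::size_t);

// Average nanoseconds per sample for one kernel at one block size.
// blockSize and repeats must both be greater than zero.
double nanosPerSample(Kernel kernel, std::size_t blockSize, int repeats) {
    std::vector<double> a(blockSize, 0.75), b(blockSize, 0.5), out(blockSize);
    volatile double sink = 0.0; // keeps the optimizer from discarding the work
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        kernel(a.data(), b.data(), out.data(), blockSize);
        sink = sink + out[blockSize / 2];
    }
    const auto stop = std::chrono::steady_clock::now();
    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return ns / (static_cast<double>(blockSize) * repeats);
}
```

Calling `nanosPerSample` for each kernel over a range of block sizes (say 16 up to 4096) in an optimized release build gives a rough picture of where per-call overhead stops dominating. The absolute numbers vary from machine to machine.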
