First steps on Vectorizing Audio Plugins: which Instruction Set do you use in 2018?
- KVRian
- Topic Starter
- 878 posts since 2 Oct, 2013
I see now what you are trying to suggest, I think: if you limit yourself to a library that targets an older instruction set, it is probably quite stable (since it has been tested more).
Using IPP (which will use the latest instruction sets, thus more recent = less stable) might introduce more problems (being younger).
Let's say I settle on a wrapper library for an older SIMD extension (SSE2 for example, which I hope is pretty stable in 2018): which library do you suggest?
I'd like to see the differences in performance between them.
For example, I've tried some IPP functions, and they run much faster than MKL: IPP seems a beast!!!
- KVRian
- Topic Starter
- 878 posts since 2 Oct, 2013
But if you avoid the dispatcher (i.e. the automatic selection of the best optimized function/SIMD set for the current CPU), which SIMD set do you use in your plugins? Just curious. AVX? SSE2?
-
- KVRAF
- 2256 posts since 29 May, 2012
I don't write any plugins. When you do not have end users as customers, potential dispatcher bugs do not matter at all, so I just call ippInit and forget about potential IPP bugs; we always have more problems due to something else, like a threading error that cannot be easily tested, or supposedly standards-compliant hardware that only the customer has and we do not. An analogy in the plugin market would be a DAW that only a few customers have and you do not.
~stratum~
- KVRian
- Topic Starter
- 878 posts since 2 Oct, 2013
Ahhh, ok. I was thinking that since we are within a DSP/Plug-in Development forum
Why do you call it manually? It should be called automatically by default once you include ipp.h...
-
- KVRAF
- 2256 posts since 29 May, 2012
"Ahhh, ok. I was thinking that since we are within a DSP/Plug-in Development forum"
I used to be a Music-DSP hobbyist. In fact, I don't know why I'm still here while the electric guitars rust somewhere in the room without getting touched once a week.
"Why do you call it manually? It should be called automatically by default once you include ipp.h..."
If you do not call it manually, one of the following will occur: (1) it is called automatically and you will have a race condition at that point, or (2) the dispatcher is not initialized at all and you'll be running some suboptimal code. Which one occurs, I do not know.
~stratum~
- KVRian
- Topic Starter
- 878 posts since 2 Oct, 2013
Hope that if you have a family, you take care.
stratum wrote: ↑Wed Nov 28, 2018 10:22 am If you do not call it manually one of the following will occur: (1) Either it is automatically called and you will have a race condition at that point, (2) dispatcher is not initialized at all and you'll be running some suboptimal code. Which one occurs, I do not know.
It seems that since version 9.0 (i.e. ~2015) ippInit is called automatically:
"ippInit: Automatically initializes the static or dynamic library to that which is most appropriate for the runtime processor."
I'm not sure how a race condition could create a problem for IPP in the context of a plugin.
From the specs: "You can not use any other Intel IPP function while the function ippInit continues execution."
So I believe no IPP function overload will be chosen until ippInit has detected the CPU.
-
- KVRAF
- 2256 posts since 29 May, 2012
"ippInit: Automatically initializes the static or dynamic library to that which is most appropriate for the runtime processor."
That sentence doesn't mean that ippInit is called automatically. It only means you no longer have to know that you need to call ippStaticInit for the static library version.
"Not sure how a race conditions could create problem for IPP in the context of a plugin.
From specs: You can not use any other Intel IPP function while the function ippInit continues execution."
This particular sentence means they explicitly acknowledge the possibility of a race condition even when ippInit is called manually, so be careful about the point at which you call it.
~stratum~
- KVRian
- Topic Starter
- 878 posts since 2 Oct, 2013
Does this?
"But since Intel IPP 9.0, the ipp*Init*() functions are not necessary to enable the dispatcher. Any first call Intel IPP function is able to detect the processor type and set the dispatcher to use the processor-specific code."
-
- KVRAF
- 2256 posts since 29 May, 2012
Not useful without also stating that this is also being done in a thread safe manner.
Do they also say that this automatic initialization is thread safe?
~stratum~
- KVRian
- Topic Starter
- 878 posts since 2 Oct, 2013
I don't think the CPU will change while the process is running.
So any call to ippInit should end up with the same result (i.e. the same processor-specific code), even with multi-threaded/parallel access.
I don't see a case where this could introduce a problem, except the situation where an overloaded function runs in between these ippInit() calls.
But as specified above, this can't happen. What am I missing?
-
- KVRAF
- 2256 posts since 29 May, 2012
There is a thread safe way to implement this automatic initialization and there is a thread unsafe way to do it. Both are known by IPP authors I suppose, but I would nevertheless look for an explicit statement in the docs before assuming anything about it.
As for standalone apps, all you have to do is to call ippInit in main().
For plugins, it's a little more involved than that...
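To make the "thread safe way" mentioned above concrete, here is a minimal C++11 sketch of a lazily initialized dispatcher. It is not how IPP is implemented (the thread does not establish that); it only illustrates the argument that every thread computes the same answer, which is harmless as long as the result is published atomically. `sumBaseline`, `sumWide`, `cpuHasAvx`, `g_sumImpl` and `sum` are hypothetical names invented for this example, and `__builtin_cpu_supports` is a GCC/Clang builtin, so other compilers take the fallback path.

```cpp
#include <atomic>
#include <cstddef>

using SumFn = float (*)(const float*, std::size_t);

// Two stand-in implementations (hypothetical). In a real library the second
// one would be compiled for a wider SIMD set; here both are plain C++ so the
// sketch builds anywhere.
static float sumBaseline(const float* p, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}
static float sumWide(const float* p, std::size_t n) { return sumBaseline(p, n); }

// CPU query: uses the GCC/Clang builtin on x86, otherwise reports "no AVX".
static bool cpuHasAvx() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;
#endif
}

// The selected implementation; nullptr means "not chosen yet".
static std::atomic<SumFn> g_sumImpl{nullptr};

float sum(const float* p, std::size_t n) {
    SumFn f = g_sumImpl.load(std::memory_order_acquire);
    if (f == nullptr) {
        // Several threads may get here at the same time. Each one picks the
        // same function, and the atomic store publishes it without a data race.
        f = cpuHasAvx() ? sumWide : sumBaseline;
        g_sumImpl.store(f, std::memory_order_release);
    }
    return f(p, n);
}
```

The thread-unsafe variant would be the same code with a plain `SumFn` global instead of `std::atomic<SumFn>`: concurrent reads and writes of a non-atomic variable are a data race in C++11, even when every writer stores the same value.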
~stratum~
-
- KVRist
- 134 posts since 13 Apr, 2016
Urs wrote: ↑Sun Nov 25, 2018 11:07 am We use vector intrinsics wrapped into objects. This usually gives us 2x the performance over scalar code. I more and more use templated functions so that scalar code and vectorized code are identical, and I just implement for either float or float vector. This makes the scalar code a tad slower (no conditional branches), but then it's only there for reference anyway.
Could you please explain this a bit further, Urs?
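One possible reading of the quoted approach, as a minimal sketch and not Urs's actual code: wrap an SSE register in a small struct, give `float` and the wrapper the same set of free functions, and write the DSP routine once as a template. `Float4`, `load`, `store`, `vmin`, `vmax`, `gainAndClip`, `processLanes` and `processBlock` are all hypothetical names chosen for this example. When SSE is not available at compile time, the sketch simply runs the scalar instantiation.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#endif

// Scalar building blocks, used when T = float.
inline void  load(const float* p, float& x) { x = *p; }
inline void  store(float* p, float x)       { *p = x; }
inline float vmin(float a, float b)         { return a < b ? a : b; }
inline float vmax(float a, float b)         { return a > b ? a : b; }

#ifdef SKETCH_HAS_SSE
// Hypothetical 4-lane wrapper around an SSE register.
struct Float4 {
    __m128 v;
    Float4() : v(_mm_setzero_ps()) {}
    explicit Float4(float x) : v(_mm_set1_ps(x)) {}  // broadcast to all lanes
    explicit Float4(__m128 m) : v(m) {}
};
inline void   load(const float* p, Float4& x) { x.v = _mm_loadu_ps(p); }
inline void   store(float* p, Float4 x)       { _mm_storeu_ps(p, x.v); }
inline Float4 operator*(Float4 a, Float4 b)   { return Float4(_mm_mul_ps(a.v, b.v)); }
inline Float4 vmin(Float4 a, Float4 b)        { return Float4(_mm_min_ps(a.v, b.v)); }
inline Float4 vmax(Float4 a, Float4 b)        { return Float4(_mm_max_ps(a.v, b.v)); }
#endif

// Written once for both types: apply gain, then clip to [-limit, +limit]
// with min/max selects rather than early-out branches.
template <class T>
inline T gainAndClip(T x, float gain, float limit) {
    return vmax(vmin(x * T(gain), T(limit)), T(-limit));
}

// Processes the buffer Lanes samples at a time; returns how many it handled.
template <class T, std::size_t Lanes>
std::size_t processLanes(float* buf, std::size_t n, float gain, float limit) {
    std::size_t i = 0;
    for (; i + Lanes <= n; i += Lanes) {
        T x;
        load(buf + i, x);
        store(buf + i, gainAndClip(x, gain, limit));
    }
    return i;
}

void processBlock(float* buf, std::size_t n, float gain, float limit) {
    assert(limit >= 0.0f);
    std::size_t done = 0;
#ifdef SKETCH_HAS_SSE
    done = processLanes<Float4, 4>(buf, n, gain, limit);        // vector body
#endif
    processLanes<float, 1>(buf + done, n - done, gain, limit);  // scalar tail
}
```

Calling `processLanes<float, 1>` on a copy of the same buffer gives a scalar reference to compare the vectorized output against, which matches the "only there for reference" role described in the quote.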
- KVRian
- Topic Starter
- 878 posts since 2 Oct, 2013
-
- KVRAF
- 2256 posts since 29 May, 2012
I'd use InterlockedIncrement so that one path of an if clause makes a single call to ippInit and the other path waits for initialization to complete, but you can find something more elegant by searching for "thread safe lazy initialization".
Code: Select all
#include <windows.h>  // LONG, InterlockedIncrement, InterlockedExchange, Sleep
#include <ipp.h>      // ippInit

// Function name is illustrative; call it from setup code, not the audio callback.
void EnsureIppInit()
{
    static volatile LONG s_init_cnt = 0;        // how many threads tried to initialize
    static volatile LONG s_init_completed = 0;  // becomes 1 once ippInit has returned
    if (s_init_cnt == 0 && 1 == InterlockedIncrement(&s_init_cnt))
    {
        ippInit();                                  // only the first thread gets here
        InterlockedExchange(&s_init_completed, 1);  // publish completion with a full barrier
    }
    else
    {
        while (!s_init_completed)  // everyone else waits for the first thread
            Sleep(1);
    }
}
modern geeks version:
https://www.nosid.org/cxx11-threadsafe- ... ation.html
and do that somewhere other than audio signal processing code.
~stratum~
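For readers who cannot follow the truncated link, a portable C++11 form of "thread safe lazy initialization" can be sketched with a function-local static. This is an assumption about what a "modern" version might look like, not a summary of the linked page. `initLibraryOnce`, `g_initCalls` and `ensureInitialized` are hypothetical names; the stand-in function takes the place of `ippInit()` so the example builds without IPP installed.

```cpp
#include <atomic>

// Counts calls to the stand-in below, only so the behaviour can be checked.
static std::atomic<int> g_initCalls{0};

// Hypothetical stand-in for the real one-time setup (ippInit() in this thread).
static bool initLibraryOnce() {
    ++g_initCalls;
    return true;
}

// Safe to call from any thread: since C++11 the initialization of a
// function-local static runs exactly once, and concurrent callers block
// until it has finished.
void ensureInitialized() {
    static const bool done = initLibraryOnce();
    (void)done;
}
```

`std::call_once` with a `std::once_flag` is the other standard way to get the same guarantee. Either way, because callers may block while the first one finishes, the call belongs in plugin setup code rather than in the audio callback, as noted above.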