Ideas for an article on SIMD optimisations
-
Fire Sledge - Ohm Force https://www.kvraudio.com/forum/memberlist.php?mode=viewprofile&u=46
- KVRist
- Topic Starter
- 121 posts since 2 Nov, 2000 from 404 - Not found
I'm going to start an article on techniques for optimising DSP algorithms with SIMD, in any possible way.
So I would like to collect as many developers' ideas and opinions as possible about the problems they have encountered and the possible solutions and compromises. For example: block processing with short feedback, multirate processing, look-up tables, conditional code, etc.
Or links to existing articles, if any...
Thank you
-- Laurent
-
- KVRAF
- 3388 posts since 29 May, 2001 from New York, NY
Excellent idea Laurent... You could also address different strategies, like using SSE to process 4 voices at once vs. using SSE to optimize the code for 1 single voice...
Since I'm deep inside SSE at the moment, I'm very interested. I could contribute some snippets, although the challenge I'm facing right now is how to move from a naive SSE implementation to an optimized one. There are no tools to help with this as far as I know (maybe Intel VTune, but it's quite expensive).
'Tick
-
Fire Sledge - Ohm Force https://www.kvraudio.com/forum/memberlist.php?mode=viewprofile&u=46
- KVRist
- Topic Starter
- 121 posts since 2 Nov, 2000 from 404 - Not found
Big Tick wrote: You could also address different strategies, like using SSE to process 4 voices at once vs. using SSE to optimize the code for 1 single voice...
Yes, that is what I meant by "in any possible way". I identified three main cases:
- Block processing (N samples)
- N-ways (classic one, N voices at once)
- N-serial (one sample, N chained identical modules)
Big Tick wrote: I could contribute some snippets
Great!
-- Laurent
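To make the first two cases concrete, here is a minimal sketch in C with SSE intrinsics. It is not code from anyone in this thread; the function names and the interleaved voice layout are assumptions made for illustration. The block version handles 4 consecutive samples of one voice per instruction, which only works directly when samples do not depend on each other (a plain gain here). The N-ways version handles one sample of 4 voices per instruction, so per-voice feedback (a one-pole low-pass here) is not a problem.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Block processing: one voice, 4 consecutive samples per instruction.
   Sketch only: n must be a multiple of 4, no tail handling. */
void gain_block(float *buf, size_t n, float gain)
{
    assert(n % 4 == 0);
    const __m128 g = _mm_set1_ps(gain);
    for (size_t i = 0; i < n; i += 4) {
        __m128 x = _mm_loadu_ps(buf + i);
        _mm_storeu_ps(buf + i, _mm_mul_ps(x, g));
    }
}

/* N-ways: 4 independent voices, one sample of each per instruction.
   in/out are assumed to hold interleaved frames: v0 v1 v2 v3 v0 v1 ...
   state holds the previous output of each voice, coef one coefficient
   per voice. Each lane only depends on its own past, so the feedback
   stays inside the lane. */
void onepole_4voices(const float *in, float *out, size_t frames,
                     float state[4], const float coef[4])
{
    __m128 y = _mm_loadu_ps(state);
    const __m128 a = _mm_loadu_ps(coef);
    for (size_t i = 0; i < frames; ++i) {
        __m128 x = _mm_loadu_ps(in + 4 * i);
        /* y += a * (x - y): one-pole low-pass in every lane */
        y = _mm_add_ps(y, _mm_mul_ps(a, _mm_sub_ps(x, y)));
        _mm_storeu_ps(out + 4 * i, y);
    }
    _mm_storeu_ps(state, y);
}
```

The N-serial case (one sample through N chained identical modules) needs data to move between lanes from one step to the next, so it does not reduce to either of these two loops.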
-
- KVRAF
- 2458 posts since 3 Oct, 2002 from SF CA USA NA Earth
I've got a bizarro SSE-ized Chamberlin SVF that I use in Meridian that might be worth writing up. The code path implements a 12dB/octave filter, but the output of elements 1 and 2 is always used as the next input to elements 3 and 4. In effect you have two filters, each with both a 12dB output and a 24dB-and-one-sample-delay output, so it's a hybrid of 2-way serial and 2-way parallel vectorization. I suspect I'd be better off ditching it, actually, but the idea might be useful and I could write it up in more detail.
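A rough sketch of how such a hybrid might look, based only on the description above and not on the actual Meridian code. The struct, the function name, the use of the low-pass output as the cascaded signal, and the textbook Chamberlin update order are all assumptions. Lanes 0 and 1 are two parallel 12dB filters; lanes 2 and 3 are fed with what lanes 0 and 1 produced on the previous sample, which gives the 24dB output delayed by one sample.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical state for 4 Chamberlin SVF lanes. */
typedef struct {
    __m128 low;   /* low-pass state per lane */
    __m128 band;  /* band-pass state per lane */
    __m128 f;     /* frequency coefficient per lane */
    __m128 q;     /* damping coefficient per lane */
} Svf4;

/* Processes one sample. x0 and x1 feed lanes 0 and 1. Lanes 2 and 3
   receive the low-pass outputs of lanes 0 and 1 from the previous call.
   Returns the low-pass output of all 4 lanes. */
__m128 svf4_tick(Svf4 *s, float x0, float x1)
{
    assert(s != 0);
    /* in = x0, x1, previous low[0], previous low[1] */
    __m128 in = _mm_movelh_ps(_mm_setr_ps(x0, x1, 0.0f, 0.0f), s->low);

    /* Chamberlin update, identical in every lane:
       low += f * band; high = in - low - q * band; band += f * high */
    s->low = _mm_add_ps(s->low, _mm_mul_ps(s->f, s->band));
    __m128 high = _mm_sub_ps(_mm_sub_ps(in, s->low),
                             _mm_mul_ps(s->q, s->band));
    s->band = _mm_add_ps(s->band, _mm_mul_ps(s->f, high));
    return s->low;
}
```

Lanes 0 and 1 of the result are the 12dB outputs, lanes 2 and 3 the delayed 24dB outputs. The one-sample delay is what removes the dependency between the two stages within a single step.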
-
- KVRAF
- 3388 posts since 29 May, 2001 from New York, NY
OK, here is a first snippet. It sums up all 4 parts of xmm0. You typically need this for the final mixing of N-ways processing.
As is often the case with SSE, this code needs to be interleaved with other stuff for best performance.
Code:
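// xmm0 = 1 2 3 4 (lanes listed low to high)
movhlps xmm1, xmm0      // xmm1 = 3 4 ? ? (upper two lanes keep xmm1's old contents)
addps   xmm1, xmm0      // xmm1 = 1+3 2+4 ?+3 ?+4
movaps  xmm0, xmm1      // xmm0 = copy of the partial sums
shufps  xmm0, xmm0, 1   // xmm0 lane 0 = 2+4
addss   xmm0, xmm1      // xmm0 lane 0 = (2+4)+(1+3)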
'Tick
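For anyone working with compiler intrinsics rather than inline assembly, the same horizontal sum can be sketched in C as below. This is a transcription of the snippet above, not code from the original poster; `hsum_ps` and `mix4` are hypothetical names. One small difference: `_mm_movehl_ps(v, v)` takes both operands from the same register, so no lane depends on stale register contents.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Returns v[0] + v[1] + v[2] + v[3]. */
float hsum_ps(__m128 v)
{
    __m128 hi  = _mm_movehl_ps(v, v);          /* 3 4 3 4 */
    __m128 sum = _mm_add_ps(v, hi);            /* 1+3 2+4 3+3 4+4 */
    __m128 odd = _mm_shuffle_ps(sum, sum, 1);  /* lane 0 = 2+4 */
    return _mm_cvtss_f32(_mm_add_ss(sum, odd)); /* (1+3)+(2+4) */
}

/* Hypothetical usage: mix down the current sample of 4 voices. */
float mix4(const float *voices)
{
    assert(voices != 0);
    return hsum_ps(_mm_loadu_ps(voices));
}
```

As with the assembly version, in a real inner loop the compiler (or the programmer) would want to interleave these dependent instructions with other work.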
-
- KVRist
- 63 posts since 31 Oct, 2002 from CA
It would be interesting to compare SSE and AltiVec functions, so cross-platform developers can optimize for both Mac and x86 machines.
Peter