Highly efficient streaming sliding window (for peak hold)

DSP, Plugin and Host development discussion.
RELATED
PRODUCTS

Post

Code: Efficient streaming sliding window algorithm (for peak hold)

Uses:
1. Look ahead compression & limiting
2. Sample accurate peak hold (not sure why you'd need that)

Interface:

Code: Select all

/*! Allocates instance with zeroed buffers. */
struct PeakHold* PeakHold_Create(size_t max_window_size);
/*! Destroys all memory associated with the instance */
void PeakHold_Destroy(struct PeakHold *self);
/*! Clears all buffers */
void PeakHold_Clear(struct PeakHold *self);
/*! Inserts next sample */
void PeakHold_Insert(struct PeakHold *self, float value);
/*! Returns maximum sample within the given window size */
float PeakHold_CalcMaxPeak(struct PeakHold *self, size_t window_size);
I've written a fast streaming sliding window method for efficient sample accurate peak hold. You can use peak hold for a sample accurate lookahead as it will tell you what the highest sample is for the next `x` samples is (or previous depending on how you look at it).

Oh, and you can specify any arbitrary window length and you'll get O(log N) with N being the window size you are searching for. This can only performed at the current position but you can perform as many searches as you like (for whatever reason you would want to do that for).

In terms of speed, each sample is O(log N), with N being window size, and my i5 laptop CPU took 2 seconds to process 100 seconds of data with a 1 second window @ 44.1khz. That means that computationally this is very cheap - mainly because it doesn't use any complex tree structures - it's more like a skip-list and is a modification of a sort of skip list I was designing.

It's not the cleanest code ever (will clean it up at some point), it's not well commented to explain the algorithm either but it does come with some tests to prove it works (which basically compares this algorithm with a brute force O(N^2) sliding window).

Post

It seems you have put a lot of effort into this code. I enjoyed reading through it and I find the binary tree structure interesting.
But in regards to speed, doesn't comparing the last greatest value to the current buffer index execute faster, even if the buffer is quite large?

Post

No, because this is also a circular buffer so the last highest value quickly gets outdated.

I might add though that this can easily be modified for use as a running sum where you can immediately change the window size without any additional costs.

Post

I would love to see how it works graphically, having a hard time visualizing how the functions work.

Post

I think it is a circular buffer splited in two, than each segment splitted in two till the sample. Each segment has the peak value information. So the update is about the sample and its parents. It is like a binary tree, balanced, but sorted not on values but on position

The splitting of space gives log n. But they are audio samples with a correlation on local position (not white noise) so the update is less frequent on all leaves. You stop updating for the first correct leaf

Post

There is an O(1) algorithm here http://articles.leetcode.com/sliding-window-maximum/ based on a double ended queue. Would be interesting to see a benchmark comparison.

Post

Why can't we just use average amplitude or rms average instead of the maximum? The average value of a sliding window is pretty simple to compute using integers. I have made a compressor algorithm using this method and the result seemed to be fine (not that I'm an expert of such algorithms).
~stratum~

Post

Nevermind

Post

Nevermind
OK, I've found the answer. Sorry for disrupting the thread.
~stratum~

Post

Ahahah no I was just trying to delete my post but I don't know how to do from the phone lol
Maybe there is some delete button but I'm a bit blind, I cannot see it

Post

martinvicanek wrote:There is an O(1) algorithm here http://articles.leetcode.com/sliding-window-maximum/ based on a double ended queue. Would be interesting to see a benchmark comparison.
It's not O(1) per sample but O(log n). The author had to later correct it.
Zaphod (giancarlo) wrote:I think it is a circular buffer splited in two, than each segment splitted in two till the sample. Each segment has the peak value information. So the update is about the sample and its parents. It is like a binary tree, balanced, but sorted not on values but on position

The splitting of space gives log n. But they are audio samples with a correlation on local position (not white noise) so the update is less frequent on all leaves. You stop updating for the first correct leaf
Yes, circular buffers and you've got the structure right too. Updates average at O(1) because you only upgrade the next level of the tree every other sample, so for N samples you update at most 2N nodes. It's the retrieval that is O(N log M) with M being the chosen window size, so you pay only for what you use!

The actual structure is much more like a binary cone and this algorithm ignores the root, simply because it will only be relevant for one sample in the cycle.
stratum wrote:Why can't we just use average amplitude or rms average instead of the maximum? The average value of a sliding window is pretty simple to compute using integers. I have made a compressor algorithm using this method and the result seemed to be fine (not that I'm an expert of such algorithms).
It's just that for some needs you will need the max window, particularly for a sample accurate peak hold.

Post

It's just that for some needs you will need the max window, particularly for a sample accurate peak hold.
Well, I know. I should have googled 'peak hold' before asking that question. It just shows my level of knowledge about music production. From http://www.recordingmag.com/glossary/P/122.html
Peak Hold
A meter setting that will display the highest signal on a peak meter, usually by leaving an LED lit up at the peak level. The length of time the peak remains indicated may be adjusted to the users preference.
Obviously that's not possible with an algorithm that just finds the average value in a sliding window.
~stratum~

Post

AUTO-ADMIN: Non-MP3, WAV, OGG, SoundCloud, YouTube, Vimeo, Twitter and Facebook links in this post have been protected automatically. Once the member reaches 5 posts the links will function as normal.
Hi everybody,

This is my first post on KVR, so let me introduce myself:
My name is Bart Brouns and I've been doing some form of dsp for almost
20 years.
Started with a Nord Micro Modular, then learned pure-data, and for the
last 3 years moved to faust. http://faust.grame.fr/ (http://faust.grame.fr/)

I became a member because I wanted to ask about an algo I made, and then found this thread, talking about a very similar algo.
I *think* it's very similar, but I'm not sure, since I can't read C much.

My algorithm is a bit more general: it applies a binary operator to the last N
samples, where N is variable at runtime.

It's very cheap: it uses just two delay lines and two operators for each
doubling of the number of samples you want to apply the operator to.
So for calculating the sum of the last 65536 samples, you need just 32
delay lines and 32 plus operators. (I think that means its O(log n), right?)
You can still continuously vary the number of samples to compute at
runtime.

It is written in faust, but I hope the comments make it clear what I'm
doing:

https://github.com/magnetophon/faustCom ... Reduce.lib (https://github.com/magnetophon/faustCompressors/blob/master/slidingReduce.lib)

If it helps, here is the cpp version as produced by the faust compiler,
with + as the operator and maxN = 8:
https://github.com/magnetophon/faustCom ... ingsum.cpp (https://github.com/magnetophon/faustCompressors/blob/master/slidingsum.cpp)

The problem is: I'm not sure how to compare the efficiecy of the OP's algo with mine:
My program runs in realtime. When I set the maximum window size to 95 seconds (needs to be a power of 2 samples), it uses just 2% CPU.

To contrast, the OP's program outputs: Seconds (100), Window (44100): seconds (19s).


My questions:

- Does my explanation of the algo make sense?
- Is it the same algo as the OP?
- How does it compare in efficiency?

Many thanks,
Bart.

Post

avasopht wrote:
martinvicanek wrote:There is an O(1) algorithm here http://articles.leetcode.com/sliding-window-maximum/ based on a double ended queue. Would be interesting to see a benchmark comparison.
It's not O(1) per sample but O(log n). The author had to later correct it.
It is O(1) per sample in the long run, although there might be spikes along the way.
https://lists.columbia.edu/pipermail/mu ... 00920.html

Post

martinvicanek wrote: It is O(1) per sample in the long run, although there might be spikes along the way.
https://lists.columbia.edu/pipermail/mu ... 00920.html
My bad. I mistook that page for one I had read in 2014.

So yes that should easily outperform what I wrote and can be converted into a streaming algorithm with minimal changes.

Excellent use of dynamic programming!
magnetophon wrote:Hi everybody,

This is my first post on KVR, so let me introduce myself:
My name is Bart Brouns and I've been doing some form of dsp for almost
20 years.
Started with a Nord Micro Modular, then learned pure-data, and for the
last 3 years moved to faust. <span class="skimlinks-unlinked">http://faust.grame.fr</span>/

I became a member because I wanted to ask about an algo I made, and then found this thread, talking about a very similar algo.
I *think* it's very similar, but I'm not sure, since I can't read C much.

My algorithm is a bit more general: it applies a binary operator to the last N
samples, where N is variable at runtime.

It's very cheap: it uses just two delay lines and two operators for each
doubling of the number of samples you want to apply the operator to.
So for calculating the sum of the last 65536 samples, you need just 32
delay lines and 32 plus operators. (I think that means its O(log n), right?)
You can still continuously vary the number of samples to compute at
runtime.

It is written in faust, but I hope the comments make it clear what I'm
doing:

<span class="skimlinks-unlinked">https://github.com/magnetophon/faustCom ... .lib</span>

If it helps, here is the cpp version as produced by the faust compiler,
with + as the operator and maxN = 8:
<span class="skimlinks-unlinked">https://github.com/magnetophon/faustCom ... .cpp</span>

The problem is: I'm not sure how to compare the efficiecy of the OP's algo with mine:
My program runs in realtime. When I set the maximum window size to 95 seconds (needs to be a power of 2 samples), it uses just 2% CPU.

To contrast, the OP's program outputs: Seconds (100), Window (44100): seconds (19s).


My questions:

- Does my explanation of the algo make sense?
- Is it the same algo as the OP?
- How does it compare in efficiency?

Many thanks,
Bart.
Hmm. I will have to examine the code properly to say if it's similar. Will let you know when I've had a proper look.

Post Reply

Return to “DSP and Plugin Development”