FFT overlap for feature extraction

DSP, Plugin and Host development discussion.

Post

FFT question:
Do I need overlap if I’m only grabbing some features like mel or Bark bands? I don’t even necessarily need every frame of an STFT; even one every second or so works. As I understand FFT overlapping, the overlapping frames are summed together after the final (synthesis) window, following the inverse FFT. However, for my needs I don’t even do an inverse FFT; I only extract mel and similar features in the frequency domain. So do I need overlap for my use case? It doesn’t seem like the feature values I extract even take the overlap into account.

Unless I’m supposed to combine overlapping FFT values for the feature extraction itself? Actually, this makes me wonder whether I should be doing that when extracting features, because some information is lost to the windowing that prevents spectral leakage. If that’s the case, do I just sum the overlapping frames together after the FFT and before feature extraction?

I ask because I don’t do that for spectral audio processing (again, I only sum after the final window, following the IFFT). Thanks!

Post

The DFT treats each frame as one period of a periodic signal, so you'd usually apply a window function to suppress the discontinuity at the "wrap-around" point (= the frame boundary). When you're reconstructing with an IFFT, you then overlap frames so that the windows from adjacent frames add up to unity, which avoids gain modulation. This doesn't fully solve the other issues that periodicity can cause, but at least it suppresses the most obvious ones. Overlap in fast convolution is a bit different, because there we arrange the frame artifacts to cancel out exactly, but again the overlap is there to deal with something specific.
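To make "add up to unity" concrete, here is a minimal NumPy sketch, assuming a periodic Hann window, a 1024-sample frame and 50% overlap (all arbitrary choices for illustration; `periodic_hann` and `overlap_sum` are hypothetical helper names). It overlap-adds shifted copies of the window and checks that the sum is flat wherever two frames overlap.

```python
import numpy as np

def periodic_hann(n: int) -> np.ndarray:
    """Periodic ("DFT-even") Hann window of length n."""
    k = np.arange(n)
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * k / n)

def overlap_sum(window: np.ndarray, hop: int, num_frames: int) -> np.ndarray:
    """Overlap-add num_frames copies of `window`, each shifted by `hop` samples."""
    n = len(window)
    out = np.zeros(hop * (num_frames - 1) + n)
    for i in range(num_frames):
        out[i * hop : i * hop + n] += window
    return out

N = 1024
w = periodic_hann(N)
total = overlap_sum(w, hop=N // 2, num_frames=8)

# Ignore the fade-in/fade-out at the ends, where fewer frames overlap.
steady = total[N:-N]
print(np.allclose(steady, 1.0))  # True: no gain modulation in the steady state
```

Note that `np.hanning` returns the symmetric variant, whose overlap sum ripples slightly at this hop; the periodic form is the one that sums exactly to a constant.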

If you're doing pure analysis, you generally still want a window, but if we're not going to do an IFFT, then the time offset from one frame to the next is effectively a free parameter. If you skip data, there's a chance you miss something important; if you overlap the data, you get results that vary more smoothly from frame to frame, and there's less chance that something important never lines up with the part of the frame where the window function is large. But there's nothing special about overlap here: we're just looking at the signal at more points in time, so the question is basically "what is good enough for you?"
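As a rough illustration of the hop being a free parameter in analysis-only code, here is a sketch assuming a mono float signal, a Hann window and a 1024-sample frame (`frame_magnitudes` is a hypothetical helper, not a library function). The same function handles 75% overlap, no overlap, or one frame per second with gaps; only the number of frames changes.

```python
import numpy as np

def frame_magnitudes(x, frame_len, hop, window):
    """Magnitude spectra of windowed frames starting every `hop` samples.
    Nothing is resynthesized, so `hop` does not have to satisfy any
    overlap-add condition: it only sets how often we look at the signal."""
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.array([np.abs(np.fft.rfft(window * x[s:s + frame_len])) for s in starts])

fs = 16000
x = np.sin(2 * np.pi * 440 * np.arange(2 * fs) / fs)  # 2 s test tone
N = 1024
w = np.hanning(N)

# hop = N/4 (75% overlap), N/2 (50%), N (no overlap), fs (one frame per second, gaps between)
for hop in (N // 4, N // 2, N, fs):
    S = frame_magnitudes(x, N, hop, w)
    print(hop, S.shape)  # (number of frames, N // 2 + 1 bins)
```

Each row of `S` is self-contained; with `hop = fs` most of the signal is simply never looked at, which is the "you might miss something" trade-off.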

Post

Thanks! So I think I will want to use overlap in my case. My question now is when/how to correctly sum the overlaps for the feature analysis. Usually when I do spectral processing it is on real-time audio, and I sum the overlaps after the window that follows the inverse FFT. Since I am not doing an IFFT and am just getting features, would I sum the overlaps immediately after the FFT, before the feature analysis? Thanks!!

Post

Think of the result of every FFT frame as an estimate of the signal's spectrum around the frame midpoint, i.e. a single point in time. If you overlap frames, you just get the time series on a denser grid. This won't necessarily give you "true" additional resolution, in the sense that it's the FFT frame size that ultimately controls how much stuff is smeared over time, but it will still give you more data points over time. If you want to analyze each of these time points, just analyze each of the points. If you want to average over time, then average over time... but what is "correct" depends on what exactly you are trying to do.
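One way to see the "denser grid" point is to compute the time stamp each frame stands for. This sketch assumes a 16 kHz signal and a 1024-sample frame (arbitrary values; `frame_center_times` is a hypothetical helper). Overlap shrinks the spacing between time points, but every point still summarizes one full frame length of audio.

```python
import numpy as np

def frame_center_times(num_samples, frame_len, hop, fs):
    """Time in seconds that each analysis frame represents: its midpoint."""
    starts = np.arange(0, num_samples - frame_len + 1, hop)
    return (starts + frame_len / 2) / fs

fs = 16000
N = 1024
coarse = frame_center_times(fs, N, hop=N, fs=fs)       # no overlap
dense = frame_center_times(fs, N, hop=N // 4, fs=fs)   # 75% overlap

print(len(coarse), np.diff(coarse)[0])  # spacing = hop / fs
print(len(dense), np.diff(dense)[0])    # 4x as many points, 1/4 the spacing
print(N / fs)  # span each point summarizes, the same in both cases
```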

Post

Thanks for the answer. I hear what you are saying, but maybe I need a little more clarity; I might be misunderstanding something. My understanding is that overlaps aren’t averaged but staggered by the overlap percentage (overlap-add), and that this serves more than one purpose, one of which is to recover frequency information that is lost to the windowing. So if I want an accurate representation of the frequency content, shouldn’t I still overlap? I get what you are saying about it depending on exactly what I need; I’m just trying to check whether my understanding of overlap is roughly accurate!

Post

That's the thing, overlaps serve a purpose and this purpose depends on application.

For overlap-add in fast convolution, the purpose is to let any filter ringing extend into the next block and cancel out discontinuities at frame boundaries, but the input blocks in this case are not actually overlapped; only the outputs are.
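A minimal sketch of FFT overlap-add convolution, assuming a single FIR filter `h` and an arbitrary block length (`overlap_add_convolve` is a hypothetical name). It shows the point above: input blocks are cut back to back with no overlap, and only the output segments, which are longer by the filter's tail, overlap and get summed.

```python
import numpy as np

def overlap_add_convolve(x, h, block_len):
    """Linear convolution of x with FIR h via FFT overlap-add.
    Input blocks are consecutive and NON-overlapping; each block's output is
    len(h) - 1 samples longer (the filter ringing), and those tails overlap."""
    m = len(h)
    nfft = 1
    while nfft < block_len + m - 1:  # large enough that circular wrap-around can't occur
        nfft *= 2
    H = np.fft.rfft(h, nfft)
    y = np.zeros(len(x) + m - 1)
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]
        seg = np.fft.irfft(np.fft.rfft(block, nfft) * H, nfft)
        n_out = len(block) + m - 1
        y[start:start + n_out] += seg[:n_out]  # tails add into the next block's output
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h = rng.standard_normal(129)
print(np.allclose(overlap_add_convolve(x, h, 256), np.convolve(x, h)))  # True
```

No analysis window is involved here: the zero padding to `nfft` is what makes the artifacts cancel exactly.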

For spectral processing where you perform some non-linear operations on a frame (and for stuff like DCT-based lossy compression, etc.), overlapping serves the purpose of crossfading between blocks, and you generally choose windows so that they sum to unity after overlap, so as to cancel out any gain modulation. Here the input blocks typically would be overlapped.
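For contrast with the convolution case, here is a sketch of the spectral-processing setup, assuming a square-root periodic Hann window for both analysis and synthesis at 50% overlap (one common choice among several; `stft_roundtrip` and its `process` hook are hypothetical). Here the input frames do overlap, and with identity processing the interior of the signal comes back unchanged because the window product sums to unity.

```python
import numpy as np

def stft_roundtrip(x, n, process=lambda X: X):
    """Analysis window -> FFT -> process -> IFFT -> synthesis window -> overlap-add.
    sqrt(periodic Hann) on both sides: their product is a periodic Hann, which
    sums to 1 at hop = n/2, so identity processing reconstructs the interior."""
    hop = n // 2
    w = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n))
    y = np.zeros(len(x))
    for s in range(0, len(x) - n + 1, hop):  # overlapping input frames
        X = np.fft.rfft(w * x[s:s + n])
        y[s:s + n] += w * np.fft.irfft(process(X), n)
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(8192)
n = 512
y = stft_roundtrip(x, n)
# Trim the edges, where only one frame covers the signal.
print(np.allclose(y[n:-n], x[n:-n]))  # True
```

The `process` hook is where a non-linear per-frame operation would go; the overlap-add after the synthesis window is what crossfades its effect between frames.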

For analysis, if we're keeping the data in the spectral domain, there is no time within any given frame. This is fundamental: the FFT takes data from the time domain to the frequency domain, so essentially you have one or the other at any given time. The only notion of time in the spectral domain then comes from how the results differ from one frame to the next. Since there is no time within frames, there is no meaningful way to overlap the frames (post-FFT) either. You just have more data points. Extract whatever features you want from each of those frames, and then you have a denser grid of features. If you don't need the dense grid but would still like to overlap to avoid dropping any data, you can average (i.e. essentially resample to a lower rate) the final results.
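A sketch of that workflow for the mel case in this thread, assuming an HTK-style mel scale, 40 triangular filters, a 1024-sample frame with 256-sample hop, and averaging in the log domain (all illustrative choices; `mel_filterbank`, `log_mel_frames` and `average_down` are hypothetical helpers, not a library API). Each frame gets its own feature vector; the only "combining" happens afterwards, on the features.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, fs):
    """Triangular filters evenly spaced on the HTK mel scale (one common variant)."""
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_mels + 2))
    bin_hz = np.fft.rfftfreq(n_fft, 1.0 / fs)
    fb = np.zeros((n_mels, len(bin_hz)))
    for i in range(n_mels):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_frames(x, fs, n_fft=1024, hop=256, n_mels=40):
    """One log-mel vector per frame; frames are never summed with each other."""
    w = np.hanning(n_fft)
    fb = mel_filterbank(n_mels, n_fft, fs)
    starts = range(0, len(x) - n_fft + 1, hop)
    power = np.array([np.abs(np.fft.rfft(w * x[s:s + n_fft])) ** 2 for s in starts])
    return np.log(power @ fb.T + 1e-10)  # small floor avoids log(0)

def average_down(features, factor):
    """Average groups of `factor` consecutive frames into one lower-rate row."""
    usable = (len(features) // factor) * factor
    return features[:usable].reshape(-1, factor, features.shape[1]).mean(axis=1)

fs = 16000
x = np.random.default_rng(2).standard_normal(3 * fs)
feats = log_mel_frames(x, fs)                 # dense grid: one row every 256 samples (16 ms)
per_second = average_down(feats, fs // 256)   # roughly one row per second
print(feats.shape, per_second.shape)
```

Whether to average log-mel values or the underlying power values before taking the log is a design choice; they give different results for bursty signals.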

The point I'm trying to make here is that overlapping is not fundamentally part of FFT, rather it's something done by various algorithms built on FFT.

Post

For measuring the audio spectrum (e.g. for a spectrum analyzer), you want the windows close enough together that the whole signal power is captured (so 50% to 75% overlap, depending on window shape) with no gaps in between, but measuring more often than that doesn't make it more accurate.
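One way to quantify "the whole signal power is captured" is to look at the sum of squared windows over all frames covering each sample, since that is how much each sample's power contributes to an averaged spectrum. This sketch assumes a periodic Hann window and a 1024-sample frame (`squared_window_coverage` is a hypothetical helper), and it is only one possible yardstick for choosing overlap.

```python
import numpy as np

def squared_window_coverage(window, hop):
    """Sum of w^2 over all frames covering each sample (steady state only).
    Flat means every sample's power is weighted equally in the average;
    dips mean some samples are under-counted."""
    n = len(window)
    cover = np.zeros(4 * n)
    for s in range(0, len(cover) - n + 1, hop):
        cover[s:s + n] += window ** 2
    return cover[n:-n]  # skip the partially covered ends

n = 1024
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)  # periodic Hann
for hop in (n // 2, n // 4):
    c = squared_window_coverage(w, hop)
    print(f"overlap {1 - hop / n:.0%}: min {c.min():.3f}, max {c.max():.3f}")
```

For Hann this prints min 0.500, max 1.000 at 50% overlap and a flat 1.500 at 75% overlap: at 50% some samples count half as much as others, at 75% all count equally, and adding even more overlap would not change that flatness.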

When measuring something derived from the spectrum, like pitch or phase, each FFT result is already an average over the window contents, so more overlap may or may not help. Instead of giving you a sharper signal, it just gives you more points along an already "blurry" signal.
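A small illustration of the "more points along an already blurry signal" idea, using a single click instead of pitch or phase to keep it short (the click, the 512-sample frame and `frame_energy_track` are illustrative assumptions). The click shows up in every frame whose window covers it, so the track spreads over just under one frame length whatever the hop; a smaller hop only samples that spread more finely.

```python
import numpy as np

def frame_energy_track(x, n, hop):
    """Spectral energy of each Hann-windowed frame, with each frame's center sample."""
    w = np.hanning(n)
    starts = np.arange(0, len(x) - n + 1, hop)
    energy = np.array([np.sum(np.abs(np.fft.rfft(w * x[s:s + n])) ** 2) for s in starts])
    return starts + n // 2, energy

n = 512
x = np.zeros(8192)
x[4000] = 1.0  # a single click

for hop in (n // 2, n // 16):
    centers, e = frame_energy_track(x, n, hop)
    active = centers[e > 1e-12]  # frames that "see" the click
    print(hop, len(active), active.max() - active.min())  # spread stays below n
```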

Post

Thanks all!
