Any science to explain the “weight” or “3D depth” of hardware audio vs software that some people claim?

DSP, Plugin and Host development discussion.

Post

I read and understood exactly what you meant. It's rubbish. Absolute, complete and utter rubbish.
NOVAkILL : Asus RoG Flow Z13, Core i9, 16GB RAM, Win11 | EVO 16 | Studio One | bx_oberhausen, GR-8, JP6K, Union, Hexeract, Olga, TRK-01, SEM, BA-1, Thorn, Prestige, Spire, Legend-HZ, ANA-2, VG Iron 2 | Uno Pro, Rocket.

Post

somebody needs a hug :hug:

Post

an diction lesson

Post

and diction lessons?

Post

Sometimes it's like wine people. If you pour cheapo wine into an expensive wine bottle without them knowing, they will be convinced it tastes more refined. I've seen it happen, but they had had a few by then! :D

Post

A very exciting topic. I would like to quote Christoph Kemper from Access, who was once quoted here in another thread, and give a different view on this topic:
Clearly yes, in theory. I can digitally record any analog synthesizer - whether with a recorder or a workstation - and no one will doubt that it sounds the same afterwards. Except maybe for the people who say it should have been recorded with an analog tape machine (laughs). At that point I have the numbers sitting there, as samples. That means you can represent "analog" in numbers. After that you just need the right program to generate those numbers, so that they come out exactly as if I had actually recorded the analog gear. So you can't say "it doesn't work"; you can only say "nobody has done it yet", but "it doesn't work" is bullshit.
Under this premise, every analog signal can be represented as digital numbers (it is anyway, because everyone listens digitally in the end; even all these blind tests are based on a digital / interpolated version of the signal!). So theoretically it should be possible to generate these numbers, and in that case there should be absolutely no difference from the hardware, provided the hardware was recorded or listened to via the same converters (same interpolation).

In other words, the difference you are asking about should be fairly easy to show with a null test.
I am even of the opinion that every developer who advertises that their product sounds like the original should provide a simple A/B comparison against the corresponding hardware for each individual plugin or instrument. Simple as that.

The problem with most null tests is simply that they are done wrong. You can't just compare a static waveform (or noise/sine sweeps); you should compare a complete, dynamic musical phrase to see how the envelopes, filters and overall sound react and change over time.
A dynamic musical phrase played on both (or played through the FX hardware and its emulation), compared 1:1, is all it takes. Then you can see exactly what the difference is, what the magic is, what's 3D, what's organic and so on. If the numbers are the same (see Kemper), it will sound absolutely identical. If the numbers are not the same, there is your magic.

And when you do, you will find that quite often there is still an audible difference. Phrases like "different calibration" or "hardware tolerances" don't help here either. There are simply too many variables that still aren't reproduced in the digital world, or are simply dismissed as unimportant but in the end still make a difference to the sound.

How identical the numbers have to be is up to each individual to decide. But as long as the numbers are not identical, there is a difference worth mentioning.
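
To make that concrete, here is a minimal sketch of such a null test in Python. The file names are placeholders, and it assumes the two renders are already time-aligned and share the same sample rate and channel layout:

```python
# Minimal null-test sketch. The file names are placeholders, and the two
# renders are assumed to be time-aligned with the same sample rate and
# channel layout.
import numpy as np
import soundfile as sf

hw, sr_hw = sf.read("hardware_phrase.wav")   # phrase recorded from the hardware
sw, sr_sw = sf.read("plugin_phrase.wav")     # same phrase rendered by the plugin
assert sr_hw == sr_sw, "sample rates must match for a meaningful null test"

n = min(len(hw), len(sw))
hw, sw = hw[:n], sw[:n]

# Match overall level first, so a trivial gain offset does not pass for "magic".
gain = np.sqrt(np.sum(hw ** 2) / np.sum(sw ** 2))
residual = hw - gain * sw

rel_db = 20 * np.log10(np.sqrt(np.mean(residual ** 2)) /
                       np.sqrt(np.mean(hw ** 2)) + 1e-12)
print(f"residual relative to the hardware take: {rel_db:.1f} dB")
sf.write("residual.wav", residual, sr_hw)    # listen to whatever is left over
```

A residual down near the noise floor supports the "the numbers are the same" case; anything clearly audible in residual.wav is exactly the difference people argue about.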

Post

soundmodel wrote: Sun Jan 12, 2020 3:03 pm ^Good points yes.

However, since many people refer to weight and "3D sound", I think this implies that the phenomenon is not entirely subjective, and that there are some unifying "properties".

But I'm more interested in what "weight" or "3D" even mean in an acoustic sense.

Is weight merely an "EQ bump"? Or is there more to it, some phase relationships for example? Is it a time-varying process rather than a stationary one?

Or "3D"? Is it EQ, dynamics changes, spatialization? All of them? Are there some particular "3D frequencies" or frequency areas?
Terms such as "weight" are essentially useless at describing sound. Trying to differentiate between "hardware" and "software" is not useful either. If you have a digital piece of hardware running some particular algorithm and then a piece of software running the same algorithm, you will get the same results.

As far as 3D goes, spatialisation is actually understood to a certain extent. Beyond the ILD/ITD cues, localisation in 3D appears to be a combination of overall EQ balance and the notches created by torso reflections and pinna echoes. Fairly broad EQ changes can actually give surprisingly decent localisation (together with ILD/ITD), if the sound is familiar or lasts long enough that the listener can estimate the original balance (and hence the location) by turning their head (or by having a moving sound source in a binaural recording, etc). While proper pinna echoes (matching the listener's ears) will always help, they seem to be particularly important for short unfamiliar sounds, where the location cannot be inferred from other cues easily.
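
As a toy illustration of just the ILD/ITD part (not the pinna/torso notches), here is a rough sketch that delays and attenuates the far ear for a source at a given azimuth; the head radius and the broadband ILD curve are crude placeholder assumptions, not measured values:

```python
# Toy ILD/ITD illustration: position a mono source by delaying and
# attenuating the far ear. Uses the spherical-head (Woodworth) ITD estimate;
# the broadband ILD curve is a crude placeholder, and real HRTFs add the
# pinna/torso notches discussed above.
import numpy as np

def crude_binaural(mono, sr, azimuth_deg, head_radius=0.0875, c=343.0):
    az = abs(np.radians(azimuth_deg))
    itd = head_radius / c * (az + np.sin(az))   # Woodworth ITD, in seconds
    delay = int(round(itd * sr))                # far-ear delay, in samples
    ild_db = 6.0 * np.sin(az)                   # placeholder broadband ILD
    near = np.concatenate([mono, np.zeros(delay)])
    far = np.concatenate([np.zeros(delay), mono]) * 10 ** (-ild_db / 20)
    left, right = (far, near) if azimuth_deg > 0 else (near, far)
    return np.stack([left, right], axis=1)      # (samples, 2) stereo buffer
```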

Now, in a musical context, you can certainly "move" sounds to a certain extent by changing the EQ balance. Perhaps unsurprisingly, the traditional mixing advice for this stuff matches the overall responses measured from HRTFs reasonably well. However, there is no obvious reason why this would be any different between "hardware" and "software."

There are some things that could theoretically modify the spatial perception, but one should be careful before attributing these to "hardware" or "software" as such. For example, typically dry synthesiser sounds and the like will sound more "3D" if you add some echoes (eg. with reverb). If some instrument naturally provides low-level echoes for any reason (or something resembling low-level echoes; truncated and minimax FIRs can produce low-level "echoes", for example, and there is some scientific literature suggesting that these might be more perceptible than the actual response ripple), then that could in theory change the spatial perception.

With analog hardware, you additionally (and necessarily) have some amount of instability, which could again change the spatial perception, for example by drawing more attention to the sound. If you had two sound sources, both of them having very slight frequency variations uncorrelated with each other, it seems plausible that this might help the brain differentiate between the sources, which one might perceive as more "depth" or whatever. This stuff can be measured, though (eg. the easiest way to visualise such things is to put the audio through a comb filter and see how the notches move).
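
For what it's worth, a quick sketch of that comb-filter visualisation (placeholder input file, arbitrary 1 ms delay):

```python
# Run the audio through a short feedforward comb and look at the
# spectrogram: if the source drifts in frequency, the comb notches visibly
# wander over time. The input file name and the ~1 ms delay are arbitrary.
import numpy as np
import soundfile as sf
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

x, sr = sf.read("synth_note.wav")
if x.ndim > 1:
    x = x.mean(axis=1)              # mono is enough for this

delay = int(0.001 * sr)             # ~1 ms -> notch spacing of ~1 kHz
combed = x.copy()
combed[delay:] += x[:-delay]        # y[n] = x[n] + x[n - delay]

f, t, S = spectrogram(combed, fs=sr, nperseg=4096, noverlap=3072)
plt.pcolormesh(t, f, 10 * np.log10(S + 1e-12), shading="auto")
plt.xlabel("time (s)"); plt.ylabel("frequency (Hz)")
plt.title("comb notches over time")
plt.show()
```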

Also, with regard to non-linearities in a signal path, adding noise to the signal does affect the IMD products (for better or worse), which again adds random variation to the sound (beyond just the noise itself) that could potentially be important for the overall perception. In fact, I believe that it is impossible to really emulate any piece of analog hardware properly without also emulating the noise, even though the noise itself is usually considered undesirable.
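
A quick numeric way to see that the noise is not just additive: run a two-tone signal through a static nonlinearity with the noise added either after or before it. The tanh and all the levels below are arbitrary stand-ins rather than a model of any specific hardware:

```python
# Two tones through a static nonlinearity, with noise added either after or
# before it. Noise that goes *through* the nonlinearity intermodulates with
# the tones, so the two results are not identical. tanh and the levels are
# arbitrary stand-ins, not a model of any particular piece of hardware.
import numpy as np

sr, n = 48000, 1 << 16
t = np.arange(n) / sr
tones = 0.4 * np.sin(2 * np.pi * 997 * t) + 0.4 * np.sin(2 * np.pi * 1511 * t)
noise = 1e-3 * np.random.randn(n)

def band_db(x, lo=400.0, hi=600.0):
    # mean magnitude (dB re. spectrum peak) around the 2*997 - 1511 = 483 Hz product
    mag = np.abs(np.fft.rfft(x * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, 1.0 / sr)
    sel = (freqs > lo) & (freqs < hi)
    return 20 * np.log10(mag[sel].mean() / mag.max() + 1e-12)

print("noise added after the nonlinearity :", round(band_db(np.tanh(tones) + noise), 1), "dB")
print("noise added before the nonlinearity:", round(band_db(np.tanh(tones + noise)), 1), "dB")
```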

Either way, the point I'm trying to make here is that going by vague "feels good" descriptions is not terribly helpful. There might or might not be some substance to some of those claims, but they certainly won't generalise to a broad "hardware" vs. "software" distinction. That said, personally I think that trying to actually objectively measure and quantify those distinctions is a whole lot more useful than trying to ABX-test subjective perceptions. Sometimes we can measure things that nobody can reliably hear, but at the same time our analysis tools have their own limitations, so it is reasonable to assume that we might also hear some things we don't currently know how to measure.

But please.. just stop with the "hardware vs. software" generalisations, because that sort of thing does more harm than help.

Post

Meffy wrote: Sun Jan 12, 2020 10:49 pm
imrae wrote: Sun Jan 12, 2020 10:37 pm "Soundstage" is a word often associated with expensive headphones. I have no idea what it means.
Audiophiles and, I suppose, other gearheads use "soundstage" to mean the imagined/perceived positions of the individual parts of a mix within the 2-D space around the listener. (With speakers above and below this can be extended into 3-D.)

If I understand right, the claim is that pricey headphones or monitoring setups of whatever kind provide a listener with a more cleanly defined position for each sound, while cheaper gear is deficient in this regard.
So from a technical standpoint it just means a consistent response between the two cans? That seems like a QC attribute rather than anything to do with the general design of the headphones.

Post

Basically, if you want to be scientific about it: first measure a potential difference, then implement a plugin where that difference can be enabled and disabled at will, and then ABX test with and without, so you actually know what you are ABX testing in the first place.
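
A bare-bones sketch of the ABX bookkeeping part; the toggled plugin and the actual playback are assumed to exist already, and only the trial randomisation and the guessing-odds check are shown:

```python
# Bare-bones ABX bookkeeping: randomise which render X is on each trial,
# collect the answers, and report how likely the score would be under pure
# guessing. Playback of A, B and X is left out and assumed to exist.
import random
from math import comb

def abx_session(trials=16):
    correct = 0
    for _ in range(trials):
        x_is_a = random.random() < 0.5          # hidden assignment of X
        # ... play A, B and X to the listener here ...
        answer_a = input("Is X the same as A? [y/n] ").strip().lower() == "y"
        correct += (answer_a == x_is_a)
    # One-sided binomial p-value: chance of scoring at least this well by guessing.
    p = sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials
    print(f"{correct}/{trials} correct, p = {p:.3f} under pure guessing")

abx_session()
```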

Post

mystran wrote: Mon Jan 13, 2020 12:18 pm
[...] That said, personally I think that trying to actually objectively measure and quantify those distinctions is a whole lot more useful than trying to ABX-test subjective perceptions. Sometimes we can measure things that nobody can reliably hear, but at the same time our analysis tools have their own limitations, so it is reasonable to assume that we might also hear some things we don't currently know how to measure. [...]
I mentioned the MPEG-7 low-level audio descriptors earlier in the thread, and this is where I think they could be interesting: in trying to quantify vague "feels good" terms. Spitballing here, but take an easily understood term like "brightness" as an example: if we play someone two sounds and ask which is brighter, we might find that "brightness" can be quantified as "the size of the difference between the fundamental frequency and the spectral centroid". What happens, then, when we ask which sound is "weightier" or which is "warmer"? Might we find correlations in the descriptors that tend to pop up when people use these subjective terms? Will we find that one listener applies their own terms consistently? How about correlations across multiple listeners using the same subjective word?
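
For example, here is a rough sketch of that "brightness" measure; the file names are placeholders, and the fundamental is estimated very crudely from the strongest low-frequency bin:

```python
# Rough sketch of "brightness as the distance between the fundamental and
# the spectral centroid". File names are placeholders; the fundamental is
# estimated very crudely from the strongest bin below f0_max.
import numpy as np
import soundfile as sf

def brightness(path, f0_max=1000.0):
    x, sr = sf.read(path)
    if x.ndim > 1:
        x = x.mean(axis=1)                      # fold to mono
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    centroid = np.sum(freqs * mag) / np.sum(mag)
    low = freqs < f0_max
    f0 = freqs[low][np.argmax(mag[low])]        # crude fundamental estimate
    return centroid - f0

for name in ("sound_a.wav", "sound_b.wav"):
    print(name, f"{brightness(name):.1f} Hz between fundamental and centroid")
```

Running that over a pile of labelled examples would start to show whether such descriptors correlate with the words people actually use, i.e. whether the terms mean anything shared at all.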

Post

3D depth is usually designed through the clever use of reverb. In principle, the brain is capable of reconstructing the spatial relationships of objects in a space by analyzing reverb characteristics. Blind people do this all the time; that is how they "see". If you have hardware that is capable of presenting the details of these reverb characteristics better, you will get a better 3D impression. And, of course, you can do all kinds of tricks to emphasize that effect.

Imho, the phenomenon of soundstage in headphones has to do with the openness of the headphones and how well what you hear integrates with the sound characteristics of the room you are in. It is something you will find in open-back headphones because they do not shut you out completely. I hear people talk about the soundstage of closed-back headphones, but to be honest I am not quite sure what that is supposed to be exactly (other than what I said before).
Follow me on Youtube for videos on spatial and immersive audio production.

Post

Reading this thread really explains why music these days in general sounds like sh*t, without any character. FLAT af.
{"panic_string":"BAD MAGIC! :shrug: (flag set in iBoot panic header), no macOS panic log available"} "Apple did not respond to a request for comment."

Post

It's all about fabrication processes. Once upon a time, the lithographic processes required to create an integrated circuit were quite large-scale, in the micrometres. That meant that, because of their relative width, every 'wire' was transferring lots of electrons at a time. For analog devices this is still basically true; the process is on roughly the same scale as it was 20 or 30 years ago, although it has shrunk a little.
As digital technology has become prevalent, however, the lithography processes required to support faster and faster CPUs have shrunk massively, down to tens of nanometres. The scale of those wires is much, much smaller, and as a result far fewer electrons pass along each wire in a modern processor.

Lithography scales: https://en.wikichip.org/wiki/50_%C2%B5m ... hy_process

It's quite obvious that the more electrons make up your audio signal, the heavier that audio will be. And that's part of why even older digital hardware always sounds better than modern softsynths(*). There's just more combined weight of electrons.

(* well, that and smart aliasing, obvs)
my other modular synth is a bugbrand

Post

ijiwaru wrote: Mon Jan 13, 2020 2:19 pm
Any science to explain the “weight” or “3D depth” of hardware audio vs software that some people claim?
Yep

https://en.wikipedia.org/wiki/Confirmation_bias
Works both ways, btw. ;)
Follow me on Youtube for videos on spatial and immersive audio production.
