Photosounder

pittsburghjoe · Post by **pittsburghjoe** » Thu Jan 24, 2019 1:40 pm

I'm told a major limitation to creating a unique sound using only a magnitude spectrogram is the missing phase information. Would it be possible to feed Photosounder a phase spectrogram along with the frequency one?

https://www.reddit.com/r/audio/comments ... uld_guess/

A_SN · Post by **A_SN** » Thu Jan 24, 2019 3:40 pm

By "phase" they mean the phase of the complex bins in the short-time Fourier transforms (STFTs) that are typically used for spectrogram analysis. However, unlike pretty much everything else, Photosounder doesn't use the STFT technique at all (too basic for me). Instead, Photosounder filters the sound into hundreds of narrow frequency bands and calculates the envelope of each, which allows for extra flexibility.
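To illustrate the general idea of splitting a sound into narrow bands and measuring each band's envelope, here is a minimal sketch. It is not Photosounder's actual algorithm, which the post does not detail. The function name `band_envelopes`, the moving-average low-pass filter, and the three test bands are all assumptions chosen for brevity.

```python
import cmath
import math

def band_envelopes(signal, sample_rate, centre_freqs, smooth_len=256):
    """Return one envelope (list of floats) per centre frequency.

    Hypothetical helper: each band is isolated by shifting it down to 0 Hz
    (multiplying by a complex exponential) and low-pass filtering with a
    moving average. The magnitude of the result is that band's envelope.
    """
    envelopes = []
    for f in centre_freqs:
        # Shift the band of interest down to 0 Hz.
        shifted = [x * cmath.exp(-2j * math.pi * f * n / sample_rate)
                   for n, x in enumerate(signal)]
        env = []
        acc = 0j
        for n, v in enumerate(shifted):
            # Running sum over the last smooth_len samples (moving average).
            acc += v
            if n >= smooth_len:
                acc -= shifted[n - smooth_len]
            # Factor 2 restores the amplitude of a real sinusoid.
            env.append(2 * abs(acc) / smooth_len)
        envelopes.append(env)
    return envelopes

sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(2000)]
bands = [220.0, 440.0, 880.0]  # a real filter bank would use hundreds
envs = band_envelopes(tone, sample_rate, bands)
final_levels = [env[-1] for env in envs]
```

For a steady 440 Hz tone, the 440 Hz band settles close to the tone's amplitude while the neighbouring bands stay near zero. Each envelope is just a magnitude over time, so there is no per-bin phase to display or draw.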

People won't shut up about phase, but the truth is that phase can't possibly make visual sense to anyone because it is so specific to the method being used: it represents the phase of one frequency component in one of the short windowed FFTs that make up one column of the image. In other words, it's information that you couldn't use or create yourself; you couldn't possibly know which phase values to put in the bins to create the sound you want. The only thing it's good for is reconstructing the original signal, which is what Photosounder does with its lossless mode, where it modulates each of the filtered frequency bands to obtain the desired transformation.

You say you want to "Create a new sound from scratch using only visible data", that's exactly what Photosounder is for, I suggest you try the demo. Load a sound into one layer, then create another layer and draw over it, turn off the original layer and see how it sounds.

Edit: Note that oratory1990's answer and much of what I said is about analysing a sound, turning it into a spectrogram, then turning that spectrogram (image) back into a sound. You want to draw a sound, in which case I can confirm that there couldn't possibly be any "phase" information, that's only for reconstructing a specific waveform. Also when he said that white noise and a sine sweep have the same frequency profile that's only right for a single Fourier transform over the whole sound but wrong for the topic at hand which is time-frequency analysis, in which case a noise looks like a full noisy area while the sine sweep looks like a thin diagonal line.
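The time-frequency point about noise versus a sine sweep can be checked with a small sketch. This uses a plain framed DFT purely for illustration (it is not how Photosounder analyses sound). The helper names, the 64-sample frame length, and the 25% threshold are assumptions.

```python
import cmath
import math
import random

def frame_magnitudes(signal, frame_len):
    """Split the signal into non-overlapping frames and return, per frame,
    the DFT magnitudes of bins 1 .. frame_len/2 - 1."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        mags = []
        for k in range(1, frame_len // 2):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

def peak_bins(frames):
    """Bin number of the strongest component in each frame."""
    return [mags.index(max(mags)) + 1 for mags in frames]

def mean_active_bins(frames, rel_threshold=0.25):
    """Average number of bins per frame above a fraction of the frame's peak."""
    counts = [sum(1 for m in mags if m > rel_threshold * max(mags))
              for mags in frames]
    return sum(counts) / len(counts)

length, frame_len = 1024, 64
f_start, f_end = 0.05, 0.45  # sweep range in cycles per sample

# Linear sine sweep: instantaneous frequency rises from f_start to f_end.
sweep = [math.sin(2 * math.pi * (f_start * n
                                 + 0.5 * (f_end - f_start) * n * n / length))
         for n in range(length)]

rng = random.Random(0)
noise = [rng.uniform(-1, 1) for _ in range(length)]

sweep_frames = frame_magnitudes(sweep, frame_len)
noise_frames = frame_magnitudes(noise, frame_len)
sweep_peaks = peak_bins(sweep_frames)
```

In the sweep, the strongest bin climbs from frame to frame (the thin diagonal line), and only a few bins per frame carry significant energy. In the noise, most bins are active in every frame (the full noisy area); compare `mean_active_bins(noise_frames)` with `mean_active_bins(sweep_frames)`.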

pittsburghjoe · Post by **pittsburghjoe** » Sun Jan 27, 2019 1:12 am

You are 1000x more knowledgeable on this topic than I am. I saw this and want you to see it: https://youtu.be/OadA2gZBLuA?t=212

You couldn't possibly know what to draw for phase, but if you provided an image of what is being used (per layer), we could trial-and-error it for the fun of it.

A_SN · Post by **A_SN** » Sun Jan 27, 2019 8:59 pm

I don't know what you expect but the only difference between precise phase and random phase is how faithful it sounds. So if you just want to draw sounds it's not really relevant.

pittsburghjoe · Post by **pittsburghjoe** » Mon Jan 28, 2019 1:10 pm

It irks me that there are limitations. You are saying there isn't a way to display or edit phase. How about giving me a knob labeled "phase" that doesn't actually do anything?

A_SN · Post by **A_SN** » Mon Jan 28, 2019 1:56 pm

Again, as I've already explained, Photosounder doesn't use STFTs therefore there's no such thing as "phase" in it. You gotta ask yourself what you actually want, I'm suspecting that you don't really know other than wanting all possible options for creating sounds, which is fine, but doesn't necessarily make sense.

woggle · Post by **woggle** » Sun Feb 03, 2019 3:05 am

If you take the FFT of a sound, randomise the phase and then take the IFFT, you get a noise that has the same power spectrum as the original sound. But it is a noise, which means that, when using the FFT algorithm, phase is what gives the sound its spectral structure over time. That is not the case if one uses another algorithm, say one in the time domain.
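The phase-randomisation experiment described above can be sketched in a few lines. This is a minimal illustration that uses a naive DFT on a 128-sample click so that it needs only the standard library. The function names and the choice of a single click as the test signal are assumptions.

```python
import cmath
import math
import random

def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

def randomise_phase(x, rng):
    """Keep every bin's magnitude, replace its phase with a random one."""
    n = len(x)
    spectrum = dft(x)
    out = list(spectrum)
    for k in range(1, (n + 1) // 2):
        out[k] = cmath.rect(abs(spectrum[k]), rng.uniform(0, 2 * math.pi))
        # Mirror bin must be the conjugate so the result stays real.
        out[n - k] = out[k].conjugate()
    # The DC bin (and the Nyquist bin for even n) are left untouched.
    return [v.real for v in idft(out)]

n = 128
click = [1.0] + [0.0] * (n - 1)  # all energy at a single instant
scrambled = randomise_phase(click, random.Random(0))

mags_before = [abs(v) for v in dft(click)]
mags_after = [abs(v) for v in dft(scrambled)]
max_mag_diff = max(abs(a - b) for a, b in zip(mags_before, mags_after))

def late_energy(x):
    """Energy in the second half of the signal."""
    return sum(v * v for v in x[len(x) // 2:])

total_energy = sum(v * v for v in scrambled)
```

The magnitude spectra match (`max_mag_diff` is at rounding-error level), yet the click's energy, originally confined to the first sample, ends up smeared across the whole duration as noise.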

As Photosounder doesn't use an FFT it makes no sense at all to talk about using phase in Photosounder. Phase is not something in the algorithm that is used. A sound does not have phase - phase is in the FFT algorithm as part of how it reproduces a signal. Not in the signal itself.
It's like wanting to use a photo of someone singing to reproduce what they are singing. Perhaps better: you have a drawing and you want the RAW data from the drawing so you can tweak the shadows.

A_SN · Post by **A_SN** » Tue Feb 05, 2019 6:22 pm

Well said, maybe I should make a video explaining how Photosounder does those things since it's actually not complicated and well suited to graphical explanations, and I've only done videos about operating the program but never about how it works. Maybe if I explain both the noise bank synthesis and the "lossless" mode I will stop getting emails about how bad resynthesis sounds or people asking about phase.

woggle · Post by **woggle** » Wed Feb 06, 2019 7:10 pm

A_SN wrote: ↑Tue Feb 05, 2019 6:22 pm Well said, maybe I should make a video explaining how Photosounder does those things since it's actually not complicated and well suited to graphical explanations, and I've only done videos about operating the program but never about how it works. Maybe if I explain both the noise bank synthesis and the "lossless" mode I will stop getting emails about how bad resynthesis sounds or people asking about phase.

good idea - knowing the process certainly helps me use something creatively

pittsburghjoe · Post by **pittsburghjoe** » Sun Feb 17, 2019 11:30 pm

https://en.wikipedia.org/wiki/Constant-Q_transform

woggle · Post by **woggle** » Mon Feb 18, 2019 8:14 am

pittsburghjoe wrote: ↑Sun Feb 17, 2019 11:30 pm https://en.wikipedia.org/wiki/Constant-Q_transform

looks like a type of multiresolution analysis - of no use for what you originally asked

pittsburghjoe · Post by **pittsburghjoe** » Mon Feb 18, 2019 2:36 pm

All I'm asking is that he rewrite his app to allow AI interaction https://magenta.tensorflow.org/nsynth

A_SN · Post by **A_SN** » Mon Feb 18, 2019 5:37 pm

pittsburghjoe · Post by **pittsburghjoe** » Tue Feb 19, 2019 2:04 am

Yes, I was kidding, but know that you would be smart to look into it before someone else does: TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer.

Rainbowgram is what I've been hunting; the colors represent phase. Here is a gawd-awful explanation/presentation of the future: https://www.youtube.com/watch?v=lwEwJURVdts

or maybe this http://www.johnglover.net/blog/learning ... audio.html

https://neural-music-synthesis.github.io/

https://ai.honu.io/presentations/sing/s ... ia.html#/5

A_SN · Post by **A_SN** » Tue Feb 19, 2019 7:26 am

Well, I don't want to go on a whole rant about how we've reached peak hype for AI/ML with little of actual practical value to show for it, mostly in artistic fields (generating some weird stuff automatically from some input doesn't rank very high for me), so I'll just tell you that I have no interest in neural voodoo synthesis. I really don't see the interest; none of what I've seen or heard about it seems compelling, but more importantly it's the approach that I find uninteresting due to the lack of control. Again, it's largely voodoo: you throw a bunch of "training" and massive amounts of computing power at something and it makes a weird poor-quality sound. Why? I'm much more interested in creating rational tools to let human intelligences control every aspect of creation or transformation. You'd think that given what I do I'd be interested in whatever can create weird sounds, but actually I'm not. My interest was always to control sound in a direct and clear way based on sound principles, not just throwing algorithms at some input to see what happens. I'm smart enough to do what you want, but more importantly I'm smart enough not to do it.

I think it's perhaps a philosophy thing, I see computers as fancy calculators that can do pixels and sound samples and I know exactly what I want to do. It seems that AI enthusiasts will be happy with any results they consider cool enough (they set the bar for cool rather low) and see computers as potentially magic and try to make them do unscientific magical things that are easy to be made to work poorly but impossible to be made to work well due to the usually chimeric nature of the desired goal (e.g. "automatically colourise a black and white image with no other input", easy to make it guess that the sky is blue and grass is green, impossible to make it generally close to accurate, enthusiasts usually dismiss such fundamental limitations by extrapolating that everything will get better as we approach the techno-rapture and their AI god kicks the abrahamic god in the teeth). I'm trying not to rant too much, but isn't it amazing how so much of computer """science""" in the late 2010s is mostly becoming "hey let's just throw TensorFlow and a bunch of GPU cores at it and see what happens" and they can only speculate as to what their algorithms actually do? And a lot of the people involved firmly believe in the AI techno-rapture they call "The Singularity" and that their work will help immanentise the AI eschaton. It's like a philosophical degenerative disease that makes potentially very capable people waste their time on pointless endeavours, kind of like string theory did for astrophysics, but more as a new religion for atheists. Ah well, that leaves more low hanging fruits for me to pick.

But you already picked your algorithms and application and you're interested in it, so do as I did. I didn't ask the creators of Coagula or Metasynth to make Photosounder for me (then again, I didn't know about them until after I had already released something); I just went ahead and did it despite hardly any relevant education beyond high school. Even back in 2005 the Internet had all that one needed. What you're asking for is beyond the scope of my work and doesn't even make use of any of my algorithms, and it doesn't fit my goals at all.

How is Phase handled?