Google WaveNet sound and speech synthesis

DSP, Plugin and Host development discussion.

Post

Hi! I'm not into DSP programming, but I stumbled across this and thought it might interest some of you:

https://deepmind.com/blog/wavenet-gener ... raw-audio/

It's a deep neural network developed to improve speech synthesis. The synthesized speech is extremely good, but even more fascinating is that the network can be trained on any input data. Scroll down and check out the piano playing examples. :hyper: Unreal, though there might be a reason why the examples are so short...

I also love the random babble examples, although practical uses for those probably still have to be found. :party:

Link to the paper:

https://drive.google.com/file/d/0B3cxcn ... JINDQ/view
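DeepMind hasn't released the code, but the paper's core trick is a stack of dilated causal convolutions over raw samples. Here's a toy numpy sketch of just that trick (the function name is mine, and this leaves out the gated activations, residual/skip connections and the 256-way softmax over mu-law-quantized samples the real model uses):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """Kernel-size-2 causal convolution: output at t sees only x[t] and x[t-d]."""
    padded = np.concatenate([np.zeros(dilation), x])  # left-pad to stay causal
    return w[0] * padded[:-dilation] + w[1] * padded[dilation:]

# Doubling the dilation each layer doubles the receptive field, which is how
# WaveNet covers a long audio history with few layers.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)                        # one second at 16 kHz
for dilation in [1, 2, 4, 8, 16, 32, 64]:
    w = rng.standard_normal(2) * 0.5
    x = np.tanh(causal_dilated_conv(x, w, dilation))  # nonlinearity per layer
print(x.shape)  # (16000,): same length, each output now depends on 128 inputs
```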

Post

:tu:

Post

Deep learning does indeed let you do that. But then you can't know why something happens. Neural networks in all their glory.

Post

Can this even produce animal sounds?

Post

I'd like to ask the $1,000,000 question: would it be of any use to train a deep neural network to make audio effects as well? Analog modeling stuff too?

Post

Definitely. It would do so, and you wouldn't even know how it did it.
The main issue, of course, is that to do so you need non-linear neurons, and those are achieved through the sigmoid function (i.e. an exponential). With several levels of depth plus a fairly wide input, that risks being expensive for real-time use.
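To put a rough number on that cost, here is a back-of-the-envelope numpy sketch (the network size is made up for illustration): a per-sample forward pass through a few sigmoid layers, and the multiply-accumulate rate that implies at 44.1 kHz.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # the exponential mentioned above

# Hypothetical network: 256 samples of input context, four hidden layers of 512.
layer_sizes = [256, 512, 512, 512, 512, 1]
rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(context):
    for w in weights[:-1]:
        context = sigmoid(context @ w)    # one exp per neuron, per output sample
    return context @ weights[-1]          # linear output sample

y = forward(rng.standard_normal(256))     # one generated sample

# Rough cost: multiply-accumulates per sample, times the sample rate.
macs = sum(m * n for m, n in zip(layer_sizes[:-1], layer_sizes[1:]))
print(f"{macs} MACs/sample -> {macs * 44100 / 1e9:.1f} GMAC/s at 44.1 kHz")
```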

Post

And I'm even waiting for the day when Google WaveNet is used to do mastering on songs.

Post

Ivan_C wrote: I'd like to ask the $1,000,000 question: would it be of any use to train a deep neural network to make audio effects as well? Analog modeling stuff too?
An interesting concept would be a plugin that learns what an individual user prefers, and hence slightly modifies its algorithms to match that user's preference. That would be incredible. ;)

Hence presets could almost be various user interest profiles.
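One very simple way such a learning preset could work, sketched here with made-up names (AdaptivePreset and session_finished are mine, for illustration only): nudge the stored parameters toward whatever the user actually dials in.

```python
# Hypothetical sketch: a "preset" that drifts toward what the user dials in.
class AdaptivePreset:
    def __init__(self, defaults, rate=0.1):
        self.params = dict(defaults)   # e.g. {"drive": 0.3, "tone": 0.5}
        self.rate = rate               # how fast it adapts to the user

    def session_finished(self, user_settings):
        # Nudge each stored parameter toward the settings the user kept.
        for name, value in user_settings.items():
            self.params[name] += self.rate * (value - self.params[name])

profile = AdaptivePreset({"drive": 0.3, "tone": 0.5})
profile.session_finished({"drive": 0.8, "tone": 0.4})
print(profile.params)   # drive drifted up: the preset is now a user profile
```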

Post

Amazing.
Yet another stream of R&D hurtling into the future at an exponential rate towards powerful ends.

So I guess our inevitable AI robot overlords won't have to sound so cold and detached, at least. :)

Post

I really need to dig into this topic one day :D
And I'm even waiting for the day when Google WaveNet is used to do mastering on songs.
An interesting concept would be a plugin that learns what an individual user prefers, and hence slightly modifies its algorithms to match that user's preference. That would be incredible. ;)

Hence presets could almost be various user interest profiles.
I just remembered that I've heard some "online automatic mastering applications" already use AI... but they're not that good yet...

Post

Ivan_C wrote: I just remembered that I've heard some "online automatic mastering applications" already use AI... but they're not that good yet...
Well, copywriters can say anything is AI, even just a simple set of static rules.

In the case of automatic mastering, this will be a neural network tweaking some knobs so that certain *measurements* are similar to a reference.

I think the challenge here is not the artificial intelligence itself (setting up and training a network isn't too hard), but equipping it with the right "senses": deciding which measurements to extract from the audio that carry some meaning.
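To make the "senses" point concrete, here is a toy numpy sketch (the choice of descriptors is mine, not a description of any real product): a few scalar measurements, and a gain match of one of them against a reference, in the spirit of knob-tweaking toward a target.

```python
import numpy as np

def measurements(x, sr=44100):
    """Toy 'senses': a few scalar descriptors a tweaking loop could match."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    rms = np.sqrt(np.mean(x ** 2))                           # overall level
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)   # brightness
    crest = np.max(np.abs(x)) / (rms + 1e-12)                # dynamics
    return np.array([rms, centroid, crest])

# Hypothetical matching step: tweak a gain knob until the RMS measurement
# matches the reference; a real system would tweak EQ/compression the same way.
rng = np.random.default_rng(0)
reference = rng.standard_normal(44100) * 0.25
target = rng.standard_normal(44100) * 0.05
gain = measurements(reference)[0] / measurements(target)[0]
print(measurements(target * gain)[0], measurements(reference)[0])  # now similar
```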

Post

The speech modeling is fascinating, obviously, but just think how this conditioning could be applied to the timbre/performance of individual instruments: using the right dataset, it could be made to model, say, classic analog leads, controlled by user preferences to not only offer modifiable sound parameters but adapt performance qualities distinct to that instrument/style (or that of another). A synthesizer that performs like an operatic soprano, a cello that plays like a maraca, or a drum kit that sounds like speech.

Post

The teams using Deep Learning for speech recognition do not yet use unprocessed audio samples, but rather a smaller set of descriptors based on an FFT.
While their goal is to remove this layer, it hasn't been done yet, for performance and trainability reasons.
Check out our VST3/VST2/AU/AAX/LV2:
Inner Pitch | Lens | Couture | Panagement | Graillon

Post

My incredibly nerdy friend and I have used RNNs to analyse audio files, to 'teach' a neural net how to construct audio data. We even had a super-fast GPU to farm the calculations out to, and it STILL took 48 hours to generate a very strange noise sample, 5 seconds long. We used one engine to analyse a file at the sample level, then we found another engine that used an FFT-aware brain, whose output was a bit more 'normal'... still very simple sine-wave constructions.
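For anyone curious what the sample-level approach looks like, here is a minimal numpy sketch in the same spirit (not our engine, and it only trains the readout layer; real setups use full backpropagation through time):

```python
import numpy as np

# Minimal next-sample-prediction RNN in numpy. Only the readout weights are
# trained here; a real setup would backpropagate through time.
rng = np.random.default_rng(0)
hidden = 64
Wxh = rng.standard_normal(hidden) * 0.1            # input sample -> hidden
Whh = rng.standard_normal((hidden, hidden)) * 0.1  # hidden -> hidden
Why = rng.standard_normal(hidden) * 0.1            # hidden -> predicted sample

n = np.arange(2000)
audio = np.sin(2 * np.pi * 440 * n / 16000)        # stand-in training signal

h = np.zeros(hidden)
lr, sq_err = 0.01, 0.0
for t in range(len(audio) - 1):
    h = np.tanh(audio[t] * Wxh + h @ Whh)          # update hidden state
    pred = h @ Why                                 # predict the next sample
    err = pred - audio[t + 1]
    Why -= lr * err * h                            # crude readout-only update
    sq_err += err ** 2
print(f"mean squared error: {sq_err / (len(audio) - 1):.4f}")
```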

I will see if he has some time this weekend to have a look at this. WOW, those samples nearer the end are brilliantly confusing, maybe we can create some in our lab. It sounds very much like trying to listen to someone talk while having a quite intense LSD rush.

I think I'm going to feed our engine some Kraftwerk.

Post

The teams using Deep Learning for speech recognition do not yet use unprocessed audio samples, but rather a smaller set of descriptors based on an FFT.
I don't know about deep-learning-based speech recognition, but more classical approaches use feature descriptors based on the mel-frequency cepstrum. That's roughly the FFT of the log of the FFT of speech, aligned to human ear sensitivity (not exactly: the frequencies are warped to the mel scale, and the second transform is a DCT). More info can be found here: https://en.wikipedia.org/wiki/Mel-frequency_cepstrum https://en.wikipedia.org/wiki/Cepstrum
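A toy numpy/scipy sketch of that pipeline, one frame at a time, with textbook parameters (frame length, filter count and so on are my assumptions, not anything from this thread):

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_frame(frame, sr=16000, n_mels=26, n_coeffs=13):
    """One frame of the classic pipeline: FFT -> mel filterbank -> log -> DCT."""
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    # Triangular filters spaced evenly on the mel scale, mel = 2595*log10(1+f/700).
    mel_pts = np.linspace(0, 2595 * np.log10(1 + sr / 2 / 700), n_mels + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    mel_energies = np.empty(n_mels)
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        tri = np.clip(np.minimum((freqs - lo) / (mid - lo),
                                 (hi - freqs) / (hi - mid)), 0, None)
        mel_energies[i] = np.sum(tri * power)
    # Log then DCT: the cepstrum's "second FFT", replaced here by a DCT.
    return dct(np.log(mel_energies + 1e-10), norm='ortho')[:n_coeffs]

frame = np.sin(2 * np.pi * 300 * np.arange(400) / 16000)  # toy 25 ms frame
print(mfcc_frame(frame).shape)   # (13,) coefficients per frame
```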
~stratum~

