KVR Audio

Mike Greene · Post by **Mike Greene** » Fri Apr 24, 2015 4:26 am

My company, Realitone, makes vocal sample libraries. We are trying to push the boundaries of what the libraries can do, so I would like to take vocal samples and split them into two components: pitch and noise. The "noise" component is not white noise or room noise, but rather is the throat and mouth "air."

These two components would then be re-combined on playback in Kontakt. The reason for wanting these components separated has mostly to do with phase aligning samples. This video (cued to 1:26) shows the end result of what I'm going for:
https://youtu.be/Ri64NFdHKA4?t=1m26s

I suspect this process might be possible with existing apps, but if so, I don't know which apps would do this. I can sorta do this with iZotope's RX, by separating out the harmonics one by one, and what's left is the noise, but that is very time consuming. I would prefer (and am willing to pay for) an app that just figures out what is pitch data, removes that, then what's left is noise data.

Any thoughts on how possible this might be? Anyone interested in doing it?

whyterabbyt · Post by **whyterabbyt** » Fri Apr 24, 2015 7:08 am

You could look at the IRCAM TRAX plugins and possibly Aspire and a couple of the other voice processors by Antares.

deastman · Post by **deastman** » Fri Apr 24, 2015 7:21 am

Sounds to me like you should give iZotope a call.

wakax · Post by **wakax** » Fri Apr 24, 2015 7:26 am

also this is a free workaround:
( free vst/ladspa plug - do a batch with it )
Sin + Noise: Decompose the sound into pitched (Sinuses) and unpitched (Noise) components, and allow the level of each part to be adjusted

http://www.pitchtech.ch/Plugins/index.html

Mike Greene · Post by **Mike Greene** » Fri Apr 24, 2015 11:25 pm

Thanks for the suggestions, guys. I have IRCAM TRAX and Antares' Aspire, but those won't give me two separate elements. iZotope's RX will, but it's not an easy/quick process.

The PitchTech Sin+Noise app looks very promising. Their audio demos don't seem at all like what they're describing, but I'll give it a try for myself.

xoxos · Post by **xoxos** » Sat Apr 25, 2015 6:47 pm

if you're paying or that serious about it, why not take the process a step further and use linear predictive coding to separate the source from the filter (perry cook's sdk), allowing continuous modulation of source and filter and establishing a superior standard for products in this category?
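A minimal sketch of what "separating the source from the filter" with linear predictive coding can look like, in plain Python. This is not Perry Cook's code; it is the textbook autocorrelation method with the Levinson-Durbin recursion, followed by inverse filtering to recover an estimate of the excitation. The function names, the order of 2, and the synthetic test signal are all assumptions made for illustration; a real vocal would need framing, windowing, and a much higher order.

```python
import random


def lpc_coefficients(frame, order):
    """Estimate predictor coefficients a[1..order] so that
    x[n] is approximated by sum(a[j] * x[n - j]).
    Uses the autocorrelation method with the Levinson-Durbin recursion."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[j] * r[m - j] for j in range(1, m))
        k = acc / err  # reflection coefficient for this stage
        new_a = a[:]
        new_a[m] = k
        for j in range(1, m):
            new_a[j] = a[j] - k * a[m - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]


def inverse_filter(frame, coeffs):
    """Remove the predicted (filter) part, leaving the excitation estimate."""
    residual = []
    for n, x in enumerate(frame):
        predicted = sum(c * frame[n - j]
                        for j, c in enumerate(coeffs, start=1) if n - j >= 0)
        residual.append(x - predicted)
    return residual


# Synthetic stand-in for a voice: noise excitation through a 2-pole resonator.
rng = random.Random(0)
excitation = [rng.gauss(0.0, 1.0) for _ in range(4000)]
signal = []
for n, e in enumerate(excitation):
    prev1 = signal[n - 1] if n >= 1 else 0.0
    prev2 = signal[n - 2] if n >= 2 else 0.0
    signal.append(1.3 * prev1 - 0.6 * prev2 + e)

coeffs = lpc_coefficients(signal, order=2)   # the "filter"
residual = inverse_filter(signal, coeffs)    # the "source"
```

The estimated coefficients should land close to the 1.3 and -0.6 used to build the signal, and the residual carries far less energy than the signal because the resonance has been removed.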

Mike Greene · Post by **Mike Greene** » Sat Apr 25, 2015 7:15 pm

xoxos wrote:if you're paying or that serious about it, why not take the process a step further and use linear predictive coding to separate the source from the filter (perry cook's sdk), allowing continuous modulation of source and filter and establishing a superior standard for products in this category?

I am indeed serious about paying for an app that streamlines the "separate tonal elements from noise" process. It sounds like you're talking about going beyond that, though?

I did a little poking around on Perry Cook's SDK site, and from what I can understand of it (I'm not a real coder myself, other than Kontakt's KSP language) this might be of use to create a synthetic vocal, or physical modeling of a vocal? Am I correct in that that's what you mean? This might be something I'd be interested in, but it's much less familiar territory for me, and I'd imagine considerably more expensive than what I'm doing now.

If it's promising, I could do it, though. Can you give me a basic overview of what you think I could do in this direction?

Smashed Transistors · Post by **Smashed Transistors** » Sat Apr 25, 2015 8:00 pm

By "pitch component", do you mean strictly harmonic component... something you plan to use in wavetables ?

Quite a long time ago i did something like this
(AES Preprint AES 104th convention 1998 #4664
T. Rochebois and G. Charbonneau
Method for Multiple Wavetable Synthesis of Musical Tones Based on Complex Valued Principal Component Analysis )

The "complex valued" component allowed us to resynthesise the "pitched" part taking account of the phases. A simple subtraction allowed us to extract the "non pitched" component...

The drawback is that many non harmonic components neither fit in the noise category nor in the pitch category.
fast transients
subharmonics / octaviation
wolf tones
partials
beating harmonics
...
In the case of voice i think that you may need to be careful with transients and octaviation... these are not harmonic but they are not random.
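A minimal sketch of the "resynthesise the pitched part with its phases, then subtract" idea, in plain Python. It is not the method from the AES preprint (there is no principal component analysis here). It assumes the pitch `f0` is already known and that the frame spans a whole number of periods, so each harmonic's amplitude and phase can be read off by projecting the frame onto a cosine and a sine. The function name and the test values are made up for illustration.

```python
import math


def split_tone_and_residual(frame, sample_rate, f0, num_harmonics):
    """Split one frame into a harmonic ("tone") part and a residual.

    Assumes f0 is known and len(frame) covers a whole number of periods,
    so the harmonic sinusoids are orthogonal over the frame.
    """
    n = len(frame)
    tone = [0.0] * n
    for k in range(1, num_harmonics + 1):
        w = 2.0 * math.pi * k * f0 / sample_rate
        # The cosine and sine projections together carry amplitude and phase.
        a = 2.0 / n * sum(x * math.cos(w * i) for i, x in enumerate(frame))
        b = 2.0 / n * sum(x * math.sin(w * i) for i, x in enumerate(frame))
        for i in range(n):
            tone[i] += a * math.cos(w * i) + b * math.sin(w * i)
    residual = [x - t for x, t in zip(frame, tone)]
    return tone, residual


# Toy frame: two harmonics of 200 Hz plus one stray 300 Hz component.
sample_rate, f0 = 8000, 200.0   # 40-sample period (made-up test values)
n = 400                         # exactly 10 periods
pitched = [math.sin(2 * math.pi * f0 * i / sample_rate + 0.3)
           + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / sample_rate + 1.1)
           for i in range(n)]
stray = [0.1 * math.sin(2 * math.pi * 300.0 * i / sample_rate) for i in range(n)]
frame = [p + s for p, s in zip(pitched, stray)]

tone, residual = split_tone_and_residual(frame, sample_rate, f0, num_harmonics=5)
```

In this toy frame the 300 Hz component is neither harmonic nor random, and it ends up in the residual along with whatever real noise is present, which is the drawback described above.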

Mike Greene · Post by **Mike Greene** » Sat Apr 25, 2015 8:31 pm

Smashed Transistors, I only understand about half the words in your paper’s title!

I get the gist, though, and your point is well taken.

To make this clearer, here is an example of what I’m looking to do. Here are two wave files, one is the tonal component, and the other is the noise component of a vocal “oh.”
http://realitone.com/uploads/misc/exampleohtone.wav
http://realitone.com/uploads/misc/exampleohnoise.wav

As you can hear, the “noise” element is basically throat noise, and is independent from the tonal element in terms of sync. (Other than at the beginning, of course, since you can hear the transient in the noise file. In my method, the transients themselves would always be in sync.)

There are a few reasons I want the noise as a separate component. The biggest one is that it gives me more flexibility over dynamics with fewer samples, since the noise is the “breathiness,” which is most noticeable in quiet samples.

For example, if I had a mezzoforte sample of “oh,” I could more effectively “fake” a quieter oh by eq’ing and lowering the volume of the tonal component, but at the same time, keep the noise component level the same, or even raise it.
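A minimal sketch of that recombination, in Python rather than Kontakt's KSP, assuming the tonal and noise components are already available as equal-length lists of samples. The function names and default gain values are hypothetical, and the EQ step on the tonal component is left out.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def fake_softer_dynamic(tone, noise, tone_db=-9.0, noise_db=0.0):
    """Mix the two components with independent gains.

    Lowering tone_db while holding (or raising) noise_db makes the
    breathiness more prominent, as in a quieter performance.
    """
    tone_gain = db_to_gain(tone_db)
    noise_gain = db_to_gain(noise_db)
    return [tone_gain * t + noise_gain * n for t, n in zip(tone, noise)]


# Example: drop the tone by 20 dB and leave the noise untouched.
mixed = fake_softer_dynamic([1.0, -1.0], [0.5, 0.5], tone_db=-20.0, noise_db=0.0)
```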

Smashed Transistors · Post by **Smashed Transistors** » Sat Apr 25, 2015 10:01 pm

I think that you need something like:
http://mtg.upf.edu/files/publications/P ... xserra.pdf

Maybe you can find some tools/old software based on this work.

After this thesis, Dr Xavier Serra joined Yamaha.
He is one of the fathers of the http://www.vocaloid.com/en/

Shabdahbriah · Post by **Shabdahbriah** » Sat Apr 25, 2015 11:52 pm

Fascinating subject.

This may be of some interest to you, Mike:

http://forumnet.ircam.fr/product/audiosculpt/

However, it is only available through a Premium Subscription, and it is Mac only.

http://forumnet.ircam.fr/product/indivi ... ium-offer/

There would of course also be the potentiality of finding someone on their forum, to code an app/add a script, to address your specific needs.

Mike Greene · Post by **Mike Greene** » Mon Apr 27, 2015 6:56 pm

Smashed Transistors wrote:I think that you need something like:
http://mtg.upf.edu/files/publications/P ... xserra.pdf

That's going to be my bedtime reading for the next few nights. (Seriously!) It may or may not apply directly to a noise/tone separation app, but there's some very useful study in there.

Shabdahbriah, thanks for the IRCAM link. That may be just the place to look for someone if nothing pans out with the suggestions so far.

earlevel · Post by **earlevel** » Wed Apr 29, 2015 8:06 am

Hi Mike,

I read through quickly (trying to get to bed), so I'm not sure if the "noise" portion should be a residual (original sample minus the sinusoidal components) or synthesized (the former will be more exact, the latter more flexible for transformations), but in either case it's pretty easy. Smashed Transistors mentioned Xavier Serra, and his thesis. I took Xavier's course (and passed—not so many people made it to the end) that covered this sort of thing, and it works very well for what you're asking—pretty impressive.

I think I'm too busy (and probably too expensive) to take on the job, but take a look at the course videos—week 7 hits what you're after (harmonic model plus residual, or stochastic). Like others have said, there's probably existing software that can do the job, though perhaps you need more flexibility than what they offer. The course uses a Python (with some C) framework, so you code your own degree of flexibility:

Audio Signal Processing for Music Applications

BTW, I thought Xavier did a terrific job with the videos and the course overall.

Mike Greene · Post by **Mike Greene** » Wed Apr 29, 2015 4:38 pm

Yes! I’ve been reading the thesis Smashed Transistors linked, which has been good, but these videos are even better. I watched a couple Week 7 videos, and you’re right - he is doing exactly the separation process (yes, I’m thinking residual noise, as opposed to synthesized noise) I am looking for.

It looks like I can get his nifty sms-tools app here:
https://github.com/MTG/sms-tools

In the video, he just does playback with this app. I’m hoping it can also output wave files. It will probably take me some time to figure out how to even install it (Python and C are both new territory for me), but I’m optimistic. Thank you!

earlevel · Post by **earlevel** » Wed Apr 29, 2015 4:47 pm

Great, I'm glad that was helpful.

For the sms-tools, it's pretty easy to get going with Ubuntu (I started out trying to use my Mac's FreeBSD, realized how much work it would be, created a Ubuntu VM instead, and was working with the tools in minutes). Yes, you can output sound files. The course required uploading wave files for some of the assignments. You can get very good results with the tools.
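For reference, writing a separated component out as a wave file does not need anything beyond Python's standard library. The sketch below is independent of sms-tools (it does not use that project's own file helpers); the function name is hypothetical, and it assumes mono audio held as floats between -1 and 1.

```python
import struct
import wave


def write_mono_wav(path, samples, sample_rate):
    """Write floats in [-1, 1] as a 16-bit mono WAV file.

    Values outside the range are clipped rather than wrapped.
    """
    pcm = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)          # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
```

Calling it once for the tonal component and once for the residual would give the two separate files to load into Kontakt.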

Pay someone to build app to split voice samples into pitch and noise components