KVR Audio

tygrodlak · Post by **tygrodlak** » Thu Apr 04, 2019 9:41 pm

Hi, I'm working on a plugin that can process the human voice in various ways. It is based on some concepts from artificial intelligence (neural networks), and I already have a working proof of concept/prototype. It is not a plugin yet and runs from the command line only, but it seems it is going to work fine.

For this thread, let's just assume it really can turn terrible singing into stellar singing. However, it needs to be adjusted to the user's voice, which means they have to go through a singing exercise of under five minutes and then have that recording processed by my software. After this setup phase, they can run the plugin in real time from their DAW.

My question to you is: would you consider using a technology with such an entry barrier? Or would it turn you off? Thank you for your opinions, Martin

DPhil · Post by **DPhil** » Thu Apr 04, 2019 9:58 pm

Hi Martin. Autotune, Revoice or Melodyne already make me sing well without any adjustment.

Is your adjustment a one-time thing, or do I have to do it every time I record something new (because my room may have changed, or I'm having a bad day, or I get older and my voice gets deeper)?

Why 5 minutes? Is there room for improvement in time?

I'd say it's the result that makes this time worth it or not.

tygrodlak · Post by **tygrodlak** » Sat Apr 06, 2019 8:07 pm

Thank you, these are all good points that I will try to take into account during development.
Autotune and Melodyne work mostly with pitch, while I focus more on timbre. There are a lot of people who don't sing off pitch but whose voice is somewhat weak or awkward … including me

DPhil · Post by **DPhil** » Sat Apr 06, 2019 8:27 pm

Hahaha okay. If it makes me sound like Beyonce, I wouldn't consider a longer one-time adjustment a deal breaker.

xoxos · Post by **xoxos** » Thu Apr 18, 2019 9:46 pm

music technology is instrumentation, i'd expect most new plugin experiences involve some learning curve, which the user hopefully finds engaging and informative.

having recently experimented with cepstra, i remember the first example i saw by kurzweil, in the 90s. this used data for another throat to do the beyonce thing. many cepstral applications involve fitting a phoneme to data.

i am surprised there aren't current plugins with "dataset" products existing to process weak performance into ideal already. i would think that, for many users, such a plug might require the user to tailor their performance some to get the right phoneme in some cases.

even if not for realtime, apps like melodyne could be replaced with an effect that turned some dribble into product, allowing automation of params, or simply using the source wav for phoneme keying.

my related note..

i baked the cepstral filter into impulses and trigger those at pitch rate for an oscillator.. for those who may have encountered julius smith's "commuted synthesis example" which has been posted for years.. this produces good spectral contouring but limits resynthesis to having the entire series of harmonics.. the definite plus is that it is latency free and super efficient since we're just rendering tables.
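
A minimal sketch of the "bake the cepstral filter into an impulse and trigger it at pitch rate" idea, assuming NumPy. The post does not say how the impulses were built, so the minimum-phase reconstruction, the function names (`cepstral_impulse`, `render_pulse_train`) and the parameters (`n_fft`, `n_lifter`) are all illustrative assumptions, not the poster's actual code.

```python
import numpy as np

def cepstral_impulse(frame, n_fft=1024, n_lifter=40):
    """Bake the cepstrally smoothed envelope of `frame` into one impulse response.

    Assumption: a minimum-phase impulse is wanted, so most of the energy
    sits at the start of the table.
    """
    segment = np.asarray(frame, dtype=float)[:n_fft]
    windowed = segment * np.hanning(len(segment))
    log_mag = np.log(np.abs(np.fft.fft(windowed, n_fft)) + 1e-9)
    cepstrum = np.fft.ifft(log_mag).real  # real cepstrum of the frame

    # Keep only low quefrencies (the smooth envelope) and fold the cepstrum
    # onto its causal half, which yields a minimum-phase spectrum.
    folded = np.zeros(n_fft)
    folded[0] = cepstrum[0]
    folded[1:n_lifter] = 2.0 * cepstrum[1:n_lifter]

    return np.fft.ifft(np.exp(np.fft.fft(folded))).real

def render_pulse_train(impulse, f0_hz, sample_rate, n_samples):
    """Overlap-add one copy of `impulse` per pitch period (start rounded down to a sample)."""
    out = np.zeros(n_samples + len(impulse))
    period = sample_rate / f0_hz
    position = 0.0
    while position < n_samples:
        start = int(position)
        out[start:start + len(impulse)] += impulse
        position += period
    return out[:n_samples]
```

Because every pulse excites the whole envelope, the output always carries the full harmonic series, which matches the limitation mentioned above. At render time there is no analysis step, only table playback.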

..i have no intention of producing commercial vst again (happy with old school dev environ) but this is so simple to implement.. the breath component is simply written as a wavetable of filtered noise. users could build their own formant set and trade realistic sounding voices on a very light platform.
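
The breath component could be sketched along these lines, assuming NumPy. The post only says it is a wavetable of filtered noise, so the frequency-domain shaping, the function name `breath_wavetable` and the `envelope_mag` parameter (a coarse magnitude curve from 0 Hz to Nyquist) are hypothetical choices for illustration.

```python
import numpy as np

def breath_wavetable(envelope_mag, table_len=4096, seed=0):
    """Build a loopable table of white noise shaped by a magnitude envelope.

    Filtering is done by multiplying the noise spectrum, which is a circular
    operation, so the table loops without a seam at the wrap point.
    """
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(table_len))

    # Stretch the coarse envelope across all rfft bins (0 Hz .. Nyquist).
    bin_axis = np.linspace(0.0, 1.0, len(spectrum))
    env_axis = np.linspace(0.0, 1.0, len(envelope_mag))
    gains = np.interp(bin_axis, env_axis, np.asarray(envelope_mag, dtype=float))

    table = np.fft.irfft(spectrum * gains, n=table_len)
    peak = np.max(np.abs(table))
    return table / peak if peak > 0 else table
```

A formant set in this sketch would just be a collection of such envelopes, one per phoneme, each rendered once into a table and then played back alongside the pitched oscillator.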

Plugin that needs to be adjusted to specific voice