Bitmask evolution optimization method from Aleksey Vaneev

DSP, Plugin and Host development discussion.

Post

Aleksey Vaneev wrote:GAs are not equally good at global optimization.
NM has to start from the proximity of a local minimum, whereas GAs don't. If you add new samples regularly, you can have a "good" sampling of your constrained space, so they qualify as global. The fact that they still struggle is another matter :p

Post

Aleksey Vaneev wrote:
Miles1981 wrote:Nelder Mead is a local optimization algorithm, whereas GAs are global ones. The fact that they are global helps them in tricky valleys of the cost function IMHO, so not really surprised by this result ;)
GAs are not equally good at global optimization. My strategy also strives to provide the global best solution, and it is capable of that. The current problem is that when the global minimum's shape is very narrow and there are many local minimums, the strategy chooses a sub-optimal minimum - it just can't "hit" the global minimum's position with a lower cost.

The problematic function I'm now struggling with is

Code: Select all

y = 100 * sqrt( fabs( y - 0.01 * x * x )) + 0.01 * fabs( x + 10 )

The functions I've tried with the Nelder Mead method are mostly local optimization problems, so the comparison was fair.
Are you solving that equation for every step? Or did you mean to write something more like

Code: Select all

f(x,y) = 100 * sqrt( fabs( y - 0.01 * x * x )) + 0.01 * fabs( x + 10 )

Looking at this on wolfram alpha ( here ) it looks like there is almost redundancy in the function; is there definitely a global minimum?

Post

What do you mean by redundancy?
The function is complex because the scales on x and y are not "consistent": the problem is not well-posed (the condition number of the derivative matrix is bad), but this is something that can happen, and you may not always be able to protect yourself from these cases.

Post

resynthesis wrote:It's been a while since I looked at this sort of stuff, but it's always difficult to be sure of reaching the absolute optimal solution in difficult energy landscapes. I worked with GAs and found that the problem domain I was looking at (load balancing for finite element problems) took quite a bit of tinkering with mutation rates to climb out of local minima. Perhaps some sort of hybrid method would help, i.e. use the evolutionary method for a few iterations and then switch to a local optimization scheme to reduce the iterations required.
It is indeed tricky, especially when the global minimum is narrow like a notch filter - think of finding the minimum of a complex EQ shape with several bell filters and a single notch filter. I'm still working on my method, and I was able to almost solve the problematic function - it still does not converge completely, but it gets very close. I think that if this method can solve that function, it can be robust in a wide range of cases.

Post

squaresine wrote:Are you solving that equation for every step? Or did you mean to write something more like

Code: Select all

f(x,y) = 100 * sqrt( fabs( y - 0.01 * x * x )) + 0.01 * fabs( x + 10 )

Looking at this on wolfram alpha ( here ) it looks like there is almost redundancy in the function, is there definitely a global minima?
Yes, I meant f(x,y). Wolfram does not display the function at enough resolution - this function has a lot of notch-like minimums, with the global minimum at (-10, 1).
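
For reference, this function is known in the global optimization literature as the Bukin N.6 test function. A quick standalone check (just an illustration, not part of the optimizer) shows how narrow the notch around (-10, 1) is, and why points along the parabola y = 0.01 * x * x look almost as good:

Code: Select all

#include <cstdio>
#include <cmath>

// Bukin N.6: f(x,y) = 100*sqrt(|y - 0.01*x*x|) + 0.01*|x + 10|
// Global minimum f(-10, 1) = 0.
static double f( double x, double y )
{
    return 100.0 * sqrt( fabs( y - 0.01 * x * x )) + 0.01 * fabs( x + 10.0 );
}

int main()
{
    // The cost rises steeply for a tiny offset in y, which is what makes
    // the global minimum hard to "hit" for a stochastic optimizer.
    printf( "%g\n", f( -10.0, 1.0 ));        // 0 (global minimum)
    printf( "%g\n", f( -10.0, 1.0 + 1e-4 )); // ~1 (tiny offset, huge cost)
    printf( "%g\n", f( -9.7, 0.9409 ));      // ~0.003 (nearby notch-like point)
    return 0;
}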

Post

I've just posted a new version of the optimizer to GitHub. Things have turned for the better, but the optimizer became a bit monstrous due to several added tricks. I think it now uses almost the full potential of randomization, meaning all random operations are directed toward improving the solution.

For multiple-minimum functions, no cigar yet. The average solution for the aforementioned function is (-9.7, 1), which is still a bit far from (-10, 1). At least this strategy bypasses a lot of obviously sub-optimal minimums.

Post

Made more important improvements. The strategy is now so robust it can even find biquad coefficients for a given frequency response. This is usually a very hard optimization problem.

https://github.com/avaneev/biteopt

Post

This is very interesting, Aleksey.
Do you care to add a test example of finding biquad coefficients for a response?

Post

CurryPaste wrote:This is very interesting, Aleksey.
Do you care to add a test example of finding biquad coefficients for a response?
I do not have a ready-to-use example; I did some quick tests, though. However, the optimizer usually does not hit the peaks as precisely as an RBJ equation does - the response is smoothed out, since the biquad parameter space is huge and literally swamped with local minimums. And you can't design filters in real time - it's too heavy for that. I'm talking about optimizing the coefficients directly.

But another possibility is to approximate the frequency response with several parametric (RBJ) filter bands (say, 5 bell filters), without direct coefficient optimization - I think this is doable, but I have not tried it myself.
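
For anyone wanting to try the direct approach, here is a rough sketch of what such an objective could look like (purely hypothetical code, not from biteopt - the target response, frequency grid and penalty weight are all assumptions): penalize the squared dB difference between the biquad's magnitude response and the target, plus a huge penalty for unstable poles.

Code: Select all

#include <complex>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical cost function for fitting direct-form biquad coefficients
// p = { b0, b1, b2, a1, a2 } to a target magnitude response (in dB) sampled
// at normalized frequencies w[i] (radians per sample).
double biquadFitCost( const double* p, const std::vector<double>& w,
    const std::vector<double>& targetdB )
{
    const double b0 = p[ 0 ], b1 = p[ 1 ], b2 = p[ 2 ];
    const double a1 = p[ 3 ], a2 = p[ 4 ];
    double cost = 0.0;

    // Stability constraint as a huge penalty: both poles must lie inside
    // the unit circle (|a2| < 1 and |a1| < 1 + a2).
    if( fabs( a2 ) >= 1.0 || fabs( a1 ) >= 1.0 + a2 )
    {
        cost += 1e6;
    }

    for( std::size_t i = 0; i < w.size(); i++ )
    {
        const std::complex<double> z1 = std::polar( 1.0, -w[ i ] ); // e^{-jw}
        const std::complex<double> z2 = z1 * z1;                    // e^{-2jw}
        const std::complex<double> H =
            ( b0 + b1 * z1 + b2 * z2 ) / ( 1.0 + a1 * z1 + a2 * z2 );

        const double magdB = 20.0 * log10( std::abs( H ) + 1e-20 );
        const double d = magdB - targetdB[ i ];
        cost += d * d; // sum of squared dB errors
    }

    return cost;
}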

Post

OK, I think I have almost finalized the method; it looks like I've reached its fundamental limits. It turned out to be very robust, and I've used the very same method to optimize the method's own internal parameters.

I've introduced the optimizePlateau() function (see test3.cpp), which manages to find a really good minimum in a global search and works within the allocated iteration limit.
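
The general idea of a plateau handler is simple; the sketch below only illustrates that idea against a made-up single-step optimizer interface and is not the actual optimizePlateau() code: when the best cost has not improved for a number of iterations, re-initialize the optimizer and keep the best solution found so far.

Code: Select all

#include <cmath>

// Hypothetical single-step optimizer interface, for illustration only.
struct IStepOptimizer
{
    virtual void init() = 0;   // randomized re-initialization
    virtual double step() = 0; // one iteration, returns best cost so far
    virtual ~IStepOptimizer() {}
};

// Restart-on-plateau driver: runs up to MaxIters iterations in total,
// re-initializing whenever the cost stalls for PlateauIters iterations.
double optimizeWithRestarts( IStepOptimizer& opt, int MaxIters,
    int PlateauIters, double Eps = 1e-10 )
{
    double globalBest = INFINITY;
    double localBest = INFINITY;
    int stall = 0;

    opt.init();

    for( int i = 0; i < MaxIters; i++ )
    {
        const double c = opt.step();

        if( c < localBest - Eps )
        {
            localBest = c;
            stall = 0;
        }
        else if( ++stall >= PlateauIters )
        {
            // Plateau reached - restart from a fresh random state.
            opt.init();
            localBest = INFINITY;
            stall = 0;
        }

        if( c < globalBest )
        {
            globalBest = c;
        }
    }

    return globalBest;
}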

Post

Several improvements introduced; faster convergence. I've actually used this function optimizer to find the best timing constants in the latest Elephant EL-C limiter mode - it worked great.
https://github.com/avaneev/biteopt

Post

Interesting. Can you specify domain-specific constraints such as IIR pole stability and pole+zero minimum phase on the independent variables? Also, is the learner of a fixed model order, or can it regularize itself to drop out variables during training?

Post

More improvements implemented, reduced convergence time considerably, reduced overhead.
https://github.com/avaneev/biteopt

But this is still mostly a local function optimizer, or I would say an "almost global" optimizer. Also, it does not work acceptably if optimization of huge systems with dozens of parameters is required. So far I've used it on problems with 8-10 parameters. Test problems with around 100 parameters won't solve - the reason, in simple words, is that this strategy uses too little memory to memorize features of the objective function.

Most constraints can be introduced by applying huge penalties to the objective function. Even binary penalties like "if( x > 1 ) cost += 10000;" should work. This strategy optimizes a fixed number of variables, but it does not depend on how the variables are actually used in the objective function.
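
As a small illustration of the penalty idea (a generic sketch, not tied to any particular objective): the constraints are simply folded into the cost value before it is returned to the optimizer, with a hard penalty for a binary constraint and, where possible, a graded penalty proportional to the violation.

Code: Select all

// Example: minimize f(x0, x1) = (x0 - 2)^2 + (x1 + 1)^2
// subject to x0 <= 1 and x0 + x1 >= 0, handled as additive penalties.
double penalizedCost( const double* p )
{
    const double x0 = p[ 0 ];
    const double x1 = p[ 1 ];

    double cost = ( x0 - 2.0 ) * ( x0 - 2.0 ) + ( x1 + 1.0 ) * ( x1 + 1.0 );

    // Hard "binary" penalty, as mentioned above.
    if( x0 > 1.0 )
    {
        cost += 10000.0;
    }

    // Graded penalty proportional to the constraint violation usually
    // helps the optimizer find its way back into the feasible region.
    const double viol = -( x0 + x1 );
    if( viol > 0.0 )
    {
        cost += 10000.0 * viol;
    }

    return cost;
}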

Post

Made more changes to the method; it's now even simpler than before. Also updated the page text to include useful information.

I've recently discovered a web site dedicated to global optimization: http://infinity77.net/global_optimization/
Now I'm using its guidelines. My test corpus includes 49 functions which can be successfully optimized by the method in 100 attempts, with up to 2000 iterations each. This is only about 25% of all available functions, but I've included mostly the harder functions in my test corpus. So, if I'm correct, my method can be considered very robust (for a stochastic method), and it can be called a global optimization method (which I was hesitant to do before).

Besides that, the method now also successfully optimizes 30-dimensional problems. Previously I thought it couldn't do that, but after making the FanSize parameter depend on the number of dimensions, the thing just works.

Post

Thank you for posting your research here, Aleksey.

I know it's not intended for real-time use, but do you think it's approaching something nearly useful for simple real-time optimisation problems? I'm thinking perhaps of optimisation problems which are local enough to keep the number of iterations low.

