Aleksey Vaneev wrote: GAs are not equally good at global optimization.
NM has to start from the proximity of a local minimum, whereas GAs don't. If you add new samples regularly, you can have a "good" sampling of your constrained space, so they qualify as global. The fact that they still struggle is another matter :p
Bitmask evolution optimization method from Aleksey Vaneev
-
- KVRian
- 1379 posts since 26 Apr, 2004 from UK
-
- KVRer
- 21 posts since 6 Jul, 2016
Aleksey Vaneev wrote: Miles1981 wrote: Nelder Mead is a local optimization algorithm, whereas GAs are global ones. The fact that they are global helps them in tricky valleys of the cost function IMHO, so not really surprised by this result.
GAs are not equally good at global optimization. My strategy also strives to provide the global best solution; it is capable of that. The current problem is that when the global minimum's shape is very narrow and there are many local minimums, the strategy chooses a sub-optimal minimum - it just can't "hit" the global minimum's position with a lower cost. The problematic function I'm now struggling with is
Code: Select all
y = 100 * sqrt( fabs( y - 0.01 * x * x )) + 0.01 * fabs( x + 10 )
Functions I've tried with the Nelder Mead method are mostly local optimization problems. So, the comparison was fair.

Are you solving that equation for every step? Or did you mean to write something more like
Code: Select all
f(x,y) = 100 * sqrt( fabs( y - 0.01 * x * x )) + 0.01 * fabs( x + 10 )
Looking at this on wolfram alpha ( here ) it looks like there is almost redundancy in the function, is there definitely a global minimum?
-
- KVRian
- 1379 posts since 26 Apr, 2004 from UK
What do you mean by redundancy?
The function is complex because the scales on x and y are not "consistent"; the problem is ill-posed (the condition of the derivative matrix is bad). But this is something that can happen, and you may not always be able to protect yourself from these cases.
- KVRAF
- Topic Starter
- 4021 posts since 7 Sep, 2002
resynthesis wrote: It's been a while since I looked at this sort of stuff, but it's always difficult to be sure of reaching the absolute optimal solution in difficult energy landscapes. I worked with GAs and found that, in the problem domain I was looking at (load balancing for finite element problems), it took quite a bit of tinkering with mutation rates to climb out of local minima. Perhaps some sort of hybrid method would help, i.e. use the evolutionary method for a few iterations and then switch to a local optimization scheme to reduce the iterations required.

It is indeed tricky, especially when the global minimum is narrow like a notch filter - think of finding a minimum on a complex EQ shape with several bell filters and a single notch filter. I'm still working on my method, and I was able to almost solve the problematic function - it still does not converge completely, but it is very close. I think if this method can solve that function, it can be robust in a wide range of cases.
- KVRAF
- Topic Starter
- 4021 posts since 7 Sep, 2002
squaresine wrote: Are you solving that equation for every step? Or did you mean to write something more like
Code: Select all
f(x,y) = 100 * sqrt( fabs( y - 0.01 * x * x )) + 0.01 * fabs( x + 10 )
Looking at this on wolfram alpha ( here ) it looks like there is almost redundancy in the function, is there definitely a global minimum?

Yes, I meant f(x,y). Wolfram does not display the function at enough resolution - this function has a lot of notch-like minimums, with the global minimum at (-10, 1).
- KVRAF
- Topic Starter
- 4021 posts since 7 Sep, 2002
I've just posted a new version of the optimizer to GitHub. Things have turned for the better, but the optimizer became a bit monstrous due to several added tricks. I think it now uses almost all of the potential of randomization, meaning all random operations are used toward improving the solution.
For multiple-minimum functions, no cigar yet. The average solution for the aforementioned function is (-9.7, 1), which is still a bit far from (-10, 1). At least this strategy bypasses a lot of obviously sub-optimal minimums.
- KVRAF
- Topic Starter
- 4021 posts since 7 Sep, 2002
Made more important improvements. The strategy is now so robust that it can even find biquad coefficients for a given frequency response, which is usually a very hard optimization problem.
https://github.com/avaneev/biteopt
-
- KVRist
- 45 posts since 7 Jul, 2012
This is very interesting, Aleksey.
Do you care to add a test example of finding biquad coefficients for a response?
- KVRAF
- Topic Starter
- 4021 posts since 7 Sep, 2002
CurryPaste wrote: This is very interesting, Aleksey.
Do you care to add a test example of finding biquad coefficients for a response?

I do not have a ready-to-use example, but I did some quick tests. The optimizer usually does not hit the peaks as precisely as an RBJ equation does - the response comes out smoothed, since the biquad parameter space is huge and literally swamped with local minimums. And you can't design filters in real-time - it's too heavy for that. I'm talking about optimizing the coefficients directly.
But another possibility is to approximate the frequency response with several parametric (RBJ) filter bands (like 5 bell filters), without direct coefficient optimization. I think this is doable, but I have not tried it myself.
- KVRAF
- Topic Starter
- 4021 posts since 7 Sep, 2002
OK, I think I have almost finalized the method; it looks like I've reached its fundamental limits. It turned out to be very robust, and I've used the very same method to optimize the method's own internal parameters.
I've introduced the optimizePlateau() function (see test3.cpp), which manages to find a really good minimum in a global search and works within the bounds of the allocated iteration limit.
- KVRAF
- Topic Starter
- 4021 posts since 7 Sep, 2002
Several improvements introduced; faster convergence. I've actually used this function optimizer to find the best timing constants in the latest Elephant EL-C limiter mode - it worked great.
https://github.com/avaneev/biteopt
- KVRist
- 251 posts since 7 Feb, 2017
Interesting. Can you specify domain-specific constraints, such as IIR pole stability and pole/zero minimum phase, on the independent variables? Also, is the learner of a fixed model order, or can it regularize itself to drop out variables during training?
- KVRAF
- Topic Starter
- 4021 posts since 7 Sep, 2002
More improvements implemented: considerably reduced convergence time and reduced overhead.
https://github.com/avaneev/biteopt
But this is still mostly a local function optimizer, or I would say an "almost global" optimizer. It also does not work acceptably when optimization of huge systems with dozens of parameters is required. So far I've used it on problems with 8-10 parameters. Test problems with around 100 parameters won't solve - in simple words, the reason is that this strategy uses too little memory to memorize features of the objective function.
Most constraints can be introduced by applying huge penalties to the objective function. Even binary penalties like "if(x>1)cost+=10000" should work. This strategy optimizes a fixed number of variables, but it does not depend on how the variables are actually used in the objective function.
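The penalty approach described above can be sketched for the IIR pole-stability constraint asked about earlier: a direct-form biquad with denominator 1 + a1*z^-1 + a2*z^-2 is stable iff (a1, a2) lies inside the stability triangle, so the wrapped objective simply adds a huge cost outside it. A hypothetical sketch; fitError() is a stand-in objective, not code from the thread:

```cpp
#include <cmath>

// Stability triangle for a biquad denominator 1 + a1*z^-1 + a2*z^-2:
// both poles lie inside the unit circle iff |a2| < 1 and |a1| < 1 + a2.
static bool biquadStable(double a1, double a2)
{
    return fabs(a2) < 1.0 && fabs(a1) < 1.0 + a2;
}

// Placeholder objective; in practice this would measure the fit of the
// biquad's frequency response against a target.
static double fitError(double a1, double a2)
{
    return (a1 - 0.5) * (a1 - 0.5) + (a2 - 0.25) * (a2 - 0.25);
}

// Wrapped objective with a binary penalty, as in "if(x>1)cost+=10000".
static double penalizedCost(double a1, double a2)
{
    double cost = fitError(a1, a2);
    if (!biquadStable(a1, a2))
        cost += 10000.0; // huge penalty outside the feasible region
    return cost;
}
```

The optimizer itself never needs to know about the constraint; any point with unstable poles simply looks catastrophically expensive.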
- KVRAF
- Topic Starter
- 4021 posts since 7 Sep, 2002
Made more changes to the method; it's now even simpler than before. Also updated the page text to include useful information.
I've recently discovered a web site dedicated to global optimization: http://infinity77.net/global_optimization/
Now I'm using its guidelines. My test corpus includes 49 functions which the method can successfully optimize in 100 attempts of up to 2000 iterations each. This is only about 25% of all available functions, but I've included mostly the harder functions in my corpus. So, if I'm correct, my method can be considered very robust (for a stochastic method), and it can be called a global optimization method (which I was hesitant to call it before).
Besides that, the method now also successfully optimizes 30-dimensional problems. Previously I thought it couldn't do that, but after making the FanSize parameter depend on the number of dimensions, the thing just works.
-
- KVRist
- 45 posts since 7 Jul, 2012
Thank you for posting your research here, Aleksey.
I know it's not intended for real-time use, but do you think it's approaching something nearly useful for simple real-time optimisation problems? I'm thinking perhaps of optimisation problems which are local enough to keep the number of iterations low.