Thinking about simulated annealing |
October 31st, 2009 |
machinelearning, programming, tech [html] |
I just finished reading about simulated annealing, and I think I understand now why it's much more generally applicable than simple random restart hill climbing. In the latter you pick a random point in the solution space, consider all the neighboring points, move to the best one, repeat until you're better than your neighbors. Then repeat the whole process, remembering your best places so far, until you decide to stop. People always talk about the problem with this as "getting stuck in local minima" (with the "down is good" analogy of "gradient descent" instead of the "local maxima" that the "up is good' analogy of "hill climbing" would give) but I didn't really understand. I think now I do.
Consider trying to find the lowest point in a parking lot. The lot is pretty smooth, and pretty consistent, so you could put a soccer ball in the middle and see where it stopped. But what if you chose a ball bearing? The parking lot is smooth on a soccer ball scale, but not on a ball bearing scale. So the bearing gets caught in a low bit (local minimum) while the soccer ball rolls all the way down (global minimum).
In simulated annealing, we use a ball bearing but shake the parking lot so it has a decent chance of getting out of these little pockets and making it's way all the bottom of the lot. So that we don't keep bouncing around for ever, we start off shaking the lot a large amount, and over time decrease our shaking exponentially until we're not shaking at all.
Because simulated annealing decreases the amount of shaking over a very wide range (lots to almost none) it's suitable for all sorts of terrain, from ditches to dents. Hill climbing only works if the terrain is close to smooth.
What would happen if we tried a search system closer to the one with the soccer ball? The soccer ball represents a large portion of the solution space, and at each time step it moves from one set of adjacent points to a new set, under the condition that the highest of the new set of points be lower than the highest of the current set of points. Once could imagine "simulated soccer ball rolling" where one starts off with a giant ball and then makes it smaller over time until one has a ball bearing making tiny adjustments. The biggest problem with this that I see is that rolling a soccer ball requires testing a prohibitively large number of points. Imagine how many points would need to be tested to roll a ball 1/16 the size of the earth over the earth's surface. Worse, there are lots of solutions this search misses because it is strongly biased towards solutions that lie in large flattish regions. Our giant ball rolling over arizona would likely miss the grand canyon.
This diversion with ball rolling aside, I think the part about simulated annealing that really made sense to me is: exponentially decreasing willingness to make things worse is a good fit for a huge range of possible solution spaces.
Comment via: facebook