'Simulated annealing' (SA) is a generic
probabilistic meta-algorithm for the
global optimization problem, namely locating a good approximation to the
global optimum of a given
function in a large
search space. It was independently presented by S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi in 1983, and by V. Černý in 1985. It originated as a generalisation of a
Monte Carlo method for examining the equations of state and frozen states of n-body systems, invented by N. Metropolis et al in 1953.
The name and inspiration come from
annealing in
metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its
crystals and reduce their
defects. The heat causes the
atoms to become unstuck from their initial positions (a local minimum of the
internal energy) and wander randomly through states of higher energy; the slow cooling gives them more chances of finding configurations with lower internal energy than the initial one.
By analogy with this physical process, each step of the SA algorithm replaces the current solution by a random "nearby" solution; if the new solution is better, it is chosen, whereas if it is worse, it can still be chosen with a probability that depends on the difference between the corresponding function values and on a global parameter ''T'' (called the ''temperature''), that is gradually decreased during the process. The dependency is such that the current solution changes almost randomly when ''T'' is large, but increasingly "downhill" as ''T'' goes to zero. The allowance for "uphill" moves saves the method from becoming stuck at
local minima—which are the bane of
greedier methods.
Overview
In the simulated annealing (SA) method, each point ''s'' of the search space is compared to a
state of some
physical system, and the function ''E''(''s'') to be minimized is interpreted as the
internal energy of the system in that state. Therefore the goal is to bring the system, from an arbitrary ''initial state'', to a state with the minimum possible energy.
The basic iteration
At each step, the SA heuristic considers some neighbour ''s' ''of the current state ''s'', and
probabilistically decides between moving the system to state ''s' ''or staying put in state ''s''. The probabilities are chosen so that the system ultimately tends to move to states of lower energy. Typically this step is repeated until the system reaches a state which is good enough for the application, or until a given computation budget has been exhausted.
The neighbours of a state
The neighbours of a state are derived by applying all possible instances of a problem- and implementation- specific "move" to the given state. For example in the
traveling salesman problem where a state can be defined as the order according to which the "salesman" will visit the cities, a "move" that can be used is the interchange of the order that two cities must be visited. The neighbours are all those states that are derived by applying the aforementioned "move" to a given state for all possible pairs of cities.
Transition probabilities
The probability of making the
transition from the current state:
to a candidate new state:
is a function
of the energies
and
of the two states, and of a global time-varying parameter
called the ''temperature''.
One essential requirement for the transition probability
is that it must be nonzero when
, meaning that the system may move to the new state even when it is ''worse'' (has a higher energy) than the current one. It is this feature that prevents the method from becoming stuck in a ''local minimum'' — a state less ideal than the global minimum, but deceptively attractive because it is more ideal than any immediate neighbor.
On the other hand, when
goes to zero, the probability
must tend to zero if
, and to a positive value if
.
That way, for sufficiently small values of
, the system will increasingly favor moves that go "downhill" (to lower energy values), and avoid those that go "uphill". In particular, when
becomes 0, the procedure will reduce to the
greedy algorithm — which makes the move only if it goes downhill.
The
function is usually chosen so that the probability of accepting a move decreases when the difference
increases — that is, small uphill moves are more likely than large ones. However, this requirement is not strictly necessary, provided that the above requirements are met.
Given these properties, the evolution of the state
depends crucially on the temperature
. Roughly speaking, the evolution of
is sensitive only to coarser energy variations when
is large, and to finer variations when
is small.
The annealing schedule
Another essential feature of the SA method is that the temperature is gradually reduced as the simulation proceeds. Initially,
is set to a high value (or infinity), and it is decreased at each step according to some ''annealing schedule'' — which may be specified by the user, but must end with
towards the end of the allotted time budget. In this way, the system is expected to wander initially towards a broad region of the search space containing good solutions, ignoring small features of the energy function; then drift towards low-energy regions that become narrower and narrower; and finally move downhill according to the
steepest descent heuristic.
Convergence to optimum
It can be shown that, for any given finite problem, the probability that the simulated annealing algorithm terminates with the
global optimal solution approaches 1 as the annealing schedule is extended. This theoretical result is, however, not particularly helpful, since the annealing time required to ensure a significant probability of success will usually exceed the time required for a complete search of the
solution space.
In plain English
Given a problem for which no better solution is known than an extensive
brute-force search, such as finding a set of 100 numbers that fit some characteristic, this technique attempts to quickly find an approximative solution by adjusting the current set (in this case some random sequence of 100 numbers) until a better answer is found (the way the adjustment is done depends on the problem). The new set of numbers, or "neighbour", if better, is then used as the current set and the process is repeated.
Sometimes a problem may arise when one approaches what is known as a local minimum, in this case, a set of numbers that is the best solution for its current region. In an attempt to find a better solution, this technique may use various methods to jump out of the current pit, such as searching for better solutions randomly by a factor of the inverse amount of the previous adjustment.
Keep in mind that implementations of this method are strictly problem specific, and so the way in which one finds a solution will vary from problem to problem.
Pseudo-code
The following pseudo-code implements the simulated annealing heuristic, as described above, starting from state ''s0'' and continuing to a maximum of ''kmax'' steps or until a state with energy ''emax'' or less is found. The call neighbour(''s'') should generate a randomly chosen neighbour of a given state s; the call random() should return a random value in the range
. The annealing schedule is defined by the call temp(r), which should yield the temperature to use, given the fraction ''r'' of the time budget that has been expended so far.
s := s0; e := E(s) // ''Initial state, energy.''
k := 0 // ''Energy evaluation count.''
'while' k < kmax 'and' e > emax // ''While time remains & not good enough:''
sn := neighbour(s) // ''Pick some neighbour.''
en := E(sn) // ''Compute its energy.''
'if' random() < P(e, en, temp(k/kmax)) 'then' // ''Should we move to it?''
s := sn; e := en // ''Yes, change state.''
k := k + 1 // ''One more evaluation done''
'return' s // ''Return current solution''
Saving the best solution seen
As in any metaheuristic, one should keep track of the best solution seen so far, in a separate state variable. Namely:
s := s0; e := E(s) // ''Initial state, energy.''
sb := s; eb := e // ''Initial "best" solution''
k := 0 // ''Energy evaluation count.''
'while' k < kmax 'and' e > emax // ''While time remains & not good enough:''
sn := neighbour(s) // ''Pick some neighbour.''
en := E(sn) // ''Compute its energy.''
'if' en < eb 'then' // ''Is this a new best?''
sb := sn; eb := en // ''Yes, save it.''
'if' random() < P(e, en, temp(k/kmax)) 'then' // ''Should we move to it?''
s := sn; e := en // ''Yes, change state.''
k := k + 1 // ''One more evaluation done''
'return' sb // ''Return the best solution found.''
While copying the state may be an expensive operation, this step happens only on a small fraction of the moves, so the change is usually worth the cost.
Selecting the parameters
In order to apply the SA method to a specific problem, one must specify the state space, the neighbour selection method (which enumerates the candidates for the next state
), the probability transition function, and the annealing schedule. These choices can have a significant impact on the method's effectiveness. Unfortunately, there are no choices that will be good for all problems, and there is no general way to find the best choices for a given problem. About the only guidance available is that choices that maximize the correlation between the energies of neighbouring states can lead to better solutions. It has therefore been observed that applying the SA method is more an art than a science.
Transition probabilities
The
transition probability function
follows the general requirements of the SA method stated before. Since the probabilities depend on the temperature
, in practice the same probability function is used for all problems, and the annealing schedule is adjusted accordingly.
In the original formulation of the method by Kirkpatrick ''et. al'', the transition probability
was defined as 1 if
(i.e., downhill moves were always performed);
otherwise, the probability would be:
.
This formula was said to come from the
Metropolis-Hastings algorithm, used here to generate samples from the
Maxwell-Boltzmann distribution governing the distribution of energies of molecules in a gas. However, there is no mathematical justification for using this particular formula in SA. Even the physical analogy is misleading: since the neighbours are tested sequentially in the SA algorithm, the actual probability of the SA algorithm moving from a state
to a state
definitely is ''not'' given by that formula.
Annealing schedule
The annealing schedule must be chosen with care. The initial temperature must be large enough to make the uphill and downhill transition probabilities about the same. To do that, one must have some estimate of the difference
for a random state and its neighbours.
The temperature must then decrease so that it is zero, or nearly zero, at the end of the allotted time. A popular choice is the exponential schedule, where the temperature decreases by a fixed factor
at each step.
Simulated quenching and tuning
A reference to the use of SA does not always mean that SA is the algorithm being used. Any true SA algorithm has an associated proof of (weak) ergodicity, which means that theoretically one can actually find a global optimum. Many SA algorithms are quite slow, so in practice many users simply speed up the algorithm, but in the process they also void their SA proof. A modified approach, such as Simulated Quenching (SQ), can work well but is not strictly SA. Mathematically-correct SA is often used for very hard problems where the danger of being trapped in local minima is high. SA also is used in problems with complex constraints, since the method of rejection can be made to exclude constrained regions. In practice, when the need is clear to use a sampling technique, it is best to do at least a few long runs with SA, and then use SQ modifications to speed up calculations as long as the same results are achieved. To apply any generic sampling technique like SA over many classes of nonlinear problems, it should be expected that tuning of some basic search parameters will be required
Thermodynamic Simulated Annealing
One of the theoretical bases on SA (Simulated Annealing) consists of reaching equilibrium at each temperature. In order to achieve this goal, the annealing schedule of SA goes along a sequence of temperatures for cooling the system, while a number of rearrangements are attempted to recover equilibrium at each one. Commonly, both the cooling rule and the number of movements per temperature are experimentally adjusted to provide high performance in a specific problem. Thermodynamic Simulated Annealing (TSA) provides an alternative method to perform the cooling close to equilibrium. The temperature in TSA is free to evolve (i.e. not controlled), but continuously updated from the variation of state functions as the internal energy and entropy according to the thermodynamic laws. Thereby, TSA achieves straightforward the high quality results of fine-tuned SA tools.
State neighbours
In practice, one tries to perform the movements by using a search graph where the neighbours of s are initially widespread to the whole solution space. Due to the high initial temperature used in the probability of acceptance, all initial movements are expected to be accepted. As the search progresses the neighborhood of s must be narrowed according to the temperature fall to maintain a high rate of movement acceptance.
Thus, in the traveling salesman problem above, generating the neighbour by swapping two arbitrary cities is expected to be accepted at the beginning of the search due to the high initial temperature value. As the search progresses the temperature falls, so that the length of the movements must be shortened accordingly to avoid low acceptance rates.
Restarts
Sometimes it is better to move back to a solution that was significantly better rather than always moving from the current state. This is called ''restarting''. To do this we set
s and
e to
sb and
eb and perhaps restart the annealing schedule. The decision to restart could be based on a fixed number of steps, or based on the current energy being too high from the best energy so far.
Related methods
Quantum annealing is a method which uses ''quantum fluctuations'' instead of thermal fluctuations to explore the state space. It has been demonstrated experimentally as well as theoretically, that quantum annealing can indeed outperform thermal annealing in certain cases, specifically, where the potential energy (cost) landscape consists of very high but thin barriers surrounding shallow local minima. Since thermal transition probabilities are proportional to
, (where
is temperature and
is Boltzmann's constant) and depend only on the height
of the barriers, it is very difficult for thermal fluctuations to get the system out from such local minima. But quantum tunneling probabilities through a barrier depend not only the height
of the barrier, but also on its width
; if the barriers are thin enough, quantum fluctuations may bring the system out of the shallow local minima surrounded by them.
Stochastic tunneling is a particular successful attempt to overcome the increasing difficulty simulated annealing runs have in escaping from local minima as the temperature decreases. Stochastic tunneling pushes the optimization run out of local minima, 'tunneling' through barriers.
Tabu search (TS) is similar to simulated annealing, in that both traverse the solution space by testing neighbours of the current solution. In order to prevent cycling, the TS algorithm maintains a "
tabu list" of solutions already seen, and moves to those solutions are suppressed.
Stochastic hill climbing (SH) runs many hill-climbing searches from random initial locations.
Genetic algorithms (GA) maintain a pool of solutions rather than just one. New candidate solutions are generated not only by "mutation" (as in SA), but also by "combination" of two solutions from the pool. Probabilistic criteria, similar to those used in SA, are used to select the candidates for mutation or combination, and for discarding excess solutions from the pool.
Ant colony optimization (ACO) uses many ants (or agents) to traverse the solution space and find locally productive areas.
The
cross-entropy method (CE) generates candidates solutions via a parameterized probability distribution. The parameters are updated via cross-entropy minimization, so as to generate better samples in the next iteration.
Stochastic optimization is an umbrella set of methods that includes simulated annealing and numerous other approaches.
See also
★
Adaptive simulated annealing
★
Markov chain
★
Combinatorial optimization
★
Genetic algorithm
★
Tabu search
★
Ant colony optimization
★
Automatic label placement
★
Multidisciplinary optimization
★
Place and route
★
Traveling salesman problem
★
Reactive search
★
Quantum annealing
★
Stochastic tunneling
★
Cross-entropy method
★
Stochastic optimization
★
Graph cuts in computer vision
References
★ S. Kirkpatrick and C. D. Gelatt and M. P. Vecchi, Optimization by Simulated Annealing, Science, Vol 220, Number 4598, pages 671-680, 1983. http://citeseer.ist.psu.edu/kirkpatrick83optimization.html.
★ V. Cerny, A thermodynamical approach to the travelling salesman problem: an efficient simulation algorithm. Journal of Optimization Theory and Applications, 45:41-51, 1985
★ N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. "Equations of State Calculations by Fast Computing Machines". ''Journal of Chemical Physics'', 21(6):1087-1092, 1953.
[1]
★ A. Das and B. K. Chakrabarti (Eds.), ''Quantum Annealing and Related Optimization Methods,'' Lecture Note in Physics, Vol. 679, Springer, Heidelberg (2005)
★ E. Weinberger, Correlated and Uncorrelated Fitness Landscapes and How to Tell the Difference, Biological Cybernetics, 63, No. 5, 325-336 (1990).
★ J. De Vicente, J. Lanchares, R. Hermida, "Placement by Thermodynamic Simulated Annealing”, Physics Letters A,Vol. 317, Issue 5-6, pp.415-423, 2003.
★ J. M. Ware and N. Thomas, "Automated cartographic map generalisation with multiple operators: a simulated annealing approach", The International Journal of Geographical Information Science (Taylor and Francis), 17(8), 743-769 (2003)
External links
★
Simulated Annealing A Java applet that allows you to experiment with simulated annealing. Source code included.
★
"General Simulated Annealing Algorithm" An open-source MATLAB program for general simulated annealing exercices. Can be adapted as needed.
★
VisualBots - Freeware multi-agent simulator in Microsoft Excel. Sample programs include genetic algorithm, ACO, and simulated annealing solutions to TSP.
★
Technique Developed for Optimizing Traveling-Wave Tubes A page showing the uses of this method in the space program for long distance communications.
★
Adaptive Simulated Annealing (ASA)
[2] is a free C-language code developed to statistically find the best global fit of a nonlinear constrained non-convex cost-function over a D-dimensional space. This algorithm permits an annealing schedule for "temperature" T decreasing exponentially in annealing-time k,
and has more than 100 options to provide robust tuning over many classes of nonlinear stochastic systems.
★
Resolving Graphic Conflict In Scale-Reduced Maps: A Simulated Annealing Approach Outlines the use of simulated annealing applied to Map Generalization to solve graphic conflict with spatial objects.
★
Self-Guided Lesson on Simulated Annealing Go to the Wikiversity and teach yourself how to create a simulated annealing program.