|
Simulated annealing (SA) is a generic probabilistic meta-algorithm for the global optimization problem, namely locating a good approximation to the global optimum of a given function in a large search space. It was independently presented by S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi in 1983, and by V. Černý in 1985. It originated as a generalisation of a Monte Carlo method for examining the equations of state and frozen states of n-body systems, invented by N. Metropolis et al in 1953. Anneal may refer to: Annealing (metallurgy), a heat treatment wherein the microstructure of a material is altered, causing changes in its properties such as strength and hardness. ...
A randomized algorithm is an algorithm which is allowed to flip a truly random coin. ...
A meta-algorithm is an algorithm that can be usefully considered to have other significant algorithms, not just elementary operations and simple control structures, as its constituents; also an algorithm that has subordinate algorithms as variable and replaceable parameters. ...
Global optimization is a branch of applied mathematics and numerical analysis that deals with the optimization of a function or a set of functions to some criteria. ...
Polynomial of degree 4, on the right one finds a local optimum, on the left is the global optimum. ...
Graph of example function, The mathematical concept of a function expresses the intuitive idea of deterministic dependence between two quantities, one of which is viewed as primary (the independent variable, argument of the function, or its input) and the other as secondary (the value of the function, or output). A...
In search algorithms, search space is the collection of possible solutions to a problem, and incorporates some notion of distance between candidate solutions; the correct solution will be near the optimal point in this hypothetical space, which may be envisioned as having a dimension for each variable. ...
Monte Carlo methods are a widely used class of computational algorithms for simulating the behavior of various physical and mathematical systems, and for other computations. ...
The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. The heat causes the atoms to become unstuck from their initial positions (a local minimum of the internal energy) and wander randomly through states of higher energy; the slow cooling gives them more chances of finding configurations with lower internal energy than the initial one. Annealing, in metallurgy and materials science, is a heat treatment wherein a material is altered, causing changes in its properties such as strength and hardness. ...
Georg Agricola, author of De re metallica, an important early book on metal extraction Metallurgy is a domain of materials science that studies the physical and chemical behavior of metallic elements, their intermetallic compounds, and their mixtures, which are called alloys. ...
For other uses, see Crystal (disambiguation). ...
Crystalline solids have a very regular atomic structure: that is, the local positions of atoms with respect to each other are repeated at the atomic scale. ...
For other uses, see Atom (disambiguation). ...
In thermodynamics, the internal energy of a thermodynamic system, or a body with well-defined boundaries, denoted by U, or sometimes E, is the total of the kinetic energy due to the motion of molecules (translational, rotational, vibrational) and the potential energy associated with the vibrational and electric energy of...
By analogy with this physical process, each step of the SA algorithm replaces the current solution by a random "nearby" solution; if the new solution is better, it is chosen, whereas if it is worse, it can still be chosen with a probability that depends on the difference between the corresponding function values and on a global parameter T (called the temperature), that is gradually decreased during the process. The dependency is such that the current solution changes almost randomly when T is large, but increasingly "downhill" as T goes to zero. The allowance for "uphill" moves saves the method from becoming stuck at local minima—which are the bane of greedier methods. A graph illustrating local min/max and global min/max points In mathematics, a point x* is a local maximum of a function f if there exists some ε > 0 such that f(x*) ≥ f(x) for all x with |x-x*| < ε. Stated less formally, a local maximum...
The greedy algorithm determines the minimum number of US coins to give while making change. ...
Overview In the simulated annealing (SA) method, each point s of the search space is compared to a state of some physical system, and the function E(s) to be minimized is interpreted as the internal energy of the system in that state. Therefore the goal is to bring the system, from an arbitrary initial state, to a state with the minimum possible energy. In physics, the term state is used in several related senses, each of which expresses something about the way a physical system is. ...
A physical system is a system that is comprised of matter and energy. ...
In thermodynamics, the internal energy of a thermodynamic system, or a body with well-defined boundaries, denoted by U, or sometimes E, is the total of the kinetic energy due to the motion of molecules (translational, rotational, vibrational) and the potential energy associated with the vibrational and electric energy of...
The basic iteration At each step, the SA heuristic considers some neighbour s' of the current state s, and probabilistically decides between moving the system to state s' or staying put in state s. The probabilities are chosen so that the system ultimately tends to move to states of lower energy. Typically this step is repeated until the system reaches a state which is good enough for the application, or until a given computation budget has been exhausted. The word probability derives from the Latin probare (to prove, or to test). ...
The neighbours of a state The neighbours of a state are derived by applying all possible instances of a problem- and implementation- specific "move" to the given state. For example in the traveling salesman problem where a state can be defined as the order according to which the "salesman" will visit the cities, a "move" that can be used is the interchange of the order that two cities must be visited. The neighbours are all those states that are derived by applying the aforementioned "move" to a given state for all possible pairs of cities. The traveling salesman problem (TSP), is a problem in discrete or combinatorial optimization. ...
Transition probabilities The probability of making the transition from the current state: s to a candidate new state: s' is a function P(e,e',T) of the energies e = E(s) and e' = E(s') of the two states, and of a global time-varying parameter T called the temperature. In Automata Theory, a state transition table is a table describing the transition function T of a finite automaton. ...
One essential requirement for the transition probability P is that it must be nonzero when e' > e, meaning that the system may move to the new state even when it is worse (has a higher energy) than the current one. It is this feature that prevents the method from becoming stuck in a local minimum — a state less ideal than the global minimum, but deceptively attractive because it is more ideal than any immediate neighbor. On the other hand, when T goes to zero, the probability P(e,e',T) must tend to zero if e' > e, and to a positive value if e' < e. That way, for sufficiently small values of T, the system will increasingly favor moves that go "downhill" (to lower energy values), and avoid those that go "uphill". In particular, when T becomes 0, the procedure will reduce to the greedy algorithm — which makes the move only if it goes downhill. The greedy algorithm determines the minimum number of US coins to give while making change. ...
The P function is usually chosen so that the probability of accepting a move decreases when the difference e' − e increases — that is, small uphill moves are more likely than large ones. However, this requirement is not strictly necessary, provided that the above requirements are met. Given these properties, the evolution of the state s depends crucially on the temperature T. Roughly speaking, the evolution of s is sensitive only to coarser energy variations when T is large, and to finer variations when T is small.
The annealing schedule
An image with random pixels, using simulated annealing to swap pairs of pixels. Similar colours attract at short range and repel at slightly less short range. Fast cooling schedule. Possibly resembles an amorphous solid.
An image with random pixels, using simulated annealing to swap pairs of pixels. Similar colours attract at short range and repel at slightly less short range. Slow cooling schedule. Possibly resembles a crystalline solid. Another essential feature of the SA method is that the temperature is gradually reduced as the simulation proceeds. Initially, T is set to a high value (or infinity), and it is decreased at each step according to some annealing schedule — which may be specified by the user, but must end with T = 0 towards the end of the allotted time budget. In this way, the system is expected to wander initially towards a broad region of the search space containing good solutions, ignoring small features of the energy function; then drift towards low-energy regions that become narrower and narrower; and finally move downhill according to the steepest descent heuristic. Image File history File links SimulatedAnnealingFast. ...
Image File history File links SimulatedAnnealingFast. ...
Image File history File links SimulatedAnnealingSlow. ...
Image File history File links SimulatedAnnealingSlow. ...
Gradient descent is an optimization algorithm that approaches a local maximum of a function by taking steps proportional to the gradient (or the approximate gradient) of the function at the current point. ...
Convergence to optimum It can be shown that, for any given finite problem, the probability that the simulated annealing algorithm terminates with the global optimal solution approaches 1 as the annealing schedule is extended. This theoretical result is, however, not particularly helpful, since the annealing time required to ensure a significant probability of success will usually exceed the time required for a complete search of the solution space. Polynomial of degree 4, on the right one finds a local optimum, on the left is the global optimum. ...
In optimization (a branch of mathematics), a candidate solution is a member of a set of possible solutions to a given problem. ...
In plain English Given a problem for which no better solution is known than an extensive brute-force search, such as finding a set of 100 numbers that fit some characteristic, this technique attempts to quickly find an approximative solution by adjusting the current set (in this case some random sequence of 100 numbers) until a better answer is found (the way the adjustment is done depends on the problem). The new set of numbers, or "neighbour", if better, is then used as the current set and the process is repeated. In computer science, a brute-force search consists of systematically enumerating every possible solution of a problem until a solution is found, or all possible solutions have been exhausted. ...
Sometimes a problem may arise when one approaches what is known as a local minimum, in this case, a set of numbers that is the best solution for its current region. In an attempt to find a better solution, this technique may use various methods to jump out of the current pit, such as searching for better solutions randomly by a factor of the inverse amount of the previous adjustment. Keep in mind that implementations of this method are strictly problem specific, and so the way in which one finds a solution will vary from problem to problem.
Pseudo-code The following pseudo-code implements the simulated annealing heuristic, as described above, starting from state s0 and continuing to a maximum of kmax steps or until a state with energy emax or less is found. The call neighbour(s) should generate a randomly chosen neighbour of a given state s; the call random() should return a random value in the range [0,1). The annealing schedule is defined by the call temp(r), which should yield the temperature to use, given the fraction r of the time budget that has been expended so far. s := s0; e := E(s) // Initial state, energy. k := 0 // Energy evaluation count. while k < kmax and e > emax // While time remains & not good enough: sn := neighbour(s) // Pick some neighbour. en := E(sn) // Compute its energy. if random() < P(e, en, temp(k/kmax)) then // Should we move to it? s := sn; e := en // Yes, change state. k := k + 1 // One more evaluation done return s // Return current solution Saving the best solution seen As in any metaheuristic, one should keep track of the best solution seen so far, in a separate state variable. Namely: s := s0; e := E(s) // Initial state, energy. sb := s; eb := e // Initial "best" solution k := 0 // Energy evaluation count. while k < kmax and e > emax // While time remains & not good enough: sn := neighbour(s) // Pick some neighbour. en := E(sn) // Compute its energy. if en < eb then // Is this a new best? sb := sn; eb := en // Yes, save it. if random() < P(e, en, temp(k/kmax)) then // Should we move to it? s := sn; e := en // Yes, change state. k := k + 1 // One more evaluation done return sb // Return the best solution found. While copying the state may be an expensive operation, this step happens only on a small fraction of the moves, so the change is usually worth the cost.
Selecting the parameters In order to apply the SA method to a specific problem, one must specify the state space, the neighbour selection method (which enumerates the candidates for the next state s'), the probability transition function, and the annealing schedule. These choices can have a significant impact on the method's effectiveness. Unfortunately, there are no choices that will be good for all problems, and there is no general way to find the best choices for a given problem. About the only guidance available is that choices that maximize the correlation between the energies of neighbouring states can lead to better solutions. It has therefore been observed that applying the SA method is more an art than a science.
Transition probabilities The transition probability function P follows the general requirements of the SA method stated before. Since the probabilities depend on the temperature T, in practice the same probability function is used for all problems, and the annealing schedule is adjusted accordingly. In the original formulation of the method by Kirkpatrick et. al, the transition probability P(e,e',T) was defined as 1 if e' < e (i.e., downhill moves were always performed); otherwise, the probability would be: exp((e − e') / T). This formula was said to come from the Metropolis-Hastings algorithm, used here to generate samples from the Maxwell-Boltzmann distribution governing the distribution of energies of molecules in a gas. However, there is no mathematical justification for using this particular formula in SA. Even the physical analogy is misleading: since the neighbours are tested sequentially in the SA algorithm, the actual probability of the SA algorithm moving from a state s to a state s' definitely is not given by that formula. The Proposal distribution Q proposes the next point that the random walk might move to. ...
The introduction to this article provides insufficient context for those unfamiliar with the subject matter. ...
Annealing schedule The annealing schedule must be chosen with care. The initial temperature must be large enough to make the uphill and downhill transition probabilities about the same. To do that, one must have some estimate of the difference e' − e for a random state and its neighbours. The temperature must then decrease so that it is zero, or nearly zero, at the end of the allotted time. A popular choice is the exponential schedule, where the temperature decreases by a fixed factor α < 1 at each step.
Simulated quenching and tuning A reference to the use of SA does not always mean that SA is the algorithm being used. Any true SA algorithm has an associated proof of (weak) ergodicity, which means that theoretically one can actually find a global optimum. Many SA algorithms are quite slow, so in practice many users simply speed up the algorithm, but in the process they also void their SA proof. A modified approach, such as Simulated Quenching (SQ), can work well but is not strictly SA. Mathematically-correct SA is often used for very hard problems where the danger of being trapped in local minima is high. SA also is used in problems with complex constraints, since the method of rejection can be made to exclude constrained regions. In practice, when the need is clear to use a sampling technique, it is best to do at least a few long runs with SA, and then use SQ modifications to speed up calculations as long as the same results are achieved. To apply any generic sampling technique like SA over many classes of nonlinear problems, it should be expected that tuning of some basic search parameters will be required
Thermodynamic Simulated Annealing One of the theoretical bases on SA (Simulated Annealing) consists of reaching equilibrium at each temperature. In order to achieve this goal, the annealing schedule of SA goes along a sequence of temperatures for cooling the system, while a number of rearrangements are attempted to recover equilibrium at each one. Commonly, both the cooling rule and the number of movements per temperature are experimentally adjusted to provide high performance in a specific problem. Thermodynamic Simulated Annealing (TSA) provides an alternative method to perform the cooling close to equilibrium. The temperature in TSA is free to evolve (i.e. not controlled), but continuously updated from the variation of state functions as the internal energy and entropy according to the thermodynamic laws. Thereby, TSA achieves straightforward the high quality results of fine-tuned SA tools.
State neighbours In practice, one tries to perform the movements by using a search graph where the neighbours of s are initially widespread to the whole solution space. Due to the high initial temperature used in the probability of acceptance, all initial movements are expected to be accepted. As the search progresses the neighborhood of s must be narrowed according to the temperature fall to maintain a high rate of movement acceptance. Thus, in the traveling salesman problem above, generating the neighbour by swapping two arbitrary cities is expected to be accepted at the beginning of the search due to the high initial temperature value. As the search progresses the temperature falls, so that the length of the movements must be shortened accordingly to avoid low acceptance rates.
Restarts Sometimes it is better to move back to a solution that was significantly better rather than always moving from the current state. This is called restarting. To do this we set s and e to sb and eb and perhaps restart the annealing schedule. The decision to restart could be based on a fixed number of steps, or based on the current energy being too high from the best energy so far.
Related methods Quantum annealing is a method which uses quantum fluctuations instead of thermal fluctuations to explore the state space. It has been demonstrated experimentally as well as theoretically, that quantum annealing can indeed outperform thermal annealing in certain cases, specifically, where the potential energy (cost) landscape consists of very high but thin barriers surrounding shallow local minima. Since thermal transition probabilities are proportional to exp( − Δ / kBT) , (where T is temperature and kB is Boltzmann's constant) and depend only on the height Δ of the barriers, it is very difficult for thermal fluctuations to get the system out from such local minima. But quantum tunneling probabilities through a barrier depend not only the height Δ of the barrier, but also on its width w; if the barriers are thin enough, quantum fluctuations may bring the system out of the shallow local minima surrounded by them. In mathematics and applications, quantum annealing is a method for finding solutions to combinatorial optimization problems and ground states of glassy systems using quantum fluctuations. ...
Stochastic tunneling is a particular successful attempt to overcome the increasing difficulty simulated annealing runs have in escaping from local minima as the temperature decreases. Stochastic tunneling pushes the optimization run out of local minima, 'tunneling' through barriers. Stochastic tunneling (STUN) is one approach to global optimization among several others and is based on the Monte Carlo method-sampling of the function to be minimized. ...
Tabu search (TS) is similar to simulated annealing, in that both traverse the solution space by testing neighbours of the current solution. In order to prevent cycling, the TS algorithm maintains a "tabu list" of solutions already seen, and moves to those solutions are suppressed. Tabu search is a mathematical optimization method, belonging to the class of local search techniques. ...
This article is about cultural prohibitions in general, for other uses, see Taboo (disambiguation). ...
Stochastic hill climbing (SH) runs many hill-climbing searches from random initial locations. Stochastic gradient descent is a general optimization algorithm, but is typically used to fit the parameters of a machine learning model. ...
Genetic algorithms (GA) maintain a pool of solutions rather than just one. New candidate solutions are generated not only by "mutation" (as in SA), but also by "combination" of two solutions from the pool. Probabilistic criteria, similar to those used in SA, are used to select the candidates for mutation or combination, and for discarding excess solutions from the pool. A genetic algorithm (GA) is an algorithm used to find approximate solutions to difficult-to-solve problems through application of the principles of evolutionary biology to computer science. ...
Ant colony optimization (ACO) uses many ants (or agents) to traverse the solution space and find locally productive areas. The ant colony optimization algorithm (ACO), introduced by Marco Dorigo [Dor92,DoSt04], is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs. ...
The cross-entropy method (CE) generates candidates solutions via a parameterized probability distribution. The parameters are updated via cross-entropy minimization, so as to generate better samples in the next iteration. The cross-entropy (CE) method attributed to Reuven Rubinstein is a general Monte Carlo approach to combinatorial and continuous multi-extremal optimization and importance sampling. ...
Harmony search (HS) mimics musicians in improvisation process where each musician plays a note for finding a best harmony all together. This article needs additional references or sources to facilitate its verification. ...
Stochastic optimization is an umbrella set of methods that includes simulated annealing and numerous other approaches. Mathematical techniques for optimization are aimed at providing a formal means for making the best decisions in a wide range of practical problems. ...
See also The adaptive Simulated Annealing is a variant of simulated annealing (SA) algorithm in which the algorithm parameters that control temperature schedule and random step selection are automatically adjusted according to algorithm progress. ...
In mathematics, a Markov chain, named after Andrey Markov, is a discrete-time stochastic process with the Markov property. ...
Combinatorial optimization is a branch of optimization in applied mathematics and computer science, related to operations research, algorithm theory and computational complexity theory. ...
A genetic algorithm (or GA) is a search technique used in computing to find true or approximate solutions to optimization and search problems. ...
Tabu search is a mathematical optimization method, belonging to the class of local search techniques. ...
The ant colony optimization algorithm (ACO), introduced by Marco Dorigo [Dor92,DoSt04], is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs. ...
The issue of automatic label placement is one of the most difficult problems in GIS, but other kinds of computer_generated graphics _ like charts, graphs etc. ...
Multidisciplinary design optimization (MDO) is a field of engineering that uses optimization methods to solve design problems incorporating a number of disciplines. ...
Place and Route is a stage in design of: Printed circuit boards at which components are graphically placed on the board and the wires drawn between them. ...
The traveling salesman problem (TSP), is a problem in discrete or combinatorial optimization. ...
Reactive search is the common name for a family of optimization algorithms based on the local search techniques. ...
In mathematics and applications, quantum annealing is a method for finding solutions to combinatorial optimization problems and ground states of glassy systems using quantum fluctuations. ...
Stochastic tunneling (STUN) is one approach to global optimization among several others and is based on the Monte Carlo method-sampling of the function to be minimized. ...
The cross-entropy (CE) method attributed to Reuven Rubinstein is a general Monte Carlo approach to combinatorial and continuous multi-extremal optimization and importance sampling. ...
Mathematical techniques for optimization are aimed at providing a formal means for making the best decisions in a wide range of practical problems. ...
The theory of graph cuts, was first applied in computer vision in the paper by Greig, Porteous and Seheult of Durham University, UK. In the Bayesian statistical context of smoothing noisy, or corrupted, images, Greig, Porteous and Seheult showed how the maximum a posteriori estimate of a binary image can...
References - S. Kirkpatrick and C. D. Gelatt and M. P. Vecchi, Optimization by Simulated Annealing, Science, Vol 220, Number 4598, pages 671-680, 1983. http://citeseer.ist.psu.edu/kirkpatrick83optimization.html.
- V. Cerny, A thermodynamical approach to the travelling salesman problem: an efficient simulation algorithm. Journal of Optimization Theory and Applications, 45:41-51, 1985
- N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. "Equations of State Calculations by Fast Computing Machines". Journal of Chemical Physics, 21(6):1087-1092, 1953. [1]
- A. Das and B. K. Chakrabarti (Eds.), Quantum Annealing and Related Optimization Methods, Lecture Note in Physics, Vol. 679, Springer, Heidelberg (2005)
- E. Weinberger, Correlated and Uncorrelated Fitness Landscapes and How to Tell the Difference, Biological Cybernetics, 63, No. 5, 325-336 (1990).
- J. De Vicente, J. Lanchares, R. Hermida, "Placement by Thermodynamic Simulated Annealing”, Physics Letters A,Vol. 317, Issue 5-6, pp.415-423, 2003.
- J. M. Ware and N. Thomas, "Automated cartographic map generalisation with multiple operators: a simulated annealing approach", The International Journal of Geographical Information Science (Taylor and Francis), 17(8), 743-769 (2003)
External links |