FACTOID # 63: Brazil takes up 47.8% of South America.
 
 Home   Encyclopedia   Statistics   Countries A-Z   Flags   Maps   Education   Forum   FAQ   About 
 
WHAT'S NEW
RECENT ARTICLES
More Recent Articles »
 

FACTS & STATISTICS    Simple view

  1. Select countries to view: (hold down Control key and click to select several)

     

     

    Compare:

     

     

  1. Select fact or statistic: (* = graphable)

     

     

     

  2. (OPTIONAL) Compare to statistic: (both need to be graphable)

     

     

     

  3. View result as:

     

       
(OR) SEARCH ALL encyclopedia, stats & forums:   

Encyclopedia > Boltzmann machine

A Boltzmann machine is a type of stochastic recurrent neural network originally invented by Geoffrey Hinton and Terry Sejnowski, whose dynamics consist of simulated annealing. Boltzmann machines can be seen as the stochastic, generative counterpart of Hopfield nets. They were one of the first examples of a neural network capable of learning internal representations, and are able to represent and (given sufficient time) solve difficult combinatoric problems. However, due to a number of issues discussed below, Boltzmann machines with unconstrained connectivity have not proven useful for practical problems in machine learning or inference. They are still theoretically intriguing, however,due to the locality and Hebbian nature of their training algorithm, as well as their parallelism and the resemblance of their dynamics to simple physical processes. If the connectivity is constrained, the learning can be made efficient enough to be useful for practical problems. Stochastic neural networks are a type of artificial neural networks, which is a tool of artificial intelligence. ... Geoffrey Hinton is a British computer scientist most noted for his work on the mathematics and applications of neural networks, and their relationship to information theory. ... Terrence J. Sejnowski is an Investigator with the Howard Hughes Medical Institute and is the Francis Crick Professor at The Salk Institute for Biological Studies where he directs the Computational Neurobiology Laboratory. ... Simulated annealing (SA) is a generic probabilistic meta-algorithm for the global optimization problem, namely locating a good approximation to the global optimum of a given function in a large search space. ... In the mathematics of probability, a stochastic process is a random function. ... Generative models are mathematical equations or algorithms, typically used in computer science and machine learning disciplines. ... A Hopfield net is a form of recurrent artificial neural network invented by John Hopfield. ...

Contents


Structure

A Boltzmann machine, like a Hopfield Network, is a network of units with an "energy" defined for the network. Unlike Hopfield nets, Boltzmann machine units are binary and stochastic. The global energy, E, in a Boltzmann machine is identical in form to that of a Hopfield network: A Hopfield net is a form of recurrent artificial neural network invented by John Hopfield. ... Stochastic, from the Greek stochos or goal, means of, relating to, or characterized by conjecture; conjectural; random. ...

E = -sum_{i<j} w_{ij} , s_i , s_j + sum_i theta_i , s_i

Where:

  • wij is the connection strength between unit j and unit i.
  • si is the state, s_i in {0,1}, of unit i.
  • θi is the threshold of unit i.

The connections in a Boltzmann machine have two restrictions: Look up Threshold in Wiktionary, the free dictionary In general, a threshold is a fixed location or value where an abrupt change is observed. ...

  • w_{ii}=0, forall i. (No unit has a connection with itself.)
  • w_{ij}=w_{ji}, forall i,j. (All connections are symmetric.)

Thus, the difference in the global energy that results from a single unit i being 0 versus 1, written ΔEi, is given by: Symmetry is a characteristic of geometrical shapes, equations and other objects; we say that such an object is symmetric with respect to a given operation if this operation, when applied to the object, does not appear to change it. ...

Delta E_i = sum_j w_{ij} , s_j - theta_i

A Boltzmann machine is made up of stochastic units. The probability, pi of the i-th unit being on is given by: Stochastic, from the Greek stochos or goal, means of, relating to, or characterized by conjecture; conjectural; random. ...

pi = 1 / (1 + exp( − ΔEi / T))

where the scalar T is referred to as the temperature of the system. The term scalar is used in mathematics, physics, and computing basically for quantities that are characterized by a single numeric value and/or do not involve the concept of direction. ... Temperature is the physical property of a system which underlies the common notions of hot and cold; the material with the higher temperature is said to be hotter. ...


The network is run by repeatedly choosing a unit and setting its state according to the above formula. After running for long enough, the probability of a global state of the network will depend only upon that global state's energy, according to a Boltzmann Distribution. This means that log-probabilities of global states become linear in their energies. This relationship is true when the machine is "at equilibrium", meaning that the probability distribution of global states has converged. The Maxwell-Boltzmann distribution is a probability distribution with applications in physics and chemistry. ...


Units are divided into "visible" units, V, and "hidden" units, H. The visible units are those which receive information from the "environment", i.e. those units that receive binary state vectors for training. In physics and in vector calculus, a spatial vector is a concept characterized by a magnitude, which is a scalar, and a direction (which can be defined in a 3-dimensional space by the Euler angles). ...


Training

The units in the Boltzmann Machine are divided into "visible" units, V, and "hidden" units, H. The visible units are those which receive information from the "environment", i.e. our training set is a bunch of binary vectors over the set V. The distribution over the training set is denoted P + (V).


On the Boltzmann Machine side, as recalled, the distribution over the global states is converging as we reach a thermal equilibrium. We denote the convereged distribution, after we marginilize it over the visible units V, as P (V). In thermodynamics, a thermodynamic system is in thermodynamic equilibrium if its energy distribution equals a Maxwell-Boltzmann-distribution. ...


Our goal is to approximate the "real" distribution P + using the P (V) which will be produced (eventually) by the machine. To measure how similar the two distributions are we use the Kullback-Leibler distance, G: In probability theory and information theory, the Kullback-Leibler divergence, or relative entropy, is a quantity which measures the difference between two probability distributions. ...

G = sum_{alpha}{P^{+}(V_{alpha})ln{frac{P^{+}(V_{alpha})}{P^{-}(V_{alpha})}}}

G is a function of the weights, since they determine the energy of a state, and the energy determines P^{-}(V_{alpha}), as promised by the Boltzmann distribution. Hence, we can use a gradient descent algorithm over G, so a given weight, wij is changed by subtracting the partial derivative of G with respect to the weight. The Maxwell-Boltzmann distribution is a probability distribution with applications in physics and chemistry. ... Gradient descent is an optimization algorithm that approaches a local minimum of a function by taking steps proportional to the negative of the gradient (or the approximate gradient) of the function at the current point. ... In mathematics, a partial derivative of a function of several variables is its derivative with respect to one of those variables with the others held constant. ...


There are two phases to Boltzmann machine training, and we switch iteratively between them. One is the "positive" phase where the visible units' states are clamped to a particular binary state vector sampled from the training set (according to P + ). The other is the "negative" phase where the network is allowed to run freely, i.e. no units have their state determined by external data. Surprisingly enough, the gradient with respect to a given weight, wij, is given by the very simple equation:

frac{partial{G}}{partial{w_{ij}}} = -frac{1}{T}[p_{ij}^{+}-p_{ij}^{-}]

Where:

  • p_{ij}^{+} is the probability of units i and j both being on when the machine is at equilibrium on the positive phase.
  • p_{ij}^{-} is the probability of units i and j both being on when the machine is at equilibrium on the negative phase.

This result follows from the fact that at the thermal equilibrium the probability P (S) of any global state S when the network is free-running is given by the Boltzmann distribution (hence the name "Boltzmann machine"). In thermodynamics, a thermodynamic system is in thermodynamic equilibrium if its energy distribution equals a Maxwell-Boltzmann-distribution. ... The Maxwell-Boltzmann distribution is a probability distribution with applications in physics and chemistry. ...


What is remarkable here, is that this learning rule is fairly biologically plausible because the only information needed to change the weights is provided by "local" information. That is, the connection (or synapse biologically speaking) does not need information about anything other than the two neurons it connects. This is far more biologically realistic than the information needed by a connection in many other neural network training algorithms, such as backpropagation. Illustration of the major elements in a prototypical synapse. ... Backpropagation is a supervised learning technique used for training neural networks. ...


The training of a Boltzmann machines can also be viewed as an application of the EM algorithm, which is heavily used in machine learning. We are given an incomplete dataset (we only know the values of the visible units). At the positive phase, we estimate the completed data based on the current parameters of the system (the weights). Later, the weights are updated to to maximize the probability of the network producing the completed data. In statistical computing, an expectation-maximization (EM) algorithm is an algorithm for finding maximum likelihood estimates of parameters in probabilistic models, where the model depends on unobserved latent variables. ... Machine learning is an area of artificial intelligence concerned with the development of techniques which allow computers to learn. More specifically, machine learning is a method for creating computer programs by the analysis of data sets. ...


Problems

If it worked, the Boltzmann machine would be a sort of universal computational medium, able to solve nearly any learning problem. For instance, if trained on photographs, the machine would model the distribution of photographs, and could use that model to, for example, complete a partial photograph.


Unfortunately, there is a serious practical problem with the Boltzmann machine, namely that the learning seems to stop working correctly when the machine is scaled up to anything larger than a trivial machine. This is due to a number of effects, the most important of which are:

  • the time the machine must be run for in order to collect equilibrium statistics grows exponentially with the machine's size, and with the magnitude of the connection strengths
  • connection strengths are more plastic when the units being connected have activation probabilities intermediate between zero and one, leading to a so-called variance trap. The net effect is that noise causes the connection strengths to random walk until the activities saturate.

Recent Progress

Although learning is impractical in general Boltzmann machines, it can be made quite efficient in an architecture called the "Restricted Boltzmann Machine" or "RBM" which does not allow connections between hidden units. After learning one RBM, the activities of its hidden units can be treated as data for training a higher-level RBM. This method of stacking RBM's makes it possible to learn many layers of hidden units efficiently and as each new layer is added the overall generative model gets better. Examples of multilayer RBM models of hand-written digits and faces can be viewed at http://www.cs.utoronto.ca/~hinton/


References

  • Hinton, G. E., Sejnowski, T. J. (1986) Learning and Relearning in Boltzmann Machines. In D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations. (pp 282-317) Cambridge: MIT Press
  • Hinton, G. E.(2002) Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, Vol 14 pp 1771-1800.
  • Hinton, G. E., Osindero, S. and Teh, Y. (2006) A fast learning algorithm for deep belief nets. Neural Computation, Vol 18, in press.

  Results from FactBites:
 
Boltzmann machine - definition of Boltzmann machine in Encyclopedia (795 words)
A Boltzmann machine is a type of stochastic recurrent neural network originally invented by Geoffrey Hinton and Terry Sejnowski.
Boltzmann machines can be seen as the stochastic, generative counterpart of Hopfield nets.
A Boltzmann machine, like a Hopfield net is a network of binary units with an "energy" defined for the network.
  More results at FactBites »


 

COMMENTARY     


Share your thoughts, questions and commentary here
Your name
Your comments
Please enter the 5-letter protection code

Want to know more?
Search encyclopedia, statistics and forums:

 


Lesson Plans | Student Area | Student FAQ | Reviews | Press Releases |  Feeds | Contact
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.