|
Backpropagation is a supervised learning technique used for training artificial neural networks. It was first described by Paul Werbos in 1974, and further developed by David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams in 1986. Supervised learning is a machine learning technique for creating a function from training data. ...
An artificial neural network (ANN), often just called a neural network (NN), is a mathematical model or computational model based on biological neural networks. ...
Paul Werbos is an scientist best known for his 1974 Harvard University Ph. ...
Year 1974 (MCMLXXIV) was a common year starting on Tuesday (link will display full calendar) of the 1974 Gregorian calendar. ...
David E. Rumelhart has made many contributions to the formal analysis of human cognition, working primarily within the frameworks of mathematical psychology, symbolic artificial intelligence, and parallel distributed processing. ...
Geoff Hinton invented the back-propogation algorithm which solved the problem of training multi-layer neural networks first identified by Marvin Minsky and Seymour Papert in 1969 in their book Perceptrons, which almost but not quite killed off neural-network research. ...
Year 1986 (MCMLXXXVI) was a common year starting on Wednesday (link displays 1986 Gregorian calendar). ...
It is most useful for feed-forward networks (networks that have no feedback, or simply, that have no connections that loop). The term is an abbreviation for "backwards propagation of errors". Backpropagation requires that the transfer function used by the artificial neurons (or "nodes") be differentiable. Feed-forward is a term describing a kind of system which reacts to changes in its environment, usually to maintain some desired state of the system. ...
A transfer function is a mathematical representation of the relation between the input and output of a linear time-invariant system. ...
An artificial neuron (also called a node or Nv neuron or Binary neuron or McCulloch-Pitts neuron) is an abstraction of biological neurons and the basic unit in an artificial neural network. ...
In mathematics, the derivative of a function is one of the two central concepts of calculus. ...
Summary of the technique: - Present a training sample to the neural network.
- Compare the network's output to the desired output from that sample. Calculate the error in each output neuron.
- For each neuron, calculate what the output should have been, and a scaling factor, how much lower or higher the output must be adjusted to match the desired output. This is the local error.
- Adjust the weights of each neuron to lower the local error.
- Assign "blame" for the local error to neurons at the previous level, giving greater responsibility to neurons connected by stronger weights.
- Repeat the steps above on the neurons at the previous level, using each one's "blame" as its error.
Actual Algorithm: 1. Initialize the weights in the network (often randomly) 2. repeat * foreach example e in the training set do 1. O = neural-net-output(network, e) ; forward pass 2. T = teacher output for e 3. Calculate error (T - O) at the output units 4. Compute delta_wi for all weights from hidden layer to output layer ; backward pass 5. Compute delta_wi for all weights from input layer to hidden layer ; backward pass continued 6. Update the weights in the network * end 3. until all examples classified correctly or stopping criterion satisfied 4. return(network) As the algorithm's name implies, the errors (and therefore the learning) propagate backwards from the output nodes to the inner nodes. So technically speaking, backpropagation is used to calculate the gradient of the error of the network with respect to the network's modifiable weights. This gradient is almost always then used in a simple stochastic gradient descent algorithm to find weights that minimize the error. Often the term "backpropagation" is used in a more general sense, to refer to the entire procedure encompassing both the calculation of the gradient and its use in stochastic gradient descent. Backpropagation usually allows quick convergence on satisfactory local minima for error in the kind of networks to which it is suited. In mathematics, computing, linguistics, and related disciplines, an algorithm is a finite list of well-defined instructions for accomplishing some task that, given an initial state, will terminate in a defined end-state. ...
Stochastic gradient descent is a general optimization algorithm, but is typically used to fit the parameters of a machine learning model. ...
A graph illustrating local min/max and global min/max points In mathematics, maxima and minima, also known as extrema, are points in the domain of a function at which the function takes the largest (maximum), or smallest (minimum) value either within a given neighbourhood (local extrema), or on the...
It is important to note that backpropagation networks are necessarily multilayer (usually with one input, one hidden, and one output layer). In order for the hidden layer to serve any useful function, multilayer networks must have non-linear activation functions for the multiple layers: a multilayer network using only linear activiation functions is equivalent to some single layer, linear network. Non-linear activation functions that are commonly used include the logistic function, the softmax function, and the gaussian function. Logistic curve, specifically the sigmoid function A logistic function or logistic curve models the S-curve of growth of some set P. The initial stage of growth is approximately exponential; then, as competition arises, the growth slows, and at maturity, growth stops. ...
The softmax activation function is a neural transfer function. ...
Gaussian curves parametrised by expected value and variance (see normal distribution) A Gaussian function (named after Carl Friedrich Gauss) is a function of the form: for some real constants a > 0, b, and c. ...
The backpropagation algorithm for calculating a gradient has been rediscovered a number of times, and is a special case of a more general technique called automatic differentiation in the reverse accumulation mode. In mathematics and computer algebra, automatic differentiation, or AD, sometimes alternatively called algorithmic differentiation, is a method to numerically evaluate the derivative of a function specified by a computer program. ...
It is also closely related to the Gauss-Newton algorithm. In mathematics, the Gauss-Newton algorithm is used to solve nonlinear least squares problems. ...
External links
- Chapter 7 The backpropagation algorithm of Neural Networks - A Systematic Introduction by Raúl Rojas (ISBN 978-3540605058)
- Implementation of BackPropagation in C#
- Quick explanation of the backpropagation algorithm
|