Back Propagation

 

Background

The artificial neural network considered here consists of three layers: the input, hidden, and output layers. The network is initialized pseudo-randomly, which means that its internal weights are given values drawn from a pseudo-random number generator. Recall that, since this is a neural net, it has both short-term memory (the outputs of its units) and long-term memory (the weights of its connections). Back Propagation (BP from here on out) adapts the long-term memory (LTM) area of the network in its learning procedure. BP commences its learning procedure by performing a ‘forward sweep’ of the input, followed by a backward sweep of error correction. The error correction refines the weights and draws the network closer to the ideal one. This backward sweep is the learning mechanism of the algorithm.
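As a minimal sketch of the pseudo-random initialization described above, assuming weights drawn uniformly from a small symmetric range (the range, the layer sizes, and the helper name `init_weights` are illustrative assumptions, not part of the original notes):

```python
import random

def init_weights(n_from, n_to, rng, scale=0.5):
    """Return an n_from x n_to matrix of pseudo-random weights in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(n_to)]
            for _ in range(n_from)]

# A seeded generator makes the "random" starting point reproducible.
rng = random.Random(42)
w_input_hidden = init_weights(2, 3, rng)    # input layer (2 units) -> hidden layer (3 units)
w_hidden_output = init_weights(3, 1, rng)   # hidden layer (3 units) -> output layer (1 unit)
```

Because the generator is seeded, the same starting weights come back on every run, which is what makes the initialization pseudo-random rather than truly random.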

 

Overview

 

The net value of a node is the sum of the outputs $o_{pi}$ of all contributing nodes i, each multiplied by its connection weight $w_{ij}$, plus the bias $\theta_j$ of the node. This gives the net value of node j as:

$$net_{pj} = \sum_i w_{ij}\, o_{pi} + \theta_j$$

[Note: p stands for the current pattern being pushed through the network.]
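The sum just described can be sketched in a few lines of Python. The function name `net_value` and the numbers are illustrative assumptions, not part of the original notes:

```python
def net_value(outputs, weights, bias):
    """Sum of each contributing output times its connection weight, plus the bias."""
    return sum(o * w for o, w in zip(outputs, weights)) + bias

# Two contributing nodes feeding node j (illustrative numbers only).
o = [1.0, 0.5]       # outputs of the contributing nodes i
w_j = [0.2, -0.4]    # weights on the connections from each i into j
theta_j = 0.1        # bias of j
net_j = net_value(o, w_j, theta_j)   # 1.0*0.2 + 0.5*(-0.4) + 0.1
```

Here the two weighted inputs cancel each other, so the net value of j ends up equal to its bias.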

 

The output of j then becomes:
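The activation function is not named in the text. A minimal sketch, assuming the logistic sigmoid that is commonly paired with BP (an assumption, as are the symbols $o_{pj}$ and $net_{pj}$):

```latex
o_{pj} = f(net_{pj}) = \frac{1}{1 + e^{-net_{pj}}}
```

Under this assumption the output always lies strictly between 0 and 1, and the derivative of $f$ takes the convenient form $o_{pj}(1 - o_{pj})$.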

 


 


The backward error correction, or change, needed at units i and j is determined by comparing the actual output with the desired output.  The equations are as follows:
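In the standard textbook formulation (assumed here, along with a logistic sigmoid activation and the symbols $\delta$, $t$, and $o$), the error signals of an output unit j and of a hidden unit i that feeds it can be written as:

```latex
\begin{aligned}
\delta_{pj} &= (t_{pj} - o_{pj})\, o_{pj}\,(1 - o_{pj})
  && \text{(output unit } j\text{)} \\
\delta_{pi} &= o_{pi}\,(1 - o_{pi}) \sum_{j} \delta_{pj}\, w_{ij}
  && \text{(hidden unit } i\text{)}
\end{aligned}
```

The output unit compares its output directly with the desired output $t_{pj}$. The hidden unit has no desired output of its own, so it collects the error signals of the units it feeds, weighted by the connections $w_{ij}$ to them.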


 


 


The above error signals are used to compute the change in the weight from i to j.
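A sketch of that weight change in the usual delta-rule form, assuming $\eta$ is the learning rate, $\delta_{pj}$ the error signal of the receiving unit j, and $o_{pi}$ the output of the sending unit i (all three symbols are assumptions):

```latex
\begin{aligned}
\Delta_p w_{ij} &= \eta\, \delta_{pj}\, o_{pi} \\
w_{ij} &\leftarrow w_{ij} + \Delta_p w_{ij}
\end{aligned}
```

For example, with $\eta = 0.5$, $\delta_{pj} = 0.1$, and $o_{pi} = 0.8$, the weight from i to j grows by $0.5 \times 0.1 \times 0.8 = 0.04$.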


 

 


The previous equations for i and j can now be seen in a more general form:
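One common general form, assumed here because it matches the momentum factor and the time-indexed weights listed in the variable definitions, adds a fraction $\alpha$ of the previous weight change to the current one:

```latex
\begin{aligned}
\Delta w_{ij}(t) &= \eta\, \delta_{pj}\, o_{pi} + \alpha\, \Delta w_{ij}(t-1) \\
w_{ij}(t+1) &= w_{ij}(t) + \Delta w_{ij}(t)
\end{aligned}
```

With $\alpha = 0$ this reduces to the plain weight change driven by the error signal alone. A larger $\alpha$ keeps the weights moving in the direction of recent changes.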

 


 


 


 

 


BP Variable definitions

$w_{ij}$ → weight of the connection between i and j

$o_{pi}$ → output value of i

$\delta_{pj}$ → error value of j

$\theta_j$ → bias of unit j

$\eta$ → learning rate (step size)

$t_{pj}$ → desired output (teaching) of j

$\alpha$ → momentum factor

$w_{ij}(t+1)$ → weight between i and j at time t+1

$\Delta w_{ij}(t)$ → change in weight between i and j at time t
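Using the variables defined above, one forward sweep and one backward sweep for a single pattern can be sketched as follows. This is a hedged illustration rather than a definitive implementation: the logistic sigmoid, the 2-2-1 layer sizes, the values of `eta` and `alpha`, and all function names are assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_ih, th_h, w_ho, th_o):
    """Forward sweep: return (hidden outputs, output-layer outputs)."""
    h = [sigmoid(sum(x[i] * w_ih[i][j] for i in range(len(x))) + th_h[j])
         for j in range(len(th_h))]
    o = [sigmoid(sum(h[j] * w_ho[j][k] for j in range(len(h))) + th_o[k])
         for k in range(len(th_o))]
    return h, o

def train_pattern(x, target, net, eta=0.5, alpha=0.9):
    """One forward sweep plus one backward sweep; returns the error before the update."""
    w_ih, th_h, w_ho, th_o, dw_ih, dw_ho = net
    h, o = forward(x, w_ih, th_h, w_ho, th_o)
    # Error signals: output units first, then hidden units (using the old weights).
    d_o = [(target[k] - o[k]) * o[k] * (1 - o[k]) for k in range(len(o))]
    d_h = [h[j] * (1 - h[j]) * sum(d_o[k] * w_ho[j][k] for k in range(len(o)))
           for j in range(len(h))]
    # Weight changes with momentum: dw(t) = eta * delta * output + alpha * dw(t-1).
    for j in range(len(h)):
        for k in range(len(o)):
            dw_ho[j][k] = eta * d_o[k] * h[j] + alpha * dw_ho[j][k]
            w_ho[j][k] += dw_ho[j][k]
    for i in range(len(x)):
        for j in range(len(h)):
            dw_ih[i][j] = eta * d_h[j] * x[i] + alpha * dw_ih[i][j]
            w_ih[i][j] += dw_ih[i][j]
    # Biases are treated like weights from a unit whose output is always 1
    # (momentum is left out for the biases to keep the sketch short).
    for k in range(len(o)):
        th_o[k] += eta * d_o[k]
    for j in range(len(h)):
        th_h[j] += eta * d_h[j]
    return 0.5 * sum((target[k] - o[k]) ** 2 for k in range(len(o)))

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Weights, biases, and the previous weight changes (initially zero) for a 2-2-1 net.
net = (rand_matrix(2, 2), [0.0, 0.0], rand_matrix(2, 1), [0.0],
       [[0.0, 0.0], [0.0, 0.0]], [[0.0], [0.0]])

# Push the same pattern through 50 times and record the error each time.
errors = [train_pattern([1.0, 0.0], [1.0], net) for _ in range(50)]
```

Only the weights and biases change between calls, so comparing the first and last entries of `errors` shows the effect of the backward sweeps on this one pattern.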


 

 

 

 

 


Something worth mentioning

This learning algorithm starts from nothing.  In fact, it starts at a level below nothing, because its randomly initialized weights encode a mapping that is almost certainly wrong.  It then effectively ‘learns’ the mapping behind the patterns being looped through it simply by adjusting the weights.  The network can only adjust the weights because they are the only part that lasts; they are referred to as the Long Term Memory (LTM) portion of the network.  By adjusting the weights leading into a particular node, we have effectively adjusted the inputs that node receives.

 

 

Neural Network and Algorithm Analysis

Although the BP algorithm is good at learning, it has its limitations, since it’s based on an artificial neural net.  Its weights are more plastic than elastic, unlike our actual minds.  I say plastic because neural nets form only around the particular set of inputs they were trained on.

 

Understand that the neural net doesn’t store information as humans seemingly do.  It simply holds the rigid form of the network, much like the memory foam in roller blades.  Just as a foot of similar shape and size can fit comfortably in the roller blade’s memory-foam-lined interior, the neural network can assess similar, unrehearsed items.  However, if a totally new set of characteristics were given to either of the two, neither would retain its earlier form, since the actual information was never stored.

[Note: This problem (called Negative Transfer) was later addressed in neural nets.]

 

Side Note

Our mind contains a seemingly unlimited supply of connections.  In fact, it is often said that we don’t even use a sixth of them in our lifetime.  This plethora of biological connections suggests a solution to the limitations of the restricted and constrained Artificial Neural Network (ANN).  Once an ANN attempts to hold another set of information, the old set may be completely erased.  In humans, cross connections are made between existing networks, which allows us to retain what would have been lost in the ANN.  Later modifications to ANNs trained with BP allowed them to retain the old network, which let the algorithm act more ‘naturally’.