Stochastic Delta Rule

 

Background

Widrow and Hoff introduced the Delta Rule, which consists of a teacher, an error signal, and a weight adjustment procedure.  The Stochastic Delta Rule is similar.  It begins stochastically: the weights start with random values, and the update rule gives only a probability that each unit will take on its local maximum or minimum value.  The weights are then adapted according to the Delta Rule.

 

Overview

In essence, the network repeatedly picks a more suitable weight from a population of candidate weights.  Each time the population narrows, a more suitable weight is chosen.  The search is constrained by a standard deviation.  The weights are chosen at random from the subsets of the space the network works with.  The teacher then checks whether the network is correct in its response by determining whether the answer lies within the standard deviation of the current mean.  If the network is correct, the mean is updated and a new, more refined standard deviation is determined.  If it is incorrect, the teacher uses the derived error signal to adjust the weights accordingly.
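The overview above can be read in more than one way. The following Python sketch shows one loose interpretation, not a definitive implementation: each weight is drawn from a Gaussian with its own mean and standard deviation, a correct response pulls the mean toward the sampled weight and narrows the standard deviation, and an incorrect response moves the mean using the error signal. The function name, the parameters `eta`, `alpha`, `decay`, and `sigma0`, the Gaussian sampling, and the simplified correctness test (output equals target) are all assumptions made for illustration.

```python
import random


def sign(value):
    """Bipolar threshold: +1 for non-negative input, -1 otherwise."""
    return 1 if value >= 0 else -1


def train_sampled_weights(patterns, targets, eta=0.1, alpha=0.1,
                          decay=0.95, sigma0=0.5, epochs=50, seed=0):
    """Hypothetical sketch: weights are sampled around a mean, and the
    spread shrinks whenever the sampled weights give a correct answer."""
    rng = random.Random(seed)
    n = len(patterns[0])
    mu = [0.0] * n          # current mean of each weight
    sigma = [sigma0] * n    # current standard deviation of each weight
    for _ in range(epochs):
        for x, t in zip(patterns, targets):
            # Draw one candidate weight vector from the population.
            w = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
            y = sign(sum(wi * xi for wi, xi in zip(w, x)))
            if y == t:
                # Correct: move the mean toward the sample, narrow the spread.
                mu = [m + alpha * (wi - m) for m, wi in zip(mu, w)]
                sigma = [s * decay for s in sigma]
            else:
                # Incorrect: adjust the mean with the error signal.
                delta = t - y
                mu = [m + eta * delta * xi for m, xi in zip(mu, x)]
    return mu, sigma


# Bipolar AND; the last input component is a constant bias input.
patterns = [(1, 1, 1), (1, -1, 1), (-1, 1, 1), (-1, -1, 1)]
targets = [1, -1, -1, -1]
mu, sigma = train_sampled_weights(patterns, targets)
print("means:", mu)
print("standard deviations:", sigma)
```

In this sketch the standard deviation can only shrink, so the sampled weights settle around the mean as training proceeds.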

 

The original Delta Rule is applied as follows:


It begins with the output (which is bipolar in ADALINE), which is determined from the net input:


 


The error signal, δ, is then calculated to see whether the output is correct:

 


If the error signal is not zero, the change, Δx, for the units is computed:

 



Finally, the weights to the units are adjusted by the change computed above:

 


In this way, the weights that started from random values are refined toward the values needed for the desired network result.  These calculations are repeated as needed for all of the input patterns.
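The four steps above can be sketched in Python as follows. The formulas themselves are not reproduced in this text, so the sketch assumes the common textbook form: the net input is the weighted sum of the inputs, the error signal is the target minus the bipolar output, and the change is a learning rate times the error signal times the input. The learning rate `eta`, the bias input, the function names, and the AND example are assumptions added for illustration.

```python
import random


def net_input(weights, x):
    """Weighted sum of the inputs."""
    return sum(w * xi for w, xi in zip(weights, x))


def bipolar_output(net):
    """Bipolar output: +1 if the net input is non-negative, else -1."""
    return 1 if net >= 0 else -1


def train_delta_rule(patterns, targets, eta=0.1, epochs=100, seed=0):
    rng = random.Random(seed)
    # Weights begin with random values.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(len(patterns[0]))]
    for _ in range(epochs):
        for x, t in zip(patterns, targets):
            y = bipolar_output(net_input(weights, x))  # step 1: output
            delta = t - y                              # step 2: error signal
            if delta != 0:
                # Step 3: change for each weight (assumed form).
                change = [eta * delta * xi for xi in x]
                # Step 4: adjust the weights with that change.
                weights = [w + c for w, c in zip(weights, change)]
    return weights


# Bipolar AND; the last input component is a constant bias input.
patterns = [(1, 1, 1), (1, -1, 1), (-1, 1, 1), (-1, -1, 1)]
targets = [1, -1, -1, -1]
weights = train_delta_rule(patterns, targets)
predictions = [bipolar_output(net_input(weights, x)) for x in patterns]
print(predictions)
```

Because the error here is taken on the thresholded output, the weights stop changing once every pattern is classified correctly. A variant that takes the error on the net input itself would keep making small adjustments.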


The Boltzmann Machine (stochastic) changes the original rule as follows:

Instead of calculating whether the output is 0 or 1 (or –1 or 1) directly from the net input, it finds the unit’s probability:


Then, if the probability is greater than a random value,


Otherwise,
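The stochastic update described in this section can be sketched in Python. The formulas are not reproduced in this text, so the sketch assumes the logistic probability commonly used for Boltzmann machine units, with a hypothetical `temperature` parameter, and assumes that the unit takes its high value (1) when the probability exceeds a uniform random number and its low value (0 or –1, passed here as `off_value`) otherwise.

```python
import math
import random


def on_probability(net, temperature=1.0):
    """Assumed logistic probability that the unit takes its high value."""
    z = net / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for very negative inputs.
    e = math.exp(z)
    return e / (1.0 + e)


def stochastic_output(net, rng, temperature=1.0, off_value=0):
    """Return 1 if the probability exceeds a random value in [0, 1),
    otherwise return off_value (0 for binary units, -1 for bipolar)."""
    p = on_probability(net, temperature)
    return 1 if p > rng.random() else off_value


rng = random.Random(1)
samples = [stochastic_output(2.0, rng) for _ in range(10000)]
print("probability:", on_probability(2.0))
print("observed frequency:", sum(samples) / len(samples))
```

Over many trials, the fraction of times the unit takes its high value approaches the computed probability, so a strongly positive net input makes the high value very likely without making it certain.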