How could the BP algorithm become more effective in dealing with multiple training sessions?
Hinton and Sejnowski (1986)
Decay Factor (scaling factor):
Originally, the idea was to keep the original weights if this factor was 0, or to zero the weights if it was 1. This was flawed because every weight was treated the same way, regardless of its size or importance.
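The behavior of the original decay factor can be sketched as below. This is only an illustration, not the authors' formula: the function name `apply_decay` and the multiplicative form `w * (1 - d)` are assumptions, chosen because they reproduce the two cases described above (a factor of 0 keeps the weights, a factor of 1 zeroes them).

```python
def apply_decay(weights, decay_factor):
    """Scale every weight by the same amount (hypothetical helper).

    Assumption: the update is w * (1 - decay_factor), which matches the
    two cases in the text: 0 keeps the weights, 1 zeroes them.
    """
    if not 0.0 <= decay_factor <= 1.0:
        raise ValueError("decay_factor must lie in [0, 1]")
    return [w * (1.0 - decay_factor) for w in weights]


weights = [0.8, -1.5, 3.0]
kept = apply_decay(weights, 0.0)    # weights are unchanged
zeroed = apply_decay(weights, 1.0)  # every weight becomes zero
print(kept)
```

Note that the same factor is applied to every weight, which is the uniform treatment criticized above.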
Adaptive Decay Factor:

Hinton and Sejnowski later proposed this idea. Instead of zeroing the weights, they attempted to shrink them over time. Their equations for the decay and weight adjustment were the following:
![]()
Although this later idea was less static, it still did not overcome the negative transfer problem.
Von Lehmen, Paek, Liao, Marrakchi, and Patel (1988)
The Noise and Clip Method:
Their idea was to add random noise to the weights and then clip the weights so that they did not exceed a maximum value. They added the random noise as follows:
![]()
![]()
![]()
Then the weights were clipped with the following:

Though this was interesting, it still failed to solve the negative transfer problem.
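The noise-and-clip idea can be sketched roughly as follows. This is not the authors' exact procedure: the zero-mean Gaussian noise, the symmetric bound `w_max`, and the names `noise_and_clip`, `noise_std`, and `rng` are all assumptions made for illustration.

```python
import random


def noise_and_clip(weights, noise_std, w_max, rng):
    """Add random noise to each weight, then clip to [-w_max, w_max].

    Assumptions (not from the source): the noise is zero-mean Gaussian
    with standard deviation noise_std, and the bound is symmetric.
    """
    noisy = [w + rng.gauss(0.0, noise_std) for w in weights]
    return [max(-w_max, min(w_max, w)) for w in noisy]


rng = random.Random(0)  # seeded so that runs are repeatable
perturbed = noise_and_clip([0.5, -2.4, 3.0], noise_std=0.1, w_max=2.0, rng=rng)
print(perturbed)  # every value lies within [-2.0, 2.0]
```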
Building on these attempts, and on the coarse-grain memory consolidation described by Squire, Cohen & Nadel (1984), we can now use a Modified Sequential Learning Framework, a.k.a. Modified Back Propagation (MBP). The Old Sequential Learning Framework, a.k.a. Old Back Propagation (OBP), had unbounded weights and did not reorganize its memory. The MBP algorithm, by contrast, controls the growth of the weights and reorganizes its memory between training sessions. The MBP’s reorganization is adaptive and can be implemented in a number of ways.
The growth of the weights is controlled via weight clipping. If a weight is larger than the designated maximum, it is set to that maximum value. Likewise, if a weight is smaller than the designated minimum, it is set to that minimum.
![]()
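The clipping rule described above can be written as a short sketch. The names `clip_weights`, `w_min`, and `w_max` are hypothetical, and the flat list of weights stands in for whatever weight structure the network actually uses.

```python
def clip_weights(weights, w_min, w_max):
    """Return a copy of weights with each entry limited to [w_min, w_max]."""
    if w_min > w_max:
        raise ValueError("w_min must not exceed w_max")
    # Weights above w_max become w_max; weights below w_min become w_min.
    return [min(max(w, w_min), w_max) for w in weights]


clipped = clip_weights([-3.2, -0.5, 0.0, 1.7, 4.1], w_min=-2.0, w_max=2.0)
print(clipped)  # [-2.0, -0.5, 0.0, 1.7, 2.0]
```

Weights already inside the allowed range are left untouched, so clipping only affects weights that have grown past the bounds.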
The adaptive decay is computed with the following:
![]()
Weight Standardization is done with the following:
![]()