
CONTRIBUTED ARTICLE    0893-6080(94)00065-4

Neural Networks, Vol. 8, No. 1, pp. 55-65, 1995. Copyright © 1994 Elsevier Science Ltd. Printed in the USA. All rights reserved. 0893-6080/95 $9.50 + .00

"Virtual Input" Phenomena Within the Death of a Simple Pattern Associator

S. L. THALER

Dendrite Neurocomputing

(Received 10 September 1993; revised and accepted 6 June 1994)

Abstract--A simple pattern association network, trained to convert any of eight three-bit input patterns to corresponding 3 × 3 pixel patterns, is destroyed by the random pruning of its connection weights. Within such "deaths" we see the frequent appearance of the trained output patterns independent of the application of the corresponding inputs. Such events, which we shall call "virtual inputs," increase in frequency when input layer weights are pruned in favor of those in the output layer. We ultimately attribute the virtual inputs to a neural network completion process in which the pattern of zeros produced by the stochastic decay is interpreted by the largely intact hidden and output layers as the application of any of the eight original training input vectors to the input units. After isolating the essential mechanisms producing virtual inputs, we generalize this phenomenon to a wide range of parallel distributed systems.

Keywords--Virtual inputs, Neural network completion, Death, Pruning, Pattern associator, Phantom experience.

1. INTRODUCTION

In a previous investigation, Thaler (1993) observed that as both weights and biases were randomly pruned from a 4-2-4 encoder with its inputs clamped at constant values, training outputs frequently occurred without the application of their corresponding inputs. We call such events "virtual inputs" due to the false indication that a training vector has been applied to the inputs of the network when, in fact, unassociated inputs are present. Here, we attempt to investigate this phenomenon in more detail and describe an experiment performed to support the hypothesis that virtual inputs are largely the result of neural network completion, the fundamental property of neural networks to recognize or to classify degraded input as previously learned patterns (see, for instance, Rumelhart & Zipser, 1989).

Acknowledgements: Significant gratitude goes out to Philip Yam of Scientific American for his prized editorial feedback in the presentation of these concepts. Likewise, sincere thanks are expressed to Dr. David C. Plaut of Carnegie Mellon Psychology and Matthew M. Thomas of Washington University Chemical Engineering for the varied criticisms that helped sharpen this paper's arguments. As usual, I acknowledge the continuing general encouragement of both Lawrence Pado and Pete Lichtenwalner of the Intelligent Systems Development Group at McDonnell Douglas. The opinions expressed, however, are solely those of the author.

Requests for reprints should be sent to S. L. Thaler, 12906 Autumn View Drive, St. Louis, MO 63146.

Typically, the process of completion is exemplified by the retrieval of lost or ambiguous components of a network input vector, either through output classification or, if a feedback path exists, regeneration of the indeterminate components. In short, the network associates an arbitrary input vector with its closest "relative" among the network's training input set or known environmental patterns. Here, we broaden the discussion to include not just vectors applied to input units, but also the completion of activation patterns generated internally within a feedforward network by both hidden and output layers.

To study this phenomenon we chose the 3-5-9 pattern associator, a simple feedforward network that maps any of eight possible three-bit input patterns to eight fourfold-symmetric pixel patterns. Generally speaking, pattern associators are networks that have been trained by the presentation of pairs of input and output patterns. As a result, such systems have "learned" that when one input pattern is presented, the corresponding output pattern of that pair must likewise be produced. Here, we study the random pruning of a pattern associator in the hope of extracting general lessons that may be applied to massively damaged associative memories. We select this particular network mapping from among many other candidate associators (all of which manifest the virtual input phenomenon) due to the visually distinctive appearance of these highly symmetric outputs compared with that of random output patterns. Thus, an observer of a typical pruning sequence may readily and intuitively ascertain the significant frequency of training outputs within the degrading network's output stream.

FIGURE 1. Mapping of network outputs to the 3 × 3 pixel pattern. The numbers indicate the association of outputs with pixels. [Figure: outputs 1-9 assigned row-wise to the 3 × 3 grid, 1 2 3 / 4 5 6 / 7 8 9.]

In Figure 1, we see the relationship between network outputs and individual pixels in the 3 × 3 pattern. Figure 2 shows the eight training activations of the network along with the resulting pixel patterns. There, a black solid square within a node indicates an activation greater than or equal to zero, whereas an open square denotes an activation less than zero. The sides of each of these squares scale with the magnitude of the activation, which is normally in the range from zero to 1.

The actual experiment involved clamping the network's inputs and then randomly pruning connection weights, leaving biases intact. Simultaneously, we monitored the output of the "dying" network and collected relevant statistics. Provisions were made within the code to adjust the relative probability of pruning within the two connection layers of the network. Therefore, within the pruning algorithm, it was possible to stochastically favor weight eliminations in the input layer over those in the output layer, or vice versa. As a result, we could smoothly vary the intactness of any given layer and therefore its capacity to carry out the neural network completion process. Thus, a positive correlation of virtual input rate with the degree of preservation of the output weight layer would strongly indicate completion as the dominant mechanism.

FIGURE 2. The eight pattern associations used in this study. In each case, the output pattern is shown along with the associated three-bit input pattern ({0,0,0} through {1,1,1}). Activations of all neurons in the network are also shown as black (≥0) or white (<0) squares. Note that inputs have been rescaled and displayed at 1/4 scale.

This work loosely connects with research in several areas. For instance, Hinton and Shallice (1991) and Plaut (1993) have used pruning procedures in modeling dyslexia and brain damage. There, activation patterns of the simulation network behave as local attractors whose boundaries shift with progressive weight removal. So-called "dreaming" has been reported by Crick (1983) and Hopfield (1983) within the Boltzmann Machine and Hopfield nets as spurious minima are removed by reverse Hebbian learning. Further, the incorporation of pruning techniques by such researchers as LeCun (1990) has greatly facilitated accelerated learning schemes. However, none of these areas of research has dealt with the phenomenon of virtual inputs, as defined above, when a large fraction of connection weights is pruned from a network.

In general, prior research involving weight pruning has concerned itself with the fidelity of the input-to-output mapping with progressive network damage. The fraction of total weights destroyed has been within the regime of graceful degradation. Here, we take a radically different perspective, carefully observing the results of network destruction from beginning to end, as network output becomes chaotic. The foremost quantity we measure is the frequency with which the network produces a false indication that a training vector has been applied to its inputs. Such frequencies are judged by the dominance of training output vectors at the output layer in the course of network pruning. We anticipate that such events are of significance to parallel hardware implementations and that the effect may have some role in phantom experience seen in traumatized biological networks.

2. METHODOLOGY

We trained the 3-5-9 pattern associator by the standard methods of back propagation and gradient descent. Hidden layer units employed sigmoidal activations whereas output units utilized linear transfer functions. Inputs in the range from 0 to 1 were rescaled to the range -4 to +4. (Such a rescaling was found to generally decrease training time within the trainer we routinely utilize.) After attaining a training tolerance of 0.01 RMS, the fully connected network was embedded within the pruning code. This algorithm contained a subroutine that randomly chose some value of connection weight terminal layer, l (2 or 3), and connection weight indices j and k (the weight joining the jth and kth units in layers l-1 and l, respectively, as shown in the Appendix). Once a weight, identified by the indices j, k, and l, was chosen as a candidate for pruning, the final decision to prune the weight was made by a call to a random number generator. If the weight belonged to the first connection weight layer, this randomly generated number in the range from 0 to 1 was compared to a preset threshold, PL1, also falling within this range. If the random number was less than or equal to the threshold PL1, the weight was pruned. For a candidate weight within connection layer 2, the decision to prune was similarly carried out using the threshold value PL2, also chosen within the range 0 to 1. For the white distribution represented by the random number generator, PL1 and PL2 assumed the significance of probabilities, and in turn rates of pruning, for connection layers 1 and 2, respectively. Both of these threshold values were varied randomly from one pruning sequence to another to study their effect upon virtual input frequencies.

By assigning two independent layer pruning rates governed by the thresholds PL1 and PL2, we built the required layer pruning bias into the random decay. Once a weight had been removed in this fashion, the clamped inputs were propagated through the damaged network by the normal feedforward paradigm, with pruned weights set to zero. Outputs were then mapped to the 3 × 3 pixel pattern by the scheme outlined in Figure 1. If an output pattern corresponded to any of the trained outputs shown in Figure 2, we incremented the tally of virtual inputs. If output did not match any of the trained outputs, we held the virtual input tally constant. The pruning cycle was then continued with the targeting of a new weight for pruning. This process was repeated until the network was pruned of all weights.

To summarize and elaborate, the pruning algorithm may be represented in C-pseudocode as the following function, called repetitively from the main program:

    Death()
    {
        int j, k, l;

        j = Random();           /* unit index in layer l-1 */
        k = Random();           /* unit index in layer l   */
        l = Random();           /* terminal layer, 2 or 3  */
        if (CONNECT(l, j, k) != 0) {
            if (l == 2 && Random() < PL1)
                CONNECT(l, j, k) = 0;
            if (l == 3 && Random() < PL2)
                CONNECT(l, j, k) = 0;
            Associator(inputs, outputs);
            Tally(outputs, virtual_inputs);
            Display(inputs, outputs, virtual_inputs);
            wtDeaths++;
        }
    }

Here, PL1 and PL2 are the above-mentioned thresholds; "Associator" is the feedforward function for the network; "Tally" is a function to maintain statistics on virtual inputs observed; and "Display" is the function responsible for all graphics. "Random" is either a Monte Carlo value of the indices j, k, or l (neuron and layer labels), or the random value between zero and 1 to be compared with the thresholds PL1 and PL2. The variable wtDeaths, which represents the number of weights pruned, is incremented until equal to the total number of nonbias weights originally in the network (60). At that point, network weights are refreshed in preparation for a new pruning sequence. We note that layers l = 2 and l = 3 represent connection weights terminating on those unit layers, respectively, as discussed in the Appendix, and may equivalently be labeled as connection weight layers 1 and 2.

Within the graphical routine, we set the threshold for activating a given pixel to 0.5, anticipating outputs in the range from zero to 1. Thus, outputs less than this threshold generated a white pixel, whereas those above or equal to this value produced a blackened pixel. The graphical display consisted of the output pattern as well as the entire pattern of activations within the network. This style of program output made it easier to identify trends that might emerge during the pruning, while serving as a visual tool to demonstrate and explore the concept of virtual inputs.
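The thresholding and the Figure 1 mapping reduce to a few lines; a minimal sketch (function names hypothetical):

```c
/* Threshold a network output into a pixel: values >= 0.5 blacken
   the pixel, values below leave it white, as in the text. */
int pixel(double output) {
    return output >= 0.5 ? 1 : 0;
}

/* Map the nine outputs onto the 3 x 3 grid of Figure 1
   (output n drives pixel n, numbered row-wise 1..9). */
void to_pattern(const double out[9], int grid[3][3]) {
    for (int n = 0; n < 9; n++)
        grid[n / 3][n % 3] = pixel(out[n]);
}
```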

Because the overall number of decay scenarios (i.e., the distinct permutations of connection weight removal order) is so great, 60! ≈ 8.3 × 10^81, and because we intend to gather meaningful averages over many of these scenarios, data collection involved carrying out 100 simulated deaths for any given pair of layer decay probabilities, PL1 and PL2. At the completion of this decay cycle, PL1, PL2, and the average number of virtual inputs per death were logged to file. In addition, virtual inputs were also recorded as a function of fractional pruning. Because output was updated only after the pruning of a single weight, we anticipated no more than 60 virtual inputs per death, the total number of weights in the network. Further, because weights were chosen as candidates for pruning by an entirely random process, PL1 and PL2 did not directly influence the order in which weights were pruned. Instead, the trajectory of pruning through the connection weight space was influenced by the ratio PL1/PL2 (i.e., the favored elimination of one layer or the other). The time required to prune all weights in the network was determined jointly by their magnitudes.

Using the general procedures outlined above, we carried out several types of pruning experiments on the 3-5-9 pattern associator to discover various trends in the virtual input phenomenon. These studies included (1) PL1/PL2 variations in the death of the trained network, (2) selected PL1/PL2 values in the death of an untrained network, and (3) input clamping variations in the death of the trained network.

3. RESULTS

The first eight stages of a representative trained-network death are shown in Figure 3, with inputs clamped at {0, 0, 0}. The two thresholds for decay have been chosen to produce a 10-to-1 ratio (PL1 = 1 and PL2 = 0.1) in the rates of decay of input versus output layer connections. This series of output patterns was obtained by screen capture following each transition in network output during the pruning sequence. We observe that the output consists of nearly all the training output patterns, even though inputs have been clamped at values of zero. The one exception to this trend occurs at a fractional death of 0.2 (where 20% of the connection weights in the network have been pruned). Therefore, we see that the majority of this network death is dominated by virtual inputs. Also noteworthy is the observation that network output tends to remain fixed throughout multiple weight prunings, as evidenced by only eight distinct stages of output throughout the 18 stages of weight eliminations. This behavior appears to be a common phenomenon within all pruning sequences observed.

We contrast this decay with that of an untrained network, with weights randomized to values between +5 and -5 (Figure 4). In this pruning sequence, virtual inputs are completely absent, with the exception of the trivial case seen at 97% pruning, with nearly all connections to the output layer nulled. (This effect is the result of retaining the trained biases, which, when also randomized, produce random outputs near 100% weight pruning.)

The outcome of 50 experiments, each involving specific ratios of pruning rates PL1 and PL2, is reported in terms of the average number of virtual inputs detected over 100 deaths in Figure 5. There we have plotted the mean number of virtual inputs per death versus the ratio PL1/PL2. We observe that virtual inputs are a monotonically increasing function of the layer decay ratio PL1/PL2 and that there appear to be two distinct regimes, one extending from PL1/PL2 ≈ 0 to PL1/PL2 ≈ 3-4 and the other from PL1/PL2 ≈ 4 upward. In the latter regime, virtual inputs asymptotically approach the upper limit of 60. At that point, elimination of every connection weight has resulted in a virtual input.

We observe that virtual input behavior merges smoothly into the area of completely random decay at PL1/PL2 = 1 and that the shoulder in this curve near PL1/PL2 = 3 occurs when the rate of destruction of the two layers is equal (recall that layer 2 contains 45 weights and layer 1 contains 15 weights, for a ratio of 3/1). Above this ratio, the connection layer 1 decay rate exceeds that of layer 2.
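The counting behind this shoulder can be written down directly; a minimal sketch (identifiers hypothetical):

```c
/* Nonbias connection weight counts for the 3-5-9 associator. */
enum { N_IN = 3, N_HID = 5, N_OUT = 9 };
enum {
    LAYER1_WTS = N_IN * N_HID,   /* 15 weights into the hidden layer */
    LAYER2_WTS = N_HID * N_OUT   /* 45 weights into the output layer */
};
/* Candidates are drawn uniformly, so equal absolute destruction
   rates for the two layers require PL1 * LAYER1_WTS = PL2 * LAYER2_WTS,
   i.e., PL1/PL2 = LAYER2_WTS / LAYER1_WTS = 3, the observed shoulder. */
```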

Also noteworthy in the plot of Figure 5 is the fact that in the limit of PL1/PL2 approaching zero, we see a mean virtual input count of 12 per death, for a virtual input probability of 20%. This scenario amounts to the complete preservation of the input weight layer as the output weight layer disintegrates.

FIGURE 3. A representative stochastic death for the trained pattern associator. Probability of pruning has been biased to favor elimination of the first layer of connection weights. Percentages of weights pruned from the network are shown at each transition in network output. [Figure: eight output patterns at 0%, 20.0%, 22.6%, 23.3%, 25.0%, 26.6%, 28.3%, and 30.0% pruned.]

The time evolution of virtual inputs is plotted in Figure 6 for the two antithetical cases of PL1 equal to either zero or 1, over a range of PL2 values. These curves were produced via generalization by a 2-5-10 feedforward network trained on data gleaned from 5000 deaths of the pattern associator. Inputs consisted of PL1 and PL2, whereas outputs were the probabilities of virtual inputs over 10 distinct stages of network pruning. In the case of PL1 equal to zero, we expect favored hidden-to-output layer pruning, and for the case of PL1 equal to 1, we anticipate preferred input-to-hidden layer pruning. In the former situation, the virtual input rate generally decreases with pruning, but then recovers with the appearance of the trivial output corresponding to input vector {1, 1, 1} near the stage of complete death. In the latter case, the virtual input rate bottoms out near a fractional pruning of 0.2 and then steadily increases toward death.

To demonstrate that the frequency of virtual inputs during the death of the trained network was significant, we carried out two other baseline computer experiments consisting of (1) a "coin toss" in which nine bits were randomly set to either zero or 1 with equal probability and (2) the decay of an untrained network over many deaths. In the first case, the probability of any of the named virtual inputs occurring was determined to be 0.0154 ± 0.0006 after 90,000 trials (± indicates a measure of sample standard deviation). This value is very close to the anticipated value of 8 × (1/2)^9 = 0.0156 for a completely random process. For the second case of network death with randomized weights, carried out over 100 varied ratios of PL1/PL2, the probability of virtual inputs was evaluated to be 0.02181 ± 0.02664. We contrast these respective probabilities with that of 0.58 for random pruning of the trained network, as read from the plot of Figure 5 (i.e., an average of 35 virtual inputs per death out of 60 possible steps of network pruning).
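The anticipated coin-toss value follows from simple counting: eight trained patterns among 2^9 equiprobable nine-bit patterns. As a sketch (function name hypothetical):

```c
/* Chance that a random n_bits-bit output pattern reproduces one of
   n_patterns distinct training patterns: n_patterns / 2^n_bits. */
double chance_match(int n_patterns, int n_bits) {
    return (double)n_patterns / (double)(1 << n_bits);
}
```

Here chance_match(8, 9) returns 8/512 = 0.015625, rounded to the 0.0156 quoted above.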

FIGURE 4. A representative stochastic death for the pattern associator using randomized weights. As in Figure 3, the probability of pruning has been biased to favor elimination of the first layer of connection weights. [Figure: eight output patterns at 0%, 21.6%, 40.0%, 45.0%, 53.3%, 73.3%, 78.3%, and 96.6% pruned.]

The effect of using different clamping inputs within the pruning sequence is shown in Table 1 for PL1/PL2 equal to 10. We see an approximate spread of 16% in the average number of virtual inputs per death. In all cases summarized there, virtual input represents a significant fraction (63%) of the response of the dying network.

4. DISCUSSION

At the most casual level of observation, we note that network output evolves with progressive weight prunings, even though input values are clamped at zero. This process is outside the normal paradigm of feedforward processing, in which there is a static mapping between network input and output. We witness a sequence of output events consisting of a mixture of noise and more meaningful training outputs (the network response associated with virtual inputs). Effectively, the trained network is processing its own progressing internal decay and, as a result, producing a stream of both nonsense and training outputs. These training or "phantom" outputs are seen 37 times more frequently than would be seen in the random setting of the nine output bits. Therefore, we view such duplication of training outputs, independent of inputs, as more than just chance occurrence. The fact that so few virtual inputs are observed within the death of the untrained network further corroborates our view that this phenomenon is neither a fluke nor an artifact of transfer functions.

FIGURE 5. Mean number of virtual inputs per death versus the relative rates of pruning of input and output connection layers (PL1/PL2). A curve fit was achieved using a separate feedforward network trained to within an RMS error of 0.5. The plot is divided into two regimes, I and II, in which input or output weight layers, respectively, tend to be preserved. At the point PL1/PL2 = 1 there is totally random decay with no preference for either layer. There, 35 virtual inputs may be seen per death. [Plot: mean virtual inputs per death, 0-70, versus PL1/PL2, 0-20.]

FIGURE 6. The probability of virtual input vs. fractional weight pruning for conditions of (top) PL1 equal to zero and (bottom) PL1 equal to 1. PL2 values are indicated on each curve. Note that these plots were produced using a separate feedforward network trained on data from 5000 deaths. [Plot: two panels, probability of virtual input, 0-1, versus fractional pruning, 0-1.]

Clearly, the faster the input layer is destroyed relative to the output layer, the greater the level of virtual input. The favored destruction of the input weight layer tends to preserve the output weight layer in a form that is sufficiently intact to ensure its ability to perform the internal neural network completion process. Further, even with complete preservation of the input weight layer and output weight layer degradation, we see significant frequencies of virtual inputs near 20%. We attribute this residual rate of virtual inputs to completion by the network's output layer. Thus, our original contention, that completion is the basis of virtual inputs within the dying 3-5-9 pattern associator, is borne out.

To better envision the phenomenon of virtual inputs, we refer to Figure 7, where we see a trivial example of virtual input and the underlying completion process. There, the input pattern is {1, 1, 1}, but the network alternately "sees" the input vector {0, 1, 1} due to the selective pruning of all outputs from the leftmost unit in the first layer. Of course, within the stochastic pruning schemes examined here, such a scenario is rather unlikely. However, the example serves as a template for other, less transparent situations, in particular when not all connections have been severed from a given unit. Important to note here is that the network is not carrying out completion beginning at the input layer, but rather within the hidden and output layers. We therefore see that virtual inputs are strictly an internal completion process, as the second and third unit layers tend to interpret the network activations passed to them as those produced by any of the eight possible training vectors.

Generalizing from Figure 7, we observe that the virtual input phenomenon may be viewed at two levels, the local or microscopic and the global or macroscopic. The microscopic interpretation focuses on the form of the unit transfer function, F, yielding the activation

    act_i = F( Σ_j w_ij x_j + θ_i ),        (1)

where act_i is the activation of the ith unit, x_j is the input from the jth unit to unit i, w_ij is the connection weight between units i and j, and θ_i is the bias applied to unit i. From the multiplicative form of weights and inputs within this activation function, we see that the pruning of a specific weight to unit i is indistinguishable from a bona fide input of zero originating from an afferent unit,

    w_ij = 0  ⇔  x_j = 0.        (2)

We may figuratively interpret the outcome of pruning a weight as a single unit mistakenly sensing that it has received a zero-valued input. The macroscopic view of virtual inputs is that units act collectively to carry out the neural network completion process. Thus, as the network is randomly pruned, the network as a whole sees a series of false inputs and activations that it then interprets as one of the eight possible trained activations

TABLE 1
Average Number of Virtual Inputs per Death for Different Input Clampings

    Input 1  Input 2  Input 3   Average No. Virtual    Standard
                                Inputs per Death       Deviation
    0        0        0         29.43                  10.92
    0        0        1         31.60                  11.12
    0        1        0         39.89                  10.36
    0        1        1         38.71                  10.06
    1        0        0         33.75                  10.21
    1        0        1         43.35                  10.55
    1        1        0         46.44                  11.88
    1        1        1         44.75                   7.90

Averaging was performed over 100 pruning sequences for each clamping. Average = 38.49 ± 5.92.

FIGURE 7. A simple form of virtual input. Above, an input pattern of {1, 1, 1} may produce an output pattern corresponding to inputs {0, 1, 1} if weights are pruned as shown. Below we show the equivalent pattern association of the intact network. Note that inputs in the range [0, 1] are rescaled to the range [-4, +4] so that the pruning shown above is equivalent to the leftmost input bit being set to a value of 1/2. Output then bifurcates with successive degradation, with the network alternately interpreting input as either {1, 1, 1} or {0, 1, 1}.

of the network. Expansion of the local attractor basins, similar to that observed by Hinton, Plaut, and Shallice (1993) in grapheme-to-semantic mapping, may be occurring here as the interpretation of any given input as a training vector broadens with progressive damage to output layer weights.

The general ability of neural networks to classify inputs into distinct input categories may alternately be viewed as the intrinsic nature of such systems, both natural and simulated, to attain any of a number of stable states, each represented by a complete set of unit activations. At any given stage of connection weight pruning, the damaged network tends to select one of these now perturbed, yet stable, states in response to the overall pattern of internal activations and "pseudozeroes," as represented in eqn (2).

In the illustrative example chosen for exploration here, we have used a highly symmetric mapping between network outputs and the 3 × 3 pixel pattern to clearly mark virtual input phenomena within the degrading network's output stream. To show that the virtual input effect persists with lower-symmetry output patterns, we carried out similar pruning experiments on 3-5-9 pattern associators trained on low-symmetry output patterns. In Table 2, we show the average number of virtual inputs encountered per death (PL1/PL2 = 1) of similar 3-5-9 pattern associators, trained to the same degree of error. For each mapping, 1000 deaths were carried out. We see that although the virtual input rate drops by a factor of 2 for the nonsymmetrical mappings, the overall frequency of virtual inputs is still at least 10 times greater than would be seen in the random setting of outputs to values of zero and 1.

5. CONCLUSIONS

Our original hypothesis, that neural network completion is the dominant mechanism behind virtual input phenomena within the death of the 3-5-9 pattern associator, appears to be consistent with our results. Furthermore, we observe that both the microscopic and macroscopic interpretations of virtual inputs are very broad and apply to a wide gamut of parallel distributed systems. The fundamental requirement, generally met by all neural networks, is the ability to carry out the completion process.

TABLE 2
Virtual Inputs per Death for Various 3-5-9 Pattern Associator Mappings

Mapping 1: Four-Fold Symmetric (average no. virtual inputs per death = 35.20 ± 10.33)
    000 → 000000000    100 → 101010101
    001 → 111101111    101 → 010111010
    010 → 010101010    110 → 000010000
    011 → 101000101    111 → 111111111

Mapping 2: Adjacent Squares (average no. virtual inputs per death = 17.06 ± 12.89)
    000 → 110000000    100 → 000000011
    001 → 011000000    101 → 000000110
    010 → 001001000    110 → 000100100
    011 → 000001001    111 → 100100000

Mapping 3: Three-Way Encoder (average no. virtual inputs per death = 11.67 ± 2.05)
    000 → 000000000    100 → 100100100
    001 → 001001001    101 → 101101101
    010 → 010010010    110 → 110110110
    011 → 011011011    111 → 111111111

Mapping 4: Three-Way Complement (average no. virtual inputs per death = 18.29 ± 12.55)
    000 → 111111111    100 → 011011011
    001 → 110110110    101 → 010010010
    010 → 101101101    110 → 001001001
    011 → 100100100    111 → 000000000

In addition to supporting our central premise, we make the following observations about the behavior of this simple feedforward network as it condenses via pruning: • Increasing the layer decay ratio, PL1/PL2, increases

the total number of virtual inputs within the death of the network. This effect is tantamount to both normal activations and pruning-induced zeros being propagated in the forward direction and then being interpreted as one of the eight possible training inputs within the hidden and output layers.

• Decreasing the ratio PL1 /PL2 results in preferential damage to the output weight layer, thus degrading its ability to perform the completion process. As a result, we see less virtual inputs.

• In observing the evolution of virtual inputs through- out the pruning cycle, high values o f P L 1 / P L 2 show fairly constant probabilities of virtual input. Virtual input rate does tend to drop off near a fractional pruning of 0.2, but then then recovers shortly there- after.

,

• For low P L I / P L 2 ratios, virtual input rate steadily declines as weights are pruned from the network.

• Total virtual inputs per death are relatively insensitive to the nature of clamping inputs. We therefore expect that all the results of this study apply to the case where inputs are allowed to fluctuate during the pruning sequence. This observation allows us to an- ticipate virtual output streams from the dying net- work with the input of time varying noise.

• Network output tends to settle into distinct trajec- tories, remaining constant over multiple weight prunings. Evolution of output takes place in abrupt changes. We view this effect as distinct stages of graceful degradation separated by catastrophic tran- sition.

• The average number of virtual inputs observed per death depends upon the mapping chosen and declines for less symmetric mappings. Nevertheless, for all pattern associators destroyed, the rate of appearance of virtual outputs was at least 10 times that expected from randomly setting the outputs to values of 0 and 1.

• We also note that, with regard to hardware implementations of parallel processing, we now have at our disposal a means for "damping" virtual inputs should weights be unintentionally lost from the system. Such a safeguard would build redundancy into the weight layers closest to the input layer of any feedforward architecture. This measure would ensure the preferential destruction of layers closest to the outputs and the production of clearly distinguishable noise rather than virtual inputs.

• Whereas the association of "virtual inputs," as defined here, with the commonly accepted notions of phantom experience and hallucination is tantalizing, the role of this phenomenon within neurobiology can only be corroborated by experimentation with more biologically representative networks. Furthermore, a much more quantitative medical database is needed to validate this paradigm. Nevertheless, the fundamental mechanism of virtual inputs embodied in eqn (2) may be relevant to the biologically accepted notion of synaptic integration. The use of plastic weights, which may assume either their trained values or zero, may generally emulate the release of global neuroinhibitors or the removal of groups of neurons by metabolic death.

When we stop to consider the repercussions of large numbers of neurons, all embedded within a highly interactive neural community, becoming inoperative as the result of stress, trauma, or pharmacology, we appreciate the potential value of such explorations. Certainly, some biochemical research is in progress, focused primarily on the individual neuron (see, for instance, Sapolsky, 1992), but little if any theoretical effort has concentrated on the global, cooperative effects of large-scale synaptic relaxation, the reversal of long-term potentiation, or the widespread destruction of whole neurons. In this investigation we have learned that the internal noise generated by a neural network, as a result of its own decay, is often interpreted by that same network as some feature of its environment.
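The pruning experiment underlying these observations can be sketched as follows. This is a minimal illustration, not the original simulation: the 3-5-9 weights here are random stand-ins (the paper's network is first trained on eight 3-bit-to-3 × 3 mappings), and the names forward, death, p_l1, and p_l2 (for PL1 and PL2) are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w1, b1, w2, b2):
    """Feedforward pass of a 3-5-9 associator: input -> hidden -> output."""
    h = logistic(x @ w1 + b1)
    return logistic(h @ w2 + b2)

# Stand-in weights (random here; a trained network would replace these).
w1 = rng.normal(0.0, 3.0, (3, 5)); b1 = rng.normal(0.0, 1.0, 5)
w2 = rng.normal(0.0, 3.0, (5, 9)); b2 = rng.normal(0.0, 1.0, 9)

# The eight 3-bit inputs, rescaled from [0, 1] to [-4, 4] as in the appendix,
# and the thresholded output patterns the intact network associates with them.
inputs = np.array([[int(b) for b in f"{n:03b}"] for n in range(8)]) * 8.0 - 4.0
targets = {tuple((forward(x, w1, b1, w2, b2) > 0.5).astype(int)) for x in inputs}

def death(w1, w2, x_clamp, p_l1=0.5, p_l2=0.5):
    """Prune one surviving weight per step until none remain ('death'),
    choosing connection layer 1 with relative probability p_l1 : p_l2.
    Counts virtual inputs: appearances of a training output other than
    the one associated with the clamped input."""
    w1, w2 = w1.copy(), w2.copy()
    own = tuple((forward(x_clamp, w1, b1, w2, b2) > 0.5).astype(int))
    virtual = 0
    while w1.any() or w2.any():
        if w1.any() and (not w2.any() or rng.random() < p_l1 / (p_l1 + p_l2)):
            w1.flat[rng.choice(np.flatnonzero(w1))] = 0.0
        else:
            w2.flat[rng.choice(np.flatnonzero(w2))] = 0.0
        out = tuple((forward(x_clamp, w1, b1, w2, b2) > 0.5).astype(int))
        if out in targets and out != own:
            virtual += 1
    return virtual, w1, w2

n_virtual, w1_dead, w2_dead = death(w1, w2, inputs[0], p_l1=0.8, p_l2=0.2)
```

Raising p_l1 relative to p_l2 preferentially zeros the input-side weights while leaving the output weight layer intact to "complete" the propagated zeros into trained patterns, which is the mechanism the first two bullets above describe.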

REFERENCES

Crick, F., & Mitchison, G. (1983). The function of dream sleep. Nature, 304, 111-114.

Hinton, G. E., Plaut, D. C., & Shallice, T. (1993). Simulating brain damage. Scientific American, 271, 77-82.

Hinton, G. E., & Shallice, T. (1991). Lesioning an attractor network: Investigations of acquired dyslexia. Psychological Review, 98(1), 74-95.

Hopfield, J. J., Feinstein, D. I., & Palmer, R. G. (1983). 'Unlearning' has a stabilizing effect in collective memories. Nature, 304, 158-159.

LeCun, Y., Denker, J. S., & Solla, S. A. (1990). Optimal brain damage. In D. S. Touretzky (Ed.), Advances in neural information processing systems (pp. 598-605). San Mateo, CA: M. Kaufmann.

Plaut, D. C. (1993). Deep dyslexia: A case study of connectionist neuropsychology. Cognitive Neuropsychology, 10(5), 377-500.

Rumelhart, D. E., & Zipser, D. (1989). Feature discovery by competitive learning. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (p. 161). Cambridge, MA: MIT Press.

Sapolsky, R. M. (1992). Stress, the aging brain, and the mechanisms of neuron death. Cambridge, MA: MIT Press.

Thaler, S. L. (1993). 4-2-4 encoder death. Proceedings of the World Congress on Neural Networks (Vol. 2, pp. 180-183), Portland, Oregon.


APPENDIX: CONNECTION WEIGHT AND BIAS ASSIGNMENTS

INDEXING CONVENTION:

[Diagram: CONNECT(l, j, k) is the connection weight from unit j of layer l - 1 to unit k of layer l; BIAS(l, k) is the bias applied to unit k of layer l. Unit layers 1-3 are joined by connection layers 1 and 2.]

TRAINED WEIGHTS AND BIASES:

Note that inputs have been rescaled from the range [0,1] to [-4,4].

(A ? marks a digit that is illegible in the source.)

BIAS(2,1) = -2.?34175
CONNECT(2,1,1) = -2.694222   CONNECT(2,2,1) = +2.796041   CONNECT(2,3,1) = -2.155189

BIAS(2,2) = -7.271454
CONNECT(2,1,2) = -8.632223   CONNECT(2,2,2) = -2.895035   CONNECT(2,3,2) = +5.603945

BIAS(2,3) = +0.074769
CONNECT(2,1,3) = -1.586259   CONNECT(2,2,3) = +0.440181   CONNECT(2,3,3) = +0.552238

BIAS(2,4) = -7.443289
CONNECT(2,1,4) = -2.453918   CONNECT(2,2,4) = -5.306237   CONNECT(2,3,4) = +2.962175

BIAS(2,5) = -0.1558695
CONNECT(2,1,5) = +0.832011   CONNECT(2,2,5) = -1.656244   CONNECT(2,3,5) = -1.485925

BIAS(3,1) = +1.00285
CONNECT(3,1,1) = -2.002??4   CONNECT(3,2,1) = +2.002756   CONNECT(3,3,1) = -0.00304?   CONNECT(3,4,1) = -2.002542   CONNECT(3,5,1) = -0.002976

BIAS(3,2) = +0.960393
CONNECT(3,1,2) = -0.153159   CONNECT(3,2,2) = -2.031300   CONNECT(3,3,2) = +0.?29858   CONNECT(3,4,2) = +1.955010   CONNECT(3,5,2) = -1.952103

BIAS(3,3) = +1.001980
CONNECT(3,1,3) = -2.004092   CONNECT(3,2,3) = +2.001999   CONNECT(3,3,3) = -0.000219   CONNECT(3,4,3) = -2.003804   CONNECT(3,5,3) = -0.001123

BIAS(3,4) = +0.9601006
CONNECT(3,1,4) = -0.153642   CONNECT(3,2,4) = -2.031553   CONNECT(3,3,4) = +0.230808   CONNECT(3,4,4) = +1.954586   CONNECT(3,5,4) = -1.951480

BIAS(3,5) = +1.188510
CONNECT(3,1,5) = -0.008134   CONNECT(3,2,5) = +0.004704   CONNECT(3,3,5) = -2.182041   CONNECT(3,4,5) = +0.003232   CONNECT(3,5,5) = -0.186495

BIAS(3,6) = +0.9613197
CONNECT(3,1,6) = -0.151627   CONNECT(3,2,6) = -2.030494   CONNECT(3,3,6) = +0.2268461   CONNECT(3,4,6) = +1.956355   CONNECT(3,5,6) = -1.954078

BIAS(3,7) = +1.002625
CONNECT(3,1,7) = -2.003027   CONNECT(3,2,7) = +2.002560   CONNECT(3,3,7) = -0.002313   CONNECT(3,4,7) = -2.002870   CONNECT(3,5,7) = -0.002495

BIAS(3,8) = +0.960731
CONNECT(3,1,8) = -0.152602   CONNECT(3,2,8) = -2.031006   CONNECT(3,3,8) = +0.228762   CONNECT(3,4,8) = +1.955498   CONNECT(3,5,8) = -1.952821

BIAS(3,9) = +1.003367
CONNECT(3,1,9) = -2.00180?   CONNECT(3,2,9) = +2.003204   CONNECT(3,3,9) = -0.004722   CONNECT(3,4,9) = -2.001793   CONNECT(3,5,9) = -0.004076
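The tabulated network can be exercised with a minimal forward-pass sketch, assuming the standard logistic activation (the appendix does not restate F). The few digits that are illegible in the source are approximated here and flagged in comments so they can be revisited against a cleaner scan.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# W1[j, k] = CONNECT(2, j+1, k+1): input unit j+1 -> hidden unit k+1.
W1 = np.array([
    [-2.694222, -8.632223, -1.586259, -2.453918, +0.832011],
    [+2.796041, -2.895035, +0.440181, -5.306237, -1.656244],
    [-2.155189, +5.603945, +0.552238, +2.962175, -1.485925],
])
B1 = np.array([-2.934175,  # approximated: BIAS(2,1) has an illegible digit
               -7.271454, +0.074769, -7.443289, -0.1558695])

# W2[j, k] = CONNECT(3, j+1, k+1): hidden unit j+1 -> output unit k+1.
# Approximated (illegible) entries: W2[0, 0], W2[0, 8], W2[2, 0], W2[2, 1].
W2 = np.array([
    [-2.002844, -0.153159, -2.004092, -0.153642, -0.008134, -0.151627, -2.003027, -0.152602, -2.001802],
    [+2.002756, -2.031300, +2.001999, -2.031553, +0.004704, -2.030494, +2.002560, -2.031006, +2.003204],
    [-0.003044, +0.229858, -0.000219, +0.230808, -2.182041, +0.2268461, -0.002313, +0.228762, -0.004722],
    [-2.002542, +1.955010, -2.003804, +1.954586, +0.003232, +1.956355, -2.002870, +1.955498, -2.001793],
    [-0.002976, -1.952103, -0.001123, -1.951480, -0.186495, -1.954078, -0.002495, -1.952821, -0.004076],
])
B2 = np.array([+1.00285, +0.960393, +1.001980, +0.9601006, +1.188510,
               +0.9613197, +1.002625, +0.960731, +1.003367])

def associate(bits):
    """Map a 3-bit input, rescaled from [0, 1] to [-4, 4] as noted above,
    to its thresholded 3 x 3 output pattern."""
    x = np.asarray(bits, dtype=float) * 8.0 - 4.0
    h = logistic(x @ W1 + B1)
    y = logistic(h @ W2 + B2)
    return (y > 0.5).astype(int).reshape(3, 3)

pattern = associate([0, 1, 0])  # one of the eight trained 3 x 3 patterns
```

The hidden activations saturate well away from threshold for binary inputs, so the thresholded outputs are insensitive to the small approximated digits.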

"Virtual Input" Phenomena

NOMENCLATURE

act_i                 activation of the ith processing unit
w_ij                  connection weight between units i and j
x_j                   input from the jth unit
θ_i                   bias applied to unit i
F                     activation or transfer function of a unit
CONNECT(l, j, k)      connection weight between units j and k in the (l - 1)th and lth layers, respectively
BIAS(l, k)            bias applied to unit k in the lth layer
unit layer l          the lth layer of processing units
connection layer l    the layer of weights between unit layers l and l + 1
PL1                   probability that a weight connection is pruned from connection layer 1
PL2                   probability that a weight connection is pruned from connection layer 2
virtual input         false indication that a training vector has been applied to a network via the appearance of a training output vector
death                 process of pruning all connection weights from a network; also, the state in which all connection weights have been pruned from the network