Artificial Neural Network Performance Degradation Under Network Damage: Stuck-At Faults

Robert A. Nawrocki, Richard M. Voyles


Abstract—Biological neural networks are spectacularly more energy efficient than currently available man-made, transistor-based information processing units. Additionally, biological systems do not suffer catastrophic failures when subjected to physical damage, but experience proportional performance degradation. Hardware neural networks promise great advantages in information processing tasks that are inherently parallel or that are deployed in an environment where the processing unit might be susceptible to physical damage. This paper, intended for hardware neural network applications, presents an analysis of the performance degradation of various architectures of artificial neural networks when subjected to 'stuck-at-0' and 'stuck-at-1' faults. This study aims to determine whether a fixed number of neurons should be kept in a single or in multiple hidden layers. Faults are administered to the input and hidden layer(s), and an analysis of unoptimized and optimized, feedforward and recurrent networks, trained with uncorrelated and correlated data sets, is conducted. A comparison of networks with single, dual, triple, and quadruple hidden layers is quantified. The main finding is that 'stuck-at-0' faults administered to the input layer result in the least performance degradation in networks with multiple hidden layers. However, for 'stuck-at-0' faults occurring in cells in the hidden layer(s), the architecture that sustains the least damage is that of a single hidden layer. When 'stuck-at-1' errors are applied to either the input or hidden layers, the networks that offer the most resilience are those with multiple hidden layers. The study suggests that hardware neural network architecture should be chosen based on the most likely type of damage that the system may be subjected to, namely damage to the sensors or to the neural network itself.

I. INTRODUCTION

A fruit fly requires microwatts (µW) of power for such complex tasks as flight control, food search, and avoidance of predators. IBM's Blue Gene/P (BG/P) supercomputer, fitted with 147,456 CPUs and 144 TB of main memory and programmed to perform sub-cortical simulations, consumes about 1.3 megawatts of electricity [1]. The majority of today's computers are built according to the idea proposed by Alan Turing, of sequential execution of universal instructions, with the von Neumann architecture that separates the information from the information processing [2]. Transistors, the cornerstone of today's computers, have undergone spectacular advances, such as reduction in size and increase in speed. However, because these systems rely on a deterministic approach, large safety margins in the operation of the transistor result in high power consumption. Biological systems, on the other hand, rely on an indeterministic approach with massive parallelism of simple processing units, called neurons. This approach results in spectacular power efficiency, generic intelligence, and resilience to unit failure [3].

Manuscript received Feb 1, 2011. Work supported in part by the NSF Safety, Security, and Rescue Research Center and CNS-0923518 grants.
R. A. Nawrocki is with the Department of Computer Engineering at the University of Denver, Denver, CO 80208 USA (phone: 303-871-3266; fax: 303-871-4405; e-mail: [email protected]).
R. M. Voyles is with the Department of Computer Engineering at the University of Denver, Denver, CO 80208 USA (e-mail: [email protected]).

Artificial Neural Networks (ANNs) are mathematical constructs that are modeled after the biological information processing element, the brain. ANNs are often used to model non-linear systems where the relationship is not explicitly known or is difficult to determine analytically. Their applications include examples such as image processing [4] and hand-writing recognition [5], where the system is expected to learn over time. In the majority of systems that employ ANNs, they are emulated on serial machines (CPUs), hence they usually suffer from the same drawbacks as the systems they are trying to replace.

A number of possibilities for realizing hardware-based neural networks have been proposed [6]. Most of them are transistor based and employ a number of resistors and capacitors for the purpose of approximating the sigmoidal transfer function. The last few years have seen a number of proposals aiming to construct physical neural networks that are based on the synaptic behavior of a memristor (a passive two-terminal circuit element with fluctuating resistance that is a function of the history of the current through and voltage across the device). A device made of molecules and nanoparticles, termed the Nanoparticle Organic Memory Field-Effect Transistor (NOMFET), was shown to exhibit behavior similar to a biological spiking synapse [7]. A nanoscale, silicon-based memristor [1] was demonstrated to possess the time-dependent characteristic of a biological spiking synapse. MoNETA (MOdular Neural Exploring Traveling Agent) is Boston University's response to DARPA's SyNAPSE, a proposal that aims to create neuromorphic devices that can be scaled to biological levels (cns.bu.edu). We have also demonstrated a memristor-based neuromorphic architecture, termed a Synthetic Neural Network, that aims to mimic the behavior of an artificial neural network [8],[9].

To this day very few studies have been conducted that monitor the performance degradation of a neural network subjected to neural faults (such as physical damage or manufacturing faults). The first in-depth study was done by Protzel [10], who investigated recurrent NNs. His work concentrated on the difference between 'stuck-at-0' and 'stuck-at-1' faults. Tchernev [11] focused on finding the optimal size of a feedforward NN, trained on classification problems, which exhibited the most fault tolerance. We have


conducted a preliminary study [12] that attempted to understand the relationship between overall architecture and a network's resilience to physical damage. A physical damage to a neuron is equivalent to the 'stuck-at-0' scenario. This present study investigates the 'stuck-at-1' fault, which is equivalent to a neuron being constantly ON. Such an error could result from a neuron being shorted or from a manufacturing error.

In light of realizing a hardware neural network, such as the aforementioned proposals, the possibility of a single neuron or a group of neurons becoming inoperable due to physical damage or a manufacturing error is real. This study, then, is intended to aid in determining the optimal architecture of neural networks constructed in hardware.

A. Background

In ANNs the neurons, or cells, are most commonly arranged into three separate layers: input, hidden, and output. Input cells provide the network with the input data (an example being robotic sensors). Output cells provide the result of the mapping (a robot's wheels or arms being a possible application). Hidden neurons are responsible for the actual computation or mapping.

When constructing an artificial neural network the number of necessary neurons needs to be determined. Networks constructed with too few neurons cannot represent the inherent complexity of the data and will converge slowly or not at all. However, providing too many neurons often results in overfitting the data [13]. The topology, that is the number of hidden layers that the neurons will be arranged in, also needs to be decided. It is generally accepted that one or two hidden layers are sufficient for most applications [14].

The aim of our study was to answer the following question: given a fixed number of neurons, is it better to arrange them in a single hidden layer or into multiple hidden layers? We have demonstrated [12] that there are benefits to separating a fixed number of hidden neurons into multiple hidden layers. We showed that, depending on the location of 'stuck-at-0' faults (equivalent to physical damage or removal of a neuron) applied either to input neurons or hidden neurons, there are preferences as to how many layers a fixed number of neurons should be divided into.

The aim of this study was to extend the 'stuck-at-0' analysis to include 'stuck-at-1' faults and to gain a more universal conclusion about the network's resilience to cell faults. We aimed to compare the performance of network architectures when faults occur in the input layer against that of faults in the hidden layer(s). Similarly to our 'stuck-at-0' faults investigation, a Monte Carlo analysis of four cases of optimized and unoptimized networks (explained in the next section), trained using highly correlated and highly uncorrelated data, was examined in order to generalize the results. Additionally, these four cases were inspected using both feedforward and recurrent networks.

II. METHODS

For the purposes of comparison and generality, all of the data sets, network architectures, and optimization algorithms used in this study were kept the same as during the 'stuck-at-0' investigation [12].

A. Software

The analysis presented in this paper was performed on networks obtained in MATLAB™ (Neural Network Toolbox version 4.0.6). Both network types, feedforward and recurrent, were trained using the resilient backpropagation (trainrp) algorithm, and the activation functions used were tansig (a = 2/(1 + e^(−2n)) − 1) for the hidden layer(s) and purelin (a = n) for the output layer [15]. During the progression of faults, the error was calculated according to the following formula:

E = |p − q| / |p|

Equation 1. Equation used to calculate the error.

where p was the value that the network was trained on, and q was the output of the network after training (and during the fault progression analysis).
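To make the training setup concrete, the sketch below writes out MATLAB's tansig and purelin transfer functions and an error measure in NumPy. It is an illustrative reconstruction only: the normalization used for Equation 1 is an assumption inferred from the definitions of p and q above, not code taken from the study.

```python
import numpy as np

def tansig(n):
    # MATLAB's tansig: a = 2 / (1 + exp(-2n)) - 1, saturating at +/-1
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

def purelin(n):
    # MATLAB's purelin: identity (a = n), used here for the output layer
    return n

def fault_error(p, q):
    # Assumed reading of Equation 1: absolute difference between the training
    # target p and the network output q, normalized by the magnitude of p
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(np.abs(p - q)) / np.sum(np.abs(p))

# A neuron driven to +10 saturates at ~1, the property exploited by the
# 'stuck-at-1' faults described later
print(tansig(10.0))                          # ~1.0
print(fault_error([0.2, 0.8], [0.25, 0.7]))  # 0.15
```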

B. Network Architecture

Four different network architectures (number of hidden layers) were examined. In the case of unoptimized networks, with both correlated and uncorrelated training sets, the total number of cells in the hidden layer(s) was kept fixed at 100, and the cells were arranged into a single hidden layer, separated into two layers with 50 cells in each layer, separated into three layers with 33, 34 and 33 cells in consecutive layers, or separated into four layers with 25 cells in each layer. See Figure 1 for a graphical representation.

Figure 1. Various arrangements of hidden layers were investigated (unoptimized networks): all cells in a single hidden layer (100), cells split into two hidden layers (50/50), cells split into three hidden layers (33/34/33), and cells split into four hidden layers (25/25/25/25).

In the case of optimized networks the total number of neurons in all hidden layers was not kept constant as, due to the optimization algorithm (described in the Optimization subsection), the total number of neurons varied depending on the number of hidden layers. In the case of uncorrelated data (feedforward/recurrent) the number of neurons was 94/98 for a single hidden layer, 43/47 for dual hidden layer, 41/45 for triple hidden layer, and 35/38 for quadruple hidden layer. The number of neurons across multiple hidden layers was always kept the same for all of the trials. In the case of correlated data (feedforward/recurrent) the following numbers of neurons were used: 15/17 for a single hidden layer, 13/15 for dual hidden layer, 12/14 for triple hidden layer,

and 23/24 for quadruple hidden layer. The higher number of cells in the case of four hidden layers was due to the fact that with a smaller number of neurons the network failed to train to an acceptably low error. For more detailed number of neurons used, see Table 1.

Table 1. Number of neurons used with different network architectures (number of hidden layers used).

Number of hidden layers:                      1     2     3     4

Unoptimized (feedforward & recurrent)
  uncorrelated    input                       30    30    30    30
                  hidden                      100   50    33    25
                  output                      10    10    10    10
  correlated      input                       34    34    34    34
                  hidden                      100   50    33    25
                  output                      6     6     6     6

Optimized (feedforward)
  uncorrelated    input                       28    28    28    28
                  hidden                      94    43    41    35
                  output                      10    10    10    10
  correlated      input                       13    13    13    13
                  hidden                      15    13    12    23
                  output                      5     5     5     5

Optimized (recurrent)
  uncorrelated    input                       28    28    28    28
                  hidden                      98    47    45    38
                  output                      10    10    10    10
  correlated      input                       13    13    13    13
                  hidden                      17    15    14    24
                  output                      5     5     5     5
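As a small worked illustration of the unoptimized splits in Table 1, the hypothetical helper below divides the fixed budget of 100 hidden cells across one to four layers; the function name and the remainder-placement rule are assumptions chosen to reproduce the 100, 50/50, 33/34/33 and 25/25/25/25 arrangements described above.

```python
def hidden_layer_split(total_cells, n_layers):
    # Split a fixed budget of hidden cells over n_layers as evenly as possible,
    # e.g. 100 -> [100], [50, 50], [33, 34, 33] or [25, 25, 25, 25]
    base, extra = divmod(total_cells, n_layers)
    sizes = [base] * n_layers
    # hand any remainder to the middle layer(s), matching the 33/34/33 split
    middle = n_layers // 2
    for i in range(extra):
        sizes[(middle + i) % n_layers] += 1
    return sizes

for layers in (1, 2, 3, 4):
    print(layers, hidden_layer_split(100, layers))
```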

Two different network types were used with all of the aforementioned hidden layer divisions: feedforward, employing no feedback, and recurrent, with feedback. All the networks were trained using the backpropagation training algorithm (discussed in the previous subsection). Both feedforward and recurrent networks were used to investigate the aforementioned uncorrelated and correlated data sets with unoptimized and optimized networks. The feedback path for the recurrent network was only from the output layer to the first hidden layer (this had to be explicitly coded in MATLAB as, by default, the recurrent architecture contains self-feedback only) – see Figure 2 for details.

Figure 2. (A) No use of feedback (feedforward) and the use of feedback (recurrent) architectures. The feedback is only employed from the last hidden layer to the first hidden layer. (B) Snapshot from a MATLAB simulation illustrating a recurrent network with two hidden layers (50 cells each) with a feedback from the output to the 1st hidden layer.

C. Data Sets

Two different data sets were used. The correlated data set [16] consisted of 404 test vectors with 34 inputs and 6 outputs. This data set was randomly divided into a training set (S1) of 304 vectors and a test set (S2) of 100 vectors (S1 ∩ S2 = Ø).

The uncorrelated data set consisted of 1500 test vectors, each with 30 inputs and 10 outputs: 450,000 numbers (30*10*1500), in the range between 0 and 1, were generated using the random number generator in the C language (C has only pseudo-random capability, hence a program was written in which the generator was seeded from the clock when the numbers were generated). The numbers were then mixed before they were used as a training set in order to achieve the most randomized training. The purpose was to achieve highly uncorrelated data so that the network would learn chaos. The testing set (S2) was equal to the training set (S1), i.e. S1 = S2.
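A rough Python equivalent of the uncorrelated data generation described above (the original was a C program with the generator tied to the clock) might look as follows; the variable names are illustrative only, and here the shuffle is applied per vector rather than per number.

```python
import random
import time

N_VECTORS, N_INPUTS, N_OUTPUTS = 1500, 30, 10

# seed the pseudo-random generator from the clock, as the original C program did
random.seed(time.time())

def random_vector(n):
    # uniformly distributed values in the range [0, 1]
    return [random.random() for _ in range(n)]

dataset = [(random_vector(N_INPUTS), random_vector(N_OUTPUTS))
           for _ in range(N_VECTORS)]
random.shuffle(dataset)  # mix before use, to randomize the training order

# for the uncorrelated set the testing set equals the training set (S1 = S2)
train_set = test_set = dataset
```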

D. Procedure

We conducted 50 trials with new randomized connection weights (evenly spanning the [0,1] interval) to obtain a reliable average. The network was trained to an acceptable level (the error was below 0.1%) and then the 'stuck-at-1' faults were administered, either to the input layer or to the hidden layer(s). Because, with the activation function used (tansig(10) ≈ 1), the 'stuck-at-1' fault is analogous to a neuron constantly producing a maximum output of '1' (see Figure 3 for illustration), in the case of input-layer faults the input value of the affected neuron was modified to a value of '10' (equivalent to the neuron generating a constant value of '10'). When the hidden layer(s) were subjected to 'stuck-at-1' faults, all of the connection weights of the affected neuron were set to '10', which resulted in the neuron generating an output equal to '1'.
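The fault-injection rules just described can be sketched as follows. This is an illustrative NumPy reimplementation rather than the MATLAB code used in the study: an input-layer fault fixes the affected input at '10', and a hidden-layer fault sets all incoming weights of the affected neuron to '10' so that its tansig output saturates near '1'.

```python
import numpy as np

def tansig(n):
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

def stuck_at_1_input(x, neuron):
    # input-layer fault: the affected input behaves as if it constantly emitted 10
    x = np.array(x, dtype=float)
    x[neuron] = 10.0
    return x

def stuck_at_1_hidden(W, neuron):
    # hidden-layer fault: all incoming connection weights of the affected neuron
    # are set to 10, driving its tansig activation into saturation (cf. Figure 3)
    W = np.array(W, dtype=float)
    W[neuron, :] = 10.0
    return W

# tiny demonstration with a random 5-neuron hidden layer and 3 inputs in [0, 1]
rng = np.random.default_rng(0)
W = rng.uniform(0.0, 1.0, size=(5, 3))
x = rng.uniform(0.0, 1.0, size=3)
W_faulty = stuck_at_1_hidden(W, neuron=2)
print(tansig(W_faulty @ x)[2])  # ~1.0: the faulty neuron is stuck at 1
```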

Figure 3. (A) Plot of the tansig activation function used for the experiment, demonstrating how the 'stuck-at-1' fault is perpetrated by either fixing the input to be equal to '10' (faults to the input layer) or setting all the connection weights of the affected neuron to '10' (faults to the hidden layer(s)). (B) With all the connection weights set to '10', the output of a neuron will be '1' for all input values, resulting in the 'stuck-at-1' condition.

Faults in neurons of an individual layer (input or hidden layer(s)) were carried out in a random fashion. Faults to the hidden layer(s) were conducted consecutively in successive layers (i.e. for a network with three hidden layers, a damage in the first hidden layer was followed by a damage in the second layer, followed by damage in the third layer, followed by a fault in the first layer, etc.) until the maximum number of cells was affected.

The following outlines successive faults perpetrated to a network with three hidden layers (a short code sketch of this schedule is given after the Figure 4 caption below):

• remove 1 cell from Hidden 1 (1st cell removed)
• remove 1 cell from Hidden 2 (2nd cell removed)
• remove 1 cell from Hidden 3 (3rd cell removed)
• remove 1 cell from Hidden 1 (4th cell removed)
• …
• remove 1 cell from Hidden 2 (14th cell removed)
• remove 1 cell from Hidden 3 (15th cell removed)

with Figure 4 demonstrating the progression of faults in successive layers.

Figure 4. Graphical representation of the fault pattern; the order in which the individual cells were affected. Figures A through E represent sequentially affected individual cells from consecutive layers.
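The round-robin ordering listed above can be generated programmatically; the sketch below is a hypothetical illustration of that schedule, not the original experiment code.

```python
def fault_schedule(layer_sizes, max_faults):
    # Cycle through the hidden layers, faulting one cell per visit, e.g. for three
    # hidden layers: Hidden 1, Hidden 2, Hidden 3, Hidden 1, ... (see list above)
    removed = [0] * len(layer_sizes)
    schedule, layer = [], 0
    while len(schedule) < max_faults and sum(removed) < sum(layer_sizes):
        if removed[layer] < layer_sizes[layer]:
            removed[layer] += 1
            schedule.append(layer + 1)  # 1-based hidden-layer index
        layer = (layer + 1) % len(layer_sizes)
    return schedule

# the first 15 successive faults for a network with three hidden layers
for step, layer in enumerate(fault_schedule([33, 34, 33], 15), start=1):
    print(f"fault {step}: remove 1 cell from Hidden {layer}")
```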

E. Optimization

There is a wide body of evidence which suggests that the number of cells in a hidden layer should not exceed the number of training examples [13], in order not to overfit the data: there is a minimum number of cells needed to coarsely classify a training set, with additional cells used for fine tuning. Removal of 'critical' cells results in a significant decrease of performance, while removal of 'non-critical' cells results in either slight degradation or even improvement of performance (one form of ANN optimization, called pruning, is based on cell removal). A network that only contains necessary or critical cells is considered to be optimized. Because this research aimed at comprehensiveness of observation, the goal was to investigate both optimized and unoptimized networks.

A number of optimization algorithms exist that can be divided into two broad categories: constructive and destructive optimizations [17],[18]. Generally speaking, constructive optimization algorithms start with a minimum number of cells, with additional cells added as needed throughout training. Destructive algorithms, also called pruning, start with larger-than-necessary networks, with extraneous cells pruned upon finishing training.

A simple algorithm for pruning networks proposed by Suzuki [19] was used. This method is based on the influence of the removed neuron on the error and the consequent retraining of the network. The following summarizes the algorithm (a code sketch follows the list):

(1) train the network until the error is satisfactorily small
(2) virtually remove a unit and calculate the error
(3) repeat (2) for all units in a layer
(4) remove the unit that resulted in the smallest error
(5) retrain the network
(6) if the retrained network converges (error is satisfactorily small) stop the algorithm, else go back to (2) and repeat the process
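A minimal sketch of steps (1)–(6) is given below. All helpers (train, error_of, remove_unit, units) are assumed placeholders for the corresponding MATLAB routines and are not part of the published method.

```python
def prune(network, train, error_of, remove_unit, units, tolerance):
    # Outline of the pruning steps (1)-(6) listed above. `train`, `error_of`,
    # `remove_unit` and `units` are assumed, user-supplied helpers.
    train(network)                                      # (1) train until error is small
    while True:
        # (2)-(3): virtually remove each unit in a layer and record the error
        errors = {u: error_of(network, without=u) for u in units(network)}
        best = min(errors, key=errors.get)              # (4) smallest resulting error
        remove_unit(network, best)
        train(network)                                  # (5) retrain the pruned network
        if error_of(network) <= tolerance:              # (6) stop once the retrained
            return network                              #     network converges, else repeat
```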

The algorithm used to constructively optimize the network is called the Cascade-Correlation Learning Algorithm (CAS) and was first proposed by Fahlman [17], and elaborated by Teng [20]. The algorithm, modified for maximum size control, consists of three phases: TRAIN_OUTPUT, TRAIN_INPUT and SIZE_CHECK. The following is a summary of the algorithm (a schematic sketch follows the list):

• TRAIN_OUTPUT trains the network until there is no significant improvement in error
• TRAIN_INPUT adds a single cell, with full-mesh connectivity
• SIZE_CHECK checks if the maximum allowable size is reached
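The three phases can likewise be sketched as a growth loop. This is only a schematic reading of the summary above, with all helper names (train_output, add_hidden_cell, network.size) assumed; the authors' modified CAS implementation is not reproduced in the paper.

```python
def cascade_correlation(network, train_output, add_hidden_cell, max_size,
                        improvement_threshold=1e-4):
    # Schematic of the modified CAS loop: TRAIN_OUTPUT, TRAIN_INPUT, SIZE_CHECK.
    while True:
        # TRAIN_OUTPUT: train until there is no significant improvement in error
        previous = float("inf")
        while True:
            error = train_output(network)
            if previous - error < improvement_threshold:
                break
            previous = error
        # TRAIN_INPUT: add a single cell with full-mesh connectivity
        add_hidden_cell(network, full_mesh=True)
        # SIZE_CHECK: stop growing once the maximum allowable size is reached
        if network.size() >= max_size:
            return network
```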

Before the networks were optimized, both the uncorrelated and correlated data patterns were first normalized and then reduced in size using Principal Component Analysis (PCA) with 2% reduction (a Gaussian distribution of the data is calculated, followed by removal of components that fall outside of the 2% range). This procedure was also carried out in MATLAB, resulting in a reduction of the correlated input from

34 variables to 13 variables and the output from 6 to 5 variables. The uncorrelated data was reduced from 30 to 28 variables in the input vector, and the size of the output vector remained unchanged at 10 variables. 50 constructive and 50 destructive trials were conducted in order to obtain the average number of cells needed (the rationale was to keep the number of cells per layer constant in both optimized and non-optimized networks for the purposes of comparison). The network sizes can be found in Table 1.

As already mentioned, in the case of correlated data the training set and testing set were different, while with the uncorrelated set the training set was equal to the testing set.

III. RESULTS AND DISCUSSION

A. Stuck-At-0

In all of the 'stuck-at-0' cases, as already reported [12], we observed that for damage incurred to hidden layer(s), increasing the number of layers results in increasing the error (the network that experiences the least performance degradation is the network with a single hidden layer) – see Figure 5A and Figure 6A. However, when the damage occurs to the input layer, increasing the number of layers results in decreasing the error (the network that experiences the least performance degradation is the network with the most hidden layers or, in this case, four hidden layers) – see Figure 5B and Figure 6B. Increasing the number of hidden layers usually results in increasing the training time. Nevertheless, in the face of 'stuck-at-0' faults, networks with a greater number of hidden layers offer better resilience than networks with fewer hidden layers.

Figure 5. 'Stuck-at-0' fault in NN: recurrent architecture, UNoptimized network, UNcorrelated data. (A) With faults occurring in Hidden layer(s), the network with the lowest number of hidden layers (single layer) experiences the least performance degradation. (B) With faults occurring in Input layer, the network with the highest number of hidden layers (four layers) experiences the least performance degradation.

Figure 6. 'Stuck-at-0' fault in NN: feedforward architecture, UNoptimized network, UNcorrelated data. (A) With faults occurring in Hidden layer(s), the network with the lowest number of hidden layers (single layer) experiences the least performance degradation. (B) With faults occurring in Input layer, the network with the highest number of hidden layers (four layers) experiences the least performance degradation.

Figure 7 graphically demonstrates the findings of the 'stuck-at-0' experiment, with the preferred architecture when the fault is administered to hidden (A) or input layers (B).

Figure 7. (A) When 'stuck-at-0' faults are administered to hidden layer(s), the network that offers the most resilience is one with the smallest number of hidden layer(s). (B) When 'stuck-at-0' faults are administered to the input layer, the network that offers the most resilience is one with the largest number of hidden layer(s).

An additional observation is that, when all other parameters, such as the data type or network optimization, are the same, recurrent networks offer an average of 23% improved resilience against 'stuck-at-0' faults. This can be seen in Figure 5A and Figure 6A; with 15 neurons affected, a recurrent network with 4 hidden layers experiences performance degradation of about 23% compared to about

35% when the network architecture is feedforward. This represents about a 34% improvement. A comparison of Figure 5B and Figure 6B reveals a difference between recurrent and feedforward architectures of about 20%.

B. Stuck-At-1

In contrast to the findings for the 'stuck-at-0' fault, the 'stuck-at-1' error does not result in a difference of performance dependent on the fault location. Regardless of whether the fault occurs in the input layer or in the hidden layer(s), the network that experiences the least performance degradation is the network where the fixed number of neurons was split into the highest number of hidden layers. Figure 8 and Figure 9 demonstrate this finding. It should be noted that there is usually a noticeable improvement in resilience (decrease of performance degradation) when the number of hidden layers is increased from one to two and, to a lesser degree, from two to three (as can be seen in Figure 8B and Figure 9B). However, a further increase from three to four hidden layers either results in a minimal improvement of network resilience or, temporarily, may even result in a decrease of resilience (increase of the average error). This can be seen in Figure 8A when 4 and 8 cells are removed. Additionally, while Figure 8 and Figure 9 show the average increase of error, individual trials often result in errors varying widely (by as much as 250%) from the average performance. Figure 10 graphically demonstrates the findings of the 'stuck-at-1' experiment, indicating a departure from the findings of the 'stuck-at-0' faults shown in Figure 7.

Figure 8. 'Stuck-at-1' fault in NN: recurrent architecture, optimized network, correlated data. (A) With faults occurring in Hidden layer(s), network with the highest number of hidden layers (four layers) experiences the least performance degradation. (B) With faults occurring in Input layer, network with the highest number of hidden layers (four layers) experiences the least performance degradation.

Protzel et al. [10] noted that, with the Traveling Salesman Problem (TSP) and the Assignment Problem (AP), the performance degradation of a network subjected to 'stuck-at-0' faults is negligibly small, while the 'stuck-at-1' fault results in a significant decrease of the network's fidelity. This is largely in line with our findings; a fault administered to the input layer, with feedforward architecture, unoptimized network, uncorrelated data, and a single hidden layer, resulted in significantly lower error (0.24% with 7 cells removed) when the type of fault was 'stuck-at-0' (see Figure 6B), while networks with the 'stuck-at-1' error suffered an error about one order of magnitude higher (2.11% with 7 cells removed) (see Figure 9B). The same observation can be made when faults are administered to hidden layer(s), as can be seen in Figure 6A and Figure 9A.

Figure 9. 'Stuck-at-1' fault in NN: feedforward architecture, UNoptimized network, UNcorrelated data. (A) With faults occurring in Hidden layer(s), network with the highest number of hidden layers (four layers) experiences the least performance degradation. (B) With faults occurring in Input layer, network with the highest number of hidden layers (four layers) experiences the least performance degradation.

A likely explanation of the observed trend is that when the faults occur, regardless of the location, the error is propagated throughout the entirety of the network, resulting in a "cushioning" effect; the closer the faults occur to the output of the network (a single hidden layer is directly connected to the output), the less of the "cushioning" the network will experience and the greater the performance degradation. When the network contains only a single hidden layer, the error, measured at the output, travels only through a single layer (when the fault is administered to the input layer) or through no layers (when the fault is administered to the hidden layer). However, when the network consists of multiple hidden layers, because of how the fault was being administered to the hidden layers (sequentially),


the error will at least propagate through one or more layers, resulting in the network "cushioning" the error. This can be especially seen when the error occurs in the input layer.

Figure 10. When 'stuck-at-1' faults are administered to hidden layer(s) (A) or input layer (B), the network that offers the most resilience is one with the greatest number of hidden layer(s). This finding is in stark contrast to the findings of the 'stuck-at-0' experiments, shown in Figure 7.

This can also be explained by an analysis of the input values propagated to an output neuron. Because a 'stuck-at-1' fault means that a neuron produces an output equal to '1', when the error occurs in a network with only a single hidden layer, this 'high' or maximum value will be immediately sent to the output neuron, resulting in a significant difference from the original (desired) value. However, when the fault occurs in a network with multiple hidden layers, for instance with four hidden layers, this 'high' or maximum output of a hidden neuron will in 25% of cases occur in the 1st hidden layer, resulting in the error traveling through another 3 hidden layers before this signal arrives at the output layer. This signal, then, has three chances of being attenuated (if the connection weight of that neuron is sufficiently low) before being received at the output layer.

As already mentioned, with 'stuck-at-0' faults we identified improved resilience of recurrent networks as compared to the feedforward network architecture. We did not observe any meaningful relationship between feedforward and recurrent networks when the error that the network was subject to was the 'stuck-at-1' fault.

1) Fault in individual hidden layers

To further verify this relationship, we conducted another experiment where we kept the number of hidden layers fixed at four. We then administered the 'stuck-at-1' faults exclusively to individual hidden layers. Figure 11 demonstrates the findings. When the faults occur in the 1st hidden layer (the furthest away from the output layer), the error is significantly smaller than if the faults occur in the 4th hidden layer (directly providing the input values to neurons in the output layer). This, we believe, validates our assumption about the "cushioning" effect of multiple hidden layers.

For a full set of graphs relating the performance degradation of various neural architectures, data types, and optimization when faults are applied to input or hidden layers, please refer to [21].

Figure 11. The 'stuck-at-1' fault is administered exclusively to individual hidden layers: either only to the 1st, only to the 2nd, only to the 3rd, or only to the 4th hidden layer.

IV. CONCLUSION AND APPLICATIONS

Biological neural networks appear to offer a number of benefits over the conventional, deterministic approach to information processing. Hardware-based, or tangible, neural networks promise similar advantages of biological neural networks over conventional information processing, namely lower power consumption and built-in fault tolerance without additional redundant components. Additionally, many research groups believe that true artificial intelligence systems, such as could potentially rival their biological counterparts, can only be realized in hardware, for example via the use of memristors acting as artificial synapses. This study aimed to understand the effects of internal neural network architecture on the resilience of the network. In other words, we aimed to understand whether a fixed number of neurons arranged in a single or in multiple hidden layers offered better resilience against two types of faults: 'stuck-at-0' and 'stuck-at-1' faults.

The conclusion of the 'stuck-at-0' analysis points to the fact that a network with a greater number of hidden layers provides better resilience to faults occurring in the input layer. However, when the fault is administered to hidden neurons, it is the network with the fewer number of hidden layers that experiences the least performance degradation. The simulation of 'stuck-at-1' faults results in a uniform conclusion, one which applies to all types of networks (feedforward and recurrent), regardless of where the error occurs: arranging a fixed number of neurons into a greater number of hidden layers results in a decrease of the performance degradation, or an increase of network resilience.

A fault in the input layer is equivalent to a fault of input sensors, while a fault in the hidden layer(s) is analogous to a fault in the processor itself. A robot that is trained to perform certain functions based on sensory readings will currently most likely utilize a serial computing machine, a CPU, to

emulate the parallel processing environment of an artificial neural network. However, with the emergence of neuromorphic technologies based on memristive artificial synapses, tangible neural networks employed to control physical systems will soon become a reality. Analysis of the benefits of different neural architectures will then become a necessity. This study suggests that the use of multiple hidden layers may offer an advantage in the form of resilience against physical damage or other manufacturing faults.

Biological neural networks are arranged into a highly modular array of many different layers (cell types) that are commonly controlled by yet other layers, or cell types [22]. During the progression of a disease or aging, when either cell deaths or cell mutations occur, the animal does not suffer a fatal error due to a complete brain malfunction but experiences a gradual loss of a skill or a function. This study might shed light on a reason for the highly modular and inter-dependent, multi-layered biological neural architecture.

ACKNOWLEDGMENT

We would like to thank Professor Sean E. Shaheen from the Department of Physics and Astronomy at the University of Denver for allowing the simulations to be conducted on computer(s) in his laboratory. We would also like to acknowledge Dr. Majid Shaalan, who was the co-author of the previous, 'stuck-at-0' study.

REFERENCES

[1] S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder, W. Lu, "Nanoscale Memristor Device as Synapse in Neuromorphic Systems," Nano Lett., 10 (4), pp 1297-1301, 2010.

[2] H. G. Cragon, “Computer Architecture and Implementation,” Cambridge University Press, pp 2 – 13, 2000.

[3] D. Fox, “Brain-Like Chip May solve Computers Big Problem: Energy,” Discover, Nov. 2009.

[4] P. Danchenko, F. Lifshits, I. Orion, S. Koren, A. D. Solomon, S. Mark, “NNIC – neural network image compressor for satellite positioning system,” ACTA Astronautica, Vol. 60, Issue 8-9, pp 622-630, 2007.

[5] S. Srihari, J. Collins, R. Srihari, H. Srinivasan, S. Shetty, J. Brutt-Griffler, “Automatic scoring of short handwritten essays in reading comprehension tests,” Artificial Intelligence, Vol. 172, Issue 2-3, pp 300-324, 2008.

[6] R. Genov, G. Cauwenberghs, “Dynamic MOS Sigmoid Array Folding Analog-to-Digital Conversion,” IEEE Trans. On Circuits and Systems-I: Reg Papers, Vol 51, No. 1, 2004.

[7] F. Alibart, S. Pleutin, D. Guerin, “An Organic Nanoparticle Transistor Behaving as a Biological Spiking Synapse,” Adv. Funct. Mater., 20, pp 330-337, 2010.

[8] R. A. Nawrocki, R. M. Voyles, S. E. Shaheen, “Simulating Hardware Neural Networks with Organic Memristors and Organic Field Effect Transistors,” in Intelligent Engineering Systems through Artificial Neural Networks, Vol., 20, 2010.

[9] R. A. Nawrocki, X. Yang, S. E. Shaheen, R. M. Voyles, “Amorphous Computational Material for a Soft Robot: Actuation and Cognition,” to appear in Proc. Of IEEE ICRA, 2011.

[10] P. W. Protzel, D. L. Palumbo, M. K. Arras, “Performance and Fault-Tolerance of Neural Networks for Optimization,” IEEE Transactions on Neural Networks, Vol 4, pp 600-614, 1993.

[11] E. B. Tchernev, R. G. Mulvaney, D. C. Phatak, “Investigating the fault tolerance of neural networks,” Neural Computation, Vol. 17, Issue 7, pp 1646-1664, 2005.

[12] R. A. Nawrocki, R. M. Voyles, M. Shaalan, "Monitoring Artificial Neural Network Performance Degradation Under Network Damage," in Intelligent Engineering Systems through Artificial Neural Networks, Vol. 20, 2010.

[13] S. Lawrence, C. L. Giles, A. C. Tsoi, “What Size Neural Network Gives Optimal Generalization? Convergence Properties of Backpropagation,” Technical Report, UMIACS-TR-96-22 and CS-TR-3617, 1998.

[14] D. Ostafe, “Neural Network Hidden Layer Number Determination Using Pattern Recognition Techniques,” 2nd Romanian-Hungarian Joint Symposium on Applied computational Intelligence, Timisoara, Romania, 2005.

[15] http://www.mathworks.com/products/neuralnet/ (last accessed on 10 May 2010)

[16] http://www.ncrg.aston.ac.uk/NN/databases.html (last accessed on 7 Feb 2009)

[17] S. E. Fahlman, C. Lebiere, “The cascade-correlation learning architecture,” Advances in Neural Information Processing Systems 2, pp 524-532. Morgan Kaufman, 1990.

[18] H. Drucker, Y. L. Cun, “Improving Generalization Performance Using double Backpropagation”, IEEE Transactions on Neural Networks, Vol. 3, NO 6, 1992.

[19] K. Suzuki, I. Horiba, N. Sugie, “A Simple Neural Network Pruning Algorithm with Application to Filter Synthesis,” Neural Processing Letters 13, pp 43-53, 2001.

[20] C. C. Teng, B. W. Wah, “Automated Learning for Reducing the configuration of a Feed-Forward Neural Network,” IEEE Transactions on Neural Networks, Vol. 7, No. 5, 1996.

[21] R. A. Nawrocki, “Simulation, Application, and Resilience of an Organic Neuromorphic Architecture, Made with Organic Bistable Devices and Organic Field Effect Transistors,” MS Thesis, University of Denver, 2011.

[22] H. E. Atallah, M. J. Frank, R. C. O’Reilly, “Hippocampus, cortex, and basal ganglia: Insights from computational models of complementary learning systems,” Neurobiology of Learning and Memory, 82, pp 253-267, 2004.
