
A Study of the Effects of Transient Fault Injection into the VHDL Model of a Fault-Tolerant Microcomputer System

D. Gil, J. Gracia, J. C. Baraza, and P. J. Gil

Grupo de Sistemas Tolerantes a Fallos (GSTF)
Departamento de Informática de Sistemas y Computadores (DISCA)
Universidad Politécnica de Valencia, Spain

e-mail: {dgil, jgracia, jcbaraza, pgil}@disca.upv.es

Abstract

This work presents a fault injection campaign to validate the Dependability of a fault-tolerant microcomputer system. The system is duplex with cold stand-by sparing, parity detection and a watchdog timer. The faults have been injected into a chip-level VHDL model, using an injection tool designed for this purpose. We have carried out a set of injection experiments (with 3000 injections each), injecting transient faults of types stuck-at, bit-flip, indetermination and delay into both the signals and variables of the system, running two different workloads. We have analysed the pathology of the propagated errors, measured their latency, and calculated both detection and recovery coverage. For instance, system detection coverages (including non-effective errors) of up to 98%, and system recovery coverages of up to 94%, have been obtained for short transient faults.

1. Introduction

Fault injection is a technique for validating Fault-Tolerant Systems (FTSs) that is being increasingly consolidated and applied in a wide range of fields, and several automatic tools have been designed [1] [2]. The fault injection technique is defined in the following way [3]:

Fault injection is the validation technique of the Dependability of Fault-Tolerant Systems which consists of the accomplishment of controlled experiments where the observation of the system’s behaviour in presence of faults is induced explicitly by the deliberate introduction (injection) of faults into the system.

Fault injection into the hardware of a system can be implemented with three main techniques:

1. Physical fault injection. It is accomplished at the physical level, disturbing the hardware with environmental parameters (heavy-ion radiation, electromagnetic interference, etc.) or modifying the value of the pins of the integrated circuits.

2. Software Implemented Fault Injection (SWIFI). The objective of this technique, also called Fault Emulation, is to reproduce at the software level the errors that would have been produced by faults occurring in the hardware. It is based on different practical types of injection, such as the modification of memory data, or the mutation of the application software or of the lowest service layers (at the operating system level, for example).

3. Simulated fault injection. In this technique, the system under test is simulated on another computer system. The faults are induced by altering the logical values during the simulation.

This work is framed within simulated fault injection, and particularly within the simulation of models based on the VHDL hardware description language. We have chosen this technique fundamentally because of:

• The growing interest in simulated injection techniques [4], [5], [6], [7], [8], [9], [10], as a complement to physical fault injection [11], [12], [13], [14], [15] (traditionally the more numerous and developed techniques) and Fault Emulation (SWIFI) [16], [17], [18], [19], [20], [21]. The greatest advantage of this method over the previous ones is the Observability and Controllability of all the modelled components. The simulation can be accomplished at different abstraction levels. Another positive aspect of this technique is the possibility of validating the system during the design phase, before the final product is available.

0-7695-0646-1/00 $10.00 © 2000 IEEE

• The good prospects for modelling systems and faults with VHDL, which has been consolidated as a powerful standard for analysing and designing computer systems [22].

This work follows the one carried out in [23], where the Dependability of a fault-tolerant microcomputer system was validated. To do that, we performed an injection campaign by means of a fault injection tool developed for that purpose [24].

In the present work, we have injected new fault types, and studied the contribution of the different detection and recovery mechanisms to the Dependability results.

In section 2, we briefly describe the fault injection tool. In section 3, the main features of the computer system, based on a simple 16-bit microprocessor, are described. In section 4 we show the fault models used in the injection. In section 5 we set out the conditions and parameters of the injection experiments. In section 6 we present the results obtained, basically concerning the coverage factors and the propagation latencies. Finally, in section 7 we present some general conclusions and possible future lines of work.

2. The fault injection tool

We have developed an injection tool for automatic fault injection into VHDL models at gate level, register level and chip level.

The injection tool is composed of a series of elements designed around the ModelSim VHDL simulator [25]. It injects faults via simulator commands into variables and signals defined in the VHDL model. It offers the user a variety of predefined fault models, as well as other features to set up and automatically manage fault injection campaigns on an IBM PC (or compatible). A more comprehensive description of the tool can be found in [24].

3. Computer system

We have built the VHDL model of a fault-tolerant microcomputer, whose block diagram is shown in Figure 1. The system is duplex with cold stand-by sparing, parity detection and a watchdog timer.

The structural architecture of the model is composed of the following components:

• Main and spare CPUs (CPUA and CPUB, respectively).
• RAM memory (MEM).
• Output parallel port (PORTOUT).
• Interrupt controller (SYSINT).
• Clock generator (CLK).
• Watchdog timer (WD).
• Pulse generator (GENINT).
• Two back-off cycle generators (TRGENA, TRGENB).
• Two AND gates (PAND2A, PAND2B).

Each component is modelled by a behavioural architecture, usually with one or more concurrent processes. Both the main and the spare processors are an enhanced version of the MARK2 processor [26].

As mentioned before, several fault-tolerance mechanisms have been added to increase the dependability of the system. The error detection mechanisms include the parity check and the program control flow check by a watchdog timer. The error recovery mechanisms include the introduction of a back-off cycle when a parity error is detected, checkpointing when errors are detected by the watchdog timer, and starting the spare processor in case of permanent errors.

A more detailed description of the detection and recovery mechanisms can be found in [23].

The description of this system is around 1500 lines of VHDL code. The code is divided into 10 entities, 11 architectures and 1 package, excluding the STD and IEEE libraries. In addition, 416 bytes are used to store the machine code executed by the CPU.

Figure 1: Block diagram of the computer system.


4. Fault models

For the injection experiments, we aim to use a variety of faults that represent the most usual physical faults.

We have used four models: stuck-at (0, 1), bit-flip, indetermination and delay. Stuck-at and bit-flip are the most frequently used in transient fault injection experiments [27], [28], [29], [30], [31]. In addition, we have considered two new models: indetermination and delay.

Next, we briefly describe the physical causes and mechanisms implied in the fault models, to justify their inclusion.

Figure 2: Some causes and mechanisms of the stuck-at, bit-flip, indetermination and delay transient fault models.

As Figure 2 shows, the stuck-at, bit-flip and indetermination models make it possible to represent transient physical faults of different types: power supply transients, cross-talk, electromagnetic interference (light, radio, etc.), temperature variation, α radiation and cosmic radiation (the last ones are very important in space applications).

These physical faults can directly change the voltage and current values of the logical levels of circuit nodes. They can also generate electron-hole (e⁻-h⁺) pairs, which are swept by the electric field of the depletion zones of the PN junctions in transistors. This produces a current of e⁻-h⁺ pairs which may alter the charge in DRAM cells [32] and/or change the logical levels in circuit nodes, and therefore cause their indetermination, commutation (bit-flip) or a transient stuck-at.

The delay model makes it possible to represent physical faults due to transients in the power line (VDD), which can alter the switching delay ($\tau \propto l^2 / (\mu_n (V_{DD}/2))$) of MOS transistors or the charge/discharge delay of parasitic capacitances in input/output connections.
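As a worked reading of this proportionality, the sketch below assumes that only the supply voltage changes during the transient, while the channel length l and the mobility µn stay fixed (the text does not state this explicitly). The 10% supply drop is an illustrative value, not one taken from the experiments.

```latex
\tau \propto \frac{l^{2}}{\mu_n \,(V_{DD}/2)}
\quad\Longrightarrow\quad
\frac{\tau'}{\tau} = \frac{V_{DD}}{V_{DD}'}

% Illustrative case: the transient lowers the supply to V_{DD}' = 0.9\,V_{DD}
\frac{\tau'}{\tau} = \frac{1}{0.9} \approx 1.11
```

Under these assumptions a 10% drop in VDD lengthens the switching delay by roughly 11%, which is the kind of timing shift that the delay model (time±∆t) represents.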

The injection technique used allows an easy implementation of these four fault models, using special simulator commands to modify the value of the signals and variables of the VHDL model. In fact, a fault is injected by assigning the modified value to the signal or variable at injection time (‘0’: stuck-at 0, ‘1’: stuck-at 1, not(I): bit-flip, ‘X’: indetermination, time±∆t: delay).
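The value assignments listed above can be illustrated with a minimal Python sketch. It is not the authors' tool, which works through simulator commands. The waveform representation, the function names and the rule that the fault-free value returns when the fault ends are all assumptions made for illustration, and bit-flip is only sketched for the values '0' and '1'.

```python
from bisect import bisect_right

# Values forced by the value-changing fault models named in the text.
FORCED = {"stuck-at-0": "0", "stuck-at-1": "1", "indetermination": "X"}


def value_at(events, t):
    """Fault-free value at time t; events is a time-sorted list of (time, value)."""
    times = [time for time, _ in events]
    idx = bisect_right(times, t) - 1
    if idx < 0:
        raise ValueError("t precedes the first event")
    return events[idx][1]


def inject_transient(events, model, t_inj, duration):
    """Return a function t -> value with one transient fault applied.

    Assumption: the faulty value is held during [t_inj, t_inj + duration)
    and the fault-free value is restored afterwards.
    """
    if model == "bit-flip":
        original = value_at(events, t_inj)
        if original not in ("0", "1"):
            raise ValueError("bit-flip is only sketched for '0' and '1'")
        forced = "1" if original == "0" else "0"  # not(I)
    else:
        forced = FORCED[model]

    def faulty(t):
        if t_inj <= t < t_inj + duration:
            return forced
        return value_at(events, t)

    return faulty


def delay_transition(events, index, delta_t):
    """Delay model: move one transition by delta_t (which may be negative)."""
    shifted = list(events)
    time, value = shifted[index]
    shifted[index] = (time + delta_t, value)
    if shifted != sorted(shifted, key=lambda e: e[0]):
        raise ValueError("delta_t would reorder the transitions")
    return shifted
```

For example, `inject_transient(clk, "bit-flip", 120, 30)` inverts the value that `clk` holds at t = 120 for 30 time units, and `delay_transition(clk, 1, 15)` postpones the second transition of `clk` by 15 time units.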

The injected values belong to the multivalued type std_logic, declared in the IEEE std_logic_1164 package. This type has a resolution function to manage the value of output signals connected in parallel. All the bit-like or register-like variables and signals of the VHDL model are of type std_logic.
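To show why an injected 'X' or a stuck-at value can dominate a shared line, the following Python sketch mirrors the resolution behaviour of std_logic. The table is transcribed from the standard std_logic_1164 package as recalled here, so it should be checked against the package source before being relied on. The function names are illustrative.

```python
from functools import reduce

STD_LOGIC = "UX01ZWLH-"  # the nine std_logic values, in package order

# Row = value resolved so far, column = next driver (both in STD_LOGIC order).
_RESOLUTION = [
    "UUUUUUUUU",  # U
    "UXXXXXXXX",  # X
    "UX0X0000X",  # 0
    "UXX11111X",  # 1
    "UX01ZWLHX",  # Z
    "UX01WWWWX",  # W
    "UX01LWLWX",  # L
    "UX01HWWHX",  # H
    "UXXXXXXXX",  # -
]


def resolve2(a, b):
    """Resolve two drivers of the same signal."""
    return _RESOLUTION[STD_LOGIC.index(a)][STD_LOGIC.index(b)]


def resolved(drivers):
    """Resolve all drivers of one signal.

    A single driver is returned unchanged; otherwise resolution starts
    from 'Z', so a signal with no drivers floats.
    """
    if len(drivers) == 1:
        return drivers[0]
    return reduce(resolve2, drivers, "Z")
```

With this table, forcing '1' onto a line that another output drives to '0' resolves to 'X', while a driver left at 'Z' does not disturb the value driven by the others.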

5. Fault injection experiments

We seek to study the response of the system in the presence of transient faults. The parameters of the injection campaign are, in summary:

1. Number of faults: n = 3000 faults per injection experiment. This guarantees the statistical validity of the results.

2. Workload: Two workloads have been used:
a) Calculation of the arithmetic series of n integer numbers, $series = \sum_{k=1}^{n} k$, with n = 6.
b) The Bubblesort sorting algorithm, applied to n integer numbers, with n = 6.

3. Fault types: The injected faults are transient, of types stuck-at 0, stuck-at 1, bit-flip, indetermination or delay, and they may affect the signals and the variables in the model.

4. Injection place: The faults are systematically injected into any atomic signal (sets of signals, like buses, are divided into their bits) and variable of the model, in both the external structural architecture and the behavioural architectures of the components. Faults are not injected into the spare CPU, since it is off while the system is working properly.

5. Injection instant: It is distributed according to a Uniform probability distribution function in the range [0, $t_{Workload}$], where $t_{Workload}$ is the workload execution time without faults.

6. Simulation duration: The simulation duration includes the execution time of the workload and the recovery time with the spare CPU ($t_{Simul} = t_{Workload} + t_{Spare}$).

7. Fault duration: It is generated randomly in the ranges [0.1T-10.0T] and [0.01T-1.0T], where T is the CPU clock cycle. The intention is to inject “short” faults, with a duration equal to a fraction of the clock cycle (the most common faults, as described in [33]), as well as longer faults, which more than ensure the propagation of the errors to the detection signals.

8. Analysis of results: For every injection experiment, the simulation outputs with and without the fault are compared. The estimated coverages and latencies of the detection and recovery mechanisms are automatically recorded. Figure 3 shows the Fault-Tolerance mechanisms predicate graph [3]. It represents the fault pathology, that is, the process followed by faults from their injection until their detection and recovery by the FTS.

Figure 3: Fault-Tolerance mechanisms predicate graph.
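The random parameters of the campaign (items 5 and 7 above) can be sketched as follows. This is only an illustration of the sampling scheme. The uniform choice of target and of fault model, the `Fault` record, the function name and the target names in the usage note are assumptions, since the paper does not describe how the tool draws them.

```python
import random
from dataclasses import dataclass

FAULT_MODELS = ("stuck-at-0", "stuck-at-1", "bit-flip", "indetermination", "delay")
# Fault duration ranges from the text, expressed in CPU clock cycles T.
DURATION_RANGES = {"short": (0.01, 1.0), "long": (0.1, 10.0)}


@dataclass(frozen=True)
class Fault:
    target: str      # atomic signal or variable of the model
    model: str       # one of FAULT_MODELS
    t_inj: float     # injection instant
    duration: float  # fault duration, same time unit as clock_period


def plan_experiment(targets, t_workload, clock_period, duration_range,
                    n=3000, seed=0):
    """Draw the n faults of one injection experiment."""
    rng = random.Random(seed)
    lo, hi = DURATION_RANGES[duration_range]
    return [
        Fault(
            target=rng.choice(targets),        # assumed uniform over targets
            model=rng.choice(FAULT_MODELS),    # assumed uniform over models
            t_inj=rng.uniform(0.0, t_workload),
            duration=rng.uniform(lo * clock_period, hi * clock_period),
        )
        for _ in range(n)
    ]
```

A call such as `plan_experiment(["sigA", "varB"], t_workload=1000.0, clock_period=10.0, duration_range="short")` returns 3000 faults whose injection instants lie in [0, 1000] and whose durations lie between 0.01 and 1.0 clock cycles.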

The following parameters are obtained from the sample data:

• The percentage of activated faults, PA. A fault is called activated when it produces a change in a signal or variable of the system model, and it is propagated to the external structural architecture. PA is calculated relative to the total number of injected faults, n.

• Error detection coverage. We have distinguished two types of coverage estimators:
∗ Coverage of the detection mechanisms: $C_{d(mechanisms)} = \frac{N_{Detected}}{N_{Activated}}$
∗ Global system coverage: $C_{d(system)} = \frac{N_{Detected} + N_{Non\text{-}effective}}{N_{Activated}}$

Errors that do not affect the running application are called non-effective errors. A non-effective error is usually produced when the faulty information is overwritten by the normal execution, or because the faulty data remains dormant in an unused part of the system. In the latter case, the error may eventually become effective. The non-effectiveness of errors is related to inherent system features. For this reason, we define a more global coverage, called System Coverage, which also counts the non-effective errors.

• Recovery coverage. It is also divided into two types of coverage estimators:
∗ Coverage of the recovery mechanisms: $C_{r(mechanisms)} = \frac{N_{Detected\_recovered}}{N_{Activated}}$
∗ Global system coverage: $C_{r(system)} = \frac{N_{Detected\_recovered} + N_{Non\text{-}effective}}{N_{Activated}}$

• Propagation, detection and recovery latencies:
∗ $L_p = t_p - t_{inj}$, where $t_p$ is the time instant when the fault is propagated to the signals of the external structural architecture, and $t_{inj}$ is the injection instant.
∗ $L_d = t_d - t_p$, where $t_d$ is the time instant when the error is detected.
∗ $L_r = t_r - t_d$, where $t_r$ is the time instant when the recovery mechanisms finish the recovery process.
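The estimators above can be sketched in a few lines of Python. The `Outcome` record and the `summarise` function are hypothetical names, not part of the authors' tool. The sketch also assumes that a non-effective error is an activated error that is never detected, so that detected and non-effective errors do not overlap.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Outcome:
    """Hypothetical record of one injection; times are None if never reached."""
    t_inj: float
    t_p: Optional[float] = None   # fault propagated to the external signals
    t_d: Optional[float] = None   # error detected
    t_r: Optional[float] = None   # recovery finished
    non_effective: bool = False   # activated, undetected, harmless to the workload


def _pct(num, den):
    return 100.0 * num / den


def _avg(values):
    return mean(values) if values else None


def summarise(outcomes):
    """Estimate PA, the four coverages (all in %) and the mean latencies."""
    activated = [o for o in outcomes if o.t_p is not None]
    if not activated:
        raise ValueError("no activated faults: coverages are undefined")
    detected = [o for o in activated if o.t_d is not None]
    recovered = [o for o in detected if o.t_r is not None]
    n_act = len(activated)
    n_ne = sum(1 for o in activated if o.non_effective and o.t_d is None)
    return {
        "PA": _pct(n_act, len(outcomes)),
        "Cd_mechanisms": _pct(len(detected), n_act),
        "Cd_system": _pct(len(detected) + n_ne, n_act),
        "Cr_mechanisms": _pct(len(recovered), n_act),
        "Cr_system": _pct(len(recovered) + n_ne, n_act),
        "Lp": _avg([o.t_p - o.t_inj for o in activated]),
        "Ld": _avg([o.t_d - o.t_p for o in detected]),
        "Lr": _avg([o.t_r - o.t_d for o in recovered]),
    }
```

Keeping the mechanism and system coverages in the same function makes their only difference explicit: the system figures add the non-effective errors to the numerator, over the same count of activated faults.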

6. Results

The experiments have been carried out on a PC-compatible with a Pentium II processor at 350 MHz and 192 MB of RAM.

Every set of 3000 injections lasts about 8 hours, and the analysis of the simulations lasts about 1 hour for Bubblesort (the most complex workload).

The results presented refer to the fault duration ranges previously specified: [0.1T-10.0T] and [0.01T-1.0T].

| Parameter | Arithmetic Series, [0.01T-1.0T] | Arithmetic Series, [0.1T-10.0T] | Bubblesort, [0.01T-1.0T] | Bubblesort, [0.1T-10.0T] |
|---|---|---|---|---|
| PA (%) | 20.07 | 23.47 | 20.06 | 25.07 |
| Cd (mec) (%) | 25.42 | 29.40 | 27.20 | 30.24 |
| Cd (sys) (%) | 98.51 | 96.31 | 97.58 | 97.34 |
| Cr (mec) (%) | 20.60 | 24.29 | 23.79 | 26.00 |
| Cr (sys) (%) | 93.69 | 91.19 | 94.18 | 93.09 |
| Lp (ns) | 973 | 979 | 1550 | 1824 |
| Ld (ns) | 35524 | 31527 | 38594 | 33787 |
| Lr (ns) | 89554 | 109915 | 114136 | 123976 |

Table 1: Percentage of activated errors, coverages and latencies related to the fault duration and the workload.

Table 1 shows the percentage of activated errors, the coverages and the average latencies for both workloads. It can be observed that, as fault duration decreases:

• PA decreases. Short duration faults have less influence on the system operation.

• Cd (mechanisms) and Cr (mechanisms) decrease. Short duration faults are more difficult to detect and recover from.

• Cd (sys) and Cr (sys) grow slightly. This is due to the rise in the percentage of non-effective errors, as shown in Figures 4, 5, 6 and 7.

• Lp < Ld << Lr. The values of the average latencies do not seem to have a clear dependency on the fault duration.

These results reproduce the global influence of the fault duration shown in [23]. As Table 1 shows, this behaviour is independent of the workload, although the values of coverages and latencies are usually a bit greater for Bubblesort.
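The rise in non-effective errors for short faults can be read directly off the two detection coverage definitions of section 5: subtracting one from the other isolates the share of activated errors that are non-effective. The figures below are computed here from Table 1 and are not reported in this form in the experiments themselves.

```latex
\frac{N_{Non\text{-}effective}}{N_{Activated}} = C_{d(system)} - C_{d(mechanisms)}

\begin{array}{lcc}
 & [0.01T\text{-}1.0T] & [0.1T\text{-}10.0T] \\
\text{Arithmetic Series} & 98.51 - 25.42 = 73.09\% & 96.31 - 29.40 = 66.91\% \\
\text{Bubblesort}        & 97.58 - 27.20 = 70.38\% & 97.34 - 30.24 = 67.10\%
\end{array}
```

For both workloads the non-effective share is a few points higher for the short faults, which is what lets the system coverages grow while the mechanism coverages fall.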


AS = Arithmetic Series workload, BS = Bubblesort workload. Latencies are average latencies in ns.

| Detection mechanism | AS % detected errors, [0.1T-10.0T] | AS % detected errors, [0.01T-1.0T] | AS latency, [0.1T-10.0T] | AS latency, [0.01T-1.0T] | BS % detected errors, [0.1T-10.0T] | BS % detected errors, [0.01T-1.0T] | BS latency, [0.1T-10.0T] | BS latency, [0.01T-1.0T] |
|---|---|---|---|---|---|---|---|---|
| Parity | 61.35 | 56.21 | 3021 | 4077 | 62.60 | 53.69 | 7694 | 9687 |
| WDT | 38.65 | 43.79 | 76779 | 75888 | 37.40 | 46.31 | 77377 | 72756 |

| Recovery mechanism | AS % recovered errors, [0.1T-10.0T] | AS % recovered errors, [0.01T-1.0T] | AS latency, [0.1T-10.0T] | AS latency, [0.01T-1.0T] | BS % recovered errors, [0.1T-10.0T] | BS % recovered errors, [0.01T-1.0T] | BS latency, [0.1T-10.0T] | BS latency, [0.01T-1.0T] |
|---|---|---|---|---|---|---|---|---|
| Back-off | 22.81 | 34.68 | 13916 | 23472 | 23.72 | 25.25 | 98360 | 93556 |
| Checkpoint | 5.85 | 7.26 | 35932 | 13180 | 6.65 | 8.29 | 15996 | 19726 |
| Spare | 71.34 | 58.06 | 146667 | 13866 | 69.63 | 66.46 | 141346 | 133467 |

Table 2: Contribution of the detection and recovery mechanisms to coverage and latency results.

Table 2 shows the separate contribution of the different detection and recovery mechanisms to the coverage and latency results.

With respect to the detection mechanisms, it can be observed that:

• % Parity > % WDT. Parity is, in the analysed system model, the most effective detection mechanism (mainly for longer faults).

• Ld (Parity) << Ld (WDT). Parity presents a much lower latency than WDT.

• No significant differences between the two workloads are observed.
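As a consistency check between Tables 1 and 2, the sketch below combines the per-mechanism detection latencies of Table 2, weighted by each mechanism's share of detected errors. That the Ld row of Table 1 is this weighted mean is an assumption made here, not something the paper states, and the function name is illustrative. For the Arithmetic Series columns the result lands within about 2 ns of Table 1.

```python
def weighted_latency(shares_pct, latencies_ns):
    """Mean latency over all detected errors, from per-mechanism shares (%) and means (ns)."""
    total = sum(shares_pct)
    return sum(s * l for s, l in zip(shares_pct, latencies_ns)) / total


# Arithmetic Series columns of Table 2, in the order (Parity, WDT).
long_faults = weighted_latency([61.35, 38.65], [3021, 76779])    # [0.1T-10.0T]
short_faults = weighted_latency([56.21, 43.79], [4077, 75888])   # [0.01T-1.0T]
```

The check also shows why the overall Ld is dominated by the watchdog timer: although Parity detects more errors, the WDT latency is more than an order of magnitude larger.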

The most significant results related to the recovery mechanisms are:

• % Spare > % Back-off >> % Checkpoint. The longer fault duration provokes a higher percentage of permanent errors, which activate the spare CPU. No significant differences between the two workloads are observed.

• Lr (Spare) is normally the greatest. The values of Lr (Back-off) and Lr (Checkpoint) vary with both fault duration and workload.

Figures 4, 5, 6 and 7 show the detailed fault and error pathology for the two fault duration ranges and the two workloads studied.

Figure 4: Fault-tolerance mechanisms predicate graph. Duration = Uniform [0.01T-1.0T]. Workload = Arithmetic series.

Figure 5: Fault-tolerance mechanisms predicate graph. Duration = Uniform [0.1T-10.0T]. Workload = Arithmetic series.

Figure 6: Fault-tolerance mechanisms predicate graph. Duration = Uniform [0.01T-1.0T]. Workload = Bubblesort.

Figure 7: Fault-tolerance mechanisms predicate graph. Duration = Uniform [0.1T-10.0T]. Workload = Bubblesort.

Finally, Table 3 tries to reflect the influence of the transient fault models on the Dependability parameters. Two fault model sets have been compared:

a) Fault model set 1 = stuck-at (0, 1), bit-flip, indetermination, delay. This is the set used in this work.

b) Fault model set 2 = stuck-at (0, 1), indetermination, high impedance. This set was used in the fault injection campaign of [23].


| Parameter | Fault Model Set 1 | Fault Model Set 2 |
|---|---|---|
| PA (%) | 23.47 | 22.83 |
| Cd (mec) (%) | 29.40 | 37.52 |
| Cd (sys) (%) | 96.31 | 96.35 |
| Cr (mec) (%) | 24.29 | 30.80 |
| Cr (sys) (%) | 91.19 | 89.64 |
| Lp (ns) | 979 | 1766 |
| Ld (ns) | 31527 | 31419 |
| Lr (ns) | 109915 | 114267 |

Table 3: Influence of the transient fault models on the Dependability parameters. Duration = Uniform [0.1T-10.0T]. Workload = Arithmetic series.

From the table, we can see that fault model set 1 presents a slightly higher value for PA, and considerably lower values for Cd (mechanisms) and Cr (mechanisms). This may be due to the greater impact of the new fault model set, which includes more transient fault types.

These results are preliminary, and they are being completed with other fault model sets, to study more deeply the influence of the fault models on the Dependability parameters.

7. Summary, conclusions and future work

An injection campaign into a VHDL model of a 16-bit fault-tolerant microcomputer system running two different workloads has been performed. We have injected transient faults (in groups of 3000 in each experiment), of types stuck-at, bit-flip, indetermination and delay, into the model signals and variables, using the simulator commands of an injection tool designed for that purpose. The objectives have been to study the pathology of the propagated errors, measure their latency, and calculate the detection and recovery coverages.

The most important conclusions which can be drawn from the work and the results are:

• We have verified the good performance of the injection technique used (automatic alteration of variables and signals of the model by means of special simulator commands). Worth mentioning are its easy implementation and the high controllability and observability of the injection experiments.

• We have shown the usefulness of the injection tool, designed to work on PC platforms and inject faults at chip level into VHDL models.

• For short duration transient faults ([0.01T-1.0T]), the global system detection and recovery coverages obtained have been quite high: up to 98% and 94%, respectively.

• The most effective detection mechanism has been Parity, which also has a smaller average latency than the Watchdog Timer.

• The most effective recovery mechanism has been the Spare, followed by the Back-off cycle and Checkpointing. The higher recovery latencies usually correspond to Spare and Checkpoint.

• As fault duration grows ([0.1T-10.0T]), the percentage of activated errors increases and the mechanism coverages rise significantly, while the global system coverages decrease slightly.

• The general dependency on the fault duration, and the relative contribution of the different detection/recovery mechanisms, have been verified for the two workloads. No big differences have been observed.

• The fault model set used has had more impact on the FTS operation than the one used in previous experiments [23]. This has been reflected in the rise of the percentage of activated errors and the decrease of the mechanism coverages.

The information presented in this paper can be used to improve the design of detection and recovery mechanisms in order to optimise the values of coverage and latency.

We intend to complete this work in the short term in the following aspects:

• Testing new injection techniques that modify the VHDL code, starting from the idea of saboteurs (at structural level) and mutants (at behavioural level) [8].

• Studying more deeply the influence of the fault models on the Dependability parameters.

• Applying the fault injection tool to the validation of more complex and real fault-tolerant systems.

References

[1] J. Clark and D. Pradhan, “Fault Injection: A method for validating computer-system dependability”, IEEE Computer, June 1995.

[2] M. Hsueh, T. Tsai, and R.K. Iyer, “Fault Injection Techniques and Tools”, IEEE Computer, April 1997, pp. 75-82.

[3] J. Arlat, A. Costes, Y. Crouzet, J. Laprie, and D. Powell, “Fault Injection and Dependability Evaluation of Fault-Tolerant Systems”, IEEE Transactions on Computers, Vol. 42, Nº 8, August 1993, pp. 913-923.

[4] K. Goswami and R.K. Iyer, “DEPEND: A simulation-based environment for system level dependability analysis”, Tech. Rep. CRHC-92-11, Center for Reliable and High Performance Computing, University of Illinois (USA), 1991.

[5] G. Choi and R.K. Iyer, “FOCUS: An experimental environment for fault sensitivity analysis”, IEEE Transactions on Computers, Vol. 41, December 1992, pp. 1515-1526.

[6] P. Folkesson, S. Svensson, and J. Karlsson, “A Comparison of simulation-based and scan chain implemented fault injection”, Proc. 28th International Symposium on Fault Tolerant Computing (FTCS-28), Munich (Germany), June 1998, pp. 284-293.


[7] J. Clark and D. Pradhan, “REACT: Reliable Architecture Characterization Tool”, Tech. Rep. TR-92-CSE-22, University of Massachusetts, June 1992.

[8] E. Jenn, J. Arlat, M. Rimen, J. Ohlsson, and J. Karlsson, “Fault Injection into VHDL Models: The MEFISTO Tool”, Proc. 24th International Symposium on Fault Tolerant Computing (FTCS-24), 1994, pp. 66-75.

[9] T.A. DeLong, B.W. Johnson, and J.A. Profeta III, “A fault Injection Technique for VHDL Behavioral-Level models”, IEEE Design and Test of Computers, Vol. 13, Nº 4, Winter 1996.

[10] J. Arlat, J. Boué, and Y. Crouzet, “Validation-Based Development of Dependable Systems”, IEEE Micro, Vol. 19, July/August 1999, pp. 66-79.

[11] J. Arlat, M. Aguera, L. Amat, Y. Crouzet, J.C. Fabré, J.C. Laprie, E. Martins, and D. Powell, “Fault injection for dependability validation: a methodology and some applications”, IEEE Transactions on Software Engineering, Vol. 16, February 1990, pp. 166-182.

[12] U. Gunneflo, “Physical fault injection for Validation of Dependable Computing Systems and a Fault-Tolerant Computer design for Safety-critical missions”, PhD thesis, Chalmers University of Technology, Göteborg (Sweden), 1990.

[13] P. Gil, “Sistema Tolerante a Fallos con Procesador de Guardia: Validación mediante Inyección Física de Fallos”, PhD thesis, Departamento de Ingeniería de Sistemas, Computadores y Automática, Universidad Politécnica de Valencia (Spain), September 1992.

[14] J. Karlsson, P. Folkesson, J. Arlat, Y. Crouzet, and G. Leber, “Integration and Comparison of Three Physical Fault Injection Techniques”, Predictably Dependable Computer Systems, Chapter V: Fault Injection, Ed. Springer Verlag, 1995, pp. 309-329.

[15] R.J. Martínez, P.J. Gil, G. Martín, C. Pérez, and J.J. Serrano, “Experimental Validation of High-Speed Fault-Tolerant Systems Using Physical Fault Injection”, Proc. 7th Dependable Computing for Critical Applications (DCCA’7), IEEE Computer Society Press, 1999, pp. 233-249.

[16] Z. Segall, et al., “FIAT: Fault-Injection Based Automated Testing Environment”, Proc. 18th International Symposium on Fault-Tolerant Computing Systems (FTCS-18), IEEE CS Press, Tokyo (Japan), June 1988, pp. 102-107.

[17] G.A. Kanawati, N.A. Kanawati, and J.A. Abraham, “FERRARI: A Tool for the Validation of System Dependability Properties”, Proc. 22nd International Symposium on Fault-Tolerant Computing Systems (FTCS-22), Boston, Massachusetts (U.S.A.), July 1992, pp. 336-344.

[18] W. Kao, R.K. Iyer, and D. Tang, “FINE: A Fault Injection and Monitor Environment for Tracing the UNIX System Behavior under Faults”, IEEE Transactions on Software Engineering, Vol. 19, No. 11, November 1993, pp. 1105-1118.

[19] S. Han, H.A. Rosenberg, and K.G. Shin, “DOCTOR: An IntegrateD SOftware Fault InjeCTiOn EnviRonment”, Technical report, University of Michigan, December 1993.

[20] M. Rodriguez, F. Salles, J.C. Fabre, and J. Arlat, “Microkernel Assessment by Fault Injection and Design Aid”, Proc. 3rd European Dependable Computing Conference (EDCC-3), Prague (Czech Republic), September 1999, pp. 143-160.

[21] J.C. Campelo, F. Rodriguez, P.J. Gil, and J.J. Serrano, “Design and validation of a Distributed Industrial Control System’s nodes”, Proc. 18th IEEE Symposium on Reliable Distributed Systems, Lausanne (Switzerland), 1999.

[22] IEEE, IEEE Standard VHDL Language Reference Manual, IEEE Std 1076-1993.

[23] D. Gil, R. Martínez, J.C. Baraza, J.V. Busquets, and P.J. Gil, “Fault Injection into VHDL Models: Experimental Validation of a Fault Tolerant Microcomputer System”, Proc. 3rd European Dependable Computing Conference (EDCC-3), Prague (Czech Republic), September 1999, pp. 191-208.

[24] D. Gil, J.V. Busquets, J.C. Baraza, and P.J. Gil, “A Fault Injection Tool for VHDL Models”, Fastabs of the 28th International Symposium on Fault Tolerant Computing (FTCS-28), Munich (Germany), 1998, pp. 72-73.

[25] Model Technology, ModelSim EE/PLUS Reference Manual, 1998.

[26] J.R. Armstrong, Chip-Level Modelling with VHDL, Prentice Hall, 1989.

[27] J. Ohlsson, M. Rimén, and U. Gunneflo, “A study of the effect of transient fault injection into a 32-bit RISC with built-in watchdog”, Proc. 22nd International Symposium on Fault-Tolerant Computing (FTCS-22), Boston, Massachusetts (U.S.A.), 1992, pp. 316-325.

[28] J. Boué, J. Arlat, Y. Crouzet, and P. Pétillon, “Verification of Fault Tolerance by Means of Fault Injection into VHDL Simulation Models”, LAAS report nº 96463, December 1996.

[29] A.M. Amendola, A. Benso, F. Corno, L. Impagliazzo, P. Marmo, P. Prinetto, M. Rebaudengo, and M. Sonza Reorda, “Fault Behavior Observation of a Microprocessor System through a VHDL Simulation-Based Fault Injection Experiment”, Proc. EuroDAC96, 1996.

[30] V. Sieh, O. Tschäche, and F. Balbach, “Comparing Different Fault Models Using VERIFY”, Proc. 6th Dependable Computing for Critical Applications (DCCA-6), March 1997, pp. 59-76.

[31] C. Constantinescu, “Assessing Error Detection Coverage by Simulated Fault Injection”, Proc. 3rd European Dependable Computing Conference (EDCC-3), Prague (Czech Republic), September 1999, pp. 161-170.

[32] E.A. Amerasekera and F.N. Najm, Failure Mechanisms in Semiconductor Devices, John Wiley & Sons, 1997.

[33] H. Cha, E.M. Rudnick, G.S. Choi, J.H. Patel, and R.K. Iyer, “A Fast and Accurate Gate-Level Transient Fault Simulation Environment”, Proc. 23rd International Symposium on Fault-Tolerant Computing Systems (FTCS-23), Toulouse (France), June 1993, pp. 310-319.
