A prototype of a VHDL-based fault injection tool: description and application

J.C. Baraza *, J. Gracia, D. Gil, P.J. Gil

Grupo de Sistemas Tolerantes a Fallos - Fault Tolerant Systems Group (GSTF), Departamento de Informática de Sistemas y Computadores (DISCA), Escuela Universitaria de Informática, Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain

Abstract

This paper presents the prototype of an automatic and model-independent fault injection tool, to be used on an IBM-PC (or compatible) platform. The tool has been built around a commercial VHDL simulator and is designed to implement different fault injection techniques. With this tool, a wide range of transient and permanent faults can be injected into medium-complexity models. Another notable feature of the tool is that it can analyse the results obtained from injection campaigns, in order to study the Error Syndrome of the system model and/or validate its fault-tolerance mechanisms. Some results of various fault injection campaigns carried out to validate the Dependability of a fault-tolerant microcomputer system are shown. We have analysed the pathology of the propagated errors, measured their latencies, and calculated both error detection and recovery latencies and coverages. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Fault-tolerant systems (FTSs); Fault modelling; Error syndrome; FTS validation; VHDL-based fault injection; Fault injection tool

1. Introduction

One of the main problems of developing and exploiting modern dependable systems is their validation. This is because, in addition to the functional aspect, the correct operation of the fault-tolerance mechanisms (FTMs) must also be considered. The validation of the system coverage depends strongly on the coverages of the various FTMs [1].

The complex behaviour of real fault-tolerant systems (FTSs) implies that their validation needs to be partially experimental, because of [2,3]:

• The highly specialised and novel nature of the system components, both hardware and software. Moreover, FTSs are normally used in specific applications, and so few units exist. This hinders the availability of experimental data related to their function.

• The uncertainty of fault pathologies, mainly about how to quantify the influence of faults on system Dependability.

Journal of Systems Architecture 47 (2002) 847–867
www.elsevier.com/locate/sysarc

* Corresponding author. Tel: +34-96-3879704; fax: +34-96-3877579.
E-mail addresses: [email protected] (J.C. Baraza), [email protected] (J. Gracia), [email protected] (D. Gil), [email protected] (P.J. Gil).

1383-7621/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S1383-7621(01)00036-4

Experimental validation can be carried out in two different ways [2]:

• Observing the behaviour of the system in the presence of faults during its operational phase, extracting real measures of coverage coefficients, number of failures, and the temporal cost of maintenance operations.

• By means of controlled experiments, analysing the behaviour of the system in the presence of faults introduced deliberately.

The first technique obtains real values for the studied parameters. However, in most cases it cannot be used, because obtaining statistical values with a suitable confidence margin requires observing the occurrence of all the working possibilities, and the observation time needed is too long.

For this reason, the second method, called fault injection, is more suitable for validating FTSs. In fact, fault injection is being increasingly consolidated and applied in a wide range of fields, and several automatic tools have been designed [4,5]. The fault injection technique is defined as follows [2]:

Fault injection is the validation technique of the Dependability of Fault-Tolerant Systems which consists in the accomplishment of controlled experiments where the observation of the system's behaviour in presence of faults is induced explicitly by the deliberate introduction (injection) of faults in the system.

As Fig. 1 shows, fault injection techniques in the hardware of a system can be classified into three main categories [4–7]:

• Physical fault injection (hardware implemented fault injection, HWIFI): This is achieved at the physical level, disturbing the hardware with parameters of the environment (internal) or modifying the value of the IC pins (external).

• Software implemented fault injection (SWIFI): The objective of this technique is to reproduce at software level the errors that would have been produced if faults had occurred in both the hardware and software. It is based on different practical types of injection, such as modification of memory data, or mutation of either the application software or the lowest service layers (at operating system level, for example).

• Simulated fault injection: In this technique, a model of the system under test, which can be developed at different abstraction levels, is simulated in another computer system. Faults are induced by altering the logical values of the model elements during the simulation.

Fig. 1. Classification of fault injection techniques.

During the design phase of a system, simulation is an important experimental way to get an early measure of the Performance and Dependability. Another interesting advantage of this technique with respect to other injection techniques is the high observability and controllability of all the modelled components.

This paper describes a tool for injecting faults in VHDL¹ simulation models. The objective has been to have a tool that complements other fault injection techniques implemented by our research group (AFIT [8] for pin-level HWIFI, and SOFI [9] for SWIFI). This tool is intended to work with models at gate, register and chip level described in VHDL. In the context of VHDL-based fault injection tools, we have tried to introduce some specific characteristics. The tool runs on a PC (or compatible) under Windows™. This leads to a simple and easily portable tool, suitable for injection campaigns in medium-complexity systems. Besides these, other objectives have been to implement different types of FTS analysis and to improve some features that are usually less developed in other fault injection tools: fault models and the integration of different VHDL-based fault injection techniques.

¹ Very high speed integrated circuits Hardware Description Language.

This paper is organised as follows. Section 2 explains the VHDL simulation-based fault injection techniques that we have used in the experiments, highlighting our implementation contributions. Section 3 describes the fault injection tool, briefly showing its components and its most relevant features. Section 4 illustrates the fault models used in the injection tool. Section 5 presents some experiments that show the capabilities of the tool. Finally, Section 6 gives some general conclusions and future enhancements of the tool.

2. VHDL-based fault injection

VHDL has become one of the most suitable hardware description languages from the point of view of fault injection [10]. The reasons for the success of VHDL can be summarised as:

• It promotes an open standard for digital design specification.

• It allows a system to be described at different abstraction levels, as both behavioural and structural descriptions are possible.

• Some elements of its semantics (such as resolution functions, multivalued types, and configuration mechanisms) can be used in fault injection.

Previous works in this area show that injection in VHDL models can be divided into two groups of techniques [11], as can be seen in Fig. 2.

2.1. Simulator commands technique

This technique is based on the use of the simulator commands to modify the value of the model signals and variables [6].

The way that faults are injected depends on the injection place. To inject on signals, the sequence of pseudo-commands used is:

1. Simulate_Until [injection instant]
2. Modify_Signal [signal name] [fault value]
3. Simulate_For [fault duration]
4. Restore_Signal [signal name]
5. Simulate_For [observation time]

This sequence is designed to inject transient faults, which are the most common and the most difficult to detect [12]. To inject permanent faults, the sequence is the same, but omitting steps 3 and 4. To inject intermittent faults, the sequence consists of repeating steps 1–5, with random separation intervals.

The sequence of pseudo-commands used to inject on variables is:

1. Simulate_Until [injection instant]
2. Assign_Variable [variable name] [fault value]
3. Simulate_For [observation time]

The operation is similar to the injection on signals, but in this case there is no control of the fault duration. This implies that it is not possible to inject permanent faults on variables using simulator commands.

The sequence of commands needed to carry out the injection (for both transient and permanent faults) can be included in a macro, where the elements between brackets are passed to the macro as parameters. This means the injection conditions can be varied without modifying the command code.

It is worth noting that, from the point of view of the injection procedure, VHDL generics are managed as "special" variables. This enables the injection of some unusual fault types, such as delay faults [13]. See Section 4 for the description of the fault types that we have introduced with this technique.

Fig. 2. VHDL-based fault injection techniques.

With respect to implementation cost, the simulator commands technique is the easiest one to implement.
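The pseudo-command sequences of this section lend themselves to a parameterised macro. The Python sketch below is only an illustration under stated assumptions: the function names, the use of plain numbers for time values and the way intermittent bursts are spaced are hypothetical choices made here, not the tool's actual macro format. Only the pseudo-command names come from the text.

```python
import random


def signal_fault_macro(signal, fault_value, t_inject, observation, duration=None):
    """Pseudo-command list for one fault on a signal.

    duration=None models a permanent fault: steps 3 and 4 are omitted.
    """
    macro = [
        f"Simulate_Until {t_inject}",
        f"Modify_Signal {signal} {fault_value}",
    ]
    if duration is not None:  # transient fault: hold, then restore
        macro += [f"Simulate_For {duration}", f"Restore_Signal {signal}"]
    macro.append(f"Simulate_For {observation}")
    return macro


def intermittent_fault_macro(signal, fault_value, t_first, duration,
                             observation, bursts, max_gap, rng):
    """Repeat steps 1-5 'bursts' times with random separation intervals.

    The spacing rule (previous burst end plus a random gap) is an assumption.
    """
    macro = []
    t = t_first
    for _ in range(bursts):
        macro += signal_fault_macro(signal, fault_value, t, observation, duration)
        t += duration + observation + rng.randint(1, max_gap)
    return macro


def variable_fault_macro(variable, fault_value, t_inject, observation):
    """Pseudo-command list for a fault on a variable (no duration control)."""
    return [
        f"Simulate_Until {t_inject}",
        f"Assign_Variable {variable} {fault_value}",
        f"Simulate_For {observation}",
    ]


if __name__ == "__main__":
    # Hypothetical target name and time values, for illustration only.
    for command in signal_fault_macro("alu/carry", "1", 100, 500, duration=20):
        print(command)
```

Passing the target, value and times as arguments mirrors the idea that the injection conditions can change without touching the command code.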

2.2. VHDL code modification techniques

Two techniques can be distinguished. The first one is based on adding components to the VHDL model. These components are specific to fault injection, and are called saboteurs [6]. The second technique consists of modifying components of the VHDL model, generating altered descriptions of components, called mutants [6].

Other techniques are implemented by extending the VHDL language with new data types and signals and by modifying the VHDL resolution functions [14,15]. The new elements defined include fault behaviour description. However, these techniques require the introduction of ad hoc compilers and control algorithms to manage the language extensions.

2.2.1. Saboteurs

A saboteur is a special VHDL component added to the original model. The mission of this component is to alter the value, or timing characteristics, of one or more signals when a fault is injected, whereas during the normal operation of the system it remains inactive. The signals where faults can be injected are those which connect components in structural models. A classification of saboteurs into serial and parallel is given in [6].

Serial saboteurs are VHDL components that are inserted between the output port(s) of a component and the input port(s) of its connected component. When the number of both input and output ports is 1, the saboteur is called serial simple; it is called serial complex if there is more than one input or output port.

Parallel saboteurs can be implemented as an additional driver of a signal associated with a resolution function, which must be modified so that it can inject faults.

We have extended these types by designing bi-directional saboteurs and adapting some models to buses. The example below summarises the implementation of one of the saboteur designs, showing the fault types introduced.

2.2.1.1. An example: design of a serial simple bi-directional saboteur

Fig. 3 shows the scheme of the saboteur and the timing of the Control signal.

Fig. 3. Serial simple bi-directional saboteur and fault activation.

Besides signals I and O (both input/output signals), this saboteur has a Control signal to activate the injection, which is managed from the simulator, and whose activation determines both the injection instant and the fault duration. The R/W signal determines the data transfer direction. The fault type can be selected using external Selection signals, managed by the simulator.

Table 1 shows the expressions that we have used to generate the value of the output (O or I, depending on the value of R/W), according to the fault type chosen.

Table 1
Fault types implemented in the serial simple saboteur

Fault type          Expression for saboteur output
Stuck-at 0          '0'
Stuck-at 1          '1'
Bit-flip            not(I)
Open-line (a)       'Z' (high impedance)
Delay               I after delay, delay > 0
Indetermination     'X'
Stuck-open (a)      '0' after t_retention

(a) Exclusively permanent faults.

The design of the serial complex saboteur is merely a generalisation of the serial simple one, considering two extra fault models which are becoming increasingly important in CMOS VLSI ICs: short and bridging. For more details about the design of bi-directional saboteurs, saboteurs in buses, and parallel saboteurs, see [11].

The main drawback of the saboteurs technique is that a number of control signals have to be added to the model. These signals are used to activate one among the set of inserted saboteurs and to indicate the type of perturbation that it must inject. This adds complexity to both the model and the technique. Although the saboteurs technique is more difficult to implement than simulator commands, it has a larger fault model capability, as can be seen in Section 4.
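The value-level behaviour of the serial simple saboteur in Table 1 can be sketched in a few lines of Python. This is a behavioural illustration only, not the VHDL component: the function names and the boolean `i_to_o` flag standing in for the R/W signal are assumptions, and the two timing-dependent entries (Delay and Stuck-open) are left out because they need a notion of simulation time.

```python
STD_LOGIC = "UX01ZWLH-"
# Logical NOT over the nine std_logic values, as in the IEEE 1164 "not" table.
NOT_TABLE = dict(zip(STD_LOGIC, "UX10XX10X"))


def serial_saboteur(value_in, control, fault_type):
    """Value driven by the saboteur for a given input value.

    While control is inactive the saboteur is transparent.
    """
    if not control:
        return value_in
    if fault_type == "stuck-at-0":
        return "0"
    if fault_type == "stuck-at-1":
        return "1"
    if fault_type == "bit-flip":
        return NOT_TABLE[value_in]
    if fault_type == "open-line":
        return "Z"  # high impedance
    if fault_type == "indetermination":
        return "X"
    raise ValueError(f"unsupported fault type: {fault_type}")


def bidirectional_saboteur(i, o, i_to_o, control, fault_type):
    """Return the pair (I, O) after the saboteur has acted.

    i_to_o=True means data flows from I to O (hypothetical R/W polarity).
    """
    if i_to_o:
        return i, serial_saboteur(i, control, fault_type)
    return serial_saboteur(o, control, fault_type), o


if __name__ == "__main__":
    print(bidirectional_saboteur("1", "0", True, True, "bit-flip"))
```

Keeping the fault selection as an argument reflects the role of the external Selection signals: the same component covers every value-level fault type in the table.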

2.2.2. Mutants

A mutant is a component which replaces another component. While inactive, it works like the original component, but when activated, it behaves like the component in the presence of faults. It is relatively easy to implement this replacement technique by using the VHDL configuration mechanism, as we will see later.

We have generated the mutants by modifying the VHDL code in behavioural descriptions. We have used eight fault models which have a good correspondence with gate-level faults [16]. They can be classified into two groups (control or data), depending on the flow that they disturb:

• Faults in control flow: stuck-then, stuck-else, assignment control, dead process, dead clause and micro-operations.

• Faults in data flow: local stuck-data and global stuck-data.

Table 2 summarises the eight fault models.

Several algorithmic models for control flow faults are also discussed in [17]. Some of these fault models coincide with those specified in Table 2. Others include the modification of synchronisation and timing clauses (after and wait clauses).

2.2.2.1. Implementation of mutants in VHDL

Fig. 4 shows the method of implementing mutants using the configuration mechanism in a multi-level structural architecture [11,18].

Fig. 4. Implementation of mutants using the configuration mechanism: (a) fault-free configuration and (b) example of mutated configuration.

Table 2
Fault models by modifying syntactical units in behavioural descriptions

Fault name            Code modification
Stuck-then            Replacement of the condition by true
Stuck-else            Replacement of the condition by false
Assignment control    Disturbing an assignment operation
Dead process          Elimination of the sensitivity list of a process
Dead clause           Elimination of a clause in a case
Micro-operation       Disturbing an operator
Local stuck-data      Disturbing the value of a variable, constant or signal in an expression
Global stuck-data     Elimination of all value modifications of a variable or signal in an architecture

The mutation affects the components of the structural architecture, where different architectures can be assigned to the same component (entity), but only one is actually selected (in the figure, marked with a thick line). The assigned architectures can be behavioural or structural. In the figure, two configurations are represented: fault-free and mutant. The "mutated" configurations are obtained by varying the component instances. Depending on the modified syntactical elements, different mutants can be obtained from the fault-free original structural architecture. For each component (component $i$, $i = 1, \ldots, M$), it is possible to choose the fault-free architecture or one mutated architecture from the set $(i, 2), (i, 3), \ldots, (i, N_i)$.

Notice that to change the configuration, it is enough to modify the architecture-to-component binding in the configuration declaration. After that, the new configuration must be recompiled. It is important to note that this is a reduced compilation, as the different architectures have already been compiled in the library. Every new compilation affects only the configuration declaration, not the mutant architectures.

Note also that the architecture-to-component binding is static. That is, after compiling a specific configuration, it remains unaltered during the simulation. For this reason, only permanent faults can be injected in this way.

The implementation of transient faults by means of the mutants technique requires the dynamic activation of the mutated configurations. A possible method based on the use of guard expressions in blocks (guarded blocks) can be seen in Fig. 5. The idea is that the guarded assignments enable the blocks and associated architectures to be activated dynamically. By varying the value of the guard signal along the simulation time, it is possible to select dynamically the fault-free architecture or the mutated one. If elec = 1, block 1 is enabled. This block is associated with configuration 1 (fault-free), which assigns the fault-free architecture to the global system entity. If elec = 2, block 2 is enabled. This block is associated with configuration 2 (mutated), which assigns the mutant architecture to the global system entity.

The main problem of the transient mutants technique is the high temporal cost of simulations, mainly due to the need to save the system state before architecture commutations. Nevertheless, mutants have a large fault modelling capability because they can use all the syntactic and semantic capabilities of the VHDL language. In Section 5.2.3, some results obtained from the injection of transient mutants are shown.
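The two selection mechanisms described above, static architecture-to-component binding and dynamic selection through a guard signal, can be illustrated outside VHDL. The Python sketch below is a toy model under stated assumptions: the component names, the dictionary representation of a configuration and both function names are hypothetical, and architecture 1 is taken to be the fault-free one, following the $(i, 2), \ldots, (i, N_i)$ numbering of the mutants.

```python
def single_mutant_configurations(architectures):
    """Enumerate bindings in which exactly one component is mutated.

    architectures maps each component name to N_i, its number of compiled
    architectures (1 is the fault-free one, 2..N_i are mutants).
    """
    fault_free = {component: 1 for component in architectures}
    for component, n_arch in architectures.items():
        for arch in range(2, n_arch + 1):
            binding = dict(fault_free)
            binding[component] = arch  # only the binding changes
            yield binding


def selected_configuration(t, t_inject, duration):
    """Value of the selector signal (called elec in the text) at time t.

    Returns 2 (mutated) during the fault window and 1 (fault-free) otherwise,
    which is how a transient mutant would be activated.
    """
    return 2 if t_inject <= t < t_inject + duration else 1


if __name__ == "__main__":
    # Hypothetical model with three components.
    model = {"alu": 3, "control_unit": 2, "registers": 1}
    for binding in single_mutant_configurations(model):
        print(binding)
```

With static binding, each generated dictionary corresponds to one recompiled configuration declaration and hence one permanent fault. The selector function shows why the guarded-block variant can model transient faults: the choice becomes a function of simulation time.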

3. The fault injection tool

Given the usefulness of VHDL-based fault injection for early validation during the design phase, the GSTF has built an injection tool using this method. The general objective is to apply the fault injection techniques automatically to a specific VHDL model.

First, we briefly review other VHDL injection tools, to establish the state of the art. Building on this analysis, we have developed our tool, introducing new characteristics.

3.1. Some significant VHDL-based fault injection tools

MEFISTO (Multi-level Error/Fault Injection Simulation TOol) [19] established the basis of the theory of VHDL-based fault injection, although it only injected faults using the simulator commands technique. MEFISTO was used to analyse the error syndrome of the DP32 processor [20]. Faults were transient stuck-at and bit-flip, randomly injected in atomic signals and variables of the model.

Fig. 5. Transient mutant implementation. Dynamic modification of the architecture using guarded assignments in blocks.

Two improved versions of MEFISTO were implemented by the LAAS² (MEFISTO-L [21]) and the Chalmers University of Technology (MEFISTO-C [22]). These tools can inject faults by adding saboteurs to the model, or by manipulating the signals and/or variables of the model via simulator commands. In addition to classical fault models (stuck-at, bit-flip), they implement new types of faults: open-line (high impedance), indetermination, etc. MEFISTO-C has been used to validate the THOR processor (a RISC microprocessor developed by Saab Ericsson Space AB) [23].

² LAAS: Laboratoire d'Analyse et d'Architecture des Systèmes, in Toulouse (France).

The three versions of MEFISTO use the Vantage Optium VHDL Simulator, simulating in parallel over a network of UNIX workstations.

Other tools, like VHDL-based Evaluation of Reliability by Injecting Faults efficientlY (VERIFY) [15] and the one implemented in [14], use fault injection techniques (see Fig. 2) other than simulator commands, saboteurs and mutants. For this reason, they have had less influence on our tool. However, these tools show the same general injection phases that are observed in the rest of the tools: experiment set-up, simulation and readouts (see Section 3.3).

3.2. General features of the tool

The tool runs on a PC (or compatible) under Windows™. It has been designed around a commercial simulator. It is a simple and easily portable tool, suitable for injection campaigns in medium-complexity systems. Other significant characteristics are:

• The tool can inject a wide range of fault models, going beyond the classical stuck-at and bit-flip models. Section 4 presents the fault models used in detail.

• With regard to fault timing, permanent, transient and intermittent faults can be injected. It is possible to choose among different probability distribution functions (Uniform, Exponential, Weibull and Gaussian), which have been verified in real faults, to determine both the injection instant and the duration.

• Different injection techniques can be used: simulator commands, saboteurs and mutants.

• The tool is able to inject faults into VHDL models at gate, register and chip level.

• The tool can perform two types of analysis:

  (1) Error syndrome analysis, where faults and errors are classified, and their relative incidence and propagation latency are calculated. This kind of analysis is useful for determining which error detection and recovery mechanisms are most suitable for improving the Dependability of a system.

  (2) FTS validation, where the detection and recovery mechanisms of the FTS are validated. Dependability parameters are calculated, such as the detection and recovery coverages and latencies. Usually, FTS validation is carried out after the error syndrome analysis.

The main drawbacks of the tool relate to the handling of very complex VHDL models and the associated simulation time, due to the limitations of the simulator used and the lack of parallelism in the simulations.
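Among the features listed above, the choice of a probability distribution for the injection instant and the fault duration is easy to illustrate. The sketch below uses Python's standard `random` module. The function name, the parameter names and the decision to clamp samples into the simulation window are assumptions made for this example, not a description of how the tool draws its values.

```python
import random


def sample_time(distribution, rng, upper, **params):
    """Draw an injection instant (or a duration) in the range [0, upper].

    Clamping into the window is a simplification: it slightly distorts the
    tails of the unbounded distributions.
    """
    if distribution == "uniform":
        x = rng.uniform(0.0, upper)
    elif distribution == "exponential":
        x = rng.expovariate(1.0 / params["mean"])
    elif distribution == "weibull":
        x = rng.weibullvariate(params["scale"], params["shape"])
    elif distribution == "gaussian":
        x = rng.gauss(params["mean"], params["stddev"])
    else:
        raise ValueError(f"unknown distribution: {distribution}")
    return min(max(x, 0.0), upper)


if __name__ == "__main__":
    rng = random.Random(2002)  # fixed seed so a campaign can be repeated
    # Hypothetical workload of 1000 time units.
    print(sample_time("uniform", rng, 1000.0))
    print(sample_time("exponential", rng, 1000.0, mean=200.0))
    print(sample_time("weibull", rng, 1000.0, scale=300.0, shape=1.5))
    print(sample_time("gaussian", rng, 1000.0, mean=500.0, stddev=100.0))
```

Passing an explicit, seeded generator keeps an injection campaign reproducible, which helps when a particular fault needs to be replayed.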

3.3. Injection phases

An injection experiment consists of injecting a number of faults into the model, according to the value of the injection parameters. For every fault injected, the behaviour of the model is analysed, and at the end of the experiment, the values of some specified parameters are obtained.

An injection campaign is a set of different injection experiments, in which some injection parameters are varied. An injection experiment consists of three independent phases that appear in the general structure of the majority of injection tools.

3.3.1. Experiment set-up

Here, both the injection parameters and the analysis conditions are specified. The most important injection parameters are: injection technique (simulator commands, saboteurs or mutants), number of injections, injection targets, fault models for every injection target, fault instant and fault duration distributions, clock cycle, system workload and simulation duration. Basically, the analysis conditions are:

• The objective of the injection campaign (error syndrome analysis or FTS validation).

• If an error syndrome analysis is carried out, the fault classification and error classification clauses must be established.

• If an FTS validation is performed, the error detection and error recovery clauses must be specified.

3.3.2. Simulation

In this phase, two operations are carried out. First, a set of macros is automatically generated: one performs a golden run simulation (no faults injected) of the model; the others contain the commands needed to inject the specified number of faults. Second, the macros are executed by the VHDL simulator, obtaining a set of simulation traces: one golden run, and n traces with a fault injected, where n is the number of faults injected. This is the most common case, in which single faults are injected. However, the tool can also inject multiple faults in a single simulation trace.

3.3.3. Analysis and readout

The golden run trace is compared with the n fault-injected simulation traces, studying their differences and extracting the analysis parameters of the system. The objective of the comparison depends on the analysis type. In summary, the analysis algorithms perform the following actions:

• In an error syndrome analysis, when a mismatch is found, the workload result is tested to check whether the activated error is effective. An error is effective when the result of the workload is incorrect; in that case, the injected fault and the effective error are classified, and the error propagation latency is measured. Typical results obtained from error syndrome analysis are the percentage of effective errors and their latency, as a function of error and fault types.

• In an FTS validation, if no mismatch is found, the injected fault is considered to have produced a non-activated error. If a mismatch is found, the detection clauses are evaluated to determine whether the activated error has been detected or not. If not, the activated error can be non-effective if the workload result is correct, or produce a failure if the result is incorrect. If the error has been detected, the recovery clauses are evaluated to determine whether the error has been recovered or not. At the end of the analysis, the FTMs predicate graph (see Fig. 6) is filled in [24]. This diagram reflects the pathology of faults, that is, their evolution from the moment they are injected until errors are detected and eventually recovered. From the graph, the detection and recovery coverages and latencies can be calculated.
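The decision sequence of the FTS validation can be written as a small classifier that fills in the counters of a predicate graph. The Python sketch below follows the order of checks given in the text. The outcome labels, the function names and, in particular, the coverage formulas (both taken relative to the number of activated errors) are assumptions made for illustration, since this section does not define them.

```python
from collections import Counter


def classify(mismatch, detected, recovered, workload_correct):
    """Classify one fault-injected trace against the golden run."""
    if not mismatch:
        return "non_activated"
    if not detected:
        return "non_effective" if workload_correct else "failure"
    return "recovered" if recovered else "detected_not_recovered"


def coverages(outcomes):
    """Return (detection coverage, recovery coverage).

    Assumed definitions: detected / activated and recovered / activated.
    """
    counts = Counter(outcomes)
    activated = sum(n for label, n in counts.items() if label != "non_activated")
    if activated == 0:
        return 0.0, 0.0
    detected = counts["recovered"] + counts["detected_not_recovered"]
    return detected / activated, counts["recovered"] / activated


if __name__ == "__main__":
    # Hypothetical experiment results:
    # (mismatch, detected, recovered, workload_correct)
    results = [
        (False, False, False, True),
        (True, True, True, True),
        (True, True, False, False),
        (True, False, False, True),
        (True, False, False, False),
    ]
    labels = [classify(*r) for r in results]
    print(Counter(labels))
    print(coverages(labels))
```

Keeping the classification separate from the coverage arithmetic makes it straightforward to substitute a different coverage definition without touching the per-trace decisions.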

3.4. Block diagram

In general terms, the injection tool can be represented by the block diagram in Fig. 7. Starting from the user interface, the VHDL model of the system under study and the VHDL simulator, the tool returns to the user the results of the analysis carried out in the experiment.

This block diagram is very general. Fig. 8 shows the detailed block diagram of the tool. It is composed of a series of elements designed around a commercial VHDL simulator, ModelSim by Model Technology [25].

The tool elements are briefly described next.

Fig. 6. FTMs predicate graph.

3.4.1. Tool configuration

The mission of this module is to configure the tool, considering that the simulator is a part of it, setting both the tool and simulator parameters. The most important aspect is to match the libraries used.

3.4.2. Syntactical and lexicographical analyser

The mission of this module is to scan all the model files, in order to obtain the Syntactic Tree of the model. Basically, this tree includes all the possible injection targets of the model. The tree structure reflects the hierarchical architecture of the model, including components, blocks and processes.

Depending on the injection technique used, the type of injection targets may vary:

• Simulator commands: The atomic signals and variables of the system model, belonging to either structural or behavioural architectures.

• Saboteurs: The atomic signals belonging to a structural description of the system model.

• Mutants: The syntactical elements of the model VHDL code in behavioural architectures.

3.4.3. VHDL injector library

This library holds a set of predefined VHDL saboteur components that are included in the model when this technique is used. The library also contains the mutants generated from the original model.

3.4.4. Graphic interface

This utility is a wide set of window-based menus, with which the user can specify all the parameters and conditions (related to either the injection or the analysis) needed to perform an injection experiment. With these parameters and conditions, the injection configuration and the analysis configuration files are generated.

3.4.5. Injection macro library

This library is composed of a set of predefined macros used to activate the injection mechanisms according to the injection technique used. The macros are built from a subset of simulator commands.

3.4.6. Macro generator

This module is actually an injection manager, because it controls the injection process. Using the injection configuration generated by the graphic interface, it

1. Creates a series of injection macros, in order to perform an error-free simulation and the number of fault-injected simulations specified in the parameters, and

2. Invokes the simulator to make it run the macros, obtaining the simulation traces.

Fig. 7. Block diagram of the fault injection tool.

Fig. 8. Detailed block diagram of the fault injection tool.

3.4.7. VHDL simulator

As indicated before, we have used the commercial VHDL simulator ModelSim, which provides a VHDL environment for IBM-PC (or compatible) under Microsoft Windows™. This is a simple and easy-to-use event-driven simulator. When activated, the simulator executes the macro file and generates the output trace of the simulation.

3.4.8. Result analyser

This module takes as input the analysis configuration generated by the graphic interface. According to those parameters, it compares the golden run trace with the traces of all the fault-injected simulations, looking for any mismatches, and extracts the analysis parameters specified.
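The comparison performed by the result analyser can be sketched as follows. The trace representation (a list of time-stamped samples) and the function names are assumptions made for this example; the real trace format produced by the simulator is not described here.

```python
from typing import Dict, List, Optional, Tuple

# A trace is a time-ordered list of (time_ns, {signal_name: value}) samples.
Trace = List[Tuple[int, Dict[str, str]]]


def first_mismatch(golden: Trace, faulty: Trace,
                   signals: List[str]) -> Optional[Tuple[int, str]]:
    """Return (time, signal) of the first difference on the observed signals."""
    for (t_g, vals_g), (t_f, vals_f) in zip(golden, faulty):
        if t_g != t_f:
            raise ValueError("traces must be sampled at the same instants")
        for name in signals:
            if vals_g[name] != vals_f[name]:
                return t_g, name
    return None


def propagation_latency(golden: Trace, faulty: Trace,
                        signals: List[str], t_inj_ns: int) -> Optional[int]:
    """Time from injection until the first mismatch, or None if there is none."""
    hit = first_mismatch(golden, faulty, signals)
    return None if hit is None else hit[0] - t_inj_ns


golden = [(0, {"data": "0", "addr": "0"}), (10, {"data": "1", "addr": "0"}),
          (20, {"data": "1", "addr": "1"})]
faulty = [(0, {"data": "0", "addr": "0"}), (10, {"data": "1", "addr": "0"}),
          (20, {"data": "0", "addr": "1"})]
```

With these two example traces, a fault injected at 5 ns first shows up on `data` at 20 ns, giving a propagation latency of 15 ns.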

4. Fault models

We have aimed to use a variety of fault models that would represent real physical faults that occur in ICs. Although the most used models are stuck-at (0, 1) (for permanent faults) and bit-flip (for transient faults), as VLSI IC integration density rises, it becomes more necessary to introduce newer, more complex models. Table 3 summarises the fault models that we have applied in every injection technique.

Fault models for the simulator commands and saboteurs techniques have been deduced from the physical causes and mechanisms implied in the occurrence of faults, at the technological and electronic levels [11,26]. They can be implemented using the VHDL multivalued types. The multivalued types std_ulogic and std_logic from the IEEE STD_Logic_1164 package [10] allow the declaration of signals and variables with a wide range of values at the logic and RT levels: ‘U’ (not initialised), ‘X’ (indetermination), ‘0’ (logic 0), ‘1’ (logic 1), ‘Z’ (high impedance), ‘W’ (weak indetermination), ‘L’ (weak 0), ‘H’ (weak 1), ‘-’ (do not care). The std_logic type also has a resolution function to manage the value of output signals connected in parallel.

In relation to the fault models used in the mutants technique, we have applied changes in syntactical elements of the language, at the algorithmic level. As commented in Section 2.2.2, we have considered the models proposed in [16,17]. However, fault modelling for algorithmic behavioural descriptions is an open research field, and new fault models must be studied and tested in the future.
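For the value-based models used with simulator commands and saboteurs, the link between a fault model and the nine std_logic values can be pictured with the short sketch below. The mapping of indetermination to ‘X’ and of open-line to ‘Z’ follows the value meanings given above; flipping the weak values ‘L’ and ‘H’, and leaving the other values unchanged under a bit-flip, are assumptions of this example. Delay, short, bridging and stuck-open faults are not covered.

```python
STD_LOGIC_VALUES = frozenset("UX01ZWLH-")

# Assumption: a bit-flip inverts strong and weak logic levels and leaves
# the remaining values ('U', 'X', 'Z', 'W', '-') untouched.
BIT_FLIP = {"0": "1", "1": "0", "L": "H", "H": "L"}


def faulty_value(value: str, model: str) -> str:
    """Return the value a single std_logic bit takes under a fault model."""
    if value not in STD_LOGIC_VALUES:
        raise ValueError(f"not a std_logic value: {value!r}")
    if model == "stuck-at-0":
        return "0"
    if model == "stuck-at-1":
        return "1"
    if model == "bit-flip":
        return BIT_FLIP.get(value, value)
    if model == "indetermination":
        return "X"
    if model == "open-line":
        return "Z"          # high impedance
    raise ValueError(f"unsupported fault model: {model}")


def faulty_vector(bits: str, index: int, model: str) -> str:
    """Apply a fault model to one bit of a std_logic_vector given as a string."""
    return bits[:index] + faulty_value(bits[index], model) + bits[index + 1:]
```

For instance, `faulty_vector("0101", 2, "bit-flip")` gives `"0111"`.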

Table 3
Fault models applied in injection techniques

Injection technique    Transient              Permanent
Simulator commands     Stuck-at (0, 1)        Stuck-at (0, 1)
                       Bit-flip               Indetermination
                       Indetermination        Open-line
                       Delay                  Delay
Saboteurs              Stuck-at (0, 1)        Stuck-at (0, 1)
                       Bit-flip               Indetermination
                       Indetermination        Open-line, delay,
                       Delay                  short, bridging,
                                              stuck-open
Mutants                Syntactical changes    Syntactical changes

5. Application of the injection tool

In this section, we will briefly describe an example of a model system to be injected, and some significant experiments that show the capability of the tool.

5.1. Example of model system: a fault-tolerant microcomputer system

We have built the VHDL model of a fault-tolerant microcomputer, whose block diagram is shown in Fig. 9. The system is duplex with cold stand-by sparing, parity detection and watchdog timer. The structural architecture of the model is composed of the following components:

• Main and spare CPUs (CPUA and CPUB, respectively),

• RAM memory (MEM),

• Output parallel port (PORTOUT),

• Interrupt controller (SYSINT),

• Clock generator (CLK),

• Watchdog timer (WD),

• Pulse generator (GENINT),

• Two back-off cycle generators (TRGENA, TRGENB),

• Two AND gates (PAND2A, PAND2B).

Each component is modelled by a behavioural architecture, usually with one or more concurrent processes. Both main and spare processors are an enhanced version of the MARK2 processor [27]. Fig. 10 shows the process graph for the CPU, and the function of each process.

The description of the global microcomputer system is around 1500 lines of VHDL code. The code is divided into 10 entities, 11 architectures and one package, excluding the STD and IEEE libraries.

As mentioned before, several FTMs have been added to increase system Dependability. The error detection mechanisms include the parity check and program control flow check by a watchdog timer. The error recovery mechanisms include the introduction of a back-off cycle when a parity error is detected, checkpointing when errors are detected by the watchdog timer, and starting the spare processor in case of permanent errors.

If the processor detects a parity error, the back-off generator asserts a signal for a fixed, configurable time. The processor waits for this time in case the cause of the error disappears. When this time expires, the processor re-executes the last instruction. If the parity error persists, the back-off signal is asserted permanently.

We have used a periodic interrupt (NINT2) to cope with the program flow errors. Each time the interrupt is received, the response routine resets the watchdog timer to avoid overflow.

Fig. 9. Block diagram of the microcomputer system.


Subsequently, the component GENINT is activated to produce a new interrupt. These actions are performed through the output parallel port.

An error of program control flow during the interrupt routine will produce an overflow of the watchdog timer. This situation will activate the processor interrupt signal (NINT1) in order to recover the system from a checkpoint previously stored in memory.

There is a second memory bank with a backup copy of the recovery point and a variable that indicates the active bank. This way, the data integrity is ensured at any time.

In case of two successive overflows of the watchdog timer, the NHLT signal will be asserted to permanently stop the main CPU and start the spare processor. The spare processor restores the recovery checkpoint from the stable memory to continue with the main processor tasks.
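The escalation between the recovery mechanisms described in this section can be summarised as a small state machine. This is only a behavioural sketch in Python, not the VHDL of the model; the class and method names are hypothetical, and it assumes that a normal watchdog reset (the NINT2 routine) clears the count of successive overflows.

```python
class RecoveryController:
    """Sketch of the recovery escalation: back-off, checkpoint, spare."""

    def __init__(self) -> None:
        self.active_cpu = "CPUA"
        self.successive_overflows = 0

    def on_parity_error(self, persists_after_backoff: bool) -> str:
        # Back-off cycle: wait a fixed time, then re-execute the instruction.
        # If the error is still present, the back-off becomes permanent.
        return "permanent back-off" if persists_after_backoff else "retry"

    def on_watchdog_reset(self) -> None:
        # Assumption: a regular reset by the NINT2 routine ends the streak.
        self.successive_overflows = 0

    def on_watchdog_overflow(self) -> str:
        self.successive_overflows += 1
        if self.successive_overflows >= 2 and self.active_cpu == "CPUA":
            # Second successive overflow: NHLT stops CPUA, CPUB takes over
            # and restores the checkpoint from stable memory.
            self.active_cpu = "CPUB"
            self.successive_overflows = 0
            return "spare"
        # First overflow: NINT1 triggers recovery from the checkpoint.
        return "checkpoint"
```

For example, two overflows separated by a normal reset both lead to checkpoint recovery, whereas two overflows in a row switch the system to the spare CPU.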

5.2. Fault injection experiments

In the following sections, we describe some fault injection campaigns that show the capabilities of the tool: first, an example of error syndrome analysis; second, an example of FTS validation; and last, some experiments comparing the three fault injection techniques implemented.

Fig. 10. CPU process graph.

5.2.1. Error syndrome analysis

Prior to the FTS validation, we present a study of the error syndrome of the non-FTS: fault classification, effective error classification, propagation latencies, and an analysis of the influence of the fault duration, the injection places and the fault distribution [28].

The non-FTS [27] consists of an 8-bit microcomputer, a RAM memory, an input and an output parallel port, a parallel-serial/serial-parallel port (UART) and an interrupt controller.

In summary, the parameters of the injection campaign are:

• Number of faults: $n = 3000$ single faults per injection experiment. This guarantees the statistical validity of the results, as the confidence intervals for the parameter estimators have acceptable values.

• Injection place: uniformly distributed over all the atomic signals and variables of the model.

• Fault types: stuck-at (‘0’, ‘1’) and open-line.

• Injection instant: defined according to Uniform, Exponential and Weibull distributions in the range $[0, t_{workload}]$, where $t_{workload}$ is the execution time of the workload.

• Workload: common workloads in fault injection experiments are well-known numeric and sorting algorithms. In this case, the workload implemented is a simple program that obtains the arithmetic series of k integer numbers.

• Fault duration: 0.1T, 0.2T, 0.3T, 0.4T, 0.5T, 1.0T, 1.5T, 2.0T, where T is the CPU clock cycle.
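The way injection instants and fault durations could be drawn for such a campaign is sketched below with Python's standard `random` module. The distribution parameters (`scale`, `shape`) are not given in the paper and are assumptions, as is the use of rejection sampling to keep every instant inside the workload window.

```python
import random
from typing import List, Optional

DURATION_FACTORS = (0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 1.5, 2.0)


def sample_injection_instant(rng: random.Random, dist: str, t_workload: float,
                             scale: Optional[float] = None,
                             shape: float = 1.5) -> float:
    """Draw one injection instant in [0, t_workload] by rejection sampling."""
    if scale is None:
        scale = t_workload / 2.0          # assumed default, not from the paper
    while True:
        if dist == "uniform":
            t = rng.uniform(0.0, t_workload)
        elif dist == "exponential":
            t = rng.expovariate(1.0 / scale)      # mean equal to `scale`
        elif dist == "weibull":
            t = rng.weibullvariate(scale, shape)  # (scale, shape)
        else:
            raise ValueError(f"unknown distribution: {dist}")
        if 0.0 <= t <= t_workload:
            return t


def fault_durations(clock_cycle: float) -> List[float]:
    """Fault durations of the campaign, expressed in the unit of the clock cycle."""
    return [f * clock_cycle for f in DURATION_FACTORS]


rng = random.Random(2002)  # fixed seed so that a campaign can be reproduced
instants = [sample_injection_instant(rng, "weibull", 1000.0) for _ in range(3000)]
```

Fixing the seed makes it possible to repeat a campaign with exactly the same set of injection instants.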

For every injection experiment, the following data are collected automatically:

• Propagation latency: elapsed time from the fault injection until the error can be detected in the external signals.

• Fault in signals or variables.

• Fault type classification. Depending on the signal where the fault is injected, there are different types of faults: Latch, Buses, Memory control, Interrupt, Halt, Clk. Regarding variables, there are two types of faults, depending on whether the fault is injected in the processor accumulator or in the memory.

• Type of error: effective (“Control flow”, “Data”, “Other”) or non-effective (latent, overwritten).

This terminology is briefly explained below:

Effective errors: they are propagated to any of the external signals and they change the system state, affecting the workload result:

• Control flow: they produce a change in the instruction flow that is not possible under correct operation of the program. They are usually caused by errors in the address bus during the fetch cycle, an error in the instruction code itself, or an error in the address bus during the execution of the unconditional jump instruction.

• Data: they cause an access to a wrong memory location during a memory read or write operation, or they may write erroneous data in memory during a write operation.

• Other: errors that cannot be classified in any of the previous types. For example, faults in conditional jumps, which may change the execution path to an alternate path that is valid anyway. There are also some faults that may produce an unexpected halt of the processor.

Non-effective errors: they do not change the system state, so the workload result is correct. They are classified as overwritten or latent:

• Overwritten errors: the system state is not corrupted. Therefore, the error is not detected because a different action has masked or overwritten the faulty information.

• Latent errors: they can be considered neither effective nor overwritten. The fault remains dormant somewhere in the system that is not being used at the moment. They can eventually become effective.
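The top level of this classification can be expressed as a short decision rule. The two boolean inputs are assumptions about what an analyser could observe at the end of a run (whether the workload result matches the golden run, and whether some internal state still differs from it); the subdivision of effective errors into “Control flow”, “Data” and “Other” needs more information and is not modelled here.

```python
def classify_error(result_correct: bool, state_differs_at_end: bool) -> str:
    """Classify one injected fault as effective, latent or overwritten.

    result_correct       -- workload result equals that of the golden run
    state_differs_at_end -- some internal signal or variable still differs
                            from the golden run when the simulation ends
    """
    if not result_correct:
        return "effective"        # the workload result was affected
    if state_differs_at_end:
        return "latent"           # dormant, may still become effective
    return "overwritten"          # faulty information was masked
```

Both “latent” and “overwritten” count as non-effective errors.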

5.2.1.1. Results. Figs. 11–13 show some significant results of an error syndrome analysis.

In the light of these results, and taking into account the greater presence of “Control flow” errors in signals and their small latency, the introduction of detection mechanisms should be guided towards error detection by program flow control, using watchdog timers or watchdog processors. On the other hand, these methods are usually complemented with other mechanisms that can generate an interrupt when an error is detected, such as “address error”, “bus error”, “illegal opcode”, “privilege violation”, etc. Parity detection can be included in memory or registers to locate “Data” errors [29].

Fig. 11. Percentage of effective errors as a function of the error type: (a) in signals and (b) in variables.

Fig. 12. Mean latency as a function of the error type: (a) in signals and (b) in variables.

Fig. 13. (a) Sensitivity of the computer zones to transient faults and (b) mean latency as a function of the injection place.

5.2.2. FTS Validation

The FTS corresponds to the model explained in Section 5.1. Some detection and recovery mechanisms have been added to the non-FTS of Section 5.2.1.

The purpose of the injection campaign is to study the response of the fault-tolerant microcomputer system in the presence of transient and permanent faults [30]. Briefly, the injection conditions of the campaign are:

1. Injection technique: Simulator commands.

2. Number of faults: $n = 3000$ single faults per injection experiment. This guarantees the statistical validity of the results.

3. Workload: Arithmetic series of k integer numbers.

4. Fault types:
• Transient: stuck-at (0, 1), bit-flip, indetermination and delay.
• Permanent: stuck-at (0, 1), indetermination, delay and open-line (high impedance).

5. Injection place: Any atomic signal and variable of the model. Faults are not injected in the spare CPU, since it is off while the system is working properly.

6. Injection instant: Defined according to a Uniform distribution function in the range $[0, t_{workload}]$, where $t_{workload}$ is the execution time of the workload.

7. Simulation duration: The simulation duration includes the execution time of the workload and the recovery time with the spare CPU ($t_{simul} = t_{workload} + t_{spare}$).

8. Fault duration: We have intended to inject “short” faults, with a duration equal to a fraction of the clock cycle (the most common ones, as described in [31]), as well as longer faults, which more than ensure the propagation of the errors to the detection signals. Three cases have been considered:
(a) transient faults with a duration generated randomly in the range [0.1T–10.0T], where T is the CPU clock cycle,
(b) transient faults with a fixed duration equal to 100T, and

(c) permanent faults.

9. Analysis results: From the sample data, the following parameters are obtained (see FTMs predicate graph, in Fig. 6):

• Percentage of activated errors

$$P_A = \frac{N_{\text{Activated}}}{N_{\text{Injected}}} = \frac{N_{\text{Activated}}}{n}$$

where $N_{\text{Activated}}$ is the number of activated errors. We define an activated error as an error originated by an activated fault (a fault that produces a change in any model signal or variable), which is propagated to the propagation signals. In our case, propagation signals are those of the external structural architecture of the system.

• Error detection coverage. We have defined two types of coverage estimators:

(1) Coverage of the detection mechanisms

$$C_{d(\text{mechanisms})} = \frac{N_{\text{Detected}}}{N_{\text{Activated}}}$$

where $N_{\text{Detected}}$ is the number of errors detected by the detection mechanisms.

(2) Global system detection coverage

$$C_{d(\text{system})} = \frac{N_{\text{Detected}} + N_{\text{non-effective}}}{N_{\text{Activated}}}$$

where $N_{\text{non-effective}}$ is the number of non-effective errors (as commented in Section 3.3). A non-effective error does not affect the result of the running application (workload). It can be produced when the faulty information is overwritten by the normal execution, or because the faulty data remains dormant in an unused part of the system. In the latter case, the fault may eventually become effective. The non-effectiveness of errors is related to the intrinsic redundancy of the system. This is the reason we define a more global coverage called system coverage.

• Recovery coverage. Divided also in two types of coverage estimators:

(1) Coverage of the recovery mechanisms

$$C_{r(\text{mechanisms})} = \frac{N_{\text{Detected recovered}}}{N_{\text{Activated}}}$$

where $N_{\text{Detected recovered}}$ is the number of errors detected and recovered by the detection and recovery mechanisms.

(2) Global system recovery coverage

$$C_{r(\text{system})} = \frac{N_{\text{Detected recovered}} + N_{\text{non-effective}}}{N_{\text{Activated}}}$$

where $N_{\text{non-effective}}$ has the same meaning as in $C_{d(\text{system})}$.

• Propagation, detection and recovery latencies:

(a) Propagation latency (dormancy [1]): $L_p = t_p - t_{inj}$, where $t_p$ is the instant the error reaches the propagation signals, that is, the instant when the error is activated, and $t_{inj}$ is the injection instant.

(b) Detection latency: $L_d = t_d - t_p$, where $t_d$ is the instant the activated error is detected by the detection mechanisms.

(c) Recovery latency: $L_r = t_r - t_d$, where $t_r$ is the instant the detected error is recovered by the recovery mechanisms.
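The estimators defined above are simple ratios and differences, so they can be evaluated with a few lines of Python. The raw counts and instants used in the example are illustrative only: they are not reported in the paper, and were merely chosen so that the resulting percentages agree with the first column of Table 4. The normal-approximation confidence interval is a standard textbook formula, added here as an assumption about how the statistical validity of a proportion estimated from a sample might be checked.

```python
import math
from typing import Dict, Tuple


def dependability_measures(n_injected: int, n_activated: int, n_detected: int,
                           n_detected_recovered: int,
                           n_non_effective: int) -> Dict[str, float]:
    """Evaluate P_A and the four coverage estimators defined above."""
    if n_injected <= 0 or n_activated <= 0:
        raise ValueError("need at least one injected and one activated fault")
    return {
        "PA": n_activated / n_injected,
        "Cd_mech": n_detected / n_activated,
        "Cd_sys": (n_detected + n_non_effective) / n_activated,
        "Cr_mech": n_detected_recovered / n_activated,
        "Cr_sys": (n_detected_recovered + n_non_effective) / n_activated,
    }


def latencies(t_inj: int, t_p: int, t_d: int, t_r: int) -> Tuple[int, int, int]:
    """Return (L_p, L_d, L_r) from the four instants of one experiment."""
    return t_p - t_inj, t_d - t_p, t_r - t_d


def half_width_95(p: float, n: int) -> float:
    """Half-width of a normal-approximation 95% confidence interval."""
    return 1.96 * math.sqrt(p * (1.0 - p) / n)


# Illustrative counts only (not reported in the source).
m = dependability_measures(n_injected=3000, n_activated=704, n_detected=207,
                           n_detected_recovered=171, n_non_effective=471)
```

With these counts, the global coverages exceed the mechanism coverages only through the non-effective term, which makes explicit how the intrinsic redundancy of the system contributes to $C_{d(\text{system})}$ and $C_{r(\text{system})}$.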

5.2.2.1. Results. The results have been grouped according to two aspects of the FTS validation: the influence of the fault duration and the contribution of the different detection and recovery mechanisms.

Figs. 14–16 show the FTMs predicate graphs for each fault duration used in the injection campaign. They give exhaustive information about the fault and error pathology.

Fig. 14. FTMs predicate graph (fault duration in the range [0.1T–10.0T]).

Fig. 15. FTMs predicate graph (fault duration 100T).

Fig. 16. FTMs predicate graph (permanent faults).

Table 4 shows the percentage of activated errors, the coverages, and the average latencies as a function of the fault duration. It can be observed that, as fault duration decreases:

• $P_A$ decreases. Short duration faults have less influence on the system operation.

• $C_{d(\text{mechanisms})}$ decreases. Short duration faults are more difficult to detect and recover. However, $C_{d(\text{system})}$ follows the opposite behaviour. This is due to the rise in the percentage of non-effective errors. For short transient faults, the value is quite high (96% approx.).

• $C_r$ follows the same behaviour as $C_d$ for the durations Uniform [0.1T–10.0T] and 100T, with slightly smaller percentages. This means that the recovery mechanisms are working well, and almost any detected error is recovered. But $C_r$ decreases notably for permanent faults (18% approx. for mechanisms and 60% approx. for global system). The reason could be that a portion of the permanent faults affects the dependability of the spare system (even though the faults are not originated in the spare CPU) and precludes the system recovery.

• $L_p < L_d \ll L_r$. The checkpoint recovery greatly contributes to the total latency. On the other hand, the values of average latencies do not seem to have a clear dependency on the fault duration.

Table 4
Percentage of activated errors, coverages and latencies related to the fault duration

Dependability measure    [0.1T–10.0T]    100T      Permanent
PA (%)                   23.47           33.40     39.77
Cd(mech) (%)             29.40           43.40     47.02
Cd(sys) (%)              96.31           91.52     88.52
Cr(mech) (%)             24.29           35.13     18.36
Cr(sys) (%)              91.19           83.23     59.85
Lp (ns)                  979             8811      7770
Ld (ns)                  31527           45261     40781
Lr (ns)                  109915          101879    121464

Tables 5 and 6 show the separate contributions of the different detection and recovery mechanisms to the coverage and latency results. The tool can calculate the coverages and latencies for each individual mechanism. To do this, in the set-up phase it is necessary to specify the error detection and recovery clauses for each mechanism.

Table 5 shows that:

• For fault durations in the range [0.1T–10.0T], % Parity > % WDT. The most effective detection mechanism is Parity. For duration = 100T and permanent faults, it is WDT.

• $L_{d(\text{Parity})} \ll L_{d(\text{WDT})}$. Parity presents a much lower latency than WDT for all fault durations.

Table 5
Percentages of detected errors and average detection latency related to detection mechanisms

                       % Detected errors                     Average latency (ns)
Detection mechanism    [0.1T–10.0T]   100T     Permanent     [0.1T–10.0T]   100T     Permanent
Parity                 61.35          38.85    42.42         3021           6062     6210
WDT                    38.65          61.15    57.58         76779          70165    66255

Table 6 shows that:

• For fault durations in the range [0.1T–10.0T], % Spare > % Back-off ≫ % Checkpoint.

• For fault duration equal to 100T, % Spare > % Checkpoint ≫ % Back-off. So, for short and long transient faults, the Spare is the most activated recovery mechanism.

• Permanent faults provoke a higher percentage of permanent errors, which activate the spare CPU.

In the range [0.1T–10.0T], $L_{r(\text{Back-off})} < L_{r(\text{Checkpoint})} \ll L_{r(\text{Spare})}$. For 100T and permanent faults, $L_{r(\text{Checkpoint})} < L_{r(\text{Back-off})} < L_{r(\text{Spare})}$.

Table 6
Percentage of recovered errors and average recovery latency related to recovery mechanisms

                       % Recovered errors                    Average latency (ns)
Recovery mechanism     [0.1T–10.0T]   100T     Permanent     [0.1T–10.0T]   100T     Permanent
Back-off               22.81          4.26     6.39          13916          81333    54810
Checkpoint             5.85           44.32    1.83          35932          46577    25194
Spare                  71.34          51.42    91.78         146667         151244   128022

The information presented here can be used to improve the design of detection and recovery mechanisms and to obtain better values of the latencies and coverages.

5.2.3. Comparison of different VHDL-based fault injection techniques

We are now working on the enhancement of the tool, including new injection techniques, such as saboteurs and mutants. We present some results of injection into our fault-tolerant microcomputer system, showing a comparison of the values of latencies and coverages obtained with the three techniques: simulator commands, saboteurs and mutants.

Fig. 17 shows the detection and recovery coverages as a function of the fault duration (transient/permanent faults), the workload (arithmetic series/bubblesort) and the fault injection technique. From the figure, we can draw some general conclusions [32]:

• For transient faults, the values of detection and recovery coverages differ little. This allows us to conclude that coverages with transient faults can be obtained quite accurately with any of the three techniques. This also makes it possible to work with models of different abstraction levels.

• For permanent faults, the mutants technique shows some discrepancy with respect to the other techniques in the values of detection and recovery coverages. This reinforces our idea that it is necessary to improve the fault models for mutants.

• With respect to implementation cost, the simulator commands technique is the easiest to implement. Saboteurs and mutants have a higher elaboration and application cost, because of the recompilation and simulation times. The saboteurs technique enlarges the size of trace files due to the increase of control signals in the model. The mutants technique increases the simulation time due to the necessity of saving and restoring the system state when switching architectures.

• About the relationship between the type of model and the injection techniques, some considerations can be made. If logic or RT levels are used (in structural models), it is convenient to use the simulator commands and saboteurs techniques. Although the saboteurs technique is more difficult to implement, it has a larger fault modelling capability. At the algorithmic level (behavioural models, the normal situation in the first phases of the design), both the simulator commands and mutants techniques can be used. Here, mutants have the larger fault modelling capability, because they can use all the syntactic and semantic capabilities of the VHDL language.

• In any case, the combined use of the three techniques seems to provide a powerful method to validate models with different abstraction levels.

Fig. 17. (a) Detection coverage related to the injection technique, fault duration and workload and (b) recovery coverage related to the injection technique, fault duration and workload.

6. Summary

In this work, a VHDL-based fault injection tool to run on PC platforms is shown, describing its main components and overall performance. The tool is easy to use, versatile, and appropriate for medium-complexity system models.

We have verified the usefulness of the tool by carrying out a number of injection experiments using various injection parameters, and performing different types of analysis.

Specifically, we present an injection campaign into a VHDL model of a 16-bit fault-tolerant microcomputer system running a workload. We have injected transient and permanent faults (in groups of 3000 per experiment) on the model signals and variables, using the simulator commands technique. The objectives have been to study the pathology of the propagated errors, measure their latency and calculate the detection and recovery coverages.

We also present an injection campaign for the validation of the fault-tolerant microcomputer system comparing three VHDL-based fault injection techniques: simulator commands, saboteurs and mutants.

In these injection campaigns we have presented enhanced fault models exploiting the programming capabilities of VHDL.

The results obtained in the injection campaigns can be used to improve the design of detection and recovery mechanisms to optimise the values of coverage and latency.

In the future, we will focus this research on some injection tool enhancements, such as improving the automation of the saboteurs and mutants techniques and reducing the simulation time in the mutants technique. Also, it is important to study fault modelling in depth, especially in the mutants technique. In the context of fault modelling, it would also be interesting to relate fault types of different fault injection techniques (pin-level HWIFI, SWIFI, etc.) with VHDL-based fault injection. Lastly, another open problem is to study the influence of the workload on Dependability measures.

References

[1] J.C. Laprie, Dependability: Basic Concepts and Terminology, Springer-Verlag, New York, 1992.

[2] J. Arlat, Validation de la Sûreté de Fonctionnement par Injection de Fautes. Méthode-Mise en Oeuvre-Application, Thèse, Institut National Polytechnique de Toulouse, LAAS Research Report No. 90-399, Laboratoire d’Analyse et d’Architecture des Systèmes du CNRS, Toulouse (France), December 1990.

[3] P.J. Gil, Sistema Tolerante a Fallos con Procesador de Guardia: Validación mediante Inyección Física de Fallos, Tesis Doctoral, Departamento de Ingeniería de Sistemas, Computadores y Automática (DISCA), Universidad Politécnica de Valencia (Spain), September 1992.

[4] M. Hsueh, T. Tsai, R.K. Iyer, Fault Injection Techniques and Tools, IEEE Computer, April 1997, pp. 75–82.

[5] J.A. Clark, D.K. Pradhan, Fault injection: a method for validating computer-system dependability, IEEE Computer 28 (6) (1995) 47–56.

[6] E. Jenn, Sur la validation des systèmes tolérant les fautes: injection de fautes dans des modèles de simulation VHDL, Thèse, LAAS Research Report No. 94-361, Laboratoire d’Analyse et d’Architecture des Systèmes du CNRS, Toulouse (France), 1994.

[7] D.K. Pradhan, Fault-Tolerant Computer System Design, ISBN: 0-13-057887-8, Prentice-Hall, Englewood Cliffs, NJ, 1996.

[8] R.J. Martínez, P.J. Gil, G. Martín, C. Pérez, J.J. Serrano, Experimental validation of high-speed fault-tolerant systems using physical fault injection, in: Proceedings of the Dependable Computing for Critical Applications 7 (DCCA-7), vol. 12, San Jose (USA), January 1999, pp. 249–265.

[9] J.C. Campelo, Diseño y validación de nodos de proceso tolerantes a fallos de sistemas industriales distribuidos, Tesis Doctoral, Departamento de Informática de Sistemas y Computadores (DISCA), Universidad Politécnica de Valencia (Spain), 1999.

[10] IEEE Standard VHDL Language Reference Manual, IEEE Std 1076-1993.

[11] D. Gil, Validación de Sistemas Tolerantes a Fallos mediante inyección de fallos en modelos VHDL, Tesis Doctoral, Departamento de Informática de Sistemas y Computadores (DISCA), Universidad Politécnica de Valencia (Spain), 1999.

[12] R.K. Iyer, D. Rossetti, A measurement-based model for workload dependence of CPU errors, IEEE Transactions on Computers C-35 (1986) 511–519.

[13] D. Gil, J. Gracia, J.C. Baraza, P.J. Gil, A study of the effects of transient fault injection into the VHDL model of a fault-tolerant microcomputer system, in: Proceedings of the 6th IEEE International On-Line Testing Workshop (IOLTW 2000), Palma de Mallorca (Spain), July 2000, pp. 73–79.

[14] T.A. DeLong, B.W. Johnson, J.A. Profeta III, A fault injection technique for VHDL behavioral-level models, IEEE Design and Test of Computers 13 (4) (1996) 24–33.

[15] V. Sieh, O. Tschäche, F. Balbach, VERIFY: evaluation of reliability using VHDL-models with embedded fault descriptions, in: Proceedings of the 27th International Symposium on Fault-Tolerant Computing (FTCS-27), Seattle, Washington (USA), June 1997, pp. 32–36.

[16] J.R. Armstrong, F.-S. Lam, P.C. Ward, Test generation and fault simulation for behavioural models, in: J.M. Schoen (Ed.), Performance and Fault Modelling with VHDL, Prentice Hall, Englewood Cliffs, 1992, pp. 240–303.

[17] S. Ghosh, T.J. Chakraborty, On behavior fault modeling for digital design, Journal of Electronic Testing: Theory and Applications 2 (1991) 135–151.

[18] D. Gil, J.V. Busquets, J.C. Baraza, P. Gil, Using VHDL in the techniques of fault injection based on simulation, in: Proceedings of the XIII Design of Circuits and Integrated Systems Conference (DCIS-98), Madrid (Spain), November 1998, pp. 174–180.

[19] E. Jenn, J. Arlat, M. Rimén, J. Ohlsson, J. Karlsson, Fault injection into VHDL models: the MEFISTO tool, in: Proceedings of the 24th International Symposium on Fault-Tolerant Computing (FTCS-24), Austin, Texas (USA), June 1994, pp. 66–75.

[20] P.J. Ashenden, The VHDL CookBook, Technical Report, University of Adelaide, South Australia, 1992.

[21] J. Boué, P. Pétillon, Y. Crouzet, MEFISTO-L: A VHDL-based fault injection tool for the experimental assessment of fault tolerance, in: Proceedings of the 28th International Symposium on Fault-Tolerant Computing (FTCS-28), Munich (Germany), June 1998, pp. 168–173.

[22] M. Rimén, J. Ohlsson, S. Svensson, MEFISTO: Multilevel Error and Fault Injection Simulation Tool. User’s Manual, Chalmers University of Technology, Gothenburg (Sweden), 1997.

[23] P. Folkesson, S. Svensson, J. Karlsson, A comparison of simulation based and scan-chain implemented fault injection, in: 28th International Symposium on Fault-Tolerant Computing (FTCS-28), 1998.

[24] J. Arlat, A. Costes, Y. Crouzet, J.C. Laprie, D. Powell, Fault injection and dependability evaluation of fault-tolerant systems, IEEE Transactions on Computers 42 (8) (1993) 913–923.

[25] Model Technology, ModelSim EE/PLUS Reference Manual, 1998.

[26] E.A. Amerasekera, F.N. Najm, Failure Mechanisms in Semiconductor Devices, John Wiley & Sons, New York, 1997.

[27] J.R. Armstrong, Chip-Level Modelling with VHDL, Prentice Hall, Englewood Cliffs, NJ, 1989.

[28] D. Gil, J.C. Baraza, J.V. Busquets, P.J. Gil, Fault injection into VHDL models: analysis of the error syndrome of a microcomputer system, in: Proceedings of the 24th Euromicro Conference (EUROMICRO 98), vol. 1, Västerås (Sweden), August 1998, pp. 418–424.

[29] J. Ohlsson, M. Rimén, U. Gunneflo, A study of the effect of transient fault injection into a 32-bit RISC with built-in watchdog, in: Proceedings of the 22nd International Symposium on Fault Tolerant Computing (FTCS-22), Boston, USA, 1992, pp. 316–325.

[30] J.C. Baraza, J. Gracia, D. Gil, P.J. Gil, A prototype of a VHDL-based fault injection tool, in: Proceedings of the 2000 IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT2000), Yamanashi (Japan), October 2000, pp. 396–404.

[31] H. Cha, E.M. Rudnick, G.S. Choi, J.H. Patel, R.K. Iyer, A fast and accurate gate-level transient fault simulation environment, in: Proceedings of the 23rd International Symposium on Fault-Tolerant Computing (FTCS-23), Toulouse (France), June 1993, pp. 310–319.

[32] J. Gracia, J.C. Baraza, D. Gil, P.J. Gil, A study of the experimental validation of fault-tolerant systems using different VHDL-based fault injection techniques, in: Proceedings of the 7th IEEE International On-Line Testing Workshop (IOLTW 2001), Taormina (Italy), July 2001, p. 140.