Qualitative event-based diagnosis applied to a spacecraft electrical power distribution system

Qualitative event-based diagnosis applied to a spacecraft electricalpower distribution system

Matthew J. Daigle a,1, Indranil Roychoudhury b,1, Anibal Bregon c,n,2

a NASA Ames Research Center, Moffett Field, CA 94035, USAb Stinger Ghaffarian Technologies, Inc., NASA Ames Research Center, Moffett Field, CA 94035, USAc Department of Computer Science, University of Valladolid, Valladolid 47011, Spain

a r t i c l e i n f o

Article history:Received 13 May 2014Accepted 27 January 2015

Keywords:Fault diagnosisModel-based diagnosisStructural model decompositionElectrical power systemsADAPT

a b s t r a c t

Quick, robust fault diagnosis is critical to ensuring safe operation of complex engineering systems. A faultdetection, isolation, and identification framework is developed for three separate diagnosis algorithms:the first using global model; the second using minimal submodels, which allows the approach to scaleeasily; and the third using both the global model and minimal submodels, combining the strengths ofthe first two. The diagnosis framework is applied to the Advanced Diagnostics and Prognostics Testbedthat functionally represents spacecraft electrical power distribution systems. The practical implementa-tion of these algorithms is described, and their diagnosis performance using real data is compared.

& 2015 Elsevier Ltd. All rights reserved.

1. Introduction

Fault diagnosis plays an essential role in ensuring system safetyin many application domains, from industrial power plants toaerospace vehicles. When a fault occurs in a system, diagnosissoftware must be able to quickly detect the presence of the fault,isolate the true fault among many potential fault candidates, andidentify the fault magnitude (Chen & Patton, 1999; Gertler, 1998;Isermann, 1997; Patton, Frank, & Clark, 2000). With this informa-tion, automated mitigation and recovery actions can be taken.Proper recovery actions enable successful continued operation andprevention of catastrophic consequences, both of which lead tocost savings (Goupil, 2010, 2011).

In this paper, a model-based diagnosis approach for theAdvanced Diagnostics and Prognostics Testbed (ADAPT), an elec-trical power distribution system that is representative of those onspacecrafts, is developed. ADAPT serves as a testbed through whichfaults can be injected to evaluate diagnostic and prognostic algo-rithms (Poll et al., 2007a). Located at NASA Ames Research Center,ADAPT has been established as a diagnostic benchmark systemthrough the industrial track of the International Diagnostic Com-petition (DXC) (Kurtoglu et al., 2009; Sweet, Feldman, Narasimhan,

Daigle, & Poll, 2013). Within the DXC, specific diagnostic problemsare defined for ADAPT, and competing algorithms are evaluatedusing real experimental data obtained from the ADAPT hardware.Diagnostic algorithms must deal with a variety of real-world issuesin order to be successful. In particular, this paper is focused ondiagnosing faults on a subset of ADAPT, called the ADAPT-Lite. Theapplication context is that of an unmanned aircraft system, and thediagnosis must be used to provide mission abort/continue com-mands (Kurtoglu et al., 2009). In order to do this, faults must becorrectly detected (i.e., determine if a fault is present in thesystem), isolated (i.e., determine which fault has occurred), andidentified (i.e., estimate the parameters that define the faultbehavior), under the single fault assumption. Although solutionsin this work are specifically developed for ADAPT, the approach ismodel-based and therefore can be applied to different systemsgiven suitable models.

The model-based diagnosis approach developed in this work isrooted in a qualitative fault isolation framework that is based onthe analysis of residual signals, where residuals are computed asthe difference between observed and predicted system variables(Mosterman & Biswas, 1999). Faults in the system are modeled aschanges in the value of the system parameters (Mosterman &Biswas, 1999) and as changes in component modes (Daigle,Koutsoukos, & Biswas, 2009). Faults cause discrepancies inobserved behavior and model-predicted behavior, and thus man-ifest as deviations in the residual signals. Fault detection involvesstatistical testing of the residuals. The transients of residualdeviations are abstracted qualitatively and compared to predictedfault transients to enable quick fault isolation. Both the qualitativechange in the residual signal, expressed as þ and � values in

Contents lists available at ScienceDirect

journal homepage: www.elsevier.com/locate/conengprac

Control Engineering Practice

http://dx.doi.org/10.1016/j.conengprac.2015.01.0070967-0661/& 2015 Elsevier Ltd. All rights reserved.

n Corresponding author.E-mail addresses: [email protected] (M.J. Daigle),

[email protected] (I. Roychoudhury), [email protected] (A. Bregon).1 M. Daigle and I. Roychoudhury's work has been partially supported by the NASA

System-wide Safety and Assurance Technologies (SSAT) project.2 A. Bregon's funding for this work has been provided by the Spanish MINECO

DPI2013-45414-R grant.

Please cite this article as: Daigle, M. J., et al. Qualitative event-based diagnosis applied to a spacecraft electrical powerdistribution system. Control Engineering Practice (2015), http://dx.doi.org/10.1016/j.conengprac.2015.01.007i

Control Engineering Practice ∎ (∎∎∎∎) ∎∎∎–∎∎∎

www.sciencedirect.com/science/journal/09670661

www.elsevier.com/locate/conengprac

http://dx.doi.org/10.1016/j.conengprac.2015.01.007



http://crossmark.crossref.org/dialog/?doi=10.1016/j.conengprac.2015.01.007&domain=pdf



mailto:[email protected]







https://www.researchgate.net/publication/223167107_Oscillatory_Failure_Case_Detection_In_The_A380_Electrical_Flight_Control_System_By_Analytical_Redundancy?el=1_x_8&enrichId=rgreq-414e8d7f-28c9-443b-8aac-986254993b20&enrichSource=Y292ZXJQYWdlOzI3Mjk4MTk0OTtBUzoyMDI0OTIwMDU2MjE3NjNAMTQyNTI4OTI2NDU5OQ==

https://www.researchgate.net/publication/237606787_First_International_Diagnosis_Competition_-_DXC'09?el=1_x_8&enrichId=rgreq-414e8d7f-28c9-443b-8aac-986254993b20&enrichSource=Y292ZXJQYWdlOzI3Mjk4MTk0OTtBUzoyMDI0OTIwMDU2MjE3NjNAMTQyNTI4OTI2NDU5OQ==

https://www.researchgate.net/publication/237606787_First_International_Diagnosis_Competition_-_DXC'09?el=1_x_8&enrichId=rgreq-414e8d7f-28c9-443b-8aac-986254993b20&enrichSource=Y292ZXJQYWdlOzI3Mjk4MTk0OTtBUzoyMDI0OTIwMDU2MjE3NjNAMTQyNTI4OTI2NDU5OQ==

magnitude and slope, and the temporal ordering of these transi-ents as they manifest in the residuals, are used as diagnosticinformation, establishing an event-based qualitative fault isolationframework (Daigle et al., 2009).

Predicted values of system outputs are computed by using modelsof the system, which can be either a global model of the system orlocal submodels. Structural model decomposition methods can beused to systematically compute the submodels. The use of localsubmodels leads to increased scalability of the diagnosis algorithm(Bregon et al., 2014; Thompson, 1994) and increased diagnosability.They have been successfully used for fault diagnosis in industrial(Arogeti, Wang, Low, & Yu, 2012) and aerospace applications(Fravolini & Campa, 2009). The main idea is to take advantage ofthe analytical redundancy provided by the sensors and the model toderive minimal submodels that provide additional informationuseful for diagnosis. In particular, this paper uses a structural modeldecomposition approach based upon Possible Conflicts (PCs) (Pulido& Alonso-González, 2004), which is a structural model decomposi-tion technique equivalent to Analytical Redundancy Relations (ARRs)(Arogeti et al., 2012, 2010). PCs are computed off-line as the minimalsubsets of the global model constraints that produce inconsistencieswhen faults occur. Residuals may be computed using PCs, andresidual deviations analyzed following the qualitative fault isolationframework (Bregon, Biswas, & Pulido, 2012; Daigle et al., 2009;Mosterman & Biswas, 1999). Then, quantitative fault identificationcan be carried out by using minimal local submodels for parameterestimation (Bregon et al., 2012).

The contributions of this work are as follows. First, a novelmodel-based diagnosis framework is developed that addressesfault detection, isolation, and identification. It combines techniquesfrom qualitative fault isolation and structural model decomposi-tion. Specifically, structural model decomposition is used as anunderlying technique to automatically determine the sets of sub-models for each diagnosis task. From this framework, severaldiagnoser designs can be derived using different sets of modelsand submodels. It is shown that two previous algorithms, QED(Qualitative Event-based Diagnosis) and QED-PC (QED with Possi-ble Conflicts) (Daigle, Bregon, & Roychoudhury, 2012) can beformulated as specific instantiations of this framework, whereQED uses a global system model, and QED-PC uses minimal localsubmodels. A new algorithm, QED-PCþþ , that uses both theglobal model and the minimal local submodels, is formulated,and it is shown how it combines the strengths of QED and QED-PC.The three algorithms are implemented as diagnostic solutions forthe ADAPT case study, which includes the development of modelsof ADAPT suitable for diagnosis, the integration of heuristic faultisolation rules to improve fault isolation performance, and novelfault identification techniques. Using a large, comprehensive set ofexperimental data from the ADAPT hardware, the three algorithmsare applied and their performance is compared. By analyzing theset of experimental results, their limitations are discovered, andpossible future improvements and extensions are suggested.

The paper is organized as follows. Section 2 describes theADAPT case study. Section 3 overviews the diagnosis approach.Section 4 provides the system model and describes the structuralmodel decomposition approach. Section 5 describes fault detectionand Section 6 describes symbol generation. Section 7 discussesfault isolation, and Section 8 describes fault identification. Section9 covers decision-making. Section 10 presents the experimentalresults and discusses lessons learned. Section 11 describes relatedwork, and Section 12 concludes the paper.

2. The Advanced Diagnostics and Prognostics Testbed

The Advanced Diagnostics and Prognostics Testbed is an electricalpower distribution system that is representative of those on space-craft, and has been established as a diagnostic benchmark systemthrough the International Diagnostic competition (Poll et al., 2007b).As mentioned in the previous section, this work focuses on a subset ofADAPT, called ADAPT-Lite, which has been used to define DiagnosticProblem I of the industrial track of the DXC (Kurtoglu et al., 2009;Sweet et al., 2013), in which the ADAPT-Lite hardware is used toemulate the operation of an electrical power system aboard anUnmanned Aircraft System (UAS).

A system schematic for ADAPT-Lite is given in Fig. 1. A battery(BAT2) supplies electrical power to several loads, transmittedthrough several circuit breakers (CB236, CB262, CB266, andCB280), relays (EY244, EY260, EY281, EY272, and EY275), and aninverter (INV2) that converts dc to ac power. ADAPT-Lite has one dcload (DC485) and two ac loads (AC483 and FAN416). There aresensors throughout the system to report electrical voltage (namesbeginning with “E”), electrical current (“IT”), and the positions ofrelays and circuit breakers (“ESH” and “ISH”, respectively). There isone sensor to report the operating state of a load (fan speed, ST516)and another to report the battery temperature (TE228).

A diagnostic algorithm is used to inform the operator if faultshave occurred, and if so, whether the fault requires aborting themission and landing the UAS. Given the time-stamped vectors ofsystem inputs uðtÞ and outputs yðtÞ, the goal of the diagnosisalgorithm is to detect the fault, isolate the faulty component and itsfault mode, identify the fault magnitude, and then generate anabort command if necessary by t¼4 min into the mission. Thenecessity of an abort depends on the fault type, and, in some cases,on the fault magnitude, thus, this diagnostic problem requires thediagnostic algorithm to perform not only fault detection andisolation, but also fault identification.

Table 1 summarizes the abort recommendation for each faultmode in ADAPT-Lite. A command to abort should be given for anyfault that results in a loss of power to the three loads, i.e., faults inany of the circuit breakers or relays, a failure in the inverter, andfailures in the loads themselves. An overspeed fault of the fanresults in an abort, but an underspeed fault does not. For aresistance change in the dc and ac loads, an offset (i.e., bias) fault

Fig. 1. ADAPT-Lite schematic.

M.J. Daigle et al. / Control Engineering Practice ∎ (∎∎∎∎) ∎∎∎–∎∎∎2





https://www.researchgate.net/publication/224407235_A_Qualitative_Event-Based_Approach_to_Continuous_Systems_Diagnosis?el=1_x_8&enrichId=rgreq-414e8d7f-28c9-443b-8aac-986254993b20&enrichSource=Y292ZXJQYWdlOzI3Mjk4MTk0OTtBUzoyMDI0OTIwMDU2MjE3NjNAMTQyNTI4OTI2NDU5OQ==

https://www.researchgate.net/publication/245157948_Parallel_processing_architectures_for_aerospace_applications?el=1_x_8&enrichId=rgreq-414e8d7f-28c9-443b-8aac-986254993b20&enrichSource=Y292ZXJQYWdlOzI3Mjk4MTk0OTtBUzoyMDI0OTIwMDU2MjE3NjNAMTQyNTI4OTI2NDU5OQ==

triggers an abort if its magnitude is outside an acceptable range, adrift fault triggers an abort if the fault magnitude will be too largeby the end of the mission, and an intermittent resistance offsetfault triggers an abort if the average value of the resistance (overboth nominal and faulty time periods) is outside an acceptablerange. For the sensors deemed critical, IT260, IT267, IT281, andST516, the mission must be aborted whenever the sensor is stuck,or if the offset, drift, or average intermittent value is outside anacceptable range by the end of the mission. For the remaining(noncritical) sensors, an abort is never necessary.

3. Diagnosis approach

The proposed diagnosis architecture is shown in Fig. 2. Thesystem receives inputs uðtÞ (for ADAPT-Lite, circuit breaker andrelay commands) and produces outputs yðtÞ (current measure-ments, voltage measurements, etc.). The set of system models(Section 4), consisting of a global model and/or a set of submodels,given inputs uðtÞ, computes predicted values yðtÞ. The fault detec-tion module (Section 5) compares yðtÞ and yðtÞ and determineswhether the differences indicate a fault. The symbol generationmodule (Section 6) transforms deviations from expected behaviorinto a symbolic representation, where a symbol is denoted by σ.The fault isolation module (Section 7) uses the sequence of thesesymbols to isolate faults F, by comparing to predicted symbolsequences for the faults. Each fault f AF is associated with acomponent, a fault mode, and a set of fault parameters. The faultidentification module (Section 8) estimates, for each fault f AF , the

values of the fault parameters, which also helps to further isolatefaults. The identified fault set is output as Fid and provided to thedecision maker. The decision making module (Section 9) deter-mines the command c to take (a possible abort command), basedon the given diagnosis.

All the algorithms presented in this work (QED, QED-PC, andQED-PCþþ), which will be described in the following sections, arecaptured within this single diagnosis framework and implementedfollowing this same general architecture. The main difference liesin the underlying models used for the diagnosis tasks: QED isbased on a global system model of the system; QED-PC is based onminimal submodels computed from the global system model; andQED-PCþþ is based on the combination of the global model andthe minimal submodels. In the following sections, each piece of thediagnosis architecture is described in detail.

4. System modeling

The proposed diagnosis approach is model-based, requiring amodel describing both nominal and faulty behavior to be used forprediction of nominal values, and for fault detection, isolation, andidentification. The most parsimonious models are used to solve thediagnosis task. Because fault identification is required, the modelsmust ultimately be quantitative.

In this work, the modeling framework described in Roychoudhury,Daigle, Bregon, and Pulido (2013) is used. Its main details are summa-rized here for completeness. First, a model is defined.

Table 1Components of ADAPT-Lite, their failure modes, and abort recommendations by failure mode.

Components Failure mode Recommendation

AC483, DC485 Failed off AbortResistance offset ConditionalResistance drift ConditionalIntermittent resistance offset Conditional

CB236, CB262, CB266, CB280 Failed open Abort

E240, E242, E265, E281, TE228 Offset NoneStuck NoneDrift NoneIntermittent offset None

IT240, IT267, IT281, ST516 Offset ConditionalStuck AbortDrift ConditionalIntermittent offset Conditional

ESH244A, ISH236 Stuck None

EY244, EY260, EY272, EY275, EY284 Stuck open Abort

FAN416 Underspeed NoneOverspeed AbortFailed off Abort

INV2 Failed off Abort

Fig. 2. Diagnosis and recovery architecture.

M.J. Daigle et al. / Control Engineering Practice ∎ (∎∎∎∎) ∎∎∎–∎∎∎ 3





Definition 1 (Model). A model M is a tuple M¼ ðV ;CÞ, where V isa set of variables, and C is set of constraints. V consists of fivedisjoint sets, namely the set of state variables, X; the set ofparameters, Θ; the set of inputs, U; the set of outputs, Y; and theset of auxiliary variables, A. Each constraint c¼ ðεc;VcÞAC consistsof an equation εc involving variables VcAV .

Input variables uAU are known and correspond in ADAPT-Liteto relay and circuit breaker commands (6 in total); and the outputvariables yAY correspond to (measured) sensor signals (ADAPT-Lite has 11). Parameters θAΘ include explicit model parametersthat are used in the model constraints. Θ does not need to includeall parameters in the equations, only those that must be includedexplicitly (i.e., fault parameters). The auxiliary variables A are extravariables that are functions of the states, and are used to simplifythe model structure.

A constraint c¼ ðεc;VcÞ is a tuple including an equation εc over aset of variables Vc. It does not impose any computational causalityon the variables Vc, i.e., it does not specify which vAVc is thedependent variable in equation εc. The notion of a causal assign-ment to define the dependent variable is required.

Definition 2 (Causal assignment). A causal assignment α to aconstraint c¼ ðεc;VcÞ is a tuple α¼ ðc; voutc Þ, where voutc AVc isassigned as the dependent variable in equation εc.

A causal assignment of a constraint is written by using theconstraint's equation in a causal form, using ≔ instead of ¼ tomake the causal direction explicit.

As described in Roychoudhury et al. (2013), a set of causalassignments A, for a model M is valid if (i) there are no causalconstraints computing variables in U or Θ, (ii) no variable in Y actsas a dependent variable in any causal constraint, and (iii) everyother variable has exactly one causal constraint computing it. Acausal model is a model extended with a valid set of causalassignments.

Definition 3 (Causal model). Given a model Mn ¼ ðV ;CÞ, a causalmodel for Mn is a tuple M¼ ðV ;C;AÞ, where A is a set of validcausal assignments.

The global model for ADAPT-Lite is next described, followed bya description of the structural model decomposition methodologyand the submodels derived for ADAPT-Lite from the global model.

4.1. Global model

A system schematic for ADAPT-Lite is given in Fig. 1. BAT2consists of two 12 V lead-acid batteries in series, which are lumpedtogether into a single battery model. Battery models typically mustinclude a set of complex nonlinear behaviors (Ceraolo, 2000).However, most of these nonlinear characteristics are not evidentwithin the time frame of the experimental scenarios. Therefore, asimplified electrical circuit equivalent model, consisting of a singlelarge capacitance, C0, in series with a capacitor–resistor pair, Cs andRs, that subtracts from the voltage provided by C0 (see Fig. 1), isused. The battery may then be described as

_v0≔1C0

� iBð Þ; ðc1Þ

v0≔Z t

t0

_v0 dt; ðc2Þ

_vs≔1Cs

iBRs�vsð Þ; ðc3Þ

vs≔Z t

t0

_vs dt; ðc4Þ

vB≔v0�vs; ðc5Þwhere iB is the battery current, vB is the battery voltage, v0 is thevoltage across C0, and vs is the voltage drop across Cs and Rs. Inreality, Rs is a function of state of charge, depth of charge (Ceraolo,2000), and temperature, but, since it is not observed to changesignificantly in the available data, Rs is assumed to be constant.Instead, since the battery voltage decreases faster at lower vol-tages, C0 is expressed as a function that decreases with voltage. Inthe experimental data, battery temperature, TB, is not observed tochange significantly and can be assumed to be constant, i.e.,

_T B≔0; ðc6Þ

TB≔Z t

t0

_T B dt: ðc7Þ

The battery voltage vB is supplied to the loads through a seriesof relays and circuit breakers. The intermediate voltages aredescribed by

v1≔sCB236 � vB; ðc8Þ

v2≔sEY244 � v1; ðc9Þ

v3≔sEY260 � v2; ðc10Þ

vdc≔sCB280 � v3; ðc11Þ

vinv≔sCB262 � v3; ðc12Þwhere the s variables are the on/off states of the relays and circuitbreakers. These states are described by

sCB236≔uCB236 � f CB236; ðc13Þ

sEY244≔uEY244 � f EY244; ðc14Þ


sCB262≔f CB262; ðc16Þ

sCB280≔f CB280; ðc17Þwhere the u variables are the input commands for those compo-nents, and the f variables are fault parameters used to capture thefailed/stuck open fault modes. Note that CB262 and CB280 cannotbe controlled and so have no corresponding input variables.

The battery current iB is the sum of the dc current, idc, and theinput dc current to the inverter, iinv. This is described by

iB≔idcþ iinv; ðc18Þ

idc≔sEY281 � f dc �vBRdc

; ðc19Þ

sEY281≔uEY281 � f EY281; ðc20Þwhere Rdc is the resistance of the dc load, which may change due toa fault, and fdc is a fault parameter used to model the “failed off”failure mode.

The inverter transforms dc power to ac power. When operatingnominally, the rms voltage vrms is controlled very close to 120 V acas long as the input voltage is above 18 V

vrms≔120 � ðvinv418Þ � f INV2; ðc21Þwhere fINV2 is a fault parameter used to model the “failed off” failuremode. From a power balance of the ac and dc sides of the inverter, itresults that vinv � iinv ¼ e � vrms � irms, where e is the inverter efficiency,irms is the inverter rms current, vinv is the inverter voltage on the dcside, and iinv is the input dc current to the inverter. The inverter stilldraws a small amount of current even when irms ¼ 0, and this iscaptured as a dc resistance parallel to the inverter, Rinv. Hence, the






https://www.researchgate.net/publication/3266334_New_dynamical_models_of_lead-acid_batteries?el=1_x_8&enrichId=rgreq-414e8d7f-28c9-443b-8aac-986254993b20&enrichSource=Y292ZXJQYWdlOzI3Mjk4MTk0OTtBUzoyMDI0OTIwMDU2MjE3NjNAMTQyNTI4OTI2NDU5OQ==

https://www.researchgate.net/publication/259930324_A_Structural_Model_Decomposition_Framework_for_Systems_Health_Management?el=1_x_8&enrichId=rgreq-414e8d7f-28c9-443b-8aac-986254993b20&enrichSource=Y292ZXJQYWdlOzI3Mjk4MTk0OTtBUzoyMDI0OTIwMDU2MjE3NjNAMTQyNTI4OTI2NDU5OQ==

following equation is derived:

iinv≔vrms � irms

e � vinvþvinvRinv

: ðc22Þ

The voltage seen by the ac loads is dependent on the state of CB266

v4≔sCB266 � vrms; ðc23Þ

sCB266≔f CB266: ðc24ÞThe dc and ac resistive loads are modeled as pure resistances, withRdc and Rac, respectively. The fan has both resistive and inductiveproperties, so introduces a phase difference ϕ in its current from theinput voltage. Since only rms values of the inverter voltage andcurrent are available, only the steady-state ac relations are used. Thetotal rms current drawn by the ac loads is then

irms≔1ffiffiffi2

pffiffiffi2

p� ifan � cosϕþ j sinϕð Þþ

ffiffiffi2

p� iac

�� ; ðc25Þ

iac≔sEY272 � f ac �v4Rac

; ðc26Þ

ifan≔sEY275 � f fan �v4Rfan

; ðc27Þ


sEY275≔uEY275 � f EY275; ðc29Þwhere Rfan is the magnitude of the fan impedance, and fac and ffan arefault parameters used to model the “failed off” failure modes. The fanspeed is expressed as a function of its input voltage

_ω≔1Jfan

sEY275f fanv4

BfanRfan�ω

� �; ðc30Þ

ω≔Z t

t0_ω dt; ðc31Þ

where Jfan is an inertia parameter and Bfan is a resistance parameter.The sensors can be modeled using the following constraints:

yE240≔bE240þ f E240þv1; ðc32Þ

yE242≔bE242þ f E242þv2; ðc33Þ

yE265≔bE265þ f E265þvrms; ðc34Þ

yE281≔bE281þ f E281þvdc; ðc35Þ

yESH244A≔f ESH244AþsEY244; ðc36Þ

yISH236≔f ISH236þsCB236; ðc37Þ

yIT240≔bIT240þ f IT240þ iB; ðc38Þ

yIT267≔bIT267þ f IT267þ irms; ðc39Þ

yIT281≔bIT281þ f IT281þ idc; ðc40Þ

yST516≔bST516þ f ST516þω; ðc41Þ

yTE228≔TBþbTE228þ f TE228; ðc42Þwhere, for sensor S, the yS variable is the sensor output, the bSvariable is the (nominal) sensor bias, and the fS variable is the faultparameter.

The fault modes of the different components of ADAPT-Lite arelisted in Table 1. As illustrated by the model constraints, in ourmodeling framework, faults explicitly correspond to changes inmodel parameters θAΘ. Table 2 lists the fault parameters in Θ ofthe model M that are associated with each component. For theloads, faults are associated with changes in their resistance

parameters or discrete failures (parameter fL for load L). For asensor S, the fault parameter fS captures an additive fault. For arelay or circuit breaker C, the fault parameter fC captures a changein its discrete state; since in the nominal mode, all circuit breakersand relays are connected, these fault parameters becoming 0 repre-sent the switch going from a connected to a disconnected state.Similarly, inverter failure is captured with the parameter fINV2.

For the resistive load and sensor faults, the fault parameters cantake on one of the several profiles, including offset, drift, andintermittent offset profiles, as shown in Fig. 3 (tf denotes the timeof fault occurrence). For an offset, the faulty value θf ðtÞ is definedby

θf ðtÞ ¼ θðtÞþΔθ;

where θðtÞ is the nominal value, and Δθ is the offset. A drift isdefined by its slope m, i.e,

θf ðtÞ ¼ θðtÞþmðt�tf Þ:

For intermittent offsets, the offset randomly alternates betweenzero and nonzero values, where the periods of faulty values (Δtfi),the periods of nominal values (Δtni), and the faulty values duringeach faulty period (Δθi) are chosen randomly (see Fig. 3c). Theprofile is summarized by three parameters, the mean offset μΔθ ,i.e., meanðΔθ1;Δθ2;…Þ; the mean faulty time μtf, i.e.,meanðΔtf1;Δtf2;…Þ; and the mean time it is nominal μtn, i.e.,meanðΔtn1;Δtn2;…Þ.

For the sensor faults, there is an additional fault profile, stuck,where for sensor S, fS(t) is such that ySðtÞ ¼ c, where the stuck valueis c and the sensor noise is absent.

In summary, the global model is defined by constraints c1–c42and the constituent variable sets. The state, parameter, input, andoutput variable sets are defined by

� X ¼ fTB; vo; vs;ωg� Θ¼ fRac, fac, Rdc, fdc, Rfan, ffan, f CB236, f CB262, f CB266, f CB280, f E240,f E242, f E265, f E281, f ESH244A, f EY244, f EY260, f EY272, f EY275, f EY281,fINV2, f ISH236, f IT240, f IT267, f IT281, f ST516, f TE228g

Table 2Component fault parameters.

Component Fault parameters

AC483 Rac ; f acDC485 Rdc ; f dc

CB236 f CB236CB262 f CB262CB266 f CB266CB280 f CB280

E240 f E240E242 f E242E265 f E265E281 f E281IT240 f IT240IT267 f IT267IT281 f IT281ST516 f ST516TE228 f TE228

ESH244A f ESH244A

ISH236 f ISH236

EY244 f EY244EY260 f EY260EY272 f EY272EY275 f EY275EY284 f EY284

FAN416 Rfan ; f fan

INV2 fINV2






� U ¼ fuCB236;uEY244;uEY260;uEY272;uEY275;uEY281g� Y ¼ fyE240; yE242; yE265; yE281; yESH244A; yISH236; yIT240; yIT267; yIT281;yST516; yTE228g

4.2. Structural model decomposition

In the context of diagnosis, structural model decompositionoffers several advantages. For fault isolation, its primary advantageis that local submodels whose outputs are a function of only asubset of the faults can be created, i.e., some faults becomedecoupled from the residuals computed from the local submodeloutputs, which can improve diagnosability. For fault identification,structural model decomposition enables us to find minimal sub-models to estimate parameters (Bregon et al., 2012). Further, itallows for a distributed diagnosis implementation (Bregon et al.,2014).

Given a system model, submodels are generated that allow forthe computation of a given set of variables using only local inputs.Given a definition of the local inputs (in general, selected from V)and the set of variables that have to be computed by the submodel(selected from V�U), a causal submodel Mi is created from acausal model M. A submodel is obtained in which only a subset ofthe variables in V are computed using only a subset of theconstraints in C. In this way, each submodel computes its variablevalues independently from all other submodels. A submodel can bedefined as follows.

Definition 4 (Causal submodel). A causal submodel Mi of a causalmodel M¼ ðV ;C;AÞ is a tuple Mi ¼ ðVi;Ci;AiÞ, where ViDV ,CiDC, and Ai is a set of (valid) causal assignments for Mi.

A causal submodel is generated from a global (causal) model bydefining the set of local outputs the submodel must compute, andthe set of local inputs that are available to it. The algorithm forderiving a causal submodel from a global model is detailed inRoychoudhury et al. (2013). It works by starting at the local outputs,and propagating backwards trying to resolve variables using con-straints involving the local inputs, reassigning causality in theprocess if needed. The procedure has the property that the gener-ated submodels are minimal, i.e., they contain only the minimalvariables and constraints needed to compute the local outputs.

The (causal) global model offers one way in which to computepredicted values of system outputs, in which the inputs used arethe global inputs to the system. Instead, causal submodels can alsobe used, where for N sensors a set of N submodels is defined, onefor each sensor, where the available local inputs are the global

inputs to the system and the measured values from sensors. Theadvantages to this approach are (i) improving diagnosability and(ii) enabling a distributed implementation. The disadvantage isthat noisy sensor signals are being used as inputs to localsubmodels to compute predicted values of measurements, typicallyresulting in a slightly diminished fault detection performance.

The decomposition approach presented in this paper followsthe main idea of Possible Conflicts (PCs) (Pulido & Alonso-González, 2004), where minimal redundant submodels as com-puted for fault diagnosis purposes by considering sensor measure-ments as inputs. However, our decomposition approachgeneralizes that idea and provides a framework where submodelscan be computed for other tasks such as fault identification.Additionally, our algorithms allow for the computation of submo-dels that are not minimal, which have been proven to be useful fordistributed diagnosis (Bregon et al., 2014).

For ADAPT-Lite, the system has 11 sensors, so 11 minimalsubmodels are obtained. Table 3 shows the local state, parameter,input, and output variable sets for each, along with the submodelconstraints. Note here that for a sensor S, a ySAUi means that themeasured value from the sensor is being used as an input, and aySAYi means that a predicted value of yS is being generated. Notethat each submodel computes its values independently of all othersubmodels, i.e., no submodel outputs are fed into any othersubmodels, as the only inputs are the known inputs U andmeasured sensor values. In this way, the submodels can beexecuted in a parallel, distributed fashion if allowed by thecomputational hardware.

For example, take IT281, where U [ ðY�fyIT281gÞ are the avail-able local inputs. To compute yIT281, the variables f IT281, which is afault parameter with a known nominal value (i.e., 0), and idc(constraint (c40)) need to be computed. Following the causalitybackwards, to compute idc, the variables sEY281, fdc, vdc, and Rdc(constraint (c19)) need to be computed. Both fdc and Rdc areparameters with known nominal values, and sEY281 can be com-puted with the known input uEY281 and f EY281, a fault parameterwith a known nominal value (constraint (c20)). To compute vdc,variables sCB262 and v3 (constraint (c11)) are needed, however, E281also measures this voltage, so the related constraint (constraint(c35)) in the causality where yE281 becomes an input can be used,thus completing the submodel.

As it is shown from Table 3, each submodel is a function of onlya subset of the fault parameters. Therefore, if a fault occurs, it willcause a discrepancy in only a subset of the local outputs from theirobserved values. For example, if Rfan changes, a discrepancy only inthe predicted values of yIT267 and yST516 will be seen. For theremaining submodel outputs, the submodels will correctly predict

t

Δθ

θ(t)

tf

θf(t) θf(t)

ttf

θ(t)

m

t

Δθ1θ(t)

tf

θf(t)

Δθ2 Δθ3

Δtf1 Δtn1 Δtf2 Δtn2 Δtf3

Fig. 3. Fault profiles. (a) Offset fault profile; (b) drift fault profile; (c) intermittent offset fault profile.






https://www.researchgate.net/publication/259930324_A_Structural_Model_Decomposition_Framework_for_Systems_Health_Management?el=1_x_8&enrichId=rgreq-414e8d7f-28c9-443b-8aac-986254993b20&enrichSource=Y292ZXJQYWdlOzI3Mjk4MTk0OTtBUzoyMDI0OTIwMDU2MjE3NjNAMTQyNTI4OTI2NDU5OQ==

Table 3Submodels for the ADAPT system.

States (Xi) Parameters (Θi) Inputs (Ui) Outputs (Yi) Causal assignments (Ai)

v0 ; vs f CB236 ; f E240; f IT240 uCB236; yIT240 yE240 yE240≔bE240þ f E240þv1v1≔sCB236 � vBvB≔v0�vssCB236≔uCB236f CB236vo≔

R tt0

_v0 dt_v0≔� iB=C0

iB≔yIT240�bIT240� f IT240vs≔

R tt0

_vs dt_vs≔ðRsiB�vsÞ=Cs

∅ f E240; f E242 ; f EY244 yE240 ; uEY244 yE242 yE242≔bE242þ f E242þv2v2≔sEY244v1sEY244≔uEY244f EY244v1≔yE240�bE240� f E240

∅ f CB262 ; f E242; f E265, yE242 ; yEY260 yE265 yE265≔bE265þ f E265þvrms

f EY260; f INV2 vrms≔120f INV2ðvinv40Þvinv≔sCB262v3sCB262≔f CB262v3≔sEY260v2sEY260≔uEY260f EY260v2≔yE242�bE242� f E242

∅ f CB280 ; f E242; f E281, yE242 ; uEY260 yE281 yE281≔bE281þ f E281þvdcf EY260 vdc≔sCB280v3

v3≔sEY260v2sCB280≔f CB280sEY260≔uEY260f EY260v2≔yE242�bE242� f E242

∅ f ESH244A ; f EY244 uEY244 yESH244A yESH244A≔f ESH244AþsEY244sEY244≔uEY244f EY244

∅ f CB236 ; f ISH236 uCB236 yISH236 yISH236≔f ISH236þsCB236sCB236≔uCB236f CB236

∅ f CB262 ; f E242; f E265, yE242 ; yE265 ;uEY260, yIT240 yIT240≔bIT240þ f IT240þ iBf EY260; f IT240; f IT267, yIT267; yIT281 iB≔iinvþ idcf IT281 idc≔yIT281�bIT281� f IT281

iinv≔irmsvrms=ðevinvÞþvinv=Rinv

vrms≔yE265�bE265� f E265vinv≔sCB262v3sCB262≔f CB262v3≔sEY260v2sEY260≔uEY260f EY260irms≔yIT267�bIT267� f IT267v2≔yE242�bE242� f E242

∅ Rac ; f ac ; Rfan ; f fan , yE265 ; uEY272; uEY275 yIT267 yIT267≔bIT267þ f IT267þ irms

f CB266 ; f E265; f EY272, irms≔ 1ffiffi2

pffiffiffi2

pifan cosϕþ j sinϕð Þþ

ffiffiffi2

piac

�� f EY275; f IT267 iac≔sEY272f acv4=Rac

sEY272≔uEY272 � f EY272ifan≔sEY275f fanv4=Rfan

v4≔sCB266vrms

sEY275≔uEY275f EY275sCB266≔f CB266vrms≔yE265�bE265� f E265

∅ Rdc ; f dc ; f E281, yE281 ;uEY281 yIT281 yIT281≔bIT281þ f IT281þ idcf EY281; f IT281 idc≔sEY281f dcvdc=Rdc

sEY281≔uEY281f EY281vdc≔yE281�bE281� f E281

ω Rfan ; f fan ; f CB266, uE265 yST516 yST516≔bST516þ f ST516þω

f E265; f ST516 ω≔R tt0

_ω dt

_ω≔�ω=JfanþsEY275f fanv4=ðBfan � Jfan � RfanÞv4≔sCB266vrms

sCB266≔f CB266sEY275≔uEY275f EY275vrms≔yE265�bE265� f E265

TB f TE228 ∅ yTE228 yTE228≔TBþbTE228þ f TE228_T B≔0

TB≔R tt0

_T B dt






the (faulty) sensor values since they are using other sensors asinputs.

The three diagnostic algorithms, QED, QED-PC, and QED-PCþþ ,differ in which set of models/submodels are used. QED uses onlythe global system model, and so computes 11 (global) outputs.QED-PC uses the set of minimal submodels shown in Table 3, andso computes 11 (local) outputs. QED-PCþþ , on the other hand,uses both the global model and the minimal submodels, computinga total of 22 outputs.

Recall that the most parsimonious models to describe thesystem behavior are used. For QED, it is not necessary to modelthe complete dynamics, and can simplify the model in some partsbecause in nominal operation (for which system behavior needs tobe predicted) the dynamics are limited to certain ranges. Forexample, the fan speed (ω) is constant during nominal operationbecause it is always operated at the same speed, where anydeviation from that speed indicates a fault, and hence does notneed to be considered. So, the global model can simply assume_ω ¼ 0. The local submodel computing yST516, on the other hand,must model the dynamics, because some faults independent of thissubmodel, e.g. a fault in the inverter, will cause the fan speed todecrease through a decrease in vrms, for which yE265 is used as aninput to the yST516 submodel. Consequently, this submodel shouldcorrectly predict the gradual decrease in fan speed. Similarly, theglobal model may assume vrms ¼ 120 V rms, since any deviationimplies a fault. It can also omit the Cs/Rs pair from the batterymodel, because it does not need to account for changes in thecurrent drawn from the battery (any change in current implies afault).

5. Fault detection

For every output y, the fault detection module compares themeasured signal y(t) with each available prediction of that signalyðtÞ. The difference of these signals is known as the residual, andour diagnosis approach is fundamentally based on the analysis ofresidual signals.

Definition 5 (Residual). A residual, ry, is a time-varying signalcomputed as the difference between a measurement, yAY , and apredicted value of the measurement y, denoted as y.

QED computes 11 outputs and, from this, 11 residuals arecomputed based on the global model. QED-PC also computes 11outputs and 11 residuals, but based on the local submodels, and sowill have different fault response characteristics. QED-PCþþcomputes the combined 22 residuals and so has more analyticalredundancy to base its diagnosis on.

Nominally, residuals are, in a statistical sense, zero, accountingfor sensor noise and modeling errors. The explicit goal of faultdetection is to determine statistically significant deviations of theresidual values from zero, which indicates the presence of a fault.In this framework, for robust real-time fault detection, the Z-testwith a set of sliding windows is used (see Fig. 4). This approach isdescribed in detail in Biswas et al. (2003) and Daigle et al. (2010)and it is summarized here for completeness. First, a small window,W2, is used to estimate the current mean μrðtÞ of a residual signal,where t refers to discrete time

μrðtÞ ¼1W2

Xti ¼ t�W2 þ1

rðiÞ:

The variance of the nominal residual signal, σ2r ðtÞ, is computedusing a large window W1 preceding W2, by a buffer Wdelay that ismeant to ensure that W1 contains no samples after fault

occurrence. The variance is computed by

σ2r ðtÞ ¼1W1

Xt�W2 �Wdelay

i ¼ t�W2 �Wdelay �W1 þ1

ðrðiÞ�μr0ðtÞÞ2;

where

μr0ðtÞ ¼ 1

W1

Xt�W2 �Wdelay

i ¼ t�W2 �Wdelay �W1 þ1

rðiÞ:

A user-specified confidence level determines the z� o0 andzþ 40 bounds for a two-sided Z-test. The fault detection thresh-olds, ε�r ðtÞ and εþr ðtÞ, are dynamically computed using

ε�r ðtÞ ¼ z�σrðtÞffiffiffiffiffiffiffiffiW2

p �E;

εþr ðtÞ ¼ zþσrðtÞffiffiffiffiffiffiffiffiW2

p þE;

where E is a modeling error term. A fault is detected if μrðtÞ liesoutside of ½ε�r ðtÞ, εþr ðtÞ� at time t.

The parameters W1, W2, Wdelay, the z bounds, and E must betuned to optimize performance. The W1 window must be largeenough to accurately compute the nominal residual variance, butconstrained by memory requirements. The W2 window must belarge enough to accurately compute the mean, but small enough tobe respond quickly to faults. In our implementation, window sizesvaried, and 100rW1r200, 10rW2r50, and Wdelay ¼ 10 weresufficient.

For ADAPT-Lite, the selected error terms E for QED and QED-PCare listed in Table 4 (QED-PCþþ uses both). These values weretuned to obtain maximum fault sensitivity without false alarms.Overall, the thresholds are smaller for QED than for QED-PC. This isexpected since the minimal submodels use typically noisy sensorsignals as inputs to compute their outputs. Further, in some casesthe minimal submodels must accurately capture more nominaldynamics with respect to a wider range of inputs as compared tothe global model. For example, consider the submodel that takes inas input uE265, the rms inverter voltage of 120 V rms, and outputsfan speed yST516. For QED, voltage vrms is always computed to be120 V rms, and so the fan speed is always the same. For QED-PC, onthe other hand, the fan speed is modeled using a local submodel. Inthis case, vrms, which is replaced by yE265 as a local input, canbecome zero if a fault such as the inverter failure occurs. Hence, the

Fig. 4. Sliding windows in the fault detection scheme at time ti.

Table 4E Values for fault detection.

Residual QED QED-PC

rE240 0.090 0.100rE242 0.080 0.100rE265 0.070 0.110rE281 0.150 0.300rESH244A 0.000 0.000rISH236 0.000 0.000rIT240 0.110 0.114rIT267 0.050 0.060rIT281 0.060 0.090rST516 20.000 40.000rTE228 0.000 0.200






https://www.researchgate.net/publication/262971109_A_Comprehensive_Diagnosis_Methodology_for_Complex_Hybrid_Systems_A_Case_Study_on_Spacecraft_Power_Distribution_Systems?el=1_x_8&enrichId=rgreq-414e8d7f-28c9-443b-8aac-986254993b20&enrichSource=Y292ZXJQYWdlOzI3Mjk4MTk0OTtBUzoyMDI0OTIwMDU2MjE3NjNAMTQyNTI4OTI2NDU5OQ==

submodel has to accurately capture the decrease in fan speed whenthis happens, and because these dynamics are more difficult tomodel accurately than a constant value, a larger error threshold isneeded for this residual (i.e., E¼20 rpm for QED and E¼40 rpm forQED-PC).

Note that for stuck faults in sensors (recall from Section 4 thatthese faults exhibit no noise in the signal y(t)), a sensor that isstuck within nominal ranges will not be detected by the abovemethod. Hence, a new detection test for these faults is required.For sensor y,

stuckyðtÞ ¼true;

XNs

i ¼ 1

jyðtÞ�yðt� iÞj ¼ 0

false otherwise

8>><>>: ;

where Ns is a user-defined limit, i.e., if the past Ns samples of y(t)are all the same, then stuckðtÞ ¼ true. To implement this, it issufficient to maintain a counter ks for each sensor that keeps trackof howmany consecutive values of y(t) up to the current time pointare the same. A sensor is stuck when ksZNs. The selected value ofNs depends on the particular sensor. For some sensors in ADAPT-Lite, Ns must be quite large (e.g., Ns¼400), because the sensorsnormally repeat the same value for long periods of time. For thediscrete sensors ISH236 and ESH244A this test is not applied,because these sensors are binary-valued and noiseless, so willnominally have long sequences of repeated measured values.

6. Symbol generation

Robust methods based on the Z-test are also used for symbolgeneration (Daigle et al., 2010). Each of the three algorithms usethe same method for each of their computed residual signals. Thefirst symbol is derived from the result of fault detection. If r(t) isgreater than εþr ðtÞ (resp. less than ε�r ðtÞ), a þ (resp. �) is obtained.The second symbol is calculated for the direction of the slope of theresidual, as explained below.

The approach first starts with an estimate of the initial residualvalue, μr0 ðtdÞ, at the time of fault detection, td, over a small windowW3 (see Fig. 5)

μr0 ðtdÞ ¼1W3

Xtd þW3 �1

i ¼ td

rðiÞ:

The mean of the residual slope is computed over a window from tdto t

μrd ðtÞ ¼1

t�tdþ1

Xti ¼ td

rðiÞ�μr0

0@

1A:

Using bounds z� and zþ , the thresholds are

ε�rd ðtÞ ¼ z� σr1ffiffiffiffiffiffiffiffiW3

p þ 1ffiffiffiffiffiffiffiffiWn

p !

�Es

εþrd ðtÞ ¼ zþ σr1ffiffiffiffiffiffiffiffiW3

p þ 1ffiffiffiffiffiffiffiffiWn

p !

þEs;

where Es is a modeling error term. The þ (resp. �) symbol isgenerated when μrd 4εþrd ðtÞ (resp. μrd oε�rd ðtÞ). The window used tocalculate the slope, Wn, is increased until the symbol is successfullygenerated, or t�td becomes larger than a user-specified limit, atwhich point the slope is reported as 0, implying that the true slopeis either zero or unknown, but very small. The window sizes forsymbol generation are decided upon in the same manner as thosefor fault detection. For all residuals W3 ¼ 10 and Wn¼100 are used.

A discrete change symbol, representing whether a signal haschanged between a nonzero and zero value, which is used to

distinguish between parametric and discrete faults, is also com-puted. A Z symbol is reported if the estimate is nonzero and themeasurement is zero, N symbol if the estimate is zero and themeasurement is nonzero, and X symbol otherwise. The procedureto compute this value is detailed in Daigle et al. (2010).

7. Fault isolation

The goal of the fault isolation module is, based on the output ofthe symbol generator, to generate the list of consistent faults F. Asnew symbols are provided, F shrinks as inconsistent faults areeliminated. A qualitative fault isolation methodology is used thatisolates faults based on the transients they produce in the systembehavior, which manifest as deviations in residual values (Mosterman& Biswas, 1999). The symbolic representation of these deviations,produced by the symbol generator, which represent the immediate(discontinuous) change in magnitude, the first nonzero derivativechange, and discrete zero/nonzero value changes in the measurementfrom the estimate, are called fault signatures. Predicted fault signa-tures, derived from the system models/submodels, are matched tothose derived by symbol generation in order to perform faultisolation. The temporal ordering of observed deviations is also usedas additional diagnostic information, based on the intuition that faulteffects will manifest in some parts of the system before others. Thesetemporal orderings of deviations produced by a fault are termedrelative residual orderings (Daigle et al., 2009). Qualitative faultsignatures and residual orderings can be computed by manualanalysis of the system model, by simulation, or automatically fromcertain types of models, such as a temporal causal graph as presentedin Mosterman and Biswas (1999) and Lo, Wong, and Rad (2006).

Fault signatures for a subset of the ADAPT-Lite faults are shownin Table 5 for the QED algorithm and in Table 6 for the QED-PCalgorithm. As mentioned, the first symbol represents an immediatechange in magnitude, the second represents changes in the slope,and the third represents discrete changes. As an example, let usconsider a positive offset in E240. This fault will cause an abruptincrease in the E240 residual (þ), no change in the slope (0), andno discrete change behavior (X). As can be seen in Table 5, no othersensor is affected by this fault in the QED approach (00X). ForQED-PC, the positive offset in E240 produces not only the abruptincrease in the E240 residual, but also an abrupt decrease in theE242 residual with no change in slope, and no discrete changebehavior (-0X). Since measured values from E240 are used as aninput for the minimal submodel that estimates E242, a fault inE240 causes a deviation in the residual associated with E242.Resistance offsets in AC483 and DC485 cause multiple deviationsfor QED but only one in IT267 and IT281, respectively, for QED-PC.This is caused because the corresponding parameters, Rac and Rdc,are present only in the minimal submodels for IT267 and IT281,respectively, as shown in Table 3.

Using all the signatures and orderings, it can be determined ifthe system is fully diagnosable with that information, i.e., that allfaults can be distinguished from each other. It is clear fromTables 5 and 6, that the system is not fully diagnosable. Specifically,an offset in E240 causes the same (initial) effects in the residuals asan intermittent offset. This is the case for all other sensors as wellalong with DC485 and AC483, and consequently, fault identifica-tion is needed here to resolve the ambiguity. Besides these cases,there are four pairs of faults that produce exactly the same

Fig. 5. Slope symbol calculation.








quantitative behavior on the given residuals: failures in CB262 andINV2, failures in EY281 and DC485, failures in EY272 and AC483,and failures in EY275 and FAN416. Further, for QED, since sensorsare only used at the output to compute residuals, a sensor faultaffects only a single residual. Since no-exoneration is assumed(Cordier et al., 2004; de Kleer & Kurien, 2003) in this work, theabsence of residual activations will not be used to exonerate faultcandidates. The consequence of this is that, when a sensor faultoccurs, it is necessary to wait to confirm that no other residualsdeviate before concluding that a sensor fault has indeed occurred.For QED-PC, since sensors are used not only as an output, but alsoas an input to the submodels, sensor faults will affect multipleresiduals, so this problem is minimized. However, for QED-PC,faults in DC485 and AC483 affect only a single residual, thus havingan analogous situation. Due to these diagnosability properties, ingeneral, QED will isolate nonsensor faults faster than QED-PC, andQED-PC will isolate sensor faults faster than QED. By combiningthese residual sets, diagnosability is improved over QED or QED-PCby themselves, and this is the prime motivation for QED-PCþþ .

In theory, the absence of a residual deviation cannot be used toexonerate a candidate, because the deviation could always occursome time in the future. However, in practice, candidates can beexonerated using this information if a time limit, measured fromthe detection of a fault, within which a deviation must be observed,can be confidently introduced. If an expected deviation is not seenwithin that time limit, this is taken as a direct observation of a00X signature and so any candidate that predicts otherwise iseliminated. Such a strategy can be generally applied in any casewhere a nonzero signature exists in the fault signature table, butbecause this depends on the sensitivity of fault detection, it ispossible that some deviations may not be detected, even ifanticipated. Therefore, for the ADAPT case study, rules of this formare selectively applied only for residuals for which an easilydetected deviation is expected. By using this strategy, the distin-guishability of the faults can be improved, and the fault isolationtimes can be decreased.

The isolation rules implemented for ADAPT-Lite are summarizedin Table 7. The table lists the components, the sensors whose

residuals must deviate, the time when the rule is fired (measuredfrom the time of fault detection), and which algorithm the rule isapplied to (represented using a check mark). The rules are to be readas, “if the candidate concerns component C and the residual forsensor S has not deviated within T seconds since fault detection, theneliminate the candidate”. For example, if CB236 is faulty, the sensormeasuring its position, ISH236, is expected to deviate almostimmediately (within 0.2 s), and E265 to deviate within 10 s. IfFAN416 is faulty, IT267 and ST516 are expected to deviate within30 s, and, for QED only, E265 to deviate within 30 s. For some sensor

Table 5Signatures for selected faults for QED for ADAPT-Lite.

Component Fault mode rE240 rE242 rE265 rE281 rESH244A rISH236 rIT240 rIT267 rIT281 rST516 rTE228

AC483 Offset ðΔp40Þ þ0X þ0X þ0X þ0X 00X 00X -0X -0X þ0X 00X 00XDC485 Offset ðΔp40Þ þ0X þ0X 00X þ0X 00X 00X -0X 00X -0X 00X 00XDC485 Drift ðmo0Þ 0-X 0-X 00X 0-X 00X 00X 0þX 00X 0þX 00X 00XE240 Offset ðΔp40Þ þ0X 00X 00X 00X 00X 00X 00X 00X 00X 00X 00XE240 Drift ðm40Þ 0þX 00X 00X 00X 00X 00X 00X 00X 00X 00X 00XE240 Intermittent offset ðμΔp40Þ þ0X 00X 00X 00X 00X 00X 00X 00X 00X 00X 00XEY244 Stuck open þ0X -0Z -0Z -0Z -0Z 00X -0Z -0Z -0Z 0-X 00XFAN416 Underspeed þ0X þ0X þ0X 00X 00X 00X -0X -0X 00X -0X 00XIT267 Offset ðΔpo0Þ 00X 00X 00X 00X 00X 00X 00X -0X 00X 00X 00X

Table 6Signatures for selected faults for QED-PC for ADAPT-Lite.

Component Fault mode rE240 rE242 rE265 rE281 rESH244A rISH236 rIT240 rIT267 rIT281 rST516 rTE228

AC483 Offset ðΔp40Þ 00X 00X 00X 00X 00X 00X 00X -0X 00X 00X 00XDC485 Offset ðΔp40Þ 00X 00X 00X 00X 00X 00X 00X 00X -0X 00X 00XDC485 Drift ðmo0Þ 00X 00X 00X 00X 00X 00X 00X 00X 0þX 00X 00XE240 Offset ðΔp40Þ þ0X -0X 00X 00X 00X 00X 00X 00X 00X 00X 00XE240 Drift ðm4Þ0 0þX 0-X 00X 00X 00X 00X 00X 00X 00X 00X 00XE240 Intermittent offset ðμΔp40Þ þ0X -nX 00X 00X 00X 00X 00X 00X 00X 00X 00XEY244 Stuck open 00X -0Z -0Z 00X -0Z 00X 00X 00X 00X 00X 00XFAN416 Underspeed 00X 00X nnX 00X 00X 00X 00X -0X 00X -0X 00XIT267 Offset ðΔpo0Þ 00X 00X -0X 00X 00X 00X þ0X -0X 00X 00X 00X

Table 7Heuristic fault isolation rules.

Component Residual Firing time (s) QED QED-PC QED-PCþþ

AC483 rE265 40 ✓

AC483 rIT267 40 ✓ ✓

CB236 rISH236 0.2 ✓ ✓

CB236 rE265 10 ✓ ✓

CB262 rE265 10 ✓ ✓ ✓

CB266 rIT267 10 ✓ ✓ ✓

E265 Any 3 70 ✓

E265 rIT240 70 ✓

E281 Any 2 70 ✓

EY244 rESH244A 0.2 ✓ ✓

EY244 rE265 10 ✓ ✓

EY260 rE265 10 ✓ ✓

EY260 rE281 10 ✓ ✓

EY272 rIT267 40 ✓ ✓ ✓

EY272 rE265 40 ✓

EY275 rE265 30 ✓

EY275 rIT267 30 ✓ ✓

EY275 rST516 30 ✓ ✓ ✓

EY284 rIT281 10 ✓ ✓ ✓

FAN416 rE265 30 ✓

FAN416 rIT267 30 ✓ ✓

FAN416 rST516 30 ✓ ✓ ✓

INV2 rE265 10 ✓ ✓ ✓

IT267 Any 2 70 ✓

IT281 Any 2 70 ✓

Sensor S rS 60 ✓ ✓






faults for QED-PC, a certain number of residual deviations areexpected to confirm a fault (see E265, E281, IT267, and IT281 inthe table). Also for QED-PC, a fault in any sensor is expected to causea deviation in the residual for that sensor within 60 s, as shown inthe last row of the table. Note that for QED, this rule would beredundant because a sensor fault cannot be generated as a candidateunless a deviation in that sensor's residual is seen.

For QED-PCþþ , the system is diagnosable with the combinedresidual set, except for the known indistinguishable faults, so none ofthe fault isolation rules are actually needed for unique isolation offaults. Some of the rules are still kept, since they improve faultisolation times. For example, for QED a rule to rule out non sensorfaults is needed if after a significant amount of time only one residualhas been observed to deviate; QED-PCþþ does not require that rulebecause multiple submodel-based residuals will deviate due to asensor fault. Table 7 shows which rules are still used for QED-PCþþ .

8. Fault identification

The goal of fault identification is to, given a hypothesized fault f(consisting of a faulty component and its fault mode), determinethe parameters that define the fault. Identification is performed foreach fault in the candidate list F to produce a new candidate list Fidaugmented with estimated fault parameters. Each of the algo-rithms use the same approach.

As described in Section 4, each fault is mapped to a singleparameter in the system model. For each parameter θ, there is aknown nominal value θðtÞ. When a fault occurs, it takes on somenew profile, θf ðtÞ (offset, drift, or intermittent offset). The goal offault identification is to identify the signal θf ðtÞ, and based onΔθ¼ θf ðtÞ�θðtÞ, identify its specific profile and the parametersdefining that profile.

The faults that require identification are faults in the resistancesRac and Rdc, and sensor faults (fS for a sensor S). Structural modeldecomposition is used here in order to obtain minimal submodelsthat compute those parameters, where the local output of thesubmodel is the parameter value (i.e., the θf ðtÞ), and the localinputs are measured signals from the available sensors Y in theglobal model. Using the model decomposition algorithm, thesubmodels shown in Table 8 are obtained.

So, using these local submodels, for each fault f, the history of θfover ½td; t� is obtained. Knowing the nominal value θðtÞ, ΔθðtÞ can becomputed, and by analyzing this signal, the parameter values of thefault profile can be computed as follows:

� For stuck faults, the stuck value is taken as θf ðtÞ.� For offset faults, the offset value, Δθ, is computed at time t asthe mean of Δθ over ½td; t�.� For drift faults, for a given interval ½t1; t2�, the mean of Δθ iscomputed over a window of n samples centered at t1, Δθ1, andat t2, Δθ2 (Fig. 6). The slope is then computed as m1;2 ¼ ðΔθ2�Δθ1Þ=ðt2�t1Þ. Three time intervals are chosen: ½td; ðtþtdÞ=2�,½ðtþtdÞ=2; t�, and ½td; t�, and the slope m is computed as themedian of the three slopes (Δθ1, Δθ2, and Δθ3 in the figure).Taking the mean of Δθ over the interval endpoints and taking themedian of computed slope values helps improve the robustness.

� For intermittent offset faults, a limit l. above which ΔθðtÞ isconsidered faulty and below which is considered nominal, isused. The limit l is typically chosen as within 1–2% of thenominal value of y(t) or θðtÞ. The approach steps through thesignal ΔθðtÞ, and maintains two counters kn and kf. Each time atransition from a nominal value to a faulty value occurs, kf isincremented, and when a transition from a faulty value to anominal value occurs, kn is incremented. In effect, these twocounters keep track of the number of times the signal was faulty

and nominal. For each new nominal value, a second counter τn,that keeps track of the total amount of time the signal is nominal,is incremented. Similarly, for each new faulty value, a counter τf,that keeps track of the total amount of time it is faulty, isincremented. A variable vf, that keeps track of the sum of allfaulty values, is also incremented. Then, the fault parameters are

μΔp ¼vfτf; μf ¼

τfkf; μn ¼

τnkn:

In addition to the parameter values of the fault profiles listedabove, for each fault, the fault injection time is also computed. Foroffset, stuck, and intermittent faults, the fault injection time istaken as tf¼td. For drift faults, since the slope m1;2 and the currentvalue of the parameter θf is available, the time when θf crossed zerocan be computed. Since there is always a delay in detecting driftfaults, tf otd is expected to be found. However, due to noise anderrors in the identification of the slope, the computed faultinjection time is sometimes computed to also be before the actualfault injection time.

Fault identification is also used to improve fault isolation. Eachcandidate is represented by a faulty parameter with a proposedfault profile. If the identification results are inconsistent with theproposed profile, the candidate can be either eliminated or itsprofile can be changed to a consistent one. To do this, consistencytests have to be implemented for each fault profile. For example, ifthe proposed profile is an offset but it is found that it has asignificant slope, then it cannot be an offset fault and is likely adrift fault. The proposed fault profile will change according to theresults of these tests, and if all tests fail, the candidate is elimi-nated entirely. Candidates can also be eliminated if the identifiedparameters are invalid (e.g., if an estimated resistance value isnegative).

9. Decision making

As shown in Fig. 2, at the end of the scenario, the decisionwhether to abort or continue the mission must be made. The faultidentification module computes a candidate set Fid, with eachf AFid being defined by the component, its fault mode, and theassociated fault parameters. The decision maker implements afunction DM(f) which, for a given fault, computes a recommendedcommand c. In ADAPT-Lite either c¼abort (i.e., abort the mission)or no command is provided (i.e., continue the mission). Table 1summarizes the abort recommendation for each fault mode inADAPT-Lite.

The cost of the decision (abort or continue) is zero when thecorrect command is chosen. If the algorithm recommends abortwhen the mission should be continued, the associated cost is 25,which is defined to be the cost of the mission. If the algorithmrecommends to continue when it should have been aborted, theassociated cost is 125 which is the cost of the mission, 25, plus thecost of the vehicle, defined to be 100. Therefore, the conservativeapproach is taken and abort is recommended if DMðf Þ ¼ abort for atleast one f AFid. In the case that a fault was detected but allcandidates were eliminated, then one may assume either a falsepositive, or a true positive with incorrect fault isolation. The latteris assumed, and in this case, the approach again conservativelyrecommends an abort. A decision to abort or continue is made at4 min into the mission for this case study.

10. Experimental results

In this section, the QED, QED-PC, and QED-PCþþ algorithmsare applied to real experimental data collected from the ADAPT






Table 8Fault identification submodels for ADAPT-Lite.


∅ f E281 ; f EY281 ; f IT281 yE281; uEY281 ; yIT281 Rdc vdc≔yE281�bE281� f E281idc≔yIT281�bIT281� f IT281sEY281≔uEY281 � f EY281Rdc≔sEY281 � vdc=idc

∅ Rfan ; f CB266; f E265, yE265; uEY272 ; uEY275 Rac iac≔� ifanþ irms

f EY272; f EY275 sEY272≔uEY272 � f EY272ifan≔sEY275 � v4=Rfan

v4≔sCB266 � vrms

sCB266≔f CB266sEY275≔uEY275 � f EY275vrms≔yE265�bE265� f E265

irms≔ 1ffiffi2

pffiffiffi2


ffiffiffi2

piac

�� Rac≔sEY272 � v4=iac

vo; vs f CB236; f IT240 uCB236; yE240; yIT240 f E240 v1≔sCB236 � vBvB≔vo�vssCB236≔f CB236 � uCB236

vs≔R tt0

_vs dt_vs≔ðRs � iB�vsÞ=Cs

vo≔R tt0

_vo dt

iB≔�bIT240� f IT240þyIT240_vo≔� iB=C0

f E240≔�bE240�v1þyE240

∅ f E240 ; f EY244 uEY244 ; yE240 ; yE242 f E242 v2≔sEY244 � v1v1≔�bE240� f E240þyE240sEY244≔f EY244 � uEY244

f E242≔�bE242�v2þyE242

∅ f CB262; f E242 ; f EY260, uEY260 ; yE242 ; yE265 f E265 vrms≔120 � f INV2 � vinvf INV2 vinv≔sCB262 � v3

sCB262≔f CB262v3≔sEY260 � v2v2≔�bE242� f E242þyE242sEY260≔f EY260 � uEY260

f E265≔�bE265�vrmsþyE265

∅ f CB280; f E242; f EY260 uEY260 ; yE242; yE281 f E281 vdc≔sCB280 � v3sCB280≔f CB280v3≔sEY260 � v2v2≔�bE242� f E242þyE242sEY260≔f EY260 � uEY260

f E281≔�bE281�vdcþyE281

∅ f CB262; f E242 ; f E265, uEY260 ; yE242 ; yE265, f IT240 iB≔iinvþ idcf EY260; f IT267 ; f IT281 yIT240 ; yIT267 ; yIT281 idc≔�bIT281� f IT281þyIT281

iinv≔irms � vrms=ðe � vinvÞþvinv=Rinv

irms≔�bIT267� f IT267þyIT267vrms≔�bE265� f E265þyE265vinv≔sCB262 � v3v3≔sEY260 � v2sEY260≔f EY260 � uEY260

v2≔�bE242� f E242þyE242sCB262≔f CB262f IT240≔�bIT240� iBþyIT240

∅ Rac ; Rfan ; f CB266, uEY272 ; uEY275 ; yE265, f IT267 irms≔ 1ffiffi2

pffiffiffi2


ffiffiffi2

piac

�� f E265 ; f EY272 ; f EY275 yIT267 ifan≔sEY275 � v4=Rfan

v4≔sCB266 � vrms

iac≔sEY272 � v4=Rac

sEY275≔f EY275 � uEY275

sEY272≔f EY272 � uEY272

sCB266≔f CB266vrms≔�bE265� f E265þyE265f IT267≔�bIT267� irmsþyIT267

∅ Rdc ; f E281 ; f EY281 uEY281 ; yE281 ; yIT281 f IT281 idc≔sEY281 � vdc=Rdc

vdc≔�bE281� f E281þyE281sEY281≔f EY281 � uEY281

f IT281≔�bIT281� idcþyIT281






hardware. The data was provided as part of the Fourth Interna-tional Diagnostics Competition (Sweet et al., 2013). First, thealgorithms are demonstrated on some sample fault scenarios toshow how online diagnosis works. Then, overall results arepresented on both training and validation data sets, followed byan analysis of the results and a discussion of ways in which thealgorithms could be improved. The algorithms are implementedusing Cþþ with the memory requirements being linear in thenumber of sensors.

10.1. Demonstration of the approach

In order to demonstrate the differences between the diagnosers,a demonstration scenario in which a drift fault in E242 is injectedat 175 s with a slope of 0.075 V/s is described. Measured andpredicted values for relevant outputs are shown in Figs. 7–10. QEDdetects the fault at 176.7 s, with an increase in the E242 residual. Atthis point, the possible faults are in AC483, CB262, DC485, E242,EY260, EY272, EY275, EY284, FAN416 (failed off or underspeed),and INV2. At 176.9 s, the stuck mode of E242 is eliminated sinceconsecutive measurements were of different values. At 177.1 s, thediscrete change symbol is computed as X, which does not changethe candidate list. At 179.2 s, the slope of E242 is computed as þ ,eliminating all candidates but AC483 resistance drift, DC485resistance drift, and E242 drift. At 220 s, since only one residual

had deviated, it is concluded to be a sensor fault and E242 drift isisolated. The injection time is computed as 169.8 s and themagnitude as 0.069 V/s. The recommended action is ∅, which iscorrect.

0 50 100 150 200 25014.5

15

15.5

16

16.5

17

17.5

Time (s)

IT24

0 (P

C) Measured

Predicted

Fig. 10. Measured and predicted values of IT240 (local submodel) for E242drift fault.

0 50 100 150 200 25022

23

24

25

26

27

28

Time (s)

E281

(PC

)

MeasuredPredicted

Fig. 9. Measured and predicted values of E281 (local submodel) for E242 drift fault.

0 50 100 150 200 25023

24

25

26

27

28

29

Time (s)

E242

(Glo

bal)

MeasuredPredicted

Fig. 7. Measured and predicted values of E242 (global model) for E242 drift fault.

0 50 100 150 200 25023

24

25

26

27

28

29

Time (s)

E242

(PC

)

MeasuredPredicted

Fig. 8. Measured and predicted values of E242 (local submodel) for E242 drift fault.

Table 8 (continued )


ω Rfan ; f CB266; f E265 yE265; yST516 f ST516 ω≔R tt0

_ω dt

_ω≔�ω=Jfanþv4=ðBfan � Jfan � RfanÞv4≔sCB266 � vrms

sCB266≔f CB266vrms≔�bE265� f E265þyE265f ST516≔�bST516�ωþyST516

TB ∅ yTE228 f TE228 TB≔R tt0

_T B dt_T B≔0f TE228≔�TB�bTE228þyTE228

Fig. 6. Identification of drift faults.






QED-PC detects the fault at 178.1 s on the PC for E242. Thethresholds for the residuals generated by the local submodels arelarger, so, as expected, fault detection is slower than with QED. Theinitial candidate list consists only of faults in E240 and E242, sincethe remaining faults are decoupled from the E242 residual by thelocal submodel design. This is in contrast to QED, where mostcomponents except sensors were implicated with the first residualdeviation. At 178.3 s, it is determined that neither E240 nor E242 isstuck. At 182.1 s the discrete change symbol for E242 is computedas X, which does not change the candidate list. At 182.1 s, adecrease in the E281 submodel residual and an increase in theIT240 submodel residual are detected, isolating the fault to E242(drift or offset). At 183.1 s, the slope on E242 is computed as þ ,therefore isolating a drift in E242 as the fault. The injection time iscomputed as 169.7 s, and the slope as 0.069 V/s. The recommendedaction is ∅.

QED-PCþþ detects the fault at 176.7 s with the E242 globalmodel residual. The initial candidate list is the same as with QED;the candidate faults are in AC483, CB262, DC485, E242, EY260,EY272, EY275, EY284, FAN416 (failed off or underspeed), and INV2.At 176.9 s, it is determined that E242 is not stuck. At 177.1 s, thediscrete change symbol for E242 is computed as X, and thecandidate list remains unchanged. At 178.1 s, an increase in theresidual for the E242 submodel is detected, thus eliminating allcandidates except for faults in E242. Thus, fault isolation iscompleted far earlier than QED (about 30 s), and also earlier thanQED-PC (about 5 s). The fault is computed as a drift with aninjection time of 169.8 s and a slope of 0.069 V/s. The recom-mended action is ∅.

10.2. Summary of results

The set of experimental data is split into two sets, one fortraining (228 scenarios) and one for validation (115 scenarios). Inorder to evaluate the performance of the diagnosis algorithms, thefollowing set of metrics are used (Kurtoglu et al., 2009; Sweet et al.,2013): the mean time to detect faults Mfd, which is the faultdetection time minus the fault injection time; the false negativerate Mfn, which is the number of missed detections over thenumber of faulty scenarios; the false positive rate Mfp, which isthe number of times a fault was detected when a fault was notpresent over the number of scenarios; detection accuracy Mda,which is one minus the sum of false positives and false negativesover the number of scenarios; the mean time to isolate faults Mfi,which is the time the candidate list last changed minus the faultinjection time; the number of classification errors Merr, which isthe Hamming distance between the true and diagnosed compo-nent mode vectors; the mean CPU time Mcpu; the mean peakmemory usage Mmem; and the overall recovery cost Mrc.

The results for the training data set are summarized in Table 9.Ideally, the algorithms need to be tuned only with respect to thefault detectors and symbol generators. While evaluating the algo-rithms on the training data, these are tuned to be as sensitive aspossible without obtaining any false alarms. Some instances ofspikes in the data are also found, which are not considered to befaults and cause spurious false alarms. To filter out such dataspikes, a median filter over three samples is implemented.

Overall, all algorithms do very well after tuning. In the trainingdata set, there are 12 unavoidable classification errors due to thecases for the indistinguishable faults (e.g., DC485 failing versus itsrelay failing). For QED, the remaining errors are due to incorrectfault mode identification (e.g., identifying as intermittent offsetinstead of drift) and missed detections. The missed detections areacceptable, because the faults are small enough that the correctaction is ∅. Except in one case, the incorrect mode identificationscenarios did not result in an incorrect recommendation. The one

scenario that did was one where the identified fault parameterswere off just enough so that an abort was recommended when thecorrect action was ∅, resulting in the recovery cost of 25.

QED-PC had similar errors, and also some additional ones inwhich the DC485 and IT281 faults were confused. This is a difficultsituation to correct, because the thresholds were difficult to tune.DC485 can only be uniquely isolated if only IT281 deviates, so ifadditional deviations are seen, it can be concluded that it is anIT281 fault. However, sometimes IT281 did not cause large enoughdeviations in other residuals so it was misdiagnosed as DC485.These scenarios did not result in a bad recommendation, though. Intwo cases, QED-PC recommended ∅ when the correct action was toabort. In the first case, an ST516 fault was misdiagnosed as an E265fault. This error is due to threshold sensitivities. In the second case,an ST516 intermittent offset fault was misdiagnosed as an offsetfault, for which the identified offset was not enough to trigger anabort recommendation.

QED-PCþþ generally performed the same as QED. For the caseswhere the submodel-based residuals were not analyzed correctly,the use of the global model residuals ended up being used forreasoning instead, and so QED-PCþþ avoided the errors QED-PChad. QED-PCþþ also had the best fault detection and faultisolation times. This was expected since QED-PCþþ has moreresiduals to use for fault detection and isolation.

Overall results for the validation data set are shown in Table 10.In this set of scenarios, there were 20 scenarios in which errors areexpected in fault isolation due to indistinguishable faults, so eachalgorithm can at best have 20 classification errors. For QED, therewere no false positives, and seven classification errors. The classi-fication errors were due in two cases to a misidentified componentmode (offset instead of drift, and offset instead of intermittentoffset), resulting in four total errors, plus three cases in which allcandidates were eliminated due to an incorrectly generated slopesymbol. None of these scenarios had an associated recovery cost.The 125 recovery cost came from one scenario in which the faultwas correctly detected and isolated, but the drift was under-estimated, and so an abort decision was missed. Fault detectionand isolation times were overall lower than in the trainingscenarios, because on average the fault magnitudes were larger,so faults were easier to detect, and there were relatively fewer driftfaults, so, on average, faults could be isolated faster.

For QED-PC, a significant increase in false positives occurred.The majority of these scenarios were due to the residual for E265.Upon further inspection, it was found that the noise characteristicsfor E265 between the training and validation data sets weredifferent (these data sets were gathered a year apart), i.e. a bias

Table 10Diagnosis results over validation data set.

Diagnosticalgorithm

Mfd

(s)Mfn Mfp Mda Mfi

(s)Merr Mcpu

(ms)Mmem

(kb)Mrc

QED 3.70 0.0 0.0 1.00 72.00 27 10.70 7503 125QED-PC 3.13 0.0 0.4087 0.5913 67.33 87 9.78 7689 1375QED-PCþþ 1.65 0.0 0.4435 0.5565 43.84 67 13.62 7850 1250

Table 9Diagnosis results over training data set.

Diagnosticalgorithm

Mfd

(s)Mfn Mfp Mda Mfi (s) Merr Mcpu

(ms)Mmem

(kb)Mrc

QED 9.38 0.0152 0.0 0.987 125.61 19 22.72 7675 25QED-PC 14.66 0.025 0.0 0.978 127.70 37 23.70 7743 275QED-PCþþ 9.32 0.0152 0.0 0.987 124.94 19 24.49 7835 25






was introduced, and this is not something the Z-test in its currentimplementation can account for. This resulted in many falsepositives on E265. As a result, fault isolation was compromised inthe scenarios in which the false alarm was obtained beforeisolation of the true fault was completed. The performance ofQED-PC was actually comparable to its performance on the trainingdata, omitting these cases. Additional false alarms appeared onIT267 (2 scenarios), IT240 (2 scenarios), and E242 (1 scenario),although these were more likely due to their detectors being toosensitive.

Because QED-PCþþ used the same residual for E265 as QED-PC, it also had problems related to false alarms with that residual.Because QED-PCþþ also had the global model residuals to rely on,its performance is improved over QED-PC. It also had the bestoverall fault detection and isolation times.

10.3. Lessons learned

In this subsection, some lessons learned are presented from theapplication of our algorithms on a real system that highlightedsome of the strengths and weaknesses of the algorithms. This helpsto critically identify and evaluate the shortcomings of the algo-rithms and explore potential solutions to address these short-comings and improve the diagnosis framework.

The diagnosis approach presented here offers many advantages.First and foremost, the approach is model-based. Given a globalmodel of the system, most aspects are generated automatically:generation of submodels for residual generation and fault identi-fication and generation of fault signatures and relative residualorderings. Some manual tuning of parameters is required for thefault detectors, symbol generators, and the consistency tests forfault identification. Fault isolation is fast, with the candidate setbeing reduced to a small number of candidates only a short timeafter fault detection. This limits the amount of computation thatmust be performed by fault identification to only these fewcandidates. Further, fault isolation is efficient through the use oflocal submodels.

Of course, all these approaches are dependent on correctsymbol generation for correct fault isolation, and this turns outto be the source of most of the errors. Sensor noise makes correctsymbol generation difficult. For the submodel-based residuals, theeffect of sensor noise is greater because (noisy) sensor values areused as local inputs. The use of the Z-test attempts to mitigate theeffect of noise, however, additional hand-tuning is typicallyrequired. Even when hand-tuning is not required, symbols can stillbe incorrectly generated if the noise characteristics of the sensorchange compared to the training data used. This, in fact, is whathappened with the submodel-based residual for E265 with QED-PCand QED-PCþþ in the validation data set. This was particularlyproblematic because for all the sensor fault scenarios (about halfthe total number of scenarios), this false alarm was present. As aresult, the fault isolation was incorrect in many of these scenariosand this resulted in classification errors and recovery costs.Incorrectly generated slope symbols, and false alarms on othersensors, also caused some diagnostic errors.

In addition, for the submodel-based residuals in particular, somefault detection thresholds were too sensitive. The tuning of thesethresholds is difficult, especially since both the presence of (i.e.,through predicted fault signatures) and the absence of (i.e., throughthe isolation rules) residual deviations for fault isolation are used.That is, in some cases a change in a sensor needs to be detected inorder to properly diagnose, and in other cases it is necessary to makesure the detector does not fire when there is no change in the sensor.Perhaps a more robust approach is to resolve ambiguities using onlyfault identification without relying on heuristic isolation rules,although this may increase fault isolation times.

All these issues related to symbol generation motivate the needfor a more robust qualitative fault isolation approach, in whichthere are probabilities associated with the correctness of generatedsymbols, and diagnostic reasoning can be performed consideringthe case where the symbol was incorrectly generated and theassociated confidence in the generated symbols. Current work onthat front is reported in Daigle, Roychoudhury, and Bregon (2014).

Intermittent faults also introduced some issues. For one, inter-mittent faults detected as þþ (or --) are not accounted for.Ideally, the symbol generators should have generated þ- (or -þ)and þ0 (or -0) symbols for every time intermittent faultssurfaced, but this relies on a more explicit test for discontinuitydetection. Second, the algorithms can benefit from a more robustintermittent fault identification approach.

Because QED-PCþþ makes use of both the global modelresiduals and the submodel-based residuals, it combines thestrengths of both residual sets. The first set is most useful fornonsensor faults, which affect many residuals and therefore give alot of information for fault isolation, and the second set is mostuseful for sensor faults, which affect more residuals than non-sensor faults (for a good model decomposition) and therefore givea lot of information for fault isolation for those types of faults. Asobserved in both the training and validation data sets, faster faultdetection and isolation are seen with the combined residual set.Using both residual sets also eliminated most of the heuristic faultisolation rules, reducing another potential source of diagnosticerrors due to problems tuning the time periods used in those rules.

11. Related work

A variety of other diagnosis approaches have also been appliedto ADAPT. Only one other was tested on the same data set, so a faircomparison with the available metrics is tried to be made here.

An approach similar to ours is described in Carl, Mack, Tantawy,Biswas, and Koutsoukos (2012), which shares the same qualitativefault isolation methodology, but without residual orderings. A keydifference is that the different fault profiles are isolated directlywithin the fault detectors. The correct fault was detected andisolated 60% of the time. Like our approach, tuning the faultdetectors properly becomes critical, as errors in this step propagateto reasoning.

Component models similar to ours are used in Maiga,Chanthery, and Trave-Massuyes (2012), and ARRs are derived fora subset of ADAPT-Lite. Changes in residual values associated toARRs are used to diagnose the system mode and faults that haveoccurred, and so must precompute ARRs for each mode of thesystem. Since only single faults are considered and all modechanges correspond to faults in ADAPT-Lite, really only a subsetof these ARRs are needed. Only a limited comparison to a previousincarnation of QED was performed, finding that this approach hadsimilar fault detection times for abrupt faults.

In Wilson and Kurtoglu (2010), a model-based diagnosisapproach implemented using the Testability Engineering andMaintenance System (TEAMS) (Pattipati, Raghavan, Shakeri, Deb,& Shrestha, 1994) is applied. TEAMS is based on functional faultmodeling, and computes, similar to ARRs, a dependency matrix ofones and zeros that stores which faults affect which test points.Unlike ARRs and PCs, the tests may not correspond directly toresiduals and are constructed by hand. This approach was appliedto the entire ADAPT system, but relative to QED had worse faultdetection and isolation performance. This could be improved withbetter tests, but the advantage of an approach like QED is that thetests are defined automatically in a specific way (i.e., based onresiduals). A similar approach to Wilson and Kurtoglu (2010) wasfollowed in Almqvist et al. (2010), where manually created tests






were constructed and fault isolation was performed based on thetest results. Fault isolation performance was very accurate, butwith a false positive rate of 25% and a false negative rate of 7%.

A classification approach using artificial immune systems wasapplied to ADAPT-Lite in Mange, Daniszewski, and Dunn (2011).Nominal data was used to train a set of detectors. The set oftriggered detectors was then used to isolate the faults. Theapproach detects faults about six times slower than QED, butisolates faults about twice as fast. Despite this, the approach hadabout 50% more fault isolation errors than QED, but the number oferrors was still relatively low.

An approach that diagnoses faults in the dc subsystem based onconvex optimization is employed in Gorinevsky, Boyd, and Poll(2009). Diagnostic metrics are not provided. In contrast to suchpurely quantitative approaches, the main idea of QED is to startfirst with a quick fault isolation step that decreases significantly thesize of the remaining quantitative problem (i.e., fault identifica-tion). As systems increase in size, purely quantitative approachesbecome more difficult to apply.

In Bunus, Isaksson, Frey, and Munker (2009), conflict-drivenfault detection and isolation is performed. The inference engine is aconstraint solver that checks whether a solution exists, computingevery solution that explains the available observations. In a similarapproach, the Hybrid Diagnosis Engine (HyDE) (Narasimhan &Brownston, 2007) was applied to the full ADAPT system in Sweet(2008). HyDE is a model-based diagnosis engine which generatesconflicts (and eventually fault candidates) by using consistencybetween the model predictions and the system observations. AHyDE model may use boolean, discrete, real, or interval-valuedvariables to describe the behavior of a system. In Sweet (2008), asteady-state model was developed, and so was very slow in faultdetection and isolation, yielding only a 24% detection accuracywhen applied to ADAPT-Lite with the DXC data. HyDE was alsoapplied to the same data set used in this paper, having significantlyhigher costs (2650 total), due to having almost five times as manydiagnostic errors. Many of these errors are due to the use of asteady-state model, and one would expect HyDE's performance toimprove significantly with the model used by QED.

Steady-state diagnosis was also performed in Grastien and Kan-John (2009), resulting in static diagnosis problems that were solvedwith Reiter's approach (Reiter, 1987). Steady-state approaches havesimple fault isolation algorithms, but at the cost of delayed faultdetection. Further, some faults cannot be identified using only thesteady states (e.g., drift faults).

Probabilistic diagnosis algorithms are used in Ricks andMengshoel (2009), using discrete and static Bayesian networkmodels that are compiled with arithmetic circuits, which derivemarginal probabilities using addition and multiplication opera-tions. The approach can handle multiple faults, and achieved 96%diagnosis accuracy on its test data. A problem with this approach isthat the reasoning is so simple that much effort has to be placedinto careful development and tuning of the Bayesian models. Withan approach like QED, the model already has much embeddedinformation for reasoning, so less effort is expended on fine-tuningit (the fine-tuning, which is relatively less effort, moves to the faultdetectors and symbol generators).

12. Conclusions

In this work, a model-based diagnosis architecture that com-bines qualitative fault isolation and quantitative fault identificationis developed. From the diagnostic framework, three alternativealgorithms are instantiated, one which uses the global model of thesystem (QED), one which uses model decomposition to derive localsubmodels (QED-PC), and one which combines both the global

model and local submodels (QED-PCþþ). The differentapproaches are applied to the ADAPT system, and several enhance-ments in the framework are developed, such as stuck faultdetection, heuristic fault isolation rules, and intermittent faultdiagnosis. These new enhancements can be easily applied to othermodel-based fault diagnosis approaches as well.

Overall, the algorithms did very well on the case study. Allapproaches had very good fault detection and isolation. For thetraining data set, fault isolation and identification was goodenough that QED and QED-PCþþ never neglected to commandan abort when it was necessary, and QED-PC only missed an abortcommand twice (out of 227 experiments). For the validation dataset, QED performed very well, but QED-PC and QED-PCþþsuffered, because the noise characteristics of one sensor changedenough to cause a number of false alarms, leading to several faultisolation errors, and, as a result, incorrect recovery recommenda-tions. Although this is an easy issue to resolve, it still motivates theneed for a more robust approach to fault isolation that can handleoccasional incorrect symbol generation.

In the future, these algorithms will be applied to larger systems,e.g., the complete ADAPT system. ADAPT is a hybrid system, andcan have multiple faults. Hence, in the future, the currentapproaches will be extended to diagnosis of hybrid systems, andmultiple fault diagnosis. Additionally, ongoing work is investigat-ing more robust fault isolation frameworks.

References

Almqvist, E., Eriksson, D., Lundberg, A., Nilsson, E., Wahlstrom, N., Frisk, E., et al.(2010). Solving the ADAPT benchmark problem: A student project study. InProceedings of the 21st international workshop on principles of diagnosis, Portland,OR (pp. 153–160).

Arogeti, S., Wang, D., & Low, C. B. (2010). Mode identification of hybrid systems in thepresence of fault. IEEE Transactions on Industrial Electronics, 57(4), 1452–1467.http://dx.doi.org/10.1109/TIE.2009.2030213.

Arogeti, S., Wang, D., Low, C. B., & Yu, M. (2012). Fault detection isolation andestimation in a vehicle steering system. IEEE Transactions on Industrial Electro-nics, 59(12), 4810–4820. http://dx.doi.org/10.1109/TIE.2012.2183835.

Biswas, G., Simon, G., Mahadevan, N., Narasimhan, S., Ramirez, J., & Karsai, G. (2003).A robust method for hybrid diagnosis of complex systems. In Proceedings of the5th symposium on fault detection, supervision and safety for technical processes(pp. 1125–1131).

Bregon, A., Biswas, G., & Pulido, B. (2012). A decomposition method for nonlinearparameter estimation in TRANSCEND. IEEE Transactions on Systems, Man andCybernetics, Part A: Systems and Humans, 42(3), 751–763. http://dx.doi.org/10.1109/TSMCA.2011.2170065.

Bregon, A., Daigle, M., Roychoudhury, I., Biswas, G., Koutsoukos, X., & Pulido, B.(2014). An event-based distributed diagnosis framework using structural modeldecomposition. Artificial Intelligence, 210, 1–35 http://dx.doi.org/10.1016/j.artint.2014.01.003.

Bunus, P., Isaksson, O., Frey, B., & Munker, B. (2009). RODON: A model-based diagnosisapproach for the DX diagnostic competition. In Proceedings of the 20th internationalworkshop on principles of diagnosis, Stockholm, Sweden (pp. 423–430).

Carl, J. D., Mack, D. L. C., Tantawy, A., Biswas, G., & Koutsoukos, X. D. (2012). Faultisolation for spacecraft systems: An application to a power distribution testbed.In Proceedings of the eighth IFAC symposium on fault detection, supervision andsafety of technical processes, Mexico City, Mexico (pp. 168–173).

Ceraolo, M. (2000). New dynamical models of lead-acid batteries. IEEE Transactionson Power Systems, 15(4), 1184–1190.

Chen, J., & Patton, R. (1999). Robust model-based fault diagnosis for dynamic systems.Norwell, MA, USA: Kluwer Academic Publishers.

Cordier, M., Dague, P., Lévy, F., Montmain, J., Staroswiecki, M., & Travé-Massuyès, L.(2004). Conflicts versus Analytical Rendundancy Relations: A comparativeanalysis of the model-based diagnosis approach from the artificial intelligenceand automatic control perspectives. IEEE Transactions on Systems, Man, andCybernetics, Part B: Cybernetics, 34(5), 2163–2177.

Daigle, M. J., Koutsoukos, X., & Biswas, G. (2009). A qualitative event-based approachto continuous systems diagnosis. IEEE Transactions on Control Systems Technol-ogy, 17(4), 780–793.

Daigle, M., Roychoudhury, I., Biswas, G., Koutsoukos, X., Patterson-Hine, A., & Poll, S.(2010). A comprehensive diagnosis methodology for complex hybrid systems: Acase study on spacecraft power distribution systems. IEEE Transactions of Systems,Man, and Cybernetics, Part A, 4(5), 917–931.

Daigle, M., Bregon, A., & Roychoudhury, I. (2012). Qualitative event-based diagnosiswith possible conflicts applied to spacecraft power distribution systems. InProceedings of the eighth IFAC symposium on fault detection, supervision and safetyof technical processes (pp. 265–270).



http://dx.doi.org/10.1109/TIE.2009.2030213






http://dx.doi.org/10.1109/TSMCA.2011.2170065




dx.doi.org/10.1016/j.artint.2014.01.003

dx.doi.org/10.1016/j.artint.2014.01.003

http://refhub.elsevier.com/S0967-0661(15)00025-8/sbref9



















Daigle, M., Roychoudhury, I., & Bregon, A. (2014). Qualitative event-based faultisolation under uncertain observations. In Annual conference of the prognosticsand health management society 2014 (pp. 347–355).

de Kleer, J., & Kurien, J. (2003). Fundamentals of model-based diagnosis. InProceedings of the fifth IFAC symposium on fault detection, supervision and safetyfor technical processes, Washington, DC, USA (pp. 25–36).

Fravolini, M., & Campa, G. (2009). Design of robust redundancy relations for a semi-scale yf-22 aircraft model. Control Engineering Practice, 17(7), 773–786.

Gertler, J. (1998). Fault detection and diagnosis in engineering systems. New York:Marcel Dekker, Inc.

Gorinevsky, D., Boyd, S., & Poll, S. (2009). Estimation of faults in dc electrical powersystem. In American control conference (pp. 4334–4339).

Goupil, P. (2010). Oscillatory failure case detection in the A380 electrical flightcontrol system by analytical redundancy. Control Engineering Practice, 18(9),1110–1119.

Goupil, P. (2011). Airbus state of the art and practices on FDI and FTC in flight controlsystem. Control Engineering Practice, 19(6), 524–539.

Grastien, A., & Kan-John, P. (2009). Wizards of oz: Description of the 2009 DXCentry. In Proceedings of the 20th international workshop on principles of diagnosis,Stockholm, Sweden (pp. 409–413).

Isermann, R. (1997). Supervision, fault-detection and fault-diagnosis methods—anintroduction. Control Engineering Practice, 5(5), 639–652.

Kurtoglu, T., Narasimhan, S., Poll, S., Garcia, D., Kuhn, L., de Kleer, J., et al. (2009). Firstinternational diagnosis competition—DXC'09. In Proceedings of the 20th interna-tional workshop on principles of diagnosis (pp. 383–396).

Lo, C. H., Wong, Y., & Rad, A. (2006). Intelligent system for process supervision andfault diagnosis in dynamic physical systems. IEEE Transactions on IndustrialElectronics, 53(2), 581–592. http://dx.doi.org/10.1109/TIE.2006.870707.

Maiga, M., Chanthery, E., & Trave-Massuyes, L. (2012). Hybrid system diagnosis: Testof the diagnoser hydiag on a benchmark of the international diagnosticcompetition dxc 2011. In Proceedings of the eighth IFAC symposium on faultdetection, supervision and safety of technical processes, SAFEPROCESS12, MexicoCity, Mexico (pp. 271–276).

Mange, J., Daniszewski, D., & Dunn, A. (2011). Artificial immune systems fordiagnostic classification problems. In Proceedings of the 22nd internationalworkshop on principles of diagnosis, Murnau, Germany (pp. 279–284).

Mosterman, P. J., & Biswas, G. (1999). Diagnosis of continuous valued systems intransient operating regions. IEEE Transactions on Systems, Man and Cybernetics—Part A, 29(6), 554–565.

Narasimhan, S., & Brownston, L. (2007). HyDE—A general framework for stochasticand hybrid model-based diagnosis. In Proceedings of the 18th internationalworkshop on principles of diagnosis (pp. 162–169).

Pattipati, K., Raghavan, V., Shakeri, M., Deb, S., & Shrestha, R. (1994). Teams:Testability engineering and maintenance system. In American control conference,(Vol. 2, pp. 1989–1995).

Patton, R., Frank, P., & Clark, R. (2000). Issues of fault diagnosis for dynamic systems.London: Springer.

Poll, S., et al. (2007). Evaluation, selection, and application of model-based diagnosistools and approaches. In AIAA Infotech@Aerospace 2007 conference and exhibit.

Poll, S., Patterson-Hine, A., Camisa, J., Garcia, D., Hall, D., Lee, C., et al. (2007).Advanced diagnostics and prognostics testbed. In Proceedings of the 18thinternational workshop on principles of diagnosis (pp. 178–185).

Pulido, B., & Alonso-González, C. (2004). Possible conflicts: A compilation techniquefor consistency-based diagnosis. IEEE Transactions on Systems, Man, and Cyber-netics, Part B, 34(5), 2192–2206.

Reiter, R. (1987). A theory of diagnosis from first principles. Artificial Intelligence, 32(1), 57–95.

Ricks, B., & Mengshoel, O. (2009). The diagnostic challenge competition: Probabil-istic techniques for fault diagnosis in electrical power systems. In Proceedings ofthe 20th international workshop on principles of diagnosis, Stockholm, Sweden(pp. 415–422).

Roychoudhury, I., Daigle, M., Bregon, A., & Pulido, B. (2013). A structural modeldecomposition framework for systems health management. In Proceedings of the2013 IEEE aerospace conference.

Sweet, A. (2008). Testing HyDE on ADAPT, Technical report. NASA/TM 2008-214570,NASA Ames Research Center, Moffet Field, CA, USA.

Sweet, A., Feldman, A., Narasimhan, S., Daigle, M., & Poll, S. (2013). Fourthinternational diagnostic competition—DXC'13. In Proceedings of the 24th inter-national workshop on principles of diagnosis (pp. 224–229).

Thompson, H. (1994). Parallel processing architectures for aerospace applications.Control Engineering Practice, 2(3), 509–520.

Wilson, M., & Kurtoglu, T. (2010). TRT4ADAPT—A model-based algorithm for thesecond international diagnostic competition. In Proceedings of the 21st interna-tional workshop on principles of diagnosis, Portland, OR (pp. 387–394).
































Qualitative event-based diagnosis applied to a spacecraft electrical power distribution system

Documents

Transcript of Qualitative event-based diagnosis applied to a spacecraft electrical power distribution system