Markov Regenerative Stochastic Petri Nets to Model and Evaluate Phased Mission Systems Dependability

15
Markov Regenerative Stochastic Petri Nets to Model and Evaluate Phased Mission Systems Dependability Ivan Mura and Andrea Bondavalli, Member, IEEE Computer Society Abstract—This study deals with model-based dependability transient analysis of phased mission systems. A review of the studies in the literature [25] showed that several aspects of multiphased systems pose challenging problems to the dependability evaluation methods and tools. To attack the weak points of the state-of-the-art we propose a modeling methodology that exploits the power of the class of Markov regenerative stochastic Petri net models. By exploiting the techniques available in the literature for the analysis of the Markov Regenerative Processes, we obtain an analytical solution technique with a low computational complexity, basically dominated by the cost of the separate analysis of the system inside each phase. Last, the existence of analytical solutions allows us to derive the sensitivity functions of the dependability measures, thus providing the dependability engineer with additional means for the study of phased mission systems. Index Terms—Phased mission systems, analytical modeling and evaluation, Markov regenerative stochastic Petri nets, Markov regenerative processes, dependability, performability, sensitivity analysis. æ 1 INTRODUCTION I N this paper, we address the analytical dependability modeling of a class of systems that, in the literature, are known as Phased Mission Systems (PMS, hereafter). PMS perform a series of tasks that must be accomplished in sequence. Their operational life thus consists of a sequence of nonoverlapping periods, called “phases.” During a specific phase, a PMS is devoted to the execution of a particular set of tasks, which may be completely different from those performed within other phases. The perfor- mance and dependability requirements of a PMS can be utterly different from one phase to another. Moreover, during some phases, the system may be subject to particularly stressing environments, thus experiencing dramatic increases in the failure rate of its components. To accomplish its mission, a PMS may need to change its configuration over time to adopt the most suitable one with respect to the performance and dependability requirements of the phase being currently executed or simply to be more resilient to an hazardous external environment. Examples of PMS can be found in various application domains. A typical class of PMS frequently found in the literature is represented by the on-board systems for the aided-guide of aircraft, whose mission consists of take-off, ascent, cruise, approach, landing phases. A PMS analysis can be also applied for studying the dependability of a system during its whole life-cycle to establish a proper schedule of maintenance operations. In this context, we talk of the Scheduled Maintenance System (SMS) problem. Another typical example of PMS is provided by the computing systems embarked on long-life spacecraft, which must survive long periods of very low activity and then are required to achieve the scientific goals of the mission in a short period of intense solicitation. Because of their deployment in critical applications, the dependability modeling and analysis of PMS is an issue of primary relevance that has been widely investigated. However, due to their dynamic behavior, PMS offer challenges in dependability modeling and evaluation. For the sake of a cost effective analytical solution, several methods proposed in the literature necessarily need to introduce simplifying modeling assumptions. The first simplification concerns the duration of phases, which are usually assumed to last for deterministic periods of time. While this assumption may be realistic for certain PMS scenarios, as, for instance, space applications where phases are typically preplanned on the ground, it is certainly not appropriate for other classes of PMS for which the starting and ending of a phase are instead triggered by unpredictable events. On the other hand, considering a constant phase duration enables a comparatively simple solution of the models, whereas dealing with random duration makes the analysis considerably more complex. The second kind of approximation is introduced to keep the intraphase model within the boundaries of time- homogeneous Markov chains, thus exploiting the efficient and well-experienced solution techniques available for this class of stochastic processes. This requires all the timed activities of the system to be modeled by exponentially distributed random variables, no matter which kind of distribution they actually follow; further approximations are thus introduced inside the models. IEEE TRANSACTIONS ON COMPUTERS, VOL. 50, NO. 12, DECEMBER 2001 1337 . I. Mura is with the Motorola Technology Center Italy, Via C. Massaia 83, 10147 Torino, Italy. E-mail: [email protected]. . A. Bondavalli is with the Dipartimento di Sistemi e Informatica, Universita´di Firenze, Via Lombroso 6/17, 50134 Firenze, Italy. E-mail: [email protected]. Manuscript received 21 Mar. 2000; revised 31 May 2001; accepted 5 June 2001. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 110490. 0018-9340/01/$10.00 ß 2001 IEEE

Transcript of Markov Regenerative Stochastic Petri Nets to Model and Evaluate Phased Mission Systems Dependability

Markov Regenerative Stochastic Petri Netsto Model and Evaluate Phased Mission

Systems DependabilityIvan Mura and Andrea Bondavalli, Member, IEEE Computer Society

AbstractÐThis study deals with model-based dependability transient analysis of phased mission systems. A review of the studies in

the literature [25] showed that several aspects of multiphased systems pose challenging problems to the dependability evaluation

methods and tools. To attack the weak points of the state-of-the-art we propose a modeling methodology that exploits the power of the

class of Markov regenerative stochastic Petri net models. By exploiting the techniques available in the literature for the analysis of the

Markov Regenerative Processes, we obtain an analytical solution technique with a low computational complexity, basically dominated

by the cost of the separate analysis of the system inside each phase. Last, the existence of analytical solutions allows us to derive the

sensitivity functions of the dependability measures, thus providing the dependability engineer with additional means for the study of

phased mission systems.

Index TermsÐPhased mission systems, analytical modeling and evaluation, Markov regenerative stochastic Petri nets, Markov

regenerative processes, dependability, performability, sensitivity analysis.

æ

1 INTRODUCTION

IN this paper, we address the analytical dependabilitymodeling of a class of systems that, in the literature, are

known as Phased Mission Systems (PMS, hereafter). PMSperform a series of tasks that must be accomplished insequence. Their operational life thus consists of a sequenceof nonoverlapping periods, called ªphases.º During aspecific phase, a PMS is devoted to the execution of aparticular set of tasks, which may be completely differentfrom those performed within other phases. The perfor-mance and dependability requirements of a PMS can beutterly different from one phase to another. Moreover,during some phases, the system may be subject toparticularly stressing environments, thus experiencingdramatic increases in the failure rate of its components.To accomplish its mission, a PMS may need to change itsconfiguration over time to adopt the most suitable one withrespect to the performance and dependability requirementsof the phase being currently executed or simply to be moreresilient to an hazardous external environment.

Examples of PMS can be found in various applicationdomains. A typical class of PMS frequently found in theliterature is represented by the on-board systems for theaided-guide of aircraft, whose mission consists of take-off,ascent, cruise, approach, landing phases. A PMS analysiscan be also applied for studying the dependability of asystem during its whole life-cycle to establish a proper

schedule of maintenance operations. In this context, we talkof the Scheduled Maintenance System (SMS) problem.Another typical example of PMS is provided by thecomputing systems embarked on long-life spacecraft, whichmust survive long periods of very low activity and then arerequired to achieve the scientific goals of the mission in ashort period of intense solicitation.

Because of their deployment in critical applications, thedependability modeling and analysis of PMS is an issue ofprimary relevance that has been widely investigated.However, due to their dynamic behavior, PMS offerchallenges in dependability modeling and evaluation. Forthe sake of a cost effective analytical solution, severalmethods proposed in the literature necessarily need tointroduce simplifying modeling assumptions.

The first simplification concerns the duration of phases,which are usually assumed to last for deterministic periodsof time. While this assumption may be realistic for certainPMS scenarios, as, for instance, space applications wherephases are typically preplanned on the ground, it iscertainly not appropriate for other classes of PMS for whichthe starting and ending of a phase are instead triggered byunpredictable events. On the other hand, considering aconstant phase duration enables a comparatively simplesolution of the models, whereas dealing with randomduration makes the analysis considerably more complex.

The second kind of approximation is introduced to keepthe intraphase model within the boundaries of time-homogeneous Markov chains, thus exploiting the efficientand well-experienced solution techniques available for thisclass of stochastic processes. This requires all the timedactivities of the system to be modeled by exponentiallydistributed random variables, no matter which kind ofdistribution they actually follow; further approximationsare thus introduced inside the models.

IEEE TRANSACTIONS ON COMPUTERS, VOL. 50, NO. 12, DECEMBER 2001 1337

. I. Mura is with the Motorola Technology Center Italy, Via C. Massaia 83,10147 Torino, Italy. E-mail: [email protected].

. A. Bondavalli is with the Dipartimento di Sistemi e Informatica,Universita di Firenze, Via Lombroso 6/17, 50134 Firenze, Italy.E-mail: [email protected].

Manuscript received 21 Mar. 2000; revised 31 May 2001; accepted 5 June2001.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number 110490.

0018-9340/01/$10.00 ß 2001 IEEE

In this paper, we propose a methodology for themodeling and evaluation of PMS, based on the MarkovRegenerative Stochastic Petri Nets (MRSPN) approach. Themajor contributions of our work are the following ones:

. The definition of an MRSPN modeling frameworkthat accommodates in quite a natural way thespecific profile and the reconfigurations that mayoccur at phase change times.

. The tailoring of the MRSPN general transientanalysis techniques proposed in [14], [13], [6] forthe solution of PMS models. The general transientanalysis technique is simplified thanks to the phasedstructure of PMS, which allows immediate identifi-cation of a sequence of regeneration points in theunderlying MRGP process. Moreover, the phasedstructure also allows defining a partitioning of themarking space of the MRSPN model of a PMS,which results in a simplification of the analyticalsolution.

. The definition of a general analytical solutionscheme, which completely solves the issues posedby the phased behavior, does not require introdu-cing restrictions on the distributions of phaseduration, and easily lends itself to automation [8].

. The sensitivity study of PMS, which exploits theexistence of analytical expressions for the markingoccupation probability distribution to obtain thesensitivity functions of dependability measures.

This paper is organized as follows: Section 2 identifiesthe contribution offered by the MRSPN approach withrespect to the current state-of-the-art. In Section 3, weintroduce the MRSPN modeling approach and apply it toan example of PMS that serves as a case study throughoutthe paper. The general analytical solution of the MRSPNmodels of PMS is then discussed in Section 4 and severalspecializations of that solution technique are given inSection 5 for special PMS instances. Section 6 addressesthe task of analytical sensitivity analysis of PMS. Section 7provides guidelines for the practical utilization of themodeling methodology and the theoretical results pre-sented in the paper and also reports the numericalevaluation of some dependability attributes for the con-sidered example of PMS. Last, conclusions are given inSection 8.

2 OUR CONTRIBUTION TO PMS DEPENDABILITY

ANALYSIS

PMS have been widely investigated over the past decades.Many works have been proposed, either based on combi-natorial models, such as Fault Trees and Reliability BlockDiagrams, e.g., [19], [31], [11], [26], or on state spaceoriented models, such as Markov chains and various classesof Petri nets [1], [2], [9], [17], [24], [29], [30], [25]. Because oftheir ability in representing complex dependencies amongsystem components, state space approaches offer thepotentialities to address the features of the most complexinstances of PMS. To identify the limits of the variousmethods, we present in the following a series of systemscenarios that progressively include more and more

features of realistic PMS. For each scenario of PMS, wediscuss which methodologies are able to deal with theparticular case. Also, we highlight which ones among theraised issues we intend to tackle with our MRSPNapproach.

2.1 Scenario 1

Consider first a PMS that has to perform a set of phases ofconstant duration in a given, fixed order and for which theintraphase behavior is satisfactorily modeled by a time-homogeneous Markov chain. This scenario of PMS repre-sents, in a sense, the common ground for the methods in theliterature: All of them address this minimal instance of PMSand are able to deal with it. Some of the methods providemore flexibility/reusability, others are more efficient atsolution time; a comprehensive review and comparison canbe found in [25]. Our MRSPN approach easily accommo-dates this PMS scenario, as well. More precisely, in thisparticular case, our methodology simplifies to that pre-sented in [25], which is based on Deterministic andStochastic Petri Nets (DSPN) and which was shown toprovide several advantages with respect to other Markov-based approaches, concerning both the modeling and theevaluation aspects.

2.2 Scenario 2

Consider now a mission for which different goals aredefined, some of primary and others of secondary rele-vance. In this scenario, it is natural to endow the PMS withthe ability of dynamically adapting the mission profile byselecting which phase to perform next at the time thepreceding phase ends. For instance, phases performed toreach secondary goals may be skipped in favor of the moreimportant primary ones, depending on particular eventsoccurred during PMS life-time. The mission profile is, inthis case, represented by a tree of possible choices, ratherthan a chain, as it is in the case of a predefined sequence ofphases. Even if the assumptions of constant phase durationand Markov intraphase process are kept, this flexibilityfeature exceeds the modeling and solution capabilities ofmost of the methods proposed, with the exception of [9].The MRSPN methodology we present in this paper is ableto account for adaptive mission profiles.

2.3 Scenario 3

Consider now a PMS whose phase changes are triggered byunexpected events, such as a fault leading to the removal of asystem component or an emergency situation of operationarising. This turns out in phases of random duration whoseinitial and ending time instants may be unknown. Whenphases of random duration are to be considered, an exactanalysis becomes much more complex [3], even if theintraphase process is still a time-homogeneous Markovprocess. Just to give a clue of the difficulties of the analysis,notice that, in this scenario, the actual phase completion timesand, thus, the mission ending time, are unknown. If negativeexponential distributions adequately represent the phaseduration, then they can be employed to build an overallmodel of the PMS that still enjoys the Markov property, at anypoint in time, for which the Markov solution techniquesapply. However, there are classes of distributions that are not

1338 IEEE TRANSACTIONS ON COMPUTERS, VOL. 50, NO. 12, DECEMBER 2001

suitably modeled by exponential distributions, e.g., finitesupport distribution with low variance. Among those studiesthat addressed the modeling of PMS with random phaseduration, some adopted an approximate solution approach(without any bound on the error introduced), such as [1], [30],others resorted to the numerical solution of an associated setof differential equations [29], others ended in a simulativesolution [3]. Our MRSPN approach supports the modelingand the exact analytical evaluation of PMS for which theduration of the phases is drawn from the family ofdistributions that admit an analytical Laplace transform,which encompasses the exponential, deterministic, uniform,Coxian, phase-type distributions.

2.4 Scenario 4

Suppose now that modeling the intraphase behavior of aPMS through an homogeneous Markov chain does notresult in a realistic representation of system behavior. Forinstance, consider a PMS in which failed components arereplaced so that they are only unavailable for a determi-nistic amount of time. Fixed-duration activities are a goodexample of those finite support distributions with a lowvariance (the variance is null in the particular case)mentioned above, for which an approximation withnegative exponential distributed random variables mayresults in significant discrepancies between the behavior ofthe real system and that of the modeled one. Among themethods that appeared in the literature, only the onereported in [29] is concerned with more general intraphaseprocesses. The authors proposed building a nonhomoge-neous Markov model of the PMS in which time varyingfailure and repair behavior are easily modeled. However,the adopted solution approach requires a computationalcost that makes it practically feasible only for very simplePMS. Thanks to the representative power of MRSPN, theapproach in this paper allows enlarging the class of PMSthat can be analytically modeled to include more generalintraphase stochastic processes. Since, at model solutiontime, each intraphase process can be solved separately, thespecific issues related to the analysis of each phase can betreated in isolation from one phase to the other. Thus, if onephase has a simple time-homogeneous Markov chainintraphase process, e.g., the MRSPN model reduces to anSPN within that phase, an appropriate solver can be used toobtain the local solution of the model within that phase. Asubsequent phase may have a Markov Regenerative intra-

phase process for which analytical solvers exist, e.g., theMRSPN model may reduce to a DSPN model that belongseither to the class described in [15] or to the one in [21]. In thiscase, a DSPN transient solver can be used to obtain thesolution of the intraphase process for that phase. The limit ofpractical applicability of our results is represented by thepossibility of computing the transient solution of theintraphase process. Under given assumptions about the typeof nonexponential activities included in the intraphaseprocess, we provide an analytical solution framework inwhich the separate solution of the various intraphaseprocesses can be composed to obtain an exact overall modelsolution. In case such assumptions are not satisfied, anapproximate solution technique is still applicable whichprovides upper and lower bounds on the exact results.

To complete the matching between the set of possiblePMS scenarios and the methods that apply for theirsolution, consider the following assumptions:

assumption a. The duration of the phases is deterministic;

assumption b: The mission profile is static;

assumption c: Each intraphase process is a time-homo-geneous Markov chain.

Denote a PMS scenario with the corresponding letters of theassumptions it satisfies so that ªabcº is the PMS corre-sponding to the first scenario among those listed above,ªacº to the second one, and ª-º is the most general scenarioof PMS where assumptions a, b, and c have all been relaxed.

Table 1 shows the applicability limits of the state-spacebased modeling methodologies that have appeared in theliterature and points out the generality of our MRSPNapproach.

3 MRSPN MODELING OF PMS

Modeling the complex dynamic behavior of PMS requiresemploying highly representative and expressive tools. Toattack the problem, we resort to the MRSPN models, whichcouple a representative power with a set of modelingfeatures that significantly improve their expressiveness andallow for a compact modeling of the PMS peculiar aspects.MRSPN models are defined as follows:

Definition 1. A Petri net is an MRSPN if its underlyingmarking process is a Markov regenerative process (MRGP)[14], [13].

MURA AND BONDAVALLI: MARKOV REGENERATIVE STOCHASTIC PETRI NETS TO MODEL AND EVALUATE PHASED MISSION SYSTEMS... 1339

TABLE 1Applicability of the Methods for PMS Analysis

Definition 1 does not give an immediate description ofthe modeling features allowed for MRSPNs. Since Markovprocesses are special cases of MRGP, the SPN and GSPNclasses of Petri nets belong in fact to the MRSPN class. Also,the set of DSPN for which an analytic solution methodexists is included in the MRSPN. Moreover, MRGPencompass the class of semi-Markov processes, too, forwhich activities having generally distributed duration areallowed. Thus, MRSPN may include exponential, instanta-neous, deterministic, as well as general transitions. How-ever, to guarantee that the marking process is indeed anMRGP, some restrictions must be imposed on the con-current enabling of the nonmemoryless transitions. We shallcome back to this issue later when dealing with the model'sanalytical solution.

In the following, we present the guidelines for theMRSPN modeling of PMS through an example of PMSapplication which includes most of the typical features ofthe PMS that pose issues to the modeling and evaluationmethodologies.

3.1 A Reference PMS Example

We consider a PMS that executes four phases whoseduration is a random variable having general distributionFi�t�, i � 1; 2; . . . ; 4. The PMS is equipped with fourredundant identical processors which can be used invarious configurations. Within a given phase, the PMSadopts the configuration that best fits the ideal one for theactivities executed therein. Unused processors act as coldspares.

The ideal and minimal configuration of the variousphases are shown in Table 2, where a denotes thenumber of active processors. Active processors are subjectto faults, whereas spare ones are not. Active processorsfail independently from each other and the time to failureis exponentially distributed. The failure rate is constantwithin a phase, though the same processor may havedifferent failure rates in different phases, as shown inTable 2. When a processor fails, a spare component is

activated to recover the ideal configuration. This activity,performed in a negligible amount of time, succeeds withprobability c. In case it fails, the spare processor that doesnot switch on is declared faulty and the PMS configurationfurther degrades. A faulty processor is replaced with a newone. This replace takes a deterministic amount of time � .Repaired processors become either spares or active proces-sors, depending on the requirements of the current phase.

The goals of the mission are represented by the activitiesthe PMS performs inside phase 3 and phase 4. However,phase 4 is the primary objective that has to be necessarilyperformed, whereas phase 3 is a secondary optionalobjective. Since, during phase 3, the PMS is subject to ahigh failure rate, the secondary objective is pursued if andonly if it does not jeopardize the reliable execution of themore important phase 4. The decision on whether toperform phase 3 or not is taken at the time phase 2 ends,by evaluating the starting criterion given in Table 2, where fdenotes the number of faulty processors. Therefore, boththe sequence of phases performed by the PMS and theoverall duration of the mission itself may dynamicallychange depending on processors availability.

3.2 MRSPN Modeling of the Example

At a high abstraction level, we build the model of a PMS ascomposed of two logically separate MRSPNs. One part iscalled System Net (SN) and represents the system, that is itscomponents, their interactions, their failure/repair beha-vior. The other part is called Phase Net (PhN) andrepresents the control part, which describes the phasechanges. The adequacy of this scheme for the modeling ofPMS is found in the possibility of expressing even complexbehaviors through the relations between the PhN and SNsubmodels.

The MRSPN model of the PhN is shown in Fig. 1. Atoken in place Pi, i � 1; 2; . . . ; 4; 40, models the execution ofthe corresponding phase of the PMS. The sequence ofphases ends with a token in the STOP place, whichrepresents the end of the mission. Notice that place P40 isintroduced because phase 4 can be executed along twopossible different paths of the mission. The generaltransition ti, i � 1; 2; . . . ; 4; 40, models the duration of thecorresponding phases. To depict transitions, we adoptedthe usual convention: Instantaneous transitions are depictedas thin bars, exponential ones as empty rectangles, andnonmemoryless transitions as black-filled rectangles.

The dynamic features of the PMS mission profile areeasily modeled by making the evolution of the PhNdependent on the marking of the SN. For instance, place

1340 IEEE TRANSACTIONS ON COMPUTERS, VOL. 50, NO. 12, DECEMBER 2001

TABLE 2Phase-Dependent Parameters of the PMS

Fig. 1. PhN submodel for the example of PMS.

Pchoice plus the two instantaneous transitions, tyes and tno,model the dynamic selection of the next phase to beperformed, which takes place at the end of phase 2. The twoinstantaneous transitions are guarded by enabling condi-tions that relate to the marking of the SN submodel. All thedependencies of the PhN parameters on the SN markingsare given in Table 3. The initial marking of the PhN assignsone token to the place P1, modeling the start of the firstphase.

The SN submodel is shown in Fig. 2. The state of theprocessors is modeled by the three places Up, Down, andSpare. The marking of these places represents the numberof actively working, failed, and cold spare processors,respectively. The exponential transition, t1, and the deter-ministic one, t2, model the failure and repair process of theprocessors, respectively, moving tokens back and forthbetween place Up and place Down. The mission goes onunless a token reaches place Fail, which models the missionfailure.

The SN evolution is conditioned to the marking of thePhN by proper marking dependent predicates, whichmodify transition rates and enabling conditions. Thephase-dependent parameters of the SN are shown inTable 4. The rate of transition t1 is made dependent fromthe marking of the PhN, to model the different failureintensities of the processors inside the various phases.The instantaneous transition Turnÿ off models theswitching off of a processor that, according to therequirements of the phase being currently executed, nolonger needs to be active. Similarly, the two instantaneoustransitions, Recÿ ok and Recÿ nok, model the sparereactivation and the possible successful or failed resultof this operation. The three instantaneous transitionsTurnÿ off , Recÿ ok, and Recÿ nok, are controlled byenabling conditions depending on the marking of the PhNsubnet, as shown in Table 4. Finally, the instantaneoustransitions S ÿ fail1 and S ÿ fail2 are introduced to modelthe failure criteria of the different phases. They arecontrolled by the guards shown in Table 4 and, onceenabled, flush all the tokens of the corresponding inputplaces to the Fail place. Notice that the input arcs of these

two transitions are variable cardinality arcs whose weights

are equal to the marking of the corresponding input places.

The initial marking of the SN is shown in Fig. 2, with three

tokens in the Up place and one in Spare.The resulting model of the PMS is extremely concise. The

two submodels shown in Fig. 1 and Fig. 2 completely

represent our example of PMS. Notice the close resem-

blance of the conditions that relate the behavior of the two

submodels as they are presented in Table 3 and Table 4 and

the form these same conditions are given in Table 2. Our

high level MRSPN modeling allows us to define models of

PMS in a way that is very similar to that of a system

specification language, easily understandable, unambigu-

ous, and easily modifiable.

4 ANALYTICAL SOLUTION

In this section, we present an analytical solution method to

obtain the time-dependent marking occupation probabil-

ities of the MRSPN models of PMS. This method is based on

the general solution method for MRGP [12], which applies

for the solution of any MRSPN model. By taking into

account the special structure of the PMS models, we derive

a solution strategy much more efficient than the general one

in terms of computational complexity. However, before

moving our attention toward the solution technique, we

have to be sure that the models built according to the

methodology explained above are indeed MRSPN models

or, in other words, we have to show that a Markov renewal

sequence is definitely found, embedded in the underlying

marking process.

MURA AND BONDAVALLI: MARKOV REGENERATIVE STOCHASTIC PETRI NETS TO MODEL AND EVALUATE PHASED MISSION SYSTEMS... 1341

TABLE 3Parameters of the PhN Dependent on the SN Marking

Fig. 2. SN submodel for the example of PMS.

TABLE 4Parameters of the SN Subnet Dependent on the PhN Marking

4.1 MRGP Structure of the Marking Process

Because of their generality, it is not easy to check whether agiven timed Petri net model is an MRSPN or not. In general,the marking process must be inspected to verify theexistence of a Markov regenerative sequence, but this canbe a very costly check for complex models. Moreover, evenif a Petri net is an MRSPN model, it is not assured that ananalytical model solution is viable. A way to build Petri netsthat are MRSPN analytically solvable models is to prescribeconstraints on the net structure in such a way that a Markovrenewal sequence with a specific structure is definitelyfound in the underlying marking processes. To this aim, asufficient constraint to be imposed on the net structure, isthe following one:

Constraint 1. At most one general transition is enabled in eachmarking of the Petri net.

In this case, the Markov property holds (at least) eachtime a general transition fires or it is disabled. Thisconstraint is the one usually adopted [14], [13] for itprovides an easy-to-check condition to test for the existenceof the MRGP structure in the underlying marking process.

The Petri net model we built in the previous section doesnot comply with this constraint. Indeed, two generaltransitions, namely the one representing the phase in thePhN, and that modeling the repair in the SN, are allowed tobe simultaneously enabled. Notice that the conditionexpressed by Constraint 1 would be obviously fulfilled bythe PMS model sketched above if the only generaltransitions of the model were those included in the PhNsubnet. This would mean keeping the SN as a simple GSPNmodel or, equivalently, maintaining the intraphase processwithin the boundaries of time-homogeneous Markovchains. In fact, in the special case of the models of PMS, amuch less restrictive constraint is sufficient to ensure thatthe Petri net model is an MRSPN:

Constraint 2. Whenever a timed transition fires in the PhN, eachnonmemoryless enabled transition resamples a new firing timewhich can only depend on the net marking.

Owing to the structure of the PhN submodel, thisconstraint ensures that the firing times of the timedtransitions in the PhN (the phase completion times) areregeneration points of the MRSPN model. Notice that thecondition prescribed by the constraint is a specialization forthe PMS models of the dominant transition concept definedin [7]. Also, this constraint appears in [16], though not in thecontext of PMS. Here, we provide a specific PMS-tailoredconstraint that is easily verified and that ensures theexistence of an underlying MRGP in the marking processof the model.

It is worthwhile observing that the condition stated byConstraint 2 is much less restrictive than that prescribed byConstraint 1. Between two phase completion instants, themarking process of the model can be a completely generalstochastic process. This allows including general transitionsin the SN submodel, which can be concurrently enabledwith the general transitions of the PhN, as long as notransition fires in the PhN.

The loss of memory at phase boundaries is perfectlyappropriate for some classes of PMS, for intance, those inwhich maintenance activities are performed betweenconsecutive phases. In this case, keeping track of thework already performed but not yet completed is no useat all. This scenario is common in the context of SMS,which can be modeled through the PMS methodologies,as shown in [10].

On the other hand, the Petri net model we defined in theprevious section does not respect Constraint 2. Indeed, therepair is naturally modeled by a transition t2 with anenabling memory firing policy, which does not guaranteethat phase completion times are regeneration points for theunderlying marking stochastic process. Even if, for somespecial cases of PMS, it might be possible to further weakenthe constraint on the net structure and still obtain anMRSPN model, we will assume Constraint 2 as thecondition to be met for the analytical tractability of themodels. Therefore, in those cases in which the resampling atthe phase completion is not in the semantic of transitions,we will force it. Obviously, this leads to an approximatemodeling of PMS behavior.

For the example introduced in the previous section,resampling a new firing time for t2 would imply that thememory of the repair work already done has to be lost ateach phase ending time (see [5] for a comprehensivetreatment of the preemption policies modeling). However,forcing the resampling still leaves a degree of freedom indetermining which one, between the elapsed firing timeand the residual firing time, is set to zero. In the first case,the transition restarts from the beginning, whereas, in thesecond one, the transition prematurely completes. Asensible choice between these two solutions can be usedto obtain upper and lower bound models. To give a flavorof this bound methodology, consider our example of PMSand suppose that, at the phase ending times, we force thepremature completion of the repairs that are currently beingexecuted. It is easy to realize that such a model provides anoptimistic estimation of the dependability measures of theoriginal PMS as the modeled repair is faster than that of thereal system. Conversely, we can obtain a pessimisticestimation of the dependability measures if the repairs thatare not completed when a phase ends simply start againfrom the beginning as the next phase starts. These twoalternatives provide upper and lower bounds on the exactdependability measures, thus allowing for the estimation ofthe error introduced with the enforced loss of memory.

The MRSPN modeling formalism can be readily ex-tended to represent the hybrid memory policies of thosetransitions that need to lose their memory. We enrich thestandard MRSPN models with the graphical notationshown in Fig. 3, to represent the forced start over (Fig. 3a)and completion (Fig. 3b) of the repair transition for ourexample of PMS, respectively. In (Fig. 3a), the backwardclosed-loop arrow means that the repair resamples a newfiring time with the same distribution of the preceding onewhen a PhN transition completes, whereas, in (Fig. 3b), theforward closed-loop arrow represents the resampling of aninstantaneous firing time, i.e., the premature completion ofthe repair. Both of these SN models will be solved in the

1342 IEEE TRANSACTIONS ON COMPUTERS, VOL. 50, NO. 12, DECEMBER 2001

following to show the quality of the approximation that canbe achieved with our bound methodology.

4.2 The Transient Marking Occupation Probabilities

We now present a tailoring of the general solution methodfor MRGP to the analytical study of the MRSPN models ofPMS. Denote with S the set of markings of the MRSPNreachability graph. Consider the time-dependent markingoccupation probability of marking m0 at time t, conditionedto the initial marking m, and denote it vm;m0 �t�, m;m0 2 S,t � 0. Moreover, denote with V �t� � kvm;m0 �t�k the transientprobability matrix that collects these conditional probabil-ities. According to the MRGP theory [12], matrix V �t� is thesolution to the following generalized Markov renewalequation:

V �t� � E�t� �K � V �t�; �1�where matrices K�t� � kkm;m0 �t�k and E�t� � kem;m0 �t�k arethe global and the local kernel of the MRGP and K � V �t� isthe matrix whose generic element is obtained with the rowby column convolution of K�t� and V �t�.

Let 1; 2; . . . ; n be the set of phases performed through themission and denote with s�i� the set of phases which can beperformed after phase i, i � 1; 2; . . . ; n. For the sake ofsimplicity, we assume that the ordering of phases is suchthat j > i, for each j 2 s�i�. Note that such an ordering canalways be found because of the acyclic structure of the PhN.Let ti be the transition of the PhN that models the time thePMS spends in phase i, i � 1; 2; . . . ; n, respectively. Thefiring time of transition ti is a general random variable withprobability density function fi�t� and cumulative distribu-tion function Fi�t� �

R t0 fi�u�du, i � 1; 2; . . . ; n.

Denote with fm�t�; t � 0g the stochastic process repre-senting the evolution of the marking of the MRSPN. DefineT0 � 0 and let Ti be the time instant at which transition ticompletes, i � 1; 2; . . . ; n. By construction, the phase endingtimes Ti represent regeneration points for the markingprocess fm�t�; t � 0g. Indeed, Constraint 2 guarantees thatthe memoryless Markov property holds for the markingprocess of the MRSPN at the phase completion times. Thus,the sequence of bivariate random variables f�Yi; Ti�g, whereYi, denotes the marking of the MRSPN model at the timeimmediately after Ti, is a Markov renewal sequence, and,

consequently, the marking process fm�t�; t � 0g is anMRGP. To obtain the transient probability matrix V �t�, weproceed by studying the marking process fm�t�; t � 0g overthe different phases.

Consider the stochastic process fm�t�; t � 0g during thetime interval �Tiÿ1; Ti�, when transition ti is enabled, anddenote with �i�t� its transient probability matrix and withSi its state space, i � 1; 2; . . . ; n. The set Sn�1 � S n

Sni�1 Si

contains those absorbing markings reached by the model atthe end of the mission. Because of the structure of theMRSPN model of a PMS, the sets Si, i � 1; 2; . . . ; n� 1, forma partition of S. Let Ci be the cardinality of set Si,i � 1; 2; . . . ; n� 1. The global kernel K�t� and the localkernel E�t� can be reordered according to the partitiondefined by sets Si, i � 1; 2; . . . ; n� 1, and thus partitioned inblocks Ki;j�t� and Ei;j�t� having dimensions Ci � Cj,respectively. Due to the PMS structure, K�t� is such thatthe block matrices Ki;j�t� take nonzero values if and only ifj > i� 1, whereas Ei;j�t� is nonzero if and only if j � i. Let�i;j, i; j � 1; 2; . . . ; n� 1 be the Ci � Cj branching-probabil-ity matrix, defined as follows:

�i;j � k�i;jm;m0 k � Prob�Yi � m0jm�Tiÿ� � m�;m 2 Si;m0 2 Sj;

where Ti ÿ denotes the time instant immediately before Ti.The elements of K�t� are as follows:

km;m0 �t� � Prob�Yi � m0; Ti � tjY0 � m�

�Z t

0

Prob�Yi � m0; Ti � ujY0 � m�du

�Z t

0

Prob�Yi � m0jTi � u; Y0 � m�fi�u�du;m;m0 2 S; t � 0:

�2�

According to (2) and owing to the fact that the preemptionof transitions modeling phases is not allowed, the nonzeroblocks matrix K�t� can be obtained as follows:

Ki;j�t� �Z t

0

�i�u��i;jfi�u�du;i � 1; 2; . . . ; n; j 2 s�i� t � 0:

MURA AND BONDAVALLI: MARKOV REGENERATIVE STOCHASTIC PETRI NETS TO MODEL AND EVALUATE PHASED MISSION SYSTEMS... 1343

Fig. 3. Extended MRSPN notation.

Similarly, the entries em;m0 �t� of the local kernel matrix aregiven by:

em;m0 �t� � Prob�m�t� � m0; Ti > tjY0 � m�� Prob�m�t� � m0jTi > t; Y0 � m�Prob�Ti > t�;m;m0 2 S; t � 0

and, consequently, the nonzero blocks Ei;j�t� of the localkernel matrix E�t� are as follows:

Ei;i�t� � �i�t� 1ÿ Fi�t�� �; i � 1; 2; . . . ; n� 1; t � 0:

Consider the block partition of matrix V �t� induced bythe sets Si, i � 1; 2; . . . ; n� 1. Owing to the block structureof matrices K�t� and E�t�, all the blocks Vi;j�t� are null ifi > j, that is, matrix V �t� exhibits an upper triangular blockform. The generalized Markov renewal (1) can be rewrittenaccording to the block partitioning of the matrices:

Vi;j�t� � Ei;j�t� �Z t

0

Xh2s�i�

dKi;h�u�Vh;j�tÿ u�;

i; j � 1; 2; . . . ; n� 1; t � 0

�3�

and a backward substitution procedure can be applied toobtain all the nonzero blocks of V �t� in the jth column,starting from the diagonal block Vj;j�t� � Ej;j�t�.

From the transient marking occupation probabilitymatrix V �t� we can compute the dependability measuresof interest. For instance, we obtain the probability R ofsuccessfully completing the mission. If m0 is the initialprobability vector over the markings of the reducedreachability graph of the PMS model, then m0V1;n�1�t� isthe time-dependent marking occupation probability vectorat time t, t � 0, over the markings of Sn�1, which representthe state of the PMS at the end of the mission. Since thesemarkings are all absorbing, we can compute R as thesolution to the following limit:

R � limt!1m0V1;n�1�t��; �4�

where � is the vector that selects those markings thatrepresent successful states of the system, according to thesuccess criterion defined for the mission of the PMS.

Directly attacking the solution of the system of integralsin (3) to compute blocks Vi;j�t� can be a computationallyexpensive procedure. Indeed, each of the substitution stepsrequires the solution of the convolution integral in (3) forthe associated submatrices, which is costly and mayintroduce numerical approximations. Alternatively, wecan resort to the analysis in the Laplace-Stieltjes transform(LST) domain. If we take the LST of both sides of (3) weobtain:

V �i;j�s� � E�i;j�s� �Xh2s�i�

K�i;h�s�V �h;j�s�; i; j � 1; 2; . . . ; n� 1;

�5�where V �i;j�s�, E�i;j�s�, and K�i;h�s� are the LST of Vi;j�t�, Ei;j�t�,and Ki;h�t�, respectively. The backward substitution can be

completed by exploiting the PhN acyclic structure.Equation (5) is remarkable for its generality. It allows

us to derive the LST of the time dependent marking

occupation probabilities for the MRSPN model of a PMS

with general duration of the phases and general intra-

phase marking process. Of course, a numerical antitrans-

formation is needed to obtain the final result in the time

domain. However, we only need to antitransform a single

block V �i;j�s�, rather than performing a sequence of numer-

ical solutions of integral equation systems, as required by

the analysis in the time-domain with (3). Furthermore, it is

also possible to take advantage of the LST theory to directly

compute the limit in (4) without obtaining the time-domain

expression of V1;n�1�t�, but, rather, dealing with its trans-

form. Precisely because of the time-invariance of m0 and �

and from the final-value theorem of Laplace transforms, the

reliability R is obtained as:

R � limt!1m0V1;n�1�t�� � m0 lim

t!1V1;n�1�t�� �

� m0 lims!0�

V �1;n�1�s�� �

�:

�6�

To avoid a cumbersome notation, let us consider the case

when the PhN submodel is a chain rather than a tree, that is,

phases 1; 2; . . . ; n are performed in a fixed order. In this

case, the transform V �1;n�1�s� takes the following form:

V �1;n�1�s� �Yni�1

K�i;i�1�s�E�n�1;n�1�s� �Yni�1

K�i;i�1�s�;

where the last equivalence comes from the fact that all the

markings of Sn�1 are absorbing markings and, thus,

E�n�1;n�1�s� is the identity matrix. Let K�i;i�1 denote the limit

for s! 0� of matrix K�i;i�1�s�, which surely exists for the

properties of the transient probability matrices. Then, we

can rewrite (6) as follows:

R � m0 lims!0�

Yni�1

K�i;i�1�s�� � m0

Yni�1

K�i;i�1�: �7�

Thus, an explicit formula can be obtained for probability

R without any need to evaluate the transient state

probability matrix of the MRGP underlying the PMS model.

It is worthwhile observing that the approach followed to

compute the reliability of the PMS at the time the mission

ends can also be applied to obtain the reliability at the

system at each phase completion instant. Indeed, each

phase of the system can be seen as being the last one the

system has to perform and, thus, the probability distribu-

tion at the end of phase i, i � 1; 2; . . . ; n, is computed as an

absorption probability through the limit of the global kernel

submatrices transforms.Other dependability attributes of interest, such as the

pointwise reliability and availability, can be evaluated once

the transient marking occupation probability matrix V �t�has been obtained. Cumulative measures are computed

from V �t� by integration. For instance, by superimposing a

reward structure over the markings of the MRSPN, it is

possible to compute the performability of the PMS as the

total reward cumulated during the mission.

1344 IEEE TRANSACTIONS ON COMPUTERS, VOL. 50, NO. 12, DECEMBER 2001

5 SPECIALIZATIONS FOR PARTICULAR CLASSES

OF PMS

Equations (3) and (5) for the transient marking occupationprobabilities and (4) for the reliability at mission completiontime provide completely general results which can bespecialized for some specific and interesting PMS scenarios.In this section, we tailor those general results for thefollowing two special cases:

. phases having deterministic duration, denoted byªaº in Section 2;

. subordinate processes that are homogeneous con-tinuous-time Markov chains, that is, scenario ªcº ofSection 2.

Last, we discuss the applicability limits of our approach,dealing with the most general scenario, denoted as ª-º.

5.1 Constant Phase Duration

Let �i be the deterministic firing time of transition ti,i � 1; 2; . . . ; n. In this special case, the probability densityfunction fi�t� of the phase duration is the Dirac impulsefunction at �i and the cumulative distribution function is thedelayed unit step function Fi�t� � u�tÿ �i�. The blocks ofthe global kernel matrix K�t� take the following form:

Ki;j�t� �Z t

0

�i�u��i;jfi�u�du � �i��i��i;jFi�t�;i � 1; 2; . . . ; n; j 2 Si; t � 0

and, thus, the transient matrix blocks Vi;j�t� can be rewrittenas follows:

Vi;j�t� � Ei;j�t� �Z t

0

Xh2s�i�

d�i��i��i;hFi�u�Vh;j�tÿ u�

� Ei;j�t� �Xh2s�i�

�i��i��i;h

Z t

0

fi�u�Vh;j�tÿ u�du

� Ei;j�t� �Xh2s�i�

�i��i��i;hVh;j�tÿ �i�:

Notice that the integrals are solved due to the fact thatthe Dirac impulse function allows reducing the convolutionintegral to just a time shift. We can complete the analysis inthe time domain without resorting to the LST, completingthe backward substitution in (3). For instance, in the case ofa linear mission profile, we obtain the following expressionfor the transient probability matrix blocks Vi;j�t�:

Vi;j�t� �Yjÿ1

h�i�h��h��hEj;j�tÿ

Xjÿ1

h�1

�h�;

i � 1; 2; . . . ; n; j � i; t � 0;

�8�

where the product over an empty set is to be intendedas the identity matrix. Notice that similar equations butfor Markov intraphase processes have been obtained in[22], [25].

It is worthwhile remarking that (8) applies to whateverMRSPN model of a PMS with constant phase durationsatisfies Constraint 2, regardless of the structure of the SN.The determinism of phase duration allows solving the

issues posed by the phased behavior without moving theanalysis of the subordinate processes to the transformdomain. However, notice that the transform analysis maybe needed nonetheless to evaluate the marking occupationprobability distribution of the subordinate processes. Forinstance, if the SN while within phase h is an MRSPN itself,computing �h��h� may require resorting to the transformdomain analysis or to some other kind of numericalsolution. In this respect, (8) offers an open solution scheme,with the possibility of plugging in results obtained withdifferent solution techniques.

5.2 Markov Chain Subordinate Processes

In this interesting case, many studies in the literature havefocused on the case when phases are of deterministicduration. Here, we are able to jointly tackle the analytictreatment of deterministic and random phase duration.Thanks to the Markov property, the transient probabilitymatrix �i�t� of the subordinate process i can be expressedvia the matrix exponential eQit, where Qi is the infinitesimalgenerator of the subordinate Markov process, i � 1; 2; . . . ; n.The expressions given for the global and local kernel matrixblocks are easily rewritten as follows:

Ki;j�t� �Z t

0

eQiu�i;jfi�u�du; Ei;i�t� � eQit�1ÿ Fi�t��;i � 1; 2; . . . ; n; t � 0:

The LST of the preceding expressions can be computed asfollows:

K�i;j�s� � L�eQitfi�t���i;j; E�i;i�s� � sL�eQit�1ÿ Fi�t���;i � 1; 2; . . . ; n;

�9�where L��� denotes the Laplace transform operator. Quiteinterestingly, these expressions can be completely workedout and the integration solved in many interesting specialcases. For instance, consider the case of the finite supportdistribution obtained by truncating, at time � > 0, thenegative exponential distribution of parameter � and thennormalizing with respect to the probability eÿ�� . The LST ofthe kernel blocks is obtained from (9) as follows:

K�i;j�s� �Z �

0

eÿsteQit�eÿ�t

1ÿ eÿ�� dt�i;j

� ���s� ��I ÿQi�ÿ1

1ÿ eÿ�� I ÿ e��s���IÿQi��� �

�i;j:

Similarly, we compute the blocks of the local kernel,obtaining the following expression:

E�i;j�s� �s

1ÿ eÿ�� �s� ��I ÿQi� �ÿ1 I ÿ e��s���IÿQi��� �h

ÿeÿ�� sI ÿQi� �ÿ1 I ÿ e�sIÿQi��� �i

:

Due to the fact that the explicit expressions for the LST ofthe global kernel matrix blocks are available, we cancompute the reliability R by applying (4). For instance,consider a periodic PMS alternating the execution of aphase whose duration is a random variable exponentiallydistributed with parameter � and a constant duration phase

MURA AND BONDAVALLI: MARKOV REGENERATIVE STOCHASTIC PETRI NETS TO MODEL AND EVALUATE PHASED MISSION SYSTEMS... 1345

that lasts for � units of time. The PMS cyclically executes thetwo phases over time. The reliability R2n at the time the 2nthphase ends is obtained as follows:

R2n � m0 lims!0�

Yni�1

���s� ��I ÿQi�ÿ1�1eÿ�sIÿQ2���2

� ��

� m0 ���I ÿQi�ÿ1�1eQ2���2

� �n�:

Notice that deterministic and random phase duration canbe freely mixed and the reliability of the PMS is still easilyobtained according to (7).

5.3 General Instances of PMS

The general results stated by (3), (5), and (4) are valid underany scenario of PMS among those considered in this paper.Random phase duration, flexible mission profile, andgeneral intraphase stochastic process are all accommodatedin our modeling and solution scheme. The generality ofsuch equations represents the major practical contributionof this work. The peculiar solution scheme exploits theparticular PMS structure to ªunfoldº the stochastic processand reduce the complexity of the analysis.

The limit of practical exploitation of the theoretic resultsderived in this paper for the MRSPN models of PMS isrepresented by the possibility of evaluating the time-dependent marking occupation probabilities of the sub-ordinate process i during the enabling period of transitionti, i � 1; 2; . . . ; n. This information is indeed necessary tocomplete the analytical derivation of the expressions for thetime-dependent marking occupation probabilities and thereliability of the models. This limit of our approachcoincides with the limit of practical applicability of theMRGP solution techniques for MRSPN [16]. Therefore, aslong as the subordinate processes analysis is concerned, wedo not offer any contribution, apart from the possibility ofrecursively applying our solution scheme when the sub-ordinate process is a PMS itself.

However, it must be observed that our solution schemeallows plugging in specific solution algorithms for parti-cular intraphase processes when available. Besides the caseof homogeneous Markov chains we dealt with above, thereare other types of subordinate process that are amenable toanalytical solution:

. In the case of intraphase nonhomogeneous Markovchain, for instance, when there is a dependency ontime of failure rates, the exact method for thetransient analysis defined in [28] can be used toobtain the time-dependent marking occupationprobabilities.

. If an intraphase process is a semi-Markov process, asit is in the case when some general transitions areincluded in the SN submodel and they restart theirfiring each time a marking change occurs, theanalytical method proposed in [18] will serve thepurposes of obtaining the time-dependent markingoccupation probabilities.

. Last, an intraphase process could be an MRGP itself.In this case, the SN is allowed to contain generaltransitions, which must, however, fulfill Constraint 1,

that is, a single general transition can be enabled ineach marking of the nested MRGP. In this case, theintraphase model is solved by resorting to thealready mentioned general MRSPN transient analy-sis technique by [14], [13]. This case is the one thatwe will deal with when solving the MRSPN modelof the example of PMS introduced in the paper.

6 SENSITIVITY ANALYSIS

Suppose, now, that we are interested in studying the effects

that the variations in the value of some parameters have on

the dependability measures of the PMS, either because such

parameters are only known with uncertainty or because

different scenarios of the system are to be analyzed. In our

MRSPN scenario, there is the possibility of conducting an

analytical time-dependent sensitivity analysis of PMS

models. Notice that the steady-state sensitivity analysis of

MRSPN models has been already taken into consideration

in [23]. Here, we tackle the transient MRSPN sensitivity

analysis and we only focus on the class of MRSPN models

of PMS.The existence of an analytic expression for the marking

of dependent occupation probabilities allows us to evaluate

the derivatives of the dependability measures of interest

with respect to the variations of some parameters. These

derivatives, also called sensitivity functions [20], comple-

ment the means available for the study of PMS, providing

guidelines for the system optimization [4] and, in some

cases, avoiding or limiting the need for performing long

batches of evaluations for a number of different values of

the parameters.In the following, we analytically derive the sensitivity

functions that represent the effects that parameter varia-

tions may have on the transient marking occupation

probability matrix V �t� and on the probability R of

successfully completing the mission.

6.1 Sensitivity Analysis of V �t�To carry on the sensitivity analysis of the transient marking

occupation probability matrix, we consider the matrix V �t�as being a function of some independent parameter �, that

is, V �t� � V ��; t�. We denote with S��; t� the derivative of

V ��; t� with respect to �.Matrix S��; t� can be partitioned into its blocks Si;j��; t�

according to the same partition as that defined for V ��; t�.Since the expression for the transient marking occupation

probabilities is given in terms of the LST of the kernel

matrices in (6), we obtain the sensitivity function matrix in

the transform domain. Let us denote with V �i;j��; s� and

S�i;j��; s� the Laplace Stieltjes transform of block matrix

Vi;j��; t� and Si;j��; t�, respectively. The sensitivity function

transform is obtained as follows:

S�i;j��; s� �Z 1

0

eÿstd@

@�Vi;j��; t�

� @

@�

Z 10

eÿstdVi;j��; t� �@V �i;j��; s�

@�;

1346 IEEE TRANSACTIONS ON COMPUTERS, VOL. 50, NO. 12, DECEMBER 2001

where the second equality comes from the fact that � is

independent of both s and t. The differentiation of V �i;j��; s�according to (5) gives the following expressions:

S�i;j��; s� �@

@�V �i;j��; s�

� @

@�E�i;j��; s� �

Xh2s�i�

@

@�K�i;h��; s�

� �V �h;j��; s�

�K�i;h��; s�@

@�V �h;j��; s�

� �;

i; j � 1; 2; . . . ; n� 1:

�10�

Thus, the evaluation of the sensitivity function S��; t� is

reduced to the computation of the derivatives with respect

to � of the LST of the blocks of the kernel matrices. When

more specific assumptions on the structure of the MRSPN

model of the PMS are made, the general equation (10) can

be further simplified. For instance, in the case where the

subordinate processes are homogeneous Markov chains, the

derivatives of the LST of the kernel matrix blocks can be

obtained as follows:

@K�i;h��; s�@�

� @

@�L eQit�i;jfi�t�� �

� L @eQit

@��i;jfi�t�

� �� L eQit

@�i;j

@�fi�t�

� �� L eQit�i;j

@fi�t�@�

� �@E�i;h��; s�

@�

� @

@�sL eQit�1ÿ Fi�t��� �

� sL @eQit

@��1ÿ Fi�t��

� �ÿ sL eQit

@�1ÿ Fi�t��@�

� �:

6.2 Sensitivity Analysis of R

Consider now the probability R of successfully completing

the mission as being a function of some independent

parameter �, that is, R � R���. Let r��� denote the derivative

of R��� with respect to parameter �. From (4), we evaluate

r��� as follows:

r��� � @R���@�

� @

@�m0 lim

s!0�V �1;n�1��; s��

� �� m0 lim

s!0�

@

@�V �1;n�1��; s�

� ��:

If we consider a particular distribution of the phase

duration and a specific type of subordinate process, the

general expression given above for function r��� can be

further specialized. For instance, consider an MRSPN

model for a PMS that sequentially performs n phases

having an exponential duration and whose subordinate

process that, in each phase, is a homogeneous Markov

chain. In this case, we obtain the following expression:

r��� � m0

Xnj�1

Yjÿ1

h�1

�I ÿQh=�h�ÿ1�h@

@��I ÿQj=�j�ÿ1�j

� �Ynh�j�1

�I ÿQh=�h�ÿ1�h�:

7 PUTTING THE MRSPN APPROACH TO WORK

In this section, we first discuss the practical relevance of themethodology we described in the preceding sections,presenting the main guidelines for its effective exploitation.Then, we present the results of a numerical evaluation ofthe dependability attributes of the example of PMS weconsidered in this paper to show an example of practicalapplication of the theoretical concepts introduced.

7.1 Guidelines for the Practitioner

First of all, let us focus on the advantages that the type ofmodeling we proposed offers to the practitioner. TheMRSPN approach represents a relevant improvement ofthe current PMS modeling practice. From the reviewpresented in [25], it appears that several limits of thestate-of-the-art are due to the low-level modeling toolsemployed, typically Markov chain models. Moving from adirect state-based representation to a more abstract andconcise one allows the dependability engineer, on one side,to maintain a more comprehensible view of the correspon-dence between the real system and the model and, on theother side, to easily represent PMS features that could notbe accommodated beforehand.

These effective modeling capabilities are then coupledwith an efficient solution technique. The results weobtained in this paper can be exploited to develop andimplement specific solution algorithms, to take advantageof the separability of the analytical solution over the variousphases. For instance, suppose we are interested in comput-ing the probability that the mission has successfullycompleted at time t, for any t > 0, for a given initialprobability vector m0. This requires evaluating the expres-sion m0V1;n�1�t��. To compute matrix block V1;n�1�t�according to the results we derived above, a solutionalgorithm would consist of the following steps:

1. build the ith subordinate intraphase process fromthe analysis of the reachability graph of the modelduring the enabling period of transition ti,i � 1; 2; . . . ; n;

2. build the branching probability matrix �i;j fromphase i to phase j, j 2 s�i�, i � 1; 2; . . . ; n;

3. obtain the transient state occupation probabilitymatrix �i�t� of the subordinate intraphase processi, i � 1; 2; . . . ; n;

4. if phases have deterministic duration, then completethe analysis in the time domain according to (3) orelse compute the LST of the kernel matrix blocks,apply (5), and then antitransform to get the finalresult in the time domain.

Notice that the actual computational cost of the algorithmis dependent on the properties of the subordinate processesand of the phase duration distributions. However, the state

MURA AND BONDAVALLI: MARKOV REGENERATIVE STOCHASTIC PETRI NETS TO MODEL AND EVALUATE PHASED MISSION SYSTEMS... 1347

space of the MRGP process never needs to be generatedand handled as a whole, but rather the various sub-ordinate intraphase processes are separately generatedand solved.

For some classes of PMS, the implementation of thealgorithmic steps described above is already available. Forinstance, for PMS with deterministic phase duration andintraphase homogeneous continuous-time Markov pro-cesses (scenario ªacº), the required algorithms have beenimplemented in many of the tools for the automatedevaluation of dependability and what is missing is just thepartitioned generation of the state spaces, of transitionmatrices, and the ordered application of the steps describedabove. The dependability modeling and evaluation toolDEEM [8] has in fact implemented the above algorithm forthe ªacº PMS scenario, and is now being extended to coverthe case of phases with general distribution (ªcº PMSscenario).

To give a more precise idea about the computational costof our approach, suppose we are interested in evaluatingthe reliability, at mission completion time, for an MRSPNmodel of a PMS whose intraphase process is a Markovchain. More precisely, suppose phase i has an underlyingMarkov chain whose state space has Ci states, i � 1; 2; . . . ; n.Table 5 shows the computational cost required to executethe steps of the algorithm in the various possible scenarios:where qi is the maximum module entry of the Markov chaingenerator matrix of phase i and �i the deterministicduration of phase i. In those scenarios where constantphase duration is assumed, the computational complexity isbasically dominated by the cost needed for the evaluation ofthe time-dependent state occupation probability distribu-tion at phase ending times (9), whereas, when a randomphase duration is assumed, the dominating cost is that tocompute the limiting absorption probability distributionthrough the inverse of the fundamental matrix (7). Observethat the computational complexity always grows linearlywith the number of phases. When restricted to the

particular PMS scenario ªabc,º the MRSPN approach turnsout to perform as efficiently as the best solution approacheslisted in [25]. Notice that all the computational costsreported here are intended as upper-bounds. In fact, thematrices obtained are quite often sparse, thus allowingcheaper solution techniques [27].

7.2 Numerical Evaluation

We now present the result of a numerical evaluation weconducted on the example of PMS considered throughoutthe paper. The unreliability of the system at missioncompletion time has been selected as the measure ofinterest for the evaluation. As the loss of memory imposedon transition t2 changes the semantics of the repairactivities, the results we obtain are approximate. However,we exploit the approach explained in Section 4.1 to obtainupper bounds and lower bounds on the measures ofinterest, thus providing estimates of the error introduced.

The duration of the four phases of the mission arecharacterized by the random variables listed in Table 6.Notice that the unit of measure of all the time measuresexpressed is the hour �h�. Both distributions with finite andinfinite support are considered. The values of the para-meters necessary to fully define the SN submodel are givenin Table 7. Those values are given without any pretension ofrealism, just to experience the analyses that can be carriedout within the modeling framework presented here. Afew evaluations will be performed by considering anumber of possible system scenarios which are obtainedby varying the parameter values in a range around theirnominal values given by Table 7. To compute the blockmatrices required for the evaluation, we have implemen-ted the algorithmic steps described above using thegeneral purpose tool Mathematica, including the algo-rithms for the LST analysis and those for the symbolicderivative manipulation.

We conduct a first evaluation session to estimate theeffect that a more or less accurate modeling of the repairactivity may have on the final dependability results. More

1348 IEEE TRANSACTIONS ON COMPUTERS, VOL. 50, NO. 12, DECEMBER 2001

TABLE 5Computational Costs for Evaluating R under Various PMS Scenarios

TABLE 6Stochastic Characterization of Phase Duration

TABLE 7Values for SN Parameters

precisely, we plot, in Fig. 4a, the unreliability of the PMS atmission completion time, as obtained from three differentmodels. The first model is the one we developed in thispaper in which the repair is modeled by a deterministictransition and for which a pair of bounds on the exactunreliability result are provided by solving the two SNmodels shown in Fig. 3, the analytical solution method. Thesecond model approximates the deterministic transitionwith an exponential transition of expected firing time � � 1.Since the subordinate processes are, in this case, simpleMarkov chains, we are able to compute the exact unrelia-bility of such a model. The last model approximates therepair with a transition having a firing time drawn from theErlang(4, 4) distribution, whose expected values is again� � 1. By splitting the Erlang repair into the four exponen-tial stages it is composed of, we can analytically obtain theexact unreliability for this model as well.

Parameters other than the successful spare insertion c arefixed to their respective nominal value, whereas c rangeswithin �0:9; 0:99�. From Fig. 4a, we appreciate the accuracyachieved with the modeling of the deterministic repair. Theresults obtained with the exponential repair would indeedlead to an excessively optimistic assessment of system

dependability for high values of c. The Erlang distribution

better approximates the deterministic repair than the

exponential one due to its lower coefficient of variation.

However, modeling the four exponential stages of the

Erlang distribution significantly increases the computation

time required for the solution. Notice the tightness of the

pair of bounds our method returns for the deterministic

repair, which permits achieving a very good quality of the

approximation of the exact unreliability results. Moreover,

by splitting the deterministic repair transition into two

shorter repair stages, each of length �=2, we are able to

reduce the impact of the approximation by increasing the

amount of memory the model is able to keep moving from

one phase to another. Fig. 4b shows the relative error

incurred by our bounded solution in case one single stage

or two stages are used to model the repair activity.

Subsequent decompositions of the repair process lead to

further improvements of the bound tightness. Obviously,

this also results in an increase of the state space of the

intraphase processes and requires devising adequate trade-

offs between the computational cost and the accuracy of the

estimation.

MURA AND BONDAVALLI: MARKOV REGENERATIVE STOCHASTIC PETRI NETS TO MODEL AND EVALUATE PHASED MISSION SYSTEMS... 1349

Fig. 4. Estimation of the reliability of the PMS.

Fig. 5. Interpolation error by exploiting the sensitivity functions.

We consider now the unreliability of the PMS for valuesof the failure rate � ranging within the interval�10ÿ4=h; 10ÿ2=h�, while the other parameters are fixed totheir nominal values. The plot in Fig. 5a shows how theunreliability increases with � and remain quite close to each

other. The two bounding functions were evaluated at100 points within the range �10ÿ4=h; 10ÿ2=h�. Fig. 5b showsthe approximations obtained by exploiting the sensitivityfunctions. More precisely, we have computed three differ-ent approximate curves for the unreliability lower bound by

interpolating the function and its derivative in two, three,and four interpolation nodes, respectively. Fig. 5b showsthe relative error introduced by the various interpolationsover the whole interval of values considered for �. As canbe observed, increasing the number of interpolation nodesallows us to obtain more and more accurate results and four

nodes appear sufficient to obtain a very tight approximationof the unreliability figures.

8 CONCLUSIONS

In this paper, we addressed the analytical dependabilitymodeling of PMS by proposing a new methodology fortheir modeling and evaluation based on an MRSPN

approach. With respect to the state-of-the-art in PMSmodeling and evaluation, our proposal allows accommo-dating phases of random duration still providing an exactanalytical solution and enlarging the class of intraphaseprocess beyond that of Markov chain processes. Anexample of PMS has been used throughout the paper to

exercise our MRSPN approach in the modeling andevaluation of PMS. It includes most of the typical featuresof PMS and has been intentionally defined to include someaspects which previous PMS analytical solution methodsare not able to deal with.

Our solution for PMS models is based on the specializa-

tion of the Markov Regenerative Process (MRGP) theory forthe solution of MRSPN models of PMS. The computationalcomplexity of the analytical solution is reduced to the oneneeded for the separate solution of the different phases. Theissues introduced by the phased behavior are solved

without requiring additional computational costs. Theapplicability of our approach is only limited by the size ofthe biggest Petri net model which can be handled by thedependability evaluation tools and by the possibility ofevaluating the time-dependent matrix of marking occupa-tion probabilities of the subordinate processes should they

not be homogeneous Markov chains. Moreover, by comput-ing the partial derivatives with respect to the varyingparameters, we have shown how to exploit the analyticaltime-dependent sensitivity functions of PMS to enrich theevaluation tools available to the dependability engineer.

ACKNOWLEDGMENTS

The authors wish to thank Prof. Kishor S. Trivedi from the

Department of Electrical and Computer Engineering ofDuke University for the fruitful discussions held during thepreliminary stages of this work.

REFERENCES

[1] M. Alam and U.M. Al-Saggaf, ªQuantitative Reliability Evaluationof Reparaible Phased-Mission Systems Using Markov Approach,ºIEEE Trans. Reliability, vol. 35, pp. 498-503, 1986.

[2] J. Arlat, T. Eliasson, K. Kanoun, D. Noyes, D. Powell, and J. Torin,ªEvaluation of Fault-Tolerant Data Handling Systems for Space-craft: Measures, Techniques and Example Applications,º Techni-cal Report 86.321, LAAS-CNRS, 1986.

[3] B.E. Aupperle, J.F. Meyer, and L. Wei, ªEvaluation of Fault-Tolerant Systems with Nonhomogeneous Workloads,º Proc. IEEEFault-Tolerant Computing Symp. (FTCS '89), pp. 159-166, June 1989.

[4] J.T. Blake, A.L. Reibman, and K.S. Trivedi, ªSensitivity Analysis ofReliability and Performability Measures for MultiprocessorSystems,º Proc. ACM SIGMETRICS Int'l Conf. Measurement andModeling of Computer Systems, 1988.

[5] A. Bobbio, A. Puliafito, and M. Telek, ªA Modeling Framework toImplement Pre-Emption Policies in Non-Markovian StochasticPetri Nets,º IEEE Trans. Software Eng., vol. 26, pp. 35-54, 2000.

[6] A. Bobbio, A. Puliafito, M. Telek, and K. S. Trivedi, ªRecentDevelopments in Non-Markovian Stochastic Petri Nets,º J. Systemsand Circuits, vol. 8, pp. 119-158, 1998.

[7] A. Bobbio and M. Telek, ªMarkov Regenerative SPN with Non-Overlapping Activity Cycles,º Proc. Int'l Computer Performance andDependability Symp. (IPDS95), pp. 124-133, 1995.

[8] A. Bondavalli, I. Mura, S. Chiaradonna, R. Filippini, S. Poli, and F.Sandrini, ªDEEM: A Tool for the Dependability Modeling andEvaluation of Multiple Phased Systems,º Proc. DSN2000 Int'l Conf.Dependable Systems and Networks (FTCS-30 and DCCA-8), pp. 231-236, 2000.

[9] A. Bondavalli, I. Mura, and M. Nelli, ªAnalytical Modelling andEvaluation of Phased-Mission Systems for Space Applications,ºProc. IEEE High Assurance System Eng. Workshop (HASE '97) pp. 85-91, 1997.

[10] A. Bondavalli, I. Mura, and K.S. Trivedi, ªDependability Model-ling and Sensitivity Analysis of Scheduled Maintenance Systems,ºProc. European Dependable Computing Conf. (EDCC-3), 1999.

[11] G.R. Burdick, J.B. Fussell, D.M. Rasmuson, and J.R. Wilson,ªPhased Mission Analysis: A Review of New Developments andan Application,º IEEE Trans. Reliability, vol. 26, pp. 43-49, 1977.

[12] E. CË inlar, Introduction to Stochastic Processes. Prentice Hall, 1975.[13] H. Choi, ªPerformance and Reliability Modeling Using Markov

Regenerative Stochastic Petri Nets,º PhD thesis, Duke Univ.,Durham N.C., 1993.

[14] H. Choi, V.G. Kulkarni, and K.S. Trivedi, ªMarkov RegenerativeStochastic Petri Nets,º Performance Evaluation, vol. 20, pp. 337-356,1994.

[15] H. Choi, V.G. Kulkarni, and K.S. Trivedi, ªTransient Analysis ofDeterministic and Stochastic Petri Nets,º Proc. 14th Int'l Conf.Application and Theory of Petri Nets, pp. 66-185, 1993.

[16] G. Ciardo, R. German, and C. Lindemann, ªA Characterisation ofthe Stochastic Process Underlying a Stochastic Petri Net,º IEEETrans. Software Eng., vol. 20, pp. 506-515, 1994.

[17] J.B. Dugan, ªAutomated Analysis of Phased-Mission Reliability,ºIEEE Trans. Reliability, vol. 40, pp. 45-52, 1991.

[18] J.B. Dugan, K.S. Trivedi, R.M. Geist, and V.F. Nicola, ªExtendedStochastic Petri Nets: Applications and Analysis,º Proc. PERFOR-MANCE '84, 1984.

[19] J.D. Esary and H. Ziehms, Reliability Analysis of Phased Missions,pp. 213-236, Philadelphia: SIAM, 1975.

[20] P.M. Frank, Introduction to System Sensitivity Theory. AcademicPress, 1978.

[21] R. German, ªTransient Analysis of Deterministic and StochasticPetri Nets by Method of Supplementary Variables,º Proc. Int'lSymp. Modeling Analysis and Simulation of Computer and Telecomm.Systems (MASCOTS '95), 1995.

[22] D. Logothetis and K.S. Trivedi, ªTransient Analysis of the LeakyBucket Rate Control Scheme under Poisson and On-Off Sources,ºProc. INFOCOM94-2 13th Ann. Joint Conf. IEEE Computer andComm. Soc. Networking for Global Comm., pp. 490-499, 1994.

[23] V. Mainkar, H. Choi, and K.S. Trivedi, ªSensitivity Analysis ofMarkov Regenerative Stochastic Petri Nets,º Proc. Fifth Int'lWorkshop Petri Nets and Performance Models, PNPM93, pp. 180-189, 1993.

[24] J.F. Meyer, D.G. Furchgott, and L.T. Wu, ªPerformability Evalua-tion of the SIFT Computer,º Proc. IEEE Fault-Tolerant ComputingSymp. (FTCS '79), pp. 43-50, 1979.

1350 IEEE TRANSACTIONS ON COMPUTERS, VOL. 50, NO. 12, DECEMBER 2001

[25] I. Mura, A. Bondavalli, X. Zang, and K.S. Trivedi, ªDependabilityModelling and Evaluation of Phased Mission Systems: A DSPNApproach,º Proc. IFIP Int'l Conf. Dependable Computing for CriticalApplications (DCCA-7), pp. 299-318, 1999.

[26] A. Pedar and V.V.S. Sarma, ªPhased-Mission Analysis forEvaluating the Effectiveness of Aerospace Computing-Systems,ºIEEE Trans. Reliability, vol. 30, pp. 429-437, 1981.

[27] A. Reibman and K.S. Trivedi, ªNumerical Transient Analysis ofMarkov Models,º Computers and Operation Research, vol. 15, pp. 19-36, 1988.

[28] A. Rindos, S. Woolet, and I. Viniotis, ªExact Methods for theTransient Analysis of Non-Homogeneous Continuous-Time Mar-kov Chains,º Proc. Second Int'l Workshop Numerical Solution ofMarkov Chains, 1995.

[29] M. Smotherman and K. Zemoudeh, ªA Non-HomogeneousMarkov Model for Phased-Mission Reliability Analysis,º IEEETrans. Reliability, vol. 38, pp. 585-590, 1989.

[30] A.K. Somani, J.A. Ritcey, and S.H.L. Au, ªComputationally-Efficent Phased-Mission Reliability Analysis for Systems withVariable Configurations,º IEEE Trans. Reliability, vol. 41, pp. 504-511, 1992.

[31] A.K. Somani and K.S. Trivedi, ªPhased-Mission Systems UsingBoolean Algebraic Methods,º Performance Evaluation Review,pp. 98-107, 1994.

Ivan Mura received the Laurea degree incomputer science and the PhD degree incomputer engineering from the University ofPisa, Italy, in 1994 and 1999, respectively. Hisresearch interests include the modeling ofprocessing and telecommunications systemsfor performance, dependability, and performabil-ity evaluation. During 1995, he was a fellowshipholder at the IEI, an institute of the ItalianNational Research Council, carrying out re-

search on performability optimization in fault-tolerant real-time systems.In 1998, as partial fulfillment of his PhD course requirements, he was avisiting student at Duke University, Durham, North Carolina, under thesupervision of Professor Kishor Trivedi. In 1999, he was appointed to theposition of researcher at the CNUCE institute of the Italian NationalResearch Council. That same year, he joined the Motorola TechnologyCenter in Torino, Italy, where he holds a project leader position.

Andrea Bondavalli is an associate professor onthe faculty of science at the University ofFirenze, Italy. Previously, he was a researcherfor the Italian National Research Council, work-ing at the CNUCE Institute in Pisa. His researchactivity is focused on dependability. In particular,he has been working on software fault tolerance,evaluation of dependability attributes, such asreliability, availability, and performability, and onthe development of design methodologies for

real-time dependable systems. Professor Bondavalli participated inseveral projects funded by the European Community including ESPRITBRA 3092 PDCS, 6362 PDCS-2, ESPRIT 27439 HIDE, and ESPRIT20716 GUARDS and has authored or coauthored more than 70 paperswhich have appeared in international journals and proceedings ofinternational conferences. He has served as a member of the programcommittee for several international conferences. Currently, he is servingas program cochair of the IEEE High Assurance System EngineeringWorkshop 2001 and as co-guest editor of a special issue of the IEEETransactions on Computers and will serve as program chair of theFourth European Dependable Computing Conference (2002). He is amember of the IEEE Computer Society, the IFIP W.G. 10.4 WorkingGroup on ªDependable Computing and Fault-Tolerance,º ENCRESSClub Italy, and the AICA Working Group on ªDependability in ComputerSystems.º

. For more information on this or any computing topic, please visitour Digital Library at http://computer.org/publications/dlib.

MURA AND BONDAVALLI: MARKOV REGENERATIVE STOCHASTIC PETRI NETS TO MODEL AND EVALUATE PHASED MISSION SYSTEMS... 1351