Survey of tools for risk assessment of cascading outages



Abstract: This paper is a result of ongoing activity carried out by the Task Force on Understanding, Prediction, Mitigation and Restoration of Cascading Failures under the IEEE Computing & Analytical Methods Subcommittee (CAMS). The task force's previous papers [1, 2] focused on general aspects of cascading outages such as understanding, prediction, prevention and restoration from cascading failures. This is the second of two new papers that extend this previous work to summarize the state of the art in cascading failure risk analysis methodologies and modeling tools. The first paper reviews the state of the art in methodologies for performing risk assessment of potential cascading outages [3]. This paper describes the state of the art in cascading failure modeling tools, documenting the view of experts representing utilities, universities and consulting companies. The paper is intended to constitute a valid source of information and references about presently available tools that deal with prediction of cascading failure events. This effort involves reviewing published literature and other documentation from vendors, universities and research institutions. The evaluation of cascading outage risk is in continuous evolution. Several research programs are underway to improve the understanding and identification of cascading events and to address the complexity of these events that electrical utilities face today. Assessing the risk of cascading failure events in planning and operation of power transmission systems requires adequate mathematical tools and software.

Index Terms: Transmission Planning, Operations, Cascading Outages, Risk Assessment

I. INTRODUCTION

THIS paper is a result of ongoing activity carried out by the Task Force on Understanding, Prediction, Mitigation and Restoration of Cascading Failures under the IEEE Computing & Analytical Methods Subcommittee (CAMS). The cascading failure task force promotes the development of new methods, technologies and tools in order to better understand, predict and prevent cascading failures and to restore the system afterwards. It also sponsors technical sessions, tutorial courses, workshops and conferences to exchange state-of-the-art information on best practices, procedures and strategies.

Manuscript received July 17, 2010.

*Task Force Contributing Members: M. Papic (Lead), K. Bell, Y. Chen, I. Dobson, L. Fonte, E. Haq, P. Hines, D. Kirschen, X. Luo, S. Miller, N. Samaan, M. Vaiman, M. Varghese, P. Zhang,

This paper describes the state of the art of cascading risk assessment tools by means of an updated view prepared by experts representing utilities, universities, research institutes and consulting companies. The paper is intended to constitute a valid source of information and references about presently available tools that deal with prediction of cascading failure events. This effort involves reviewing published literature and other documentation from vendors, universities and research institutions. The evaluation of cascading outage risk is in continuous evolution. Several research programs are underway to improve the understanding and identification of cascading events and to address the complexity of these events that electrical utilities face today. Assessing the risk of cascading failure events in planning and operation of power transmission systems requires adequate mathematical tools and software.

The background to and motivation for the study of cascading outages have been described in a companion paper [3]. The present paper has three main sections. The first section focuses on present industry practices for identifying cascading sequences and for performing a risk assessment. The second section is devoted to both deterministic and risk-based tools used in power system planning and operation. Some of the common commercially available tools as well as research-based tools are presented. An attempt was made to classify programs by their use in operation, planning and real-time environments and by whether they evaluate steady-state or dynamic behavior. The available computer programs show considerable differences in many factors, such as load relief, modeling of protection failures, modeling of operating policies, calculated risk indices, etc. The final section of this paper addresses the pros and cons of the different tools, and makes some observations on the most useful features for the utility industry.

II. PRESENT INDUSTRY PRACTICE

Detecting and preventing cascading outages is crucial to maintaining power system reliability and security. Planning and operation engineers as well as control room operators are faced with complex situations resulting from cascading failure events. The importance of conducting risk-based reliability studies has been emphasized by researchers and industry due to the number of blackout events that have occurred across the globe in recent years [1-3].

Prepared by the Task Force on Understanding, Prediction, Mitigation and Restoration of Cascading Failures of the IEEE Computing & Analytical Methods (CAMS) Subcommittee*

978-1-4577-1002-5/11/$26.00 ©2011 IEEE

This section of the paper summarizes some of the approaches currently taken in different parts of the world to managing these phenomena.

A. Regulation Governing Extreme Event Analysis in North America

NERC standards require Balancing Authorities (BA) to perform system simulations and associated assessments periodically to ensure that reliable systems are developed that meet specified performance requirements with sufficient lead time, and continue to be modified or upgraded as necessary to meet present and future system needs [4].

The BAs are required to consider four contingency categories: A, B, C, and D. Category D covers extreme events resulting in two or more elements removed or Cascading out of service.

The standard identifies Category D contingencies such as three-phase faults with delayed clearing, loss of a tower line with three or more circuits, loss of all transmission lines on a common right-of-way, loss of a substation or switching station, loss of all generating units at a station, failure or misoperation of a fully redundant special protection system, and the impact of severe power swings or oscillations.

The standard states that: “A number of extreme contingencies that are listed under Category D and judged to be critical by the transmission planning entity(ies) will be selected for evaluation. It is not expected that all possible facility outages under each listed contingency of Category D will be evaluated.”

B. Examples of Current Practice in North America

Considerable published work over the past several years has been devoted to the reliability evaluation of various aspects of cascading events, primarily in the planning arena [1, 2].

The following subsections explain current practice for extreme event evaluation adopted by some utilities and system operators.

1) Con Edison

Con Edison has recently developed a comprehensive automated approach to identify and predict cascading outages [5].

This approach deals with two main issues related to the assessment of the cascading outages:

• Determination of the contingencies which cause cascading outages (e.g., initiating events) and their spread and consequences,

• Identification of the optimal remedial actions needed to prevent cascades or mitigate their effect.

The developed approach enables system planners to quantify a power system’s ability to withstand cascading outages caused by thermal overloads.

2) Idaho Power

Idaho Power has developed and implemented an approach that quantifies the impact of NERC Category B, C and D events based on the secure region of power system operation [6]. Contingencies are ranked based on the size of the secure operating region, and the most limiting contingencies are identified. The effect of mitigation measures in alleviating thermal, voltage and voltage stability violations can be measured by monitoring the change in size of the secure operating region.

3) ISO New England

ISO New England only performs cascading failure analysis when determining Bulk Power System (BPS) elements, based on the NPCC reference [7]. The classification of BPS elements is based on system performance rather than on voltage class, as in many other regions. Normally two tests are needed to determine the BPS elements: a transient stability test and a steady-state test.

For the steady-state test, all elements at the station under test are opened and the power flow case is solved. If there are no voltage violations and no short-time emergency (STE) thermal rating violations, the test bus is classified as not part of the BPS. If any elements exceed their STE limits, a cascading analysis is performed by opening the element with the greatest percent STE overload and solving the power flow again. If the element is a circuit that contains multiple branches, all of the branches are opened. If a leg of a three-terminal line is overloaded, the entire line is tripped, not just the overloaded branch. This iterative process continues until there are no further STE overloads or the case diverges.
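The iterative part of the steady-state test can be sketched as follows. This is a minimal illustration, not ISO New England's implementation: the power flow is replaced by a toy model (parallel lines between two buses sharing a fixed transfer in proportion to susceptance), voltage checks and multi-branch circuits are omitted, and all names (`solve_flows`, `steady_state_cascade`, `ste_mw`, the line data) are hypothetical.

```python
def solve_flows(lines, in_service, transfer_mw):
    """Toy stand-in for a power flow solver (an assumption of this sketch).

    In-service parallel lines between two buses share a fixed transfer in
    proportion to their susceptance "b". Returns None when no line is left,
    which is treated here as a diverged case.
    """
    total_b = sum(lines[name]["b"] for name in in_service)
    if total_b == 0:
        return None
    return {name: transfer_mw * lines[name]["b"] / total_b for name in in_service}


def steady_state_cascade(lines, transfer_mw):
    """Open the element with the greatest percent STE overload, re-solve,
    and repeat until no STE overloads remain or the case diverges."""
    in_service = sorted(lines)
    tripped = []
    while True:
        flows = solve_flows(lines, in_service, transfer_mw)
        if flows is None:
            return tripped, "diverged"
        # Loading of each in-service element as a percent of its STE rating.
        loading = {n: 100.0 * abs(f) / lines[n]["ste_mw"] for n, f in flows.items()}
        worst = max(loading, key=loading.get)
        if loading[worst] <= 100.0:
            return tripped, "no STE overloads"
        in_service.remove(worst)
        tripped.append(worst)


# Hypothetical data: three parallel lines carrying a 400 MW transfer.
lines = {
    "A": {"b": 1.0, "ste_mw": 90.0},
    "B": {"b": 1.0, "ste_mw": 120.0},
    "C": {"b": 2.0, "ste_mw": 450.0},
}
print(steady_state_cascade(lines, 400.0))  # (['A', 'B'], 'no STE overloads')
```

In a real study the `solve_flows` call would be replaced by an AC power flow, and the returned status would feed the BPS classification decision.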

For the transient stability test, a three-phase fault is applied to the test bus and is left uncleared locally, assuming no communications from the station under test to the remote terminals. Remote terminals are opened based on the expected design fault clearing time. The tested bus is determined to be a BPS element if the transient stability results show any of the following: a significant adverse impact outside the local area, transient instability with widespread power system collapse, transient stability with undamped or sustained power system oscillations, or a loss of source greater than 1200 MW.

4) Midwest ISO

Midwest ISO has recently developed, tested and implemented an effective automated approach to perform NERC compliance studies [8]. This approach incorporates computational capabilities that include identifying critical contingencies, determining transmission system bottlenecks, determining potential cascading chains, computing the minimum amount of necessary system adjustments, minimizing the amount of load curtailment, and automatic reporting.

5) Southern Company

Southern Company has developed and implemented a unique methodology that allows screening multiple initiating contingencies, simulating the cascading process, evaluating the system impacts, ranking the cascading scenarios based on their severity and likelihood, and identifying the top contingencies that require primary attention [9, 10].

C. Examples of Current Practice in Europe

Transmission operators and planners in Europe have long been accustomed to working with reliability rules such as ‘N-1’ (secure against the loss of a single primary component or a single line) or ‘N-D’ (secure against the loss of a double circuit overhead line). However, there is a growing recognition of the need to fully comprehend the consequences of unplanned outage events. For example, alongside the introduction of a single GB-wide security standard in Great Britain in 2005 [11], there was a clarification that, following any power system disturbance, protection and control equipment may normally be expected to respond automatically. Assessment of secured events should therefore take account of such responses that are consequential to them, e.g. cascade tripping of circuits, auto-switching, switching of capacitor banks, AVR responses and transformer tapping. In particular, it should be established that a new steady state is reached that lies within normal operating limits. Otherwise, suitable preventive actions should be taken.

In common with many parts of Europe, in Italy there is increasing focus on minimizing the extent to which the network acts as a barrier to inter-area trades of electrical energy. This has led to increasing adoption of system integrity protection schemes to facilitate automatic post-fault actions and reduce pre-fault constraint of power transfers. However, major disturbances such as those in Italy in 2003 and Western Europe in 2006 have focused attention on the need to study the consequences not only of initiating events but also of the actions of these schemes, including dynamic responses. Tools such as SICRE in Italy [12] and ASSESS in France (see Section III below) have been developed, at least in part, for that purpose. In addition, the robustness of generators against large frequency or voltage deviations must be verified [13].

D. Need of Remedial Action Schemes (RAS) for Extreme Event Analysis in Real Time Operation

Real-time operations deal with system conditions that differ significantly from, and vary more than, those in planning models. In particular, they must account for unexpected de-ratings and the uncertainties of future renewable energy resources.

Remedial Action Schemes (RAS), described in [14], are designed to mitigate the overloads and violations caused by credible single- and multiple-contingency problems. RAS may also be used for wide-area problems, which usually require separate solutions that are generally not provided by specific protection schemes based on equipment zones of protection. RAS have become more common primarily because they are regarded as less costly and quicker to design and implement than more traditional alternatives such as constructing transmission lines and power plants. In general, the case for RAS is not clear-cut, and one can make an argument both ways. On one hand, if one assumes that the RAS can make adjustments faster and more decisively than a system operator, such actions could help reduce the possibility that a cascading outage will occur. On the other hand, if the RAS fails to operate correctly, and this is not immediately detected, the chance of a cascading outage may actually increase. The first side of this argument is the primary reason for installing RAS. However, it is important to remember that in the 2003 Northeast blackout, part of the problem was loss of visibility of the system due to problems with the SCADA system. RAS are designed to detect a pre-defined set of abnormal real-time system conditions and to take corrective action to preserve system integrity and provide acceptable system performance. Reference [15] shows that out of 24 cases reported by NERC between 1986 and 1995 there were 16 examples of RAS success and 8 examples of RAS failure. The failures are attributed to flaws in logic, incorrect settings, hardware or software problems, and errors in RAS arming signals.

Markov modeling suggested in [16] is generally used to develop either event-based or response-based remedial action schemes. California ISO has implemented a combination of both. For example, the internal RAS will generally take prescriptive action based on events, such as inserting or bypassing capacitors following line operations. WECC Paths 15 and 26 are sets of lines that connect the north and the south of California. For the Path 15 RAS, arming is based on load. This type of RAS therefore requires the combination of an event, such as the loss of a line, and an unacceptable load flow on the remaining elements.

Many of the earlier cascading power failures in North America involved failure of special protection schemes. NERC Planning Standards have stated that RAS shall be designed so that cascading transmission outages or system instability do not occur for failure of a single component. A major concern in this area is overlapping RAS. In some cases, the actions taken by one RAS may overlap those taken by another. For example, the Path 15 RAS may drop some load that is also identified in the Path 26 RAS. If one RAS does drop that load, part of the other RAS becomes ineffective. A large and expensive effort was made in the 1980s to improve RAS reliability. Improved remedial action schemes for infrequent abnormal conditions are described in [17]. Since multiple outage events are infrequent, one must provide controls to mitigate the effect of disturbances. For example, following the 1965 blackout, under-frequency load shedding became standard utility practice. Another example given in [16] states that load shedding or fast capacitor bank switching in Idaho would have contained the July 2, 1996 initial outages. Since then, RAS using capacitor bank switching and load shedding controls have been adopted by industry.
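The overlap concern can be checked mechanically once each scheme's load-shedding actions are listed. The sketch below is only an illustration under assumed data: the scheme names follow the example in the text, but the load identifiers and the function `overlapping_load` are hypothetical.

```python
from itertools import combinations


def overlapping_load(ras_actions):
    """Return {(ras_a, ras_b): shared load IDs} for every pair of schemes
    whose load-shedding sets intersect."""
    overlaps = {}
    for a, b in combinations(sorted(ras_actions), 2):
        shared = set(ras_actions[a]) & set(ras_actions[b])
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


# Hypothetical load blocks armed by each scheme.
ras_actions = {
    "Path 15 RAS": {"LOAD_1", "LOAD_2"},
    "Path 26 RAS": {"LOAD_2", "LOAD_3"},
}
print(overlapping_load(ras_actions))
# {('Path 15 RAS', 'Path 26 RAS'): {'LOAD_2'}}
```

Any load block reported here would already be gone if the first scheme operated, so the second scheme would shed less than its design amount.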

The Electric Power Research Institute (EPRI) has recently conducted a survey of present industry practices in performing system studies under extreme contingency events. The results of this survey show that adequate methodologies and tools for dealing with cascading outages are of paramount importance to the utility industry.


III. TOOLS FOR RISK ASSESSMENT OF CASCADING OUTAGES

The risk evaluation of cascading outages is extremely complex as it requires detailed modeling of a sequence of reactions to an initiating event or events. Considerable efforts have been devoted to develop risk-based tools able to take into account cascading failures [1-3]. Due to the very large number of possible combinations of events that might lead to a cascading failure, many of these tools adopt a probabilistic approach. This section provides an overview of both deterministic and risk-based or probabilistic cascading tools used in power system planning and operation. Some of the commercially available tools as well as research-grade tools are described. An attempt has been made to classify tools used in planning, operational planning and real-time environments and by other criteria such as: applied methodology, ac or dc load flow solution, limits on system size etc. The available computer programs show considerable differences in many factors, such as load relief, modeling of protection failures, modeling of operating policies, calculated risk indices, etc.

Table I summarizes the presently known cascading tools. This table includes only those tools that can specifically address cascading failure events and their consequences.

A. Commercially Available Tools

This section provides a more detailed description of the four commercially available tools listed in Table I.

1) ASSESS

ASSESS is a commercial tool developed by the French transmission system operator, RTE, in collaboration with its equivalent in England and Wales, National Grid [18]. It provides a single software environment in which the user can specify, quite precisely, a very wide range of uncertainties and explore their impact systematically. This is achieved by means of four facilities:

1. A security-constrained AC optimal power flow [19], used to represent how an operator (or, to some degree, ‘the market') would have dispatched the available power system facilities.

2. A quasi-steady-state simulation, called ‘Astre' [20] and developed by RTE and the University of Liege, that, while assuming electromechanical equilibrium of the system, models the action of voltage control devices in particular (and has a simple model of the protection of network branches).

3. A full time-domain simulation, Eurostag [21], that allows modeling of many controls on the system, including field current limiters on generators, governors, some forms of generator protection and, in some sense, zone 3 protection on overhead lines.

4. Access to a suite of statistical analysis tools that can be applied to detailed simulation results stored for many scenarios in a database.

While ASSESS was designed to be applicable in many kinds of study, the possibility of modeling sequences of events, whether independent or consequential to the current state of the system in a simulation, and of varying a range of system parameters such as protection settings, line ratings and fault clearance times provides a powerful means of assessing the possibility of cascading outages occurring and their impact. If the user can have confidence that the specified sampling laws for variation of different initial conditions, equipment parameters and independent fault events are accurate, the probability of different outcomes might also be used in decision making. The downside of such a flexible tool is that it requires specialist users and, especially if a full time-domain simulation is to be carried out, considerable volumes of data. A study must also be carefully designed in respect of its specific aims. It must generally be established quite early on whether steady-state or quasi-steady-state analysis will suffice and, if uncertainties are to be assessed, how to define them so as to concentrate results on areas of interest.

Weblink: http://www.rte-france.com/htm/an/activites/assess.jsp

2) CAT

CAT (Cascade Analysis Tool) is a part of the TRANSMISSION 2000® suite of programs developed by Commonwealth Associates Inc. (CAI) and is commercially available [22, 23]. The CAT utilizes the TRANSMISSION 2000 software environment to objectively evaluate the potential vulnerability to widespread outages and uncontrolled cascading. The CAT automatically runs a set of contingencies to determine the potential to initiate facility and/or load losses beyond the initial contingency. For each contingency, the tool checks the post-contingency operating state against user-specified criteria. If a subsequent loss is indicated, the tool automatically simulates the loss. The most common criteria that might be used in an analysis are: Thermal Overload Criterion, Low Voltage Criterion and Voltage Change Criterion.

TABLE I
A SUMMARY TABLE OF CASCADING TOOLS

| Cascading Tool | Methodology | AC/DC Power Flow | Max. number of buses | Web address |
|---|---|---|---|---|
| Commercially Available Tools | | | | |
| ASSESS by RTE, France & National Grid, UK | Analytical + Monte Carlo | DC or AC steady state plus dynamic simulation | Practical limit of around 2000 buses | Yes |
| CAT by Commonwealth Associates, Inc., USA | Analytical | AC | 64,000 | Yes |
| POM-PCM by V&R Energy Systems Research, Inc., USA | Analytical | AC steady state plus dynamic simulation | No Limit | Yes |
| TRELSS by Electric Power Research Institute (EPRI), USA | Analytical | AC or DC | 13,000 | No |
| Research-Grade Tools | | | | |
| HIDDEN FAILURE, USA | Monte Carlo | AC | 300 | No |
| MANCHESTER by The University of Manchester, UK | Monte Carlo | AC | 1,500 | No |
| OPA by ORNL-PSERC-Alaska, USA | Monte Carlo, complex system | DC | 1000 | No |
| PSA by Los Alamos National Laboratory, USA | Monte Carlo | AC or DC | 64,000 | No |
| TAM by Texas A&M University, USA | Monte Carlo | AC | 24 | No |

For contingencies that cause thermal overloads or low voltages, the next outage is determined by looking at thermal violations and identifying the worst overload (as a percent of rating) or, if there are no thermal violations, by dropping load at the bus with the lowest actual voltage. Only one facility is added to the list of outages per iteration. This process is repeated until there are no further criteria violations. For contingencies that cause the power flow to diverge, or if any step taken to relieve a violation causes divergence, load is dropped at the bus associated with the divergence and another attempt is made to solve the case. The process repeats until one of four conditions is reached:

1. The case solves without violations,

2. The next load drop would exceed the user-specified maximum load drop,

3. A low-voltage condition is encountered, indicating that load drop is warranted, but there is no load in the vicinity of the voltage violation to drop, or

4. The case interrupts.

If the case solves without violations, there is no reasonable vulnerability to widespread outages. Provided that the probabilities of the initiating events are known and independent, and assuming that the conditional probability that cascading outages cannot be precluded is 1.0, a probability or an index of performance can be computed by summing the probabilities of initiating events.
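The per-iteration selection rule and the summed index described above can be sketched as follows. This is an interpretation rather than the CAT implementation: the functions `next_action` and `cascade_index`, the outcome labels, the voltage criterion `v_min` and the sample data are all assumptions, and the index is taken here as the sum over initiating events whose analysis did not end with the case solving without violations.

```python
def next_action(thermal_pct, bus_voltage_pu, v_min=0.95):
    """Pick one action per iteration: trip the worst thermal overload,
    otherwise drop load at the lowest-voltage bus, otherwise stop.

    thermal_pct: branch name -> loading as a percent of rating.
    bus_voltage_pu: bus name -> voltage in per unit.
    v_min: assumed user-specified low-voltage criterion.
    """
    overloads = {k: v for k, v in thermal_pct.items() if v > 100.0}
    if overloads:
        return ("trip", max(overloads, key=overloads.get))
    low = {k: v for k, v in bus_voltage_pu.items() if v < v_min}
    if low:
        return ("drop_load", min(low, key=low.get))
    return ("stop", None)


def cascade_index(results):
    """Sum the probabilities of initiating events for which cascading
    could not be precluded (conditional probability assumed to be 1.0)."""
    return sum(p for p, outcome in results if outcome != "solved_without_violations")


# Hypothetical study results: (probability of initiating event, outcome).
results = [
    (0.01, "solved_without_violations"),
    (0.002, "max_load_drop_exceeded"),
    (0.0005, "diverged"),
]
print(next_action({"L1": 105.0, "L2": 130.0}, {"B1": 0.90}))
print(cascade_index(results))
```

Only the second and third events contribute to the index in this example, since the first case solves without violations.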

Web link: http://www.cai-engr.com/?id=content/t2k/ca.html

3) POM-PCM

The PCM (Potential Cascading Modes) tool is a part of the POM (Physical and Operational Margins) Suite developed by V&R Energy Systems Research, Inc. and is commercially available [24, 25]. PCM utilizes the POM software environment to simultaneously monitor voltage stability, thermal overloads and voltage violations. Execution time for an AC solution for one contingency is approximately 0.1 s for a 50,000-bus case.

Initiating events/contingencies are generated either automatically as a result of the "cluster" approach [3] or from user-specified contingency lists. Millions of initiating events may be analyzed within one simulation run.

Following an initiating event, cascading chains are automatically identified. A cascading chain is a series of consecutive tripping events following an initiating event, caused by overloads exceeding the branch tripping threshold or by low- or high-voltage violations beyond the load/generator tripping thresholds. All the thresholds are user-defined. All violated elements are identified during the process, but only those at which violations exceed the user-specified tripping thresholds are automatically tripped. This consecutive tripping continues until one of the following events happens:

• The system fails to solve due to voltage instability;

• Loss of load/generation exceeds a user-specified threshold value;

• Islanding occurs with an imbalance of load and/or generation within an island;

• The thermal, low-voltage and high-voltage violations are alleviated or drop below the corresponding threshold values.

As a result of this process, potential cascading modes in the power system network are identified.
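A tier-by-tier search of this kind might look like the sketch below. It is not the PCM algorithm: only the branch-overload threshold is modeled, the load-loss and islanding stopping rules are omitted, and the solver is a hypothetical lookup table (`CASES`, `toy_solve`) standing in for an AC power flow. The names `find_cascading_chain`, `trip_pct` and `max_tiers` are likewise assumptions.

```python
def find_cascading_chain(solve, initiating, trip_pct=120.0, max_tiers=10):
    """Trip, tier by tier, every branch whose loading exceeds trip_pct.

    solve: callable taking a frozenset of outaged branches and returning
           {branch: loading in percent}, or None if the case fails to solve.
    Returns (list of tiers, reason the chain stopped).
    """
    outaged = set(initiating)
    tiers = []
    for _ in range(max_tiers):
        loading = solve(frozenset(outaged))
        if loading is None:
            return tiers, "failed to solve"
        to_trip = sorted(b for b, pct in loading.items()
                         if pct > trip_pct and b not in outaged)
        if not to_trip:
            return tiers, "violations below tripping threshold"
        tiers.append(to_trip)
        outaged.update(to_trip)
    return tiers, "tier limit reached"


# Hypothetical pre-computed post-outage loadings for a three-branch system.
CASES = {
    frozenset({"L1"}): {"L2": 125.0, "L3": 110.0},
    frozenset({"L1", "L2"}): {"L3": 140.0},
}


def toy_solve(outaged):
    # A missing entry stands for a case that does not solve.
    return CASES.get(outaged)


print(find_cascading_chain(toy_solve, {"L1"}))
# ([['L2'], ['L3']], 'failed to solve')
```

Note that `L3` at 110% is a violation in the first tier but stays in service because it is below the 120% tripping threshold, which mirrors the distinction drawn in the text between violated and tripped elements.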

After potential cascading modes are identified, the probability and the impact of each PCM may be estimated. A vulnerability index of cascading, based on the estimated likelihood and impact of the events, is computed and then used to rank cascading outages.
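The source does not give the formula for the vulnerability index. As an illustration only, the sketch below assumes the common risk definition of likelihood multiplied by impact; the function `rank_by_vulnerability`, the mode names and the numbers are hypothetical.

```python
def rank_by_vulnerability(modes):
    """Rank cascading modes by an assumed index = likelihood * impact.

    modes: list of (name, likelihood, impact in MW of load lost).
    Returns (name, index) pairs sorted from most to least severe.
    """
    scored = [(name, likelihood * impact) for name, likelihood, impact in modes]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical cascading modes.
modes = [
    ("PCM-1", 1e-3, 500.0),
    ("PCM-2", 1e-2, 100.0),
    ("PCM-3", 1e-4, 2000.0),
]
for name, index in rank_by_vulnerability(modes):
    print(name, index)
```

With this assumed index, a frequent moderate event (PCM-2) ranks above a rare severe one (PCM-3); a different weighting of likelihood and impact would change the order.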

PCM has the capability to identify the optimal remedial actions to prevent and mitigate cascading outages, [7]:

• To prevent the cascades, the software applies the remedial actions after an initiating event and at each cascading tier in order to completely prevent or decrease the spread of cascading outages.

• To mitigate the cascades, remedial actions are determined in order to alleviate voltage collapse after cascading has happened. Thus, the consequences of a cascading outage after all cascading tiers occur are mitigated.

The above scenarios to prevent and mitigate the cascades are utilized in the Optimal Mitigation Measures (OPM) application [26], which is part of the POM Suite and is fully integrated with PCM. Available remedial actions include: MW dispatch, MVAr dispatch, transformer tap change, phase-shifter adjustment, capacitor and reactor switching, emergency load curtailment, line switching in and out, and optimal capacitor placement (new sources of reactive power). These actions are used to alleviate voltage, thermal and voltage stability violations.

PCM allows the user to analyze the cascading outages as both steady-state and transient stability phenomena. Frequency issues and relay operation are included within the transient stability approach. PCM capabilities include:

• Analyzing the cascading outages from steady-state and transient stability perspectives.

• Predicting the cascading outages by quickly identifying initiating events and possible cascading chains, and accurately modeling protection relays and optimizing relay settings.

• Determining preventive actions to halt cascading, such as under-voltage load shedding, under-frequency load shedding, generator re-dispatch or using other available active and reactive sources [27].

• Applying effective islanding techniques, including under-frequency load shedding.

• Ranking the cascading outages.

• Visualizing initiating events, their spread, severity and control actions to prevent cascades, in order to improve situational awareness of a utility’s/ISO’s operators.

PCM has the capability to quickly identify and prevent potential cascading outages in near real-time, operations and planning environments.

Weblink: http://www.vrenergy.com/index.php/powersystemsoftware.html

4) TRELSS

TRELSS (Transmission Reliability Evaluation of Large Scale Systems) is a commercially available tool for reliability assessment of composite generation and transmission systems, developed by EPRI in cooperation with Southern Company Services [5]. Cascading failure analysis in TRELSS aims to capture the cascade path starting from an aggravated system condition and an initiating (triggering) event. The user can prepare a list of thousands of initiating events, each of which TRELSS evaluates separately. A set of threshold values is specified, such as the loading level at which a transmission line trips and the low-voltage threshold at which a load is dropped. The model simulates the cascading process as a sequence of quasi-steady-state system conditions caused by a sequence of tripping events. It is at present based upon an a priori assumption about the tripping sequence; e.g., it is assumed that, given both a heavily overloaded circuit and a load-bus voltage below a specified threshold, voltage-triggered tripping will occur before the overloaded line trips.
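The a priori tripping order can be made concrete with a small selection routine. This is only a sketch of the stated assumption (voltage-triggered load dropping acts before overload tripping). The function name `next_trip`, the rule of acting on the worst violation first, and the threshold values are hypothetical and are not taken from TRELSS.

```python
from typing import Dict, Optional, Tuple


def next_trip(line_loading: Dict[str, float], bus_voltage: Dict[str, float],
              line_trip_level: float, load_drop_voltage: float
              ) -> Optional[Tuple[str, str]]:
    """Pick the next tripping event, giving low-voltage load drops priority."""
    low_v = {bus: v for bus, v in bus_voltage.items() if v < load_drop_voltage}
    if low_v:
        # Assumption: the bus with the lowest voltage is acted on first.
        return ("drop_load", min(low_v, key=low_v.get))
    overloaded = {ln: x for ln, x in line_loading.items() if x > line_trip_level}
    if overloaded:
        # Assumption: the most heavily loaded line is tripped first.
        return ("trip_line", max(overloaded, key=overloaded.get))
    return None  # no threshold is violated, so the cascade stops


# Hypothetical state: one overloaded line and one depressed load-bus voltage.
print(next_trip({"L1": 1.40}, {"B1": 0.82}, line_trip_level=1.25,
                load_drop_voltage=0.85))
```

Each call corresponds to one quasi-steady-state step: the selected event is applied, the power flow is re-solved, and the routine is called again on the new state.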

A unique feature of TRELSS is the modeling of protection system actions to realistically simulate potential cascading failures. It is assumed that initiating events are triggered by the action of a set of breakers comprising a protection zone. Since several bulk-power transmission system components are protected by the same set of breakers, all of these components are taken out of service together. A set of components protected by a common set of breakers is termed a Protection and Control Group (PCG). When a PCG goes out of service due to the action of the breakers defining the PCG boundary, other components belonging to a different protection zone may also go out of service. In turn, these initial outages could cause severe overloads and voltage deviations in transmission facilities. This may trigger further tripping of other PCGs, and so on. Cascading outages can propagate through the interconnection, incurring significant loss of load and potentially leading to system collapse.

TRELSS includes a very fast decoupled power-flow algorithm that implements both partial matrix re-factorization and factor update algorithms to modify the system matrix during bus-type switching. An auxiliary solution in the Q-V iteration helps smooth the solution perturbations introduced by bus-type switching. These enhancements have resulted in extremely fast solution speed while enhancing the robustness of the solution algorithm. Within each cascading failure step, generating units are re-dispatched through one of the following methods: unit margin, generating unit participation factor, and full or fixed-loss economic generation dispatch. The linear programming module provides a mixed-integer solution and incorporates both continuous and discrete controls. Control actions include generator MW and MVAR re-dispatch, transformer tap and phase shift adjustment, capacitor and reactor switching, three classes of load curtailment and even relaxation of area interchange. The remedial actions algorithm is based upon computing the sensitivity of system constraints, such as overloads and voltage violations, with respect to system controls. The sensitivity computation is exact and utilizes the full Jacobian matrix. User-specified remedial actions, such as circuit switching, load transfer or load curtailment, can be selected to apply when contingencies or system problems occur, and both study and remedial action areas can be specified.

EPRI has recently been working on a new program called Transmission Contingency Analysis and Reliability Evaluation (TransCARE) to replace TRELSS. All of the algorithms, models and calculations in TRELSS are to be carried over to TransCARE without sacrificing the modeling and mathematical rigor of TRELSS. A main advantage of TransCARE is a feature that allows the automatic placement of breakers and PCGs. Breaker locations are determined automatically by the program and can also be modified to match actual breaker locations. PCGs are then identified by TRELSS automatically from the breaker locations using a network trace algorithm. The network trace identifies not only the components within a PCG but also other components and islands that may go out of service as a result of the action of breakers in the primary protection zone.
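The idea of tracing PCGs from breaker locations can be sketched as a graph search: buses and branches that are connected without an intervening breaker fall into the same group. The sketch below is an illustration under simplifying assumptions (breakers exist only at branch ends, and bus and branch names are distinct); the `Branch` record and `find_pcgs` function are hypothetical and do not reproduce the TRELSS or TransCARE algorithm.

```python
from collections import deque
from typing import Dict, FrozenSet, List, NamedTuple, Set


class Branch(NamedTuple):
    name: str
    from_bus: str
    to_bus: str
    breaker_at_from: bool
    breaker_at_to: bool


def find_pcgs(branches: List[Branch]) -> List[FrozenSet[str]]:
    """Group buses and branches that are not separated by any breaker."""
    adjacency: Dict[str, Set[str]] = {}
    for br in branches:
        adjacency.setdefault(br.name, set())
        for bus, has_breaker in ((br.from_bus, br.breaker_at_from),
                                 (br.to_bus, br.breaker_at_to)):
            adjacency.setdefault(bus, set())
            if not has_breaker:  # no breaker: branch and bus share a zone
                adjacency[br.name].add(bus)
                adjacency[bus].add(br.name)
    groups: List[FrozenSet[str]] = []
    seen: Set[str] = set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:  # breadth-first trace bounded by breakers
            node = queue.popleft()
            group.add(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(frozenset(group))
    return groups


# Hypothetical network: line L2 has no breaker at BUS3, and transformer T1
# has no breaker on its BUS3 side, so a fault on L2 also removes BUS3 and T1.
network = [
    Branch("L1", "BUS1", "BUS2", True, True),
    Branch("L2", "BUS2", "BUS3", True, False),
    Branch("T1", "BUS3", "BUS4", False, True),
]
for pcg in find_pcgs(network):
    print(sorted(pcg))
```

A fuller trace would continue from each group to find components that become islanded when the group is removed, as the text notes.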

B. Research-Grade Tools

This section provides a more detailed description of the five research-grade tools listed in TABLE I.

1) HIDDEN FAILURE

The HF (Hidden Failures) model is a research-grade tool developed by Chen and Thorp [28]. HF is based on an AC load flow representation, with primary focus on modeling of hidden failures, thermal overloads and generator re-dispatch. In a power system, a hidden failure is a permanent defect that would cause a relay or a relay system to react incorrectly and inappropriately to disturbances. Hidden failures are usually triggered by other events and do not occur frequently, but they may have disastrous consequences [29]. Hidden failures of the protection system are modeled by probabilistic approaches in HF. HF uses a fast simulation technique and a heuristic random search to identify critical relays that contribute to many possible cascades. Maintaining these relays is a cost-effective mitigation of cascading failures. The availability of protection data to support the simulation and the burden of processing it are issues.

2) MANCHESTER

The Manchester model [30, 31] is a research-grade tool which aims to represent a range of cascading failure interactions, including cascading and sympathetic tripping of transmission lines, heuristic representation of generator instability, under-frequency load shedding, post-contingency re-dispatch of active and reactive resources, and emergency load shedding to prevent a complete system blackout caused by a voltage collapse. In addition to the standard network data needed to run an AC power flow, the input data consists of probabilities of failure of the generation and transmission components as well as estimates of the probabilities of hidden failures in the protection system. Note that the probabilities of failure can be adjusted to take into account the effect of weather conditions. One of the distinctive features of the model is that it estimates the time required to restore the load following an outage. The user can also specify the interval of uncertainty and the degree of certainty of the Monte Carlo simulation. Monte Carlo trials are performed until the convergence criterion specified by these last two parameters is satisfied. The output consists mostly of an estimate of the Expected Energy Not Served (EENS) for the operating conditions considered. The Manchester model is used by Rios et al. [30] to evaluate expected blackout cost using Monte Carlo simulation with a 53-bus system, and by Kirschen et al. [31] to apply correlated sampling to develop a calibrated reference scale of system stress that relates system loading to blackout size on a large 1000-bus power system.
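A convergence criterion defined by an interval of uncertainty and a degree of certainty can be sketched as a sequential stopping rule: trials continue until the confidence-interval half-width at the chosen confidence level falls below a chosen fraction of the running mean. The sketch below is one plausible reading of that criterion, not the Manchester model's code. It assumes a normal approximation, and `toy_trial` is a hypothetical placeholder for a full cascading simulation.

```python
import math
import random
from statistics import NormalDist
from typing import Callable, Tuple


def estimate_eens(trial: Callable[[random.Random], float],
                  rel_interval: float, confidence: float, seed: int = 0,
                  min_trials: int = 100, max_trials: int = 1_000_000
                  ) -> Tuple[float, float, int]:
    """Run trials until half-width <= rel_interval * mean at the given confidence."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    rng = random.Random(seed)
    n, mean, m2 = 0, 0.0, 0.0  # Welford running mean and sum of squares
    half_width = math.inf
    while n < max_trials:
        x = trial(rng)
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_trials and mean > 0.0:
            half_width = z * math.sqrt(m2 / (n - 1) / n)
            if half_width <= rel_interval * mean:
                break
    return mean, half_width, n


def toy_trial(rng: random.Random) -> float:
    """Hypothetical trial: a 5% chance of an outage losing 0 to 200 MWh."""
    return rng.uniform(0.0, 200.0) if rng.random() < 0.05 else 0.0


eens, half_width, trials = estimate_eens(toy_trial, rel_interval=0.10,
                                         confidence=0.95)
print(f"EENS ~ {eens:.2f} MWh +/- {half_width:.2f} after {trials} trials")
```

Because blackouts are rare events, the number of trials needed grows quickly as the requested interval narrows, which is one motivation for the correlated sampling used in [31].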

3) OPA

The Oak Ridge-PSERC-Alaska (OPA) model [32] is a research-grade tool for studying the complex dynamics of an upgrading power system with cascading line outages. OPA represents cascading outages and line overloads with a DC load flow model. Starting from a solved base case, blackouts are initiated by random line outages. Whenever a line is outaged, the generation and load are re-dispatched using standard linear programming methods. The cost function is weighted to ensure that load shedding is avoided where possible. If any lines were limited during the optimization, then these lines are outaged with a fixed probability. The process of re-dispatch and testing for outages is iterated until there are no more outages. The total load shed is then the power lost in the blackout. The OPA model neglects many of the cascading processes in blackouts and the timing of events, but it does represent in a simplified way a dynamical process of cascading overloads and outages that is consistent with some basic network and operational constraints. The distinctive feature of the OPA simulation is that it accounts for the complex system dynamics of upgrade, so that self-organization of an evolving power system can be studied. Average load slowly increases, lines involved in blackouts are upgraded, and generation is increased to maintain margins and coordinate with the line increases. The simple representation of the cascading and upgrading processes is desirable both to study only the main interactions governing the complex dynamics and for pragmatic reasons of model tractability and simulation run time. The input data for OPA is a DC load flow description of the network, line flow limits, and parameters controlling the probabilistic tripping of lines, the average growth rate, and the upgrading of lines and generation. The output data describes a series of cascading blackouts as the power system gradually evolves, including the lines tripped and the load shed in the stages of each blackout.
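The re-dispatch step can be written as a small linear program. The formulation below is a sketch consistent with the description above rather than the exact OPA cost function. The symbols are assumptions introduced here: $g_i$ is the output of generator $i$, $d_j$ and $s_j$ are the demand and the load shed at load bus $j$, $P$ is the vector of net bus injections, $A$ is the DC load flow matrix mapping injections to line flows $F$, and $W \gg 1$ is a weight that makes load shedding a last resort.

```latex
\begin{aligned}
\min_{g,\,s}\quad & \sum_{i \in \mathcal{G}} g_i \;+\; W \sum_{j \in \mathcal{L}} s_j \\
\text{s.t.}\quad & \sum_{i \in \mathcal{G}} g_i \;=\; \sum_{j \in \mathcal{L}} \left( d_j - s_j \right), \\
& F = A\,P, \qquad P_n = g_n - \left( d_n - s_n \right), \\
& -F_k^{\max} \le F_k \le F_k^{\max} \quad \text{for every in-service line } k, \\
& 0 \le g_i \le g_i^{\max}, \qquad 0 \le s_j \le d_j .
\end{aligned}
```

Here $g_n = 0$ at buses without generation and $d_n = s_n = 0$ at buses without load. Lines whose flow constraint is binding at the optimum, $|F_k| = F_k^{\max}$, are the "limited" lines that are then outaged with a fixed probability, and $\sum_j s_j$ at the end of the cascade is the power lost in the blackout.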

4) PSA

The Power System Analyzer (PSA) suite of numerical tools [33, 34] was developed at Los Alamos National Laboratory to permit model building, analysis, and graphical display of electric power transmission networks. With respect to PSA, a model is defined as a geographic representation of an electric transmission network that can be used to compute both linear and nonlinear power-flow solutions that have been benchmarked against a filed base-case solution. The three primary components are the Model Editor (ME), the Analyzer, and the Banshee Tool Builder.

The Model Editor (ME) is used to import electric power transmission data and to build, display, and modify models. Data can be imported from (1) the Federal Energy Regulatory Commission (FERC) Form 715 power-flow cases submitted by utilities to FERC, (2) Power Technologies, Inc. (PTI) format, (3) PSA binary models previously saved from PSA, (4) Paradox database tables also previously written from PSA, or (5) Extensible Markup Language (.xml) files written by the IEISS infrastructure interdependence code of Los Alamos. Historically, both IEEE format and MapInfo data have also been read into PSA. The ME is also used to maintain and update data from year to year.

The Analyzer operates on models built by or imported using the ME. The Analyzer's dual purpose is to perform isolation and connectivity network analyses and to compute and evaluate power-flow related effects for both base-case and component-failure conditions. Many network analysis capabilities have been incorporated into PSA, including shortest-path computations, least number-of-cuts isolation, and determination of electrical islands. The Analyzer can solve both linearized (DC) and AC power flow equations. The Analyzer contains an interface that links it to the Transmission 2000 program developed by Commonwealth Associates Inc. (CAI). Therefore, PSA can solve systems of up to 64000 nodes, and simulations of the Western and Eastern US grids are usually "easy" to perform.

The Banshee Tool Builder is a utility for both the ME and the Analyzer that allows the user to construct C scripts. These scripts manipulate the data and the graphical display of the ME and Analyzer to perform a wide variety of applications associated with operating PSA and analyzing its results. The Banshee Tools allow scripts to be written, compiled, executed, modified, saved, and cataloged.

5) TAM

The TAM is a research-grade program developed by Texas A & M University [35]. It is part of a general model for reliability analysis developed by Singh and Patton [36], with a particular capability to differentiate various protection failure modes. Two major failure modes of the protection system, "failure to operate" and "undesired tripping", are the major causes of cascading outages. The former means that when a fault occurs in a power system, the protection system fails to clear the fault. The latter refers to either spontaneous operation in the absence of a fault or tripping for faults outside the protection zone. After the initial fault is cleared, power flow in the system changes due to the changed topology. This might lead to redistribution of load onto certain lines, which are then at risk of tripping subsequently. A more explicit model of a component paired with its protection system is established to include the two types of protection failure. A non-sequential Monte Carlo simulation approach is applied to simulate system behavior under cascading outages. Besides common reliability indices such as Loss of Load Probability (LOLP) and Bus Isolation Probability (BIP), one new index (EPL) is introduced to depict the severity of the impact of cascading outages. The reliability results are system-wide and represent the degree of vulnerability and load curtailment under cascading outages in a particular system. Monte Carlo simulation takes much longer to converge than other reliability assessment methods. However, it can handle sophisticated stochastic processes in a more realistic manner.
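A non-sequential Monte Carlo estimate that distinguishes the two protection failure modes can be sketched as follows. This is a deliberately simplified illustration, not the TAM model. It assumes that a "failure to operate" causes backup protection to remove all neighbouring lines, that an "undesired trip" removes a healthy neighbour of a faulted line with a fixed probability, and that load is lost whenever the remaining line capacity is below demand. All names, probabilities and the three-line test system are hypothetical.

```python
import random
from typing import Dict, List, Set


def sample_outages(neighbours: Dict[str, List[str]], p_fault: float,
                   p_fail_to_operate: float, p_undesired_trip: float,
                   rng: random.Random) -> Set[str]:
    """Draw one system state: faulted lines plus protection-induced outages."""
    faulted = [line for line in neighbours if rng.random() < p_fault]
    out = set(faulted)
    for line in faulted:
        if rng.random() < p_fail_to_operate:
            # Failure to operate: backup protection removes all neighbours.
            out.update(neighbours[line])
        else:
            # Fault cleared correctly, but a neighbour may trip undesirably.
            for other in neighbours[line]:
                if other not in out and rng.random() < p_undesired_trip:
                    out.add(other)
    return out


def estimate_lolp(capacity_mw: Dict[str, float],
                  neighbours: Dict[str, List[str]], demand_mw: float,
                  n_samples: int, p_fault: float, p_fail_to_operate: float,
                  p_undesired_trip: float, seed: int = 1) -> float:
    """Fraction of independently sampled states in which load cannot be met."""
    rng = random.Random(seed)
    losses = 0
    for _ in range(n_samples):
        out = sample_outages(neighbours, p_fault, p_fail_to_operate,
                             p_undesired_trip, rng)
        available = sum(c for ln, c in capacity_mw.items() if ln not in out)
        if available < demand_mw:
            losses += 1
    return losses / n_samples


# Hypothetical system: three 100 MW lines supplying a 150 MW load.
capacity = {"L1": 100.0, "L2": 100.0, "L3": 100.0}
adjacent = {"L1": ["L2"], "L2": ["L1", "L3"], "L3": ["L2"]}
print(estimate_lolp(capacity, adjacent, 150.0, 20_000, 0.01, 0.05, 0.02))
```

Each sample is drawn independently of the others, with no chronological simulation, which is what distinguishes the non-sequential approach from a sequential one.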

IV. LIMITATIONS AND GAPS IN EXISTING TOOLS

The N-1 criterion, which is common industry practice, may not be adequate to assess vulnerability to cascading failures, because multiple unrelated events may occur in a system and result in cascading failures. In order to further understand cascading failures, N-2 and even higher order (N-x) contingency events need to be considered. An analysis of major blackouts [37] has shown that in many (if not most) cases, a failure in the information infrastructure contributed to the cascade. Such failures are usually not taken into account in security assessment tools. When N-x contingencies are considered, there are two major issues: the massive number of cases and the massive amount of data. Because the sheer number of cases makes even a simplified computation impractical for all cases, solving the first issue requires smart contingency selection methods as well as high performance computing techniques and hardware. To solve the second issue, advanced visualization techniques are needed to provide real-time situational awareness and help operators to anticipate, recognize, and respond to emergencies in time. However, most current commercial tools use a tabular data format to represent system status, which provides very little decision support to operators.
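The growth in the number of cases is easy to quantify: the number of N-k contingencies among n elements is the binomial coefficient C(n, k). The short sketch below shows this for a hypothetical system of 1000 outage candidates and an illustrative solution time of 0.1 s per case. Both figures are assumptions chosen only to show the scale.

```python
import math


def n_minus_k_cases(n_elements: int, k: int) -> int:
    """Number of distinct simultaneous outages of exactly k elements."""
    return math.comb(n_elements, k)


def exhaustive_run_time_days(n_elements: int, k: int,
                             seconds_per_case: float) -> float:
    """Serial run time, in days, to evaluate every N-k case once."""
    return n_minus_k_cases(n_elements, k) * seconds_per_case / 86_400.0


for order in (1, 2, 3):
    cases = n_minus_k_cases(1000, order)
    days = exhaustive_run_time_days(1000, order, 0.1)
    print(f"N-{order}: {cases} cases, {days:.2f} days")
```

Even at this modest size, exhaustive N-3 screening takes months of serial computation, which is why contingency selection and parallel computing are needed.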

CIGRE publications [38-39] have emphasized the increasing need for adequate management of power interruptions including failures due to extreme cascading outage events.

One of the issues that utilities face when analyzing cascading outages from the steady-state perspective is the lack of an approach to accurately simulate the protective relays that trip lines automatically. Therefore, various utilities employ different line tripping thresholds during the analysis of cascading outages. Line tripping due to overloading could be initiated by overcurrent relays or by zone 3 or zone 2 distance relays. Work should be done to develop requirements for the corresponding input data for analysis and modeling, and to investigate what the minimal necessary set of additional data on relays and their set points is. There is work currently under way to develop and test line-tripping mechanisms that simulate protective relays.
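To illustrate why a single loading threshold is only a proxy for relay behavior, the sketch below checks a steady-state operating point against two simplified relay characteristics: an overcurrent pickup and a zone 3 mho circle. The per-unit quantities, the 0.6 pu reach at 75 degrees, and the function names are hypothetical. A real study would use the actual relay settings, which is exactly the data gap noted above.

```python
import cmath
import math


def apparent_impedance(v_mag_pu: float, p_pu: float, q_pu: float) -> complex:
    """Impedance seen by the relay: Z = V / I = |V|^2 / conj(S)."""
    s = complex(p_pu, q_pu)
    if s == 0:
        return complex(math.inf, 0.0)  # unloaded line: far outside any zone
    return v_mag_pu ** 2 / s.conjugate()


def mho_operates(z: complex, reach_pu: float, angle_deg: float) -> bool:
    """Mho characteristic: circle through the origin with diameter 'reach'."""
    centre = cmath.rect(reach_pu / 2.0, math.radians(angle_deg))
    return abs(z - centre) <= reach_pu / 2.0


def overcurrent_operates(v_mag_pu: float, p_pu: float, q_pu: float,
                         pickup_pu: float) -> bool:
    """Overcurrent element: |I| = |S| / |V| compared with the pickup setting."""
    return math.hypot(p_pu, q_pu) / v_mag_pu > pickup_pu


# Hypothetical normal and stressed operating points for the same line.
for label, v, p, q in (("normal", 1.00, 1.0, 0.2), ("stressed", 0.85, 2.2, 1.2)):
    z = apparent_impedance(v, p, q)
    print(label, "zone 3:", mho_operates(z, 0.6, 75.0),
          "overcurrent:", overcurrent_operates(v, p, q, 3.0))
```

In the stressed case the combination of high flow and depressed voltage moves the apparent impedance inside the zone 3 circle even though the overcurrent element, with this hypothetical pickup, does not operate.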

V. CONCLUSIONS

This paper describes the state of the art in cascading failure modeling tools, documenting the view of experts representing utilities, universities and consulting companies. The work reported in this paper is intended to constitute a valid source of information and references about presently available tools that deal with prediction and evaluation of cascading failure events.

We have recognized the limitations and gaps in some of the existing tools that are being used to identify and evaluate the risk of cascading failure events. The existing tools should be used with care while they are being improved or new tools are being developed.

Data collection on cascading failure events in interconnected power systems has always received considerable attention, and this trend should continue in the future.

Future work should be focused on better modeling and evaluation of cascading outages from the steady-state and transient stability perspectives.

A number of utilities are pioneering the application of risk evaluation to cascading failure events. Reported experience is based primarily on compliance studies performed to meet NERC planning standards.

It is noted that existing tools deal, in general, with cascading events in the steady-state domain, while very limited use has been seen in the dynamic domain.

It should also be emphasized that the presented work will be useful to those developing new tools for risk prediction and prevention of cascading failure events.

Finally, the presented work can be extended in the future to perform benchmarking of the described tools using both test systems and actual interconnected systems.

Given the scale of the effort required and the enormity of the challenges ahead, collaboration among policy makers, utilities, vendors and research organizations is essential to solve this challenging industry problem.

VI. ACKNOWLEDGMENT

The authors would like to thank Marian Anghel from Los Alamos National Security, LLC for his valuable suggestions and comments.

VII. REFERENCES

[1] IEEE PES CAMS Task Force on Understanding, Prediction, Mitigation and Restoration of Cascading Failures, "Initial review of methods for cascading failure analysis in electric power transmission systems," IEEE Power and Energy Society General Meeting, Pittsburgh, PA, USA, July 2008.

[2] IEEE PES CAMS Task Force on Understanding, Prediction, Mitigation and Restoration of Cascading Failures, "Vulnerability Assessment for Cascading Failures in Electric Power Systems", IEEE Power and Energy Society Power Systems Conference and Exposition 2009, March 2009, Seattle, WA

[3] IEEE PES CAMS Task Force on Understanding, Prediction, Mitigation and Restoration of Cascading Failures, “Risk Assessment Methodologies for Cascading Outages”, Submitted to IEEE Trans on Power Systems, 2010.

[4] NERC Planning Standards TPL-001 through TPL-004, North American Electric Reliability Corporation, February 2005

[5] M.K. Koenig, P. Duggan, J. Wong, M.Y. Vaiman, M.M. Vaiman, M. Povolotskiy, "Prevention of Cascading Outages in Con Edison’s Network", IEEE T & D Conference, April 2010.

[6] M. Papic, M.Y. Vaiman, M.M. Vaiman, "Determining a Secure Region of Operation for Idaho Power Company", IEEE Power Engineering Society GM, San Francisco, June 2005

[7] NERC NPCC Document A-10 Classification of Bulk Power System Elements, 2007.

[8] D. Chatterjee, J. Webb, Q. Gao, M.Y. Vaiman, M.M. Vaiman, M. Povolotskiy, "N-1-1 AC Contingency Analysis as a Part of NERC Compliance Studies at Midwest ISO", IEEE T & D Conference, April 2010.

[9] Transmission reliability evaluation for large-scale systems (TRELSS): version 6.0 User's manual, EPRI, Palo Alto, CA: 2000. 1001035

[10] R.C. Hardiman, M.T. Kumbale, Y.V. Makarov, "An Advanced Tool for Analyzing Multiple Cascading Failures", Eighth International Conference on Probability Methods Applied to Power Systems, Ames Iowa, September 2004.

[11] A. Berizzi and M. Sforna, “Dynamic Security Issues in the Italian Deregulated Power System”, IEEE Power Engineering Society General Meeting, 2006.

[12] National Grid, GB Security and Quality of Supply Standard, issue 1, September 2004.

[13] CIGRE Working Group C 1.17, Planning to Manage Power Interruption events, CIGRE, Paris, July 2010.

[14] WECC RAS Design http://www.wecc.biz/library/Library/Remedial%20Action%20Schemes/RAS_Guide_9_clean2_12-7-06.pdf .

[15] IEEE Task Force Report, Blackout Experiences and Lessons, Best Practices for System Dynamic Performance, and the Role of New Technologies, Final Report, IEEE, May 2007.

[16] J. D. McCalley, W. Fu, Reliability of Special Protection Systems, IEEE Transactions on Power Systems, Vol. 14, No. 4, 1999, pp 1400 to 1406.

[17] Carson Taylor Seminars, State-of-the-Art in SPS: BPA and WECC Experience, April 1999.

[18] J.-P. Paul and K.R.W. Bell, A Flexible and Comprehensive Approach to the Assessment of Large-Scale Power System Security Under Uncertainty, International Journal of Electrical Power and Energy Systems, vol. 26, pp. 265-272, 2004.

[19] G. Blanchon, K. Boukir, S. Fliscounakis, "Active-Reactive OPF using an Interior Point Method. Application to the network management in a deregulated environment", 13th CEPSI, Manila, Oct. 23-27, 2000.

[20] T. Van Cutsem, Y. Jacquemart, J.N. Marquet and P. Pruvot, "A Comprehensive Analysis of Mid-Term Voltage Stability", IEEE Trans. on Power Systems, vol. 10, 1995, pp. 1173-1182.

[21] S. Henry, E. Breda-Seyes, H. Lefebvre, V. Sermanson, M. Bena, "Probabilistic Study of the Collapse Modes of an Area of the French Network", 9th PMAS Conference, Stockholm, Sweden, June 2006.

[22] Transmission 2000 - http://www.cai-engr.com/?id=content/t2k/ca.html

[23] Stephan S. Miller, P.E., Extending Traditional Planning Methods to Evaluate the Potential for cascading Failures in Electric Power Grids, 08GM1365, Panel paper, 2008 PES General Meeting. (Transmission 2000).

[24] POM version 5 Manual, November 2009, http://www.vrenergy.com/

[25] Bhatt, N.; Sarawgi, S.; O'Keefe, R.; Duggan, P.; Koenig, M.; Leschuk, M.; Lee, S.; Sun, K.; Kolluri, V.; Mandal, S.; Peterson, M.; Brotzman, D.; Hedden, S.; Litvinov, E.; Maslennikov, S.; Luo, X.; Uzunovic, E.; Fardanesh, B.; Hopkins, L.; Mander, A.; Carman, K.; Vaiman, M.Y.; Vaiman, M.M.; Povolotskiy, M., "Assessing Vulnerability to Cascading Outages", PSCE 2009, 15-18 March 2009, pp.1 - 9, Digital Object Identifier 10.1109/PSCE.2009.4840032.

[26] E. Bajrektarevic, S. W. Kang, et. al., “Identifying Optimal Remedial Actions for Mitigating Violations and Increasing Available Transfer Capability in Planning and Operations Environments”, 2006 CIGRE Session, Paper 38-105, Paris, 2006.

[27] "Choice of Contingency Arming Schemes Actions Using Analytical Approaches", National Science Foundation Grant No.111-9360318, V&R Co., Energy System Research, 1994.

[28] A.G. Phadke, J.S. Thorp, "Expose Hidden Failures to Prevent Cascading Outages", IEEE Computer Application in Power, vol.9, pp. 20-23, 1996

[29] J. Chen, J.S. Thorp, I Dobson," Cascading Dynamics and Mitigation Assessment in Power System Disturbances via a Hidden Failure Model", International Journal of Electrical Power and Energy Systems, vol. 27, no.4, May 2005, pp. 318-326.

[30] M.A. Rios, D.S. Kirschen, D. Jawayeera, D.P. Nedic, R.N. Allan, "Value Of Security: Modeling Time-Dependent Phenomena and Weather Conditions", IEEE Transactions on Power Systems, vol. 17, 543-548, 2002

[31] D.S. Kirschen, D. Jawayeera, D.P. Nedic, R.N. Allan," A Probabilistic Indicator of System Stress", IEEE Transactions on Power Systems, vol. 19, no. 3, 2004, pp. 1650-1657.

[32] B.A. Carreras, V.E. Lynch, I. Dobson, D.E. Newman, "Complex Dynamics of Blackouts in Power Transmission Systems" Chaos, vol. 14, no. 3, September 2004, pp. 643-652.

[33] The Power System Analyzer (PSA) Suite of Numerical Tools, LA-UR-03-8268,an internal LANL report.

[34] M. Anghel, K. A. Werley, A. E. Motter, "Stochastic Model for Power Grid Dynamics", HICSS ’07: Proceedings of the 40th Annual Hawaii International Conference on System Sciences, IEEE Computer Society, Washington, DC, USA, 2007, p. 113.

[35] X. Yu, C. Singh, "A Practical Approach for Integrated Power System Vulnerability Analysis with Protection Failures", IEEE Trans. Power Systems, vol. 19, no. 4, Nov. 2004, pp. 1811-1820.

[36] C. Singh, and A. D. Patton, Models and concepts for power system reliability evaluation including protection-system failures, Int. J. Elect. Power and Energy Syst. Vol. 2, No. 4, pp. 161-168, Oct. 1980

[37] D S Kirschen, F Bouffard, “Keep the Lights On and the Information Flowing”, IEEE Power and Energy magazine, Vol. 7, No. 1, January/February 2009, pp. 55-60.

[38] CIGRE Working Group C1.2, Maintenance of Acceptable Reliability in an Uncertain Environment, Technical Brochure 334, CIGRE, Paris, December 2007.

[39] CIGRE Task Force C2.02.24, Defense plan against extreme contingencies, Technical Brochure 316, CIGRE, Paris, April 2007.

VIII. BIOGRAPHIES

Milorad Papic (M’87, SM’05) is with Idaho Power Co., Boise, ID, USA (e-mail: [email protected])

Keith Bell (M’09) is with University of Strathclyde, UK (e-mail: [email protected])

Yousu Chen (SM'10) is with PNNL, WA, USA (e-mail: [email protected])

Ian Dobson (F’06) is with University of Wisconsin-Madison, USA (e-mail: [email protected])

Louis Fonte (M’90) is with CAISO, USA (e-mail: [email protected])

Enamul Haq (M’82, SM’93) is with CAISO, USA (e-mail: [email protected])

Paul Hines (S'96, M’07) is with University of Vermont, USA (e-mail: [email protected])

Daniel Kirschen (F’) is with University of Manchester, UK (e-mail: [email protected])

Xiaochuan Luo (M’00) is with ISO NE, Holyoke, MA, USA (e-mail: [email protected])

Stephen S. Miller (M'76, SM) is with Commonwealth Associates, Inc., Jackson, MI, USA (e-mail: [email protected])

Nader Samaan (S’00, M’04) is with PNNL, WA , USA (e-mail: [email protected])

Marianna Vaiman (M'97) is with V&R Energy Systems Research, Inc, Los Angeles, CA, USA (e-mail: [email protected])

Matthew Varghese (M’98) is with CAISO, USA (e-mail: [email protected])

Pei Zhang (SM'05) is with EPRI, Palo Alto, CA, USA (e-mail: [email protected])