Rule Acquisition For Dynamic Engineering Domains


Michael R. Hieb, Barry G. Silverman and Toufic M. Mezher*

ABSTRACT-The focus of this research is to enable discovery of control knowledge in a complex, real-time environment. We present a methodology for automated discovery of rules in a dynamic engineering domain. The addition of a discovery system to an expert system in a reactive environment offers an improvement in performance by enabling the expert system to learn new knowledge and improve its existing knowledge base. In addition, discovery techniques can aid operators confronted by unfamiliar and complex situations, and assist autonomous machines to find problem-solving rules for unanticipated situations. The goal of this system is unsupervised learning of a rule base by observing how well the performance system controls the environment. This performance system explores and experiments with the environment when its present rules are inadequate. Through this experimentation, the learning component of the system predicts the proper setting of a set of control variables. To illustrate the methodology, we describe an implementation called the HUBBLE Discovery System (HDS), built on a NASA software testbed. This system performs real-time operation, diagnosis, and repair of the Electrical Power Subsystem (EPS) of a software simulator of the HUBBLE Space Telescope satellite. The HDS successfully discovered control rules with both complete and incomplete domain theories. The performance of the HDS learning/problem solving system is compared to a control case of a blackboard system using hand-crafted rules.

Introduction

Research on automated discovery focuses on the acquisition of useful knowledge in the absence of supervision. This knowledge can improve system performance and response to unanticipated events. Consider well-structured engineering systems, such as process control systems for power plants or spacecraft. These systems are richly documented, and their individual components can easily be modeled. They are complex in their scale, and many relationships exist throughout time. Because of this, it is difficult to manually discover a complete rule base.

Yet discovery techniques can find problem-solving rules for unanticipated situations. Such a system can aid operators who are confronted by unfamiliar and complex situations. It also can assist autonomous machines to solve problems for which their present knowledge is inadequate.

We present a methodology for automated discovery of engineering knowledge base rules in a complex engineering domain. To illustrate the methodology, we describe an implementation called the HUBBLE Discovery System (HDS), built on a software testbed developed for NASA. This work is an extension of the research described in Silverman et al. [7, 8] and Hieb [4], which focus on learning by experimentation. The main results to date demonstrate the HUBBLE Discovery System's ability to generate a new knowledge base capable of controlling a simplified real-time, dynamic process.

*M. Hieb can be reached at the Center for Artificial Intelligence, Department of Computer Science, George Mason University, Fairfax, VA 22030. Tel. 703-993-1535. Email: hieb@aic.gmu.edu. B. Silverman and T. Mezher can be reached at the Institute for Artificial Intelligence, School of Engineering and Applied Science, George Washington University, Washington, D.C. 20052. Tel. 202-676-5110.

Research Questions

The focus of this research is to enable automatic discovery of useful knowledge in a complex, real-time environment. One of the goals of machine learning is to learn how to build autonomous systems that can respond to changes in the environment for which they were not specifically programmed. This is a long-term aim of the research presented here. Present knowledge-based systems, such as expert systems, do not respond well when they are confronted with unfamiliar situations. Another goal of this research is to craft a method to automate knowledge acquisition. The system presented here investigates how to acquire new factual knowledge from a dynamic environment and represent it in a manner conducive to problem-solving. Many knowledge-based systems that acquire new factual knowledge do so by some form of inductive generalization. They operate independently of a goal-driven problem solver and do not interact with the external world. The purpose of the method pursued here is to explore ways to learn from real-world dynamic processes to enhance an expert system's rule base.

HEURISTICS, The Journal of Knowledge Engineering, Special Issue

EPS Domain

A very demanding and dynamic domain was utilized for investigation. This is a real-time environment where a process (in this case, a power system) must be controlled to function properly. The process is represented by a software simulation so that it can be more amenable to study and manipulation. The domain chosen involves real-time operation, diagnosis, and repair of an Electrical Power Subsystem (EPS) of the HUBBLE Space Telescope satellite. The full EPS would take about 2000 rules to command and control. The entire EPS was scaled down by approximately one fourth and reduced in complexity to assist the development of the HDS. Figure 1 shows the EPS. This reduced simulator obviously does not need nearly as many rules to command and control, but it still captures the nature of the actual power system. On the other hand, to properly stress the discovery system, several EPS performance and anomaly situations have been added to the simulator that represent actual operating conditions.


The main objective of any electrical power subsystem is to provide its users with a steady supply of electrical power. To fulfill this objective, the control system must monitor the simulator at all times. The simulator is capable of self-preservation in emergencies, but it is not capable of maintaining optimum productivity without outside support. If not controlled, the power production of the EPS will eventually fail, leaving its users unsupported.

The following components are represented in the model (see Figure 1):

Solar arrays. There are two solar array panels in the simulator, each with ten solar cells. Power production takes place in the solar arrays. For optimum power production, the arrays are adjusted in small increments to maximize their exposure to the sun. Orientation and cell errors are randomly generated within certain limits and probabilities.

The network. The network is a set of power lines equipped with switches and various sensors. It is divided into two areas: Network 1 is the set of power lines from the solar arrays to the battery, and Network 2 is the set of power lines from the battery to the bus. These networks distribute and direct the power generated by the solar arrays through the system.

Figure 1. Space Telescope EPS Diagram

Volume 5, Number 4, Winter 1992

In the entire network there are six switches for rerouting current through the system. As useful and as necessary as they may be, switches are also the cause of serious malfunctions within actual EPS systems. In this simulation, switches generate random errors.

Sensors measure the current and voltage on the network. In the entire simulator there are four ammeters and a voltmeter. Their locations are an important factor in the design of rules which detect EPS errors. Network losses are disregarded in all simulations.

The battery. The battery stores the excess electrical power generated by the solar arrays during the day and then releases it in response to nighttime power requirements. Battery charge level, battery voltage stability, and reconditioning problems are simulated.

The bus. The bus is the embodiment of all users of electrical power within the system. In the simulator, the bus power requirements can be adjusted depending on power production or the system mission schedule. In these experiments, the minimum requirement for the bus power was fixed at 300 watts.

Time. The simulator runs on a clock of its own; minutes differ in length depending on machine work load. The speed of the clock will vary with the speed of the machine on which it runs. A pass, or simulated earth orbit, is always 90 minutes, with 60 minutes of it spent in sunlight. This independent clock allows us to view different times and behaviors faster than would have been possible otherwise. For the experiments, eight orbits of data were taken in approximately one and one half hours.
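The orbital clock described above can be sketched as a simple modular function. This is an illustrative rendering, not the HDS's actual code; the function name and the "day"/"night" labels are assumptions drawn from the mode discussion later in the paper.

```python
ORBIT_MINUTES = 90      # one pass (simulated earth orbit)
SUNLIGHT_MINUTES = 60   # portion of each orbit spent in sunlight

def mode_at(sim_minute):
    """Return the EPS mode ('day' or 'night') for a simulated minute.

    Each 90-minute orbit begins with 60 minutes of sunlight (day mode,
    solar power) followed by 30 minutes of eclipse (night mode, battery).
    """
    return "day" if sim_minute % ORBIT_MINUTES < SUNLIGHT_MINUTES else "night"
```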

This is a difficult environment for a discovery system to operate in. Errors are randomly generated and must be attended to and repaired. Thus there are several excellent criteria available for measuring the performance of the learning system, including maximizing the power produced, minimizing the number of power outages, maximizing the number of errors fixed, and improving the quality of the rules generated.

Hubble Discovery System Methodology

The methodology given here follows that of Silverman et al. [7]. The goal of the system is unsupervised learning of a rule base by having a Critic and Learner observe how well a Performer controls the Environment (simulator).


The Performer explores and experiments with the Environment. A Teacher, or ideal system, does not exist to help the Learner in unsupervised learning. Through experimentation, the Learner makes a formal prediction about the proper setting that a set of control variables should assume.

During experimentation, a Critic exists that notices the result of making a control action and assigns blame and rewards accordingly. A Learner stores the rules that the Critic recommends for later use.

In particular, this theory proceeds according to the following steps:

1) Given starting knowledge, monitor sensed parameters (collect parameter data).

2) Notice patterns (screen the data for abnormalities and select parameters of interest).

3) If possible, generate and experiment with theories (create rule sets and test them).

4) Criticize and evaluate the results (assign credit).

5) Repeat from step 1 exhaustively.
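The five steps above amount to a closed monitor/notice/experiment/criticize loop. A minimal sketch, under the assumption that the callbacks are supplied by the Performer and Critic; all names here are illustrative, not taken from the HDS implementation.

```python
def discovery_loop(sense, screen, propose, act, rule_base, max_cycles=10):
    """Monitor, notice patterns, experiment, and assign credit, repeatedly."""
    for _ in range(max_cycles):
        state = sense()                    # 1) collect parameter data
        abnormal = screen(state)           # 2) screen for abnormalities
        if not abnormal:
            continue                       # nothing to repair this cycle
        rule = propose(abnormal, state)    # 3) hypothesize a repair rule
        act(rule)                          # 3) experiment: try the action
        if not screen(sense()):            # 4) critic: did the problem clear?
            rule_base.append(rule)         #    reward: keep the rule
    return rule_base                       # 5) the loop repeats exhaustively
```

A toy usage would wire `sense`/`screen`/`propose`/`act` to a simulated environment, e.g. one with a single mispositioned switch that the proposed rule resets.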

The specific design of the HDS involves a Performer which constantly monitors, detects errors, diagnoses faults, and attempts repairs. If the Performer lacks repair steps, or the repairs are not successful, it asks the Discovery Subsystem to learn correct repair rules.

The Performer system has three agents utilizing two knowledge bases. Figure 2 shows the general architecture of the Performer system. The first agent is the Screener, which monitors environmental parameters to see if any are out of bounds or have abnormal readings. It uses a database called a goal-list for this screening, which contains upper and lower bounds for parameters according to the mode of the process system. A mode is a characteristic behavior of a system that persists for some period. A mode is like a template placed over the system parameters so that they are restricted to certain subsets of their ranges. In the EPS domain, two example modes are day and night. During the day, the system operates via solar power. During the night, it operates by a battery.
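The goal-list can be pictured as mode-indexed bounds, and the Screener as a bounds check over the current readings. This is a sketch, not the HDS data structure; the sample bounds are illustrative values in the spirit of the goal lists of Figure 5.

```python
# Hypothetical goal-list fragment: {mode: {parameter: (lower, upper)}}.
GOAL_LIST = {
    "day":   {"Ammeter1": (105, 155), "Bus-Power": (300, 1010)},
    "night": {"Ammeter1": (0, 0),     "Bus-Power": (300, 1010)},
}

def screen(readings, mode, goal_list=GOAL_LIST):
    """Return the parameters whose readings fall outside the bounds for the mode."""
    bounds = goal_list[mode]
    return [p for p, v in readings.items()
            if p in bounds and not (bounds[p][0] <= v <= bounds[p][1])]
```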

If the Screener detects a problem, then the Checker, the second agent, is given the task of processing this problem correctly. If a valid repair rule exists, it will send the problem to the Critic with the rule it found. If no valid repair information exists, then it calls the Discovery Subsystem, specifying whether it is an old or new problem.


Figure 2. Performer System Architecture. (The diagram shows telemetry and commands linking an object-oriented simulator (spacecraft) to the Performer's agents via a blackboard; the Critic with its repair history; the Parameter Selector (Planner) and Strategy Selector (Planner2) with specialize, generalize, and randomize strategies; a rules generator; and a meta-rule knowledge base.)

The Discovery Subsystem will then propose a plausible control action to try. This action is implemented, and its effect is evaluated by a Critic. Rules are either added to or retracted from the knowledge base of the Performer based upon the environmental feedback received.

The third agent is the Critic. This agent must either fire a rule to repair the EPS (if the repair knowledge is in its rule-base) or must send a control action proposed by the Discovery Subsystem. In either case, it will monitor the result to see if the problem is repaired. If the problem disappears, then the Critic hypothesizes that the rule is a valid one and stores it in its rule-base. If the problem remains, it either removes the old rule from the knowledge base or does not add the proposed rule.
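The Critic's accept/retract behavior can be sketched as a small credit-assignment routine. The function and argument names are hypothetical, not from the HDS.

```python
def criticize(rule, fire, problem_persists, rule_base):
    """Fire a repair rule, observe the result, and assign credit or blame.

    If the problem disappears, the rule is hypothesized to be valid and
    stored; if it remains, a previously stored rule is retracted, and a
    newly proposed rule is simply not added.
    """
    fire(rule)
    if not problem_persists():
        if rule not in rule_base:
            rule_base.append(rule)   # reward: hypothesize the rule is valid
        return True
    if rule in rule_base:
        rule_base.remove(rule)       # blame: retract the failed rule
    return False
```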

Design of the Discovery Subsystem

The Discovery Subsystem, or Proposer, consists of three agents plus a set of procedures called the Meta system. This system is shown in Figure 3. The system utilizes an object hierarchy to model the objects dealt with in the system. The Meta system contains heuristics for pursuing different convergence strategies, depending upon the history of previous experimentation.

The input to the Proposer is a set of abnormal parameters, a world state (a listing of selected parameter values), and the status of the problem (whether it is old or new). The abnormal parameters detected are described in qualitative terms, as in Forbus' Qualitative Process Theory [3], rather than by specifying exact numbers or tolerances. Thus the descriptions utilized are of quantities, which are world state variables that can take on numeric values but are reasoned about in the qualitative model in purely qualitative terms.
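The qualitative abstraction can be sketched as a thresholding step over the goal-list bounds. This is an assumption about the representation, in the spirit of qualitative reasoning, rather than the HDS's actual encoding; the term names are illustrative.

```python
def qualitative(value, lower, upper):
    """Abstract a numeric reading into a qualitative term relative to its bounds."""
    if value < lower:
        return "low"
    if value > upper:
        return "high"
    return "normal"
```

Downstream reasoning then matches on the terms ("low", "normal", "high") instead of exact numbers or tolerances.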

The first agent, the Parameter Selector (or Planner), is only activated if the problem is new. Its job is to suggest a set of sub-goals, consisting of parameters associated with or related to the abnormal parameters, to the next agent, the Strategy Selector (or Planner2). To determine the sub-goals, the Parameter Selector utilizes an object hierarchy and retrieves the relevant parameters to utilize


Figure 3. Discovery Subsystem Architecture. (Input: abnormal parameters, world state, and status (old/new). The Planner recommends parameters for Planner2 to operate on; a Problem Translator translates parameters into the discovery system's data structures; Planner2 recommends problem-solving strategies using the meta functions; the meta functions operate on parameters to create meta-rules that effect the control actions; and the Control-Action Proposer outputs a control action.)

Figure 4. Discovery Subsystem Object Hierarchy. (Discovery objects carry a name, parameters, a convergence strategy, a control action, and related parameters. Parameter subclasses are symbolic (with a symbol set and convergence strategy) or numeric (with upper and lower bounds and a convergence strategy). Example instances include Ammeter2, Ammeter3, Switch 3A, and Switch 3B.)

in the next step. A schematic depiction of the parameter hierarchy utilized is shown in Figure 4.

The second agent, the Strategy Selector (or Planner2), will suggest heuristics (i.e., meta-rules in the meta-rule base) for proposing new rules based upon the sub-goals given by the Parameter Selector agent. Basically, it will find useful meta-rules and fire them to hypothesize new parameter value settings for the current sub-goal, by proposing new control rules to either reset or adjust the settings that the previous control rule implemented.

Sometimes the meta-rules may not be sufficient to solve the current problem, and new meta-rules must be created. There are three planning strategies used by the Strategy Selector to accomplish this: same-line-of-investigation, similar-line-of-investigation, and new-line-of-investigation. Thus the system can not only create new problem-solving rules, but can also create and modify its own control knowledge. The same-line-of-investigation strategy creates new meta-rules by changing the settings of some of the parameters in the rules. New meta-rules will be put back in the meta-rule base and will suggest new ways of proposing new problem-solving rules. If this planning strategy fails, similar-line-of-investigation will be employed, which uses generalization and specialization to modify existing meta-rules. If these two strategies fail, then the planner resorts to its last method, new-line-of-investigation, which is simply a random selection.
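The escalation through the three strategies can be sketched as follows. This is a toy rendering: the strategy names come from the paper, but the mutation operators are placeholders (generalization is shown only as dropping the last condition; the HDS's real operators also specialize).

```python
import random

def select_strategy(failed_strategies):
    """Escalate: same line, then a similar line, then a new (random) line."""
    order = ["same-line-of-investigation",
             "similar-line-of-investigation",
             "new-line-of-investigation"]
    for strategy in order:
        if strategy not in failed_strategies:
            return strategy
    return order[-1]  # random search is the fallback of last resort

def mutate_meta_rule(meta_rule, strategy, rng=random):
    """Create a new meta-rule from an old one under the chosen strategy."""
    settings = dict(meta_rule)
    if strategy == "same-line-of-investigation":
        # change the setting of one parameter in the rule
        param = next(iter(settings))
        settings[param] = settings[param] + 1
    elif strategy == "similar-line-of-investigation":
        # generalize: drop the most specific condition (toy stand-in)
        settings.pop(next(reversed(settings)), None)
    else:
        # new line of investigation: random selection of a setting
        param = rng.choice(list(settings))
        settings[param] = rng.randint(0, 3)
    return settings
```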

The last agent is the Control-Action Proposer, which takes the current meta-rule and creates a new rule which effects a control parameter. The Control-Action Proposer then sends this rule to the Performer system to be tested.

Experiments

Three cases were investigated with the HDS:

1) HDS operation with perfect sensor information.

2) HDS operation with imperfect sensor information.

3) HDS operation using imperfect sensor information, without storing the generated rules.

In all cases, the EPS was run in a random error mode where any of the six switches could fail. The consequences of the failure ranged from mild (a failure in switch 1 at night had no effect) to severe (a failure in switch 4 always cut off power to the bus completely). The repair action consisted of resetting the switch to its proper position. Only one error was generated at a time, which meant that the present error had to be fixed prior to another error being generated.
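The one-error-at-a-time protocol can be sketched as an injection routine over the simulator state. All names here are illustrative; the switch list mirrors the six switches described for the EPS.

```python
import random

# The six switches represented in the simulator (names illustrative).
SWITCHES = ["Switch1", "Switch2", "Switch3A", "Switch3B", "Switch4", "BatterySwitch"]

def inject_error(env, rng):
    """Fail one randomly chosen switch, but only if no error is pending.

    Mirrors the experimental setup: only one error exists at a time, so the
    present error must be repaired before another is generated.
    """
    if env.get("pending_error") is not None:
        return env["pending_error"]        # previous error not yet repaired
    failed = rng.choice(SWITCHES)
    env[failed] = "wrong-position"         # knock the switch out of position
    env["pending_error"] = failed
    return failed

def repair(env):
    """Reset the failed switch to its proper position (the repair action)."""
    failed = env.pop("pending_error", None)
    if failed is not None:
        env[failed] = "proper-position"
    return failed
```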

The first case consisted of running the HDS with perfect sensor information. The HDS sampled all the parameters and was able to ascertain switch position information, so that it could directly deduce which switch needed to be reset. The actual goal-list used is given in Figure 5.

Complete Domain Theory

Parameter        Day (upper/lower)   Night (upper/lower)
Ammeter1         155 / 105           0 / 0
Ammeter2         155 / 105           0 / 0
Ammeter3         31 / 22             0 / 0
Ammeter4         31 / 22             31 / 9
Bus-Power        1010 / 300          1010 / 300
Switch1          2 / 2               1 / 1
Switch2          2 / 2               1 / 1
Switch3A         2 / 2               1 / 1
Switch3B         1 / 1               2 / 2
Switch4          2 / 2               2 / 2
Battery Switch   2 / 2               2 / 2

Incomplete Domain Theory

Parameter        Day (upper/lower)   Night (upper/lower)
Ammeter1         155 / 105           0 / 0
Ammeter2         155 / 105           0 / 0
Ammeter3         31 / 22             0 / 0
Ammeter4         31 / 22             31 / 9
Bus-Power        1010 / 300          1010 / 300

Figure 5. HUBBLE Discovery System Goal Lists

In the second case, the HDS only had imperfect sensor information, with the ammeters and bus power as the sensed parameters. Since the errors were occurring in the switches, the system had to infer these errors from the sensed parameters. The goal-list used is also given in Figure 5.

The third case is just like the second, with one exception: the Critic lacks memory. It must rediscover good rules on each run. This was to verify that the rules created by the HDS produced an improvement in EPS operation over a random selection of rules.

Two baseline runs were controls on the experiment. First, the simulator was run without any errors to determine the maximum accumulated power. Second, a blackboard system (the Real-Time Monitor Blackboard System) utilizing 11 different rule-bases was run to compare the performance of hand-crafted rules against the performance of the HDS's discovered rules. These rules were obtained from a systematic failure analysis and then optimized.

Cumulative power production over time is the primary measure of the controlling system's performance. Any delay in fixing errors due to incorrect, inconsistent, or incomplete rules leads to a drop in power production. Another measure is the number of power outages the system experiences. A power outage is a period when the power is insufficient to supply the bus requirements. As the system learns more and better rules, it experiences fewer outages. It repairs errors using the rules that it learns.

Two SUN workstations were utilized in all experiments. The EPS simulator ran on one workstation and sent telemetry packets to the HDS operating on another SUN workstation. The HDS then sent command-loads (lists of commands) to operate the simulator. This ensured that the controlling system was operating independently of the simulation.

Rule Discovery with the HUBBLE Discovery System

Initially, the simulator was run with no errors introduced to determine the maximum power production (run R1). This produced a set of performance data giving the accumulated power production over time (see the first row of Table 1 and Figure 6).

The second run (run R2) was made with the Real-Time Monitor Blackboard System to determine the performance of the hand-crafted rules. Random switch errors were set. As the second row of Table 1 shows, the manually constructed rules of the Real-Time Monitor Blackboard System control the power level very well, allowing only 2 power outages, the fewest of any of the runs attempted. This run is graphically portrayed in Figure 6.

The first run with the HDS (HDS1), shown in the third row of Table 1, used the perfect sensor information described earlier as case 1. There were 11 power outages during the eight orbits, with 17 rules generated, as shown in the third row of Table 1. In the same period, it retracted only 1 rule that it had learned from the rule-base. The rules generated were of higher quality and were more specific than the rule base generated with imperfect sensor information. The rule-base generated is shown in Appendix 1.

Run                                     Power Produced   Power Outages   Rules Created   Rules Retracted
No Errors Generated (R1)                8058             n/a             n/a             n/a
Real-Time Monitor (R2)                  6810             2               n/a             n/a
HDS, perfect sensor info (HDS1)         6281             11              17              1
HDS, imperfect sensor info (HDS2)       6546             7               8               7
HDS, no rule storage (HDS3)             4064             39              n/a             n/a

Table 1. Comparison of HDS Testbed Cases


The performance of the HDS using imperfect sensor information (run HDS2) for the second case was not quite as good as that of the HDS operating with perfect sensor information. However, the power produced was only 5% less than in the case with the Real-Time Monitor. There were 7 power outages, an average number compared to the other runs. The system generated 8 rules and in the same period retracted 7 rules from the rule-base. The rule-base generated is shown in Appendix 1.

The HDS for case 3 did not operate well when unable to store any of the rules it created. It only generated approximately half the power possible in run HDS3. It is obvious from the last row of Table 1 that there were many times when the system was not able to quickly determine the correct repair. During this session, 39 power outages were experienced, the highest number of any of the experiments.

Figure 6. Baseline Performance Graph (accumulated power over time; 10 hours = 1 orbit; runs shown: No Errors Generated, Real-Time Blackboard System)

A comparison of the 5 runs of the Testbed is presented in Table 1 and Figure 7. The immediate observation is that the run with perfect sensor information was not as successful as the run with imperfect sensor information. One of the reasons this discrepancy occurred is the experimental methodology employed. Because only one error was generated at a time, a system with better performance would encounter more errors, as it would expect another error to be generated after it successfully fixed the current one. If a controlling system could not fix an error for a lengthy period of time, then it would be exposed to fewer errors over the 8-orbit time period.

This proved to be the case, as the run with imperfect sensor information only encountered 11 errors, while the run with perfect sensor information repaired 21 errors. Over a longer period of time (more than 8 orbits), the HDS with perfect sensor information should perform better, due to a more complete and better quality rule base. In some cases the error would change completely when a mode transition occurred. In one such case, a switch that was in the wrong position during the day mode was in the right position when the mode changed to night. Yet the switch in question prevented another switch from changing position. The system fixed the new error by resetting both switches. In this case, when the mode transition occurred, the error changed as a result. Because the system was able to adapt, it tracked the error across the mode transition and repaired it.

The system was also tested with an incorrect domain theory, where some parameter information in the goal database was incorrect (errors were not sensed properly). In this case, some spurious errors were sensed. This impaired, but did not cripple, the learning capability. Bad rules were created and placed into the rule-base, but they were retracted when they did not work. Thus the system managed to function acceptably, although at a lower level of performance.


Figure 7. HDS Performance Graph (accumulated power over time; 10 hours = 1 orbit; runs shown: No Errors Generated, HDS with Complete Theory, HDS with Incomplete Theory, HDS with no Learning)

In the longest experiment recorded, one run continued for 34 passes, or orbits (equivalent to approximately 50 hours of simulated time and 5 hours of wall clock time). The run was conducted with the HDS operating with incomplete knowledge. During this time, the HDS created 56 rules, correctly keeping the good rules and retracting the ones which did not work.

Related Work

To enable rule discovery by perturbing a real-time process, we have integrated our learning system with a simulator. Buchanan et al. [1] utilized a simulator to learn rules in the domain of high energy physics by induction over a large number of training examples. In that case, the simulator served as a generator of training examples from which rules were learned for an expert system. We use our simulator for the same purpose, but with a different learning mechanism. Because of the real-time nature of our system, we integrated our learning system into the performance element. This enabled us to take advantage of experimentation techniques.

Experimentation in machine learning has been used by many systems. FAHRENHEIT [9] uses a discovery program to design its own experiments, bringing the system closer to real world applications; Zytkow applied these methods to the area of databases. Rajamoney and DeJong [6] have described an elegant approach to experimentation

that is used to learn theories of physical devices. This work focuses on using experiment design coupled with explanation-based learning to revise incomplete, intractable, or incorrect domain theories. Experimentation is performed to deny or confirm predictions made by well-formed hypotheses and, if possible, reduce the number of multiple explanations. While FAHRENHEIT uses a quantitative model of the world, Rajamoney and DeJong employ qualitative techniques. We have begun to explore both paradigms in the HDS testbed.

Another system utilizing experimentation is PRODIGY [2]. PRODIGY experiments to improve the domain theory of a planning system. As with the work of Rajamoney and DeJong, PRODIGY experiments to discriminate between multiple explanations. As with the HDS, experimentation is demand-driven and uses both domain constraints and any external feedback received. The HDS is more domain specific and is tailored to an engineering application.

Conclusion

A methodology for discovering control knowledge in complex engineering systems has been presented The addition of a discovery system to an expert system in a dynamic environment such as process control should offer an improvement in performance and enable an expen system to learn new knowledge and improve its existing

The Journal of Knowledge Engineering 80

knowledge base Adding the ability to learn in CWTent knowledge-based systems reduces certain brittleness concerns The HDS is a demonstration of how such a system could be implemented

The HDS was tested in a domain characterized by realshytime operation and a dynamic environment The system was able to discover adequate EPS control rules while starting with no rules in its knowledge base The discovery process worked well when given either perfect sensor information or when given less perfect sensor information The performance of the combined learningproblem solving system was compared to a control case of a blackboard system that handled multiple rule bases that contained hand-crafted rules This blackboard system the Real-Time Monitor performed only slightly better than the HDS after evaluation of several EPS perfonnance criteria The HDS was tested in the case where it could not store any created rules The pertonnance of the discovery system was much worse if there was no memory of past experience

Currently the system is only able to deal with a certain subset of EPS problems Obviously a more practical system would have to deal with a wider range of parameters Also the system is only presented with situations where only one error exists at a time More robust algorithms must be tried to properly assign credit when multiple errors are present in the environment A system capable of dealing with many parameters would require a much more discriminating credit assignment

Another area which should be explored is to conven the rules learned into the format of the hand-crafted rules so that the system could be added onto the existing RealshyTime MonitOr system The discovery system was operated by itself in the testbed to obtain unbiased experimental dala Yet in a more practical application one would expect the discovery system to work in tandem with an existing knowledge base It could then add rules and improve the existing rules already formed

One of the current problems with unsupervised learning is the problem of credit assignment To this end many different strategies have been tried This area is critical as was found even in the simple domain explored here Once a rule is put into use it may be discovered that the rule is in error When docs one dccide that it is a bad rule thal it no longer contributes DynamiC real-time systems offer a potential of immediate feedback in a repeatable fashion For cases where there are multiple rules acting in sequence a classifier system as given in Holland et aI (5] may be more appropriate

Volume 5 ~umb~r 4 Winter 1992

Acknowledgments

The suppon of the National Aeronautics and Space Administration GSFC code 5223 of the work presented here is gratefully acknowledged although the positions taken are those of the authors alone

References

1 Buchanan BG Sullivan 1 Cheng TP and Clearwater SH 1988 Simulated-Assisted Inductive Learning In Proceedings ofAAAI-88 Morgan Kaufmann Saint Paul MN

2 Carbonell 1 G and Gil Y 1990 Learning By Experimenlation The Operator Reftnement Method In Machine Learning An Artijiciallnzelligence Approach Volume III ed Y KodratOff and R Michalski Morgan Kaufmann San Mateo CA

3 Forbus K D 1984 Qualilative Process Theory Artificialllllelligence 24 85-168

4 Hieb M R 1990 Knowledge Discovery in Complex Engineering Systems MS Thesis George Washington University

5 Holland 1 H Holyoak Kbull Nisbett R and Thagard P 1986 Induction Processes of Inference Learning and Discovery MIT Press Cambridge MA

6 Rajamoney S A and Dejong G F 1988 Active Explanation Reduction An Approach to the ~Iultiple Explanations Problem In Proceedings of 114 51h International Conference on MachiM Letl17ling Ann Arbor MI

7 Silverman B Gbull Hieb M R Yang H Wu Lbull Truszkowski W and Dominy R 1989 nmiddotestigarion of a Simulator-Trained Machine Discovery System for Knowledge Base Management Purposes In Procudings of JCAl-89 Workshop on Knowledge Disccgtfry in Databases Detroit MI

8 Silverman B Gbull Hieb W Rbullbull and MeLlet T ~t 1991 Unsupervised Discovery in an Operatiorll (errol Setting In Knowledge Discovery in Daubullbull~lS ~d_ G Piatetsky-Shapiro and W Frawley Cambrid~ ~ ~llT Press

9 Zytkow 1 M 1989 Overcoming FA-~-==Ts Experimentation Habit Discovery Systel lUes a Darabase Assignment Proceedings ofJC4J-~9 ~i cp on Knowledge Discovery in Databases De- )c

51

external world. The purpose of the method pursued here is to explore ways to learn from real-world dynamic processes to enhance an expert system's rule base.

EPS Domain

A very demanding and dynamic domain was utilized for this investigation. This is a real-time environment in which a process (in this case a power system) must be controlled to function properly. The process is represented by a software simulation so that it is more amenable to study and manipulation. The domain chosen involves real-time operation, diagnosis, and repair of an Electrical Power Subsystem (EPS) of the HUBBLE Space Telescope satellite. The full EPS would take about 2,000 rules to command and control. The entire EPS was scaled down by approximately one fourth and reduced in complexity to assist the development of the HDS. Figure 1 shows the EPS. This reduced Simulator obviously does not need nearly as many rules to command and control, but it still captures the nature of the actual power system. On the other hand, to properly stress the discovery system, several EPS performance and anomaly situations that represent actual operating conditions have been added to the Simulator.


The main objective of any electrical power subsystem is to provide its users with a steady supply of electrical power. To fulfill this objective, the control system must monitor the simulator at all times. The simulator is capable of self-preservation in emergencies, but it is not capable of maintaining optimum productivity without outside support. If not controlled, the power production of the EPS will eventually fail, leaving its users unsupported.

The following components are represented in the model (see Figure 1):

Solar arrays. There are two solar array panels in the simulator, each with ten solar cells. Power production takes place in the solar arrays. For optimum power production, the arrays are adjusted in small increments to maximize their exposure to the sun. Orientation and cell errors are randomly generated within certain limits and probabilities.

The network. The network is a set of power lines equipped with switches and various sensors. It is divided into two areas: Network 1 is the set of power lines from the solar arrays to the battery, and Network 2 is the set of power lines from the battery to the bus. These networks distribute and direct the power generated by the solar arrays through the system.

Figure 1. Space Telescope EPS Diagram

Volume 5, Number 4, Winter 1992

In the entire network there are six switches for rerouting current through the system. As useful and as necessary as they may be, switches are also the cause of serious malfunctions within actual EPS systems. In this simulation, switches generate random errors.

Sensors measure the current and voltage on the network. In the entire simulator there are four ammeters and a voltmeter. Their locations are an important factor in the design of rules which detect EPS errors. Network losses are disregarded in all simulations.

The battery. The battery stores the excess electrical power generated by the solar arrays during the day and then releases it in response to nighttime power requirements. Battery charge level, battery voltage stability, and reconditioning problems are simulated.

The bus. The bus is the embodiment of all users of electrical power within the system. In the simulator, the bus power requirements can be adjusted depending on power production or the system mission schedule. In these experiments, the minimum requirement for bus power was fixed at 300 watts.

Time. The simulator runs on a clock of its own; minutes differ in length depending on machine work load. The speed of the clock will vary with the speed of the machine on which it runs. A pass, or simulated earth orbit, is always 90 minutes, with 60 minutes of it spent in sunlight. This independent clock allows us to view different times and behaviors faster than would have been possible otherwise. For the experiments, eight orbits of data were taken in approximately one and one half hours.

This is a difficult environment for a discovery system to operate in. Errors are randomly generated and must be attended to and repaired. Thus there are several excellent criteria available for measuring the performance of the learning system, including maximizing the power produced, minimizing the number of power outages, maximizing the number of errors fixed, and improving the quality of the rules generated.

Hubble Discovery System Methodology

The methodology given here follows that of Silverman et al. [7]. The goal of the system is unsupervised learning of a rule base by having a Critic and Learner observe how well a Performer controls the Environment (simulator).


The Performer explores and experiments with the Environment. A Teacher, or ideal system, does not exist to help the Learner in unsupervised learning. Through experimentation, the Learner makes a formal prediction about the proper setting that a set of control variables should assume.

During experimentation, a Critic notices the result of making a control action and assigns blame and rewards accordingly. A Learner stores the rules that the Critic recommends for later use.

In particular, this theory proceeds according to the following steps:

1) Given starting knowledge, monitor sensed parameters (collect parameter data).

2) Notice patterns (screen the data for abnormalities and select parameters of interest).

3) If possible, generate and experiment with theories (create rule sets and test them).

4) Criticize and evaluate results (assign credit).

5) Repeat from step 1 exhaustively.
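The five steps above can be sketched as a simple loop. This is only an illustration of the cycle's control flow, not the actual HDS code; every name in it (sense, screen, propose, criticize) is hypothetical.

```python
# Minimal sketch of the five-step monitor/experiment/criticize cycle.
# All function names are illustrative; the real HDS agents are far richer.

def discovery_loop(sense, screen, propose, criticize, rule_base, max_cycles=10):
    """Run the monitor -> notice -> experiment -> criticize cycle."""
    for _ in range(max_cycles):                   # step 5: repeat exhaustively
        params = sense()                          # step 1: collect parameter data
        abnormal = screen(params)                 # step 2: notice patterns
        if not abnormal:
            continue
        candidate = propose(abnormal, rule_base)  # step 3: generate and experiment
        if criticize(candidate, params):          # step 4: assign credit
            rule_base.append(candidate)           # keep rules that worked
    return rule_base
```

A run of this loop grows the rule base only with candidates the critic judged successful, which is the essence of the unsupervised scheme described here.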

The specific design of the HDS involves a Performer which constantly monitors, detects errors, diagnoses faults, and attempts repairs. If the Performer lacks repair steps, or the repairs are not successful, it asks the Discovery Subsystem to learn correct repair rules.

The Performer system has three agents utilizing two knowledge bases. Figure 2 shows the general architecture of the Performer system. The first agent is the Screener, which monitors environmental parameters to see if any are out of bounds or have abnormal readings. It uses a data base called a goal-list for this screening, which contains upper and lower bounds for parameters according to the mode of the process system. A mode is a characteristic behavior of a system that persists for some period. A mode is like a template placed over the system parameters so that they are restricted to certain subsets of their ranges. In the EPS domain, two example modes are day and night. During the day the system operates via solar power; during the night it operates from the battery.
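A minimal sketch of the Screener's goal-list check follows. The bounds shown are taken from the day/night goal lists of Figure 5; the dictionary structure and function name are illustrative assumptions, not the HDS's actual representation.

```python
# Sketch of the Screener: per-mode upper/lower bounds from a goal-list.
# Bound values follow Figure 5; the data structure is illustrative.

GOAL_LIST = {
    "day":   {"Ammeter1": (105, 155), "Bus-Power": (300, 1010)},
    "night": {"Ammeter1": (0, 0),     "Bus-Power": (300, 1010)},
}

def screen(mode, telemetry):
    """Return the parameters whose readings fall outside the mode's bounds."""
    abnormal = []
    for name, value in telemetry.items():
        low, high = GOAL_LIST[mode][name]
        if not (low <= value <= high):
            abnormal.append(name)
    return abnormal
```

Note how the same reading can be normal in one mode and abnormal in another: an Ammeter1 reading of 130 is in range during the day but out of range at night, which is exactly the "template" behavior of a mode.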

If the Screener detects a problem, then the Checker, the second agent, is given the task of processing this problem correctly. If a valid repair rule exists, it will send the problem to the Critic with the rule it found. If no valid repair information exists, then it calls the Discovery Subsystem, specifying whether it is an old or new problem.

The Journal of Knowledge Engineering

Figure 2. Performer System Architecture

The Discovery Subsystem will then propose a plausible control action to try. This action is implemented and its effect is evaluated by a Critic. Rules are either added to or retracted from the knowledge base of the Performer based upon the environmental feedback received.

The third agent is the Critic. This agent must either fire a rule to repair the EPS (if the repair knowledge is in its rule-base) or must send a control action proposed by the Discovery Subsystem. In either case, it will monitor the result to see if the problem is repaired. If the problem disappears, then the Critic hypothesizes that the rule is a valid one and stores it in its rule-base. If the problem remains, it either removes the old rule from the knowledge base or does not add the proposed rule.
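The Critic's keep-or-retract decision can be sketched as follows. The function signature and callbacks are invented for illustration; only the decision logic (keep on success, retract or withhold on failure) follows the text.

```python
# Sketch of the Critic's decision: test a repair, keep the rule if the
# problem disappears, otherwise retract (or never add) it. Illustrative only.

def criticize(rule, apply_action, problem_persists, rule_base):
    """Fire a rule, observe the environment, and update the rule base."""
    apply_action(rule)
    if problem_persists():
        if rule in rule_base:      # a stored rule failed: retract it
            rule_base.remove(rule)
        return False               # a newly proposed rule is simply not added
    if rule not in rule_base:      # repair worked: hypothesize the rule is valid
        rule_base.append(rule)
    return True
```

This is the mechanism by which environmental feedback, rather than a teacher, shapes the rule base.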

Design of the Discovery Subsystem

The Discovery Subsystem, or Proposer, consists of three agents plus a set of procedures called the Meta system. This system is shown in Figure 3. The system utilizes an object hierarchy to model the objects dealt with in the system. The Meta system contains heuristics for pursuing different convergence strategies depending upon the history of previous experimentation.

The input to the Proposer is a set of abnormal parameters, a world state (a listing of selected parameter values), and the status of the problem (whether it is old or new). The abnormal parameters detected are described in qualitative terms, as in Forbus' Qualitative Process Theory [3], rather than by specifying exact numbers or tolerances. Thus the descriptions utilized are of quantities: world state variables that can take on numeric values but are reasoned about in the qualitative model in purely qualitative terms.
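The abstraction from a sensed number to a qualitative description can be sketched as below. The function and its labels are hypothetical; the point is simply that only the relation of a value to its goal-list bounds is retained, in the spirit of Qualitative Process Theory.

```python
# Sketch of describing a numeric quantity qualitatively: the exact reading
# is discarded and only its relation to the bounds is kept. Labels invented.

def qualitative(value, low, high):
    """Abstract a sensed value into a purely qualitative description."""
    if value < low:
        return "below-range"
    if value > high:
        return "above-range"
    return "in-range"
```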

The first agent, the Parameter Selector (or Planner), is only activated if the problem is new. Its job is to suggest a set of sub-goals, consisting of parameters associated with or related to the abnormal parameters, to the next agent, the Strategy Selector (or Planner2). To determine the sub-goals, the Parameter Selector utilizes an object hierarchy and retrieves the relevant parameters to utilize

Figure 3. Discovery Subsystem Architecture

Figure 4. Discovery Subsystem Object Hierarchy

in the next step. A schematic depiction of the parameter hierarchy utilized is shown in Figure 4.

The second agent, the Strategy Selector (or Planner2), will suggest heuristics (i.e., meta-rules in the meta-rule base) for proposing new rules based upon the sub-goals given by the Parameter Selector agent. Basically, it will find useful meta-rules and fire them to hypothesize new parameter value settings for the current sub-goal, proposing new control rules to either reset or adjust the settings that the previous control rule implemented.

Sometimes the meta-rules may not be sufficient to solve the current problem, and new meta-rules must be created. There are three planning strategies used by the Strategy Selector to accomplish this: same-line-of-investigation, similar-line-of-investigation, and new-line-of-investigation. Thus the system can not only create new problem-solving rules, but can also create and modify its own control knowledge. The same-line-of-investigation strategy creates new meta-rules by changing the settings of some of the parameters in the rules. New meta-rules are put back in the meta-rule base and will suggest new ways of proposing new problem-solving rules. If this planning strategy fails, similar-line-of-investigation is employed, which uses generalization and specialization to modify existing meta-rules. If these two strategies fail, then the planner resorts to its last method, new-line-of-investigation, which is simply a random selection.
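The fallback ordering of the three strategies can be sketched as below. The strategy callbacks and the shape of a meta-rule (a mapping from parameter names to settings) are assumptions made for illustration.

```python
# Sketch of the Strategy Selector's fallback order: same-line, then
# similar-line, then a random new-line-of-investigation. Names illustrative.
import random

def select_strategy(same_line, similar_line, meta_rule):
    """Try each planning strategy in turn until one yields a meta-rule."""
    new_rule = same_line(meta_rule)         # perturb existing parameter settings
    if new_rule is None:
        new_rule = similar_line(meta_rule)  # generalize / specialize meta-rules
    if new_rule is None:                    # last resort: random selection
        new_rule = {k: random.choice([1, 2]) for k in meta_rule}
    return new_rule
```

The design choice here is graceful degradation: cheap, focused perturbation first, random search only when the more informed strategies are exhausted.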

The last agent is the Control-Action Proposer, which takes the current meta-rule and creates a new rule which effects a control parameter. The Control-Action Proposer then sends this rule to the Performer system to be tested.

Experiments

Three cases were investigated with the HDS:

1) HDS operation with perfect sensor information.

2) HDS operation with imperfect sensor information.

3) HDS operation using imperfect sensor information, without storing the generated rules.

In all cases, the EPS was run in a random error mode where any of the six switches could fail. The consequences of the failure ranged from mild (a failure in switch 1 at night had no effect) to severe (a failure in switch 4 always cut off power to the bus completely). The repair action consisted of resetting the switch to its proper position. Only one error was generated at a time, which meant that the present error had to be fixed before another error was generated.
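The single-fault error mode can be sketched as follows. The simulator interface (state dictionary, return value) is invented for illustration; only the one-error-at-a-time constraint comes from the text.

```python
# Sketch of the experiments' single-fault error mode: a random switch fails,
# and no new error is injected until the current one is repaired.
import random

def inject_error(switch_states, current_error):
    """Fail one random switch, but only if no error is outstanding."""
    if current_error is not None:
        return current_error               # one error at a time
    switch = random.choice(sorted(switch_states))
    switch_states[switch] = "failed"
    return switch
```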

The first case consisted of running the HDS with perfect sensor information. The HDS sampled all the parameters and was able to ascertain switch position information, so that it could directly deduce which switch needed to be reset. The actual goal-list used is given in Figure 5.

Complete Domain Theory

Parameter        Day (upper/lower)   Night (upper/lower)
Ammeter1         155 / 105           0 / 0
Ammeter2         155 / 105           0 / 0
Ammeter3         31 / 22             0 / 0
Ammeter4         31 / 22             31 / 9
Bus-Power        1010 / 300          1010 / 300
Switch1          2 / 2               1 / 1
Switch2          2 / 2               1 / 1
Switch3A         2 / 2               1 / 1
Switch3B         1 / 1               2 / 2
Switch4          2 / 2               2 / 2
Battery Switch   2 / 2               2 / 2

Incomplete Domain Theory

Parameter        Day (upper/lower)   Night (upper/lower)
Ammeter1         155 / 105           0 / 0
Ammeter2         155 / 105           0 / 0
Ammeter3         31 / 22             0 / 0
Ammeter4         31 / 22             31 / 9
Bus-Power        1010 / 300          1010 / 300

Figure 5. HUBBLE Discovery System Goal Lists

In the second case, the HDS had only imperfect sensor information, with ammeters and bus power as the sensed parameters. Since the errors were occurring in the switches, the system had to infer these errors from the sensed parameters. The goal-list used is also given in Figure 5.

The third case is just like the second, with one exception: the Critic lacks memory and must rediscover good rules on each run. This was to verify that the rules created by the HDS produced an improvement in EPS operation over a random selection of rules.

Two baseline runs were controls on the experiment. First, the simulator was run without any errors to determine the maximum accumulated power. Second, a blackboard system (the Real-Time Monitor Blackboard System) utilizing 11 different rule-bases was run to compare the performance of hand-crafted rules against the performance of the HDS's discovered rules. These rules were obtained from a systematic failure analysis and then optimized.

Cumulative power production over time is the primary measure of the controlling system's performance. Any delay in fixing errors due to incorrect, inconsistent, or incomplete rules leads to a drop in power production. Another measure is the number of power outages the system experiences. A power outage is a period when the power is insufficient to supply the bus requirements. As the system learns more and better rules, it experiences fewer outages. It repairs errors using the rules that it learns.
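The two measures can be computed from a power trace as sketched below. The per-minute trace format and function name are assumptions; the 300-watt bus requirement is the one fixed in these experiments.

```python
# Sketch of the two performance measures: cumulative power production and
# the number of power outages (periods in which supply drops below the
# bus requirement). The trace format is invented for illustration.

def performance(power_trace, bus_requirement=300):
    """Return (accumulated power, outage count) for a per-minute power trace."""
    accumulated = sum(power_trace)
    outages, in_outage = 0, False
    for watts in power_trace:
        if watts < bus_requirement and not in_outage:
            outages += 1               # a new outage period begins
        in_outage = watts < bus_requirement
    return accumulated, outages
```

Counting outage periods rather than below-threshold minutes matches the paper's usage: one prolonged failure is one outage, however long the repair takes.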

Two SUN workstations were utilized in all experiments. The EPS simulator ran on one workstation and sent telemetry packets to the HDS operating on another SUN workstation. The HDS then sent command-loads (lists of commands) to operate the simulator. This ensured that the controlling system was operating independently of the simulation.

Rule Discovery with the HUBBLE Discovery System

Initially, the simulator was run with no errors introduced to determine the maximum power production (run R1). This produced a set of performance data giving the accumulated power production over time (see the first row of Table 1 and Figure 6).

The second run (run R2) was made with the Real-Time Monitor Blackboard System to determine the performance of the hand-crafted rules. Random switch errors were set. As the second row of Table 1 shows, the manually constructed rules of the Real-Time Monitor Blackboard System control the power level very well, allowing only 2 power outages, the fewest of any of the runs attempted. This run is graphically portrayed in Figure 6.

The first run with the HDS (HDS1), third row of Table 1, used the perfect sensor information described earlier as case 1. There were 11 power outages during the eight orbits, with 17 rules generated, as shown in the third row of Table 1. In the same period it retracted only 1 rule that it had learned from the rule-base. The rules generated were of higher quality and more specific than the rule base generated with imperfect sensor information. The rule-base generated is shown in Appendix 1.

Run                                   Power Produced   Power Outages   Rules Created   Rules Retracted
No Errors Generated (R1)              8058             n/a             n/a             n/a
Real-Time Monitor (R2)                6810             2               n/a             n/a
HDS with perfect sensor info (HDS1)   6281             11              17              1
HDS with imperfect sensor info (HDS2) 6546             7               8               7
HDS with no rule storage (HDS3)       4064             39              n/a             n/a

Table 1. Comparison of HDS Testbed Cases



The performance of the HDS using imperfect sensor information (run HDS2) in the second case was not quite as good as that of the HDS operating with perfect sensor information. However, the power produced was only 5% less than in the case with the Real-Time Monitor. There were 7 power outages, an average number compared to the other runs. The system generated 8 rules and in the same period retracted 7 rules from the rule-base. The rule-base generated is shown in Appendix 1.

The HDS for case 3 did not operate well when unable to store any of the rules it created. It generated only approximately half the power possible in run HDS3. It is obvious from the last row of Table 1 that there were many times when the system was not able to quickly determine the correct repair. During this session 39 power outages were experienced, the highest number of any of the experiments.

A comparison of the five runs of the Testbed is presented in Table 1 and Figure 7. The immediate observation is that the run with perfect sensor information was not as successful as the run with imperfect sensor information. One of the reasons this discrepancy occurred is the experimental methodology employed. Because only one error was generated at a time, a system with better performance would encounter more errors, as another error would be generated as soon as it successfully fixed the current one. If a controlling system could not fix an error for a lengthy period of time, then it would be exposed to fewer errors over the 8-orbit time period.

Figure 6. Baseline Performance Graph

This proved to be the case, as the run with imperfect sensor information encountered only 11 errors while the run with perfect sensor information repaired 21 errors. Over a longer period of time (more than 8 orbits), the HDS with perfect sensor information should perform better due to a more complete and better quality rule base. In some cases the error would change completely when a mode transition occurred. In one such case, a switch that was in the wrong position during the day mode was in the right position when the mode changed to night. Yet the switch in question prevented another switch from changing position. The system fixed the new error by resetting both switches. In this case, when the mode transition occurred, the error changed as a result. Because the system was able to adapt, it tracked the error across the mode transition and repaired it.

The system was also tested with an incorrect domain theory, where some parameter information in the goal database was incorrect (errors were not sensed properly). In this case some spurious errors were sensed. This impaired, but did not cripple, the learning capability. Bad rules were created and placed into the rule-base, but they were retracted when they did not work. Thus the system managed to function acceptably, although at a lower level of performance.


Figure 7. HDS Performance Graph (runs shown: No Errors Generated; HDS with Complete Theory; HDS with Incomplete Theory; HDS with no Learning)

In the longest experiment recorded, one run continued for 34 passes, or orbits (equivalent to approximately 50 hours of simulated time and 5 hours of wall clock time). The run was conducted with the HDS operating with incomplete knowledge. During this time the HDS created 56 rules, correctly keeping the good rules and retracting the ones which did not work.

Related Work

To enable rule discovery by perturbing a real-time process, we have integrated our learning system with a simulator. Buchanan et al. [1] utilized a simulator to learn rules in the domain of high energy physics by induction over a large number of training examples. In this case the simulator served as a generator of training examples from which rules were learned for an expert system. We use our simulator for the same purpose, but with a different learning mechanism. Because of the real-time nature of our system, we integrated our learning system into the performance element. This enabled us to take advantage of experimentation techniques.

Experimentation in machine learning has been used by many systems. FAHRENHEIT [9] uses a discovery program to design its own experiments, to bring the system closer to real world applications; Zytkow applied these methods to the area of databases. Rajamoney and DeJong [6] have described an elegant approach to experimentation that is used to learn theories of physical devices. This work focuses on using experiment design coupled with explanation-based learning to revise incomplete, intractable, or incorrect domain theories. Experimentation is performed to deny or confirm predictions made by well-formed hypotheses and, if possible, reduce the number of multiple explanations. While FAHRENHEIT uses a quantitative model of the world, Rajamoney and DeJong employ qualitative techniques. We have begun to explore both paradigms in the HDS testbed.

Another system utilizing experimentation is PRODIGY [2]. PRODIGY experiments to improve the domain theory of a planning system. As with the work of Rajamoney and DeJong, PRODIGY experiments to discriminate between multiple explanations. As with the HDS, experimentation is demand-driven and uses both domain constraints and any external feedback received. The HDS is more domain specific and is tailored to an engineering application.

Conclusion

A methodology for discovering control knowledge in complex engineering systems has been presented. The addition of a discovery system to an expert system in a dynamic environment, such as process control, should offer an improvement in performance and enable an expert system to learn new knowledge and improve its existing knowledge base. Adding the ability to learn to current knowledge-based systems reduces certain brittleness concerns. The HDS is a demonstration of how such a system could be implemented.

The HDS was tested in a domain characterized by real-time operation and a dynamic environment. The system was able to discover adequate EPS control rules while starting with no rules in its knowledge base. The discovery process worked well when given either perfect or less perfect sensor information. The performance of the combined learning/problem solving system was compared to a control case of a blackboard system that handled multiple rule bases containing hand-crafted rules. This blackboard system, the Real-Time Monitor, performed only slightly better than the HDS after evaluation of several EPS performance criteria. The HDS was also tested in the case where it could not store any created rules; the performance of the discovery system was much worse when there was no memory of past experience.

Currently the system is only able to deal with a certain subset of EPS problems. Obviously a more practical system would have to deal with a wider range of parameters. Also, the system is only presented with situations where one error exists at a time. More robust algorithms must be tried to properly assign credit when multiple errors are present in the environment. A system capable of dealing with many parameters would require much more discriminating credit assignment.

Another area which should be explored is converting the rules learned into the format of the hand-crafted rules, so that the system could be added onto the existing Real-Time Monitor system. The discovery system was operated by itself in the testbed to obtain unbiased experimental data. Yet in a more practical application one would expect the discovery system to work in tandem with an existing knowledge base. It could then add rules and improve the existing rules already formed.

One of the current problems with unsupervised learning is the problem of credit assignment. To this end many different strategies have been tried. This area is critical, as was found even in the simple domain explored here. Once a rule is put into use, it may be discovered that the rule is in error. When does one decide that it is a bad rule that no longer contributes? Dynamic real-time systems offer the potential of immediate feedback in a repeatable fashion. For cases where there are multiple rules acting in sequence, a classifier system, as given in Holland et al. [5], may be more appropriate.


Acknowledgments

The support of the National Aeronautics and Space Administration (GSFC code 5223) for the work presented here is gratefully acknowledged, although the positions taken are those of the authors alone.

References

1. Buchanan, B. G., Sullivan, J., Cheng, T. P., and Clearwater, S. H. 1988. Simulation-Assisted Inductive Learning. In Proceedings of AAAI-88. Morgan Kaufmann, Saint Paul, MN.

2. Carbonell, J. G., and Gil, Y. 1990. Learning by Experimentation: The Operator Refinement Method. In Machine Learning: An Artificial Intelligence Approach, Volume III, ed. Y. Kodratoff and R. Michalski. Morgan Kaufmann, San Mateo, CA.

3. Forbus, K. D. 1984. Qualitative Process Theory. Artificial Intelligence 24: 85-168.

4. Hieb, M. R. 1990. Knowledge Discovery in Complex Engineering Systems. M.S. Thesis, George Washington University.

5. Holland, J. H., Holyoak, K., Nisbett, R., and Thagard, P. 1986. Induction: Processes of Inference, Learning, and Discovery. MIT Press, Cambridge, MA.

6. Rajamoney, S. A., and DeJong, G. F. 1988. Active Explanation Reduction: An Approach to the Multiple Explanations Problem. In Proceedings of the 5th International Conference on Machine Learning, Ann Arbor, MI.

7. Silverman, B. G., Hieb, M. R., Yang, H., Wu, L., Truszkowski, W., and Dominy, R. 1989. Investigation of a Simulator-Trained Machine Discovery System for Knowledge Base Management Purposes. In Proceedings of the IJCAI-89 Workshop on Knowledge Discovery in Databases, Detroit, MI.

8. Silverman, B. G., Hieb, M. R., and Mezher, T. M. 1991. Unsupervised Discovery in an Operational Control Setting. In Knowledge Discovery in Databases, ed. G. Piatetsky-Shapiro and W. Frawley. MIT Press, Cambridge, MA.

9. Zytkow, J. M. 1989. Overcoming FAHRENHEIT's Experimentation Habit: Discovery System Uses a Database Assignment. In Proceedings of the IJCAI-89 Workshop on Knowledge Discovery in Databases, Detroit, MI.

51

I

power lines from the battery to the bus. These networks distribute and direct the power generated by the solar arrays through the system.

In the entire network there are six switches for rerouting current through the system. As useful and as necessary as they may be, switches are also the cause of serious malfunctions within actual EPS systems. In this simulation, switches generate random errors.

Sensors measure the current and voltage on the network. In the entire simulator there are four ammeters and a voltmeter. Their locations are an important factor in the design of rules which detect EPS errors. Network losses are disregarded in all simulations.

The battery. The battery stores the excess electrical power generated by the solar arrays during the day and then releases it in response to nighttime power requirements. Battery charge level, battery voltage stability, and reconditioning problems are simulated.

The bus. The bus is the embodiment of all users of electrical power within the system. In the simulator, the bus power requirements can be adjusted depending on power production or the system mission schedule. In these experiments, the minimum requirement for the bus power was fixed at 300 watts.

Time. The simulator runs on a clock of its own; minutes differ in length depending on machine workload. The speed of the clock will vary with the speed of the machine on which it runs. A pass, or simulated earth orbit, is always 90 minutes, with 60 minutes of it spent in sunlight. This independent clock allows us to view different times and behaviors faster than would otherwise have been possible. For the experiments, eight orbits of data were taken in approximately one and one half hours.

This is a difficult environment for a discovery system to operate in. Errors are randomly generated and must be attended to and repaired. Thus there are several excellent criteria available for measuring the performance of the learning system, including maximizing the power produced, minimizing the number of power outages, maximizing the number of errors fixed, and improving the quality of the rules generated.

Hubble Discovery System Methodology

The methodology given here follows that of Silverman et al. [7]. The goal of the system is unsupervised learning of a rule base by having a Critic and Learner observe how well a Performer controls the Environment (simulator).


The Performer explores and experiments with the Environment. A Teacher, or ideal system, does not exist to help the Learner in unsupervised learning. Through experimentation, the Learner makes a formal prediction about the proper setting that a set of control variables should assume.

During experimentation, a Critic exists that notices the result of making a control action and assigns blame and rewards accordingly. A Learner stores the rules that the Critic recommends for later use.

In particular, this theory proceeds according to the following steps:

1) Given starting knowledge, monitor sensed parameters (collect parameter data).

2) Notice patterns (screen the data for abnormalities and select parameters of interest).

3) If possible, generate and experiment with theories (create rule sets and test them).

4) Criticize and evaluate results (assign credit).

5) Repeat from step 1 exhaustively.
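Read as pseudocode, the five steps above form a control loop. The minimal sketch below is illustrative only; the sense/screen/propose/evaluate callbacks are hypothetical stand-ins for the HDS agents described in the rest of this section.

```python
def discovery_loop(sense, screen, propose, apply_action, evaluate,
                   rule_base, max_cycles=100):
    """Illustrative unsupervised discovery loop: monitor, notice,
    experiment, criticize, repeat."""
    for _ in range(max_cycles):
        data = sense()                   # 1) monitor sensed parameters
        abnormal = screen(data)         # 2) notice patterns / abnormalities
        if not abnormal:
            continue                    # nothing to learn from this cycle
        rule = propose(abnormal, data)  # 3) generate an experimental rule
        apply_action(rule)              # ...and test it on the environment
        if evaluate(rule):              # 4) assign credit via the Critic
            rule_base.append(rule)      # keep rules that repaired the problem
        elif rule in rule_base:
            rule_base.remove(rule)      # retract rules that failed
    return rule_base                    # 5) the loop repeats exhaustively
```

A toy usage: an environment with one abnormal parameter converges to a single repair rule after the first cycle, after which the Screener finds nothing abnormal.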

The specific design of the HDS involves a Performer which constantly monitors, detects errors, diagnoses faults, and attempts repairs. If the Performer lacks repair steps, or the repairs are not successful, it asks the Discovery Subsystem to learn correct repair rules.

The Performer system has three agents utilizing two knowledge bases. Figure 2 shows the general architecture of the Performer System. The first agent is the Screener, which monitors environmental parameters to see if any are out of bounds or have abnormal readings. It uses a data base called a goal-list for this screening, which contains upper and lower bounds for parameters according to the mode of the process system. A mode is a characteristic behavior of a system that persists for some period; it is like a template placed over the system parameters, restricting them to certain subsets of their ranges. In the EPS domain, two example modes are day and night. During the day the system operates via solar power; during the night it operates from the battery.
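A minimal sketch of such a mode-indexed goal-list check follows. The two parameters and their day/night bounds are taken from the goal lists in Figure 5; the `screen` function itself is an illustrative stand-in, not the HDS implementation.

```python
# Mode-indexed goal-list: (lower, upper) bounds per parameter, as in Figure 5.
GOAL_LIST = {
    "day":   {"Bus-Power": (300, 1010), "Ammeter1": (105, 155)},
    "night": {"Bus-Power": (300, 1010), "Ammeter1": (0, 0)},
}

def screen(mode, telemetry):
    """Return the parameters whose readings fall outside the bounds
    that the goal-list prescribes for the current mode."""
    abnormal = []
    for name, value in telemetry.items():
        lo, hi = GOAL_LIST[mode][name]
        if not (lo <= value <= hi):
            abnormal.append(name)
    return abnormal
```

For example, a 250-watt bus reading at night is flagged because it falls below the 300-watt lower bound, while a zero ammeter reading is normal for that mode.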

If the Screener detects a problem, then the Checker, the second agent, is given the task of processing this problem correctly. If a valid repair rule exists, it will send the problem to the Critic with the rule it found. If no valid repair information exists, then it calls the Discovery Subsystem, specifying whether it is an old or new problem.
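The Checker's dispatch logic can be sketched as below; the `send_to_critic` and `call_discovery` callbacks and the `seen_problems` set are hypothetical stand-ins for the agent interfaces, not the HDS code.

```python
def check(problem, rule_base, send_to_critic, call_discovery, seen_problems):
    """Route a screened problem: a known repair rule goes to the Critic,
    otherwise the Discovery Subsystem is asked to learn one."""
    rule = rule_base.get(problem)
    if rule is not None:
        send_to_critic(problem, rule)      # a valid repair rule exists
    else:
        status = "old" if problem in seen_problems else "new"
        seen_problems.add(problem)
        call_discovery(problem, status)    # no repair information: learn one
```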

The Journal of Knowledge Engineering

[Figure 2: Performer System Architecture. Block diagram relating Telemetry, Repair History, Commands, the Critic, the Parameter Selector (Planner 1), the Strategy Selector (Planner 2: specialize, generalize, randomize), the Rules Generator, Meta Rules, the Blackboard, the knowledge bases and data files, and the Object-Oriented Simulator (Spacecraft).]

The Discovery Subsystem will then propose a plausible control action to try. This action is implemented and its effect is evaluated by a Critic. Rules are either added to or retracted from the knowledge base of the Performer based upon the environmental feedback received.

The third agent is the Critic. This agent must either fire a rule to repair the EPS (if the repair knowledge is in its rule-base) or must send a control action proposed by the Discovery Subsystem. In either case, it will monitor the result to see if the problem is repaired. If the problem disappears, then the Critic hypothesizes that the rule is a valid one and stores it in its rule-base. If the problem remains, it either removes the old rule from the knowledge base or does not add the proposed rule.
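The Critic's keep-or-retract decision amounts to a small credit-assignment routine. In this illustrative sketch, `problem_gone` is a hypothetical predicate standing in for re-screening the telemetry after the action fires.

```python
def criticize(rule, rule_base, fire, problem_gone):
    """Fire a repair rule, then credit or blame it based on
    environmental feedback."""
    fire(rule)                        # execute the repair or proposed action
    if problem_gone():
        if rule not in rule_base:     # hypothesize the rule is valid...
            rule_base.append(rule)    # ...and store it for later use
        return True
    if rule in rule_base:             # an existing rule failed: remove it
        rule_base.remove(rule)
    return False                      # a proposed rule failed: do not add it
```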

Design of the Discovery Subsystem

The Discovery Subsystem, or Proposer, consists of three agents plus a set of procedures called the Meta system. This system is shown in Figure 3. The system utilizes an object hierarchy to model the objects dealt with in the system. The Meta system contains heuristics for pursuing different convergence strategies depending upon the history of previous experimentation.

The input to the Proposer is a set of abnormal parameters, a world state (a listing of selected parameter values), and the status of the problem (whether it is old or new). The abnormal parameters detected are described in qualitative terms, as in Forbus' Qualitative Process Theory [3], rather than by specifying exact numbers or tolerances. Thus the descriptions utilized are of quantities, which are world state variables that can take on numeric values but are reasoned about in the qualitative model in purely qualitative terms.
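For illustration, the simplest possible qualitative description of a quantity reports only where its value falls relative to its expected range. This three-valued encoding is an assumption for the sketch, far cruder than Qualitative Process Theory and not the HDS representation.

```python
def qualitative(value, lower, upper):
    """Describe a numeric quantity qualitatively, relative to the
    expected range [lower, upper] for the current mode."""
    if value < lower:
        return "low"
    if value > upper:
        return "high"
    return "normal"
```

For example, a 250-watt bus reading against the 300-watt lower bound is simply "low"; the exact shortfall is not represented.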

The first agent, the Parameter Selector (or Planner), is only activated if the problem is new. Its job is to suggest a set of sub-goals, consisting of parameters associated with or related to the abnormal parameters, to the next agent, the Strategy Selector (or Planner2). To determine the sub-goals, the Parameter Selector utilizes an object hierarchy and retrieves the relevant parameters to utilize

Volume 5 Number 4 Winter 1992

[Figure 3: Discovery Subsystem Architecture. Input (abnormal parameters, world state, old/new status) flows to the Planner, which recommends parameters for Planner2 to operate on; a Problem Translator translates parameters into the discovery system data structures; Planner2 recommends problem-solving strategies using the meta functions, which operate on parameters to create meta-rules effecting the control actions; the output is a control action from the Control-Action Proposer.]

[Figure 4: Discovery Subsystem Object Hierarchy. Discovery objects carry a name, parameters, a convergence strategy, a control action, and related parameters. Parameter subclasses are symbolic (symbol-set, convergence strategy) and numeric (upper bound, lower bound, convergence strategy), with instances such as Ammeter2, Ammeter3, Switch 3A, and Switch 3B.]

in the next step. A schematic depiction of the parameter hierarchy utilized is shown in Figure 4.

The second agent, the Strategy Selector (or Planner2), will suggest heuristics (i.e., meta-rules in the meta-rule base) for proposing new rules based upon the sub-goals given by the Parameter Selector agent. Basically, it will find useful meta-rules and fire them to hypothesize new parameter value settings for the current sub-goal, by proposing new control rules to either reset or adjust the settings that the previous control rule implemented.

Sometimes the meta-rules may not be sufficient to solve the current problem, and new meta-rules must be created. There are three planning strategies used by the Strategy Selector to accomplish this: same-line-of-investigation, similar-line-of-investigation, and new-line-of-investigation. Thus the system can not only create new problem-solving rules, but it can also create and modify its own control knowledge. The same-line-of-investigation strategy creates new meta-rules by changing the settings of some of the parameters in the rules. New meta-rules will be put back in the meta-rule base and will suggest new ways of proposing new problem-solving rules. If this planning strategy fails, the similar-line-of-investigation will be employed, which uses generalization and specialization to modify existing meta-rules. If these two strategies fail, then the planner resorts to its last method, that of new-line-of-investigation, which is simply a random selection.
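The three-strategy fallback might be organized as in the sketch below, where `perturb` (same-line), `generalize_or_specialize` (similar-line), and the final random choice (new-line) are hypothetical stand-ins for the actual meta-rule transformations.

```python
import random

def select_meta_rule(meta_rules, perturb, generalize_or_specialize, exhausted):
    """Try same-, then similar-, then new-line-of-investigation
    to obtain the next meta-rule to test.

    `exhausted(rule)` marks meta-rules whose same-line variants have
    already failed; the operator arguments are illustrative stand-ins."""
    for rule in meta_rules:
        if not exhausted(rule):
            return perturb(rule)             # same-line: change parameter settings
    for rule in meta_rules:
        modified = generalize_or_specialize(rule)
        if modified is not None:
            return modified                  # similar-line: generalize/specialize
    return random.choice(meta_rules)         # new-line: random selection
```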

The last agent is the Control-Action Proposer, which takes the current meta-rule and creates a new rule which effects a control parameter. The Control-Action Proposer then sends this rule to the Performer System to be tested.

Experiments

Three cases were investigated with the HDS:

1) HDS operation with perfect sensor information.

2) HDS operation with imperfect sensor information.

3) HDS operation using imperfect sensor information, without storing the generated rules.

In all cases, the EPS was run in a random error mode where any of the six switches could fail. The consequences of the failure ranged from mild (a failure in switch 1 at night had no effect) to severe (a failure in switch 4 always cut off power to the bus completely). The repair action consisted of resetting the switch to its proper position. Only one error was generated at a time, which meant that the present error had to be fixed prior to another error being generated.

The first case consisted of running the HDS with perfect sensor information. The HDS sampled all the parameters and was able to ascertain switch position information, so that it could directly deduce which switch needed to be reset. The actual goal-list used is given in Figure 5.

Complete Domain Theory

Parameter        Day (upper/lower)    Night (upper/lower)
Ammeter1         155 / 105            0 / 0
Ammeter2         155 / 105            0 / 0
Ammeter3         31 / 22              0 / 0
Ammeter4         31 / 22              31 / 9
Bus-Power        1010 / 300           1010 / 300
Switch1          2 / 2                1 / 1
Switch2          2 / 2                1 / 1
Switch3A         2 / 2                1 / 1
Switch3B         1 / 1                2 / 2
Switch4          2 / 2                2 / 2
Battery Switch   2 / 2                2 / 2

Incomplete Domain Theory

Parameter        Day (upper/lower)    Night (upper/lower)
Ammeter1         155 / 105            0 / 0
Ammeter2         155 / 105            0 / 0
Ammeter3         31 / 22              0 / 0
Ammeter4         31 / 22              31 / 9
Bus-Power        1010 / 300           1010 / 300

Figure 5: HUBBLE Discovery System Goal Lists


In the second case, the HDS only had imperfect sensor information, with the ammeters and bus power as the sensed parameters. Since the errors were occurring in the switches, the system had to infer these errors from the sensed parameters. The goal-list used is also given in Figure 5.

The third case is just like the second, with one exception: the Critic lacks memory. It must rediscover good rules on each run. This was to verify that the rules created by the HDS produced an improvement in EPS operation over a random selection of rules.

Two baseline runs were controls on the experiment. First, the simulator was run without any errors to determine the maximum accumulated power. Second, a blackboard system (the Real-Time Monitor Blackboard System), utilizing 11 different rule-bases, was run to compare the performance of hand-crafted rules against the performance of the HDS's discovered rules. These rules were obtained from a systematic failure analysis and then optimized.

Cumulative power production over time is the primary measure of the controlling system's performance. Any delay in fixing errors, due to incorrect, inconsistent, or incomplete rules, leads to a drop in power production. Another measure is the number of power outages the system experiences. A power outage is a period when the power is insufficient to supply the bus requirements. As the system learns more and better rules, it experiences fewer outages. It repairs errors using the rules that it learns.
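Both measures can be computed from a sampled power trace as sketched below, assuming a uniform sampling interval and the fixed 300-watt bus requirement mentioned earlier, and counting each contiguous run of insufficient power as one outage.

```python
def performance(power_trace, bus_requirement=300):
    """Return (cumulative power produced, number of outage periods)
    for a uniformly sampled trace of power readings."""
    cumulative = sum(power_trace)
    outages = 0
    in_outage = False
    for p in power_trace:
        if p < bus_requirement and not in_outage:
            outages += 1          # a new period of insufficient power begins
        in_outage = p < bus_requirement
    return cumulative, outages
```

For example, the trace [500, 200, 100, 400, 250] accumulates 1450 units of power and contains two distinct outage periods (the 200-100 run and the final 250 sample).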

Two SUN workstations were utilized in all experiments.

The EPS simulator ran on one workstation and sent telemetry packets to the HDS operating on another SUN workstation. The HDS then sent command-loads (lists of commands) to operate the simulator. This ensured that the controlling system was operating independently of the simulation.

Rule Discovery with the HUBBLE Discovery System

Initially, the simulator was run with no errors introduced, to determine the maximum power production (run R1). This produced a set of performance data giving the accumulated power production over time (see the first row of Table 1 and Figure 6).

The second run (run R2) was made with the Real-Time Monitor Blackboard System to determine the performance of the hand-crafted rules. Random switch errors were set. As the second row of Table 1 shows, the manually constructed rules of the Real-Time Monitor Blackboard System control the power level very well, allowing only 2 power outages, the fewest of any of the runs attempted. This run is graphically portrayed in Figure 6.

The first run with the HDS (HDS1), third row of Table 1, used perfect sensor information, described earlier as case 1. There were 11 power outages during the eight orbits, with 17 rules generated, as shown in the third row of Table 1. In the same period it retracted only 1 rule that it had learned from the rule-base. The rules generated were of higher quality and were more specific than the rule base generated with imperfect sensor information. The rule-base generated is shown in Appendix 1.

Run                                 Power Produced   Power Outages   Rules Created   Rules Retracted
No Errors Generated (R1)            8058             na              na              na
Real-Time Monitor (R2)              6810             2               na              na
HDS, perfect sensors (HDS1)         6281             11              17              1
HDS, imperfect sensors (HDS2)       6546             7               8               7
HDS, no rule storage (HDS3)         4064             39              na              na

Table 1: Comparison of HDS Testbed Cases



The performance of the HDS using imperfect sensor information (run HDS2), the second case, was not quite as good as that of the Real-Time Monitor. However, the power produced was only 5% less than in the Real-Time Monitor case. There were 7 power outages, an average number compared to the other runs. The system generated 8 rules and in the same period retracted 7 rules from the rule-base. The rule-base generated is shown in Appendix 1.

The HDS for case 3 did not operate well when unable to store any of the rules it created. It only generated approximately half the power possible in run HDS3. It is obvious from the last row of Table 1 that there were many times when the system was not able to quickly determine the correct repair. During this session, 39 power outages were experienced, the highest number of any of the experiments.

A comparison of the 5 runs of the Testbed is presented in Table 1 and Figure 7. The immediate observation is that the run with perfect sensor information was not as successful as the run with imperfect sensor information. One of the reasons this discrepancy occurred is the experimental methodology employed. Because only one error was generated at a time, a system with better performance would encounter more errors, as another error would be generated as soon as it successfully fixed the current one. If a controlling system could not

[Figure 6: Baseline Performance Graph. Accumulated power versus time (10 hours = 1 orbit) for the No Errors Generated run and the Real-Time Blackboard System run.]

fix an error for a lengthy period of time, then it would be exposed to fewer errors over the 8-orbit time period.

This proved to be the case, as the run with imperfect sensor information only encountered 11 errors, while the run with perfect sensor information repaired 21 errors. Over a longer period of time (more than 8 orbits), the HDS with perfect sensor information should perform better, due to a more complete and better quality rule base. In some cases the error would change completely when a mode transition occurred. In one such case, a switch that was in the wrong position during the day mode was in the right position when the mode changed to night. Yet the switch in question prevented another switch from changing position. The system fixed the new error by resetting both switches. In this case, when the mode transition occurred, the error changed as a result. Because the system was able to adapt, it tracked the error across the mode transition and repaired it.

The system was also tested with an incorrect domain theory, where some parameter information in the goal database was incorrect (errors were not sensed properly). In this case, some spurious errors were sensed. This impaired, but did not cripple, the learning capability. Bad rules were created and placed into the rule-base, but they were retracted when they did not work. Thus the system managed to function acceptably, although at a lower level of performance.


[Figure 7: HDS Performance Graph. Accumulated power versus time (10 hours = 1 orbit) for the No Errors Generated run, the HDS with complete theory, the HDS with incomplete theory, and the HDS with no learning.]

In the longest experiment recorded, one run continued for 34 passes, or orbits (equivalent to approximately 50 hours of simulated time and 5 hours of wall clock time). The run was conducted with the HDS operating with incomplete knowledge. During this time the HDS created 56 rules, correctly keeping the good rules and retracting the ones which did not work.

Related Work

To enable rule discovery by perturbing a real-time process, we have integrated our learning system with a simulator. Buchanan et al. [1] utilized a simulator to learn rules in the domain of high energy physics by induction over a large number of training examples. In this case, the simulator served as a generator of training examples from which rules were learned for an expert system. We use our simulator for the same purpose, but with a different learning mechanism. Because of the real-time nature of our system, we integrated our learning system into the performance element. This enabled us to take advantage of experimentation techniques.

Experimentation in machine learning has been used by many systems. FAHRENHEIT [9] uses a discovery program to design its own experiments, to make the system closer to real world applications. Zytkow applied these methods to the area of databases. Rajamoney and DeJong [6] have described an elegant approach to experimentation that is used to learn theories of physical devices. This work focuses on using experiment design, coupled with explanation-based learning, to revise incomplete, intractable, or incorrect domain theories. Experimentation is performed to deny or confirm predictions made by well-formed hypotheses and, if possible, reduce the number of multiple explanations. While FAHRENHEIT uses a quantitative model of the world, Rajamoney and DeJong employ qualitative techniques. We have begun to explore both paradigms in the HDS testbed.

Another system utilizing experimentation is PRODIGY [2]. PRODIGY experiments to improve the domain theory of a planning system. As with the work of Rajamoney and DeJong, PRODIGY experiments to discriminate between multiple explanations. As with the HDS, experimentation is demand-driven and uses both domain constraints and any external feedback received. The HDS is more domain specific and is tailored to an engineering application.

Conclusion

A methodology for discovering control knowledge in complex engineering systems has been presented. The addition of a discovery system to an expert system in a dynamic environment, such as process control, should offer an improvement in performance and enable an expert system to learn new knowledge and improve its existing knowledge base. Adding the ability to learn to current knowledge-based systems reduces certain brittleness concerns. The HDS is a demonstration of how such a system could be implemented.

The HDS was tested in a domain characterized by real-time operation and a dynamic environment. The system was able to discover adequate EPS control rules while starting with no rules in its knowledge base. The discovery process worked well when given either perfect sensor information or less perfect sensor information. The performance of the combined learning/problem solving system was compared to a control case of a blackboard system that handled multiple rule bases containing hand-crafted rules. This blackboard system, the Real-Time Monitor, performed only slightly better than the HDS after evaluation of several EPS performance criteria. The HDS was also tested in the case where it could not store any created rules. The performance of the discovery system was much worse when there was no memory of past experience.

Currently, the system is only able to deal with a certain subset of EPS problems. Obviously, a more practical system would have to deal with a wider range of parameters. Also, the system is only presented with situations where one error exists at a time. More robust algorithms must be tried to properly assign credit when multiple errors are present in the environment. A system capable of dealing with many parameters would require much more discriminating credit assignment.

Another area which should be explored is converting the rules learned into the format of the hand-crafted rules, so that the system could be added onto the existing Real-Time Monitor system. The discovery system was operated by itself in the testbed to obtain unbiased experimental data. Yet in a more practical application, one would expect the discovery system to work in tandem with an existing knowledge base. It could then add rules and improve the existing rules already formed.

One of the current problems with unsupervised learning is the problem of credit assignment. To this end, many different strategies have been tried. This area is critical, as was found even in the simple domain explored here. Once a rule is put into use, it may be discovered that the rule is in error. When does one decide that it is a bad rule, that it no longer contributes? Dynamic real-time systems offer the potential of immediate feedback in a repeatable fashion. For cases where there are multiple rules acting in sequence, a classifier system, as given in Holland et al. [5], may be more appropriate.


Acknowledgments

The support of the National Aeronautics and Space Administration, GSFC code 522.3, of the work presented here is gratefully acknowledged, although the positions taken are those of the authors alone.

References

1. Buchanan, B.G., Sullivan, J., Cheng, T.P., and Clearwater, S.H. 1988. Simulation-Assisted Inductive Learning. In Proceedings of AAAI-88, Saint Paul, MN. Morgan Kaufmann.

2. Carbonell, J.G. and Gil, Y. 1990. Learning by Experimentation: The Operator Refinement Method. In Machine Learning: An Artificial Intelligence Approach, Volume III, ed. Y. Kodratoff and R. Michalski. Morgan Kaufmann, San Mateo, CA.

3. Forbus, K.D. 1984. Qualitative Process Theory. Artificial Intelligence 24: 85-168.

4. Hieb, M.R. 1990. Knowledge Discovery in Complex Engineering Systems. MS Thesis, George Washington University.

5. Holland, J.H., Holyoak, K., Nisbett, R., and Thagard, P. 1986. Induction: Processes of Inference, Learning, and Discovery. MIT Press, Cambridge, MA.

6. Rajamoney, S.A. and DeJong, G.F. 1988. Active Explanation Reduction: An Approach to the Multiple Explanations Problem. In Proceedings of the 5th International Conference on Machine Learning, Ann Arbor, MI.

7. Silverman, B.G., Hieb, M.R., Yang, H., Wu, L., Truszkowski, W., and Dominy, R. 1989. Investigation of a Simulator-Trained Machine Discovery System for Knowledge Base Management Purposes. In Proceedings of the IJCAI-89 Workshop on Knowledge Discovery in Databases, Detroit, MI.

8. Silverman, B.G., Hieb, M.R., and Mezher, T.M. 1991. Unsupervised Discovery in an Operational Control Setting. In Knowledge Discovery in Databases, ed. G. Piatetsky-Shapiro and W. Frawley. MIT Press, Cambridge, MA.

9. Zytkow, J.M. 1989. Overcoming FAHRENHEIT's Experimentation Habit: Discovery System Uses a Database Assignment. In Proceedings of the IJCAI-89 Workshop on Knowledge Discovery in Databases, Detroit, MI.


Telemetry

Repair History

Critic

Commands

Parameter Selector

(Planner 1)

Strategy Selector (Planner 2) o Specialize o Generalize o Randomize

Object Oriented Simulator (Spacecrah)

Tesl IGeneral

Rules Generator

Meta Rules

Legend

III --

Blackboard

Knowledge Base

____ one-way link

-~___shy - two-way link

c=J one or more cooperating agents

B OataFile

Figure 2 Perfonner System Architecture

The Discovery Subsystem will Lhen propose a plausible control action to try This action is implemented and its effect is evaluated by a Critic Rules are either added to or retracted from the knowledge base of the Perfonner based upon the environmental feedback received

The Lhird agent is the Critic This agent must either fire a rule to repair the EPS (if the repair knowledge is in its rulemiddotbase) or must send a control action proposed by the Discovery Subsystem In either case it will monitor the result to see if the problem is repaired If the problem disappears then Critic hypotheses Lhat the rule is a valid one and stores it in its rule-base If the problem remains it either removes the old rule from Lhe knowledge base or docs not add the proposed rule

Design of the Discovery Subsystem

The Discovery Subsystem or Proposer consists of three agents plus a set of procedures cllioo the Meta system This system is shown in Figure 3 The system utilizes an object hierarchy to model the objects dealt with in the

system The Meta system contains heuristics for pursuing different convergence strategies depending upon the history of previous experimentation

The input to the Proposer is a set of abnonnal parameters a world state (listing of selected parameter values) and the stahlS of the problem (whether it is old or new) The abnonnal parameters detected are described in qualitative tenns as in Forbus Qualitative Process Theory [3] rather than specifying exact numbers or tolerances Thus the descriptions utilized are of quantities which are world stale variables that can take on numeric values but are reasoned about in the qualitative model in purely qualitative terms

The fIrst agent the Parameter Selector (or Planner) is only activated if the problem is new Its job is to suggest a set of sub-goals that consist of parameters associated with or related to the abnonnal parameters to the nex( agent the Strategy Selector (or Planner2) To determine the sub-goals the Parameter Selector utilizes an object hierarchy and retrieves the relevant parameters to uulize

Volume 5 umbcr 4 Winler 1992 7S

Input Abnormal paramatars

World stata Status (OldlNaw)

Planner Recommends

parameters for Planner2 to operate on

t Problem Translator

Translates parameters

Into the DisCovery System Data Structures

-

IControl-Action Proposer I Control Action

Planner2 Recomands

Problem Solving Strategies

Using the meta functions

t Meta Functions

Operates on Parameters

to Create Meta-Rules to Effect the Control

Actions

Output

Figure 3 Discovery Subsystem Architecture

Discovery Objects Name

Parameters Convergence-strategy Control-action Related-parameters

Parameter-Subclass Parameter-Subclass Symbolic Numeric Symbol-set Upper-bound Convergence-strategy Lower-bound

Convergence-strategy

Ammeter2 Ammeter3Switch 38 Switch 3b

Figure 4 Discovery Subsystem Object Hierarchy

in the next step A schematic depiction of the parameter suggest heuristics (ie meta-rules in the meta-rule hierarchy utilized is shown in Figure 4 base) for proposing new rules based upon the sub-goals

given by the Parameter Selector agent Basically it will The second agent the Strategy Selector (or Planner2) will find useful meta-rules and fire them to hypothesize new

The Journal of Knowl~ge Engineering 76

parameter value settings for the current sub-goal by proposing new control rules to either reset or adjust the settings that the previous control rule implemented

Sometimes the meta-rules may not be sufficient 10 solve the current problem and new meta-rules must be created There are three planning strategies used by the Strategy Selector 10 accomplish this same-line-of-investigation similar -line-of -investigation and new-line-of-investigation Thus the system cannot only create new problem-solving rules but it can also create and modify its own control knowledge The same-line-of-investigation strategy creates new meta-rules by changing the settings of some of the parameters in the rules New meta-rules will be put back in the meta-rule base and will suggest new ways of proposing new problem-solving rules If this planning strategy fails the similar-line-of-investigation will be employed which uses generalization and specialization 10 modify existing meta-rules If these two strategies fail then the planner resorts to its last method that of newmiddot line-of-investigation whicH is simply a random selection

The last agent is the Control-Action Proposer which takes the current meta-rule and creates a new rule which effects a control parameter The Control-Action Proposer then sends this rule to the Performer System to be tested

Experiments

Three cases were investigated with the HDS

1) HDS operation with perfect sensor information

2) HDS operation with imperfect sensor information

3) lIDS operation using imperfect sensor information without storing the generated rules

In all cases the EPS was run in a random error mode where any of the six switches could fail The conmiddot sequences of the failure ranged from mild (a failure in switch 1 at night had no effect) to severe (a failure in switch 4 always cut off power to the bus completely) The repair action consisted of resetting the switch to its proper position Only one error was generated at a time which meant that the present error had to be fixed prior to another error being generated

The flJst case consisted of nmning the HDS with perfect sensor information The HDS sampled aU the parameters and was able to ascertain switch position information so that it could directly deduce which switch needed to be reset The actual goal-list used is given in Figure 5

Complete Domain Theory

Parameter        Day (upper/lower)   Night (upper/lower)
Ammeter1         155 / 105           0 / 0
Ammeter2         155 / 105           0 / 0
Ammeter3         31 / 22             0 / 0
Ammeter4         31 / 22             31 / 9
Bus-Power        1010 / 300          1010 / 300
Switch1          2                   1
Switch2          2                   1
Switch3A         2                   1
Switch3B         1                   2
Switch4          2                   2
Battery Switch   2                   2

Incomplete Domain Theory

Parameter        Day (upper/lower)   Night (upper/lower)
Ammeter1         155 / 105           0 / 0
Ammeter2         155 / 105           0 / 0
Ammeter3         31 / 22             0 / 0
Ammeter4         31 / 22             31 / 9
Bus-Power        1010 / 300          1010 / 300

Figure 5: HUBBLE Discovery System Goal Lists
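Read as upper/lower bounds (an interpretation suggested by the Upper-bound and Lower-bound slots in Figure 4's object hierarchy), a goal list like the incomplete-theory one could flag abnormal parameters as in this sketch; the dictionary encoding is an assumption.

```python
# Goal list in the spirit of Figure 5's incomplete domain theory:
# for each mode, each sensed parameter maps to (upper_bound, lower_bound).
GOALS = {
    "day":   {"ammeter1": (155, 105), "ammeter2": (155, 105),
              "ammeter3": (31, 22),   "ammeter4": (31, 22),
              "bus_power": (1010, 300)},
    "night": {"ammeter1": (0, 0),     "ammeter2": (0, 0),
              "ammeter3": (0, 0),     "ammeter4": (31, 9),
              "bus_power": (1010, 300)},
}

def abnormal_parameters(mode, telemetry):
    """Return the sensed parameters that fall outside their goal bounds."""
    out = {}
    for name, (upper, lower) in GOALS[mode].items():
        value = telemetry.get(name)
        if value is not None and not (lower <= value <= upper):
            out[name] = value
    return out
```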

Volume 5 Number 4 Winter 1992 77

In the second case the HDS had only imperfect sensor information, with the ammeters and bus power as the sensed parameters. Since the errors were occurring in the switches, the system had to infer these errors from the sensed parameters. The goal-list used is also given in Figure 5.

The third case is just like the second, with one exception: the critic lacks memory and must rediscover good rules on each run. This case was used to verify that the rules created by the HDS produced an improvement in EPS operation over a random selection of rules.

Two baseline runs served as controls on the experiment. First, the simulator was run without any errors to determine the maximum accumulated power. Second, a blackboard system (the Real-Time Monitor Blackboard System) utilizing 11 different rule-bases was run to compare the performance of hand-crafted rules against that of the HDS's discovered rules. These rules were obtained from a systematic failure analysis and then optimized.

Cumulative power production over time is the primary measure of the controlling system's performance. Any delay in fixing errors due to incorrect, inconsistent, or incomplete rules leads to a drop in power production. Another measure is the number of power outages the system experiences. A power outage is a period when the power is insufficient to supply the bus requirements. As the system learns more and better rules, it experiences fewer outages, since it repairs errors using the rules that it learns.
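These two measures can be computed directly from a power trace. The sketch below assumes a fixed sampling interval and treats a maximal run of below-requirement samples as one outage; both are assumptions, since the paper does not give the sampling details.

```python
def performance_measures(power_samples, bus_requirement, dt_hours=1.0):
    """Return (cumulative power produced, number of outages) for a power trace.

    An outage is counted once per maximal run of samples in which produced
    power falls below the bus requirement.
    """
    cumulative = 0.0
    outages = 0
    in_outage = False
    for p in power_samples:
        cumulative += p * dt_hours
        if p < bus_requirement:
            if not in_outage:
                outages += 1                   # a new outage period begins
            in_outage = True
        else:
            in_outage = False
    return cumulative, outages
```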

Two SUN workstations were utilized in all experiments. The EPS simulator ran on one workstation and sent telemetry packets to the HDS operating on another SUN workstation. The HDS then sent command-loads (lists of commands) to operate the simulator. This ensured that the controlling system was operating independently of the simulation.

Rule Discovery with the HUBBLE Discovery System

Initially the simulator was run with no errors introduced, to determine the maximum power production (run R1). This produced a set of performance data giving the accumulated power production over time (see the first row of Table 1 and Figure 6).

The second run (run R2) was made with the Real-Time Monitor Blackboard System to determine the performance of the hand-crafted rules. Random switch errors were set. As the second row of Table 1 shows, the manually constructed rules of the Real-Time Monitor Blackboard System control the power level very well, allowing only 2 power outages, the fewest of any of the runs attempted. This run is graphically portrayed in Figure 6.

The first run with the HDS (HDS1), shown in the third row of Table 1, used perfect sensor information, described earlier as case 1. There were 11 power outages during the eight orbits, with 17 rules generated. In the same period it retracted only 1 rule that it had learned from the rule-base. The rules generated were of higher quality and were more specific than the rule base generated with imperfect sensor information. The rule-base generated is shown in Appendix 1.

Run                                        Power Produced   Power Outages   Rules Created   Rules Retracted
No Errors Generated (R1)                   8058             n/a             n/a             n/a
Real-Time Monitor (R2)                     6810             2               n/a             n/a
HDS, perfect sensor information (HDS1)     6281             11              17              1
HDS, imperfect sensor information (HDS2)   6546             7               8               7
HDS, no rule storage (HDS3)                4064             39              n/a             n/a

Table 1: Comparison of HDS Testbed Cases

The Journal of Knowledge Engineering 78


The performance of the HDS using imperfect sensor information (run HDS2), the second case, was not quite as good as that of the Real-Time Monitor. However, the power produced was only 5% less than in the Real-Time Monitor case. There were 7 power outages, an average number compared to the other runs. The system generated 8 rules and in the same period retracted 7 rules from the rule-base. The rule-base generated is shown in Appendix 1.

The HDS for case 3 did not operate well when unable to store any of the rules it created; it generated only approximately half the power possible in run HDS3. It is obvious from the last row of Table 1 that there were many times when the system was not able to quickly determine the correct repair. During this session 39 power outages were experienced, the highest number of any of the experiments.

A comparison of the 5 runs of the Testbed is presented in Table 1 and Figure 7. The immediate observation is that the run with perfect sensor information was not as successful as the run with imperfect sensor information. One reason this discrepancy occurred is the experimental methodology employed. Because only one error was generated at a time, a system with better performance would encounter more errors, as another error would be generated soon after it successfully fixed the current one. If a controlling system could not

fix an error for a lengthy period of time, then it would be exposed to fewer errors over the 8-orbit time period.

Figure 6: Baseline Performance Graph (accumulated power vs. time, 10 hours = 1 orbit; curves: No Errors Generated, Real-Time Blackboard System)

This proved to be the case, as the run with imperfect sensor information encountered only 11 errors, while the run with perfect sensor information repaired 21 errors. Over a longer period of time (more than 8 orbits), the HDS with perfect sensor information should perform better, due to a more complete and better quality rule base. In some cases the error would change completely when a mode transition occurred. In one such case, a switch that was in the wrong position during the day mode was in the right position when the mode changed to night; yet the switch in question prevented another switch from changing position. The system fixed the new error by resetting both switches. Thus, when the mode transition occurred, the error changed as a result, and because the system was able to adapt, it tracked the error across the mode transition and repaired it.

The system was also tested with an incorrect domain theory, where some parameter information in the goal database was incorrect (errors were not sensed properly). In this case some spurious errors were sensed. This impaired, but did not cripple, the learning capability. Bad rules were created and placed into the rule-base, but they were retracted when they did not work. Thus the system managed to function acceptably, although at a lower level of performance.
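The keep-good/retract-bad bookkeeping could look like the following sketch; the retraction threshold and scoring scheme are illustrative assumptions.

```python
class RuleCritic:
    """Tracks each rule's repair outcomes and nominates for retraction the
    rules that keep failing. The threshold is an illustrative assumption."""

    def __init__(self, retract_after=2):
        self.scores = {}                       # rule id -> (successes, failures)
        self.retract_after = retract_after

    def record(self, rule_id, repaired):
        """Record one application of the rule; `repaired` is True on success."""
        s, f = self.scores.get(rule_id, (0, 0))
        self.scores[rule_id] = (s + bool(repaired), f + (not repaired))

    def to_retract(self):
        """Rules that never succeeded and failed at least `retract_after` times."""
        return [r for r, (s, f) in self.scores.items()
                if s == 0 and f >= self.retract_after]
```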


Figure 7: HDS Performance Graph (accumulated power vs. time, 10 hours = 1 orbit; curves: No Errors Generated, HDS with Complete Theory, HDS with Incomplete Theory, HDS with no Learning)

In the longest experiment recorded, one run continued for 34 passes, or orbits (equivalent to approximately 50 hours of simulated time and 5 hours of wall-clock time). The run was conducted with the HDS operating with incomplete knowledge. During this time the HDS created 56 rules, correctly keeping the good rules and retracting the ones that did not work.

Related Work

To enable rule discovery by perturbing a real-time process, we have integrated our learning system with a simulator. Buchanan et al. [1] utilized a simulator to learn rules in the domain of high-energy physics by induction over a large number of training examples. In their case, the simulator served as a generator of training examples from which rules were learned for an expert system. We use our simulator for the same purpose, but with a different learning mechanism. Because of the real-time nature of our system, we integrated our learning system into the performance element. This enabled us to take advantage of experimentation techniques.

Experimentation in machine learning has been used by many systems. FAHRENHEIT [9] uses a discovery program to design its own experiments, bringing the system closer to real-world applications; Zytkow applied these methods to the area of databases. Rajamoney and DeJong [6] have described an elegant approach to experimentation that is used to learn theories of physical devices. This work focuses on using experiment design coupled with explanation-based learning to revise incomplete, intractable, or incorrect domain theories. Experimentation is performed to deny or confirm predictions made by well-formed hypotheses and, if possible, reduce the number of multiple explanations. While FAHRENHEIT uses a quantitative model of the world, Rajamoney and DeJong employ qualitative techniques. We have begun to explore both paradigms in the HDS testbed.

Another system utilizing experimentation is PRODIGY [2], which experiments to improve the domain theory of a planning system. As with the work of Rajamoney and DeJong, PRODIGY experiments to discriminate between multiple explanations. As with the HDS, experimentation is demand-driven and uses both domain constraints and any external feedback received. The HDS is more domain-specific and is tailored to an engineering application.

Conclusion

A methodology for discovering control knowledge in complex engineering systems has been presented. The addition of a discovery system to an expert system in a dynamic environment, such as process control, should offer an improvement in performance and enable an expert system to learn new knowledge and improve its existing knowledge base. Adding the ability to learn to current knowledge-based systems reduces certain brittleness concerns. The HDS is a demonstration of how such a system could be implemented.

The HDS was tested in a domain characterized by real-time operation and a dynamic environment. The system was able to discover adequate EPS control rules while starting with no rules in its knowledge base. The discovery process worked well whether given perfect or imperfect sensor information. The performance of the combined learning/problem-solving system was compared to a control case of a blackboard system that handled multiple rule bases containing hand-crafted rules. This blackboard system, the Real-Time Monitor, performed only slightly better than the HDS on several EPS performance criteria. The HDS was also tested in the case where it could not store any created rules; the performance of the discovery system was much worse when there was no memory of past experience.

Currently the system is only able to deal with a certain subset of EPS problems. Obviously, a more practical system would have to deal with a wider range of parameters. Also, the system is only presented with situations where one error exists at a time. More robust algorithms must be tried to properly assign credit when multiple errors are present in the environment. A system capable of dealing with many parameters would require much more discriminating credit assignment.

Another area that should be explored is converting the learned rules into the format of the hand-crafted rules, so that the system could be added onto the existing Real-Time Monitor system. The discovery system was operated by itself in the testbed to obtain unbiased experimental data. Yet in a more practical application, one would expect the discovery system to work in tandem with an existing knowledge base. It could then add rules and improve the rules already formed.

One of the current problems with unsupervised learning is credit assignment, and many different strategies have been tried. This area is critical, as was found even in the simple domain explored here. Once a rule is put into use, it may be discovered that the rule is in error. When does one decide that it is a bad rule, that it no longer contributes? Dynamic real-time systems offer the potential of immediate feedback in a repeatable fashion. For cases where multiple rules act in sequence, a classifier system as given in Holland et al. [5] may be more appropriate.
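For such rule sequences, a classifier-system-style credit pass (in the spirit of Holland's bucket brigade) might look like this sketch; the bid fraction and the representation are illustrative assumptions.

```python
def bucket_brigade(strengths, chain, payoff, bid_fraction=0.1):
    """One bucket-brigade-style credit pass over a chain of rules that fired
    in sequence: each rule pays a fraction of its strength to its predecessor,
    and the last rule in the chain collects the environmental payoff."""
    strengths = dict(strengths)                # do not mutate the caller's copy
    for earlier, later in zip(chain, chain[1:]):
        bid = bid_fraction * strengths[later]
        strengths[later] -= bid                # later rule pays its bid...
        strengths[earlier] += bid              # ...to the rule that set it up
    strengths[chain[-1]] += payoff             # environment rewards the last rule
    return strengths
```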


Acknowledgments

The support of the National Aeronautics and Space Administration, GSFC code 5223, for the work presented here is gratefully acknowledged, although the positions taken are those of the authors alone.

References

1. Buchanan, B.G., Sullivan, J., Cheng, T.P., and Clearwater, S.H. 1988. "Simulation-Assisted Inductive Learning." In Proceedings of AAAI-88, Saint Paul, MN. Morgan Kaufmann.

2. Carbonell, J.G., and Gil, Y. 1990. "Learning by Experimentation: The Operator Refinement Method." In Machine Learning: An Artificial Intelligence Approach, Volume III, ed. Y. Kodratoff and R. Michalski. San Mateo, CA: Morgan Kaufmann.

3. Forbus, K.D. 1984. "Qualitative Process Theory." Artificial Intelligence 24: 85-168.

4. Hieb, M.R. 1990. Knowledge Discovery in Complex Engineering Systems. M.S. thesis, George Washington University.

5. Holland, J.H., Holyoak, K., Nisbett, R., and Thagard, P. 1986. Induction: Processes of Inference, Learning, and Discovery. Cambridge, MA: MIT Press.

6. Rajamoney, S.A., and DeJong, G.F. 1988. "Active Explanation Reduction: An Approach to the Multiple Explanations Problem." In Proceedings of the 5th International Conference on Machine Learning, Ann Arbor, MI.

7. Silverman, B.G., Hieb, M.R., Yang, H., Wu, L., Truszkowski, W., and Dominy, R. 1989. "Investigation of a Simulator-Trained Machine Discovery System for Knowledge Base Management Purposes." In Proceedings of the IJCAI-89 Workshop on Knowledge Discovery in Databases, Detroit, MI.

8. Silverman, B.G., Hieb, M.R., and Mezher, T.M. 1991. "Unsupervised Discovery in an Operational Control Setting." In Knowledge Discovery in Databases, ed. G. Piatetsky-Shapiro and W. Frawley. Cambridge, MA: MIT Press.

9. Zytkow, J.M. 1989. "Overcoming FAHRENHEIT's Experimentation Habit: Discovery System Uses a Database Assignment." In Proceedings of the IJCAI-89 Workshop on Knowledge Discovery in Databases, Detroit, MI.


Figure 3: Discovery Subsystem Architecture. Components: Input (abnormal parameters, world state, status old/new); Problem Translator (translates parameters into the discovery system data structures); Planner1 (recommends parameters for Planner2 to operate on); Planner2 (recommends problem-solving strategies using the meta functions); Meta Functions (operate on parameters to create meta-rules to effect the control actions); Control-Action Proposer (output: a control action).

Figure 4: Discovery Subsystem Object Hierarchy. Discovery Objects carry a name, parameters, a convergence strategy, a control action, and related parameters. Parameters subclass into Symbolic (symbol-set, convergence strategy) and Numeric (upper bound, lower bound, convergence strategy); example instances include Ammeter2, Ammeter3, Switch3A, and Switch3B.
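One way to render Figure 4's hierarchy in code (the class structure and defaults are assumptions; the slot and instance names are taken from the figure):

```python
from dataclasses import dataclass, field

@dataclass
class Parameter:
    """Base parameter class from the Figure 4 hierarchy."""
    name: str
    convergence_strategy: str = "hill-climb"   # strategy name is an assumption

@dataclass
class SymbolicParameter(Parameter):
    symbol_set: tuple = ()                     # legal symbolic values

@dataclass
class NumericParameter(Parameter):
    upper_bound: float = 0.0
    lower_bound: float = 0.0

@dataclass
class DiscoveryObject:
    """Top-level discovery object: name, parameters, strategy, control action."""
    name: str
    parameters: list = field(default_factory=list)
    control_action: str = ""
    related_parameters: list = field(default_factory=list)

# Example instances named in Figure 4:
ammeter2 = NumericParameter("Ammeter2", upper_bound=155, lower_bound=105)
switch3a = SymbolicParameter("Switch3A", symbol_set=(1, 2))
```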

in the next step. A schematic depiction of the parameter hierarchy utilized is shown in Figure 4.

The second agent, the Strategy Selector (or Planner2), suggests heuristics (i.e., meta-rules in the meta-rule base) for proposing new rules based upon the sub-goals given by the Parameter Selector agent. Basically, it finds useful meta-rules and fires them to hypothesize new parameter value settings for the current sub-goal, proposing new control rules to either reset or adjust the settings that the previous control rule implemented.

Sometimes the meta-rules may not be sufficient 10 solve the current problem and new meta-rules must be created There are three planning strategies used by the Strategy Selector 10 accomplish this same-line-of-investigation similar -line-of -investigation and new-line-of-investigation Thus the system cannot only create new problem-solving rules but it can also create and modify its own control knowledge The same-line-of-investigation strategy creates new meta-rules by changing the settings of some of the parameters in the rules New meta-rules will be put back in the meta-rule base and will suggest new ways of proposing new problem-solving rules If this planning strategy fails the similar-line-of-investigation will be employed which uses generalization and specialization 10 modify existing meta-rules If these two strategies fail then the planner resorts to its last method that of newmiddot line-of-investigation whicH is simply a random selection

The last agent is the Control-Action Proposer which takes the current meta-rule and creates a new rule which effects a control parameter The Control-Action Proposer then sends this rule to the Performer System to be tested

Experiments

Three cases were investigated with the HDS

1) HDS operation with perfect sensor information

2) HDS operation with imperfect sensor information

3) lIDS operation using imperfect sensor information without storing the generated rules

In all cases the EPS was run in a random error mode where any of the six switches could fail The conmiddot sequences of the failure ranged from mild (a failure in switch 1 at night had no effect) to severe (a failure in switch 4 always cut off power to the bus completely) The repair action consisted of resetting the switch to its proper position Only one error was generated at a time which meant that the present error had to be fixed prior to another error being generated

The flJst case consisted of nmning the HDS with perfect sensor information The HDS sampled aU the parameters and was able to ascertain switch position information so that it could directly deduce which switch needed to be reset The actual goal-list used is given in Figure 5

Complete Domain Theory

Day Night

Ammeter 1 Ammetcr2 Ammeter3 Ammeter4 Bus-Power Switchl Switch2 Switch3A Switch38 Switch4 Battery

Switch

155 155 31 31 10lD 2 2 2 1 2 2

105 105 22 22

300 2 2 2 1 2 2

Ammeterl Ammeter Ammeter3 Ammeter4 Bus-Power Switchl Switch Switch3A SwitCh3B Switch4 Bauery

Switch

0 0 0 3l 1010 1 I 1 2 2 2

0 Q 0 9 300

I I 1 2 2 2

Incomplete Domain Theory

Day Sighl

Ammeter1 155 105 Ammeter I 0 Q Ammeter2 Ammeter3

15S 31

105 Ammelerl Ammeter3

0 Q

0 Q

Ammeter4 31 22 Ammeter4 I 9 Bus-Power 1010 300 Bus-Power 1010 300

Figure S HUBBLE Discovery System Goal Lists

Volume 5 Number 4 Winter 1992 77

In the second case the HDS only had imperfcct sensor information with ammeters and bus power as the sensed par~lIntcrs Since the errors were occurring in the switches the system had to infer thcse errors from the scnscJ parameters The goal-list used is also given in Figure 5

The third case is just like the second with one exception The critic lacks memory It must rediscover good rules on each run This was to verify that the rules created by the HDS produced an improvement in EPS operation over a random selection of rules

Two baseline runs were controls on the experiment First the simulator was run without any errors to determine the maximwn accumulated power Second a blackboard system (the Real-Time Monitor Blackboard System) utilizing 11 different rule-bases was run to compare the performance of hand-crafted rules against the performance of the HOSs discovered rules These rules were obtained from a systematic failure analysis and then optimized

Cumulative power production over time is the primary mlt1sure of the controlling systems performance Any delay in fixing errors due to incorrcct inconsislCnt or incomplete rules lead to a drop in powcr production Another measure is the number of power outages the system experiences A power outage is a period when the power is insufficient to supply the bus requirements As the system learns more and bener rules it experiences fewer outages It repairs errors using the rules that it learns

Two SUN workstations were utilized in aU experiments

The EPS simulator rJn on onc workstation and sent telemetry packets to the HOS operating on anottler SUN workSlation The HDS then sent command-loads (lists of commands) to operate the simulator This ensured thal the controlling system was operating independently of the simulation

Rule Discovery with the HUBBLE Discoverv System

- Initially the simulator was run with no errors introduced to determine the maximum power production (run Rl) This produced a set of performance data giving the accumulated power production over time (see the first row of Table 1 and Figure 6)

The second run (run R2) was made with the Real-Time Monitor Blackboard System to determine the performance of the hand-crafted rules Random switch errors were sel As the second row of Table 1 shows the manually constructed rules of the Real-Time Monitor Blackboard System control the power level very well allowing only 2 power outages the fewest of any of the runs attempted This run is graphically portrayed in Figure 6

The nrst run with the HOS (HOS 1) third row of Table 1 used perfect sensor information described earlier as case 1 There were 11 power outages during the eight orbits with 17 rules generated as shown in the third row of Table 1 In the same period it retracted only I rule that it had learned from the rule-base The rules generated were of higher quality and were more specific than the rule base generated with imperfect sensor information The rule-base generated is shown in Appendix 1

MEASURES

RUNS Number of Rules Number of RulesPower Produced Number of Power Created RetractedOutages

No Errors Generated (Rl) na

RealmiddotTime Monitor (R2)

8058 na na

na

HDS with perfect sensor

na6810 2

17 16281 11 information (HDS 1)

HDS with imperfect 6546 7 78 sensor information (HDS2)

HOS with no rule storage 4064 nana39 (HDS3)

Table 1 Comparison of HDS Testbed Cases

The Journal of Knowledge Engineering 78

IltXXX

CI) -r I 80000

c bull d

~ 600)-

S = v -I 40000 0 r c r QI

~ 20000 c

0

The performance of the HDS using imperfect sensor information (run HDS2) for the second case was not quite as good as the HDS operating with imperfect sensor information However the power produced was only 5 less than the case with the Real-Time Monitor There were 7 power OUlages an average number compared to the other runs The system generated 8 rules and in the same period retracted 7 rules from the rule-base The rule-base generated is shown in Appendix 1

The HDS for case 3 did not operate weU when unable to store any of the rules it created It only generated approximately half the power possible in run HDS3 It is obvious from the last row of Table 1 that there were many times when the system was not able to quickly determine the correct repair During this session 39 power outages were experienced the highest number of any of the experiments

A comparison of the 5 runs of the Testbed is presented in Table 1 and Figure 7middot The immediate observation is that the run with perfect sensor information was not as successful as the run with imperfect sensor information One of the reasons that this discrepancy occurred is due to the experimental methodology employed Because only one error was genemted at a time a system with better performance would encounter more errors as it would expect another error to be generated after it successfully fixed the current error If a controlling system could not

-pound1shy No Errors Generaled shy _shyshy Real-Time Blackboard Syslem

~ -shy

~ a rfI1

If

tshy

L77

shy1 o 20 40 iii)

Time (10 hours =I orhit)

Figure 6 Baseline Performance Graph

fix an error for a lengthy period of time then it would be exposed to fewer errors over the 8 orbit time period

This proved to be the case as the run with imperfcct sensor information only encountered 11 errors while the run with perfect sensor information repaired 21 errors Over a longer period of time (more than 8 orbits) the HDS with perfect sensor information should perform beuer due to a more complete and better quality rule base In some cases the error would change completely when a mode transition occurred In one such case a switch that w~ in the wrong position during the day mode was in the right position when the mode changed to night Yet the switch in question prevented another switch from changing position The system fued the new error by resetting both switches In this case when the mode transition occurred the error changed as a result Because the system was able to adapt it tracked the error across the mode transition and repaired it

The system was also tested with an incorrect domain theory where some parameter information in the goal database was incorrect (errors were not sensed properly) In this case some spurious errors were sensed This impaired but it did not cripple the learning capability Bad rules were created and placed into the rule-base but they were retracted when they did not work Thus the system managed to function acceptably although at a lower level of performance

Volume 5 Number 4 Winter 1992 79

tt~1 r-----------------------------------

---J--l--shy --__-_ -0- No Errors GCllcrlhJ I

MUlll - -0- IIl)S with CUlllpiclC Thcory -----1lZ[- III I I )

liDS witlllmolllplclc Them) I -shy BLJS with IlU Lcamillg 1----f-----1-----=ooo---- ~~----------J ~ l~

60Jll

-sect-~ I ~ 4(O---~r---_+---~-~~~~-_4~~-~--__4 I c loo lt- o c 2~t----r_-~~~~lt~~~~j_----_i------r---i----_l-1

01F---~----_+-----~-----+_----~---~---+_--~-J Il 20 IlII

Time (10 hours =1 urbit)

Figure 7 HOS Perfonnance Graph

In the longest experiment recorded one run continued for 34 passes or orbits (equivalent to approximately 50 hours of simulated time and 5 hours of wall clock time) The run was conducted with the HDS operating with incomplete knowledge During this time the HDS created 56 rules correctly keeping the good rules and retracting the ones which did not work

Related Work

To enable rule discovery by penurbing a realmiddottime process we have integrated our learning system with a simulator Buchanan et a1 [1] utilized a simulator to learn rules in the domain of high energy physics by induction over a large number of training examples In this case the simulator served as a generator of training examples from which rules were learned for an expen system We use our simulator for the same purpose but with a different learning mechanism Because of the real-time nature of our system we integrated our learning system into the perfonnance element This enabled us to take advantage of experimentation techniques

Experimentation in machine learning has been used by many systems FAHRENHEIT [9] uses a discovery program to design its own experiments to make the system closer to real world applications Zytkow applied these methods to the area of databases Rajamoney and DeJong [6] have described an elegant approach to experimentation

that is used to learn theories of physical devices This work focuses on using experiment design coupled with explanation-based learning to revise incomplete inuactable or incorrect domain theories Experimentation is perfonned to deny or conftrm predictions made by wellshyfonned hypotheses and if possible reduce the number of multiple explanations While FAHRENHEIT uses a quantitative mode) of the world Rajamoney and DeJong employ qualitative techniques We have begun to explore both paradigms in the HDS testbed

Another system utilizing experimentation is PRODIGY [2] PRODIGY experiments to improve the domain theory of a planning system As with the work of Rajamoney and DeJong PRODIGY experiments to discriminate between multiple explanations As with HOS experimentation is demand-driven and uses both domain constraints and any external feedback received HOS is more domain specific and is tailored to an engineering application

Conclusion

A methodology for discovering control knowledge in complex engineering systems has been presented The addition of a discovery system to an expert system in a dynamic environment such as process control should offer an improvement in performance and enable an expen system to learn new knowledge and improve its existing

The Journal of Knowledge Engineering 80

knowledge base Adding the ability to learn in CWTent knowledge-based systems reduces certain brittleness concerns The HDS is a demonstration of how such a system could be implemented

The HDS was tested in a domain characterized by realshytime operation and a dynamic environment The system was able to discover adequate EPS control rules while starting with no rules in its knowledge base The discovery process worked well when given either perfect sensor information or when given less perfect sensor information The performance of the combined learningproblem solving system was compared to a control case of a blackboard system that handled multiple rule bases that contained hand-crafted rules This blackboard system the Real-Time Monitor performed only slightly better than the HDS after evaluation of several EPS perfonnance criteria The HDS was tested in the case where it could not store any created rules The pertonnance of the discovery system was much worse if there was no memory of past experience

Currently the system is only able to deal with a certain subset of EPS problems Obviously a more practical system would have to deal with a wider range of parameters Also the system is only presented with situations where only one error exists at a time More robust algorithms must be tried to properly assign credit when multiple errors are present in the environment A system capable of dealing with many parameters would require a much more discriminating credit assignment

Another area which should be explored is to conven the rules learned into the format of the hand-crafted rules so that the system could be added onto the existing RealshyTime MonitOr system The discovery system was operated by itself in the testbed to obtain unbiased experimental dala Yet in a more practical application one would expect the discovery system to work in tandem with an existing knowledge base It could then add rules and improve the existing rules already formed

One of the current problems with unsupervised learning is the problem of credit assignment To this end many different strategies have been tried This area is critical as was found even in the simple domain explored here Once a rule is put into use it may be discovered that the rule is in error When docs one dccide that it is a bad rule thal it no longer contributes DynamiC real-time systems offer a potential of immediate feedback in a repeatable fashion For cases where there are multiple rules acting in sequence a classifier system as given in Holland et aI (5] may be more appropriate

Volume 5 ~umb~r 4 Winter 1992


parameter value settings for the current sub-goal by proposing new control rules to either reset or adjust the settings that the previous control rule implemented.

Sometimes the meta-rules may not be sufficient to solve the current problem, and new meta-rules must be created. Three planning strategies are used by the Strategy Selector to accomplish this: same-line-of-investigation, similar-line-of-investigation, and new-line-of-investigation. Thus the system can not only create new problem-solving rules, but can also create and modify its own control knowledge. The same-line-of-investigation strategy creates new meta-rules by changing the settings of some of the parameters in the rules; the new meta-rules are put back in the meta-rule base and suggest new ways of proposing problem-solving rules. If this planning strategy fails, similar-line-of-investigation is employed, which uses generalization and specialization to modify existing meta-rules. If these two strategies fail, the planner resorts to its last method, new-line-of-investigation, which is simply a random selection.
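The escalation through the three planning strategies can be sketched as follows. This is a minimal illustration only, not the HDS implementation: the dict encoding of meta-rules, the parameter-space argument, and all function names are our own assumptions.

```python
import random

def same_line(meta_rule):
    """Same-line-of-investigation: change the setting of one parameter
    in an existing meta-rule (the dict encoding is hypothetical)."""
    varied = dict(meta_rule)
    param = sorted(varied)[0]
    varied[param] = varied[param] + 1      # propose a neighbouring setting
    return varied

def similar_line(meta_rule):
    """Similar-line-of-investigation: generalize by dropping a condition."""
    generalized = dict(meta_rule)
    generalized.pop(sorted(generalized)[-1], None)
    return generalized

def new_line(parameter_space, rng=random):
    """New-line-of-investigation: fall back to a random selection."""
    return {p: rng.choice(values) for p, values in parameter_space.items()}

def strategy_selector(meta_rule, acceptable, parameter_space):
    """Escalate through the three strategies until one yields a
    meta-rule that the `acceptable` predicate admits."""
    for candidate in (same_line(meta_rule), similar_line(meta_rule)):
        if acceptable(candidate):
            return candidate
    return new_line(parameter_space)
```

The design point is the ordering: cheap local variation is tried before generalization, and random search is the last resort.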

The last agent is the Control-Action Proposer, which takes the current meta-rule and creates a new rule that affects a control parameter. The Control-Action Proposer then sends this rule to the Performer System to be tested.

Experiments

Three cases were investigated with the HDS:

1) HDS operation with perfect sensor information.

2) HDS operation with imperfect sensor information.

3) HDS operation with imperfect sensor information, without storing the generated rules.

In all cases the EPS was run in a random-error mode where any of the six switches could fail. The consequences of a failure ranged from mild (a failure in Switch 1 at night had no effect) to severe (a failure in Switch 4 always cut off power to the bus completely). The repair action consisted of resetting the switch to its proper position. Only one error was generated at a time, which meant that the present error had to be fixed before another error was generated.
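The single-error protocol can be sketched as a loop that injects a new random switch failure only after the current one has been repaired. The switch names follow Figure 5; everything else (the function name, the repair callback) is hypothetical and not the NASA testbed's actual interface.

```python
import random

# The six switches that could fail (names follow Figure 5).
SWITCHES = ["Switch1", "Switch2", "Switch3A", "Switch3B", "Switch4", "Battery Switch"]

def run_error_mode(steps, repair, rng=random.Random(0)):
    """Single-error protocol: inject one random switch failure at a time,
    generating the next failure only after `repair` fixes the current one."""
    failed = None
    log = []
    for _ in range(steps):
        if failed is None:
            failed = rng.choice(SWITCHES)   # new random failure
            log.append(("fail", failed))
        if repair(failed):                  # reset the switch to its proper position
            log.append(("repair", failed))
            failed = None
    return log
```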

The first case consisted of running the HDS with perfect sensor information. The HDS sampled all the parameters and was able to ascertain switch position information, so that it could directly deduce which switch needed to be reset. The actual goal-list used is given in Figure 5.

Complete Domain Theory

Parameter        Day          Night
Ammeter1         155   105    0     0
Ammeter2         155   105    0     0
Ammeter3         31    22     0     0
Ammeter4         31    22     31    9
Bus-Power        1010  300    1010  300
Switch1          2            1
Switch2          2            1
Switch3A         2            1
Switch3B         1            2
Switch4          2            2
Battery Switch   2            2

Incomplete Domain Theory

Parameter        Day          Night
Ammeter1         155   105    0     0
Ammeter2         155   105    0     0
Ammeter3         31    22     0     0
Ammeter4         31    22     31    9
Bus-Power        1010  300    1010  300

Figure 5. HUBBLE Discovery System Goal Lists
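Read as per-mode acceptance ranges, the incomplete-theory goal list lends itself to a simple deviation check such as the critic must perform. The (high, low) range interpretation of the paired values and all identifiers below are our assumptions, not the HDS's actual representation.

```python
# Incomplete-domain-theory goal list (values from Figure 5): for each
# mode, a (high, low) pair of acceptable readings per sensed parameter.
GOALS = {
    "day":   {"ammeter1": (155, 105), "ammeter2": (155, 105),
              "ammeter3": (31, 22),   "ammeter4": (31, 22),
              "bus-power": (1010, 300)},
    "night": {"ammeter1": (0, 0), "ammeter2": (0, 0),
              "ammeter3": (0, 0), "ammeter4": (31, 9),
              "bus-power": (1010, 300)},
}

def deviations(mode, telemetry):
    """Return the sensed parameters whose readings fall outside the
    goal range for the current mode."""
    bad = []
    for param, (high, low) in GOALS[mode].items():
        if not (low <= telemetry[param] <= high):
            bad.append(param)
    return bad
```

With only these five sensed parameters, a deviation tells the system that some switch is wrong, but not which one; that is the inference burden described for the second case.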


In the second case the HDS had only imperfect sensor information, with the ammeters and bus power as the sensed parameters. Since the errors were occurring in the switches, the system had to infer these errors from the sensed parameters. The goal-list used is also given in Figure 5.

The third case is just like the second, with one exception: the critic lacks memory and must rediscover good rules on each run. This was to verify that the rules created by the HDS produced an improvement in EPS operation over a random selection of rules.

Two baseline runs served as controls on the experiment. First, the simulator was run without any errors to determine the maximum accumulated power. Second, a blackboard system (the Real-Time Monitor Blackboard System) utilizing 11 different rule-bases was run to compare the performance of hand-crafted rules against that of the HDS's discovered rules. These rules were obtained from a systematic failure analysis and then optimized.

Cumulative power production over time is the primary measure of the controlling system's performance. Any delay in fixing errors due to incorrect, inconsistent, or incomplete rules leads to a drop in power production. Another measure is the number of power outages the system experiences. A power outage is a period when the power is insufficient to supply the bus requirements. As the system learns more and better rules, it experiences fewer outages, since it repairs errors using the rules that it learns.
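Both measures can be computed from a sampled power series. The following is a sketch under the assumption that an outage is a contiguous run of samples below the bus demand; the testbed's actual sampling scheme is not described in the paper.

```python
def performance_measures(power_series, demand):
    """Return (cumulative power produced, number of power outages),
    where an outage is a contiguous period with production below
    the bus demand."""
    cumulative = sum(power_series)
    outages = 0
    in_outage = False
    for p in power_series:
        if p < demand and not in_outage:
            outages += 1                    # a new outage begins
        in_outage = p < demand
    return cumulative, outages
```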

Two SUN workstations were utilized in all experiments. The EPS simulator ran on one workstation and sent telemetry packets to the HDS operating on the other. The HDS then sent command-loads (lists of commands) to operate the simulator. This ensured that the controlling system was operating independently of the simulation.
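The two-workstation split can be sketched with local sockets standing in for the two SUNs. The JSON packet format and every name below are invented for illustration; the paper does not specify the actual telemetry or command-load formats.

```python
import json
import socket
import threading

def simulator(conn):
    """EPS-simulator side: emit a telemetry packet, then acknowledge
    the command-load it receives (packet format is hypothetical)."""
    reader = conn.makefile()
    conn.sendall((json.dumps({"bus-power": 950, "mode": "day"}) + "\n").encode())
    command_load = json.loads(reader.readline())          # list of commands
    conn.sendall((json.dumps({"ack": len(command_load)}) + "\n").encode())

def controller(host, port):
    """HDS side: read telemetry, reply with a command-load, read the ack."""
    with socket.create_connection((host, port)) as s:
        reader = s.makefile()
        telemetry = json.loads(reader.readline())
        s.sendall((json.dumps(["reset Switch4"]) + "\n").encode())
        return telemetry, json.loads(reader.readline())

# Run both ends over the loopback interface.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=lambda: simulator(server.accept()[0]), daemon=True).start()
telemetry, ack = controller("127.0.0.1", port)
```

Decoupling the controller from the simulator behind a message interface is what keeps the experimental results unbiased: the learner can only see what a real ground system would see.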

Rule Discovery with the HUBBLE Discovery System

Initially the simulator was run with no errors introduced to determine the maximum power production (run R1). This produced a set of performance data giving the accumulated power production over time (see the first row of Table 1 and Figure 6).

The second run (run R2) was made with the Real-Time Monitor Blackboard System to determine the performance of the hand-crafted rules. Random switch errors were set. As the second row of Table 1 shows, the manually constructed rules of the Real-Time Monitor Blackboard System control the power level very well, allowing only 2 power outages, the fewest of any of the runs attempted. This run is graphically portrayed in Figure 6.

The first run with the HDS (HDS1), shown in the third row of Table 1, used perfect sensor information, described earlier as case 1. There were 11 power outages during the eight orbits, with 17 rules generated. In the same period it retracted only 1 rule that it had learned from the rule-base. The rules generated were of higher quality and more specific than the rule base generated with imperfect sensor information. The rule-base generated is shown in Appendix 1.

RUNS                                          Power     Power    Rules    Rules
                                              Produced  Outages  Created  Retracted
No Errors Generated (R1)                      8058      n/a      n/a      n/a
Real-Time Monitor (R2)                        6810      2        n/a      n/a
HDS with perfect sensor information (HDS1)    6281      11       17       1
HDS with imperfect sensor information (HDS2)  6546      7        8        7
HDS with no rule storage (HDS3)               4064      39       n/a      n/a

Table 1. Comparison of HDS Testbed Cases

The Journal of Knowledge Engineering 78


The performance of the HDS using imperfect sensor information (run HDS2) in the second case was not quite as good as that of the Real-Time Monitor. However, the power produced was only 5% less than in the Real-Time Monitor case. There were 7 power outages, an average number compared to the other runs. The system generated 8 rules and in the same period retracted 7 rules from the rule-base. The rule-base generated is shown in Appendix 1.

The HDS for case 3 did not operate well when unable to store any of the rules it created, generating only approximately half the possible power in run HDS3. It is obvious from the last row of Table 1 that there were many times when the system could not quickly determine the correct repair. During this session 39 power outages were experienced, the highest number of any of the experiments.

A comparison of the 5 runs of the testbed is presented in Table 1 and Figure 7. The immediate observation is that the run with perfect sensor information was not as successful as the run with imperfect sensor information. One reason for this discrepancy is the experimental methodology employed. Because only one error was generated at a time, a better-performing system would encounter more errors: a new error was generated as soon as it successfully fixed the current one. If a controlling system could not

[Graph: accumulated power vs. time (10 hours = 1 orbit) for the No Errors Generated and Real-Time Blackboard System runs.]

Figure 6. Baseline Performance Graph

fix an error for a lengthy period of time, then it would be exposed to fewer errors over the eight-orbit time period.

This proved to be the case, as the run with imperfect sensor information encountered only 11 errors, while the run with perfect sensor information repaired 21 errors. Over a longer period of time (more than eight orbits), the HDS with perfect sensor information should perform better due to a more complete and higher-quality rule base. In some cases the error would change completely when a mode transition occurred. In one such case, a switch that was in the wrong position during the day mode was in the right position when the mode changed to night, yet the switch in question prevented another switch from changing position. The system fixed the new error by resetting both switches. In this case the error changed as a result of the mode transition; because the system was able to adapt, it tracked the error across the transition and repaired it.

The system was also tested with an incorrect domain theory, where some parameter information in the goal database was wrong (errors were not sensed properly). In this case some spurious errors were sensed. This impaired, but did not cripple, the learning capability. Bad rules were created and placed into the rule-base, but they were retracted when they did not work. Thus the system managed to function acceptably, although at a lower level of performance.


[Graph: accumulated power vs. time (10 hours = 1 orbit) for the No Errors Generated, HDS with Complete Theory, HDS with Incomplete Theory, and HDS with no Learning runs.]

Figure 7. HDS Performance Graph

In the longest experiment recorded, one run continued for 34 passes, or orbits (equivalent to approximately 50 hours of simulated time and 5 hours of wall-clock time). The run was conducted with the HDS operating with incomplete knowledge. During this time the HDS created 56 rules, correctly keeping the good rules and retracting the ones which did not work.

Related Work

To enable rule discovery by perturbing a real-time process, we have integrated our learning system with a simulator. Buchanan et al. [1] utilized a simulator to learn rules in the domain of high-energy physics by induction over a large number of training examples. In this case the simulator served as a generator of training examples from which rules were learned for an expert system. We use our simulator for the same purpose, but with a different learning mechanism. Because of the real-time nature of our system, we integrated our learning system into the performance element. This enabled us to take advantage of experimentation techniques.

Experimentation in machine learning has been used by many systems. FAHRENHEIT [9] uses a discovery program to design its own experiments, bringing the system closer to real-world applications; Zytkow applied these methods to the area of databases. Rajamoney and DeJong [6] have described an elegant approach to experimentation

that is used to learn theories of physical devices. This work focuses on using experiment design coupled with explanation-based learning to revise incomplete, intractable, or incorrect domain theories. Experimentation is performed to deny or confirm predictions made by well-formed hypotheses and, if possible, to reduce the number of multiple explanations. While FAHRENHEIT uses a quantitative model of the world, Rajamoney and DeJong employ qualitative techniques. We have begun to explore both paradigms in the HDS testbed.

Another system utilizing experimentation is PRODIGY [2], which experiments to improve the domain theory of a planning system. As with the work of Rajamoney and DeJong, PRODIGY experiments to discriminate between multiple explanations. As with the HDS, experimentation is demand-driven and uses both domain constraints and any external feedback received. The HDS is more domain-specific and is tailored to an engineering application.

Conclusion

A methodology for discovering control knowledge in complex engineering systems has been presented. The addition of a discovery system to an expert system in a dynamic environment, such as process control, should offer an improvement in performance and enable an expert system to learn new knowledge and improve its existing

The Journal of Knowledge Engineering 80

knowledge base. Adding the ability to learn to current knowledge-based systems reduces certain brittleness concerns. The HDS is a demonstration of how such a system could be implemented.

The HDS was tested in a domain characterized by real-time operation and a dynamic environment. The system was able to discover adequate EPS control rules while starting with no rules in its knowledge base. The discovery process worked well when given either perfect or imperfect sensor information. The performance of the combined learning/problem-solving system was compared to a control case of a blackboard system that handled multiple rule-bases containing hand-crafted rules. This blackboard system, the Real-Time Monitor, performed only slightly better than the HDS on several EPS performance criteria. The HDS was also tested in the case where it could not store any created rules; the performance of the discovery system was much worse when there was no memory of past experience.

Currently the system is only able to deal with a certain subset of EPS problems. Obviously, a more practical system would have to deal with a wider range of parameters. Also, the system is only presented with situations where one error exists at a time. More robust algorithms must be tried to properly assign credit when multiple errors are present in the environment. A system capable of dealing with many parameters would require much more discriminating credit assignment.

Another area which should be explored is converting the learned rules into the format of the hand-crafted rules, so that the system could be added onto the existing Real-Time Monitor system. The discovery system was operated by itself in the testbed to obtain unbiased experimental data. Yet in a more practical application one would expect the discovery system to work in tandem with an existing knowledge base. It could then add rules and improve the existing rules already formed.

One of the current problems with unsupervised learning is credit assignment, and many different strategies have been tried. This area is critical, as was found even in the simple domain explored here. Once a rule is put into use, it may be discovered that the rule is in error. When does one decide that it is a bad rule that no longer contributes? Dynamic real-time systems offer the potential of immediate feedback in a repeatable fashion. For cases where there are multiple rules acting in sequence, a classifier system, as given in Holland et al. [5], may be more appropriate.
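One simple way to operationalize "no longer contributes" is a per-rule strength that is reinforced on successful repairs, decayed on failures, and triggers retraction below a threshold. The following is a hedged sketch of such a scheme, not the HDS's published credit-assignment algorithm; all constants are illustrative.

```python
def update_strength(strength, succeeded, reward=0.3, penalty=0.5, retract_below=0.2):
    """Reinforce a rule's strength after a successful repair, decay it
    after a failure, and flag it for retraction once it falls below a
    threshold. Returns (new_strength, should_retract)."""
    if succeeded:
        strength = strength + reward * (1.0 - strength)   # move toward 1
    else:
        strength = strength * (1.0 - penalty)             # multiplicative decay
    return strength, strength < retract_below
```

Immediate, repeatable feedback makes each update well-grounded; with multiple rules firing in sequence, the reward would instead have to be apportioned along the chain, which is where classifier-system mechanisms become attractive.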


Acknowledgments

The support of the National Aeronautics and Space Administration (GSFC Code 5223) for the work presented here is gratefully acknowledged, although the positions taken are those of the authors alone.

References

1. Buchanan, B.G., Sullivan, J., Cheng, T.P., and Clearwater, S.H. 1988. Simulation-Assisted Inductive Learning. In Proceedings of AAAI-88, Saint Paul, MN. Morgan Kaufmann.

2. Carbonell, J.G., and Gil, Y. 1990. Learning by Experimentation: The Operator Refinement Method. In Machine Learning: An Artificial Intelligence Approach, Volume III, ed. Y. Kodratoff and R. Michalski. San Mateo, CA: Morgan Kaufmann.

3. Forbus, K.D. 1984. Qualitative Process Theory. Artificial Intelligence 24: 85-168.

4. Hieb, M.R. 1990. Knowledge Discovery in Complex Engineering Systems. M.S. Thesis, George Washington University.

5. Holland, J.H., Holyoak, K., Nisbett, R., and Thagard, P. 1986. Induction: Processes of Inference, Learning, and Discovery. Cambridge, MA: MIT Press.

6. Rajamoney, S.A., and DeJong, G.F. 1988. Active Explanation Reduction: An Approach to the Multiple Explanations Problem. In Proceedings of the 5th International Conference on Machine Learning, Ann Arbor, MI.

7. Silverman, B.G., Hieb, M.R., Yang, H., Wu, L., Truszkowski, W., and Dominy, R. 1989. Investigation of a Simulator-Trained Machine Discovery System for Knowledge Base Management Purposes. In Proceedings of the IJCAI-89 Workshop on Knowledge Discovery in Databases, Detroit, MI.

8. Silverman, B.G., Hieb, M.R., and Mezher, T.M. 1991. Unsupervised Discovery in an Operational Control Setting. In Knowledge Discovery in Databases, ed. G. Piatetsky-Shapiro and W. Frawley. Cambridge, MA: MIT Press.

9. Zytkow, J.M. 1989. Overcoming FAHRENHEIT's Experimentation Habit: Discovery System Uses a Database Assignment. In Proceedings of the IJCAI-89 Workshop on Knowledge Discovery in Databases, Detroit, MI.


In the second case the HDS only had imperfcct sensor information with ammeters and bus power as the sensed par~lIntcrs Since the errors were occurring in the switches the system had to infer thcse errors from the scnscJ parameters The goal-list used is also given in Figure 5

The third case is just like the second with one exception The critic lacks memory It must rediscover good rules on each run This was to verify that the rules created by the HDS produced an improvement in EPS operation over a random selection of rules

Two baseline runs were controls on the experiment First the simulator was run without any errors to determine the maximwn accumulated power Second a blackboard system (the Real-Time Monitor Blackboard System) utilizing 11 different rule-bases was run to compare the performance of hand-crafted rules against the performance of the HOSs discovered rules These rules were obtained from a systematic failure analysis and then optimized

Cumulative power production over time is the primary mlt1sure of the controlling systems performance Any delay in fixing errors due to incorrcct inconsislCnt or incomplete rules lead to a drop in powcr production Another measure is the number of power outages the system experiences A power outage is a period when the power is insufficient to supply the bus requirements As the system learns more and bener rules it experiences fewer outages It repairs errors using the rules that it learns

Two SUN workstations were utilized in aU experiments

The EPS simulator rJn on onc workstation and sent telemetry packets to the HOS operating on anottler SUN workSlation The HDS then sent command-loads (lists of commands) to operate the simulator This ensured thal the controlling system was operating independently of the simulation

Rule Discovery with the HUBBLE Discoverv System

- Initially the simulator was run with no errors introduced to determine the maximum power production (run Rl) This produced a set of performance data giving the accumulated power production over time (see the first row of Table 1 and Figure 6)

The second run (run R2) was made with the Real-Time Monitor Blackboard System to determine the performance of the hand-crafted rules Random switch errors were sel As the second row of Table 1 shows the manually constructed rules of the Real-Time Monitor Blackboard System control the power level very well allowing only 2 power outages the fewest of any of the runs attempted This run is graphically portrayed in Figure 6

The nrst run with the HOS (HOS 1) third row of Table 1 used perfect sensor information described earlier as case 1 There were 11 power outages during the eight orbits with 17 rules generated as shown in the third row of Table 1 In the same period it retracted only I rule that it had learned from the rule-base The rules generated were of higher quality and were more specific than the rule base generated with imperfect sensor information The rule-base generated is shown in Appendix 1

MEASURES

RUNS Number of Rules Number of RulesPower Produced Number of Power Created RetractedOutages

No Errors Generated (Rl) na

RealmiddotTime Monitor (R2)

8058 na na

na

HDS with perfect sensor

na6810 2

17 16281 11 information (HDS 1)

HDS with imperfect 6546 7 78 sensor information (HDS2)

HOS with no rule storage 4064 nana39 (HDS3)

Table 1 Comparison of HDS Testbed Cases

The Journal of Knowledge Engineering 78

IltXXX

CI) -r I 80000

c bull d

~ 600)-

S = v -I 40000 0 r c r QI

~ 20000 c

0

The performance of the HDS using imperfect sensor information (run HDS2) for the second case was not quite as good as the HDS operating with imperfect sensor information However the power produced was only 5 less than the case with the Real-Time Monitor There were 7 power OUlages an average number compared to the other runs The system generated 8 rules and in the same period retracted 7 rules from the rule-base The rule-base generated is shown in Appendix 1

The HDS for case 3 did not operate weU when unable to store any of the rules it created It only generated approximately half the power possible in run HDS3 It is obvious from the last row of Table 1 that there were many times when the system was not able to quickly determine the correct repair During this session 39 power outages were experienced the highest number of any of the experiments

A comparison of the 5 runs of the Testbed is presented in Table 1 and Figure 7middot The immediate observation is that the run with perfect sensor information was not as successful as the run with imperfect sensor information One of the reasons that this discrepancy occurred is due to the experimental methodology employed Because only one error was genemted at a time a system with better performance would encounter more errors as it would expect another error to be generated after it successfully fixed the current error If a controlling system could not

-pound1shy No Errors Generaled shy _shyshy Real-Time Blackboard Syslem

~ -shy

~ a rfI1

If

tshy

L77

shy1 o 20 40 iii)

Time (10 hours =I orhit)

Figure 6 Baseline Performance Graph

fix an error for a lengthy period of time then it would be exposed to fewer errors over the 8 orbit time period

This proved to be the case as the run with imperfcct sensor information only encountered 11 errors while the run with perfect sensor information repaired 21 errors Over a longer period of time (more than 8 orbits) the HDS with perfect sensor information should perform beuer due to a more complete and better quality rule base In some cases the error would change completely when a mode transition occurred In one such case a switch that w~ in the wrong position during the day mode was in the right position when the mode changed to night Yet the switch in question prevented another switch from changing position The system fued the new error by resetting both switches In this case when the mode transition occurred the error changed as a result Because the system was able to adapt it tracked the error across the mode transition and repaired it

The system was also tested with an incorrect domain theory where some parameter information in the goal database was incorrect (errors were not sensed properly) In this case some spurious errors were sensed This impaired but it did not cripple the learning capability Bad rules were created and placed into the rule-base but they were retracted when they did not work Thus the system managed to function acceptably although at a lower level of performance

Volume 5 Number 4 Winter 1992 79

tt~1 r-----------------------------------

---J--l--shy --__-_ -0- No Errors GCllcrlhJ I

MUlll - -0- IIl)S with CUlllpiclC Thcory -----1lZ[- III I I )

liDS witlllmolllplclc Them) I -shy BLJS with IlU Lcamillg 1----f-----1-----=ooo---- ~~----------J ~ l~

60Jll

-sect-~ I ~ 4(O---~r---_+---~-~~~~-_4~~-~--__4 I c loo lt- o c 2~t----r_-~~~~lt~~~~j_----_i------r---i----_l-1

01F---~----_+-----~-----+_----~---~---+_--~-J Il 20 IlII

Time (10 hours =1 urbit)

Figure 7 HOS Perfonnance Graph

In the longest experiment recorded one run continued for 34 passes or orbits (equivalent to approximately 50 hours of simulated time and 5 hours of wall clock time) The run was conducted with the HDS operating with incomplete knowledge During this time the HDS created 56 rules correctly keeping the good rules and retracting the ones which did not work

Related Work

To enable rule discovery by penurbing a realmiddottime process we have integrated our learning system with a simulator Buchanan et a1 [1] utilized a simulator to learn rules in the domain of high energy physics by induction over a large number of training examples In this case the simulator served as a generator of training examples from which rules were learned for an expen system We use our simulator for the same purpose but with a different learning mechanism Because of the real-time nature of our system we integrated our learning system into the perfonnance element This enabled us to take advantage of experimentation techniques

Experimentation in machine learning has been used by many systems FAHRENHEIT [9] uses a discovery program to design its own experiments to make the system closer to real world applications Zytkow applied these methods to the area of databases Rajamoney and DeJong [6] have described an elegant approach to experimentation

that is used to learn theories of physical devices This work focuses on using experiment design coupled with explanation-based learning to revise incomplete inuactable or incorrect domain theories Experimentation is perfonned to deny or conftrm predictions made by wellshyfonned hypotheses and if possible reduce the number of multiple explanations While FAHRENHEIT uses a quantitative mode) of the world Rajamoney and DeJong employ qualitative techniques We have begun to explore both paradigms in the HDS testbed

Another system utilizing experimentation is PRODIGY [2] PRODIGY experiments to improve the domain theory of a planning system As with the work of Rajamoney and DeJong PRODIGY experiments to discriminate between multiple explanations As with HOS experimentation is demand-driven and uses both domain constraints and any external feedback received HOS is more domain specific and is tailored to an engineering application

Conclusion

A methodology for discovering control knowledge in complex engineering systems has been presented The addition of a discovery system to an expert system in a dynamic environment such as process control should offer an improvement in performance and enable an expen system to learn new knowledge and improve its existing

The Journal of Knowledge Engineering 80

knowledge base Adding the ability to learn in CWTent knowledge-based systems reduces certain brittleness concerns The HDS is a demonstration of how such a system could be implemented

The HDS was tested in a domain characterized by realshytime operation and a dynamic environment The system was able to discover adequate EPS control rules while starting with no rules in its knowledge base The discovery process worked well when given either perfect sensor information or when given less perfect sensor information The performance of the combined learningproblem solving system was compared to a control case of a blackboard system that handled multiple rule bases that contained hand-crafted rules This blackboard system the Real-Time Monitor performed only slightly better than the HDS after evaluation of several EPS perfonnance criteria The HDS was tested in the case where it could not store any created rules The pertonnance of the discovery system was much worse if there was no memory of past experience

Currently the system is only able to deal with a certain subset of EPS problems Obviously a more practical system would have to deal with a wider range of parameters Also the system is only presented with situations where only one error exists at a time More robust algorithms must be tried to properly assign credit when multiple errors are present in the environment A system capable of dealing with many parameters would require a much more discriminating credit assignment

Another area which should be explored is to conven the rules learned into the format of the hand-crafted rules so that the system could be added onto the existing RealshyTime MonitOr system The discovery system was operated by itself in the testbed to obtain unbiased experimental dala Yet in a more practical application one would expect the discovery system to work in tandem with an existing knowledge base It could then add rules and improve the existing rules already formed

One of the current problems with unsupervised learning is the problem of credit assignment To this end many different strategies have been tried This area is critical as was found even in the simple domain explored here Once a rule is put into use it may be discovered that the rule is in error When docs one dccide that it is a bad rule thal it no longer contributes DynamiC real-time systems offer a potential of immediate feedback in a repeatable fashion For cases where there are multiple rules acting in sequence a classifier system as given in Holland et aI (5] may be more appropriate

Volume 5 ~umb~r 4 Winter 1992

Acknowledgments

The suppon of the National Aeronautics and Space Administration GSFC code 5223 of the work presented here is gratefully acknowledged although the positions taken are those of the authors alone

References

1 Buchanan BG Sullivan 1 Cheng TP and Clearwater SH 1988 Simulated-Assisted Inductive Learning In Proceedings ofAAAI-88 Morgan Kaufmann Saint Paul MN

2 Carbonell 1 G and Gil Y 1990 Learning By Experimenlation The Operator Reftnement Method In Machine Learning An Artijiciallnzelligence Approach Volume III ed Y KodratOff and R Michalski Morgan Kaufmann San Mateo CA

3 Forbus K D 1984 Qualilative Process Theory Artificialllllelligence 24 85-168

4 Hieb M R 1990 Knowledge Discovery in Complex Engineering Systems MS Thesis George Washington University

5 Holland 1 H Holyoak Kbull Nisbett R and Thagard P 1986 Induction Processes of Inference Learning and Discovery MIT Press Cambridge MA

6 Rajamoney S A and Dejong G F 1988 Active Explanation Reduction An Approach to the ~Iultiple Explanations Problem In Proceedings of 114 51h International Conference on MachiM Letl17ling Ann Arbor MI

7 Silverman B Gbull Hieb M R Yang H Wu Lbull Truszkowski W and Dominy R 1989 nmiddotestigarion of a Simulator-Trained Machine Discovery System for Knowledge Base Management Purposes In Procudings of JCAl-89 Workshop on Knowledge Disccgtfry in Databases Detroit MI

8 Silverman B Gbull Hieb W Rbullbull and MeLlet T ~t 1991 Unsupervised Discovery in an Operatiorll (errol Setting In Knowledge Discovery in Daubullbull~lS ~d_ G Piatetsky-Shapiro and W Frawley Cambrid~ ~ ~llT Press

9 Zytkow 1 M 1989 Overcoming FA-~-==Ts Experimentation Habit Discovery Systel lUes a Darabase Assignment Proceedings ofJC4J-~9 ~i cp on Knowledge Discovery in Databases De- )c

51

IltXXX

CI) -r I 80000

c bull d

~ 600)-

S = v -I 40000 0 r c r QI

~ 20000 c

0

The performance of the HDS using imperfect sensor information (run HDS2) for the second case was not quite as good as the HDS operating with imperfect sensor information However the power produced was only 5 less than the case with the Real-Time Monitor There were 7 power OUlages an average number compared to the other runs The system generated 8 rules and in the same period retracted 7 rules from the rule-base The rule-base generated is shown in Appendix 1

The HDS for case 3 did not operate well when unable to store any of the rules it created; it generated only approximately half the power possible in run HDS3. It is obvious from the last row of Table 1 that there were many times when the system was not able to quickly determine the correct repair. During this session 39 power outages were experienced, the highest number of any of the experiments.

A comparison of the 5 runs of the Testbed is presented in Table 1 and Figure 7. The immediate observation is that the run with perfect sensor information was not as successful as the run with imperfect sensor information. One reason for this discrepancy is the experimental methodology employed. Because only one error was generated at a time, a system with better performance would encounter more errors, as it would expect another error to be generated after it successfully fixed the current one. If a controlling system could not

Figure 6. Baseline Performance Graph (power generated versus time, where 10 hours = 1 orbit; curves compare the no-errors-generated case with the Real-Time Blackboard System)

fix an error for a lengthy period of time, then it would be exposed to fewer errors over the 8-orbit time period.

This proved to be the case, as the run with imperfect sensor information encountered only 11 errors, while the run with perfect sensor information repaired 21 errors. Over a longer period of time (more than 8 orbits), the HDS with perfect sensor information should perform better due to a more complete and better-quality rule base. In some cases the error would change completely when a mode transition occurred. In one such case, a switch that was in the wrong position during the day mode was in the right position when the mode changed to night. Yet the switch in question prevented another switch from changing position. The system fixed the new error by resetting both switches. In this case, when the mode transition occurred, the error changed as a result. Because the system was able to adapt, it tracked the error across the mode transition and repaired it.
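The interaction between repair speed and error exposure described above can be sketched as a toy simulation loop; all names and timings here are illustrative assumptions, not part of the HDS testbed:

```python
def errors_encountered(repair_ticks, session_ticks=80):
    """Toy model of the one-error-at-a-time methodology: a new error is
    injected only after the current one is repaired, so a controller that
    repairs faster is exposed to more errors in a fixed-length session."""
    errors, tick = 0, 0
    while tick < session_ticks:
        errors += 1           # exactly one error active at a time
        tick += repair_ticks  # session time consumed repairing it
    return errors

# A fast repairer sees twice the errors of one half as fast.
fast = errors_encountered(repair_ticks=4)   # quick repairs -> 20 errors
slow = errors_encountered(repair_ticks=8)   # lengthy repairs -> 10 errors
```

In this toy model the run with fewer errors is not performing worse, merely repairing more slowly, which mirrors why the better-performing run repaired 21 errors while the other encountered only 11.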

The system was also tested with an incorrect domain theory, where some parameter information in the goal database was incorrect (errors were not sensed properly). In this case some spurious errors were sensed. This impaired, but did not cripple, the learning capability. Bad rules were created and placed into the rule-base, but they were retracted when they did not work. Thus the system managed to function acceptably, although at a lower level of performance.
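The keep-or-retract behavior described above — bad rules entering the rule-base under spurious sensing and being withdrawn once they fail in practice — can be sketched as follows; the rule names and the set-based rule-base are hypothetical illustrations, not the HDS representation:

```python
def apply_feedback(rule_base, rule, repair_worked):
    """Retract-on-failure bookkeeping: a newly created rule survives in
    the rule-base only while the repairs it proposes actually work."""
    if repair_worked:
        rule_base.add(rule)       # keep rules whose repairs succeed
    else:
        rule_base.discard(rule)   # retract rules that did not work

rule_base = set()
apply_feedback(rule_base, "reset-switch-A", repair_worked=True)  # good rule kept
apply_feedback(rule_base, "open-relay-B", repair_worked=True)    # spurious rule admitted...
apply_feedback(rule_base, "open-relay-B", repair_worked=False)   # ...then retracted on failure
```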

Volume 5 Number 4 Winter 1992 79

Figure 7. HDS Performance Graph (power generated versus time, where 10 hours = 1 orbit; curves compare the no-errors-generated case, the HDS with a complete theory, the HDS with an incomplete theory, and the HDS with no learning)

In the longest experiment recorded, one run continued for 34 passes, or orbits (equivalent to approximately 50 hours of simulated time and 5 hours of wall-clock time). The run was conducted with the HDS operating with incomplete knowledge. During this time the HDS created 56 rules, correctly keeping the good rules and retracting the ones which did not work.

Related Work

To enable rule discovery by perturbing a real-time process, we have integrated our learning system with a simulator. Buchanan et al. [1] utilized a simulator to learn rules in the domain of high-energy physics by induction over a large number of training examples. In this case the simulator served as a generator of training examples from which rules were learned for an expert system. We use our simulator for the same purpose, but with a different learning mechanism. Because of the real-time nature of our system, we integrated our learning system into the performance element. This enabled us to take advantage of experimentation techniques.

Experimentation in machine learning has been used by many systems. FAHRENHEIT [9] uses a discovery program to design its own experiments, bringing the system closer to real-world applications; Zytkow applied these methods to the area of databases. Rajamoney and DeJong [6] have described an elegant approach to experimentation that is used to learn theories of physical devices. This work focuses on using experiment design coupled with explanation-based learning to revise incomplete, intractable, or incorrect domain theories. Experimentation is performed to deny or confirm predictions made by well-formed hypotheses and, if possible, reduce the number of multiple explanations. While FAHRENHEIT uses a quantitative model of the world, Rajamoney and DeJong employ qualitative techniques. We have begun to explore both paradigms in the HDS testbed.

Another system utilizing experimentation is PRODIGY [2]. PRODIGY experiments to improve the domain theory of a planning system. As with the work of Rajamoney and DeJong, PRODIGY experiments to discriminate between multiple explanations. As with HDS, experimentation is demand-driven and uses both domain constraints and any external feedback received. HDS is more domain-specific and is tailored to an engineering application.

Conclusion

A methodology for discovering control knowledge in complex engineering systems has been presented. The addition of a discovery system to an expert system in a dynamic environment, such as process control, should offer an improvement in performance and enable an expert system to learn new knowledge and improve its existing knowledge base. Adding the ability to learn to current knowledge-based systems reduces certain brittleness concerns. The HDS is a demonstration of how such a system could be implemented.

The Journal of Knowledge Engineering 80

The HDS was tested in a domain characterized by real-time operation and a dynamic environment. The system was able to discover adequate EPS control rules while starting with no rules in its knowledge base. The discovery process worked well when given either perfect or imperfect sensor information. The performance of the combined learning/problem-solving system was compared to a control case of a blackboard system that handled multiple rule bases containing hand-crafted rules. This blackboard system, the Real-Time Monitor, performed only slightly better than the HDS after evaluation of several EPS performance criteria. The HDS was also tested in the case where it could not store any created rules; the performance of the discovery system was much worse when there was no memory of past experience.

Currently the system is able to deal with only a certain subset of EPS problems. Obviously, a more practical system would have to deal with a wider range of parameters. Also, the system is presented only with situations where a single error exists at a time. More robust algorithms must be tried to properly assign credit when multiple errors are present in the environment. A system capable of dealing with many parameters would require much more discriminating credit assignment.

Another area which should be explored is converting the rules learned into the format of the hand-crafted rules, so that the system could be added onto the existing Real-Time Monitor system. The discovery system was operated by itself in the testbed to obtain unbiased experimental data. Yet in a more practical application one would expect the discovery system to work in tandem with an existing knowledge base. It could then add rules and improve the existing rules already formed.

One of the current problems with unsupervised learning is the problem of credit assignment, for which many different strategies have been tried. This area is critical, as was found even in the simple domain explored here. Once a rule is put into use, it may be discovered that the rule is in error. When does one decide that it is a bad rule that no longer contributes? Dynamic real-time systems offer the potential of immediate feedback in a repeatable fashion. For cases where there are multiple rules acting in sequence, a classifier system, as given in Holland et al. [5], may be more appropriate.
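One hedged way to make "when is a rule bad?" concrete is the strength bookkeeping used in classifier systems [5]: each firing nudges a per-rule strength toward the feedback received, and a rule is retracted once its strength falls below a cutoff rather than after a single failure. The update rule, learning rate, and cutoff below are illustrative assumptions, not the HDS design:

```python
class ScoredRule:
    """Track a per-rule strength from repeated feedback; retract when it
    drops below a cutoff rather than after a single failure."""
    def __init__(self, action, strength=1.0):
        self.action, self.strength = action, strength

    def reinforce(self, success, lr=0.5):
        # move strength toward 1 on success, toward 0 on failure
        target = 1.0 if success else 0.0
        self.strength += lr * (target - self.strength)

    def should_retract(self, cutoff=0.2):
        return self.strength < cutoff

rule = ScoredRule("reset-both-switches")
for outcome in (True, False, False, False):
    rule.reinforce(outcome)
# strength has decayed to 0.125, below the cutoff, so the rule is retracted
```

With this scheme a single failure after a run of successes leaves the strength well above the cutoff, so occasional sensor noise does not immediately destroy a good rule.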


Acknowledgments

The support of the National Aeronautics and Space Administration, GSFC Code 5223, of the work presented here is gratefully acknowledged, although the positions taken are those of the authors alone.

References

1. Buchanan, B.G., Sullivan, J., Cheng, T.P., and Clearwater, S.H. 1988. Simulation-Assisted Inductive Learning. In Proceedings of AAAI-88, Saint Paul, MN. Morgan Kaufmann.

2. Carbonell, J.G. and Gil, Y. 1990. Learning by Experimentation: The Operator Refinement Method. In Machine Learning: An Artificial Intelligence Approach, Volume III, ed. Y. Kodratoff and R. Michalski. Morgan Kaufmann, San Mateo, CA.

3. Forbus, K.D. 1984. Qualitative Process Theory. Artificial Intelligence 24: 85-168.

4. Hieb, M.R. 1990. Knowledge Discovery in Complex Engineering Systems. M.S. Thesis, George Washington University.

5. Holland, J.H., Holyoak, K., Nisbett, R., and Thagard, P. 1986. Induction: Processes of Inference, Learning, and Discovery. MIT Press, Cambridge, MA.

6. Rajamoney, S.A. and DeJong, G.F. 1988. Active Explanation Reduction: An Approach to the Multiple Explanations Problem. In Proceedings of the 5th International Conference on Machine Learning, Ann Arbor, MI.

7. Silverman, B.G., Hieb, M.R., Yang, H., Wu, L., Truszkowski, W., and Dominy, R. 1989. Investigation of a Simulator-Trained Machine Discovery System for Knowledge Base Management Purposes. In Proceedings of the IJCAI-89 Workshop on Knowledge Discovery in Databases, Detroit, MI.

8. Silverman, B.G., Hieb, M.R., and Mezher, T.M. 1991. Unsupervised Discovery in an Operational Control Setting. In Knowledge Discovery in Databases, ed. G. Piatetsky-Shapiro and W. Frawley. MIT Press, Cambridge, MA.

9. Zytkow, J.M. 1989. Overcoming FAHRENHEIT's Experimentation Habit: Discovery System Uses a Database Assignment. In Proceedings of the IJCAI-89 Workshop on Knowledge Discovery in Databases, Detroit, MI.
