
Eliciting and Modelling Expert Knowledge

George WRIGHT * and Peter AYTON **

* Bristol Polytechnic, Bristol BS16 1QY, UK
** City of London Polytechnic, London E1 7NT, UK

This paper evaluates the usefulness of various psychological techniques that can be utilized to elicit and model expert knowledge for subsequent representation in rule-based expert systems. Interviewing, protocol analysis and multidimensional scaling are described and evaluated as complementary methods of knowledge elicitation. In addition 'context-focusing' and card-sorting are introduced as short-cut methods for the knowledge engineer's 'tool box'.

It is argued that expert knowledge about uncertainty can be represented as subjective probabilities and that these assessments can (and therefore should) be checked for consistency and coherence as a pre-condition for realism.

Finally, the issue of whether it is possible to improve upon expert judgement is discussed and evidence is reviewed which shows that, in repetitive decision-making situations, statistical models of the expert can out-perform the expert on whom the models are based. Statistical modelling has a valid but limited application as a replacement for expert judgement.

George Wright received his PhD from Brunel University in 1980. He has since published widely on the human aspects of decision-making and forecasting. His publications include Behavioral Decision Theory (Beverly Hills: Sage and Harmondsworth: Penguin, 1984), Behavioral Decision Making (New York: Plenum, 1985), Investigative Design and Statistics (Harmondsworth: Penguin, 1986) and Judgemental Forecasting (Chichester: Wiley, in press).

Peter Ayton conducted research in memory and language at University College London before joining the Decision Analysis Group. His current research activities include the development of statistical methods for individual difference analysis and the study of intuitive statistical concepts.

1. Introduction

Factual knowledge can be represented in terms of classifications and relationships. This sort of knowledge is often termed declarative knowledge. Procedural knowledge is concerned with the procedures and rules for manipulating the declarative knowledge and also with the control structures which contain information about when and how to apply the procedures and rules. In expert systems the knowledge represented is often that acquired from a human expert. This paper discusses the nature of expertise and evaluates the usefulness of various psychological techniques to aid the elicitation or acquisition of human knowledge.

2. What is Expert Knowledge?

Expert knowledge is additional to knowledge contained in textbooks:

'Learning by observing seasoned experts is a very important step in the development of medical expertise. Prior to observing experienced physicians, a medical student first spends two or three years studying and acquiring textbook knowledge of diseases and the physiology of the human body. At the end of this period, despite a significant repertoire of factual medical knowledge, the student is unable to demonstrate any real diagnostic expertise.... Expertise is acquired during an apprenticeship period in which the student watches his or her mentors diagnosing real cases and attempts to duplicate this skill on his or her own through practice.' [Wilkins et al. (1984)]

1 This research was supported in part by the Knowledge Engineering Business Centre of International Computers Ltd and in part by the British Economic and Social Research Council via project grant C00232037. We especially wish to thank Leslie Rabbitts at ICL for enabling us to test out our academic ideas in the real world of business.

Decision Support Systems 3 (1987) 13-26, North-Holland
0167-9236/87/$3.50 © 1987, Elsevier Science Publishers B.V. (North-Holland)

Feigenbaum (1979) notes that expert knowledge consists of unwritten 'rules of thumb':

'(It is)... largely heuristic knowledge, experimental, uncertain - mostly 'good guesses' and 'good practice', in lieu of facts and figures. Experience has also taught us that much of this knowledge is private to the expert, not because he is unwilling to share publically how he performs, but because he is unable. He knows more than he is aware of knowing.... What masters really know is not written in the textbooks of the masters. But we have learned that this private knowledge can be uncovered by the careful painstaking analysis of a second party, or sometimes by the expert himself, operating in the context of a large number of highly specific performance problems.' (p. 8)

Even understanding one expert's actions can require the expertise of another:

'The ability to infer the reasons for the action of another expert when watching the expert solve a problem is as much a dimension of expertise as problem solving, explanation of expertise, and technology of expertise. A familiar example of this within the field of artificial intelligence is seen during organised human-machine chess matches. There is often a highly ranked player present who explains the probable reason for the moves of each player during the game.... When a physician asks a question of a patient, another physician watching the patient/physician interview can usually infer the reason for each question asked of the patient.' [Wilkins et al. (1984)]

Wilkins et al. (1984) also emphasise the effort required:

'A bottleneck in the creation and expansion of these knowledge-intensive systems is knowledge acquisition. Acquiring the necessary domain knowledge is a very tedious and time-consuming manual process requiring many person-years of effort on the part of a domain expert and a knowledge engineer. There is good motivation to automate this process but methods to date have proved unsuccessful.' (p. 1)

3. Is it Easy to Elicit Expert Knowledge?

All of the published articles in the area of knowledge engineering point to the difficulties of eliciting expert knowledge. Hayes-Roth et al. (1983) have described the problem in this way:

'Knowledge acquisition is a bottleneck in the construction of expert systems. The knowledge engineer's job is to act as a go-between to help build an expert system. Since the knowledge engineer has far less knowledge of the domain than the expert, however, communication problems impede the process of transferring expertise into a programme. The vocabulary initially used by the expert to talk about the domain with a novice is often inadequate; thus the knowledge engineer and expert must work together to extend and refine it.' (p. 129)

Duda and Shortliffe (1983) conclude that:

'The identification and encoding of knowledge is one of the most complex and arduous tasks encountered in the construction of an expert system.... Thus the process of building a knowledge base has usually required a time-consuming collaboration between a domain expert and an AI researcher. While an experienced team can put together a small prototype in one or two man-months, the effort required to produce a system that is ready for serious evaluation (well before contemplation of actual use) is more often measured in man-years.' (p. 265)

4. What Techniques Can Aid the Elicitation of Expert Knowledge?

Currently expert knowledge is often elicited by informal interviews and the knowledge obtained is coded into empirical rules. In many descriptions of how expert systems are built, knowledge elicitation is glossed over. For example, Pauker et al. (1976) said that they elicited the problem-solving strategies that physicians use by 'introspection and through direct observations of the clinician's problem-solving behaviour. The insights gained in this way were represented as a computer program.' (p. 983)


Duda and Gaschnig (1981), commenting on the development of the PROSPECTOR system for mineral exploration, noted: 'We developed each model by interviewing a geologist who is an authority on a particular class of deposits, and then translating the geologist's knowledge of the associations between field-observable evidence and relevant geological hypotheses into a structured collection of rules.' (p. 259)

The methods of knowledge elicitation are often unstated or vague because they are mainly ad hoc and non-scientific. The knowledge engineers are often computer specialists without training in relevant psychological techniques.

In the next sections of this paper we will describe, in some detail, techniques that have been successfully used to elicit and model expert knowledge.

4.1. Interviewing - questioning the expert

What precise questions should you ask? Perhaps the best way to start is to ask the expert to talk for a set period, say half-an-hour, on the domain of expertise that you are interested in modelling in order that you can establish an overview of the area. It is then possible to ask direct probing questions to access declarative and procedural knowledge. However, this sort of knowledge elicited by interview may not be easily translated into the rules and control structures of an expert system. Perhaps the best method to obtain these rules is by protocol analysis, which we describe in the next section.

Interviewing can be used to gain an overview of the domain of expertise and an understanding of the expert's jargon.

4.2. Protocol analysis - getting the expert to 'think aloud'

As the expert works through a problem in his field of expertise ask him to 'think aloud' about his every thought and action. Record this verbalisation and have it typed out. The problem may be a real one or an imaginary one or a set of problems that describe in a fairly complete fashion the types of problem that the expert system is to be able to solve in the same way as the expert solves them. This technique is a method of concurrent protocols since the think aloud data is obtained at the time the expert solves the problem. Below is an example of a protocol of a clinical expert examining data on a patient who may have leukaemia. This protocol is taken from Myers et al. (1983).

'This patient is a fifty year old adult who in common with the last patient was said to have chronic myeloid leukaemia diagnosed in 1978 and has now gone into blastic transformation and is said to be lymphoid by morphological and cytochemical criteria.... Looking at bone marrow which has 70% BLASTS. Um I also know, though I can't give a numerical value to it, that when this value of 70% is obtained from a smear of blood then in the process of doing the marker tests there is an ENRICHMENT OF BLASTS, SO 70% MAY BE A MINIMUM value and there may be rather more. TERMINAL TRANSFERASE IS 90%. IN FACT THERE HAS BEEN SOME ENRICHMENT, SINCE IN AN ADULT THIS IS A LEUKAEMIA MARKER. AN ADULT PATIENT WITH NORMAL TdT POSITIVE CELLS ARE NOT MORE THAN 5%. So this is... are all leukaemic lymphoblasts in lymphoblastic transformation of CLL rather than myeloblastic. And I'm looking to see what subset it is - the IMMUNOGLOBULIN IS ONLY 4% SO ITS NOT B. T-CELL MARKERS ARE 14%, 5%, 6% AND LESS THAN 1%. ALL LESS THAN 20. YOU COULD SAY THAT 90% OVERLAPS WITH 14% BUT I REGARD THAT AS AN INSIGNIFICANT OVERLAP... WITHIN THE ERROR OF OUR TECHNIQUES.'

These investigators have placed in block capitals the information that they thought useful in extracting from the protocol the knowledge statements shown below:

70% blasts, preparation gives enrichment of blasts, so 70% may be a minimum. Terminal transferase is 90%. In fact there has been some enrichment. Since in an adult this is a leukaemia marker. An adult patient with normal TdT positive cells are not more than 5%. Immunoglobulin is only 4% so it's not B. T-cell markers are 14%, 5%, 6% and less than 1%. All less than 20. You could say that 90% overlaps with 14% but I regard that as an insignificant overlap... within the error of our techniques.

and the rules:

preparation enriches for blasts, so take the blast count to be a minimum
TdT > blast count -> suggests enrichment
TdT in adult -> leukaemia marker
normal adult level of TdT less than 5%
immunoglobulin < 5% -> not B-cells
T-cell markers less than 20% -> not T-cells
overlap of 14 with 90 -> insignificant
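To make the translation from protocol to system concrete, a minimal sketch of how knowledge statements like these could be hand-coded as forward-chaining production rules is given below. The predicate names, thresholds and the Python representation are our own illustrative choices; they are not the Emycin encoding used by Myers et al.

```python
# A minimal sketch (our illustration, not the Emycin encoding of Myers et al.)
# of the protocol-derived rules as simple forward-chaining productions.
def apply_rules(case: dict) -> dict:
    facts = dict(case)

    # preparation enriches for blasts -> treat the blast count as a minimum
    if facts.get("preparation_enriches_blasts"):
        facts["blast_count_is_minimum"] = True

    # TdT greater than the blast count -> suggests enrichment
    if facts.get("tdt_percent", 0) > facts.get("blast_percent", 0):
        facts["enrichment_suggested"] = True

    # TdT in an adult -> leukaemia marker (normal adult level below 5%)
    if facts.get("adult") and facts.get("tdt_percent", 0) > 5:
        facts["tdt_is_leukaemia_marker"] = True

    # immunoglobulin below 5% -> not B-cells
    if facts.get("immunoglobulin_percent", 100) < 5:
        facts["not_b_cells"] = True

    # all T-cell markers below 20% -> not T-cells
    markers = facts.get("t_cell_markers")
    if markers and all(m < 20 for m in markers):
        facts["not_t_cells"] = True

    return facts

case = {"adult": True, "preparation_enriches_blasts": True,
        "blast_percent": 70, "tdt_percent": 90,
        "immunoglobulin_percent": 4, "t_cell_markers": [14, 5, 6, 1]}
print(apply_rules(case))
```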

Sometimes it may be more appropriate to video the expert performing the task and, after the solution has been achieved, play back the video and ask the expert to say what he was thinking and doing. This method is one of retrospective protocols. This procedure is justified when it is thought that obtaining a concurrent protocol will affect the expert's performance on the task. Conversely, it may also be useful when performance of the task is suspected to interfere with the expert's ability to offer a coherent protocol. Even if no apparent hindrance is evident in either activity, additional insights may be available to the expert when engaged in focussed contemplation of his or her own performance. This should be valuable when the expert's knowledge is largely tacit and not easily translated into a verbal format. This problem is considered further in the next section.

4.2.1. Some problems with protocol analysis

Some investigators [e.g., Ericsson and Simon (1980)] have noted that over time recurrent cognitive processes tend to become automated. Consider trying to explain to someone how you coordinate your own actions to enable you to jump on a moving bus.

You may be able to verbalise about some other cognitive processes more easily but perhaps not accurately. Consider the task of explaining to someone how you know how far away objects are. For example the distance, from yourself, of two other cars on a road as you drive along the road. How do you measure such distances without a ruler or tape measure? Psychologists have shown, by extensive experimentation, that we judge distance by several cues. The first set of cues are physiological and come from feedback from muscle contractions in our eyes. The muscles press hard on our eye lens to focus on near objects but relax to focus on distant objects. Also with distance the texture of fine detail becomes less distinct and colours appear to fade to a hazy blue. Familiar size is another cue to distance - we know that all cars are roughly the same physical size even though the size of the image of the car on the back of the eye's retina (which is like the film in a camera) reduces dramatically the further away a car is - thus a change in image size can be used as a cue to how far away a car is. Another cue to depth is the interposition of one object over another. Consider fig. 1. From this diagram it appears C is in front of B which is in front of A.

From this overview of four of the many cues to distance that we use you will have begun to realise that how we say we do something, like judge distance or depth, may not be a true description of how we really do it. It follows that expert protocols may also be invalid descriptions of 'real' cognitive processes and operations. As knowledge engineers, we may be able to model what the expert says he does but these verbalisations may not be a valid description of real processes, which may be very difficult for the expert to verbalise. Interviews, especially, may encourage the expert to speculate and theorise about his or her cognitive processes. In this regard it is worth noting that Nisbett and Wilson (1977) have argued that we have no conscious access to mental processes - only the mental products of such processes. From these products the existence and nature of the processes can only be inferred.

Asking the expert to perform a range of tasks is more likely to provide relevant knowledge than simply asking the expert what he does.

Obtaining concurrent protocols is also more likely to result in valid knowledge of expert processes than probing questioning.

Fig. 1. A cue to depth: three overlapping shapes labelled A, B and C, with C appearing in front of B, which appears in front of A.


Consider the following three methods of eliciting knowledge:
(1) Focused question: Did you use X as a subgoal? Answer: Yes.
(2) Unfocused question: Did you use any subgoals? If so, which? Answer: Yes, I used X.
(3) Concurrent protocol: ... I was first trying to get X and I... when I attained X...

All three elicitation methods gave the knowledge engineer the same information but the first method also tells the expert what is required and may encourage the expert to say what he thinks is the 'correct' answer. The second prompts the expert to generate a plausible answer which, consequently, may not be valid. The third method contains the strongest evidence that the expert really used X as a subgoal.

Does the generation of verbal protocols affect the way in which the expert performs his expertise? Ericsson and Simon have argued that think aloud protocols will not change task performance although the speed of task performance may be slowed down. For example Roth (1966) found that verbalisation had no effect on the effectiveness of task performance. However, asking the expert (by probing questioning) to explain why he is doing what he is doing requires the expert to attempt to access additional knowledge and information in his memory and so will disturb task performance. It follows that probing questions should follow immediately after the expert has demonstrated his expertise or as the expert performs another task.

Ericsson and Simon (1979) have noted that the automation of expertise is analogous to executing a computer algorithm in compiled instead of interpretive mode. Automation and compilation have two important consequences. They greatly speed up the process, and they make the knowledge of the process unavailable to memory and hence unavailable for verbal reporting.

Fast automatic processes may proceed in parallel and unpractised processes may follow a slower serial sequence. Consider the slow deliberate movements of a novice bricklayer and the movements of a skilled man. With an increase in experience of a task the cognitive processes that the novice is able to verbalise may be unavailable to the expert when asked to 'think aloud'.

Many studies have shown that people can display consistent and accurate behaviour without being able to report verbally the concepts being used [e.g., Bugelski and Scharlock (1952)]. There appears to be a negative relation between degree of practice and awareness of the intermediate stages of a cognitive process. Ericsson and Simon (1980) argue that many overlearned processes operate automatically without leaving any more trace than their final result in memory.

In a recent study, Berry and Broadbent (1984) explored the relationship between a person's performance on a cognitive task and the person's ability to make explicit the knowledge underlying that performance - assessed on a post-task questionnaire. The task was computer-implemented and required individuals to reach and maintain specified target values of an output variable by varying a single input variable. The nature of the equation was such that there was no unique output associated with any one input. The resulting output depended on previous input. The task, like a manual skill, required sustained performance and also cognitive decision-making. They found that practice significantly improved performance but had no effect on the ability to answer task-related questions. Berry and Broadbent concluded that assessing knowledge by means of a questionnaire did not give a true picture of an individual's competence. Also, providing an individual with appropriate detailed verbal instructions which are understood (and are later shown to have been remembered on the post-task questionnaire) is not necessarily sufficient to improve task performance - the individual must implement this verbal understanding.

It has also been shown that people tend to stop verbalising or to verbalise incompletely when the task is difficult and takes a lot of mental effort for a solution. Chess grandmasters seem unable to report intermediate stages in their thought processes as they contemplate difficult moves [de Groot (1965)]. Rather these experts tend to report experience of 'insight' where the solution appears as a whole as if from nowhere.

In general it would appear that a protocol is potentially useful for what it contains rather than what it omits.

Another conclusion to be drawn is that when people are asked probing questions about their cognitive processes they frequently do not base their answers on the specific memory of what they did but tend to speculate and theorise about what they did. Such speculations may of course not be valid. People try to 'fill out' and generalise incomplete or missing memories. Such speculations are shown by long pauses in replying to questioning and 'tentative' answering.

To summarise, verbal protocols are best taken at the time the expert performs a task within his domain of expertise. Several tasks which illustrate different aspects of the expertise should be used to elicit protocols; repetitive successive elicitations will check the validity of expert protocols. Probing questions should be left until the task has been completed, or perhaps asked as the expert performs another task, and the expert should be questioned in as indirect a way as possible so that words are not put in the expert's mouth as he helpfully tries to describe a thought process that may be automatic or compiled and so not available for verbalisation. Nevertheless, probing questions may uncover underlying cognitive processes that have not been verbalized as protocols.

Another form of protocol analysis which allows direct and often more convenient access to procedural knowledge than concurrent and retrospective protocols is that of 'context-focusing'.

4.2.2. Context-focusing: Short-cut protocol analysis

Context-focusing 1 is applied after the knowledge engineer has used investigative interviewing to gain an initial account of a problem area and has decided that development of a knowledge based system may be an appropriate solution to a problem. Context-focusing gives the knowledge engineer access to the expert's sequence of rule testing. By itself it does not directly allow access to the expert's knowledge of the classification or relationships between the objects, experience and rules of the expert's world. Multi-dimensional scaling and card-sorting tasks, to be discussed later in section 4.3, are used for this purpose.

1 The original idea which led to the development of the technique is due to Peter Whalley.

In context-focusing the knowledge engineer imagines a particular state of the system or classification and the expert has to find out what it is. Consider, for example, a car-fault diagnosis problem where the knowledge engineer (KE) is trying to discover the sequence of rule testing that a mechanic (M) uses to diagnose why a car won't start.

M: Does the ignition light come on when you turn the key?
KE: Yes.
M: Does the petrol gauge show that there is petrol in the tank?
KE: Yes.
M: Does the engine turn over when you turn the key?
KE: Yes.
M: Does it turn over as quickly as it does normally?
KE: No.
M: ...

If the mechanic is using his expertise efficiently the earlier questions in a sequence should serve to eliminate the most likely cases of a car failing to start. In other words, the early testing of a rule indicates that it has higher priority than rules that are tested later. Ideally the knowledge engineer should initiate the context-focusing procedure many times, each time imagining an alternative state of the system. In this way it is possible to check that the expert's priority ordering of rule testing is consistent from task to task. If it is not then the knowledge engineer should ask the expert why his or her sequence of questioning changed. If the sequence is consistent then the knowledge engineer should discuss the rationale behind the ordering with the expert.

In the above example of car-fault diagnosis, the mechanic may reply that the most frequent cause of a car not starting is a flat battery but often failure to start is the result of an empty petrol tank. It follows that the expert's second question eliminates a common cause of failure and the third and subsequent questions begin to focus down on the electrical system, in order to ensure that the cause is not there. Intuitively it would seem that the expert's fourth question would be a more appropriate initial question. But remember that the expert will often be transferring knowledge that is previously unfamiliar to the knowledge engineer. Only by questioning the expert on the rationale for the sequence will a more efficient sequence of rule testing normally become apparent.
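The principle that early questions should eliminate the most likely causes first can be made explicit by ranking candidate faults by an assumed prior probability, as in the sketch below. The faults and the prior figures are invented for illustration; they are not elicited values.

```python
# A minimal sketch (illustrative priors, not elicited ones) of ordering
# diagnostic questions by how likely each candidate fault is a priori.
candidate_faults = {
    "flat battery": 0.40,
    "empty petrol tank": 0.25,
    "starter motor fault": 0.15,
    "ignition fault": 0.10,
}

# Test the most probable causes first, mirroring the mechanic's questioning.
for fault, prior in sorted(candidate_faults.items(), key=lambda kv: -kv[1]):
    print(f"check for {fault} (prior {prior:.2f})")
```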

If the knowledge engineer is relatively unfamiliar with the domain of expertise in which a knowledge based system is to be built, he or she may not be able to answer the expert's questions in the process of view-changing. In these circumstances the knowledge engineer should act as an observer whilst two experts, who share a common understanding of the knowledge domain, initiate the context-focusing technique. In this case the observer should make a careful record of the sequence of rule testing for later elaboration in discussion with the expert, or experts. The knowledge engineer should bear in mind the potential problems of combining knowledge obtained from different experts.

4.3. Elicitation of declarative knowledge using multidimensional scaling

When there are a number of closely related concepts and there is no specialised vocabulary to describe subtle distinctions and relationships, a technique called multidimensional scaling can be a useful tool for the knowledge engineer. Multidimensional scaling allows experience and relationships to be communicated from one person (an expert) to another person (a knowledge engineer) whilst de-emphasizing the mediating role of language.

A problem often faced in building an expert system is how to measure and understand the way experts view relationships between objects and experiences. Multidimensional scaling analysis allows us to represent visually the psychological similarities between objects or experiences as points on a scattergram. Decreasing physical separation between objects represents increasing psychological similarity. Objects judged to be psychologically dissimilar will be represented as being far apart.

By asking experts to rate the similarity of objects and subsequently representing the similarity as physical distance it is possible to make interpretations of the underlying dimensions on which the objects have been judged relative to one another. Multidimensional scaling (MDS) procedures do not require any a priori knowledge of these dimensions and, since only similarity judgements are required, the knowledge engineer does not 'put words into the expert's mouth'. Conversely the expert does not put jargon into the knowledge engineer's ear.

MDS is an attempt to measure and understand the relations between objects within an assumed spatial model of psychological similarity. MDS is simply a mathematical tool for representing the adjudged similarities between objects as a spatial map.

In multidimensional scaling studies, each possible pairing of objects from an object set is presented to the expert who then rates the similarity of the pair on a seven-point scale ranging from, say, no similarity at all to completely similar. In most situations these ratings will be ordinal-level. The intention of non-metric multidimensional scaling is to represent, as closely as possible, the rank order of the similarity judgements as rank-ordered distances in some psychological 'space'.
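As a concrete illustration of this procedure, the sketch below runs non-metric MDS on a small matrix of invented similarity ratings using scikit-learn. The objects, the ratings and the choice of library are our own assumptions, not part of the original method.

```python
# A minimal sketch (not the authors' procedure): non-metric MDS on
# hypothetical expert similarity ratings, using scikit-learn.
import numpy as np
from sklearn.manifold import MDS

objects = ["burn", "backache", "headache", "toothache", "sore throat"]

# Hypothetical 7-point similarity ratings (7 = completely similar),
# converted to dissimilarities so that larger means less similar.
similarity = np.array([
    [7, 3, 4, 4, 2],
    [3, 7, 5, 3, 2],
    [4, 5, 7, 5, 3],
    [4, 3, 5, 7, 3],
    [2, 2, 3, 3, 7],
], dtype=float)
dissimilarity = 7 - similarity

mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
          random_state=0)
coords = mds.fit_transform(dissimilarity)

for name, (x, y) in zip(objects, coords):
    print(f"{name:12s} x={x:6.2f} y={y:6.2f}")
```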

In some cases, it can be shown that MDS techniques can be used to elicit subjective perceptions that are not accessible with protocol-based methodologies. Protocols and interviews are dependent upon the mediating role of language and may therefore be inappropriate for some situations. Below we detail one such case, described by Whalley (1984), involving the perception of pain. Another example might be the diagnosis of the cause of an unaccustomed noise in a particular type of machine.

The dominant methodology at present used to explore pain perception involves checking descriptive words on a questionnaire. From the words an individual has checked to describe his pain it is possible to make a preliminary diagnosis. This diagnosis is based on previously observed relations between words checked and subsequent final diagnoses, based in some cases on surgical investigation. Words included in the questionnaire include 'pulsing', 'drilling', 'cutting', 'tender', 'dull', 'tight' etc. Whalley showed that these linguistic descriptions have different meanings for different people and advocates a multidimensional scaling approach to pain. In his study, patients were required to make comparative similarity judgements between the pain that they were experiencing and a set of commonly occurring painful events. The relational information shown between the new pain and the set of common pains can then be used as a method for diagnosis without requiring patients to use imprecise ambiguous adjectives. One of Whalley's two-dimensional reference plots is given in fig. 2. He has labelled the two pain dimensions 'intensity' and 'duration'.

Fig. 2. Multidimensional scaling of subjective perceptions of pain: a two-dimensional plot (intensity by duration) locating events such as burn, backache, headache, toothache, tooth filled, sore throat and injection.

Other variations of multidimensional scaling and cluster analysis allow access to hierarchies in declarative knowledge. However, since most implementations of these analyses are on main-frame computers the knowledge engineer may find card-sorting techniques, to be discussed in the next section, more convenient to use.

4.3.1. Card-sorting: A short cut to declarative knowledge

Card-sorting techniques 2 are very easy to use and involve the knowledge engineer writing the names of the objects, experiences or rules in the expert's world onto individual cards. Only those concepts which the knowledge engineer feels need to be explored should be used in the card-sorting tasks. These tasks give the knowledge engineer access to the expert's knowledge of the classifications and relationships between objects, experiences or rules. Essentially the tasks allow access to an understanding of the structure of the knowledge underlying the expert's jargon.

4.3.1.1. Group separation task. Take the whole set of cards and ask the expert to sort them into two groups which the expert should then name. For example, the names of fifteen particular models of cars may be sorted into 'foreign' and 'British' models. Shuffle the cards and then ask the expert to sort the cars into three named groups. For example, the fifteen cars may be sorted into 'estates', 'sports cars' and 'family cars'. Next reshuffle the cards and ask the expert to sort the cars into four named groups. For example, the fifteen cars may be sorted into 'expensive British cars', 'less expensive British cars', 'expensive foreign cars' and 'less expensive foreign cars'. The latter sort may lend itself to representation as the hierarchy shown in fig. 3. But a hierarchical representation of classifications and relationships may not always be appropriate. The first two card-sorts cannot be combined directly as a hierarchy.

Fig. 3. A possible hierarchy: cars divided into foreign and British, each divided into expensive and less expensive.

Even the third card-sort may be represented as the alternative hierarchy illustrated in fig. 4. Perhaps if the fifteen cards were sorted into twelve categories the hierarchy shown in fig. 5 would result. Check that any constructed hierarchy matches the expert's view of the world by presenting him or her with the objects in the lower classifications of a hypothesised hierarchy and asking him or her to create fewer classifications by combining categories. The combined categories should match your hierarchy. For example, the four categories of 'expensive foreign', 'expensive British', 'less expensive foreign' and 'less expensive British' may be combined into the two groupings of 'expensive' and 'less expensive'.

Fig. 4. An alternative hierarchy: cars divided into expensive and less expensive, each divided into foreign and British.

Fig. 5. A hierarchy with four levels: cars divided into expensive and less expensive, then into foreign and British, then into estates, sports and family cars.

2 See footnote 1.

4.3.1.2. Group creation task. Instead of sorting the group of cards into successively smaller groupings another way to approach the problem is to ask the expert to find a pair of cards in the set that are more similar to each other than any other possible pairing. For example, two car models made by the same manufacturer may be seen as the most similar of all possible pairings of the cards. Next ask the expert to find the next most similar pairing or allow him or her to add another car to the first pairing. In this way the relationship between what the knowledge engineer has written on the cards can be explored with the expert as he or she groups and sorts the cards. To this end it is particularly important that the knowledge engineer asks the expert to try and name the groups or links that are formed. The knowledge engineer will find himself or herself asking the expert questions about groups that have been formed such as: 'How are these cards similar but yet different from these cards?'

By this method the knowledge engineer becomes aware of the structure of the expert's knowledge and, for example, may hear the expert answer by saying: 'On these cards are items that we normally stock.' 'On these cards are the names of the chemical processes by which our products are made.' 'These are parts of the machine's electrical system.' 'These are types of weed that respond to weedkiller X whilst these are types of weed that require weedkiller Y.' 'These are forms that need to be completed to obtain a stock item, these are forms for non-stock items whilst these are forms to call in an outside contractor for repair work. All these forms need authorisation by manager X before they can progress through the purchasing system.'

4.3.1.3. Triadic comparisons. This card-sorting method requires the knowledge engineer to take three cards at random from the set of cards. The three cards are then presented to the expert who is asked to put the cards into two groups such that the two cards in one group are more similar to each other than to the third card. The expert is then asked to try and name the way in which the groups differ. For example the expert may say: 'On these two cards are the names of dials I need to keep a careful eye on. The third dial's reading is less critical.'

In general, the different techniques of card-sorting will not produce identical representations of the structure of an expert's knowledge. However, they do provide a means of achieving a more focussed and systematic understanding of the classifications and relationships present in the expert's view of the world.
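As a simple illustration of what the knowledge engineer might take away from these tasks, an elicited hierarchy such as the car classification of figs. 3-5 can be recorded as a nested structure. The sketch below is our own and is not a technique described in the paper.

```python
# A minimal sketch (our illustration) of recording an elicited hierarchy,
# such as the car classification of figs. 3-5, as a nested structure.
car_hierarchy = {
    "expensive": {
        "foreign": ["estates", "sports cars", "family cars"],
        "British": ["estates", "sports cars", "family cars"],
    },
    "less expensive": {
        "foreign": ["estates", "sports cars", "family cars"],
        "British": ["estates", "sports cars", "family cars"],
    },
}

def leaves(tree, path=()):
    """Walk the hierarchy, yielding each path from the root to a leaf."""
    if isinstance(tree, dict):
        for key, subtree in tree.items():
            yield from leaves(subtree, path + (key,))
    else:
        for leaf in tree:
            yield path + (leaf,)

for path in leaves(car_hierarchy):
    print(" > ".join(path))
```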

4.4. Elicitation of expert knowledge about uncertainty

Many of the rules elicited from experts will follow the form 'A and B together often imply C' and many expert systems have in-built facilities to deal with uncertain information. For example, Joseph (1982) notes that not all expert knowledge is a set of 'black-and-white' facts. Most expert systems that can tolerate uncertainty employ some kind of probability-like measure to weigh and balance conflicting evidence. PROSPECTOR assigns probabilities to conclusions using an approximate form of Bayes' theorem to update probabilities as more information is acquired. MYCIN uses 'certainty values', but the meanings of the numbers are debatable since they do not follow the probability laws.

Indeed as Feigenbaum (1979) notes: 'Mycin was the first of our programs that forced us to deal with what we had always understood: that experts' knowledge is uncertain and that our inference engines had to be made to reason with this uncertainty.' (p. 18)

4.4.1. Methods of eliciting subjective probabilities

The direct method for probability assessment is very simple; the probability assessor is simply required to state a number between 0 and 1, with 0 meaning that it is thought impossible that an event will occur and 1 meaning it is thought absolutely certain that the event will occur.

It is also possible to vary the response mode and ask the expert to assess his or her uncertainty in odds. Assessed odds of 1 to 1 would mean that it is thought that the event's occurrence is equally likely as its non-occurrence. Odds of 1,000 to 1 against would mean that it is thought that the event's occurrence was extremely unlikely. Conversely, odds of 1,000 to 1 on would mean that the event's occurrence is thought to be extremely likely.

It is easy to convert an odds response to a probability. For example, odds of 10 to 1 against the event happening are equivalent to a probability of 0.91 that the event will not happen, or 0.09 that the event will happen. Odds of 10 to 1 on are equivalent to a 0.91 probability that the event will happen. It is often found that a converted odds response is not identical to a direct probability estimate for the occurrence of the same event. We will return to this issue later.
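The odds-to-probability conversions described above are easy to mechanise; the sketch below is our own illustration of the arithmetic.

```python
# A minimal sketch of the odds-to-probability conversions described above
# (illustrative only; not from the paper).
def prob_from_odds_against(odds: float) -> float:
    """Odds of `odds` to 1 against an event -> probability it occurs."""
    return 1.0 / (odds + 1.0)

def prob_from_odds_on(odds: float) -> float:
    """Odds of `odds` to 1 on an event -> probability it occurs."""
    return odds / (odds + 1.0)

print(round(prob_from_odds_against(10), 2))  # 0.09
print(round(prob_from_odds_on(10), 2))       # 0.91
```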

Next we will consider an indirect way of measuring your degree of belief. Consider wager A presented in fig. 6. If it does rain at 11 a.m. tomorrow over your home you win £100; if it doesn't you win nothing.

Now consider wager B, which refers to the 'spinner bet' consisting of a pointer which is free to rotate over a circle comprising two colours, black and white. I can adjust the relative amount of these two colours. If the pointer lands in the black you win £100; if it lands in the white you win nothing. Given that the proportion of black and white sectors is that in fig. 6, which wager would you prefer, wager A or wager B, or are you indifferent between the wagers?

Fig. 6. Indirect measurement of probability: wager A pays £100 if it rains and nothing if it does not; wager B (the spinner bet) pays £100 if the pointer stops in black and nothing if it stops in white.

If you preferred wager A to wager B then I would increase the proportion of black to white until you prefer wager B to wager A. I would then reduce the proportion of black until you are indifferent between the two wagers. The relative proportion of black to white would then be equivalent to your subjective probability that it will rain tomorrow. This indifference bet method allows the knowledge engineer to measure subjective probability without requiring the expert to state any numbers to describe his or her degree of belief. The only restriction in this method is that the utilities of the outcomes in the two wagers must be strictly identical. In this instance wager B must be played at, or shortly after, 11 a.m. tomorrow, when we know whether it has actually rained or not. If the result of wager B was to be paid out now, the expert's utility for an 'instant' £100 may be higher than that for a 'delayed' £100 and so the two wagers would not be strictly identical.

Notice that although people may differ in their utility for £100, this amount is similar in both wagers and therefore has no bearing on the measurement of subjective probability.

Which of these three methods is the best for elicitation of subjective probability? The empirical evidence is, unfortunately, contradictory. Sometimes the indirect methods are inconsistent with direct methods and sometimes they are not. Some studies have shown consistency between probability estimates inferred from wagers and direct estimates. However, other studies have shown that statistically naive subjects were inconsistent between direct and indirect assessment methods, whereas statisticians were not. Generally, direct odds estimates, perhaps because they have no upper or lower limit, tend to be more extreme than direct probability estimates.

If probability estimates derived by different methods for the same event are inconsistent, which method should be taken as the true index of degree of belief?

One way to answer this question is to use the method of assessing subjective probability that is more reliable. In other words there should be high agreement between the subjective probabilities, assessed at different times by a single assessor for the same event, given that the assessor's knowledge of the event is unchanged. Unfortunately, there has been relatively little research on this important problem. Goodman (1973) reviewed the results of several studies using direct estimation methods. Test-retest correlations were all above 0.88 with the exception of one study using students assessing odds - here the reliability was 0.66. Goodman concluded that most of the subjects in all experiments were very reliable.

Whatever direct or indirect method of obtaining a numerical estimate is used it is clear that the elicited probabilities can be utilised easily in an expert system. By contrast, consider verbal reports of uncertainty, for example, 'very probably' or 'extremely likely'. These estimates are less precise than numerical estimates and interpretation of their meaning varies from person to person. Table 1 sets out the variation in numerical meaning attributed to some probability expressions found by Lichtenstein and Newman (1967). Also notice the asymmetry found between mirror-image pairs. Clearly verbal expressions of probability are open to misinterpretation.

Table 1
Variations in the numerical meaning of probability expressions.

Expression           Mean associated probability   Range of associated probabilities
Highly probable      0.89                          0.60-0.99
Quite likely         0.79                          0.30-0.99
Probable             0.71                          0.01-0.99
Possible             0.37                          0.01-0.99
Improbable           0.12                          0.01-0.40
Quite unlikely       0.11                          0.01-0.50
Highly improbable    0.06                          0.01-0.30

Wright, Ayton and Whalley (1985) have developed a computer program called FORECAST which helps experts make subjective assessments of probability. The program checks for consistency between direct and indirect assessments of probability and reports inconsistencies back to the expert for interactive resolution. The program then goes on to check for coherence in probability assessment by using the probability laws. For example, one simple probability law states that the probability of event A and event B both occurring is equal to the probability of event A occurring multiplied by the probability of event B occurring given that event A has occurred. More formally:

P(A and B) = P(A) * P(B|A)

Consider an imaginary diagnostic expert system for ascertaining why a car won't start. It is possible to work out the probability that the cause of the starting failure is due to both badly set contact points and sparking plugs [P(A and B)] if we can assess the probability that the contact-breaker gap is badly set [P(A)] and the probability that the plug gaps are wrongly set given that we know the contact-breaker gap is wrongly set [P(B|A)].

Why bother to work through this probability law? The reason is that the two sides of the equation seldom balance - experts show marked incoherence in judgemental assessment of probability (Ayton and Wright, in press). The FORECAST program monitors subjective probability assessments for coherence and interactively resolves incoherence with the probability assessor.
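A minimal sketch of such a coherence check is given below. It is our own illustration of the multiplication law, not the FORECAST program itself, and the tolerance and the assessed values are invented.

```python
# A minimal sketch (assumed, not the FORECAST program) of a coherence
# check on the multiplication law P(A and B) = P(A) * P(B|A).
def coherence_gap(p_a: float, p_b_given_a: float, p_a_and_b: float) -> float:
    """Difference between the directly assessed P(A and B) and the value
    implied by P(A) * P(B|A); zero means the assessments cohere."""
    return p_a_and_b - p_a * p_b_given_a

# Hypothetical assessments for the car-fault example:
# A = contact-breaker gap badly set, B = plug gaps wrongly set.
gap = coherence_gap(p_a=0.3, p_b_given_a=0.5, p_a_and_b=0.25)
if abs(gap) > 0.05:            # tolerance chosen arbitrarily here
    print(f"Incoherent by {gap:+.2f}: ask the expert to revise.")
```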

Only when probability assessments are consistent and coherent is it sensible to check that they are also realistic. One measure of the realism of probability assessments is calibration. A person is said to be perfectly calibrated if, for all events or propositions to which he or she assigns a given subjective probability (P), the proportion that occurs, or is correct, is P. For example if you assign a probability of 0.7 to each of ten questions concerning the possible occurrence of future events, you should get seven of those questions correct. Similarly, all events that you assess as being certain to occur (1.0 probability assessment) should, in fact, occur. Wright and Ayton (1986) discuss the relationship between consistency, coherence and validity in more detail.
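A calibration check can likewise be computed directly from a record of assessments and outcomes; the sketch below, with invented assessments, is our own illustration rather than a procedure from the paper.

```python
# A minimal sketch of a calibration check (illustrative, not from the paper):
# for every assessed probability, compare it with the observed hit rate.
from collections import defaultdict

# Hypothetical (assessed probability, outcome occurred?) pairs.
assessments = [(0.7, True), (0.7, True), (0.7, False), (0.9, True), (0.9, True)]

buckets = defaultdict(list)
for p, occurred in assessments:
    buckets[p].append(occurred)

for p in sorted(buckets):
    hit_rate = sum(buckets[p]) / len(buckets[p])
    print(f"assessed {p:.1f} -> observed {hit_rate:.2f} (n={len(buckets[p])})")
```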

5. How Good is the Expert's Expertise: Is It Possible to Improve on Expert Judgement?

Up to now we have discussed some of the approaches to eliciting expert knowledge for modelling in rule-based inference systems. This discussion has been founded on the assumption that it is worthwhile modelling the expert in this way. In the next section of this paper we discuss some research which has shown that expert judgement can be improved upon by statistical modelling.

Expert judges have been studied making illness diagnoses on the basis of patient symptoms or characteristics. One of the most extensive evaluations of medical expertise was that conducted by Meehl (1959). The judgemental problem used was that of differentiating psychotic from neurotic patients on the basis of their personality questionnaire profile.

Each patient upon being admitted to hospital had taken eleven personality tests. Expert clinical psychologists believe (or at least used to believe) that they can differentiate between psychotics and neurotics on the basis of a profile of eleven questionnaire scores.

Initially researchers tried to 'capture' or 'model' expert judges by a simple linear regression equation. This judgemental representation is constructed in the following fashion. The clinician is asked to make his diagnostic or prognostic judgement from a previously quantified set of cues for each of a large number of patients. These judgements are then used as the dependent variable in a standard linear regression analysis. The independent variables in this analysis are the values of the cues. The results of such an analysis are a set of regression weights, one for each cue, and these sets of regression weights are referred to as the expert's 'model' or his 'policy'. Fig. 7 sets out the basic paradigm.

Fig. 7. Basic paradigm for the construction of a linear additive model of an expert judge. Patient information: the score (S) on each of the eleven personality scales related to neuroticism and psychoticism. Criterion (C) to be predicted: whether the patient was diagnosed neurotic or psychotic after extensive psychological and psychiatric evaluation. A weight (W) is derived for each of the eleven scales from the judge's ratings, giving the linear additive model of the judge: C = μ + W1*S1 + W2*S2 + ... + W11*S11.
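A minimal sketch of this paradigm is given below: a linear model is fitted to a (simulated) judge's holistic ratings, yielding the regression weights that constitute the judge's 'policy'. The data, the noise level and the use of scikit-learn are our own assumptions, not details of the original studies.

```python
# A minimal sketch (assumed, not the original studies) of "bootstrapping"
# an expert: fit a linear additive model to the judge's own diagnoses.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical data: 200 patients x 11 personality-scale scores, plus the
# expert's holistic rating of each profile (higher = more psychotic).
scores = rng.normal(size=(200, 11))
true_weights = rng.normal(size=11)
expert_rating = scores @ true_weights + rng.normal(scale=1.0, size=200)  # noisy judge

model = LinearRegression().fit(scores, expert_rating)   # the judge's 'policy'
predicted = model.predict(scores)                        # model-of-the-judge predictions

print("captured weights:", np.round(model.coef_, 2))
# In bootstrapping studies these predictions, not the judge's own ratings,
# are compared against the criterion (the eventual confirmed diagnosis).
```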

How do these models make out as predictors themselves? That is, if the regression weights (generated from an analysis of one clinical judge) were used to obtain a 'predicted diagnosis' for each patient, would these diagnoses be more valid, or less valid, than the original clinical diagnoses from which the regression weights were derived? To the extent that the model fails to capture valid non-linear variance in the expert's decision processes, it should perform worse than the expert; to the extent that it eliminates the random error component in human judgements, it should perform better than the expert.

What were the results of this research? The overwhelming conclusion was that the linear model of the expert's behaviour out-performed the expert. Dawes (1975) noted: 'I know of no studies in which human judges have been able to improve upon optimal statistical prediction.... A mathematical model by its very nature is an abstraction of the process it models; hence if the decision-maker's behaviour involves following valid principles but following them poorly these valid principles will be abstracted by the model.'

Goldberg (1965) reported an intensive study of clinical judgement, pitting experienced and inexperienced clinicians against linear models in the psychotic/neurotic prediction task. He was led to conclude that Meehl chose the wrong task for testing the clinician's purported expertise. The clinicians achieved a 62 per cent hit rate, while the simple linear composite achieved 70 per cent. A 50 per cent hit rate could have been achieved by chance as the criterion base rate was approximately 50 per cent neurotic, 50 per cent psychotic.

Dawes and Corrigan (1974) have called the replacement of the expert by his model bootstrapping. Belief in the efficacy of bootstrapping is based on a comparison of the validity of the linear model of the expert with the validity of his or her holistic judgements.

Dawes and Corrigan concluded that the human decision-maker need specify with very little precision the weightings to be used in the decision - at least in the context studied; what must be specified is the variables to be utilised in the linear additive model. It is precisely this knowledge of 'what to look for' in reaching a decision that is the province of the expert clinician. It is not in the ability to integrate information that the expert excels.


The distinction between knowing what to look for and the ability to integrate information is illustrated in a study by Einhorn (1972). Expert doctors coded biopsies of patients with Hodgkin's disease and then made an overall rating of severity. These overall ratings were very poor predictors of survival time, but the variables the doctors coded made excellent predictions when utilised in a linear additive model.

In conclusion, we can say that in a repetitive prediction task only the knowledge of which variables to include in the prediction equation is important. Clinical expertise is, of course, the source of this knowledge - without it the linear models could not work. However, the clinician's importance weightings are not at all crucial. This result remains true in all the contexts so far investigated.

The research discussed in this section can be viewed as being highly critical of the conceptual basis of expert systems. However, it must be remembered that linear models only work in repetitive situations where the set of cue variables is constant from diagnosis to diagnosis - only the values or scores of the cue variables change with each decision. By contrast, the structures for knowledge representation in expert system shells allow a more flexible representation of human expertise. Perhaps the best way for the knowledge engineer to view the research on linear models is to recognise that statistical modelling has a valid but limited application as a replacement for expert judgement.

6. The Future of Knowledge Elicitation

The next major breakthrough may come in the automation of knowledge acquisition. Robert Engelmore, director of knowledge systems development for Teknowledge, notes 'we're handcrafting now' [quoted by Swaine (1983)]. Other researchers have developed various tools that can facilitate the process of acquiring knowledge.

For example, RULEMASTER [Michie, Muggleton, Riese and Zubrick (1984)] has been proposed and implemented as a 'general purpose expert-system building tool' which builds rules by 'rule induction' or generalisation over examples of expert decision-making. For instance, in a hypothetical example of building a system to classify animals on the basis of colour and size, the expert may type in 'grey, big and elephant', 'yellow, big and giraffe' and 'grey, small and tortoise'. Rulemaster would then generate the following rule:

If the animal's colour is
  (a) yellow, then it is a giraffe
  (b) grey, then if the animal's size is
      (i) big, then it is an elephant
      (ii) small, then it is a tortoise.
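Rule induction of this kind can be approximated with a standard decision-tree learner; the sketch below, using scikit-learn rather than RULEMASTER, induces an equivalent classification from the same three examples.

```python
# A minimal sketch (not RULEMASTER itself) of rule induction over the
# animal examples, using a decision tree from scikit-learn.
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

examples = [("grey", "big"), ("yellow", "big"), ("grey", "small")]
labels = ["elephant", "giraffe", "tortoise"]

encoder = OrdinalEncoder()
X = encoder.fit_transform(examples)          # colour, size -> numeric codes

tree = DecisionTreeClassifier(random_state=0).fit(X, labels)
print(export_text(tree, feature_names=["colour", "size"]))
# The printed tree encodes the same classification as the induced rule above.
```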

However, computer-based automation of the full scope of knowledge may be an unrealisable dream, given our discussion of the difficulties of knowledge elicitation in this paper. Our view is supported by Myers et al. (1983): 'In our view the automatic construction of rules by algorithmic or heuristic methods, is some way from routine practical applicability... we also believe that the attempt to automate knowledge base construction may be based on a somewhat premature rejection of knowledge elicitation techniques.'

Indeed, problems inherent in the goal of defining and representing knowledge have been the subject of debate and dispute since at least the time of Aristotle. With the development of computers these matters have now become a hot technological issue and it is perhaps tempting to suppose that some major progress is about to be made. The emergence of 'cognitive science' as a new discipline is, arguably, symptomatic of that optimism. Nevertheless one only has to scan reports of research in computer simulation, semantic memory modelling, natural language comprehension and other pertinent areas to appreciate the enormity of the task. The formulation of ideal, definitive and a priori principles for capturing knowledge in any domain will no doubt prove to be elusive yet awhile. Knowledge is ethereal stuff. Its successful elicitation and modelling will depend on active efforts to realise the subtleties and complexities of the problems. We hope that this discussion of the advantages and limitations of the various knowledge elicitation techniques has demonstrated the potential applicability of psychological methodology to aid modelling of expertise for the construction of expert systems.

References

Ayton, P. and G. Wright, n.d., Assessing and improving judgemental probability forecasts, OMEGA International Journal of Management Science, forthcoming.


Berry, D.C. and D.E. Broadbent, On the relationship between task performance and associated verbalizable knowledge, Quarterly Journal of Experimental Psychology (1984) 36A, 209-231.

Bugelski, B.R. and D.P. Scharlock, An experimental demonstration of unconscious mediated association, Journal of Experimental Psychology (1952) 44, 334-338.

Dawes, R.M., Graduate admission variables and future success, Science (1975) 187, 721-743.

Dawes, R.M. and B. Corrigan, Linear models in decision-making, Psychological Bulletin (1974) 81, 95-106.

de Groot, A.D., Thought and choice in chess (Mouton, The Hague, 1965).

Duda, R.O. and J.G. Gaschnig, Knowledge-based expert systems come of age, Byte (1981) 9, 238-281.

Duda, R.O. and E.H. Shortliffe, Expert systems research, Science (1983) 220, 261-268.

Einhorn, H.J., Expert measurement and mechanical combination, Organizational Behavior and Human Performance (1972) 7, 86-106.

Ericsson, K.A. and H.A. Simon, Verbal reports as data, Psychological Review (1980) 87, 215-251.

Feigenbaum, E.A., Themes and case studies in knowledge engineering, in: D. Michie, ed., Expert systems in the micro-electronic age (Edinburgh University Press, 1979).

Goldberg, L.R., Diagnosticians versus diagnostic signs: The diagnosis of psychosis versus neuroses from the MMPI, Psychological Monographs (1965) 79, 602-643.

Goodman, B.C., Direct estimation procedures for eliciting judgement about uncertain events, Engineering Psychology Technical Report (University of Michigan, 1973).

Hayes-Roth, F., D.A. Waterman and D.B. Lenat, eds., Building Expert Systems (Addison Wesley, Reading, MA, 1983).

Joseph, E.C., Defense computer and software - what's ahead for AI? Concepts (1982) 5, 141-147.

Lichtenstein, S. and J.R. Newman, Empirical scaling of common verbal phrases associated with numerical probabilities, Psychonomic Science (1967) 9, 563-564.

Meehl, P.E., A comparison of clinicians with five statistical methods of identifying psychotic MMPI profiles, Journal of Counselling Psychology (1959) 6, 102-122.

Michie, D., S. Muggleton, C. Riese and S. Zubrick, RULEMASTER: A second generation knowledge engineering facility, Radian Technical Report MI-R-623 (Radian Corporation, Austin, TX, Dec. 1984).

Myers, C.D., J. Fox, S.M. Pegram and M.F. Greaves, Knowledge acquisition for expert systems: experience using Emycin for leukaemia diagnosis, Proceedings of Expert Systems 83 (1983).

Nisbett, R.E. and T.D. Wilson, Telling more than we can know: Verbal reports on mental processes, Psychological Review (1977) 84, 231-259.

Pauker, S.G., G.A. Gorry, J.P. Kassirer and M.D. Schwartz, Towards the simulation of clinical cognition, The American Journal of Medicine (1976) 60, 981-998.

Reboh, R., The knowledge acquisition system, in: R.O. Duda, ed., A computer based consultant for mineral exploration, Final Report, SRI International, Project 6415 (Menlo Park, CA, 1979).

Roth, B., The effect of overt verbalisation on problem solving, Dissertation Abstracts (1966) 27, 957B.

Swaine, M., Knowledge engineers' handcraft diagnostic software, Infoworld (1983) 5, 11-12.

Whalley, P.C., The psychology of similarity, Unpublished PhD. dissertation (Open University, 1984).

Wilkins, D.C., B.G. Buchanan and W.J. Clancey, Inferring an expert's reasoning by watching, Proceedings of the 1984 conference on Intelligent Systems and Machines (1984).

Wright, G. and P. Ayton, The psychology of forecasting, Futures (1986) 18, 420-439.

Wright, G., P. Ayton and P. Whalley, A general-purpose computer aid to judgemental forecasting: Rationale and procedures, Decision Support Systems (1985) 1, 333-340.