
PETER GÄRDENFORS

FORECASTS, DECISIONS AND UNCERTAIN PROBABILITIES*

1. INTRODUCTION


The main reason for making forecasts is that the information provided by

forecasts is considered to be an important factor in rational decision

making. All decisions are based on some form of prediction since the

choice among the available alternatives is dependent on the possible

consequences of the different alternatives. The consequences of an

action which is not yet performed cannot be known, but they must be

predicted. These more or less trivial observations show that there is an

intimate connection between forecasts and decisions. However, this

connection is to a large extent ignored in the traditional theories of

decision. 1

In the most well-known models for decision making no regard is paid

to how the predictions of the consequences of the alternatives are made. In

order to apply one of the models it is only demanded that there is some

prediction of the consequences of the alternatives, not necessarily the best

available forecast. Nevertheless, we judge, intuitively, that a decision which

is made on the basis of a worked-through forecast is better than a decision

in the same situation which is based on a mere guess. The explanation of

this judgement which comes first to hand is that when a decision is made on the basis of the information provided by serious forecasts, the risk that an error is made is smaller than when the decision is based on guesses. One of the aims of this paper is to show how this idea can be made precise with the aid of an analysis of different types of knowledge of probabilities.

I will next present an example which will serve as an introduction to the

problem area of this paper. Suppose I have an important meeting in Stockholm next Thursday. I

can choose between going by train or by air from Malmö to Stockholm.

My trip will be paid, so the price is of no importance for my decision. On the other hand, I find it much more convenient to go by air than to go by

Erkenntnis 14 (1979) 159-181. 0165-0106/79/0142-0159 $02.30. Copyright © 1979 by D. Reidel Publishing Co., Dordrecht, Holland, and Boston, U.S.A.


train, mainly because if I go by train I have to start from Malmö Wednesday evening, while if I go by air I can take the first plane on Thursday morning. What makes my decision non-trivial is that the risk of not coming to Stockholm in time for the meeting is greater if I go by air than if I go by train. The proposition that the train will be in time will be represented by p, and the proposition that the plane will be in time will be represented by q. I assume that my utilities of the different outcomes are as follows:

                 s1       s2       s3       s4
                 p & q    p & -q   -p & q   -p & -q
I go by train    6        6        0        0
I go by air      10       0        10       0

The four relevant hypotheses about the future are represented by s1, ..., s4. The determining factor of my decision is how probable each of these states is. Suppose that I judge that the probability that the train will be so much delayed that I will not be in time for the meeting, i.e. not-p, is about 0.01. Suppose further that I do not know very much about how often the planes for Stockholm are delayed, but, if someone asked me what my estimate of the probability of not-q is, I would say 0.10.

Using this information it is now possible to apply the most well-known of the traditional decision principles - maximizing expected utility. This principle is also known as Bayes' rule. 2 The expected utility of taking the train is close to 6 (6 · 0.99 = 5.94), while the expected utility of going by air is 9 (10 · 0.90), and hence Bayes' rule recommends me to go by air.
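For concreteness, this computation can be written out as a minimal Python sketch (my own illustration, not part of the original paper; the probabilities and utilities are those assumed above):

```python
# Bayes' rule in the travel example. The train's utility depends only on p
# and the plane's only on q, so each expected utility reduces to the
# utility of being in time multiplied by the probability of being in time.
def expected_utility(utilities, probabilities):
    """Sum of utility times probability over the states."""
    return sum(u * p for u, p in zip(utilities, probabilities))

print(expected_utility([6, 0], [0.99, 0.01]))   # train: 5.94, "close to 6"
print(expected_utility([10, 0], [0.90, 0.10]))  # plane: 9.0
```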

However, in this decision situation I may very well choose to go by train although the expected utility is smaller. The primary reason is that I do not dare to take the risk that the plane will be delayed. Even though I judge the probability that the plane will be in Stockholm in time to be 0.90, my knowledge about the reliability of flights is very scanty and therefore my judgement of the probability is very uncertain. Relative to what I know it is likely that further knowledge will radically change the probability that the plane will be in time.

Let us next assume that I meet a pilot who frequently flies from Malmö


to Stockholm, who has heard the weather forecast for next Thursday and who in general is well informed about the air service. At my request

he answers that he judges the probability that my plane will be in time for my meeting to be 0.85. This information will diminish my expected utility of going by air to Stockholm, when compared with my earlier judgement, but, nevertheless, it does not seem irrational if I now choose to

go by air instead of by train, since I now know with greater certainty that the probability that the plane will be in time is high. If I choose to go by

air, I think I can more easily defend my choice after obtaining the pilot's

information than before.

The purpose of this example is to show that when applying the traditional

version of Bayes' rule, the degree of certainty of the forecasts is not taken into consideration and that such considerations may be of practical importance. In the sequel I will formulate decision principles which pay regard to the uncertainty in the information provided by forecasts. As a

preparation for this I will first outline the foundations of the traditional theories of decision. I will also discuss the value of forecasts for decisions

which are made by maximizing expected utility.

2. AN OUTLINE OF DECISION THEORY

A decision, in its most general form, is a choice of one of the alternatives

available in a given situation. For simplicity, I will assume that in any decision situation there is a finite set of alternatives a1, a2, ..., an to choose between. In many decision situations the alternatives can be conceived of as acts, from among which the decision maker has to perform one. In such cases a rational decision maker has to deliberate upon the possible outcomes or consequences of the available acts to assure himself that he

chooses the "best" act. Though the decision maker presumably has some control over the

variables which determine the outcome, he does not in general have complete control. I will assume that the uncertainty as to the outcomes of an act can be described by referring to different states of nature (or just states, for brevity). These states are assumed to represent different hypotheses about what is in fact true about the world, now and in the future. We can, of course, imagine an infinite number of future developments, but, for a given decision problem, only certain aspects of the states are relevant.


I will assume that, in any given decision situation, only a finite number of types of states are relevant to the decision. 3 These will be denoted s1, s2, ..., sm.

The outcome of an alternative ai, if the true state of nature is sj, will be denoted oij. An important factor when making a decision is how the decision maker values the outcomes. Normally it is assumed that this valuation can be represented by a utility measure U which measures the decision maker's utility of the different outcomes. 4 We will denote the utility of the outcome oij by uij.

Since it is assumed that all information on how the decision maker values the outcomes is summarized by the utility measure, we can describe a decision situation by a so-called decision matrix in the following way:

        s1     s2     ...    sm
a1      u11    u12    ...    u1m
a2      u21    u22    ...    u2m
...
an      un1    un2    ...    unm

A final factor for the decision situation is what knowledge the decision maker has about which of the states of nature is the true state. To start with we can distinguish between two extreme cases:

(i) The decision maker has no information (relevant to the decision problem) about the true state of nature.

(ii) The decision maker has full information, in the sense that there is a subjective probability measure P such that P(sj/ai) represents the probability for the decision maker that sj is the true state, if the alternative ai is chosen.

It may happen that the probability distribution P gives the decision maker enough information to determine that each alternative, independently of the state of nature, leads to a specific outcome. If this much information is assumed, it is commonly said that the decision is made under certainty. In all other cases of (ii), i.e. when there are some alternatives for which several outcomes are possible, we say that the decision is made under risk. Finally, if (i) obtains, the decision is traditionally said to be made under uncertainty.


It is often assumed that the alternative chosen does not have any

influence on how the decision maker judges which state is the true

state of nature. In the case of decision making under risk, we will then have P(sk/ai) = P(sk/aj) for all alternatives ai and aj, and we need not refer

to which alternative is chosen when determining the probability of a particular state.

Since different forecasts correspond to different states of nature, we

can use the probability measure P which is assumed to exist in a decision

situation under risk to introduce the concepts of self-verifying and self-

falsifying forecasts. Assume that one of the alternatives, say a0, represents

status quo, which means that if that alternative is performed, then the

present state of affairs will not be changed in any relevant aspect. Assume

further that it is predicted that si is the true state of nature. Let aj be the alternative which is best if si is the true state, i.e., the alternative that maximizes the utility of the outcomes in si's column in the decision matrix. If now P(si/aj) > P(si/ao), then the forecast si is self-verifying, and if P(si/aj) < P(si/ao), then si is self-falsifying.

From the classification presented earlier we see that the traditional

decision theory considers only two extreme cases of information about

the states of nature. Between the two extremes (i) and (ii) there is a vast area of different degrees of partial information. It would be a fiction not to

admit that most practical decision problems fall within this area. In most

cases the decision maker has some information, which is relevant for the

decision problem, about the probabilities of the different states, but it is

almost only in games with coins and dice that he can reasonably be said

to have full information.

However, very little seems to be known about decision principles

which can be applied when one has some but not full information about

the probabilities of the states of nature. In the sequel I will discuss how

this kind of intermediate knowledge can be represented and how it can be

used when formulating decision principles. But first I will discuss the

value of forecasts for decisions made under risk and under uncertainty.

3. DECISION MAKING UNDER RISK

The most well-known decision rule for decision making under risk is Bayes' rule, i.e. the rule which says that, in a given decision situation, the


alternative ought to be chosen which has the greatest expected utility. The expected utility of an alternative aj is defined as Σi uji · P(si/aj), where P is the probability measure which is assumed to represent the knowledge in the decision situation about the states of nature.

Those who adhere to the Bayesian tradition claim that an individual's knowledge in a given situation can be represented by a probability measure

which, for all propositions, determines the individual's degree of belief. In the literature one can find a number of axiomatizations of the notion of degree of belief which result in a unique probability measure in each knowledge situation. If any of these axiomatizations is accepted, then

decisions under risk will always be possible and, furthermore, we need never consider any other kind of individual decision making.

If new knowledge is obtained, say if the individual comes to know

that p is true, then it is assumed that the probability measure P' in the new knowledge situation is determined from the old measure P by conditionalization, so that, for any proposition q, P'(q) = P(q/p). 5 Thus the probability measure in a given knowledge situation does not provide imperturbable probabilities, but the degree of belief in a proposition

normally changes as further knowledge is gained. When a decision maker applies Bayes' rule he is expected to use the probability measure in his actual knowledge situation, not a probability measure connected with an

earlier knowledge situation and not a measure which would represent a future knowledge situation where his wishes have come out true.
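As a minimal sketch of conditionalization (my own illustration; the prior below is constructed from the probabilities of the introductory example under an assumption of independence between p and q):

```python
# Conditionalization on a finite space of four states. Each state is a
# pair (p, q) of truth values; the prior assigns P(p) = 0.99 and
# P(q) = 0.9 with p and q independent.
prior = {(True, True): 0.891, (True, False): 0.099,
         (False, True): 0.009, (False, False): 0.001}

def conditionalize(P, evidence):
    """Return P' with P'(s) = P(s) / P(evidence) where the evidence
    holds, and 0 elsewhere."""
    p_e = sum(pr for s, pr in P.items() if evidence(s))
    return {s: (pr / p_e if evidence(s) else 0.0) for s, pr in P.items()}

# Coming to know that p is true (the train will be in time):
posterior = conditionalize(prior, lambda s: s[0])
print(posterior[(True, True)])  # P'(q) = P(q/p) = 0.9
```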

Using Bayes' rule as a starting point, I now turn to the question why the use of forecasts and other kinds of new information results in better decisions. An interesting answer has been given by Good [6]. I will here outline his result and comment upon the presuppositions for this result. 6

Suppose that, in a given decision situation, the available alternatives are a1, a2, ..., an and that the relevant states of nature are s1, s2, ..., sm. Let the utility that results from aj under circumstances si be uji. Good assumes that the probabilities P(si) of the states of nature are independent of which alternative is chosen. The decision maker considers obtaining new information, e.g. by making a particular experiment, in order to learn more about the states of nature. Let the possible outcomes of the planned experiment be e1, e2, ..., er, and let the prior probability of the outcome ek be P(ek).

If now the decision maker performs the experiment and the outcome


is ek, then, according to Bayes' rule, the alternative which, given the new knowledge that the outcome was ek, has the maximal expected utility ought to be chosen. The maximal expected utility can be written

(1) maxj Σi uji · P(si/ek).

Since the prior probability of the outcome ek is P(ek), deciding to perform the experiment and to take the result into account yields the expectation

(2) Σk P(ek) · maxj Σi uji · P(si/ek).

This expression can be rewritten, by elementary probability calculus, as

(3) Σk maxj Σi uji · P(si) · P(ek/si).

Good shows that (3), which represents the expected maximal expected utility after the experiment is performed, cannot be smaller than

(4) maxj Σi uji · P(si),

which represents the maximal expected utility of the alternatives before the experiment is performed. Furthermore (3) and (4) are equal only when the alternative recommended by Bayes' rule is the same irrespective of which of the outcomes ek actually occurs.

Note that this result does not imply that the maximal expected utility after the experiment is performed will always be at least as large as the maximal expected utility before the experiment, but only that the expected maximal expected utility will be at least as large. Thus it may happen that when the experiment is performed the maximal expected utility is smaller than it was before.

Good's result can also be applied to forecasts. A forecast can be seen as a particular kind of experiment, the possible outcomes of which are determined by the range of possible results of the forecast. In this sense, Good's result shows that one can expect a decision made on the basis of a forecast to be better than a decision made without it.
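As a sanity check on the inequality between (3) and (4), here is a small Python sketch (my own construction; the two-outcome "weather report" and its likelihoods are invented numbers):

```python
# Good's result numerically: the expected maximal expected utility after
# an experiment, expression (3), is never smaller than the maximal
# expected utility before it, expression (4).
def good_check(util, p_s, p_e_given_s):
    """util[j][i]: utility u_ji of alternative j in state i;
    p_s[i]: prior probability P(s_i);
    p_e_given_s[k][i]: likelihood P(e_k/s_i)."""
    # (4): maximal expected utility before the experiment.
    before = max(sum(u * p for u, p in zip(row, p_s)) for row in util)
    # (3): sum over outcomes k of max_j sum_i u_ji * P(s_i) * P(e_k/s_i).
    after = sum(max(sum(row[i] * p_s[i] * likelihood[i]
                        for i in range(len(p_s)))
                    for row in util)
                for likelihood in p_e_given_s)
    return before, after

# Two states (plane in time / delayed), two alternatives (train / plane),
# and a hypothetical weather report with two possible outcomes.
before, after = good_check(util=[[6, 6], [10, 0]],
                           p_s=[0.9, 0.1],
                           p_e_given_s=[[0.95, 0.2], [0.05, 0.8]])
print(before, after)   # 9.0 and 9.3 (up to rounding): the report has value
```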

The assumption that the probabilities of the states of nature are independent of which alternative is chosen is not necessary for the proof. Good does not mention this, but the proof can easily be generalized. 7

Another assumption, which is implicit in the argument above but which


must be taken into consideration when evaluating Good's result, is that the cost of experimentation is zero, or at least negligible. For practical decision problems this assumption is almost always false. There are situations where it is no longer worthwhile to try to acquire more information before making the decision, since the cost of a relevant experiment or an improved forecast by far exceeds the profit to be expected in the form of a better decision. A difficult problem is to determine how much information the decision maker ought to obtain before acting. Decisions sometimes turn out not to be the best possible, but we can hardly accuse a decision maker of having made a mistake, if he has informed himself to the utmost that he is capable. 8

In practice a weighing must be made to determine what part of the

possibly available information to obtain. The information which is

judged to be most relevant to the decision under consideration is, of course, most important to acquire. Only seldom can the decision maker obtain complete knowledge of the consequences of an action, but he can have well-informed ignorance. 9

Although Bayes' rule is called a decision principle for decision making under risk, its main shortcoming is, in my opinion, that it does not pay enough attention to the risks of the decisions. As I tried to show by the

introductory example, there are situations where the (subjective) probability of the states of nature can be estimated, but where the alternative which has the largest expected utility nevertheless is considered non-optimal.

If one was certain of the probabilities which are used when computing the expected utility, in the sense that no further available information

would alter one's judgement of the probabilities, then Bayes' rule would be much more reasonable. But it is almost only in parlour games that we can be said to be certain of the probabilities in this sense, and in most

situations we can easily imagine further information that would radically change our judgement of the probabilities of the states. In this sense the expected utility of an alternative is risky. And people seem to have an inclination to avoid taking risks.

In the variant of Bayesianism which is advocated by Ramsey and de Finetti, it is assumed that an individual's subjective probability of a state of nature can be determined with the aid of his inclination to accept bets concerning the state. 10 If he accepts a bet that the true state of nature is si at odds of 1:4, then this is taken to imply that he estimates the probability of si


to be at least 0.2. It is then shown that if it is not possible to construct a

bet where the individual will lose money no matter which state turns out to

be true, then there is a unique probability measure that describes the

individual's degree of belief in the different states. 11

However, in order to derive this conclusion it must be assumed that if an individual is not willing to bet on si at odds of a:b, then he should be willing to bet on not-si at odds of b:a. But this assumption makes too heavy demands on people's willingness to make bets. One is often neither willing to accept a bet on si at odds of a:b, nor willing to bet on not-si at opposite odds, simply because one is not willing to risk anything. In my opinion, this can to a large extent be explained by the fact that one is uncertain of the probabilities of the different states of nature.

The criticism presented here is directed at Ramsey's and de Finetti's

axiomatizations of subjective probability, but similar criticism can be

constructed against other axiomatizations. 12

4. DECISION MAKING UNDER UNCERTAINTY

The characteristic assumption for decision making under uncertainty is

that the decision maker has no knowledge which is relevant for determining

which state of nature is the true state. This assumption is, of course,

almost always false. Nevertheless, the information one has about the

states of nature is sometimes so meagre and vague that it is a relief to

apply a decision principle where this knowledge need not be considered.

The most common principles for decision making under uncertainty are

presented in Luce and Raiffa [16], ch. 13. Here I will only mention the most

well-known - the maximin criterion. When applying this principle to a choice between the alternatives a1, a2, ..., an in a decision situation, one first looks for, for every alternative ai, the state for which the outcome of ai will be the worst. The utility of this outcome is called the security level of ai. Then that alternative is chosen which has the largest security level.
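A minimal sketch of the criterion in Python (my own; the two-alternative utility matrix at the end is invented for illustration):

```python
# The maximin criterion: choose the alternative whose worst outcome,
# i.e. its security level, is the best.
def maximin(utilities):
    """utilities[j][i]: utility of alternative j in state i.
    Returns the index of an alternative with the largest security level."""
    return max(range(len(utilities)), key=lambda j: min(utilities[j]))

print(maximin([[3, 1], [2, 2]]))  # security levels are 1 and 2, so prints 1
```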

Thus the optimal alternative, according to the maximin criterion, is the one with the best worst outcome.

A defining property of all principles for decision making under uncertainty is that no regard is paid to any kind of knowledge one may

have about the possible states of nature. Consequently, the information

provided by forecasts or experiments cannot be used for this kind of


decision. We conclude that within the theory of decision making under uncertainty it is not possible to explain why a decision which is based on

carefully prepared forecasts is better than a decision based on mere guesses.

This argument does not rule out that it may be rational, in some decision

situations, to apply a principle for decision making under uncertainty, but it only shows that such principles are irrelevant, if forecasts are to be

exploited for the decision.

I have now discussed the parts of decision theory which are based on

extreme assumptions of the knowledge about the states, viz. full knowledge

of the probabilities of the states and no relevant knowledge. Both these extreme cases are unrealistic in most decision situations - normally the

decision maker has some knowledge about the states of nature but is not willing to ascribe definite probabilities to them. Such situations could be

classified as decision making under uncertain risk. A central question is now how such partial knowledge can be described. In the sequel I will suggest

two kinds of descriptions and discuss how they can be used when

formulating decision principles. I will also try to show how the information provided by forecasts can be used to reach better decisions.

5. PROBABILITY INTERVALS

The first way of describing partial knowledge about the states of nature I

will examine is to associate with each state a subjective probability interval. The intended meaning of the interval is that the knowledge available to the decision maker entails that the "true" probability of the state is contained in the interval and that no narrower interval is justified. The interval

associated with a state si will be denoted (P-(si), P+(si)), where 0 ≤ P-(si) ≤ P+(si) ≤ 1. 13

The width of the probability interval associated with a state si can be taken as a measure of the decision maker's degree of ignorance of si. For example, let si be the state that the next ball drawn from a given urn will be red. If one knows that there are 50 red and 50 green balls in the urn and that the drawing is random, then it is natural to associate with si a small interval around 0.5; perhaps the point 0.5 can be regarded as a minimal interval in itself. If one, on the other hand, only knows that the next ball is drawn at random but nothing about the proportions of red and green


balls in the urn, then the representation of this knowledge concerning si

will be a very wide interval, maybe even the entire interval (0, 1).

Certain restrictions must be observed in order to make the assignment of intervals consistent. If there are only two possible states of nature s1 and s2, then it must be the case that P-(s1) = 1 - P+(s2) and P+(s1) = 1 - P-(s2) in order to maintain the interpretation that every value within the interval is a possible probability of the state. Generally, it must be required that, for every number x within the interval (P-(si), P+(si)), there be a combination of numbers, which lie within the intervals associated with the remaining states, such that the sum of x and these numbers equals 1.
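This requirement can be checked mechanically. In the following sketch (my own), I use the fact that, for a fixed state, the set of completable values x is itself an interval, so it suffices to test the endpoints:

```python
# Consistency of probability intervals: every x in a state's interval
# must be completable, by values in the other states' intervals, to a
# combination summing to exactly 1.
def completable(intervals, i, x):
    """intervals: list of (lower, upper) pairs, one per state."""
    rest_lo = sum(lo for j, (lo, hi) in enumerate(intervals) if j != i)
    rest_hi = sum(hi for j, (lo, hi) in enumerate(intervals) if j != i)
    return rest_lo <= 1 - x <= rest_hi

def consistent(intervals):
    """Both endpoints of each interval suffice, by convexity."""
    return all(completable(intervals, i, x)
               for i, (lo, hi) in enumerate(intervals)
               for x in (lo, hi))

print(consistent([(0.45, 1.0), (0, 0.45), (0, 0.05), (0, 0.05)]))  # True
```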

The probability intervals associated with the states can be used to define a generalization of Bayes' rule where the risk of uncertain probability judgements is taken into consideration. The minimal expected utility of an

alternative ai in a decision situation with states s1, s2, ..., sm is defined in the following way: Compute the expected utility, in the ordinary sense, for every m-tuple x1, x2, ..., xm such that x1 + x2 + ... + xm = 1 and where each xi lies within the interval (P-(si), P+(si)). The minimal expected utility is then the smallest of these "possible" expected utilities. The

principle for decision making under uncertain risk, i.e. for decision situations where the knowledge about the states of nature can be repre-

sented by probability intervals, can be formulated as follows: The maximin criterion for expected utilities: The alternative with the

largest minimal expected utility ought to be chosen.

In order to illustrate how the minimal expected utilities are computed,

I will use the example presented in the introductory section. Let the

probability intervals associated with the states s1, ..., s4 be as shown in the following matrix:

State:                 s1            s2          s3          s4
Description of state:  p & q         p & -q      -p & q      -p & -q
Probability interval:  (0.45, 1.0)   (0, 0.45)   (0, 0.05)   (0, 0.05)
a1: I go by train      6             6           0           0
a2: I go by air        10            0           10          0

The minimal expected utility of a1 is obtained when the probabilities of the worst outcomes, viz. s3 and s4, are chosen as large as possible, in this


case 0.05 each, and the probabilities of s1 and s2 are chosen so that their sum becomes 1 - (0.05 + 0.05) = 0.9. The minimal expected utility of a1 is thus 6 · 0.9 + 0 · 0.1 = 5.4. In a similar way the minimal expected utility of a2 is obtained if the probability of s2 is chosen as 0.45, the probability of s4 as 0.05, and the probabilities of s1 and s3 so that their sum is 0.5. The minimal expected utility then becomes 10 · 0.5 + 0 · 0.5 = 5.0.

According to the maximin criterion for expected utilities, applied to this decision situation, I ought to go by train.
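The hand computation above can be mechanized. The following Python sketch (mine, not the paper's) finds the minimal expected utility greedily: starting from every state's lower bound, the free probability mass is pushed onto the states with the worst utilities first. It assumes the intervals are consistent in the sense stated earlier:

```python
# Minimal expected utility over probability intervals.
def minimal_expected_utility(utilities, intervals):
    """utilities[i]: utility of the alternative in state i;
    intervals[i]: (lower, upper) probability bounds for state i.
    Assumes a distribution respecting the bounds exists."""
    order = sorted(range(len(utilities)), key=lambda i: utilities[i])
    probs = [lo for lo, hi in intervals]   # start at the lower bounds
    mass = 1 - sum(probs)                  # probability mass left to place
    for i in order:                        # worst utilities first
        extra = min(intervals[i][1] - probs[i], mass)
        probs[i] += extra
        mass -= extra
    return sum(u * p for u, p in zip(utilities, probs))

intervals = [(0.45, 1.0), (0, 0.45), (0, 0.05), (0, 0.05)]
print(minimal_expected_utility([6, 6, 0, 0], intervals))    # 5.4 for a1
print(minimal_expected_utility([10, 0, 10, 0], intervals))  # 5.0 for a2
```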

Let us now assume, for the continuation of the example, that when I have met the pilot, who gives me the information mentioned earlier, the

probability intervals associated with the states are changed so that I now believe that s1 should be associated with (0.8, 0.9), s2 with (0, 0.2), s3 with (0, 0.08), and s4 with (0, 0.02). When the minimal expected utilities are computed one obtains 6 · 0.9 + 0 · 0.1 = 5.4 for a1, which is the same value as earlier, and 10 · 0.78 + 0 · 0.22 = 7.8 for a2. In this situation, I should choose a2, to go by air, according to the suggested decision principle.

(It should be noted that the probability intervals used in this example do not exclude that my subjective probabilities, if I am forced to state

definite numbers instead of intervals, are 0.99 for p both before and after

the pilot's information and 0.9 for q before and 0.85 after the information.)

It can be shown that both Bayes' rule for decision making under risk

and the maximin criterion for decision making under uncertainty are special cases of the maximin criterion for expected utility. Having full

knowledge of the probability of a state implies that the associated probability interval is minimal, i.e. a single point. Hence, having full knowledge of the probabilities of all states implies that there is only one "possible" expected utility which then, automatically, is the minimal expected utility.

Thus, in this special case the maximin criterion for expected utility reduces to Bayes' rule.

On the other hand, having no relevant knowledge about the probabilities of the states of nature implies that the associated probability intervals are of maximal width, i.e. the entire interval (0, 1) is associated with all states. Hence, the minimal expected utility of an alternative is obtained when the security level of the alternative is taken to have the probability 1. This is just another way of formulating the maximin criterion for decision making under uncertainty.

I have now outlined a principle for decision making under uncertain


risk, which, in a sense, covers the area between decision making under

risk and decision making under uncertainty. Starting out from this decision principle, I will next return to the question why decisions based on forecasts and other kinds of information are better than decisions

made without this information.

First, the interpretation of the probability intervals must be elaborated somewhat. There are mainly two ways of interpreting the interval

associated with a state: (i) as a kind of confidence interval - one is almost

certain that, given the available information about the state, the probability

of the state lies within the interval, but it is not excluded that further information, which is possible to obtain, may show that the probability

nevertheless is outside the interval; (ii) as certain knowledge - one knows that the probability of the state lies within the interval; no further information can change this, but further knowledge only has the effect that the possible probability interval is diminished.

The most natural of these interpretations is (i) and this interpretation

must be used in practical applications. The degree of accuracy to be

demanded is dependent on how serious the decision to be made is - the more dangerous the consequences of an erroneous decision, the wider the probability intervals assigned to the states should be.

But let us, for the moment, accept (ii) as the correct interpretation.

There is then a very simple explanation of why more information yields

better decisions. Obtaining more relevant knowledge about the states

normally has the effect that the probability intervals associated with the states will shrink; in any case they will not become wider. And as soon as a probability interval is diminished, a number of possible expected utilities will be excluded. Normally this implies that the minimal expected utility of each one of the alternatives becomes larger; in any case it cannot become smaller. The result is that the alternative which is chosen according

to the maximin criterion for expected utility when the extra information is taken into account will have at least as large minimal expected utility as

the alternative which is chosen when the information is not taken into account, and, normally, the maximal minimal expected utility will be larger. This can be compared with Good's result for decision making under risk, where only the expected value of the maximal expected utility was increased when new information was obtained, not the expected utility of each alternative.
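To see this mechanism at work, here is a small continuation of the earlier sketch with an invented pair of interval assignments, the second nested inside the first:

```python
# When every new interval is contained in the old one, no alternative's
# minimal expected utility can decrease: the worst case is taken over a
# smaller set of probability distributions.
wide = [(0.45, 1.0), (0, 0.45), (0, 0.05), (0, 0.05)]
narrow = [(0.8, 0.9), (0.05, 0.2), (0, 0.05), (0, 0.02)]  # nested in wide
for u in ([6, 6, 0, 0], [10, 0, 10, 0]):
    print(minimal_expected_utility(u, wide),
          minimal_expected_utility(u, narrow))
# 5.4 -> 5.58 for the train, and 5.0 -> 8.0 for the plane.
```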


As I said before, the interpretation (ii) is an idealization. If the interpretation (i) is accepted instead, then the preceding argument cannot be carried through with full rigor, since it may happen, although it is improbable, that further information changes the probability intervals so that

the new intervals are not covered by the old ones. Nevertheless, I believe that the argument above gives a good picture of why we consider further information to yield better decisions.

One may ask whether it is possible to carry through an argument for

the maximin criterion for expected utilities corresponding to Good's result for Bayes' rule. In order to do this it is necessary to have rules for

computing conditional probability intervals. There are some suggestions for such rules in the literature, but all of them seem to be connected with serious problems. 14 Since an examination of these rules would lead us far

away, I will not pursue this theme here.

6. PROBABILITY DISTRIBUTIONS

In the preceding section probability intervals were introduced as a way

of representing the uncertainty of the probabilities of the states. It may

happen that the information the decision maker has about the states makes it possible to say something more about the probability of the states than that it lies within a certain interval, for example that it is more likely

that the probability of the state lies between 0.5 and 0.6 than that it lies between 0.4 and 0.5. The probability intervals are thus only a rough way of representing the decision maker's uncertain knowledge about the states

of nature. A more general way is to associate with each state a probability distribution which, for every probability value between 0 and 1, describes how likely it is that the probability of the state has this value. 15 In this way we will talk about the probability of the probability - a second order probability.

Probability distributions can be represented by diagrams as shown below.

Returning to the introductory example, Figure (i) can be taken to represent my personal probability distribution, before I have met the pilot, of the state that the plane will be in time, and (ii) can be taken to represent the distribution after I have obtained the information from the pilot.

FORECASTS 173

[Figures (i) and (ii): second order probability distributions over the probability of the state, centered around 0.9 in (i) and around 0.85 in (ii).]

For any interval (a, b) which is a part of (0, 1), the part of the area

under the curve which lies between a and b is a measure of the (second

order) probability that the true probability of the state lies within the

interval (a, b). The area under the entire curve is always the same - what

differs is how this area is distributed over the interval (0, 1). The more

uncertain the decision maker is of the probability of the state, the flatter

is the curve; and the more certain he is of the correct probability of the

state, the more of the area is concentrated around this value.

In (i), half of the area under the curve is on each side of 0.9, and in (ii)

half of the area is on each side of 0.85. In this sense these values represent

the "expected" mean values of the distributions and these are the numbers

I would report if asked of my estimation of the probability that the plane

will be in time. It is possible to introduce a kind of "confidence intervals" for the

probability distributions. If the decision maker wants to be 90% certain that his judgement is correct, then he should choose an interval such that 90% of the area lies within this interval and such that the expected mean value of the probability distribution divides this area in equal parts.

In this way one obtains probability intervals of the kind that was studied

in the preceding section. The more certain the decision maker wishes to

be, the wider will the corresponding confidence interval be.

With the aid of these confidence intervals, we can now define a class of

decision principles. These principles can be seen as generalizations of the maximin criterion for expected utilities which was formulated for the case when the uncertain probabilities were represented by probability

intervals. For each α such that 0 ≤ α ≤ 1, we can formulate the α-maximin


criterion for expected utilities in the following way: For each state si, associated with a probability distribution, pick the confidence interval (P-(si), P+(si)) such that the second order probability that the true probability lies within this interval is α. From these intervals the winning alternative is then determined by the maximin criterion for expected utilities as presented earlier. 16
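As an illustration of how the α-maximin criterion could be mechanized, here is a Python sketch. It assumes, purely for illustration, that each state's second order distribution is a Beta density (the paper does not prescribe any particular family), and it reuses the minimal_expected_utility function from the earlier sketch; the shape parameters at the end are hypothetical:

```python
# The alpha-maximin criterion with Beta-distributed second order
# probabilities (an assumption of this sketch, not of the paper).
from scipy.stats import beta

def confidence_interval(a, b, alpha):
    """Interval of second order mass alpha whose two halves, in mass,
    meet at the mean of Beta(a, b), mirroring the construction above."""
    mean = a / (a + b)
    c = beta.cdf(mean, a, b)
    lo = beta.ppf(max(c - alpha / 2, 0.0), a, b)
    hi = beta.ppf(min(c + alpha / 2, 1.0), a, b)
    return lo, hi

def alpha_maximin(utilities_per_alternative, beta_params, alpha):
    """One interval per state, then the maximin criterion for expected
    utilities; the derived intervals are assumed to be jointly
    consistent (cf. note 16)."""
    intervals = [confidence_interval(a, b, alpha) for a, b in beta_params]
    scores = [minimal_expected_utility(u, intervals)
              for u in utilities_per_alternative]
    return max(range(len(scores)), key=lambda j: scores[j])

# Hypothetical second order distributions for the four travel states;
# larger parameters give a more peaked distribution.
beta_params = [(85, 15), (15, 85), (3, 97), (3, 97)]
print(alpha_maximin([[6, 6, 0, 0], [10, 0, 10, 0]], beta_params, 0.9))
# Prints 1 (go by air) for these assumed distributions.
```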

The number α can be seen as an index of the degree of "risk aversion" of the decision maker. If α = 0, then the α-maximin criterion is equivalent to Bayes' rule since each confidence interval is, by definition, centered around the mean of the probability distribution. If α = 1 and the probability distributions for the states are non-zero everywhere in the interval (0, 1), then the α-maximin criterion reduces to the ordinary maximin criterion. However, if the entire area of a probability distribution lies within a smaller interval than (0, 1), then the α-maximin criterion does not in general yield the same decision as the ordinary maximin criterion, even if α = 1.

The probability distribution associated with a state will change when

new relevant information about the state is obtained. To a certain extent,

it is easier to determine conditional distributions than it is to determine

conditional probability intervals, but I have no general solution to the

conditionalization problem for probability distributions. 17 The normal

effect of additional information about a state is that more of the area

under the curve is concentrated in a smaller interval so that the variance of the distribution becomes smaller. If it is assumed that some confidence interval is used to determine the minimal expected utility of the state, then,

in these normal cases, we arrive at the situation discussed in the preceding

section where the minimal expected utility of a state is increased when new

information is considered.

There are, however, cases where further information increases the

variance of the distribution. A simple example can illustrate this situation. 18 A stranger holds a coin between his fingers in such a way that I cannot see

its faces and he asks me how I judge the probability that the coin will show heads after the next throw. Not having examined the coin I cannot exclude the possibilities that the coin is biased or a trick coin with either heads on both faces or tails on both faces. Prima facie, these possibilities are quite unlikely, and my probability distribution has the following

shape: 19


[Figure: my second order probability distribution for the chance of heads, with most of the area concentrated around 0.5.]

Now the stranger tells me that he is a salesman for trick coins. After

receiving this information (I have still not seen the faces of the coin) my

probability distribution will change to the following:

[Figure: the revised second order probability distribution, with much of the area now shifted towards 0 and 1.]

It is now much more probable that the coin is a trick coin, but I have no

information that tells whether the coin has heads on both sides or tails on both sides.

Even though the second probability distribution is based on more

information relevant for the state that the next throw will show heads, the

variance of the distribution is larger. The information is to a great extent relevant for decisions about bets on the coin. For example, if I come to see that one of the faces of the coin is heads, then I will be much more willing to bet on heads after having learned that the man is a salesman for trick coins than before obtaining this information. The conclusion to be drawn from this example is that the variance of the probability distribution does


not determine, in all decision situations, the uncertainty of the probability

of a state.
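The point can be made numerical with a toy discrete second order distribution over three possible chances of heads; the mixture weights below are my own invention:

```python
# Trick-coin example: the information that the stranger sells trick
# coins shifts second order mass to the extremes, so the mean stays at
# 0.5 while the variance grows.
def mean_and_variance(dist):
    """dist: mapping from a chance of heads to its second order probability."""
    mean = sum(p * w for p, w in dist.items())
    var = sum((p - mean) ** 2 * w for p, w in dist.items())
    return mean, var

before = {0.0: 0.01, 0.5: 0.98, 1.0: 0.01}  # trick coins prima facie unlikely
after = {0.0: 0.45, 0.5: 0.10, 1.0: 0.45}   # "I am a salesman for trick coins"
print(mean_and_variance(before))  # (0.5, 0.005)
print(mean_and_variance(after))   # (0.5, 0.225)
```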

7. THE WEIGHT OF EVIDENCE

In the preceding sections I have outlined two answers to the question

why it is better to use as much information as possible when choosing

between the alternatives in a decision situation. It would now be interesting

to construct a measure of how much of the theoretically possible information is used when a particular decision is made. Such a measure could be used to estimate the reliability of a decision - the more relevant information is used for a decision, the more reliable is the decision.

A closely related concept is introduced by Keynes in [11]: 20

As the relevant evidence at our disposal increases, the magnitude of the probability of the argument may either decrease or increase, according as the new knowledge strengthens the unfavourable or favourable evidence; but something seems to have increased in either case - we have a more substantial basis upon which to rest our conclusion. I express this by saying that an accession of new evidence increases the weight of an argument. New evidence will sometimes decrease the probability of an argument, but it will always increase its "weight".

Here Keynes writes about the probability of an argument, while I have

been concerned with the probability of a state of nature. However, I believe that what Keynes intends to measure by the "weight" is the same

as what I am looking for, in spite of the difference in terminology.

Keynes does not provide any explicit measure of the weight of evidence,

but he confines himself to a discussion of what properties such a measure

ought to have. It is clear, however, that he conceives the weight of evidence

as being proportional to the amount of relevant information: 21

If the new evidence is irrelevant, ..., the weight is left unchanged. If any part of the new evidence is relevant, then the value is increased.

The traditional definition of relevance is the following: e is relevant to s, given the earlier evidence k, iff P(s/e & k) ≠ P(s/k). However, Keynes is not content with this definition since it may happen that both e1 and e2 are relevant to s, taken as separate pieces of evidence, but the compiled evidence e1 & e2 is irrelevant to s. 22 He remarks that 23

if we are to be able to treat "weight" and "relevance" as correlative terms, we must regard evidence as relevant, part of which is favourable and part unfavourable, even if, taken as a whole, it leaves the probability unchanged.


For this reason, Keynes suggests an alternative definition of the relevance

relation which satisfies this requirement. It can be shown, however, that

this definition leads to the absurd consequence that all information,

except what follows logically from what is already known, will be relevant

to any proposition whatsoever. 24 This result is not dependent on Keynes'

definition, but is a consequence of the demand that the conjunction of

two relevant propositions also be relevant. The conclusion that must be

drawn is that it is impossible to maintain Keynes' idea that additional

relevant information always increases the weight of evidence.

In [1], Carnap discusses "the problem of the reliability of a value of

degree of confirmation" which obviously is a parallel to Keynes' problem. 25

Carnap remarks that Keynes' concept of the "weight of evidence" was

anticipated by Peirce, who mentioned it in [17] in the following way: 26

to express the proper state of our belief, not one number but two are requisite, the first depending on the inferred probability, the second on the amount of knowledge on which that probability is based.

Carnap suggests that the estimated standard error of a probability judgement be used as a measure of the weight of evidence. But, as the example in the preceding section shows, there are cases where additional information leads to an increased estimated error, but where the reliability of the

probability judgement nevertheless is greater. Keynes also points out,

using an example similar to the one presented here, that the weight of

evidence cannot be determined by the magnitude of the probable error,

even though this magnitude is often connected with the weight. 27 It is

interesting to note that Carnap, who refers to Keynes' discussion, does

not mention Keynes' objection against using the probable error as a

measure of the weight of evidence.

Levi [14] criticises Keynes' suggestion that the weight of evidence

should be correlated to the amount of relevant information: 28

Sometimes ..., an increase in the amount of relevant evidence will decrease its sufficiency. A physician might want to find out whether McX has disease D or E, which call for different therapies. Relative to the evidence available to him, he feels justified in diagnosing D. Subsequently, new evidence is obtained that casts doubt on that diagnosis, but without being decisive in favour of E. The amount of relevant evidence has increased; but would it not be plausible to say that the need for new evidence increased after the increase of relevant evidence?


Consequently, Levi suggests that 29

it is preferable to view weight of evidence not as a measure of the absolute amount of relevant evidence but as an index of the sufficiency of available evidence. Weight of evidence would then be viewed as of high value when no further evidence is needed and would fall away from that high value as the demand for new evidence increases.

Unfortunately, Levi does not discuss how to determine whether further

evidence is needed. Rather, this is the problem I wanted to solve by finding an appropriate measure of the weight of evidence.

As this presentation shows, it is generally agreed that representing the probability of a state of nature by a unique number is not sufficient when determining the reliability of a decision, but a measure of "the weight of evidence" is also needed. 30 How this measure should be defined is, as I have tried to show in this section, still an unsolved problem.

I believe that probability distributions provide a good way of representing the knowledge one has about the states. I suggest, at least as a temporary solution, that such distributions be used when estimating the reliability of a decision, instead of the single number sought for as the weight of evidence. Intuitively, flat distributions yield less reliable decisions than peaked distributions do, even if they are multi-peaked.

8. CONCLUSION

In the traditional decision theories the role of forecasts is to a large

extent swept under the carpet. I believe that a recognition of the connections between forecasts and decisions will be of benefit both for decision theory and for the art of forecasting.

In this paper I have tried to analyse which factors, apart from the utilities of the outcomes of the decision alternatives, determine the value

of a decision. I have outlined two answers to the question why a decision which is made on the basis of a forecast is better than a decision which is based on a guess. Neither of these answers is universally valid. An assumption which is necessary for the first answer, i.e. Good's result, is that Bayes' rule is accepted as a correct and generally applicable decision principle. The second answer, which was given with the aid of probability intervals, departed from a more general decision principle, the maximin criterion for expected utilities, which was formulated in order to evade some of the criticism against Bayes' rule. However, the argument leading


to the answer is based on the assumption that the probability intervals associated with the states of nature represent certain knowledge. For this reason this answer is only approximately valid.

As a number of quotations in the section on "the weight of evidence" show, it is not sufficient to describe the knowledge about the states of nature by a single number, representing the (subjective) probability of the state, but something else has to be invoked which measures the amount of information on which a decision is based. Several authors have tried to characterize this mysterious quantity, which here was called the weight of evidence. However, there seems to be little agreement as to how this quantity should be measured.

University of Lund, Sweden.

NOTES

* The research for this paper has been supported by a grant from a research project on future studies sponsored by the Planning Division of the Research Institute of Swedish National Defence (FOA P). The author wishes to thank Professor Sören Halldén and an anonymous referee for helpful comments on the manuscript.
1 A brief discussion of the connections between decisions and forecasts can be found in Hansson [8].
2 This decision principle will be defined and discussed in section 3.
3 In the sequel I will not maintain the distinction between states and types of states, but, for simplicity, refer to both concepts as "states".

4 Many problems are connected with the measurement of utilities. Utilities are uncertain in much the same way as probabilities are uncertain. In this paper I will, however, confine myself to a discussion of the uncertainty of probabilities and disregard the problems of determining utilities.
5 If accepting the new knowledge about p does not imply that p obtains the probability 1, then some more general procedure than conditionalization must be employed to determine the new probability measure. Cf. Jeffrey's discussion of "observation by candlelight" in [10].
6 Good's result is also discussed in Hilpinen [9].
7 This was pointed out to me by a referee.
8 Cf. Locke [15]: "He that judges without informing himself to the utmost that he is capable, cannot acquit himself of judging amiss".
9 See Halldén [7] for a presentation of the use of "well-informed ignorance" in forecasting.
10 See e.g. the papers by Ramsey [20] and de Finetti [3] in Kyburg and Smokler [13].
11 This is often called the "Dutch Book Theorem". Cf. Kyburg [12] for a detailed discussion.
12 The most famous of these is found in Savage [22]. For further criticism of the traditional subjectivistic approach, cf. Kyburg [12].


13 Kyburg [12] suggests that probability intervals be used to represent degrees of belief. The mathematical theory of probability intervals has been studied by Dempster [2], Good [5], Shafer [24] and [25], and Smith [26].
14 Cf. Shafer [24] and the following discussion.
15 This is suggested by e.g. Rosenkrantz [21]. He rejects Kyburg's proposal that degrees of belief be measured by probability intervals because "current Bayesian formulations are already more general. They not only allow confinement of one's probability mass to a subspace of parameter space, but allow one to structure opinions within that subspace". Savage [22] formulates some criticism against second order probabilities. To some extent this is countered by my criticism against the traditional theories of subjective probability.
16 For the intervals determined in this way it may not be certain that for every number x within the interval (P-(si), P+(si)) there is a combination of numbers which lie within the intervals associated with the remaining states such that the sum of x and these numbers equals 1. However, in order to be able to apply the maximin criterion it is sufficient that there is some m-tuple x1, x2, ..., xm such that x1 + x2 + ... + xm = 1.

17 Rosenkrantz [21] makes use of beta densities for a particular kind of observations which enables him to compare the conditional distributions for these observations. A similar computation is presented in Raiffa [19], ch. 7.
18 A similar example has been proposed by Amos Tversky.
19 I assume that heads and tails are the only relevant possibilities.
20 Keynes [11], p. 71.
21 Ibid., pp. 71-72.
22 Here Keynes is referring to the case when P(s/e1 & k) ≠ P(s/k) and P(s/e2 & k) ≠ P(s/k), but P(s/e1 & e2 & k) = P(s/k).
23 Keynes [11], p. 72.
24 Cf. Carnap [1], p. 420 and Gärdenfors [4].
25 Carnap [1], pp. 554-555.
26 Peirce [17], p. 421.
27 Keynes [11], pp. 74-76.
28 Levi [14], p. 142.
29 Ibid.
30 A related topic is discussed by Popper [18], pp. 406-419. He formulates a "paradox of ideal evidence" which he takes to be a conclusive argument against the subjectivistic view of probability. For a criticism of this argument, cf. Rosenkrantz [21].

REFERENCES

[1] Carnap, R., Logical Foundations of Probability, University of Chicago Press, Chicago, 1950.

[2] Dempster, A. P., 'Upper and Lower Probabilities Induced by a Multivalued Mapping', Ann. Math. Stat. 38 (1967), 325-339.

[3] de Finetti, B., 'La prévision: ses lois logiques, ses sources subjectives', Ann. Inst. Poincaré 7 (1937), 1-68. An English version can be found in Kyburg and Smokler [13].

[4] Gärdenfors, P., 'On the Logic of Relevance', Synthese 37 (1978), 351-367.


[5] Good, I. J., 'Subjective Probability as a Measure of a Non-Measurable Set', in Nagel, Suppes and Tarski (eds.), Logic, Methodology and Philosophy of Science: Proc. of the 1960 Int. Congress, Stanford Univ. Press, Stanford, 1962.

[6] Good, I. J., 'On the Principle of Total Evidence', The British Journal for the Philosophy of Science 17 (1967), 319-321.

[7] Halldén, S., 'Well-Informed Ignorance in Forecasting', in Schwarz [23].
[8] Hansson, B., 'Some Methodological Problems in Future Studies', in Schwarz [23].
[9] Hilpinen, R., 'On the Information Provided by Observations', in Hintikka and Suppes (eds.), Information and Inference, Reidel, Dordrecht, 1970.
[10] Jeffrey, R. C., The Logic of Decision, McGraw-Hill, New York, 1965.
[11] Keynes, J. M., A Treatise on Probability, Macmillan, London, 1921.
[12] Kyburg, H., 'Bets and Beliefs', Amer. Phil. Quart. 5 (1967), 54-63.
[13] Kyburg, H. and Smokler, H. E. (eds.), Studies in Subjective Probability, Wiley, New York, 1964.
[14] Levi, I., Gambling with Truth, Alfred A. Knopf, New York, 1967.
[15] Locke, J., Essay Concerning Human Understanding, 1689.
[16] Luce, R. D. and Raiffa, H., Games and Decisions, Wiley, New York, 1957.
[17] Peirce, C. S., Collected Papers, ed. by Hartshorne and Weiss, Belknap Press, Cambridge, Mass., 1932.
[18] Popper, K. R., The Logic of Scientific Discovery, Basic Books, London, 1959.
[19] Raiffa, H., Decision Analysis, Addison-Wesley, Reading, 1968.
[20] Ramsey, F. P., 'Truth and Probability', in Kyburg and Smokler [13].
[21] Rosenkrantz, R. D., 'Probability Magic Unmasked', Philosophy of Science 40 (1973), 227-233.
[22] Savage, L. J., The Foundations of Statistics, Wiley, New York, 1954.
[23] Schwarz, S. (ed.), Knowledge and Concepts in Future Studies, Westview Press, Boulder, Colorado, 1976.
[24] Shafer, G., 'A Theory of Statistical Evidence', in Harper and Hooker (eds.), Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, Vol. II, Reidel, Dordrecht, 1976.
[25] Shafer, G., A Mathematical Theory of Evidence, Princeton Univ. Press, Princeton, 1976.
[26] Smith, C. A. B., 'Consistency in Statistical Inference and Decision', Journal of the Royal Statistical Society Ser. B 23 (1961), 1-25.

Manuscript submitted 13 August 1977. Final version received 6 March 1978.