Quantum Probability and the Interpretation of Quantum Theory

L. Accardi
Centro Matematico V. Volterra, Università di Roma Torvergata, Roma, Italy

Talk given at the Centenary Session, International Statistical Institute, Amsterdam, August 12–22 (1985)


Contents

1 Statement of the problems

2 The axioms of probability theory

3 Conditioning axioms

4 Experiments to test the validity of Bayes’s formula

5 The two–slit experiment

6 A more recent example: Bell’s inequality

7 Another non-Kolmogorov model for correlations similar to those of Corollary (5.3)

8 Non-Kolmogorov model for the correlation of Corollary (5.3)

9 The statistical invariants

10 The new structure axioms

11 The theorem of composite probabilities and the quantum measurement process

12 A historical digression on the so-called paradoxes of quantum mechanics

13 Post–scriptum (added 21–6-2013)


1 Statement of the problems

The sixth among the twenty-three problems stated by Hilbert at the Paris International Congress of Mathematicians in 1900 was [20]: “. . . to treat axiomatically those physical disciplines in which already today’s mathematics plays a predominant role . . . these ones are in the first place the calculus of probability and mechanics . . .”

Twenty-five years after Hilbert’s Paris address, the new quantum mechanics was discovered and it was soon clear that this theory was as much a new mechanics as a new probability calculus. Hilbert himself was immediately aware of these features of the new theory as well as of the changes of perspective that this discovery caused on his sixth Paris problem. In fact he devoted a seminar (in 1926) to the description of the mathematical structure of quantum theory, in which the non-classical features of the new probabilistic calculus were clearly described. The notes of this seminar, collected by von Neumann and Nordheim, were later published in a joint paper [19]. Thus the Kolmogorov axiomatization of the classical probabilistic model came to light at a time (1933) when the most advanced physical theories were already making extensive use of a completely different mathematical formalism. The challenge posed to all probabilists by this new probability calculus was pointed out in Feynman’s communication in the 2nd Berkeley Symposium on Probability and Statistics (1954) [13].

In fact, from a theoretical point of view, it would be very unsatisfactory for contemporary probability theory to be unable to answer questions such as the following: imagine you want to handle a set of statistical data (say, transition probabilities, or correlation functions, . . . ).
Which type of probabilistic model will you try to fit to your data?
The one based on the usual Kolmogorov calculus or the one based on the new calculus used by the quantum physicists?

Of course from an empirical point of view there is no problem: if your data come from some experiments on, say, population biology, then you use the classical model; if they come from CERN, then you use the quantum one. (Some people are perfectly happy with this level of understanding.)

Another natural question is: is the new probabilistic model really necessary? Or is it perhaps the result of a historical misunderstanding, and the new physics can be completely described within the context of the classical probabilistic model?


There is a large and subtle literature on the problem.
The theories which try to fit the statistical data of quantum mechanics within a classical probabilistic model are called hidden variables theories. We will not discuss them here.
But assume that the new probabilistic model is not superfluous and has an intrinsic necessity.
Then, since the qualitative mathematical features of the classical probabilistic calculus are entirely determined by its axioms, it follows that the new calculus must be the expression of different axioms, and in particular that some axiom of the classical probability calculus must be false in the new one.
In other words, the Kolmogorov probabilistic model should have its parallel axiom, like the Euclidean geometric model.
Once the analogy between Euclidean and non-Euclidean geometries and Kolmogorov and non-Kolmogorov models is accepted, one can go further and look for statistical invariants.
By analogy with geometrical invariants, these would provide a rigorous mathematical distinction between the different models, rather than an empirical one based on the fact that it works.
Moreover, one has to single out the axioms which lead to the new probabilistic model or, to say it again with Hilbert’s words, “to formulate the physical requirements so completely that the mathematical formalism becomes uniquely determined by them . . .” [19].

The present paper will report on some results obtained in the last years which have led to a clarification of the relations among the axioms which lie at the foundation of the classical and of the quantum probabilistic models and which allow one to construct new classes of probabilistic models, the mathematical structure of some of which has revealed some unsuspected connections between probability and geometry (cf. [2]). Finally, the inner development of the quantum probabilistic models has given rise to a whole crop of new results concerning quantum Markov chains, quantum central limit theorems, the subtle technical and conceptual problems connected with the notion of conditioning in quantum theory, the quantum Feynman-Kac formula, quantum infinitely divisible processes, and the newly developed quantum stochastic calculus. The whole body of these results constitutes a new and rapidly developing branch of probability theory: QUANTUM PROBABILITY.
The above listed results shall not be discussed in the present paper. The interested reader is referred to the ten volumes of the QP-series, which present a fairly complete description of the state of the art of quantum probability [1], . . ., [8].
The main goal of the present paper will be a detailed analysis of the reasons which prove the necessity of enlarging the horizons of classical probability theory by taking into account the stimuli which arose from quantum physics, but whose perspectives go far beyond it.

The author is grateful to G. Watson for several discussions which led to formal and substantial improvements of the present paper.

2 The axioms of probability theory

For the discussion which follows it will be of utmost importance to distinguish:
– the probabilistic (or physical) notions, such as events, probabilities, conditional probabilities, observables, states, . . .
– from their representatives in the mathematical model.
In fact, following the example of geometry, our strategy to distinguish mathematically between different probabilistic models will be:
– first to show that the possibility of describing a certain family of empirically given and model independent data (e.g. angles in geometry; transition probabilities or correlations in probability) within a given probabilistic model imposes some constraints on these data;
– then to produce examples of such experimentally given statistical data which do not fulfill the constraints of the classical probabilistic model.
We assume as primitive the notion of event and denote events by capital letters A, B, C, . . .
The probability of an event A given that an event C is known to happen is denoted P(A | C).
Even if not necessary for the analysis which follows, it will be convenient to keep in mind the physical interpretation and to think of the probability P(A | C) as the approximate relative frequency of the event A in an ensemble of systems prepared so that the event C is certainly verified for each of them.
The physical counterpart of the probabilistic notion of conditioning is that of preparation.

The axioms underlying any known probabilistic model can be subdivided into five groups:


(I) Normalization axioms.

(II) Structure axioms.

(III) The finite additivity axiom.

(IV) Continuity axioms.

(V) Conditioning axioms.

In the debate on the foundations of classical probability, the continuity axioms have been questioned, while in the debate on the foundations of quantum theory each of the first three groups of axioms has been questioned by different authors.
The fifth group of axioms does not seem to have been explicitly mentioned in the literature.
It will constitute the main topic of our analysis.

The normalization axioms are common to all known probabilistic models:

(I.1) For any pair of events A, C, one has:

0 ≤ P (A | C) ≤ 1

(I.2) For any event C, one has:

P (C | C) = 1

The structure axioms describe the structure of the family of events in whose probabilities we are interested.
Here the difference between the classical and the non-classical (quantum) context begins to appear.
In fact the structure axiom common to all classical probabilistic models is:

(II.C) The family of events, whose probabilities are considered, is closed under the action of the logical connectives (conjunction, disjunction, negation).


Here, when referring the logical connectives to events, we are implicitly identifying an event A with the proposition “the event A happens”.
By Stone’s representation theorem, Postulate (II.C) uniquely determines the mathematical model of the family of events as a Boolean algebra of subsets of a given set and, in the correspondence between propositions and events, conjunction, disjunction and negation become respectively intersection, union and complement. In the following we will use the logical and the set-theoretical notations interchangeably.

Kolmogorov used a stronger version of (II.C), namely:

(II.C1) (II.C) holds and the family of events is closed under countable unions.

The structure axioms of classical probability cannot be applied to quantum probability. In fact the Heisenberg principle forced us to admit that, in many physically meaningful situations, the family of events satisfies the following negative postulate:

(II.Q) There exist pairs of events whose conjunction (or disjunction) does not represent an event whose physical realizability can be checked experimentally.

In other terms: there are in nature pairs of events A, B such that the probabilities P(A ∩ B), P(A ∪ B) cannot be given an experimental meaning.
Such pairs of events will be called incompatible; otherwise we say they are compatible.
The existence of incompatible events does not imply, per se, the impossibility of using the classical probabilistic model. In fact, following a commonly used procedure, one might introduce, in the mathematical formalism, some expressions (such as P(A ∩ B) for incompatible A, B) which have no experimental meaning but which are used only in intermediate steps, the final result being expressed only in terms of experimentally measurable quantities.
However we will see that the mere existence of these joint probabilities imposes some constraints on the physically measurable statistical data: the theory of statistical invariants is based on the calculation of these constraints (see section (6) below).


There have been a large number of attempts to turn the negative postulate (II.Q) into a positive statement which either fixes uniquely the model of the family of events (as axiom (II.C) does for the classical model), or at least isolates a class of reasonable models for this family.
From this line of research the theory of Quantum Logics arose, which is a structure theory for a wide class of non-distributive lattices and which, through Gleason’s theorem [21], is connected to quantum probability, roughly as classical measure theory is connected to integration theory.

The postulate of finite additivity is also common to all known probabilistic models. Its formulation is as follows:

(III). If two events A,B are mutually exclusive then

P (A ∪B) = P (A) + P (B)

Two events A, B are called mutually exclusive if they are compatible and if the fact that A is verified implies that B is not, and conversely (A ∩ B = ∅).

The prototype of a continuity axiom is the axiom of countable additivity.
We will not insist on this axiom here, since our goal is to show that the occurrence of non-Kolmogorovian models can be proved even in the most elementary probabilistic setting: when only a finite number of events is involved.

By a conditioning axiom we mean a rule which provides an answer to questions of the following type:
assume that, knowing that a certain event C is realized, we evaluate the probability of the event A by the number P(A | C). How shall the probability of A be evaluated if our information is changed by the knowledge of the fact that also the event B has happened?
The answer given by classical probability to this question is provided by the Bayes formula, which we formulate as an axiom:

(V.1) If A, B, C are three events such that A is compatible with B and B with C, and if P(B | C) ≠ 0, then

$$P(A \mid B; C) = \frac{P(A \cap B \mid C)}{P(B \mid C)} \tag{1}$$

In the probabilistic model this formula is usually taken as the definition of conditional probability, meaning by this that the right hand side defines the left hand side. There are however cases in which the right and the left hand side of Bayes’ formula can be measured independently, i.e. by different physical experiments.
For example we can imagine an apparatus which produces a statistical ensemble of particles for which the event C is surely verified and then measures the relative frequencies of the events B and A ∩ B in this ensemble.
But we can also imagine another apparatus which prepares a statistical ensemble of particles for which both the events C and B are surely verified, and which then measures the relative frequencies of the event A.
The former apparatus corresponds to the right hand side of Bayes’ formula, the latter to the left hand side.
Since the two experimental situations are different, the equality of the two sides in Bayes’ formula in this case is certainly not a question of definitions, but of experiments.

3 Conditioning axioms

The crucial, experimentally measurable, difference between the classical and the quantum probabilistic model appears in the conditioning axioms.
In view of this we shall examine in detail the conditioning axioms of the classical probabilistic models. In these models the conditional probability is defined through Bayes’ formula (1), which expresses the probability of the event A under the condition C and a subsequent condition B as a function only of the probability P(· | C).

We now list a set of axioms which turns out to be equivalent to Bayes’ formula and which shows that this formula hides some far from obvious postulates.
In fact we shall see, in the following Section, that the application of this formula (or of some simple consequence of it) to some statistical data arising in quantum mechanical experiments leads to results which are in contradiction with the experiments.
The first of these postulates simply asserts that, once we know that the event B has happened, its probability becomes equal to one.
This is a tautology only if the time interval between the two instants in which the event B is considered can be regarded as irrelevant to the occurrence of B (in physical language one could say that no interaction, relevant for the occurrence of B, has occurred).
In quantum physics this is the case in the so-called first kind measurements.

CA1) For every preparation C there exists a family F_C of events, not containing ∅, called the family of conditions admissible after C, such that, for every B in F_C, there exists a probability measure P(· | B ; C) on the family F of all the events, satisfying

P (B | B ; C) = 1 (2)

Remark (1). In classical probability theory the conditions admissible after a preparation C are those which satisfy condition (4) below.

CA2) In the notations of Condition (CA1), for every preparation C, for every B in F_C, and for every A in F such that A ⊆ B

P (A | C) = 0 =⇒ P (A | B ; C) = 0 (3)

Remark (1). Notice that Postulates (CA2) and (CA1) imply that if B ∈ F_C then necessarily

P (B | C) > 0 (4)

This implies that our axiomatization can only consider conditioning events of nonzero probability. This assumption is usually made in all the standard textbooks of probability theory. However it should be noted that, in the Bayesian framework, a theory of (classical) conditioning has been developed even under the assumption that the conditioning events have zero probability.

Remark (2). The axiom (CA2) states that if an event A has probability zero under a set of conditions C and if we know that the event C has happened, then even if we know that some consequence B of A has happened after C we should still attribute zero probability to A.

This prescription is reasonable only under the assumption that the acquisition of information on the event B has not destroyed any of the premises of the event C.
In several situations this assumption is not warranted and in these cases there is no reason to consider the axiom (CA2) as a truth of nature.
For example von Neumann’s postulate, used in the quantum theory of measurement (cf. Section (10) below), is a conditioning postulate which does not satisfy the axiom (CA2) above.
A simple example showing this is the following: let ℂ² be considered as a Hilbert space with the usual scalar product denoted ⟨ · , · ⟩, let χ be a unit vector in ℂ², let F be the partial Boolean algebra of all the orthogonal projectors on ℂ² and finally let P(A | C) be the probability measure on F defined by

P(A | C) = ⟨χ, Aχ⟩ ; A ∈ F (5)

Now let X be a Hermitian matrix on ℂ² with eigenvalues x1 ≠ x2, let ψ1, ψ2 be the orthonormal basis of ℂ² consisting of the eigenvectors of X and let P1, P2 denote the rank one projectors corresponding respectively to ψ1 and ψ2. Denoting still by X the observable corresponding to the matrix X, the projection Pj corresponds to the event “the value assumed by X is xj” (j = 1, 2). Hence P1 ∨ P2 = 1 corresponds to the trivial event “the observable X assumes some of its values”, and it is clear that for any other event (i.e. orthogonal projection) A one has

A ≤ P1 ∨ P2 = 1 ; ∀A ∈ Proj(ℂ²) (6)

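In particular, if we choose as A the rank one projector corresponding to a vector ϕ orthogonal to χ then, according to (5), we have

P(A | C) = ⟨χ, Aχ⟩ = 0 (7)

and therefore, according to (CA2) and (6), one should have

P(A | P1 ∨ P2; C) = 0

(note that this is a model independent statement). But, according to von Neumann’s prescription:

$$
\begin{aligned}
P(A \mid P_1 \vee P_2; C) &= \langle \psi_1, A\psi_1 \rangle \cdot |\langle \psi_1, \chi \rangle|^2 + \langle \psi_2, A\psi_2 \rangle \cdot |\langle \psi_2, \chi \rangle|^2 \\
&= |\langle \psi_1, \varphi \rangle|^2 \cdot |\langle \psi_1, \chi \rangle|^2 + |\langle \psi_2, \varphi \rangle|^2 \cdot |\langle \psi_2, \chi \rangle|^2 \neq 0
\end{aligned} \tag{8}
$$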
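The failure of (CA2) under von Neumann's prescription can be checked numerically. The following is a minimal sketch, assuming one particular choice of the vectors χ, ϕ, ψ1, ψ2 (the specific vectors and the helper names `projector`, `prob` and `von_neumann_conditional` are illustrative assumptions, not taken from the text); it evaluates formula (5) and the right hand side of (8).

```python
import numpy as np

def projector(v):
    """Rank-one orthogonal projector onto the line spanned by v."""
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

def prob(state, event):
    """Formula (5): P(A | C) = <chi, A chi> for a unit vector chi."""
    return float(np.real(np.vdot(state, event @ state)))

def von_neumann_conditional(state, event, eigenbasis):
    """Right hand side of (8): sum_j <psi_j, A psi_j> |<psi_j, chi>|^2."""
    return float(sum(prob(psi, event) * abs(np.vdot(psi, state)) ** 2
                     for psi in eigenbasis))

# Illustrative vectors (an assumption made for this sketch).
chi = np.array([1.0, 0.0])                 # preparation C
phi = np.array([0.0, 1.0])                 # orthogonal to chi
psi1 = np.array([1.0, 1.0]) / np.sqrt(2)   # eigenvectors of the observable X
psi2 = np.array([1.0, -1.0]) / np.sqrt(2)

A = projector(phi)
p_A_given_C = prob(chi, A)                                      # vanishes, as in (7)
p_A_given_B_C = von_neumann_conditional(chi, A, [psi1, psi2])   # close to 1/2

print(p_A_given_C, p_A_given_B_C)
```

With this choice the prior probability of A vanishes while the von Neumann conditional probability is about 1/2, so the implication (3) does not hold.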

CA3) For A,B,C as in postulate (CA2)

P (A | B;C) ≥ P (A | C) (9)

Remark (3). In the logical interpretation of Boolean algebras, in which events are identified with properties, the inclusion relation A ⊆ B is equivalent to the statement that property A implies property B, i.e. that A is a premise for B. With this interpretation, postulate (CA3) means that:

if an event is known to have happened, then the probability of any of its premises increases.

This axiom, too, is not satisfied by von Neumann’s conditioning formula. In fact, if in the construction of Remark (2) we choose χ, ϕ, ψ1, ψ2 so that

|⟨χ, ψj⟩|² = 1/2 (j = 1, 2) ; |⟨ϕ, χ⟩|² > 1/2

then (9) is violated, since in this case the right hand side of (8) is equal to 1/2 independently of ϕ, while P(A | C) = |⟨ϕ, χ⟩|² > 1/2.
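A quick numerical illustration of this violation of (CA3) is sketched below. The concrete vectors (χ along the first axis, ψ1, ψ2 rotated by 45 degrees, ϕ at an angle of π/6 from χ) are assumptions chosen only to satisfy the two conditions stated above; `overlap_sq` is a hypothetical helper.

```python
import numpy as np

def overlap_sq(u, v):
    """|<u, v>|^2 for two vectors of C^2."""
    return abs(np.vdot(u, v)) ** 2

# Assumed vectors: |<chi, psi_j>|^2 = 1/2 and |<phi, chi>|^2 = 3/4 > 1/2.
chi = np.array([1.0, 0.0])
psi = [np.array([1.0, 1.0]) / np.sqrt(2),
       np.array([1.0, -1.0]) / np.sqrt(2)]
theta = np.pi / 6
phi = np.array([np.cos(theta), np.sin(theta)])

# Prior probability P(A | C) of the projector A onto phi.
p_A_given_C = overlap_sq(phi, chi)

# Right hand side of (8), with B = P1 v P2.
p_A_given_B_C = sum(overlap_sq(p, phi) * overlap_sq(p, chi) for p in psi)

# (CA3) would require p_A_given_B_C >= p_A_given_C.
ca3_holds = bool(p_A_given_B_C >= p_A_given_C)
print(p_A_given_C, p_A_given_B_C, ca3_holds)
```

Here the conditional probability stays at 1/2 whatever the angle `theta`, whereas the prior probability is 3/4, so inequality (9) fails.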

CA4) For A, B, C as in postulate (CA2) the map

$$A \in F \longmapsto g_A = \frac{P(A \mid B; C)}{P(A \mid C)} \tag{10}$$

achieves its maximum value for A = B.

Remark (4). The quantity (10), always ≥ 1 because of (9), represents the gain in probability of the event A due to the additional information that the event B happened. Thus postulate (CA4) means that:

the maximum gain in probability of the event A due to the additional information that the event B happened is achieved by those events which, under the new information, become certain.

It is however quite conceivable, even independently of quantum theory, that a situation exists in which one starts from two events, one of high and one of low probability, and, after acquisition of information, the highly probable event becomes certain while the event of low probability, although not becoming certain, gains more in probability than the other one.

Remark (5). Using the construction in Remark (2) above, it is easy to construct examples in which g_B is finite but, for appropriate A, g_A can be made arbitrarily large.
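One possible way to realize Remark (5) is sketched here, under the assumption that B = P1 ∨ P2 = 1 (so that g_B = 1) and that A is the projector onto a vector ϕ making a small angle ε with the direction orthogonal to χ. The parametrization by `epsilon` and the function name `gain` are assumptions of this sketch.

```python
import numpy as np

def gain(epsilon):
    """g_A from (10) for A = projector onto phi = (sin eps, cos eps).

    The numerator is the von Neumann conditional probability (8);
    the denominator is the prior probability (5).
    """
    chi = np.array([1.0, 0.0])
    psi = [np.array([1.0, 1.0]) / np.sqrt(2),
           np.array([1.0, -1.0]) / np.sqrt(2)]
    phi = np.array([np.sin(epsilon), np.cos(epsilon)])
    prior = abs(np.vdot(phi, chi)) ** 2
    posterior = sum(abs(np.vdot(p, phi)) ** 2 * abs(np.vdot(p, chi)) ** 2
                    for p in psi)
    return posterior / prior

# The gain grows without bound as phi approaches the orthogonal of chi.
gains = [gain(eps) for eps in (0.5, 0.1, 0.01)]
print(gains)
```

In this sketch the numerator stays equal to 1/2 while the denominator sin²ε tends to zero, so g_A exceeds g_B = 1 by an arbitrarily large factor.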

Theorem 1 In the above notations, assume that postulate (CA1) holds. Then the following statements are equivalent:

i) For any preparation C the map

$$(A, B) \in F \times F_C \longmapsto P(A \mid B; C) \tag{11}$$

satisfies the axioms (CA2), (CA3), (CA4).

ii) The conditional probability P(A | B; C) is expressed through the usual Bayes’ formula i.e., for every preparation C, for every B in F_C, and for every A in F such that A and B are compatible, one has

$$P(A \mid B; C) = \frac{P(A \wedge B \mid C)}{P(B \mid C)} \tag{12}$$

iii) The finite form of the theorem of composite (or total) probabilities holds i.e., for each family

$$A, B_1, \ldots, B_n \ ; \quad (n \in \mathbb{N})$$

of compatible events in F such that

$$B_i \wedge B_j = \emptyset \ ; \ i \neq j \ ; \qquad \bigvee_{j=1}^{n} B_j \supseteq A \tag{13}$$

one has

$$P(A \mid C) = \sum_{j=1}^{n} P(B_j \mid C)\, P(A \mid B_j; C) \tag{14}$$

Proof. Let us first show that (i) ⟹ (ii). Let A ∈ F and assume first that A ⊆ B, i.e. A is compatible with B and A ∧ B = A. If P(A | C) = 0, then (12) holds because of (4). If P(B \ A | C) = 0 then (12) holds with B \ A replacing A. Therefore it also holds for A, since both sides of (12) are probability measures. Therefore we can assume that

P(A | C) > 0 ; P(B \ A | C) > 0 (15)

Now, denoting for A ⊆ B

$$g_A := \frac{P(A \mid B; C)}{P(A \mid C)} \tag{16}$$

one has

$$
\begin{aligned}
1 = P(B \mid B; C) &= P(A \mid B; C) + P(B \setminus A \mid B; C) \\
&= g_A P(A \mid C) + g_{B \setminus A} P(B \setminus A \mid C) \\
&= g_B P(B \mid C) = g_B P(A \mid C) + g_B P(B \setminus A \mid C)
\end{aligned}
$$

therefore

$$(g_A - g_B)\, P(A \mid C) = (g_B - g_{B \setminus A})\, P(B \setminus A \mid C) \tag{17}$$

but, since A, B \ A ⊆ B, (CA4) implies that the left hand side of (17) is non-positive and the right hand side is non-negative. Hence they must both be zero and, because of (15), this implies that

$$g_A = g_B = \frac{1}{P(B \mid C)}$$

Thus for any A ⊆ B,

$$P(A \mid B; C) = g_A P(A \mid C) = \frac{P(A \mid C)}{P(B \mid C)}$$

If A is any event compatible with B, we can use the identity A = (A ∧ B) ∨ (A \ B) and (CA1) to deduce that

P(A | B;C) = P(A ∧ B | B;C)

whence (12) follows from the first part of the proof.

The implication (ii) ⟹ (iii) is trivial. The implication (iii) ⟹ (i) follows by choosing in (14) n = 2, B1 = B, B2 = B^c, the set theoretical complement of B. In fact with this choice (14) becomes

P(A | C) = P(B | C) · P(A | B;C)

for every A ⊆ B, and from this (CA2), (CA3), (CA4) are immediately verified.
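The classical side of Theorem 1 can be illustrated on a small finite model. The sketch below assumes a six-point sample space with hypothetical weights playing the role of P(· | C), defines the conditional probability by Bayes' formula (12), and checks (CA2), (CA3), (CA4) and the composite probability formula (14) by enumeration; all names and numbers are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical weights for P(. | C) on a six-point sample space (sum = 1).
weights = {0: Fraction(1, 12), 1: Fraction(1, 6), 2: Fraction(1, 4),
           3: Fraction(1, 4), 4: Fraction(1, 4), 5: Fraction(0)}

def p(event):
    """P(event | C) for an event given as a set of sample points."""
    return sum((weights[w] for w in event), Fraction(0))

def bayes(a, b):
    """Bayes' formula (12): P(A | B; C) = P(A and B | C) / P(B | C)."""
    return p(a & b) / p(b)

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

B = frozenset({1, 2, 5})

# (CA2): events A inside B with zero prior keep zero conditional probability.
ca2 = all(bayes(A, B) == 0 for A in subsets(B) if p(A) == 0)
# (CA3): conditioning on B does not decrease the probability of A inside B.
ca3 = all(bayes(A, B) >= p(A) for A in subsets(B))
# (CA4): the gain (10) is maximal at A = B.
gain_B = bayes(B, B) / p(B)
ca4 = all(bayes(A, B) / p(A) <= gain_B for A in subsets(B) if p(A) > 0)

# Composite probabilities (14) for a partition B_1, B_2, B_3.
partition = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]
A = frozenset({1, 2, 4})
total = sum(p(Bj) * bayes(A, Bj) for Bj in partition)

print(ca2, ca3, ca4, total == p(A))
```

Exact rational arithmetic is used so that the equalities are verified without rounding; in this model every gain g_A with A ⊆ B equals 1/P(B | C), in agreement with the proof above.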

4 Experiments to test the validity of Bayes’s formula

In order to relate the two probabilities P(A | C) and P(A | B;C), it is important to take into account how the information on B has been acquired; in particular to distinguish the case when the acquisition of new information does not alter the previous information from the case when it does.
When speaking of the probability P(A | B;C) in a classical context, it is always implicitly assumed that the new information is acquired without destroying the old.
But quantum physicists have taught us that there are cases in which this cannot in principle be achieved.
An idealized example of this situation might be the following.

A biologist wants to study the correlation between the field of view of a given population of animals and the chemical and electrical activity of the cells of their eyes.
Assume he has succeeded in preparing a sample population in such a way that the chemical activity of the eyes is known (event C). It might well be that, in order to acquire some information on the electrical activities of the cells (event B), he will alter their chemical activity in an a priori uncontrollable way.
Then the information obtained by performing the two experiments in series is not B ∩ C but only B.
We express this statement with the equality:

P (A | B;C) = P (A | B)

The fact that in some cases the process of acquisition of information is not cumulative has an interesting consequence.
Consider two mutually exclusive events B and B^c which exhaust all possibilities, i.e. such that in any measurement done to ascertain which of the two events B and B^c happens, one finds that one and only one of them is realized. In such a situation, according to classical probability, the event B ∪ B^c is a trivial one, i.e. it happens with probability 1 and therefore the knowledge that it happened does not alter our information.

However, keeping in mind that the acquisition of information on B and B^c might have altered some previous information we had on the system, we are induced to distinguish the case (denoted B ∪ B^c) in which no experiment has been done to discriminate between the two alternatives from the case (denoted B ∨ B^c) in which an experiment to discriminate between the two alternatives has been done, but we do not know the result of it.
Clearly, in the case when no experiment has been done one has:

P(A | (B ∪ B^c) ∩ C) = P(A | C) (18)


because in this case we are merely stating a tautological fact, and neither the physical situation nor our information has changed.
In the latter case however, even if we do not know the result of the experiment, at least we know that it was done, namely that at a certain moment the particles of our experimentally prepared ensemble interacted with an apparatus (possibly being disturbed by this interaction).
We can thus say that the act of measuring B and B^c has split the original ensemble into two: one for which B ∩ C is verified, the other one for which B^c ∩ C is verified. Since there are no other possibilities, it follows that, in any experiment done to verify the occurrences of A after the experiment on the alternatives B, B^c, one must have:

$$
\begin{aligned}
N(A \mid (B \vee B^c) \cap C) &= N(A \mid B; C) + N(A \mid B^c \cap C) \\
&= \frac{N(A \mid B; C)}{N(B \mid C)} \cdot N(B \mid C) + \frac{N(A \mid B^c \cap C)}{N(B^c \mid C)} \cdot N(B^c \mid C)
\end{aligned} \tag{19}
$$

where N(A | (B ∨ B^c) ∩ C) is the number of occurrences of A in the original ensemble, N(A | B;C) is the number of occurrences of A among the particles for which B ∩ C was known to happen, and N(B | C), N(B^c | C), N(A | B^c ∩ C) are defined in a similar way. Thus, identifying probabilities and approximate relative frequencies (which in this case is allowed) one finds:

P(A | (B ∨ B^c) ∩ C) = P(A | B;C) · P(B | C) + P(A | B^c ∩ C) · P(B^c | C) (20)

Equation (20) is not a postulate. It follows only from the identification of probabilities with relative frequencies and from the assumption that all particles of the initial ensemble can be split into two mutually exclusive classes: one for which the event B is verified, another for which the event B^c is verified.
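The counting argument behind (19) and (20) can be made concrete with a toy ensemble. The record below is entirely hypothetical (the counts are assumptions chosen for illustration); each entry states which alternative the discriminating experiment found and whether A was observed afterwards.

```python
from fractions import Fraction

# Hypothetical outcome record after the experiment discriminating B from B^c:
# (alternative found, whether A was subsequently observed).
ensemble = ([("B", True)] * 12 + [("B", False)] * 28
            + [("Bc", True)] * 15 + [("Bc", False)] * 45)

n_total = len(ensemble)
n_B = sum(1 for alt, _ in ensemble if alt == "B")
n_Bc = sum(1 for alt, _ in ensemble if alt == "Bc")
n_A_B = sum(1 for alt, a in ensemble if alt == "B" and a)
n_A_Bc = sum(1 for alt, a in ensemble if alt == "Bc" and a)
n_A = sum(1 for _, a in ensemble if a)

# (19): the occurrences of A split over the two sub-ensembles.
counts_add_up = (n_A == n_A_B + n_A_Bc)

# (20): the same statement for relative frequencies.
lhs = Fraction(n_A, n_total)
rhs = (Fraction(n_A_B, n_B) * Fraction(n_B, n_total)
       + Fraction(n_A_Bc, n_Bc) * Fraction(n_Bc, n_total))

print(counts_add_up, lhs, rhs)
```

The identity holds for any such record, which is the sense in which (20) is a bookkeeping fact about a split ensemble rather than an additional postulate.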

The crucial fact about the identity (20) is that it involves only experimentally measurable quantities.
This gives us the possibility to devise a simple experiment to test the range of applicability of the classical probabilistic model.

In fact, assume that the conditional probabilities P(A | B;C), P(A | B^c ∩ C) can be described according to the prescription of Bayes’ formula, i.e.:

$$P(A \mid B; C) = \frac{P(A \cap B \mid C)}{P(B \mid C)} \ ; \qquad P(A \mid B^c \cap C) = \frac{P(A \cap B^c \mid C)}{P(B^c \mid C)} \tag{21}$$


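In this case, inserting (21) into (20), we should find:

$$
\begin{aligned}
P(A \mid (B \vee B^c) \cap C) &= \frac{P(A \cap B \mid C)}{P(B \mid C)} \cdot P(B \mid C) + \frac{P(A \cap B^c \mid C)}{P(B^c \mid C)} \cdot P(B^c \mid C) \\
&= P(A \mid C)
\end{aligned} \tag{22}
$$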

(Notice that, for the validity of (22), it is not required that the probabilities P(A ∩ B | C) and P(A ∩ B^c | C) be experimentally measurable quantities: it is sufficient that two such numbers exist, that they belong to the interval [0, 1], and that the axiom of finite additivity is satisfied.)

It follows that, if we can produce experimentally realizable examples of events A, B, B^c, C for which:

P(A | (B ∨ B^c) ∩ C) ≠ P(A | C) (23)

then the conditional probabilities P(A | B;C) and P(B | C) cannot be related by the Bayes formula (21) for whatever choice of the probabilities P(A ∩ B | C), P(A ∩ B^c | C).
This would therefore imply that the set of statistical data

P(A | C) , P(A | B;C) , P(A | B^c ∩ C) , P(B | C) , P(B^c | C)

cannot be described by any classical probabilistic (or Kolmogorov) model.
In the next section we describe an experiment which has actually been performed (many times) and which exhibits the violation of (22).
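The test described in this section amounts to comparing a measured P(A | C) with the composite expression built from the other four measured numbers. A minimal sketch follows; the function names, the tolerance `tol` and the two data sets are assumptions made for illustration and are not experimental values.

```python
def composite_rhs(p_b, p_bc, p_a_given_b, p_a_given_bc):
    """Right hand side of (20): weighted sum over the two alternatives."""
    return p_b * p_a_given_b + p_bc * p_a_given_bc

def consistent_with_composite_probabilities(p_a, p_a_given_b, p_a_given_bc,
                                            p_b, p_bc, tol=1e-9):
    """True if the five data satisfy normalization and the identity (22)."""
    normalized = abs(p_b + p_bc - 1.0) <= tol
    rhs = composite_rhs(p_b, p_bc, p_a_given_b, p_a_given_bc)
    return normalized and abs(p_a - rhs) <= tol

# Hypothetical data compatible with a Kolmogorov model.
classical_like = consistent_with_composite_probabilities(
    0.27, 0.30, 0.25, 0.4, 0.6)

# Hypothetical data with a strongly suppressed P(A | C), as happens at an
# interference minimum; the identity (22) then fails.
interference_like = consistent_with_composite_probabilities(
    0.05, 0.30, 0.25, 0.4, 0.6)

print(classical_like, interference_like)
```

With real data the tolerance would have to reflect the statistical error of the measured relative frequencies; the fixed value used here is only a placeholder.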

5 The two–slit experiment

This famous experiment is discussed by Feynman in his talk at the 2nd Berkeley Symposium on Probability and Statistics [13] and, according to Feynman [15], it is formulated so as “to include all the mysteries of quantum mechanics . . .”.
Several analyses of this experiment exist in the literature, but the only one which is based on an approach similar to the present one is due to Fine [16].
The reason why Fine arrives at a conclusion different from ours (in some sense opposite) is that he considers as unknowns, in the system (27) below, the conditional rather than the joint probabilities.
The present approach seems to be better suited to comparison with experiments, since the conditional probabilities can be compared with experimental data.

A source S emits identically prepared particles (nowadays this experiment is done using neutrons) towards a screen Σ1 with two slits in it, denoted 1 and 2.
The particles which pass the screen Σ1 are collected on another screen Σ2, parallel to Σ1, and one measures the relative frequency of the particles which hit a small region X of the screen Σ2.
This measurement is done in three different physical situations:
(i) with both slits 1 and 2 open;
(ii) only slit 1 open;
(iii) only slit 2 open.
Denoting by C the event corresponding to the common preparation of the electrons emitted by the source S (say, a fixed value of the energy); by A the event that an electron falls in the region X of the screen Σ2; by B the event that the electron passes through slit 1; and by B^c the event that the electron passes through slit 2, the following probabilities can be experimentally measured (by approximating them with relative frequencies):

P(A | C) , P(A | B;C) , P(A | B^c ∩ C) , P(A | (B ∨ B^c) ∩ C) , P(B | C) , P(B^c | C)

(where B ∨ B^c is defined in Section 4).
Applying implicitly Bayes’ formula, Feynman concludes that one should have:

P(A | C) = P(B | C) · P(A | B;C) + P(B^c | C) · P(A | B^c ∩ C) (24)

But the experiments give that, for any choice of P(B | C) and P(B^c | C), one has:

P(A | C) ≠ P(B | C) · P(A | B;C) + P(B^c | C) · P(A | B^c ∩ C) (25)

Moreover, in agreement with the theoretical analysis of Section 4, the experiments give:

P(A | (B ∨ B^c) ∩ C) = P(B | C) · P(A | B;C) + P(B^c | C) · P(A | B^c ∩ C) (26)

That is: if an experiment is done to discriminate through which of the two slits the particles pass, then the theorem of composite probability holds.
According to the analysis of Section 4, the inequality (25) only means that the statistical data

P(A | C) , P(A | B;C) , P(A | B^c ∩ C)


cannot be described by a Kolmogorov model.
This however is not Feynman’s conclusion. Feynman claims that the identity (24) must be true as soon as we accept that, even when nobody is looking at them, the particles which reach the screen Σ2 have passed either through slit 1 or through slit 2, but not both. Explicitly he says ([15], pg. 538):
“. . . we concluded on logical bases that, since (25) is not true [Feynman’s numbering of the formulas, as well as his notations, are different from ours], it is not true that the electron passes either through hole 1 or through hole 2 . . .”

Our analysis in Section 4 shows that this conclusion is not justified on probabilistic grounds.
Let us now show that this conclusion is not justified on mathematical grounds.
Feynman’s implicit assumption in claiming that if B and B^c are disjoint events then (24) must hold is the validity of Bayes’ formula; i.e. the existence of four positive numbers (not necessarily experimentally measurable):

P(B | C) , P(B^c | C) , P(A ∩ B | C) , P(A ∩ B^c | C)

such that:P (B | C) + P (Bc | C) = 1 (27)

P (A ∩B | C) + P (A ∩Bc | C) = P (A | C)

P (A ∩B | C)

P (B | C)= P (A | B;C)

P (A ∩Bc | C)

P (Bc | C)= P (A | Bc ∩ C)

Thus Feynman's implicit assumption is equivalent to postulating that the above system of four equations in four unknowns always admits a solution or equivalently, with a simple computation, that:

$$0 < \frac{P(A \mid C) - P(A \mid B^c \cap C)}{P(A \mid B;C) - P(A \mid B^c \cap C)} < 1 \tag{28}$$
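The "simple computation" can be sketched as follows: substituting P (Bc | C) = 1 − P (B | C) into (24) leaves a single linear equation for x = P (B | C), whose solution is the ratio appearing in (28). The function names and the sample frequencies below are illustrative assumptions, not data from any experiment.

```python
def implied_weight(p_a, p_a_b, p_a_bc):
    """Value x = P(B | C) forced by (24) together with (27).

    Solves x * p_a_b + (1 - x) * p_a_bc = p_a for x. Returns None when the
    two single-slit probabilities coincide, since the ratio in (28) is then
    undefined.
    """
    if p_a_b == p_a_bc:
        return None
    return (p_a - p_a_bc) / (p_a_b - p_a_bc)


def satisfies_28(p_a, p_a_b, p_a_bc):
    """True when the ratio in (28) lies strictly between 0 and 1."""
    x = implied_weight(p_a, p_a_b, p_a_bc)
    return x is not None and 0 < x < 1


# Hypothetical relative frequencies, chosen only for illustration.
# Both-slits value between the two single-slit values: (28) can hold.
print(satisfies_28(0.15, 0.10, 0.20))
# Both-slits value above both single-slit values: (28) fails.
print(satisfies_28(0.30, 0.10, 0.20))
```

In the second call the both-slits frequency exceeds each single-slit frequency, so no weight P (B | C) in (0, 1) can reproduce it as a convex combination.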

But why should it be evident . . . on logical grounds . . . that if the events B and Bc are mutually exclusive then the probabilities

P (A | C) , P (A | B;C) , P (A | Bc ∩ C)

which have been measured in three mutually incompatible physical situations, should satisfy the inequalities (28)?

Therefore the first answer provided by quantum probability to Feynman's analysis is: the inequality (25) means that the three probabilities P (A | C), P (A | B;C), P (A | Bc ∩ C) do not admit a Kolmogorov model, and no mysterious properties of the particles involved are needed to justify it (cf. the discussion in section (10) below).

The next mystery of quantum mechanics is the correct substitute for the theorem of composite probabilities (4.1). The rules, found by the physicists, say that you have to deal with complex numbers ψ(A | C), ψ(B | C), . . . (called conditional probability amplitudes) related to the probabilities by:

P (X | Y ) = | ψ(X | Y ) |² ; X, Y events (29)

and these amplitudes satisfy the analogue of the relation (24), i.e.

ψ(A | C) = ψ(B | C) · ψ(A | B;C) + ψ(Bc | C) · ψ(A | Bc ∩ C) (30)

This relation is called the theorem of composite amplitudes and it is an empirical fact that, in the two slit experiment, it leads to results in agreement with the experiment.
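A small numerical sketch may help to see how (29) and (30) depart from (24). The amplitudes below are assumed values (symmetric slits, equal moduli, a relative phase delta between the two paths) and are not taken from any measurement.

```python
import cmath
import math


def two_slit(psi_b, psi_bc, psi_a_b, psi_a_bc):
    """Return (quantum, classical) values for P(A | C).

    quantum   : |psi(A | C)|^2, with psi(A | C) composed as in (30)
    classical : right hand side of (24), built from the squared moduli (29)
    """
    psi_a = psi_b * psi_a_b + psi_bc * psi_a_bc
    quantum = abs(psi_a) ** 2
    classical = (abs(psi_b) ** 2 * abs(psi_a_b) ** 2
                 + abs(psi_bc) ** 2 * abs(psi_a_bc) ** 2)
    return quantum, classical


def example(delta):
    """Symmetric slits; delta is an assumed relative phase between the paths."""
    s = 1 / math.sqrt(2)
    return two_slit(s, s, 0.5, 0.5 * cmath.exp(1j * delta))


for delta in (0.0, math.pi / 2, math.pi):
    quantum, classical = example(delta)
    print(f"delta={delta:.3f}  quantum={quantum:.3f}  classical={classical:.3f}")
```

With these assumed amplitudes the classical value stays fixed while the quantum value varies with delta; the two agree only when the cross term between the two paths vanishes.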

Can the strange probabilistic calculus based on the above prescriptions be in some sense explained starting from some more fundamental prescription with an immediate probabilistic and physical meaning? Or shall we accept it as a truth of nature, something not reducible to more fundamental requirements, and which can only be used, but not explained? Sections (8) and (9) below are devoted to answering this question.

6 A more recent example: Bell’s inequality

The two slit experiment, discussed in the previous section, shows how, since the early days of quantum theory, one came across some simple sets of statistical data which could not be described by any Kolmogorov model.

According to the quantum probabilistic interpretation, the experiment discussed by Bell is simply another example of the same situation. The difference between Bell's example and the two slit experiment lies only in the fact that, in the latter, the statistical data are transition probabilities, while in the former, the statistical data which do not admit a description within the Kolmogorov model are a set of correlation functions.

The following is a statement of the celebrated Bell's inequality and some equivalent formulations of it, which have been discussed in the literature. The proof is trivial in the ±1-valued case, elementary in the [−1,+1]-valued case. The same is true for the following statement.

Theorem 2 Let A, B, C, D denote four random variables on a probability space (Ω, F, P) with values in {+1, −1}. Then the following, equivalent, inequalities are satisfied:

| E(A · B) − E(B · C) | ≤ 1 − E(A · C) (31)

| E(A · B) + E(B · C) | ≤ 1 + E(A · C) (32)

| E(A · B) − E(B · C) | + | E(A · D) + E(D · C) | ≤ 2 (33)

Notation - For any function X : Ω −→ E, we use the notation

$$\int_\Omega X \, dP = E(X) \tag{34}$$
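The ±1-valued case can be checked by brute force. The sketch below takes the three inequalities in the form |E(AB) − E(BC)| ≤ 1 − E(AC), |E(AB) + E(BC)| ≤ 1 + E(AC) and |E(AB) − E(BC)| + |E(AD) + E(DC)| ≤ 2, and evaluates them on randomly drawn probability distributions over the sixteen sign assignments. All function names are illustrative, and a random search is of course a check, not a proof.

```python
import itertools
import random

# The sixteen sample points: every assignment of +1 or -1 to (A, B, C, D).
POINTS = list(itertools.product((1, -1), repeat=4))


def expectation(weights, f):
    """E(f) for a probability vector `weights` indexed like POINTS."""
    return sum(w * f(*point) for w, point in zip(weights, POINTS))


def bell_slacks(weights):
    """Right hand side minus left hand side for the three inequalities."""
    ab = expectation(weights, lambda a, b, c, d: a * b)
    bc = expectation(weights, lambda a, b, c, d: b * c)
    ac = expectation(weights, lambda a, b, c, d: a * c)
    ad = expectation(weights, lambda a, b, c, d: a * d)
    dc = expectation(weights, lambda a, b, c, d: d * c)
    return (
        (1 - ac) - abs(ab - bc),
        (1 + ac) - abs(ab + bc),
        2 - abs(ab - bc) - abs(ad + dc),
    )


def random_distribution(rng):
    """A random probability vector on POINTS."""
    raw = [rng.random() for _ in POINTS]
    total = sum(raw)
    return [x / total for x in raw]


rng = random.Random(0)  # fixed seed, so the run is reproducible
worst = min(min(bell_slacks(random_distribution(rng))) for _ in range(1000))
print("smallest slack found:", worst)
```

A non-negative smallest slack (up to rounding) means that no sampled Kolmogorov model violates any of the three inequalities.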

The relevance of the above elementary inequalities for the debate on the foundations of quantum mechanics is contained in the following corollary.

Corollary (5.2) There exist triples a, b, c of unit vectors in E3 for which it is not possible to find six random variables

$$S^j_x \quad (x = a, b, c ;\ j = 1, 2)$$

on some probability space (Ω, F, P) with values in {−1, +1} such that their correlations are given by:

$$E(S^1_x \cdot S^2_y) = -x \cdot y ; \quad x, y = a, b, c \tag{35}$$

(where, for x, y ∈ E3, x · y denotes the euclidean scalar product).
Proof. A consequence of (35) is that:

$$E(S^1_x \cdot S^2_x) = -x \cdot x = -\|x\|^2 = -1 ; \quad x = a, b, c$$

and, since $|S^1_x S^2_x| = 1$, this is possible if and only if $S^1_x = -S^2_x$ (x = a, b, c). Thus, using the inequality (31) of Theorem 2, we obtain:

$$|E(S^1_a S^2_b) - E(S^2_b S^1_c)| \le 1 - E(S^1_a S^1_c) = 1 + E(S^1_a S^2_c)$$

Or equivalently, using (35):

$$|a \cdot b - b \cdot c| \le 1 - a \cdot c \tag{36}$$

Thus if we choose the three vectors a, b, c to be coplanar and such that a is perpendicular to b and c lies between a and b, forming an angle θ with a, then a · b = 0, b · c = sin θ, a · c = cos θ, and the inequality (36) becomes:

$$\sin\theta + \cos\theta \le 1 ; \quad 0 \le \theta \le \pi/2 \tag{37}$$

But the maximum of the function

$$x \mapsto \sin x + \cos x$$

in the interval [0, π/2] is √2 (obtained for x = π/4). So, for θ near to π/4, the left hand side of (37) will be near to √2 > 1. Therefore, for such a choice of θ, the triple of unit vectors a, b, c will not satisfy the inequality (36); hence, by Theorem 2, it cannot admit any Kolmogorov model.
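The violation can be made concrete with explicit coordinates. The sketch below assumes a = (1, 0, 0), b = (0, 1, 0) and c = (cos θ, sin θ, 0), so that c lies in the plane of a and b at angle θ from a, and evaluates the slack (1 − a · c) − |a · b − b · c| of inequality (36). The coordinate choice and the function name are assumptions made for illustration.

```python
import math


def slack_36(theta):
    """(1 - a.c) - |a.b - b.c| for the coplanar triple described above.

    A negative value means that inequality (36) is violated.
    """
    a = (1.0, 0.0, 0.0)
    b = (0.0, 1.0, 0.0)
    c = (math.cos(theta), math.sin(theta), 0.0)

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    return (1 - dot(a, c)) - abs(dot(a, b) - dot(b, c))


# The slack equals 1 - (sin(theta) + cos(theta)); it is negative at pi/4.
print(slack_36(math.pi / 4))
```

For this triple the slack is negative for every θ strictly between 0 and π/2, and it vanishes at the two endpoints.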

Corollary (5.3) There exist triples a, b, c of unit vectors in E3 for which it is not possible to find three random variables Sa, Sb, Sc on a probability space (Ω, F, P) with values in {+1, −1} such that their correlations are given by:

E(Sx · Sy) = ±x · y ; x, y = a, b, c (38)

Proof. As in Corollary (5.2).

The conclusion is exactly the same as in the two–slit experiment: you start with a set of statistical data and you show that they do not admit a Kolmogorov model. The interest of the corollaries above lies in the fact that statistical correlations of this kind can be obtained from a small set of related experiments.

Let us now describe two simple non-Kolmogorov models which give rise to correlations similar to those described in Corollaries (5.2) and (5.3).


7 Another Non-Kolmogorov model for correlations similar to those of Corollary (5.3)

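Let M be the algebra of all 2 × 2 matrices; denote by

$$\tau : (a_{ij})_{i,j \in \{1,2\}} \longmapsto \tfrac{1}{2}(a_{11} + a_{22})$$

the normalized trace; consider the vector space basis of M given by the identity matrix and:

$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} ; \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} ; \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{39}$$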

For each unit vector a = (a1, a2, a3) ∈ R3, define

Sa = a · σ = a1σ1 + a2σ2 + a3σ3 (40)

then Sa is hermitian with eigenvalues ±1 and if a, b are two unit vectors then

τ(SaSb) = a · b = a1b1 + a2b2 + a3b3 (41)

Thus, if we represent the random variables as the 2 × 2 matrices (40), their values by the eigenvalues of the corresponding matrices, and the correlation function of two observables by the trace of the product of the corresponding matrices, we obtain a (non-Kolmogorov) model by which we can describe the correlations (38) not only for three given vectors, but for any triple of unit vectors.
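The identity (41) is easy to check numerically. The following sketch builds the matrices (39) and (40) with plain nested lists and compares the normalized trace of a product with the scalar product of the two unit vectors. The helper `unit` and the particular angles are assumptions introduced only for the example.

```python
import math

# Pauli matrices of (39), stored as nested tuples.
SIGMA = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)


def spin(a):
    """S_a = a1*sigma1 + a2*sigma2 + a3*sigma3, as in (40)."""
    return [[sum(a[k] * SIGMA[k][i][j] for k in range(3)) for j in range(2)]
            for i in range(2)]


def matmul(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def tau(m):
    """Normalized trace of a 2x2 matrix."""
    return (m[0][0] + m[1][1]) / 2


def unit(theta, phi):
    """Unit vector in R^3 from spherical angles (a hypothetical helper)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


a, b = unit(0.3, 1.1), unit(2.0, -0.4)
correlation = tau(matmul(spin(a), spin(b)))
dot = sum(x * y for x, y in zip(a, b))
print(abs(correlation - dot) < 1e-12)
```

The same helpers also show that the square of S_a is the identity matrix; together with the vanishing trace of S_a, this is what forces the eigenvalues ±1.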

8 Non-Kolmogorov model for the correlation of Corollary (5.3)

Denote by H the Hilbert space C2 with the usual scalar product defined with respect to the basis:

ψ+ = (0, 1) ; ψ− = (1, 0) (42)

For a unit vector a ∈ R3, let Sa be defined by (39), (40) and define

$$S^1_a = S_a \otimes 1 ; \quad S^2_a = 1 \otimes S_a \tag{43}$$

i.e. in (43) the 4 × 4 matrices $S^1_a$, $S^2_a$ act on H ⊗ H and 1 is the identity 2 × 2 matrix. Let now σ denote the vector in H ⊗ H defined by:

$$\sigma = \frac{1}{\sqrt{2}} (\psi_+ \otimes \psi_- - \psi_- \otimes \psi_+) \tag{44}$$

One then easily verifies that

$$\langle \sigma, S^1_a S^2_b \, \sigma \rangle = -a \cdot b \tag{45}$$

Therefore, if we represent the observables $S^1_a$, $S^2_b$ (a, b unit vectors) as in (43), and their correlations by the left hand side of (45), we obtain a (non-Kolmogorov) model of the correlations (35) for any triple of unit vectors a, b, c ∈ R3.
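The verification of (45) can be sketched numerically. The code below uses the fact that the product of Sa ⊗ 1 and 1 ⊗ Sb is Sa ⊗ Sb, builds this 4 × 4 matrix as a Kronecker product, and evaluates its expectation in the vector σ of (44) with ψ+ = (0, 1) and ψ− = (1, 0). Function names, the index convention for the tensor product and the test vectors are assumptions of this sketch.

```python
import math

# Pauli matrices of (39).
SIGMA = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)


def spin(a):
    """S_a = a . sigma, as in (40)."""
    return [[sum(a[k] * SIGMA[k][i][j] for k in range(3)) for j in range(2)]
            for i in range(2)]


def kron(x, y):
    """Kronecker product of two 2x2 matrices: entry [2i+k][2j+l] = x[i][j]*y[k][l]."""
    return [[x[i][j] * y[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]


def singlet_correlation(a, b):
    """<sigma, (S_a (x) S_b) sigma> for the vector sigma of (44)."""
    s = 1 / math.sqrt(2)
    # With psi+ = (0, 1) and psi- = (1, 0): psi+ (x) psi- is the basis vector
    # with index 2 and psi- (x) psi+ the one with index 1 (0-based).
    sigma = [0, -s, s, 0]
    m = kron(spin(a), spin(b))
    image = [sum(m[r][c] * sigma[c] for c in range(4)) for r in range(4)]
    return sum(complex(sigma[r]).conjugate() * image[r] for r in range(4))


a = (1.0, 0.0, 0.0)
b = (math.cos(0.9), 0.0, math.sin(0.9))
dot = sum(x * y for x, y in zip(a, b))
print(abs(singlet_correlation(a, b) + dot) < 1e-12)
```

Taking b = a gives the value −1, which is the perfect anticorrelation used in the proof of Corollary (5.2).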

Remark 1) The term correlations, referred to expressions such as (41) or (45), is justified by the fact that the observables commute so that they admit a Kolmogorov model. Moreover they have zero mean in the state σ and this implies that the values assumed by each observable Sx must be equiprobable in the state σ. From this it can be shown (see [9]) that, expressing the left hand sides of (41), (45) in terms of quantum mechanical transition probabilities (using the formalism described in section (10) below), one finds an expression which is formally identical to the expression for the correlations in the classical Kolmogorov models.

9 The statistical invariants

Let T be a set and, for each x ∈ T, let A(x) be an observable quantity whose values will be denoted a1(x), . . . , an(x) (n < +∞, for all x). We assume that the transition probabilities

P (A(y) = aj(y) | A(x) = ai(x)) = Pij(x, y) (46)

are given for each x, y ∈ T and i, j = 1, . . . , n (they are considered to be the experimentally given statistical data). The matrix (Pij(x, y)) will be denoted P (x, y).


Definition 1 The family of transition probability matrices

{P (x, y) : x, y ∈ T}

is said to admit a Kolmogorov model if there exist:
– a probability space (Ω, F, P);
– for each x ∈ T, a measurable partition A1(x), . . . , An(x) of Ω such that for each x, y ∈ T and each i, j = 1, . . . , n:

$$P_{ij}(x, y) = \frac{P(A_i(x) \cap A_j(y))}{P(A_i(x))} \tag{47}$$

Definition 2 The family of transition probability matrices {P (x, y) : x, y ∈ T} is said to admit a complex n-dimensional Hilbert space model if there exist:
(-) a complex n-dimensional Hilbert space H;
(-) for each x ∈ T, an orthonormal basis φ1(x), . . . , φn(x) of H such that, for each x, y ∈ T and each i, j = 1, . . . , n:

Pij(x, y) = | 〈φi(x), φj(y)〉 |² (48)

where 〈 · , · 〉 denotes the scalar product in H. In a similar way one defines the notion of real or quaternion Hilbert space model.
The notion of statistical invariant is illustrated by the following theorem (concerning the particular case in which n = 2 and T is a set containing three elements).

Theorem 3 Let p, q, r ∈ (0, 1) be three given numbers, and let

$$P = \begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix} ; \quad Q = \begin{pmatrix} q & 1-q \\ 1-q & q \end{pmatrix} ; \quad R = \begin{pmatrix} r & 1-r \\ 1-r & r \end{pmatrix} \tag{49}$$

be three bi-stochastic matrices. Then:
(i) A Kolmogorov model for P, Q, R exists if and only if

$$|p + q - 1| \le r \le 1 - |p - q| \tag{50}$$

(ii) A complex Hilbert space model for P, Q, R exists if and only if:

$$-1 \le \frac{p + q + r - 1}{2\sqrt{pqr}} \le 1 \tag{51}$$

(iii) A real Hilbert space model for P, Q, R exists if and only if:

$$\frac{p + q + r - 1}{2\sqrt{pqr}} = \pm 1 \tag{52}$$

(iv) A quaternion Hilbert space model for P, Q, R exists if and only if a complex Hilbert space model exists.
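The conditions of Theorem 3 are simple enough to be evaluated mechanically. The sketch below assumes that the inequalities in (50) and (51) are non-strict and that (52) reads "the ratio equals +1 or −1"; the function name, the tolerance and the sample values of p, q, r are illustrative choices.

```python
import math


def classify(p, q, r, tol=1e-12):
    """Evaluate the statistical invariants (50), (51), (52) for p, q, r in (0, 1)."""
    ratio = (p + q + r - 1) / (2 * math.sqrt(p * q * r))
    return {
        "kolmogorov": abs(p + q - 1) <= r + tol and r <= 1 - abs(p - q) + tol,
        "complex_hilbert": -1 - tol <= ratio <= 1 + tol,
        "real_hilbert": abs(abs(ratio) - 1) <= tol,
    }


# p = q = r = 1/4 is the value cos^2(60 degrees), i.e. three directions in a
# plane at 60 degrees from each other: a real Hilbert space model exists,
# a Kolmogorov model does not.
print(classify(0.25, 0.25, 0.25))
# p = q = r = 1/2: both a Kolmogorov and a complex Hilbert space model exist.
print(classify(0.5, 0.5, 0.5))
# p = q = 0.9, r = 0.1: none of the three conditions is satisfied.
print(classify(0.9, 0.9, 0.1))
```

The three sample triples show that the conditions are genuinely different: the same kind of data (three numbers in (0, 1)) can select one model, several, or none.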

Theorem 3 above (see [12] for a proof) suggests the following general definition:

a set of statistical invariants for the family of transition probabilities {P (x, y) : x, y ∈ T} with respect to a given probabilistic model is defined by:
– a family (Fi)i∈I of real valued functions depending on the elements Pij(x, y) of the matrices P (x, y);
– a family (Bi)i∈I of subsets of R such that the conditions

Fk((Pij(x, y))i,j,x,y) ∈ Bk ; ∀k ∈ I (53)

are necessary and sufficient for the existence of the given model.
Similarly to what happens for the geometrical or topological invariants, there is no general algorithm to calculate the statistical invariants of a set of transition probability matrices with respect to a given probabilistic model. Several particular cases have been worked out (see [12]).
Even with these limitations, Theorem 3 provides a general answer to another of the mysteries of quantum theory, to which a large literature has been devoted since the early sixties (cf. [12] for a proof and references), i.e.:

why complex Hilbert spaces (i.e. complex amplitudes)?

It is clear from (50), (51), (52) that not only the non-Kolmogorovity of the model, but also the discrimination between real and complex numbers is built into the experimentally measurable statistical data.

Let us conclude with an historical remark: an old theorem, due to von Neumann, states that within the Hilbert space model n self-adjoint operators admit a joint distribution in any quantum state if and only if they commute. There have been many mathematically interesting generalizations of this theorem to lattices (quantum logics) more general than those given by projections in Hilbert space.

The point of view of the statistical invariants is completely different since it provides, seemingly for the first time, a model independent solution of the problem: the existence or not of a Kolmogorov model can now be decided uniquely in terms of experimentally verifiable statistical data.

10 The new structure axioms

In this section we look for a constructive formulation of the structure axioms which takes into account the negative postulate (II.2) of section (2). As we have shown, the notion of conditioning is crucial in order to understand the rise of non-Kolmogorov models, in particular the distinction between conditioning with or without alteration of previous information. Since possible alterations can occur because we act upon a system by a measurement, it is natural to set up our mathematical model as an idealization of the various operations which are present in the measurement procedures. The usual Boolean-Kolmogorov structure will be recovered as the limiting case in which the information acquired in a measurement process does not affect the information acquired by previous measurements.
The language we adopt is an extension of one introduced by J. Schwinger [22] and called the algebra of measurements. The mathematical theory of these algebras was developed in [2].

Let A be an observable with values aj (j = 1, . . . , n) and let t be an instant of time. To the triple (A, aj, t) we associate an idealized measurement apparatus, denoted Aj(t), which from an ensemble of independent similar systems selects those for which the value of the observable A at time t is aj. Such an apparatus will be called an elementary filter.

The symbol Aj(s) · Bk(t) (s < t) will be associated with the apparatus corresponding to the consecutive application, to the same ensemble, first of the filter Aj(s) and then of Bk(t).


This operation is associative, with an identity consisting of the apparatus which does not filter away anything. Elementary filters are idealizations of the so-called first-kind measurements, meaning by this that, if ε is very small, then each particle which passed the filter Aj(t) will also pass the filter Aj(t + ε) (i.e. if the observable A has the value aj at time t − 0, it will also have this value at t + 0). Thus elementary filters act both as measurement apparatus and as preparing apparatus. Not all measurements in nature are of the first kind, but our mathematical analysis will be limited to this class. Since the measurements which do not disturb the system at all are of the first kind, it follows that all the events considered in the classical theory are included in the present discussion.

We start from a set T (index set) and we assume that, for each x ∈ T, an observable quantity A(x) is given, with values a1(x), . . . , an(x) (n < +∞, the same for all x). For the moment we limit ourselves to the structure axioms, so we do not introduce probabilities. The family of events in whose probabilities we will eventually be interested is

{[A(x) = aj(x)] : x ∈ T, j = 1, . . . , n}

where [A(x) = aj(x)] denotes the event that A(x) takes the value aj(x). One can prove that the mathematical model for this family is defined by:
– an associative real ∗–algebra A with identity;
– the assignment of a correspondence which, to the event [A(x) = aj(x)], associates an element Aj(x) of A satisfying

$$A_j(x)^* = A_j(x) ; \quad A_j(x) \cdot A_k(x) = \delta_{jk} A_j(x) \tag{54}$$

$$\sum_{j=1}^{n} A_j(x) = 1 \tag{55}$$

– it is also required that the algebra A is algebraically spanned by the set of projections {Aj(x) : x ∈ T, j = 1, . . . , n}.
An algebra A as above will be called a Schwinger algebra of measurements associated to the set of observables {A(x) : x ∈ T}. Two elements of A are called compatible if they commute.


Two typical examples are:

Example 1) The requirement that all the observables A(x) (x ∈ T) are mutually compatible (i.e. any pair of projectors Aj(x) (j = 1, . . . , n ; x ∈ T) commute) allows one to associate in a canonical way a Boolean algebra to the Schwinger algebra A. Conversely every Boolean algebra defines a commutative Schwinger algebra.

This shows that the classical theory is included in the possible models of the new structure axioms for probability.

Example 2) Let H = Cn with the usual hermitean product; let, for each x ∈ T, φ1(x), . . . , φn(x) be an orthonormal basis of H. To the event [A(x) = aj(x)] we associate the rank one projector

Aj(x) : ψ ∈ H −→ 〈φj(x), ψ〉 · φj(x) = Aj(x) · ψ

(〈 · , · 〉 denotes the scalar product). Clearly (54) and (55) are verified. Here the Schwinger algebra is the algebra of all n × n matrices.

The two examples considered above are extreme, in the sense that while in the former, for each x ∈ T, the observable A(x) is compatible with any other observable A(y), in the latter we have that if an element of A (= M(n, C)) commutes with each one of the A1(x), . . . , An(x), then it must be a linear combination of them (with complex coefficients), so that the observable A(x) presents the maximal degree of compatibility. Motivated by this remark we introduce the following:

Definition 3 In the above notations the observable A(x) (x ∈ T) is called maximal if an element of A commutes with each of the A1(x), . . . , An(x) when and only when it is a linear combination of them with coefficients in the centre of A (i.e. the family of elements of A commuting with each element of A).

In the following the centre of the Schwinger algebra A will be denoted κ and we will assume the validity of the following two technical conditions (they are much stronger than we need, but assuming their validity greatly simplifies the exposition):

x ∈ A and γ ∈ κ =⇒ (γ · x = 0 if and only if γ = 0 or x = 0) (56)

γ · Aj(x) > 0 ; with γ ∈ κ ; if and only if γ > 0 (57)

(in a ∗–algebra an element is called positive if it can be written as a sum of elements of the form b∗b).

In the classical case the same Boolean structure is a model for the family of events both for a deterministic and a statistical theory. Therefore the Boolean algebra model cannot be intrinsically related to a single set of probabilities. Conversely, it is intuitively clear that if we are dealing with two different maximal observables, then the acquisition of exact information on the values of one of them implies that the information on the values of the other one cannot be but statistical. Therefore, in this case we expect a set of privileged probabilities intrinsically associated to the observables. It is a remarkable feature of the Schwinger algebras of measurements that they provide a precise mathematical support for this intuition:

Theorem 4 In the above notations, let A(x) and A(y) (x, y ∈ T) be two maximal observables. Then there exists a bi-stochastic matrix

$$P(x, y) = (P_{ij}(x, y)) ; \quad P_{ij}(x, y) \ge 0 , \quad \sum_{i=1}^{n} P_{ij}(x, y) = \sum_{j=1}^{n} P_{ij}(x, y) = 1$$

such that:

$$A_i(x) \cdot A_j(y) \cdot A_i(x) = P_{ij}(x, y) \cdot A_i(x) \tag{58}$$

$$P_{\alpha\beta}(x, y) = P_{\beta\alpha}(y, x) \tag{59}$$

Thus: not only do two maximal observables in a Schwinger algebra canonically define a transition probability matrix, but this matrix has necessarily the symmetry property (59) which is found in the usual C-Hilbert space model (cf. the identity (3)).
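In the Hilbert space model of Example 2 the content of Theorem 4 can be checked directly. The sketch below takes n = 2, two orthonormal bases of C² given by an assumed parametrization (rotation angle and phase), forms the rank one projectors, and verifies (58) with Pij(x, y) = |〈φi(x), φj(y)〉|². All names and numerical values are illustrative.

```python
import cmath
import math


def basis(theta, phase=0.0):
    """An orthonormal basis of C^2 (an assumed, not exhaustive, parametrization)."""
    c, s = math.cos(theta), math.sin(theta)
    w = cmath.exp(1j * phase)
    return [(c, s * w), (-s, c * w)]


def inner(u, v):
    """Scalar product, conjugate linear in the first argument."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))


def projector(phi):
    """Matrix of the rank one projector psi -> <phi, psi> phi."""
    return [[phi[i] * complex(phi[j]).conjugate() for j in range(2)]
            for i in range(2)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def close(x, y, tol=1e-12):
    return all(abs(x[i][j] - y[i][j]) < tol for i in range(2) for j in range(2))


phi_x, phi_y = basis(0.0), basis(0.7, phase=0.4)
a_x = [projector(phi) for phi in phi_x]
a_y = [projector(phi) for phi in phi_y]
p = [[abs(inner(phi_x[i], phi_y[j])) ** 2 for j in range(2)] for i in range(2)]

# (58): A_i(x) A_j(y) A_i(x) = P_ij(x, y) A_i(x) for all i, j.
holds_58 = all(
    close(matmul(matmul(a_x[i], a_y[j]), a_x[i]),
          [[p[i][j] * entry for entry in row] for row in a_x[i]])
    for i in range(2) for j in range(2)
)
print(holds_58, p)
```

The rows and columns of the computed matrix each sum to 1, so it is bi-stochastic, and the symmetry (59) is immediate from the modulus of the scalar product.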


Now, just as Stone's theorem provides a standard mathematical model for the Boolean algebras of classical probability, we would like to have a standard representation theorem for Schwinger algebras in order to obtain a classification of all the possible models which can be obtained in that way. Recall that, with the exception of the commutative case, all these models are intrinsically probabilistic. In the classical case (all observables are compatible) we recover all the classical models. So let us consider the opposite extreme, i.e. the case in which each observable A(x) (x ∈ T) is maximal.
An important corollary of Theorem 4 is that the Schwinger algebra generated by a finite number of maximal observables is necessarily finite-dimensional over its centre κ. Moreover, if we add to our assumptions (9.1), (9.2) the assumption that:

Pij(x, y) > 0 ; x, y ∈ T ; i, j = 1, . . . , n (60)

((60) will be implicitly assumed in the following), then for any x, y ∈ T the n² elements of A, {Ai(x) · Aj(y) : i, j = 1, . . . , n}, are linearly independent over the centre of A. Thus a Schwinger algebra with each A(x) maximal has at least dimension n² over its centre if card (T) > 2.

Definition 4 A Heisenberg algebra associated with the family of observables {A(x) : x ∈ T} is a Schwinger algebra A associated with {A(x) : x ∈ T} such that:
– each observable A(x) is maximal;
– A has minimum dimension over its centre.

The following problem now arises quite naturally: given a set {P (x, y) : x, y ∈ T} of n × n bi-stochastic matrices, when does there exist a Schwinger algebra A associated with a family {A(x) : x ∈ T} of n-valued observables, such that:
(i) each A(x) is maximal in A;
(ii) for each x, y ∈ T the bi-stochastic matrix canonically associated to the pair A(x), A(y), according to Theorem 4, is P (x, y)?

This problem is the quantum probabilistic analogue of the well known classical problem:


given a family of transition probability matrices {P (s, t) : s < t, s, t ∈ R+}, when does there exist a Markov process (A(t)) such that for each s < t, the transition matrix canonically associated with the pair of random variables A(s), A(t) is P (s, t)? It is well known that the classical probabilistic problem has a positive solution if and only if the family of transition probability matrices (P (s, t)) satisfies the Chapman–Kolmogorov equation:

P (r, s) · P (s, t) = P (r, t) ; r < s < t (61)

A rather surprising fact is that the quantum-probabilistic problem has a positive solution if and only if to each transition matrix P (x, y) one can associate an amplitude matrix U(x, y) so that the family {U(x, y) : x, y ∈ T} satisfies a generalization of the Schrödinger equation (in integral form). More precisely:

Theorem 5 (cf. [2]) The following assertions are equivalent:
(i) There exists a Heisenberg algebra A with centre κ satisfying the conditions of the problem stated above.
(ii) For each x, y ∈ T there exists a κ-valued matrix U(x, y) such that for each x, y, z ∈ T and i, j, k = 1, . . . , n:

$$\sum_{i=1}^{n} \frac{P_{ij}(x, y)}{U_{ij}(x, y)} \cdot U_{ik}(x, y) = \delta_{jk} \tag{62}$$

$$\sum_{j=1}^{n} \frac{P_{ij}(x, y)}{U_{ij}(x, y)} \cdot U_{kj}(x, y) = \delta_{ik} \tag{63}$$

U(x, x) = 1 (64)

U(x, y) · U(y, z) = U(x, z) (65)

Moreover, in this case, the Heisenberg algebra A can be identified with the algebra of all n × n matrices with coefficients in κ.

Remark (1) The Hilbert space model of quantum theory is recovered when κ = C and

Pij(x, y) = | Uij(x, y) |² (66)

In this case the ratio Pij(x, y)/Uij(x, y) is the complex conjugate of Uij(x, y), and one immediately recognizes in (62), (63) the conditions of unitarity of the matrix U(x, y).


Moreover, considering the case T = R and assuming a translation invariant situation, (64) and (65) become equivalent to:

U(s) · U(t) = U(s + t) ; U(0) = 1 (67)

where (U(s)) is a 1–parameter unitary group of n × n matrices. Denoting by H its generator, (67) becomes equivalent to:

$$\frac{d}{dt} \psi(t) = iH\psi(t) ; \quad \psi(0) = \psi_0 \in \mathbb{C}^n \tag{68}$$

which is the usual finite–dimensional form of the Schrödinger equation.
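As a concrete instance of (66), (67) and (68), take n = 2 and the generator H equal to the Pauli matrix with rows (0, 1) and (1, 0). This choice is an assumption made only for the example; for it the group U(t) = exp(itH) has the closed form cos(t)·1 + i sin(t)·H, which the sketch below uses directly.

```python
import math


def u(t):
    """U(t) = exp(i t H) for the assumed generator H = [[0, 1], [1, 0]]."""
    c, s = math.cos(t), math.sin(t)
    return [[c, 1j * s], [1j * s, c]]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def close(x, y, tol=1e-12):
    return all(abs(x[i][j] - y[i][j]) < tol for i in range(2) for j in range(2))


def transition_matrix(t):
    """P_ij = |U_ij(t)|^2, as in (66)."""
    return [[abs(entry) ** 2 for entry in row] for row in u(t)]


def adjoint(x):
    return [[complex(x[j][i]).conjugate() for j in range(2)] for i in range(2)]


s, t = 0.3, 0.5
print(close(matmul(u(s), u(t)), u(s + t)))                       # group law (67)
print(close(matmul(adjoint(u(t)), u(t)), [[1, 0], [0, 1]]))      # unitarity
print(transition_matrix(t))                                      # bi-stochastic
```

The transition matrix obtained in this way has rows and columns summing to 1 for every t, in agreement with the bi-stochasticity required of P (x, y).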

Remark 2) It can be shown that equation (65) includes also the notion of group representation and that a natural modification of it (making it path dependent) leads to the notion of connection in a Hilbert bundle and of the associated gauge theories.

Many problems remain open in the clarification of the structure of Schwinger algebras of measurements and their connections to probabilistic models. Theorem 5 above shows however that essentially all the features of the quantum mechanical model follow from a natural and physically meaningful idealization of the measurement procedures. It remains to determine the nature of the centre of A. But this is done with the technique of statistical invariants described in section (6).

The transition to observables with infinitely many (possibly continuous) values can be achieved starting from Theorem 5, through the application of known mathematical techniques. The idea of the procedure is the following: the problem of classifying (in the N–dimensional Hilbert space model with N < +∞) the pairs of observables which realize the maximal mutual indeterminacy (i.e. the transition probabilities between their values are always 1/N) has led naturally to the introduction of the notion of complementary pairs (cf. the paper in [2], section (2)), which was motivated by the author's attempt to give a mathematical formulation of the distinction between incompatible pairs in the sense of Heisenberg and complementary pairs in the sense of Bohr, and which has been subsequently studied by a multiplicity of authors.


A final remark: Definition 2 of Hilbert space model is limited to maximal observables (i.e. each of the observables considered represents a complete set of compatible observables in the physicist's terminology). In the mathematical model, this assumption is reflected in the fact that to each value of the given observable corresponds a vector in the Hilbert space or, more precisely, a rank one projection. If the observable is not maximal, then to each of its values will correspond a projection which is not of rank 1.

11 The theorem of composite probabilities and the quantum measurement process

Let A, B, C be three different n–valued observables. We denote by X(t) the observable X at time t (X = A, B, C) and by Aa(t) the event corresponding to A(t) = a (a a value of A). Similar notations will be used for B and C. The set of values of the observables will be assumed time independent. Following (1) we denote:

Pab(t, s) = P (Aa(t) | Bb(s)) (69)

(and similarly for BC, CA). We assume A, B, C to be maximal, in the sense that for r < s < t and values a, b, c:

P (Aa(t) | Bb(s) ∩ Cc(r)) = P (Aa(t) | Bb(s)) (70)

The identity (70) corresponds to the identity (??) in section (3). The probability on the left hand side of (70) is measured as the approximate relative frequency of the event Aa(t) in an ensemble of particles obtained by selecting the particles for which B(s) = b from an ensemble prepared so that C(r) = c, while the probability on the right hand side of (70) is evaluated as the approximate relative frequency of the event Aa(t) in an ensemble of particles prepared so that B(s) = b. Thus the identity (70) should be considered as an experimental fact, and there are many physical systems in which it is satisfied.
Formally (70) is a weak form of Markov property, but it must be kept in mind that in a classical context (70) means that the information on Cc(r) is not necessary, while in the present context it means that this information has lost any value.


The mathematical difference is big; in fact in the classical markovian case the equality

$$P_{ca}(r, t) = \sum_b P_{cb}(r, s) \cdot P_{ba}(s, t)$$

holds, while in the present case it doesn't. A selective measurement such as the one described to define the left hand side of (70) is often called, in the quantum mechanical literature, a complete measurement. A non selective measurement is called incomplete.
According to the notation introduced in section (3. ), we denote

$$\bigvee_b B_b(s) \quad \text{(b runs over all values of B)} \tag{71}$$

the event corresponding to having done an incomplete measurement of B(s). According to the general formula (20), deduced in section (3. ) (extended in an obvious way to n disjoint events), and taking into account (70), the probability of the event Aa(t) given the preparation Cc(r) and the incomplete measurement ∨b Bb(s) is given by:

$$P(A_a(t) \mid [\bigvee_b B_b(s)] \cap C_c(r)) = \sum_b P(A_a(t) \mid B_b(s)) \cdot P(B_b(s) \mid C_c(r)) = \sum_b P_{cb}(r, s) \cdot P_{ba}(s, t) \tag{72}$$

(which is usually different from Pca(r, t)). Assume now that the transition probabilities (69) are described by the usual quantum mechanical model. This means (according to Definition 2) that there exist a Hilbert space H ≡ Cn and, for each observable A(t), B(t), C(t), an orthonormal basis $(\phi^A_a(t))$, $(\phi^B_b(t))$, $(\phi^C_c(t))$ of H, such that:

$$P_{ba}(s, t) = |\langle \phi^A_a(t), \phi^B_b(s) \rangle|^2 = \mathrm{Tr}(P^A_a(t) \cdot P^B_b(s)) \tag{73}$$

(similarly for B, C, r, s) where Tr(·) denotes the (non normalized) trace on the space of all the bounded operators on H and $P^A_a(t)$ is the rank one projector:

$$P^A_a(t)\psi = \langle \phi^A_a(t), \psi \rangle \phi^A_a(t) \tag{74}$$

Inserting (74) into (70) and (72) respectively, one finds:

$$P(A_a(t) \mid B_b(s) \cap C_c(r)) = P(A_a(t) \mid B_b(s)) = \mathrm{Tr}(P^A_a(t) \cdot P^B_b(s)) \tag{75}$$

$$P(A_a(t) \mid [\bigvee_b B_b(s)] \cap C_c(r)) = \sum_b \mathrm{Tr}(P^A_a(t) \cdot P^B_b(s)) \cdot P_{cb}(r, s) = \mathrm{Tr}(P^A_a(t) \cdot W(r, s)) \tag{76}$$

where the operator $W = W(r, s) = W_{CB}(r, s)$ is defined by:

$$W(r, s) = \sum_b P_{cb}(r, s) \cdot P^B_b(s) \tag{77}$$

and has the following properties:

W is a positive hermitean operator (78)

Tr(W ) = 1 (79)

In the quantum mechanical literature an operator W with these properties is called a density matrix (or a mixture, or a state). The rank one projections are particular density matrices called pure states.
Summing up: when no measurement is done at time s (or, more generally, when the particles of the ensemble can be considered as isolated in the time interval (0, t)) the probability of the event Aa(t) shall be evaluated by:

$$P(A_a(t) \mid C_c(r)) = \mathrm{Tr}(P^A_a(t) \cdot P^C_c(r)) \tag{80}$$

When a selective measurement of B(s) is performed, corresponding to the value b, we shall use formula (75). (It is not necessary that the particles for which B(s) ≠ b are materially discarded. The only relevant thing is that the relative frequency of the event Aa(t) is evaluated only over the ensemble of those particles for which B(s) = b.) Finally, in a non selective measurement of B(s), one shall use formula (76).
The fact that in general the right hand sides of (76) and of (80) are different corresponds mathematically to the non validity of Bayes' axiom in the quantum probabilistic model, and physically to the fact that the physical situations in which the two probabilities of Aa(t) are evaluated are different.
So, for example, if a friend of mine makes a selective measurement of B(s), but doesn't tell me which value of B(s) he selected, then I will use formula (76) to evaluate the probability of Aa(t), while he will use (75). His results will be experimentally more precise than mine, which is not surprising since he has got more information.
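The difference between (76) and (80) can be seen in the smallest possible example. The sketch below assumes n = 2, real orthonormal bases obtained by rotating the standard basis, a B basis at 45 degrees from the C basis, and an A basis equal to the C basis; these choices, like the function names, are illustrative assumptions.

```python
import math


def basis(theta):
    """Real orthonormal basis of the plane, rotated by theta (an assumed example)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c, s), (-s, c)]


def projector(phi):
    """Rank one projector onto the real unit vector phi."""
    return [[phi[i] * phi[j] for j in range(2)] for i in range(2)]


def trace_of_product(x, y):
    """Tr(x . y) for 2x2 matrices."""
    return sum(x[i][k] * y[k][i] for i in range(2) for k in range(2))


def transition(u, v):
    """|<u, v>|^2 for real vectors."""
    return sum(x * y for x, y in zip(u, v)) ** 2


phi_c = basis(0.0)           # basis of C at time r
phi_b = basis(math.pi / 4)   # basis of B at time s
phi_a = basis(0.0)           # basis of A at time t (taken equal to that of C)
c, a = 0, 0                  # prepared value of C, tested value of A

# (80): no measurement of B at the intermediate time.
p_isolated = trace_of_product(projector(phi_a[a]), projector(phi_c[c]))

# (77): density matrix produced by the non selective measurement of B.
w = [[sum(transition(phi_c[c], phi_b[b]) * projector(phi_b[b])[i][j]
          for b in range(2))
      for j in range(2)] for i in range(2)]

# (76): non selective measurement of B at the intermediate time.
p_nonselective = trace_of_product(projector(phi_a[a]), w)

print(p_isolated, p_nonselective, w[0][0] + w[1][1])
```

In this example the isolated system returns the prepared value with certainty, while after the non selective measurement of B the same event has probability one half; the trace of W is 1, as required by (79).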


In general I must use formula (76) when there is some information on the system which in principle I could obtain but which I do not have. Formula (75) instead is used when I dispose of the maximal amount of exact information obtainable (in agreement with the indeterminacy principle) at a given moment.
There is nothing surprising in the fact that the mere possibility of obtaining in principle some information on the system alters my way of evaluating probabilities of events concerning it. This mere possibility is in fact not of a psychological nature, but corresponds to different physical and experimental conditions to which the system has been subjected. These differences in the experimental conditions of the system can arise as disturbances occurring between the original preparation and the measurement of the event in question, as is the case when we use the formula (72) (incomplete measurements), or when we constrain the system to be related to another system by a conservation law (in the mathematical model of quantum theory this situation is described by a wave vector of the form (44)).
The physical state of the system changes in time, in the sense that the values of all the observables of the system are functions of t. However, by the Heisenberg principle, we cannot obtain exact information simultaneously on all the observables of a system, but only on a maximal observable (a complete set of compatible observables, in the physicists' terminology). Thus the knowledge of the physical state of the system, in the sense of the mechanistic theories, is ruled out a priori by the Heisenberg principle. In the mathematical model the maximal amount of information attainable on a given system at a given moment s is represented by a vector $\phi^B_b(s)$ in a Hilbert space (more precisely, by the corresponding rank one projection $P^B_b(s)$).

This gives two types of information on the system:
(i) that the exact value of B(s) is b;
(ii) that the probabilities of the values of any observable A(t) (t > s) should be evaluated according to formula (75).
For reasons made explicit in section (11. ) some physicists have claimed that, if the mathematical model of the state of a quantum system at time s is $\phi^B_b(s)$, then all the observables incompatible with B do not assume any value but are in a physically undefined (and in principle unobservable) state of physical superposition. These states have the peculiar property of existing only if nobody tries to ascertain whether they exist or not: as soon as one tries, they disappear (this is the collapse of the wave packet, cf. Section (11)) and one faces the familiar situation in which an observable takes one and only one of its values.
But why, one might ask, have physicists introduced such a peculiar notion? It is not easy to find an answer to this question in the physical literature, because few contemporary physicists had the scientific courage to write down explicitly the precise definition of a state of physical superposition and, most of all, the rational reasons which determined the introduction of such a bizarre notion. The papers of R.P. Feynman constitute a notable exception, and that is the reason why the answers to the questions posed in those papers have been the starting point of many investigations on the relationship between probability and quantum theory.
In Section (11) we will give a brief outline of the reasons which lead to the notion of physical superposition of states (which is to be distinguished from the corresponding mathematical notion, which is perfectly well defined and corresponds to the familiar operation of taking linear combinations of vectors in a complex Hilbert space).

12 A historical digression on the so-called paradoxes of quantum mechanics

As stated in section (1), since the late twenties both physicists and mathematicians were aware that quantum theory was a new probabilistic calculus. The point of view advocated in the present paper is that this new calculus reflects the non-universal applicability of some of the elementary rules of classical probability (in particular, Bayes' definition of conditional probability).

This possibility has not been taken into account in the previous literature where, on the contrary, starting from the (implicit) assumption of the universal applicability of the rules of the classical probabilistic model, one was led to the necessity of introducing some strange physical properties to justify the discrepancies between the experimental data and the theoretical predictions (deduced using these rules).

In the following we briefly outline the considerations which led to the introduction of such strange physical notions as: non-locality, non-separability, physical superpositions, collapse of the wave packet, ... It should be underlined that the correct application of the rules of this new probability calculus does not lead to contradictions (at least not more than any other mathematical theory). It is only when the rules of the new probabilistic calculus are uncritically mixed up with those of the old one that contradictions arise. In section (4) we have illustrated this situation with a particular example. Now we will illustrate, in its generality, the mechanism which leads to the so-called paradoxes of quantum theory. The identity (22), i.e.:

P(A | C) = P(A | B ∩ C) · P(B | C) + P(A | B^c ∩ C) · P(B^c | C)    (81)

was considered by the physicists a tautology on relative frequencies, as long as the events B, B^c are disjoint and exhaust all the possibilities. Therefore, in order to explain the fact that some experiments (such as the one described in section (4)) contradict (81), they were led to the conclusion that in such cases they had to do with events B, B^c which were not mutually exclusive. However this conclusion also contradicted the experiments, which unequivocally showed that the events B and B^c were mutually exclusive.

Strange as it may be, the way out of this impasse chosen by the physicists was the following: since the identity (81) is falsified by the experiments in a situation in which no attempt is made to verify which of the alternatives B, B^c was fulfilled; and since, whenever we do an experiment to verify such a thing, we find that one and only one of the events B, B^c can happen at each time, we conclude that the events B and B^c are mutually exclusive when we look at them but they are not mutually exclusive when we do not look at them.

So, for example, if we consider an observable which, when measured, gives as result either the value +1 or the value −1, in appropriate units, but no others, then, when we are not looking at this observable, we cannot say that it assumes either the value +1 or the value −1 but we do not know which one (otherwise we will find a contradiction with the experiments).

This statement constitutes the essence of what, after N. Bohr, W. Heisenberg and many others, has been called the Copenhagen interpretation of quantum mechanics. Let us quote, for example, a statement taken from a famous paper of R.P. Feynman ([14], pg 369):

... Looking at probability from a frequency point of view [the identity] (81) simply results from the statement that in each experiment giving A and C, B had some value. The only way (81) could be wrong is the statement, "B had some value," must sometimes be meaningless. Noting that (30) replaces (81) only under the circumstances that we make no attempt to measure B, we are led to say that the statement, "B had some value," may be meaningless whenever we make no attempt to measure B ...

(here we have slightly changed Feynman's notations: B denotes a 2-valued observable and B, B^c in (81) are the events corresponding to the fact that B assumes one of its 2 values. Also the numbering of the formulae is different from Feynman's original one).

Feynman's discussion of this statement [13], [14] makes clear that the term meaningless means that, if one assumes that the observable B had some value when one does not look at it, then necessarily one arrives at a contradiction with the experiments. One might quote many statements equivalent to Feynman's, taken from the articles or books by Heisenberg, Wigner, Regge, Dyson, ... (to quote only some very well known physicists, not suspected of having been led astray by an excessive interest in the foundational problems of quantum mechanics).

It should also be said that not all physicists accepted the Copenhagen interpretation of quantum theory. On the contrary, the controversy about this interpretation gave rise to one of the most fascinating and subtle conceptual debates of our century which, for the stature and the number of persons involved as well as for the depth and the wealth of the scientific production it stimulated in different fields such as physics, mathematics, philosophy, logic, epistemology, ... can only be compared to the great conceptual controversies which have always accompanied major developments of scientific thought (e.g. the debates concerning the notions of action at a distance or by contact, the wave or particle nature of light, determinism or indeterminism, evolutionism or creationism, ...).

Those who opposed the view that observables cannot assume any of their values when nobody looks at them included, among others, Einstein, Schrödinger, de Broglie, ... while the orthodox front, aligned with the Copenhagen interpretation, included the overwhelming majority of the leading physicists.

The opponents of the orthodox interpretation did not succeed in finding a convincing alternative, but at least they succeeded in deriving some consequences of it which contradict either some basic principles of physics (such as the principle of special relativity) or some well established experiences about the behaviour of macroscopic objects.

In fact, according to the Copenhagen interpretation, an observable not looked at is in a strange new type of physical state, called a superposition state, in which none of its values is assumed, but each one is virtually present (Heisenberg, to describe such a state, used the word potentia as opposed to the actum of assuming a simple value). This superposition state is in principle not observable, because as soon as one does an experiment on the system to check which value the observable has, the potentia becomes instantaneously actum and the result is always a well defined value. This abrupt change from potentia to actum is called, in the physical literature, the collapse of the wave packet.

For example an electron in a definite energy state, when nobody looks at it, is virtually everywhere in space (i.e. its position observable is in a superposition state). However, as soon as somebody looks at it, it suddenly materializes in a region of a few microns. To describe this point of view Schrödinger, who opposed it, used the term witchery.

The most famous objection to this statement is known under the name of the Einstein-Podolsky-Rosen paradox and (in one of its many equivalent versions, which is due to Heisenberg [18]) can be described as follows: an electron is in a region of space delimited by two communicating boxes. Then the communication between the two boxes is closed and they are separated by an arbitrarily large distance. According to the orthodox interpretation, it is wrong to say that the electron is in one and only one of the boxes with a certain probability. The correct statement is that the electron is virtually spread in both boxes, but actually in neither of them. Now, if an experimenter opens one of the two boxes, the electron collapses (according to the orthodox view) into one and only one of them (not necessarily the open one), with a certain probability which is correctly predicted by quantum theory.

But, Einstein, Podolsky and Rosen asked, how does the part of the electron virtually contained in box 2 know that box 1 has been opened? It must know it in a time shorter than that needed by the experimenter at box 1 to become aware of the presence or absence of the electron in box 1; however, the two boxes can be so far apart that the time needed for any physical signal (according to the theory of special relativity) to propagate from box 1 to box 2 would be much greater than the relaxation time needed by the experimenter to ascertain whether the electron is in box 1 or not. Thus, if the orthodox interpretation of quantum theory is correct, the principle of special relativity, according to which no physical signal can propagate faster than light in the vacuum, must be wrong. One might take comfort from the fact that this superluminal propagation has no observable consequences. But the theoretical contradiction remains. This is the root of the notorious debate on non-locality in quantum theory.

Non-separability is roughly the same thing: if instead of our electron I consider two electrons (or balls), then until I open the box I cannot say that they have a separate physical identity: they are in a state of physical superposition and I create their individuality by the act of opening the box. In this sense some people have spoken of non-separability.

Many other paradoxes have been devised by exploiting the notion of collapse of the wave packet. Some of them have been taken so seriously by some physicists as to lead them to abandon the dream of Boltzmann, Einstein, ... of the unification of physics under a few general principles and to try to construct two separate physical theories: one for the description of the microscopic world and one for the description of the macroscopic world (cf. [33]). A description of these paradoxes can be found in [29]; the proof that they are all based on the inappropriate use of a single formula of classical probability theory (Bayes' formula) to deal with sets of statistical data not admitting a Kolmogorovian model is in [9].

The conclusion is that, taking for granted that quantum theory represents the deepest level of our present understanding of nature, nevertheless some parts of its currently accepted interpretation should be changed. In particular the statement that the introduction of notions such as collapse of the wave packet, physical superposition of states, ... is necessary to avoid contradictions between theory and experiments, is mathematically wrong:

the only contradiction is between experimental (statistical) data and thepretense to fit them within a single Kolmogorov model.
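
The discrepancy between identity (81) and the quantum predictions can be illustrated numerically. The following Python fragment is only a minimal sketch under assumptions which are not part of the original text: the system is a two-level one, the preparation C is modelled by a unit vector psi, the two-valued observable B by the orthonormal basis b0, b1, the event A by a unit vector a, and all probabilities are computed as squared moduli of scalar products. All names are hypothetical choices made for the illustration.

```python
import math


def inner(u, v):
    """Hermitian scalar product <u|v> of two complex 2-vectors."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def prob(u, v):
    """Transition probability |<u|v>|^2 between two unit vectors."""
    return abs(inner(u, v)) ** 2


s = 1 / math.sqrt(2)

# Assumed model (illustration only, not taken from the paper):
b0 = (1 + 0j, 0j)        # event B   : the observable B takes its first value
b1 = (0j, 1 + 0j)        # event B^c : the observable B takes its second value
psi = (s + 0j, s + 0j)   # preparation C
a = (s + 0j, s + 0j)     # event A

# Left-hand side of (81): probability of A given C, with B not measured.
p_direct = prob(a, psi)

# Right-hand side of (81): composite sum over the alternatives B, B^c.
p_composite = prob(a, b0) * prob(b0, psi) + prob(a, b1) * prob(b1, psi)

# Cross term between the two amplitudes through b0 and through b1.
amp0 = inner(a, b0) * inner(b0, psi)
amp1 = inner(a, b1) * inner(b1, psi)
cross = 2 * (amp0 * amp1.conjugate()).real

print("P(A|C), B not measured :", p_direct)
print("composite sum of (81)  :", p_composite)
print("cross term             :", cross)
```

With these particular vectors the direct probability equals 1 and the composite sum equals 1/2, up to rounding; the difference is exactly the cross term between the two amplitudes, which is absent from the right-hand side of (81).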

One might still ask:

which physical reasons prevent the use of the Kolmogorov model? (82)

and at the moment finding an answer to this question is probably the most important open problem for the foundations of contemporary probability.

However I hope that the arguments of the present paper have convinced the reader that the symmetric question:

which are the physical reasons that suggest the exclusive use of the Kolmogorov model?

is at least as legitimate.

13 Post–scriptum (added 21–6-2013)

The answer to the above question (82) was not known when the present paper was written. In fact in 1985 the idea that response (i.e. chameleon) type measurements (as opposed to passive, or classical, ones) may naturally lead to non-classical (non-Kolmogorov type) statistical data was not yet developed. The EPR–chameleon experiment, described in the paper:

Accardi L., Imafuku K., Regoli M.: On the EPR–Chameleon Experiment, IDA–QP (Infinite Dimensional Analysis, Quantum Probability and Related Topics) 5 (1) (2002) 1–20, quant-ph/0112067

was the first quantitative answer to this question.


References

[1] [QP–0] Mathematical problems in the quantum theory of irreversible processes, ed. L. Accardi, V. Gorini, G. Parravicini, Arco Felice 1978.

[2] [QP–I] Accardi L.: Some trends and problems in quantum probability, In: Quantum probability and applications to the quantum theory of irreversible processes, L. Accardi, A. Frigerio and V. Gorini (eds.), Proc. 2–d Conference: Quantum Probability and applications to the quantum theory of irreversible processes, 6–11, 9 (1982) Villa Mondragone (Rome), Springer LNM N. 1055 (1984) 1-19

[3] [QP–II] Quantum Probability and Applications II, L. Accardi, W. von Waldenfels (eds); Springer LNM N. 1136

[4] [QP–III] Quantum Probability and Applications III, Springer LNM N. 1033 (1987)

[5] [QP–IV] Quantum Probability and Applications IV, Springer LNM N. 1396 (1989)

[6] [QP–V] Quantum Probability and Applications V, Springer LNM N. 1442 (1990)

[7] [QP–VI] Quantum Probability and Related Topics VI, World Scientific (1991)

[8] [QP–VII] Quantum Probability and Related Topics VII, World Scientific (1992)

[9] Accardi L.: Topics in Quantum Probability. Physics Reports 77 (1981) 169-192.

[10] Accardi L.: The probabilistic roots of the quantum mechanical paradoxes. In: The wave particle dualism, eds. S. Diner, G. Lochak, F. Selleri, Reidel (1983).

[11] [QP–PQ XIV] Quantum Probability and White Noise Analysis XIV, Quantum interacting particle systems, Luigi Accardi, Franco Fagnola (eds), World Scientific (2002), http://www.wspc.com.sg/books/mathematics/5055.html

[12] [Ac86a] Accardi L.: Foundations of Quantum Mechanics: a quantum probabilistic approach, in: The Nature of Quantum Paradoxes, eds. G. Tarozzi, A. van der Merwe, Reidel (1988) 257–323. Preprint Dipartimento di Matematica, Università di Roma Torvergata (1986)

[13] Feynman R.: The concept of probability in Quantum Mechanics, Proc. II-d Berkeley Symp. on Math. Stat. and Prob., Univ. of California Press, Berkeley (1951) 533-541.

[14] Feynman R.: Space-time approach to non-relativistic quantum mechanics, Rev. of Mod. Phys. 20 (1948) 367-385.

[15] Feynman R.: The character of physical law (1965) (it. ed. Boringhieri 1971).

[16] Fine A.: Probability and the Interpretation of Quantum Mechanics, Brit. J. Phil. Sci. 24 (1973) 1-37

[17] Garuccio A., Selleri F.: Systematic derivation of all the inequalities of Einstein locality. Found. of Phys. 10 (1980) 209-216.

[18] Heisenberg W.: Physics and Philosophy: the revolution in modern science. New York, Harper 1958.

[19] Hilbert D., Nordheim L., von Neumann J.: Über die Grundlagen der Quantenmechanik. Math. Ann. 98 (1927) 1-30.

[20] Hilbert D.: Mathematical developments arising from Hilbert problems. AMS, Proceedings Symp. in pure math. 28 (1976).

[21] Kruszynski P.: Extensions of Gleason Theorem. In Ref. (2).

[22] Schwinger J.: Quantum kinematics and dynamics, Academic Press, New York, (1970)