
Some mathematical structures for computational information

Hung T. Nguyen

Department of Mathematical Sciences, New Mexico State University,

Las Cruces, NM 88003-8001, USA

Received 1 June 1999; accepted 28 April 2000

Abstract

This paper is about the basic underlying mathematical structures of various types of imprecise information. Set theory and probability measures are basic ingredients in extracting some specific type of information and in reasoning with uncertain knowledge. But in this era of information technology, it is desirable to consider also more complex types of information, for example, information coming from human perception (see e.g., L.A. Zadeh, IEEE Trans. Circuits and Systems 45 (1) (1999) 105–119). As a first approximation, the classical ingredients could be generalized to adequately model new types of information. As such, we will describe in this paper four types of non-classical sets, namely random sets, rough sets, conditional sets and fuzzy sets, as well as their associated non-additive measures. © 2000 Elsevier Science Inc. All rights reserved.

Keywords: Capacities; Conditional events; Fuzzy logic; Random sets; Rough sets

Information Sciences 128 (2000) 67–89, www.elsevier.com/locate/ins

E-mail address: [email protected] (H.T. Nguyen).

0020-0255/00/$ - see front matter © 2000 Elsevier Science Inc. All rights reserved.

PII: S0020-0255(00)00039-6

1. Introduction

Like the concept of shape, information is difficult to define mathematically. In statistics, one of the main problems is this: how to determine an unknown parameter $\theta$, in a probability model $f(x; \theta)$ of a random variable $X$, to represent, in the most plausible way possible, a random sample $x_1, x_2, \ldots, x_n$ drawn from $X$? Fisher's idea is to look for the surfaces $T(x_1, x_2, \ldots, x_n)$ on which the ``information'' about $\theta$ is concentrated. In this view, Fisher took as information the quantity

\[ I_n(\theta) = n \int_{-\infty}^{+\infty} \frac{1}{f(x; \theta)} \left| \frac{\partial f(x; \theta)}{\partial \theta} \right|^2 dx. \]

It is clear that the above concept of information is specific to statistical estimation problems.

Another concept of information, in communication, is due to Shannon (1949). Let $(\Omega, \mathcal{A}, P)$ be a probability space describing a random experiment. Then the entropy or information of a partition $\pi = \{A_1, \ldots, A_n\}$ of $\Omega$ is taken to be

\[ H(\pi) = -c \sum_{j=1}^{n} P(A_j) \log P(A_j). \]

The above definition is motivated as follows. A measure of indetermination $-c \log P(A_j)$ is assigned to the realization of the event $A_j$. Thus, $H(\pi)$ is nothing else than the expected value of the random variable which takes the values $-c \log P(A_j)$ with probabilities $P(A_j)$. Note that the value $-c \log P(A_j)$ was given by Wiener in 1948 as a measure of information when the event $A_j$ occurs. This concept of information is essentially based upon probabilities. It is interesting to note that the notation $H$ and the term ``entropy'' have their roots in Boltzmann's formula in thermodynamics. Shannon's concept of information is suitable for transmission problems. It is clear that $H(\pi)$ depends only on the probabilities of the messages, but not on their meanings. In fact, as Shannon stated, ``these semantic aspects of communication are irrelevant to the engineering problem''. Thus, if we seek semantic information in knowledge, then we need to rely on other considerations. In this paper, we look at basic mathematical structures for modeling and reasoning with various types of information.
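As a small numerical illustration of the entropy of a partition, the following sketch evaluates $H(\pi)$ from the block probabilities $P(A_j)$. The function name `partition_entropy` and the sample probabilities are ours, and the default $c = 1/\ln 2$ (entropy in bits) is an assumption; the text leaves the constant $c$ unspecified.

```python
import math

def partition_entropy(probs, c=1.0 / math.log(2)):
    """Return H(pi) = -c * sum_j P(A_j) * log P(A_j) for block probabilities `probs`."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("block probabilities must be non-negative and sum to 1")
    # Blocks of probability zero contribute nothing (p * log p -> 0 as p -> 0).
    return -c * sum(p * math.log(p) for p in probs if p > 0)

uniform = partition_entropy([0.25, 0.25, 0.25, 0.25])   # four equally likely blocks
skewed = partition_entropy([0.7, 0.1, 0.1, 0.1])        # one dominant block
```

The value depends only on the probabilities of the blocks, not on what the blocks mean, which is the point made above about semantic information.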

2. Random sets

Random sets are sets obtained at random. Mathematically speaking, a random set is a random variable taking sets (instead of points) as values. A mathematical theory of random sets is given by Matheron [17]. While the concept of random sets is useful in its own right, say, for statistics, it turns out that it plays an interesting role in relation with various types of uncertainty measures.

2.1. Non-additive set-functions and their associate integrals

Consider the setting of imprecise probability models, see e.g., [28]. Let $\mathcal{P}$ be a class of probability measures on a measurable space $(\Omega, \mathcal{A})$, containing the true, unknown probability measure $P_o$ of some statistical model. Without knowing $P_o$, we are forced to consider bounds on $P_o$, that is, lower and upper probability envelopes defined, respectively, by $G, F : \mathcal{A} \to [0, 1]$,

\[ G(A) = \inf\{P(A) : P \in \mathcal{P}\}, \qquad F(A) = \sup\{P(A) : P \in \mathcal{P}\}. \]

Note that $G(A) = 1 - F(A^c)$, where $A^c$ denotes the set complement of $A$ in $\Omega$. Unless $\mathcal{P}$ is a singleton, the above set-functions are not additive, but they are always monotone increasing (i.e., $A \subseteq B$ implies, e.g., $F(A) \leq F(B)$), and may have additional properties, depending upon the structure of $\mathcal{P}$. For example, on a finite space $\Omega$, let

\[ \mathcal{P} = \{\varepsilon P + (1 - \varepsilon) Q_o : P \in \mathbb{P}\}, \]

where $\mathbb{P}$ is some class of probability measures, $Q_o$ is some known probability measure and $0 < \varepsilon < 1$. Then $F$ is alternating of infinite order, that is, $F$ is monotone and for $n \geq 2$,

\[ F\Big(\bigcap_{i=1}^{n} A_i\Big) \leq \sum_{\emptyset \neq I \subseteq \{1, 2, \ldots, n\}} (-1)^{|I|+1} F\Big(\bigcup_{i \in I} A_i\Big), \]

where $|I|$ denotes the cardinality of the set $I$. The set-function $G$ is referred to as a belief function [27], whose Möbius inverse $m$ is a bona fide probability mass function on the power set $2^\Omega$ of $\Omega$, where

\[ m(A) = \sum_{B \subseteq A} (-1)^{|A \setminus B|} G(B), \]

with $A \setminus B = A \cap B^c$. Thus, if we view $m$ as the probability mass function of a random set $X$ on $\Omega$, then the belief function $G$ or its dual $F$, called the plausibility function, is nothing else than the distribution function of the random set $X$:

\[ F(A) = P\{\omega : X(\omega) \cap A \neq \emptyset\}. \]

This situation is somewhat general in the sense that probability laws of random closed sets on Euclidean spaces (or more generally, on locally compact, Hausdorff and separable spaces) are characterized by non-additive set-functions such as $F$. For background on random closed sets, see e.g., [17], or [12].
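The relations between $m$, $G$ and $F$ can be checked on a small finite space. The sketch below is only an illustration: the three-point space, the mass function `m` and all function names are hypothetical choices of ours. It builds $G$ from $m$, recovers $m$ with the Möbius formula, and leaves the duality $G(A) = 1 - F(A^c)$ to be verified.

```python
from fractions import Fraction
from itertools import combinations

def subsets(s):
    """Yield every subset of the finite set `s` as a frozenset."""
    items = sorted(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def belief(m, A):
    """G(A): total mass of the focal sets contained in A."""
    return sum((w for B, w in m.items() if B <= A), Fraction(0))

def plausibility(m, A):
    """F(A): total mass of the focal sets that meet A."""
    return sum((w for B, w in m.items() if B & A), Fraction(0))

def moebius_inverse(G, omega):
    """m(A) = sum over subsets B of A of (-1)^|A minus B| * G(B)."""
    return {A: sum(((-1) ** len(A - B) * G[B] for B in subsets(A)), Fraction(0))
            for A in subsets(omega)}

omega = frozenset({"a", "b", "c"})
m = {frozenset({"a"}): Fraction(1, 2),
     frozenset({"b", "c"}): Fraction(1, 3),
     omega: Fraction(1, 6)}

G = {A: belief(m, A) for A in subsets(omega)}
# Keep only the non-zero masses so the result can be compared with m directly.
recovered = {A: w for A, w in moebius_inverse(G, omega).items() if w != 0}
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the alternating sums cancel exactly rather than up to rounding.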

As an example, let $f : \mathbb{R}^d \to [0, 1]$ be an upper semi-continuous function, and consider the set-function $F$ defined on Borel sets of $\mathbb{R}^d$ by

\[ F(A) = \sup\{f(x) : x \in A\}. \]

Then $F$ is a Choquet capacity, i.e.,

(i) $0 \leq F(\cdot) \leq 1$, $F(\emptyset) = 0$;
(ii) $F$ is alternating of infinite order;
(iii) if $K_n$ is a decreasing sequence of compact sets in $\mathbb{R}^d$ converging to $K$, then $F(K_n)$ converges to $F(K)$.


In fact, $F$ is the distribution of a random set $S$, defined on some probability space $(\Omega, \mathcal{A}, P)$, taking closed sets of $\mathbb{R}^d$ as values. Indeed, let $X : \Omega \to [0, 1]$ be a random variable, uniformly distributed on $[0, 1]$. Then the random set

\[ S(\omega) = \{x \in \mathbb{R}^d : f(x) \geq X(\omega)\} \]

is such that, for any compact set $K$ of $\mathbb{R}^d$,

\[ F(K) = \sup\{f(x) : x \in K\} = P\{\omega : S(\omega) \cap K \neq \emptyset\}. \]
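The last equality can be verified in a few lines. The sketch below assumes $K$ is non-empty and uses only the standard fact that an upper semi-continuous function attains its supremum on a compact set:

```latex
\begin{align*}
S(\omega) \cap K \neq \emptyset
  &\iff \exists\, x \in K : f(x) \geq X(\omega) \\
  &\iff \max_{x \in K} f(x) \geq X(\omega), \\
P\{\omega : S(\omega) \cap K \neq \emptyset\}
  &= P\Big\{ X \leq \max_{x \in K} f(x) \Big\}
   = \max_{x \in K} f(x) = F(K).
\end{align*}
```

The second-to-last step uses that $X$ is uniform on $[0, 1]$ and that $0 \leq F(K) \leq 1$.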

Remarks.

(a) The fact that the set-function $F$, the counterpart of distribution functions of random vectors, which is required only to be specified on compact sets, completely determines the probability law of the random set $S$ is known as the Choquet Theorem in the literature of random set theory. See e.g., [17].

(b) The proof of (iii) above is straightforward. (ii) follows from a more elegant and general fact, namely: any maxitive set-function $F$ (i.e., $F$ satisfying $F(A \cup B) = \max(F(A), F(B))$) is alternating of infinite order. See [12]. It is interesting to note that the maxitivity property of capacities (which arises naturally in the theory of extremal stochastic processes) corresponds to the stability of various types of dimensions in Fractal Geometry, see e.g., [6]. The interest in maxitive set-functions lies in the fact that they are possible models for random sets. In Topology, Kuratowski's measure of non-compactness is maxitive.

(c) If we denote by $\nu$ the set-function $\nu(A) = 0$ or $1$ according to $A = \emptyset$ or not, then, in the above example, we have

\[ F(A) = \sup\{f(x) : x \in A\} = \int_0^1 \nu(\{x : f(x) \geq t\} \cap A)\, dt = (c)\!\int_A f(x)\, d\nu(x) \quad \text{(in symbol)}, \]

which is known as the Choquet integral of $f$ on $A$ with respect to the (monotone increasing) non-additive set-function $\nu$. Thus, we can write $f = dF/d\nu$ as a kind of Radon–Nikodym derivative. Choquet integrals can be used in decision-making in an incomplete probabilistic information environment, as we now exemplify.

2.2. Decision-making based upon belief functions

Consider the following incomplete information scenario. Suppose that the probability density function $f$ of a random variable $X$ with values in $\Theta = \{\theta_1, \ldots, \theta_n\}$ is only partially specified, say, as $f(\theta_i) \geq a_i$, $i = 1, 2, \ldots, n$, where $\sum_{i=1}^{n} a_i < 1$. Such a situation can be modeled as a problem involving a belief function which is the distribution $G$ of a random set on $2^\Theta$ whose density function $m$ on $2^\Theta$ is given by

\[ m(\{\theta_i\}) = a_i, \quad i = 1, 2, \ldots, n, \qquad \text{and} \qquad m(\Theta) = 1 - \sum_{i=1}^{n} a_i. \]

$G : 2^\Theta \to [0, 1]$ is the associated belief function, $G(A) = \sum_{B \subseteq A} m(B)$.

Indeed, it can be checked that $f$ belongs to the class $\mathcal{F}$ of densities on $\Theta$, where $\mathcal{F} = \{g : G \leq P_g\}$, $P_g$ denoting the probability measure on $\Theta$ associated with $g$, i.e., $P_g(A) = \sum_{\theta \in A} g(\theta)$. Note that $G = \inf\{P_g : g \in \mathcal{F}\}$.

Suppose decisions are to be based on the expected value of some utility function $u$. Then, from a minimax viewpoint, we are led to minimize $E_g(u(X))$ over $g \in \mathcal{F}$. It turns out that, in this finite case, the minimum is attained and is equal to the Choquet integral of $u$ with respect to the belief function $G$, i.e., $(c)\int_\Theta u(\theta)\, dG(\theta)$, which is considered as a generalized expected value of $u$. This can be seen as follows.

Rename the $\theta$'s so that $u(\theta_1) \leq u(\theta_2) \leq \cdots \leq u(\theta_n)$. Then

\[ (c)\!\int_\Theta u(\theta)\, dG(\theta) = \sum_{i=1}^{n} u(\theta_i) \left[ G(\{\theta_i, \theta_{i+1}, \ldots, \theta_n\}) - G(\{\theta_{i+1}, \theta_{i+2}, \ldots, \theta_n\}) \right]. \]

Let $A(i) = \{\theta_i, \ldots, \theta_n\}$ and $h(\theta_i) = G(A(i)) - G(A(i+1))$. Then $h$ is a probability density on $\Theta$, so that the above Choquet integral is an ordinary expectation, but the density used for it depends not only on $G$ but also on the ordering via $u$. We have

\[ h(\theta_i) = G(A(i)) - G(A(i) \setminus \{\theta_i\}) = \sum_{B \subseteq A(i)} m(B) - \sum_{B \subseteq A(i) \setminus \{\theta_i\}} m(B) = \sum_{\theta_i \in B \subseteq A(i)} m(B). \]

Thus $h \in \mathcal{F}$. Next, for each $t \in \mathbb{R}$ and $g \in \mathcal{F}$, we have $P_h(u > t) \leq P_g(u > t)$, since $(u > t)$ is of the form $\{\theta_i, \ldots, \theta_n\}$. Therefore, for all $g \in \mathcal{F}$, $E_h(u(X)) \leq E_g(u(X))$.

The above fact is somewhat general: the infimum of $E_P(u(X))$, not necessarily attained, over a class of probability measures $\mathcal{P}$ is equal to the Choquet integral of $u$ with respect to the belief function which is the lower envelope of $\mathcal{P}$.
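The identity between the Choquet integral and the worst-case expected utility can be checked numerically in the partially specified scenario of this section. In the sketch below, the bounds `a`, the utilities `u` and the function names are hypothetical values chosen for illustration. The brute-force minimum relies on the observation that the extreme points of $\{g : g(\theta_i) \geq a_i\}$ place the whole slack $m(\Theta)$ on a single point.

```python
from fractions import Fraction

def belief(m, A):
    """G(A): total mass of the focal sets contained in A."""
    return sum((w for B, w in m.items() if B <= A), Fraction(0))

def choquet(u, m):
    """Choquet integral of u w.r.t. the belief function with mass function m."""
    order = sorted(u, key=u.get)                 # rename so that u is increasing
    total = Fraction(0)
    for i, theta in enumerate(order):
        upper = frozenset(order[i:])             # A(i)   = {theta_i, ..., theta_n}
        rest = frozenset(order[i + 1:])          # A(i+1) = {theta_{i+1}, ..., theta_n}
        total += u[theta] * (belief(m, upper) - belief(m, rest))
    return total

# Lower bounds a_i on the unknown density; the slack 1 - sum(a_i) is m(Theta).
a = {"t1": Fraction(1, 10), "t2": Fraction(2, 10), "t3": Fraction(4, 10)}
slack = 1 - sum(a.values())
m = {frozenset({t}): w for t, w in a.items()}
m[frozenset(a)] = slack

u = {"t1": Fraction(5), "t2": Fraction(1), "t3": Fraction(3)}

# Brute-force minimum of E_g(u) over the extreme densities a + slack * e_j.
worst_case = min(sum(a[t] * u[t] for t in a) + slack * u[j] for j in a)
value = choquet(u, m)
```

Here both quantities equal $11/5$: the Choquet integral puts the free mass on the point with the smallest utility, which is the pessimistic choice described above.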

An alternative approach to expectation with respect to a belief function in decision-making could be based upon the maximum entropy principle if some ``canonical'' choice of a density in $\mathcal{F}$ needs to be made. Recall that the entropy of a density $f$ on a finite $\Theta$, say, is $H(f) = -\sum_{\theta \in \Theta} f(\theta) \log f(\theta)$. Decisions could be based on $E_\varphi(u(X))$, where $\varphi$ is the density in $\mathcal{F}$ with maximum entropy. As in any problem involving maximum entropy, computational procedures need to be developed for applications. In the above scenario of incomplete probabilistic information, the maximum entropy density $\varphi$ is obtained by the following algorithm; for details, see [22].


Let $a_1 \leq a_2 \leq \cdots \leq a_n$; then

\[ \varphi(\theta_i) = a_i + \varepsilon_i, \quad \text{where } \varepsilon_i \geq 0, \quad \sum_{i=1}^{k} \varepsilon_i = m(\Theta), \quad \text{and} \]

\[ a_1 + \varepsilon_1 = a_2 + \varepsilon_2 = \cdots = a_k + \varepsilon_k \leq a_{k+1} \leq \cdots \leq a_n. \]

This density is constructed by putting the $a_i$ in increasing order, setting $d_i = a_k - a_i$, $i = 1, 2, \ldots, k$, with $k$ maximum such that $\sum_{i=1}^{k} d_i \leq m(\Theta)$, letting $\varepsilon_i = d_i + (m(\Theta) - \sum_{j=1}^{k} d_j)/k$, $i = 1, 2, \ldots, k$, and letting $\varepsilon_i = 0$ for $i > k$.

The general situation is this. Let $\Theta$ be a finite set and $G$ be a belief function on it. Corresponding to $G$ is the set $\mathcal{P}$ of probability measures $P$ on $\Theta$ such that $G \leq P$. Each $P$ corresponds to a density $f$ on $\Theta$ via $P(A) = \sum_{\theta \in A} f(\theta)$, yielding a set $\mathcal{F}$ of densities. The Möbius inverse of $G$ is a probability density $m$ on $2^\Theta$. Here is another algorithm which calculates the density $g$ in $\mathcal{F}$ with maximum entropy directly from $G$; see [18] for details.

Define a density $g$ on $\Theta$ as follows. Inductively define a decreasing sequence of subsets $\Theta_i$ of $\Theta$, and numbers $b_i$, as follows, quitting when $\Theta_i$ is empty:

(i) $\Theta_0 = \Theta$,
(ii) $b_i = \max\, [G(K \cup \Theta_i^c) - G(\Theta_i^c)]/|K|$, over $\emptyset \neq K \subseteq \Theta_i$,
(iii) $K_i$ is the largest subset of $\Theta_i$ such that $G(K_i \cup \Theta_i^c) - G(\Theta_i^c) = b_i |K_i|$ (there is a unique such $K_i$),
(iv) $\Theta_{i+1} = \Theta_i \setminus K_i$.

If $\theta \in K_i$, then set $g(\theta) = b_i$.
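Both maximum entropy procedures described in this subsection can be sketched in a few lines and compared on the partially specified scenario. The code below is an illustrative reading of the two algorithms, not the implementations of [22] or [18]: the function names, the example bounds and the exhaustive search over subsets (exponential in $|\Theta|$, so suitable for small examples only) are our assumptions.

```python
from fractions import Fraction
from itertools import combinations

def nonempty_subsets(s):
    items = sorted(s)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def belief(m, A):
    return sum((w for B, w in m.items() if B <= A), Fraction(0))

def maxent_from_bounds(a):
    """First algorithm: raise the k smallest bounds a_i to a common level."""
    order = sorted(a, key=a.get)
    slack = 1 - sum(a.values())                          # m(Theta)
    k = 1
    for j in range(1, len(order) + 1):
        # sum of d_i = a_j - a_i over the j smallest bounds
        if sum(a[order[j - 1]] - a[t] for t in order[:j]) <= slack:
            k = j
    level = (slack + sum(a[t] for t in order[:k])) / k   # equals a_i + eps_i for i <= k
    return {t: (level if i < k else a[t]) for i, t in enumerate(order)}

def maxent_from_belief(m, theta):
    """Second algorithm: peel off the block K_i with the largest average belief gain."""
    theta = frozenset(theta)
    remaining, g = theta, {}
    while remaining:
        removed = theta - remaining                      # complement of Theta_i
        base = belief(m, removed)
        best_ratio, best_K = None, None
        for K in nonempty_subsets(remaining):
            ratio = (belief(m, K | removed) - base) / len(K)
            if (best_ratio is None or ratio > best_ratio
                    or (ratio == best_ratio and len(K) > len(best_K))):
                best_ratio, best_K = ratio, K            # ties: keep the largest K
        for t in best_K:
            g[t] = best_ratio
        remaining = remaining - best_K
    return g

a = {"t1": Fraction(1, 10), "t2": Fraction(2, 10), "t3": Fraction(4, 10)}
m = {frozenset({t}): w for t, w in a.items()}
m[frozenset(a)] = 1 - sum(a.values())

phi = maxent_from_bounds(a)
g = maxent_from_belief(m, a)
```

On this example both procedures return the density $(3/10, 3/10, 4/10)$: the free mass $3/10$ is used to level the two smallest bounds, which is as close to uniform as the constraints allow.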

2.3. On the use of non-additive set-functions in uncertainty modeling

An objection to the use of non-additive set-functions in uncertainty modeling was raised by Lindley [15], who used a generalized scoring framework of De Finetti to define admissibility and arrived at the conclusion that all non-additive set-functions are inadmissible. In the literature, the discussions on Lindley's message ``you cannot avoid probability'' were philosophical rather than mathematical. Since this is an important issue which needs to be clarified, we reproduce here the essence of our mathematical response to Lindley's paper. See [9] for details. Basically, Lindley's main result is this: in a scoring framework, only functions of probabilities are admissible.

Uncertainty due to randomness occupies a large place in natural phenomena as well as in scientific problems. The additivity property is essential in modeling uncertainty. Indeed, even from a subjective viewpoint, where probabilities are assigned to events subjectively, these numbers should satisfy the additivity property. This is referred to as the coherence principle. We would like to know to what extent non-additive set-functions are compatible with the coherence principle.


Following Lindley, a fairly general formulation of the coherence principle is as follows. A score function is a real-valued function $f$ defined on $[0, 1] \times \{0, 1\}$ satisfying the following:

(i) $f(x, 0)$ and $f(x, 1)$ are differentiable functions of $x$ on $[0, 1]$ with continuous derivatives. (We denote these derivatives by $f'(x, 0)$ and $f'(x, 1)$, respectively.)
(ii) There is an interval $[x_{0,f}, x_{1,f}] \subseteq [0, 1]$ such that $f'(x_{0,f}, 0) = f'(x_{1,f}, 1) = 0$.
(iii) On $(x_{0,f}, x_{1,f})$, $f'(x, 0) > 0$ and $f'(x, 1) < 0$.

Usually, one can take $x_{0,f}$ to be 0 and $x_{1,f}$ to be 1. The two functions $f(x, 0)$ and $f(x, 1)$ can be interpreted as follows. If $x$ is the uncertainty measure of some event $E$, then $f(x, 1)$ represents the ``penalty'' if $E$ occurs, and $f(x, 0)$ measures the penalty if $E$ does not occur.

Let $\mathcal{A}$ be a Boolean algebra of subsets of a set $\Omega$. By an uncertainty measure $\mu$ we mean a set function $\mu : \mathcal{A} \to [0, 1]$ such that $\mu(\emptyset) = 0$ and $\mu(\Omega) = 1$. For $E \in \mathcal{A}$, we write $E = 1$ if $E$ occurs, and $E = 0$ if it does not. Thus, the ``score'' is $f(\mu(E), E)$.

Given a score function $f$ and $n$ events $E_i$, $i = 1, 2, \ldots, n$, we define a game as a triple $(C, v, L)$, where $C = \{0, 1\}$ is the space of realizations of the $E_i$, $v = [0, 1]^C$ is the space of uncertainty measures, and $L$ is the loss function $v \times C \to \mathbb{R}$ given by $L(\mu, E) = \sum_{i=1}^{n} f(\mu(E_i), E_i)$, where $E = (E_1, E_2, \ldots, E_n)$, with the convention that $E_i = 1$ or $0$ according to whether or not it occurs. The function $\mu$ is inadmissible with respect to $L$ (or to $f$ and the $E_i$'s) if there is an uncertainty measure $\nu$ such that $L(\nu, E) \leq L(\mu, E)$ for all $E$, with strict inequality for some $E$. Otherwise, $\mu$ is admissible with respect to $L$.

For each score function $f$, define the transform

\[ P_f : [x_{0,f}, x_{1,f}] \to [0, 1], \qquad x \mapsto \frac{f'(x, 0)}{f'(x, 0) - f'(x, 1)}. \]

By the regularity conditions on $f$, the function $P_f$ is continuous. If, in addition, the $f'(x, i)$ are strictly increasing, then $P_f$ is strictly increasing, and thus $P_f^{-1}$ exists.

A necessary condition for $\mu$ to be admissible with respect to $f$ and $E, E^c$ is that $P_f(\mu(E)) + P_f(\mu(E^c)) = 1$. It is easy to construct a score function $f$ such that even if $\mu$ is a probability measure, $\mu$ is not $f$-admissible. Reasonable uncertainty measures should be ones which are admissible with respect to some score function. For example, any probability measure is admissible, since it suffices to consider proper score functions $f$, that is, those $f$ such that $P_f(x) = x$. We refer to this type of admissibility as ``general admissibility''. Thus an uncertainty measure is not general admissible if there is no score function $f$ such that it is $f$-admissible. Of course, if one can find a score function $f$ for which an uncertainty measure $\mu$ is $f$-admissible, then $\mu$ is general admissible.


Theorem. Let $A$ and $B$ be two disjoint events, and let $f$ be a score function. If the uncertainty measure $\mu$ is admissible with respect to $f$ and $A, B, A \cup B$, then $(P_f \circ \mu)(A \cup B) = (P_f \circ \mu)(A) + (P_f \circ \mu)(B)$, that is, $P_f \circ \mu$ is additive.

Proof. Since $A$ and $B$ are disjoint, the only configurations of the events $A$, $B$, and $A \cup B$ are $(1, 0, 1)$, $(0, 1, 1)$, and $(0, 0, 0)$. Setting $x = \mu(A)$, $y = \mu(B)$, and $z = \mu(A \cup B)$, the possible total scores are

\[ f(x, 1) + f(y, 0) + f(z, 1), \qquad f(x, 0) + f(y, 1) + f(z, 1), \qquad f(x, 0) + f(y, 0) + f(z, 0). \]

The admissibility of $(x, y, z)$ implies that

\[ \det \begin{pmatrix} f'(x, 1) & f'(y, 0) & f'(z, 1) \\ f'(x, 0) & f'(y, 1) & f'(z, 1) \\ f'(x, 0) & f'(y, 0) & f'(z, 0) \end{pmatrix} = 0. \]

Expanding this determinant across row three gives the announced result.

Another necessary condition for admissibility is this: given three events $A$, $B$, and $C$ and an uncertainty measure $\mu$, let $\mu(B|C) = x$, $\mu(A|B \cap C) = y$, and $\mu(A \cap B|C) = z$. If $(x, y, z)$ is admissible with respect to a score function $f$, then $P_f(z) = P_f(x) P_f(y)$. This follows from the fact that the total score function is $f(x, B)C + f(y, A)BC + f(z, AB)C$ and that there are three possible realizations of the sequence, namely $(1, 1, 1)$, $(0, 1, 0)$, and $(1, 0, 0)$.
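To see how the determinant condition in the proof of the additivity theorem yields $P_f(z) = P_f(x) + P_f(y)$, the expansion can be written out. The shorthand $a_i = f'(x, i)$, $b_i = f'(y, i)$, $c_i = f'(z, i)$ is introduced here only for this sketch, and the third row is taken to end with $f'(z, 0)$, the derivative of the third total score.

```latex
\begin{align*}
0 &= \det \begin{pmatrix} a_1 & b_0 & c_1 \\ a_0 & b_1 & c_1 \\ a_0 & b_0 & c_0 \end{pmatrix}
   = a_1 b_1 c_0 - a_0 b_0 c_0 + 2 a_0 b_0 c_1 - a_0 b_1 c_1 - a_1 b_0 c_1 \\
  &= c_0 (a_0 - a_1)(b_0 - b_1) - (c_0 - c_1) \bigl[ a_0 (b_0 - b_1) + b_0 (a_0 - a_1) \bigr].
\end{align*}
```

By condition (iii), each of $a_0 - a_1$, $b_0 - b_1$, $c_0 - c_1$ is positive inside the interval, so dividing by their product gives $c_0/(c_0 - c_1) = a_0/(a_0 - a_1) + b_0/(b_0 - b_1)$, which is the additivity of $P_f \circ \mu$ on $A$, $B$, $A \cup B$.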

An uncertainty measure $\mu$ is general admissible if there exists a score function $f$ such that $P_f \circ \mu$ is additive. If there is a function $\nabla : [0, 1] \times [0, 1] \to [0, 1]$ such that $\mu(A \cup B) = \mu(A) \nabla \mu(B)$ whenever $A \cap B = \emptyset$, then $\mu$ is $\nabla$-decomposable. By admissibility of $\mu$ we simply mean that $P_f \circ \mu$ is additive for some $f$ such that $P_f^{-1}$ exists.

Theorem. An uncertainty measure $\mu$ is admissible if and only if

(i) there exists a continuous, increasing function $h : [0, 1] \to [0, 1]$ with $h(0) = 0$, $h(1) = 1$ and

\[ h \circ \mu(A) + h \circ \mu(B) \leq 1 \quad \text{if } A \cap B = \emptyset; \qquad (*) \]

(ii) $\mu$ is $\nabla_h$-decomposable, where $\nabla_h$ is a continuous, Archimedean t-conorm with additive generator $h$ (see e.g., [23] for background).

Proof. Suppose that $\mu$ is admissible. Take $h(x) = P_f(x)$. Since $P_f \circ \mu(A \cup B) \leq 1$, inequality $(*)$ is satisfied. Now $(*)$ implies that

\[ \mu(A \cup B) = P_f^{-1}(P_f \circ \mu(A) + P_f \circ \mu(B)). \]

Set $x \nabla_h y = P_f^{-1}(P_f \circ \mu(A) + P_f \circ \mu(B))$, where $x = \mu(A)$ and $y = \mu(B)$.

For the converse, we have that for $A \cap B = \emptyset$, $\mu(A \cup B) = \mu(A) \nabla_h \mu(B) = h^{-1}(h \circ \mu(A) + h \circ \mu(B))$, using $(*)$. Thus, $h \circ \mu(A \cup B) = h \circ \mu(A) + h \circ \mu(B)$. Now set $P_f(x) = h(x)$ and solve for $f$.

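Corollary. Let $\Omega$ be finite and $\varphi : \Omega \to [0, 1]$ with $\sum_{\omega \in \Omega} \varphi(\omega) \leq 1$. Define $\mu_p : 2^\Omega \to [0, 1]$ by $\mu_p(A) = \nabla_p(\varphi(A))$, where $x \nabla_p y = (\min(x^p + y^p, 1))^{1/p}$. Then, for $p \geq 1$, $\mu_p$ is admissible.

Proof. If $\{x_1, \ldots, x_n\} \cap \{y_1, \ldots, y_n\} = \emptyset$, then $x_1 \nabla \cdots \nabla x_n \nabla y_1 \nabla \cdots \nabla y_n = (x_1 \nabla \cdots \nabla x_n) \nabla (y_1 \nabla \cdots \nabla y_n)$, so that $\mu_p$ is $\nabla_p$-decomposable. Note that $x \nabla_p y = h_p^{-1}(h_p(x) + h_p(y))$, where $h_p(x) = x^p$. For $A \cap B = \emptyset$,

\[ h_p \circ \mu_p(A) + h_p \circ \mu_p(B) = \Big[ h_p^{-1}\Big( \sum_{\omega \in A} \varphi^p(\omega) \Big) \Big]^p + \Big[ h_p^{-1}\Big( \sum_{\omega \in B} \varphi^p(\omega) \Big) \Big]^p = \sum_{\omega \in A \cup B} \varphi^p(\omega). \]

But since $\sum_{\omega \in \Omega} \varphi(\omega) \leq 1$ and $p \geq 1$, we have $h_p \circ \mu_p(A) + h_p \circ \mu_p(B) \leq 1$. The result follows from the above theorem.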

Remarks. Maxitive set-functions are not admissible since, $P_f$ being non-decreasing,

\[ P_f \circ \mu(A \cup B) = P_f(\max(\mu(A), \mu(B))) = \max(P_f \circ \mu(A), P_f \circ \mu(B)) < P_f \circ \mu(A) + P_f \circ \mu(B). \]

However, such $\mu$ are uniform limits of admissible measures. Indeed, we consider the proof of the previous corollary. Since $p \geq 1$,

\[ \max\{x_1, \ldots, x_n\} \leq \nabla_p(x_1, \ldots, x_n) \leq \max\{x_1, \ldots, x_n\} + n^{1/p} - 1. \]

It follows that $\max\{x_1, \ldots, x_n\} = \lim_{p \to \infty} \nabla_p(x_1, \ldots, x_n)$, and hence for $A \subseteq \Omega$, $\mu_p(A) = \nabla_p(\varphi(A))$ converges to $\max\{\varphi(\omega) : \omega \in A\} = \mu(A)$, uniformly in $A$.
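The behaviour of the measures $\mu_p$ can be observed numerically. The following sketch uses hypothetical weights `phi` and our own function name `mu_p`. It checks that $h_p \circ \mu_p$ is additive on two disjoint events and records how far $\mu_p$ is from the maxitive limit for increasing $p$.

```python
def mu_p(phi, A, p):
    """mu_p(A) = (min(sum of phi(w)^p over w in A, 1))^(1/p)."""
    return min(sum(phi[w] ** p for w in A), 1.0) ** (1.0 / p)

phi = {"w1": 0.5, "w2": 0.3, "w3": 0.2}      # illustrative weights with sum <= 1
A, B = {"w1"}, {"w2", "w3"}                  # two disjoint events

p = 3
# h_p(x) = x^p, so additivity of h_p o mu_p means the gap below is (numerically) zero.
additive_gap = abs(mu_p(phi, A | B, p) ** p - (mu_p(phi, A, p) ** p + mu_p(phi, B, p) ** p))

# Distance between mu_q(B) and max(phi over B); bounded by n^(1/q) - 1 with n = |B|.
errors = {q: mu_p(phi, B, q) - max(phi[w] for w in B) for q in (1, 10, 100)}
```

The entries of `errors` shrink as `q` grows, in line with the bound $n^{1/p} - 1$ stated in the remark.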

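Some belief functions are admissible in Lindley's sense. Indeed, let $P$ be a probability measure on a finite set $\Omega$. For any positive integer $n$, $P^n$ is a function of $P$, and is clearly a belief function! For example, take $n = 2$ and $\Omega = \{\omega_1, \ldots, \omega_k\}$. Define $m : 2^\Omega \to [0, 1]$ by $m(\{\omega\}) = P^2(\{\omega\})$, $m(\{\omega, \omega'\}) = 2 P(\{\omega\}) P(\{\omega'\})$ for all $\omega \neq \omega'$, and $m = 0$ otherwise. Then, for any $A \subseteq \Omega$, we have

\[ \sum_{B \subseteq A} m(B) = \sum_{\omega, \omega' \in A} P(\{\omega\}) P(\{\omega'\}) = \Big[ \sum_{\omega \in A} P(\{\omega\}) \Big]^2 = P^2(A). \]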
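The claim that $P^2$ is a belief function with the mass function $m$ given above can be verified exhaustively on a small space. The probabilities in `P` below are hypothetical, and the helper names are ours.

```python
from fractions import Fraction
from itertools import combinations

P = {"w1": Fraction(1, 2), "w2": Fraction(1, 3), "w3": Fraction(1, 6)}

# Mass function from the text: squares on singletons, 2*P*P on pairs, 0 elsewhere.
m = {frozenset({w}): P[w] ** 2 for w in P}
for w, v in combinations(P, 2):
    m[frozenset({w, v})] = 2 * P[w] * P[v]

def subsets(s):
    items = sorted(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def belief(A):
    """Sum of m(B) over the focal sets B contained in A."""
    return sum((w for B, w in m.items() if B <= A), Fraction(0))

def prob(A):
    return sum((P[w] for w in A), Fraction(0))

# For every event A, pair the belief value with P(A)^2.
checks = {A: (belief(A), prob(A) ** 2) for A in subsets(P)}
```

Every pair in `checks` has equal components, and the masses sum to one, so $P^2$ is non-additive yet a function of a probability measure.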

In summary, even in Lindley's framework for admissibility, there are non-additive set-functions which are admissible. This is in fact compatible with Lindley's result, since Lindley proved that functions of probabilities are admissible, but clearly, set-functions which are functions of probabilities need not be additive!


3. Rough sets

Rough sets, [26], are sets designed to handle partial knowledge in databases.

Consider again a familiar situation in Bayesian statistics. Let $\Theta$ be the parameter space of some statistical model. Let $\mathcal{A}$ be a $\sigma$-field of subsets of $\Theta$. Instead of specifying the distribution $P_o$ of the true, unknown parameter $\theta_o$, it is more realistic, especially for robustness, to assume that $P_o$ belongs to some class $\mathcal{P}$ of probability measures on $(\Theta, \mathcal{A})$. Consider the case where $\mathcal{P}$ is the class of measures with given values on a partition of $\Theta$. Specifically, let $\pi = \{\Theta_1, \Theta_2, \ldots, \Theta_k\}$ be a (measurable) partition of $\Theta$. Although $P_o$ is unknown, its values on the elements of the partition are known, say, $P_o(\Theta_i) = a_i$, $i = 1, 2, \ldots, k$. Thus, $\mathcal{P} = \{P \in \mathbb{P} : P(\Theta_i) = a_i,\ i = 1, 2, \ldots, k\}$, where $\mathbb{P}$ denotes the class of all probability measures on $(\Theta, \mathcal{A})$. One basic question is: how to assign (or approximate) probabilities to other events? This type of question is known in the literature of artificial intelligence as probability logic. Several approaches are possible. One can choose an element in $\mathcal{P}$ to represent $P_o$, say by calling upon the maximum entropy principle, or deal directly with the lower and upper envelopes of $\mathcal{P}$ as in Section 2. A different view for approximation is this. Instead of approximating the probability of an event $A$, one approximates the event $A$ itself in terms of the elements of the given partition.

Specifically, the partition $\pi$ induces an equivalence relation $R$ on $\Theta$: $\theta R \theta'$ iff $\theta, \theta' \in \Theta_i$ for some $i$. The complete Boolean algebra $\mathcal{D}$ of ``definable'' sets consists of sets which are unions of the $\Theta_i$'s. For each $A \in \mathcal{A}$, the lower and upper bounds of $A$ are defined, respectively, as

\[ A_* = \{\theta \in \Theta : [\theta] \subseteq A\}, \qquad A^* = \{\theta \in \Theta : [\theta] \cap A \neq \emptyset\}, \]

where $[\theta]$ denotes the equivalence class containing $\theta$.

Note that $A_*$ and $A^*$ are both in $\mathcal{D}$. Obviously, $A_* \subseteq A \subseteq A^*$. Moreover, these are the largest (respectively, smallest) lower (respectively, upper) approximations of $A$ by definable sets, in the sense of set inclusion. In turn, an equivalence relation on $\mathcal{A}$ is induced: two events $A$ and $B$ are indistinguishable or equivalent (given the available information on $\pi$) iff $A_* = B_*$ and $A^* = B^*$. Rough sets are defined to be equivalence classes with respect to this equivalence relation on $\mathcal{A}$. Note that in applications, partitions, possibly of a general nature, on $\Theta$ are induced by mappings from $\Theta \to \mathbb{R}^d$, called relational or information tables.
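The lower and upper approximations are easy to compute once the partition is given explicitly. In this minimal sketch, the partition, the events and the function name `approximations` are hypothetical. It also exhibits two distinct events that define the same rough set.

```python
def approximations(partition, A):
    """Return (lower, upper) approximations of A for a partition given as a list of sets."""
    A = set(A)
    lower, upper = set(), set()
    for block in partition:
        if block <= A:          # block entirely inside A
            lower |= block
        if block & A:           # block meets A
            upper |= block
    return lower, upper

partition = [{1, 2}, {3, 4}, {5, 6}]    # hypothetical blocks Theta_1, Theta_2, Theta_3
lower_A, upper_A = approximations(partition, {1, 2, 3})
lower_B, upper_B = approximations(partition, {1, 2, 4})
```

The events $\{1, 2, 3\}$ and $\{1, 2, 4\}$ differ, but they share the same pair of approximations and are therefore indistinguishable given the partition.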

In the above imprecise probability model, there is a close relationship between rough sets and belief functions. This can be seen as follows. Let $G$ denote the lower envelope of $\mathcal{P}$, i.e., $G(A) = \inf\{P(A) : P \in \mathcal{P}\}$. Assuming that $\Theta$ is finite, $G$ is a belief function. We choose to prove this fact by using rough sets to show that ``semantic'' and ``syntactic'' approximations are closely related.


For each $A \in 2^\Theta$, the map $P \in \mathcal{P} \to P(A_*)$ is constant. Thus, we can define $H : 2^\Theta \to [0, 1]$ by $H(A) = P(A_*)$, for any $P \in \mathcal{P}$ (if $A_* = \cup_i \Theta_i$ then $H(A) = \sum_i a_i$). Since $\cap_i (A_i)_* = (\cap_i A_i)_*$ and $\cup_i (A_i)_* \subseteq (\cup_i A_i)_*$, we have

\[ H(\cup_{i=1}^{n} A_i) = P((\cup_i A_i)_*) \geq P(\cup_i (A_i)_*). \]

In fact, $H \leq P$ iff $P \in \mathcal{P}$. Indeed, since $A_* \subseteq A$, we have

\[ H(A) = P(A_*) \leq P(A) \quad \text{for any } P \in \mathcal{P}. \]

Conversely, if $H \leq Q$, then, in particular, $H(\Theta_i) = P((\Theta_i)_*) = P(\Theta_i) \leq Q(\Theta_i)$ for $i = 1, 2, \ldots, k$. But then, necessarily, $Q(\Theta_i) = P(\Theta_i)$ for all $i$, so that $Q \in \mathcal{P}$.

Moreover, $H = G$, since for each $A$, there is a $P_A \in \mathcal{P}$ such that

\[ P_A(A) = P_A(A_*). \]

In order to investigate reasoning processes, which are essential in decision-making, we need to look at the algebraic structures of rough sets. Now a rough set is characterized as an ``interval'' $[A_*, A^*]$ in $\mathcal{A}$ or in $2^\Theta$, which is a Boolean ring. We are led to consider the algebraic logic of ``closed intervals'' in a Boolean ring (where $[A_*, A^*] = \{B \subseteq \Theta : A_* \subseteq B \subseteq A^*\}$).

To be general, let us consider a Boolean ring $R(+, \cdot)$, i.e., a ring with unit in which every element is idempotent, i.e., $\forall a \in R$, $a \cdot a = a^2 = a$. For example, $R = 2^\Omega$ or $R = \mathcal{B}(\mathbb{R})$, the Borel $\sigma$-field of the reals $\mathbb{R}$, where $ab = a \cap b$, $a + b = a \triangle b$ (symmetric difference of sets). The partial order on $R$ is defined as: for $a, b \in R$, $a \leq b$ iff $ab = a$. The zero and unit in $R$ are denoted as 0 and 1, respectively. Complementation and ``union'' in $R$ are $a' = 1 + a$ and $a \vee b = a + b + ab$, respectively (and ``intersection'' $\wedge$ is taken to be the multiplication). For $a \leq b$, we write $[a, b]$ to mean the ``closed interval'' (by abuse of language) $\{x \in R : a \leq x \leq b\}$. Let $R^{(2)}$ denote the set of all such intervals. Viewing each $a$ in $R$ as $[a, a]$, i.e., $R \subseteq R^{(2)}$, we can extend the algebraic structure of $R$ to $R^{(2)}$ in a natural way as follows:

\[ [a, b] \vee [c, d] = [a \vee c, b \vee d], \qquad [a, b] \wedge [c, d] = [ac, bd]. \]

$R^{(2)}$ is a lattice where the partial order on it is defined by (using the same notation): $[a, b] \leq [c, d]$ iff $a \leq c$ and $b \leq d$. The sup and inf are

\[ \sup([a, b], [c, d]) = [a \vee c, b \vee d], \qquad \inf([a, b], [c, d]) = [ac, bd]. \]

Clearly this lattice is bounded, with zero $[0, 0]$ and unit $[1, 1]$. Moreover, it is distributive ($\vee$ and $\wedge$ distribute over each other). But $R^{(2)}$ is not complemented, since if $a \leq b$ then $b' \leq a'$. However, if we let $[a, b]' = [b', b']$, then $R^{(2)}$ is pseudo-complemented. Indeed,

(i) $[a, b] \wedge [b', b'] = [0, 0]$,
(ii) if $[a, b] \wedge [c, d] = [0, 0]$, then $db = 0$, implying that $d \leq b'$ since $d = d + db = d(1 + b) = db'$. Thus, $[c, d] \leq [b', b']$, noting that $c \leq d$.

On the other hand, $[a, b]'' = [b, b]$, so that $[a, b]' \vee [a, b]'' = [1, 1]$, i.e., the Stone identity holds in the lattice $R^{(2)}$, and hence $R^{(2)}$ is a Stone algebra.
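The Stone algebra properties of $R^{(2)}$ can be confirmed by brute force for a small power set. The sketch below encodes the elements of $R = 2^\Omega$ as bitmasks for a three-element $\Omega$. That encoding, the choice $N = 3$ and the function names are ours.

```python
from itertools import product

N = 3                      # size of the underlying set; elements of R are bitmasks
ONE = (1 << N) - 1         # the unit of R (the whole set)

def comp(a):
    """Complement a' = 1 + a (symmetric difference with the unit)."""
    return ONE ^ a

def leq(a, b):
    """a <= b iff ab = a."""
    return (a & b) == a

intervals = [(a, b) for a, b in product(range(ONE + 1), repeat=2) if leq(a, b)]

def meet(i, j):
    return (i[0] & j[0], i[1] & j[1])

def join(i, j):
    return (i[0] | j[0], i[1] | j[1])

def pseudo(i):
    """Pseudo-complement [a, b]' = [b', b']."""
    return (comp(i[1]), comp(i[1]))

def ileq(i, j):
    return leq(i[0], j[0]) and leq(i[1], j[1])

ZERO_I, ONE_I = (0, 0), (ONE, ONE)

# [a, b]' is the largest interval whose meet with [a, b] is zero.
pseudo_ok = all(
    meet(i, pseudo(i)) == ZERO_I
    and all(ileq(j, pseudo(i)) for j in intervals if meet(i, j) == ZERO_I)
    for i in intervals
)
# Stone identity: [a, b]' v [a, b]'' = [1, 1].
stone_ok = all(join(pseudo(i), pseudo(pseudo(i))) == ONE_I for i in intervals)
# A proper interval has no complement in R^(2).
proper = (0b001, 0b011)
has_complement = any(meet(proper, j) == ZERO_I and join(proper, j) == ONE_I
                     for j in intervals)
```

For $N = 3$ there are $3^3 = 27$ intervals; the pseudo-complement and Stone checks succeed on all of them, while the interval `proper` has no complement, so $R^{(2)}$ is not Boolean.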


Obviously, a rough set $[a_*, a^*]$ is an element of $R^{(2)}$ for $R$ being a Boolean ring of subsets of some set. But not all closed intervals are rough sets. It can be shown that rough sets form a sub-Stone algebra of $R^{(2)}$. In a sense, going from ordinary sets to rough sets is, from a logical viewpoint, a step from Boole to Stone.

Now, closed intervals in a Boolean ring are the same as cosets of principal ideals of that Boolean ring. Recall that an ideal $I$ of $R$ is principal when it is of the form $Ra = \{ra : r \in R\}$, and the elements of the quotient ring $R/I$ are cosets, that is, subsets of $R$ of the form

\[ a + I = \{a + i : i \in I\}. \]

Let $\Rightarrow$ denote the material implication in $R$, i.e., $(b \Rightarrow a) = b' \vee a$. Then

\[ a + Rb' = \{a + rb' : r \in R\} = [ab, b' \vee a] = [ab, b \Rightarrow a]. \]

Conversely, $[a, b] = a + R(b' \vee a)'$. In fact, the map $[a, b] \to a + R(b \Rightarrow a)'$ is a bijection. Thus operations among cosets can be derived from those of closed intervals in Boolean rings. This fact will be used next when we discuss conditional events or conditional sets, and as a consequence, the semantics of rough sets is closely related to the logic of conditional sets.
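The correspondence between cosets of principal ideals and closed intervals can be checked exhaustively on a small power set. As an illustration only, the sketch below represents subsets of a three-element set as bitmasks, with ring addition as XOR and multiplication as AND. The encoding and the function names are our assumptions.

```python
N = 3
ONE = (1 << N) - 1
R = range(ONE + 1)                       # the Boolean ring 2^Omega as bitmasks

def coset(a, b):
    """a + R b' = {a + r b' : r in R}, with + as XOR and product as AND."""
    b_comp = ONE ^ b
    return {a ^ (r & b_comp) for r in R}

def interval(lo, hi):
    """Closed interval {x : lo <= x <= hi} for the order x <= y iff xy = x."""
    return {x for x in R if (lo & x) == lo and (x & hi) == x}

# a + R b' = [ab, b' v a] for all a, b in R.
coset_is_interval = all(
    coset(a, b) == interval(a & b, (ONE ^ b) | a) for a in R for b in R
)
# Conversely, [a, b] = a + R (b' v a)' whenever a <= b.
interval_is_coset = all(
    interval(a, b) == coset(a, (ONE ^ b) | a) for a in R for b in R if (a & b) == a
)
```

Both checks succeed: inside $b$ every member of the coset agrees with $ab$, and outside $b$ it is arbitrary, which is exactly the interval $[ab, b' \vee a]$.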

4. Conditional sets

Conditional sets or events are novel mathematical objects which can be used to represent knowledge expressed in terms of conditionals, in a way consistent with the use of conditional probabilities as weights of rules.

Information about the behaviour of a system can be given in the form of a collection of if ... then ... rules, called a rule base, such as in medical diagnosis. This is referred to as rule-based or knowledge-based systems in the engineering literature. For model building in such a case, we need to represent the rules mathematically and to find appropriate fusion operators to combine pieces of information. This situation is more general than probabilistic networks, whose knowledge domains consist of collections of random variables $V$, and where some structure (e.g., $V$ forms a Markov random field) as well as some quantitative data (e.g., conditional probabilities) are known.

To be general, in the sequel, we will use the general framework and notation of Boolean rings.

Consider a rule of the form ``if $b$ then $a$'', in symbol, $b \Rightarrow a$. The confidence or strength of each rule should be quantified somehow. In a probabilistic setting, each $b \Rightarrow a$ is usually quantified by the conditional probability of $a$ given $b$, $P(a|b)$. But then $b \Rightarrow a$ cannot be modeled as a material implication, since


\[ P(b' \vee a) = P(a|b) + P(a'|b) P(b') \geq P(a|b), \]

with equality holding only in trivial cases. Thus, the material implication (binary) operator $(a, b) \to b' \vee a$ is not compatible with conditional probability evaluations. In view of this observation, one might attempt to look for other binary operations $\diamond$ on $R$, i.e., $\diamond : R \times R \to R$, such that $P(a \diamond b) = P(a|b)$ for all probability measures $P$ on $R$, and for all $a, b \in R$ with $b \neq 0$, to represent rules compatible with conditional probabilities. Such attempts have been laid to rest by the so-called Lewis' triviality result.

Lewis' triviality result. Let $R$ be a Boolean ring with more than four elements. Then there is no binary operation $\diamond$ on $R$ such that for all probability measures $P$ on $R$, and all $a, b$ in $R$ with $P(b) > 0$,

\[ P(a \diamond b) = P(a|b). \]

Lewis' triviality result can be seen intuitively in the case where $R$ is finite, as follows. If such $\diamond$ existed, then since $a \diamond b$ is an element of $R$, $P(a|b)$ can have no more than $|R|$ (cardinality of $R$) values. But it can be shown that there is a $P$ on $R$ such that $P(a|b)$ takes more than $|R|$ values. In fact, we can do a bit more: let $\Omega$ be a finite set with $|\Omega| = n > 0$, and let $R$ be the Boolean algebra of all subsets of $\Omega$. Let $P$ be any probability measure on $R$; then there are no more than $3^n - 2^{n+1} + 3$ possible values for $P(a|b)$. Further, there is a $P$ such that $P(a|b)$ takes on $3^n - 2^{n+1} + 3$ distinct values. Thus, for $n \geq 3$, we have $3^n - 2^{n+1} + 3 > 2^n$. For Lewis' original proof and other alternate proofs of the above result, as well as all references surrounding the study of conditional events and their logics, see [10].
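The counting argument can be reproduced on a three-point space. The sketch below uses hypothetical weights, chosen by us so that no two conditional probabilities coincide by accident, and counts the distinct values of $P(a|b)$.

```python
from fractions import Fraction
from itertools import combinations

weights = {"w1": 2, "w2": 3, "w3": 7}     # hypothetical generic weights, total 12
total = sum(weights.values())

def prob(A):
    return Fraction(sum(weights[w] for w in A), total)

def subsets(s):
    items = sorted(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

events = list(subsets(weights))           # the 2^n elements of R
# All values P(a|b) = P(ab) / P(b) over events a and conditions b with P(b) > 0.
values = {prob(a & b) / prob(b) for a in events for b in events if prob(b) > 0}

n = len(weights)
bound = 3 ** n - 2 ** (n + 1) + 3
```

With $n = 3$ there are $14$ distinct conditional probabilities but only $2^3 = 8$ events, so no operation $\diamond$ with values in $R$ can reproduce all of them.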

To the best of our knowledge, De Finetti seems to be the first to speak about conditional events as such, i.e., as mathematical objects, denoted $\langle a|b \rangle$, outside of the operator $P$. Note that, from a probabilistic standpoint, $P(a|b)$ simply stands for the probability of $a$, conditioned on $b$, i.e., $P_b(a)$. What we are discussing here is this: can an object $\langle a|b \rangle$ be found so that $P(\langle a|b \rangle) = P(a|b)$? In view of Lewis' triviality result, such an object cannot be an element of $R$. But clearly, Lewis' triviality result does not say that we cannot define $\langle a|b \rangle$ outside of $R$! But then $P$, with domain $R$, needs to be extended accordingly for $P(\langle a|b \rangle)$ to make sense. Below, we will present two solutions to this problem.

Definition. Let $R$ be a Boolean ring. For $a, b \in R$, the conditional event $\langle a|b \rangle$ is defined to be the coset $a + Rb'$.

Remarks. The above definition of conditional events is derived axiomatically from desired properties of such objects.

In view of the discussions in Section 3, $\langle a|b \rangle = [ab, b' \vee a]$; thus, a conditional event, except when $b = 1$, is not an ordinary event, i.e., not an element of $R$, but is a collection of ordinary events. Since any closed interval is a conditional event, $[a, b] = \langle a|b' \vee a \rangle$, we see that the space of all conditional events is $R^{(2)}$. Viewing $a$ in $R$ as $\langle a|1 \rangle$, we consider $R$ as sitting inside $R^{(2)}$, i.e., $R^{(2)}$ is the extension of $R$ to house conditional events. As we have seen in Section 3, $R^{(2)}$ is not Boolean, and hence it is not clear how $P$ on $R$ should be extended to $R^{(2)}$. If we consider uncertainty measures in a general sense, then we can extend $P$ as follows. Define $\hat{P}$ on $R^{(2)}$ by $\hat{P}(\langle a|b \rangle) = P(a|b)$. This assignment is well-defined. Indeed, it is easy to check that $\langle a|b \rangle = \langle c|d \rangle$ if and only if $ab = cd$ and $b = d$; and hence $\hat{P}(\langle a|b \rangle) = \hat{P}(\langle c|d \rangle)$.

Conditional events or sets are mathematical objects that represent if... then rules compatible with conditional probability evaluations. To combine them, we can simply extend operations from $R$ to $R^{[2]}$. For example,

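$$\langle a \mid b \rangle' = \langle a' \mid b \rangle,$$

and, writing $a$ for $ab$ and $c$ for $cd$ (which is harmless since $\langle a \mid b \rangle = \langle ab \mid b \rangle$, so that $a \le b$ and $c \le d$),

$$\langle a \mid b \rangle \wedge \langle c \mid d \rangle = [a, b' \vee a] \wedge [c, d' \vee c] = [ac, (b' \vee a)(d' \vee c)] = \langle ac \mid a'b \vee c'd \vee bd \rangle,$$

$$\langle a \mid b \rangle \vee \langle c \mid d \rangle = [a, b' \vee a] \vee [c, d' \vee c] = [a \vee c, (a \vee c) \vee (a \vee c \vee bd)'] = \langle a \vee c \mid a \vee c \vee bd \rangle.$$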

These logical operations on $R^{[2]}$ form a syntax for conditional events. It turns out that the semantics of conditional events is a three-valued logic. This is basically because the relationship between Boolean algebras and classical two-valued logic has an analogue relating $R^{[2]}$ and three-valued logic. Roughly speaking, while any Boolean algebra $R$ contains the two-element Boolean algebra $\{0, 1\} = V$, the Stone algebra $R^{[2]}$ contains the three-element Stone algebra $V^{[2]} = \{\langle 0 \mid 1 \rangle, \langle 1 \mid 1 \rangle, \langle 0 \mid 0 \rangle\}$, which plays the role of the space of truth values: $\langle 0 \mid 1 \rangle$, $\langle 1 \mid 1 \rangle$ and $\langle 0 \mid 0 \rangle$ are identified with false, true and undecided ($u$), respectively. In view of the logical operations on $R^{[2]}$, the logic of conditional events is Łukasiewicz's three-valued logic. See [10] for details. For further discussions on conditionals as cosets and their associated logics, see [13,19].
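The interval representation and its three-valued reading can be explored on a small finite $\Omega$. The sketch below is only illustrative and rests on stated assumptions: conditional events are stored as pairs (lower, upper) $= (ab, b' \vee a)$, meet and join act componentwise, negation swaps the complemented endpoints, and "undecided" is encoded as `0.5`. All function names are hypothetical.

```python
from itertools import combinations

OMEGA = frozenset(range(3))


def events():
    """All subsets of OMEGA."""
    items = sorted(OMEGA)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def cond(a, b):
    """Conditional event <a|b> as the interval (ab, b' v a)."""
    return (a & b, (OMEGA - b) | a)


def meet(x, y):
    return (x[0] & y[0], x[1] & y[1])


def join(x, y):
    return (x[0] | y[0], x[1] | y[1])


def neg(x):
    return (OMEGA - x[1], OMEGA - x[0])


def meet_formula(a, b, c, d):
    """Right-hand side <ac | a'b v c'd v bd> of the conjunction formula."""
    return cond(a & c, ((OMEGA - a) & b) | ((OMEGA - c) & d) | (b & d))


def join_formula(a, b, c, d):
    """Right-hand side <a v c | a v c v bd>, after replacing a, c by ab, cd."""
    a, c = a & b, c & d
    return cond(a | c, a | c | (b & d))


def truth(x, w):
    """Truth value of an interval at outcome w: 1, 0, or 0.5 (undecided)."""
    low, high = x
    if w in low:
        return 1.0
    if w not in high:
        return 0.0
    return 0.5


a, b = frozenset({0}), frozenset({0, 1})
print([truth(cond(a, b), w) for w in sorted(OMEGA)])
```

Under these assumptions, the truth value of a meet (join, negation) at each outcome is the minimum (maximum, one minus) of the truth values of its arguments, which is the three-valued behaviour of the connectives described above.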

We now present a "Boolean" solution to the problem of defining conditional events compatible with conditional probability evaluations; that is, we are going to embed a Boolean $\sigma$-algebra $R$ into a bigger Boolean $\sigma$-algebra in which conditional events live.

Let $(\Omega, \mathcal{A}, P)$ be a probability space. Then, for $a, b \in \mathcal{A}$ with $P(b) \neq 0$,

$$P(a \mid b) = \frac{P(ab)}{P(b)} = \frac{P(ab)}{1 - P(b')} = P(ab) \sum_{n=0}^{\infty} (P(b'))^n = \sum_{n=0}^{\infty} P(ab)\,(P(b'))^n.$$


This suggests that, just as division of real numbers can be expressed in terms of multiplication and infinite sums, "division of sets" (an informal, but algebraic, way of looking at conditional events) could be defined by an appropriate counterpart. This is exactly what we are going to do.

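Let $(\Omega, \mathcal{A})$ be a measurable space. The infinite product space $(\Omega^\infty, \mathcal{A}^\infty)$ is constructed as usual by taking $\Omega^\infty$ to be the countably infinite Cartesian product $\Omega \times \Omega \times \cdots \times \Omega \times \cdots$, and $\mathcal{A}^\infty$ to be the associated infinite product $\sigma$-field of the $\mathcal{A}$'s. For simplicity, we write, e.g., $a$ for the element $a \times \Omega \times \Omega \times \cdots$ of $\mathcal{A}^\infty$, and $a \times b \times c$ for $a \times b \times c \times \Omega \times \Omega \times \cdots$. The set union in $\Omega^\infty$ is written as $\vee$.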

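For $a, b \in \mathcal{A}$, we associate the element of $\mathcal{A}^\infty$

$$\alpha(a, b) = ab \vee (b' \times ab) \vee (b' \times b' \times ab) \vee \cdots = \bigvee_{n \ge 0} (b')^n \times ab,$$

where $(b')^n = b' \times b' \times \cdots \times b'$ ($n$ times), with $(b')^n \times ab = ab$ for $n = 0$. Note that, as subsets of $\Omega^\infty$, the $(b')^n \times ab$ are pairwise disjoint.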

Now, if $P$ is any probability measure on $(\Omega, \mathcal{A})$, then $P^\infty$ will denote the infinite product measure of the measures $P_n = P$, for all $n$. By definition of product measure, and disjointness of the $(b')^n \times ab$, $n \ge 0$, we have

$$P^\infty\Big(\bigvee_{n \ge 0} (b')^n \times ab\Big) = \sum_{n \ge 0} P^\infty\big((b')^n \times ab\big) = \sum_{n \ge 0} P(ab)\,(P(b'))^n = P(a \mid b).$$

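Thus, $\alpha : \mathcal{A} \times \mathcal{A} \to \mathcal{A}^\infty$ is a map such that for any $P$ on $\mathcal{A}$, and for any $a, b \in \mathcal{A}$ with $P(b) \neq 0$, we have $P^\infty(\alpha(a, b)) = P(a \mid b)$.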

In more formal terms, to each measurable space $(\Omega, \mathcal{A})$, there is a measurable space $(\Omega^\infty, \mathcal{A}^\infty)$ and a map $\alpha : \mathcal{A} \times \mathcal{A} \to \mathcal{A}^\infty$ such that for any probability measure $P$ on $\mathcal{A}$, there is a probability measure $P^\infty$ on $\mathcal{A}^\infty$ such that for any $a, b \in \mathcal{A}$ with $P(b) \neq 0$, we have $P^\infty(\alpha(a, b)) = P(a \mid b)$. The mathematical object $\alpha(a, b)$, which is an element of the Boolean $\sigma$-field $\mathcal{A}^\infty$, thus fully qualifies as a conditional set or event. For more developments of the Boolean notion of conditional events as well as its applications to data fusion, see [11].
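The product-space construction can be read as follows: the set $(b')^n \times ab$ collects the sequences whose first $n$ coordinates miss $b$ and whose next coordinate lands in $ab$. The short sketch below (the function name and the numerical values are illustrative assumptions) computes partial sums of $\sum_n P(ab)(P(b'))^n$ exactly and shows how the gap to $P(a \mid b)$ shrinks geometrically.

```python
from fractions import Fraction


def partial_sum(p_ab, p_b, terms):
    """Sum of P(ab) * P(b')**n for n = 0, ..., terms - 1."""
    p_not_b = 1 - p_b
    return sum(p_ab * p_not_b ** n for n in range(terms))


# Hypothetical values: P(ab) = 1/6 and P(b) = 1/2, so P(a|b) = 1/3.
p_ab, p_b = Fraction(1, 6), Fraction(1, 2)
target = p_ab / p_b
for terms in (1, 5, 20):
    print(terms, float(target - partial_sum(p_ab, p_b, terms)))
```

After $N$ terms the remainder is $P(a \mid b)\,(P(b'))^N$, so the convergence is fast unless $P(b)$ is close to zero.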

5. Fuzzy sets

Random elements mentioned by Fréchet [7] in his pioneering work on random elements on metric spaces differ from standard random objects since the concepts or characteristics involved are described in a natural language rather than in a mathematical language. This is basically because natural languages can describe fuzzy concepts. Fuzzy concepts are concepts which are hard to describe precisely! They are in our daily conversations! For example, "tall buildings", "unlikely", "young people", "beautiful", ... If we need to take into account information containing fuzzy concepts, for example for machine intelligence, then we wish to be able to represent these fuzzy concepts mathematically. In the following, we clarify some connections between random sets and fuzzy sets and address Elkan's objection to fuzzy technology.

5.1. Mathematical modeling of fuzzy concepts

Statisticians often use fuzzy concepts to explain their analysis or findings, such as "under the null hypothesis, it is unlikely that this situation can occur". Of course, they use 1% or less, say, to mean unlikely! In another direction, it is sometimes desirable to understand fuzzy concepts from linguistic reports. This is exemplified by the studies of Mosteller and Youtz [20], in which they tried to come up with numerical proportions to quantify probabilistic expressions such as high probability, very improbable, low chance, unusual, once in a while, ... Now, clearly, quantifying linguistic expressions is a way to understand the meaning of these expressions in some approximate fashion. This is somewhat similar to putting threshold values on imprecise quantities to make them precise!

To get to the heart of fuzzy concepts and extract more of the meaningful information they contain, a better meaning representation of fuzzy concepts is needed. Based upon the view that meaning is a matter of degree, Zadeh in 1965 proposed to model fuzzy concepts as fuzzy sets, which are generalizations of ordinary (crisp) sets via membership (indicator) functions. See [23] for a mathematical introduction to the theory of fuzzy sets and logics.

Implicit in any fuzzy concept $A$, such as high probability, is a universe of discourse $U$, here $U = [0, 1]$. The meaning of such a concept can be described by the gradual grades of membership of elements in $U$. This is achieved by generalizing the concept of indicator functions of ordinary (crisp) sets to maps $U \to [0, 1]$. In other words, a fuzzy subset of a set $U$ is a map $U \to [0, 1]$. For $u \in U$, the value $A(u)$ is the membership of $u$ in $A$, representing the degree to which $u$ is compatible with the meaning of $A$. The unit interval $[0, 1]$ is chosen to express numerical degrees, as a first approximation; other lattices, such as sub-intervals of $[0, 1]$, can be used as well. It is interesting to note that in the study of values for non-atomic games, Aumann and Shapley [1, pp. 141–144] are led to model the notion of "evenly spread" measurable sets (of players) by a kind of ideal sets which are formally fuzzy sets, although in a footnote they wrote "formally, our ideal sets are similar to "fuzzy sets" of Zadeh (1965), but intuitively the ideas are somewhat different".
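As a small illustration of the passage from indicator functions to membership functions, the sketch below contrasts a crisp reading of "high probability" with a fuzzy one on $U = [0, 1]$. The threshold `0.8` and the piecewise-linear shape with breakpoints `0.6` and `0.9` are hypothetical choices, not values taken from the paper or from [20].

```python
def crisp_high(u, threshold=0.8):
    """Indicator function of the crisp set {u : u >= threshold}."""
    return 1.0 if u >= threshold else 0.0


def fuzzy_high(u, start=0.6, full=0.9):
    """Hypothetical piecewise-linear membership for 'high probability'."""
    if u <= start:
        return 0.0
    if u >= full:
        return 1.0
    return (u - start) / (full - start)


for u in (0.5, 0.75, 0.95):
    print(u, crisp_high(u), round(fuzzy_high(u), 3))
```

The crisp version jumps from 0 to 1 at the threshold, whereas the fuzzy version assigns intermediate grades between the two breakpoints.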

One of the important questions for applications is how to obtain membership functions for fuzzy concepts. This issue will be discussed next. As for another important question, namely, how to reason with fuzzy concepts, we refer the reader to, e.g., [23]. For reasoning with probability measures, see [3].


In statistics, each random variable is postulated to have a unique, say, probability density function, which can then be estimated from samples. Statistical reasoning, say in hypothesis testing, is based simply on classical two-valued logic because events are identified as ordinary sets of the sample space. Now, it is fairly clear that not only does each fuzzy concept lack a sharply defined boundary (which is in fact the main characteristic of any fuzzy concept), but its meaning is also subjective: for example, people understand or use the term high probability differently. In statistics, this subjectivity is reflected in the use of the P-value in testing of hypotheses: we leave the interpretation of unlikely to the decision-maker. Note that while people interpret, say, unlikely, differently, there is of course something in common in their understanding of the meaning of unlikely! In other words, there is some invariant in each fuzzy concept, perhaps the shape of its membership function.

In applications, we need to supply a membership function for each fuzzy concept involved. This can be done in a variety of ways. Each should be viewed as an estimation procedure. For example, membership functions of fuzzy concepts in medical sciences could be supplied directly by experts. In engineering, membership functions of fuzzy concepts are in general supplied by tuning procedures. Of course, statistical sampling methods can also provide one way to obtain membership functions. This should not be taken to mean that randomness subsumes fuzziness! The two types of uncertainty are distinct. Below we elaborate a little on the connection between fuzzy sets and random sets for statistical estimation purposes.

5.2. Connections with random sets

Let $A$ be a fuzzy subset of a set $U$. The $\alpha$-level set of $A$ is $A_\alpha = \{u \in U : A(u) \ge \alpha\}$. Note that $A(u) = \int_0^1 1_{A_\alpha}(u)\,d\alpha$. These $A_\alpha$ can be obtained by randomizing the level $\alpha$, i.e., by choosing $\alpha$ randomly in $[0, 1]$. Specifically, let $\alpha$ denote a random variable, defined on some probability space $(\Omega, \mathcal{A}, P)$ with values in $[0, 1]$, uniformly distributed. Then, for each $u \in U$,

$$P\{\omega \in \Omega : u \in S_A(\omega)\} = P\{\omega : A(u) \ge \alpha(\omega)\} = A(u),$$

where $S_A$ is the random set on $U$ defined by $S_A(\omega) = \{u \in U : A(u) \ge \alpha(\omega)\}$.

This relationship suggests that, as an estimation procedure, membership functions of fuzzy concepts can be obtained via probability measures $P$ and random sets $S_A$. More generally, $P$ could be replaced by a non-additive set-function, and $S_A$ by a set-valued function, see [25].
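The identity $P\{u \in S_A\} = A(u)$ can be checked on a toy example. The sketch below is an illustration only: the fuzzy set `A`, its three points, and the function names are assumptions. It replaces the uniform random level by a midpoint grid on $[0, 1]$, which reproduces $A(u)$ exactly whenever $A(u)$ is a multiple of `1/steps`.

```python
from fractions import Fraction


def level_set(membership, alpha):
    """alpha-level set {u : A(u) >= alpha} of a fuzzy set given as a dict."""
    return frozenset(u for u, grade in membership.items() if grade >= alpha)


def coverage(membership, u, steps):
    """Midpoint-rule estimate of P(u in S_A) for alpha uniform on [0, 1]."""
    hits = 0
    for k in range(steps):
        alpha = Fraction(2 * k + 1, 2 * steps)  # midpoint of the k-th cell
        if u in level_set(membership, alpha):
            hits += 1
    return Fraction(hits, steps)


# Hypothetical fuzzy subset of U = {"x", "y", "z"}.
A = {"x": Fraction(1, 5), "y": Fraction(1, 2), "z": Fraction(1)}
print({u: coverage(A, u, steps=10) for u in A})
```

Because the level sets shrink as the level grows, the sets visited by `level_set` form a chain under inclusion.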

In the above relationship, the range $R(S_A)$ of the canonical random set $S_A$ associated with $A$ is nested in the sense that $R(S_A) = \{A_t : 0 \le t \le 1\}$ is totally ordered (by set-inclusion). It is clear that $P\{\omega : S_A(\omega) \in R(S_A)\} = 1$. Moreover, for any $A_t \in R(S_A)$,

$$\{\omega : A_t \subseteq S_A(\omega)\} = \{\omega : A(A_t) \subseteq [\alpha(\omega), 1]\} = \{\omega : \alpha(\omega) \le \inf A(A_t)\} \in \mathcal{A},$$

where, as usual, $A(A_t)$ is the image of the set $A_t$ under the function $A(\cdot)$. This leads to the following definition.

Definition. Let $S : (\Omega, \mathcal{A}, P) \to (E, \mathcal{E})$ be a random set in $U$, where $E = 2^U$. We say that $S$ is nested if

(i) $R(S)$ is a totally ordered (by set-inclusion) family of subsets of $U$,
(ii) for any $B \in R(S)$, we have $\{\omega : B \subseteq S(\omega)\} \in \mathcal{A}$.

It turns out that $S_A$ is the only nested random set representing $A$, i.e., having the membership function of $A$ as its covering function: $A(u) = P\{u \in S_A\}$, for all $u \in U$.

Theorem. Let $S$ be a random set on $U$ with covering function $\varphi(u) = P(u \in S)$. If $S$ is nested, then $S = S_\varphi$ in distribution, where $S_\varphi$ is the canonical random set representing $\varphi : U \to [0, 1]$, i.e., $S_\varphi$ is the random set obtained by randomizing the $\alpha$-level sets of $\varphi$.

Proof. It suffices to show that, for any $B \in R(S)$,

$$P\{\omega : B \subseteq S(\omega)\} = P\{\omega : B \subseteq S_\varphi(\omega)\}.$$

This is obviously the case when $B = \emptyset$. For $B \neq \emptyset$, let $W(B) = \{D \in R(S) : \emptyset \neq D \subsetneq B\}$. If $W(B) = \emptyset$, then for any $x \in B$, we have, since $R(S)$ is totally ordered,

$$\{\omega : x \in S(\omega)\} = \{\omega : B \subseteq S(\omega)\}. \qquad (*)$$

If $W(B) \neq \emptyset$, then there exists some $x \in B \cap D^c$ for any $D \in W(B)$. For that $x$, $(*)$ is clearly satisfied. Thus, whenever $B \neq \emptyset$, there is an $x \in B$ such that $(*)$ holds. Now, since $\alpha$ is uniformly distributed on $[0, 1]$,

$$P(\omega : B \subseteq S_\varphi(\omega)) = P(\omega : \alpha(\omega) \le \inf \varphi(B)) = \inf \varphi(B).$$

Next,

$$P(\omega : B \subseteq S(\omega)) = P(\omega : y \in S(\omega)\ \forall y \in B) \le P(\omega : z \in S(\omega)) \quad \text{for every } z \in B.$$

Thus

$$P(\omega : B \subseteq S(\omega)) \le \inf\{P(\omega : y \in S(\omega)) : y \in B\} = \inf \varphi(B).$$

But, in view of $(*)$, we have

$$\inf \varphi(B) = \inf\{P(\omega : y \in S(\omega)) : y \in B\} \le P(\omega : x \in S(\omega)) = P(\omega : B \subseteq S(\omega)).$$

For the case where $U$ is finite, it is possible to specify the class $\mathcal{S}(A)$ of all random sets representing a fuzzy subset $A$ of $U$, i.e., random sets having $A$ as their common covering function.

Without loss of generality, assume $U = \{1, 2, \ldots, n\}$. A random set $S$ on $U$ is characterized by its probability function $f : 2^U \to [0, 1]$, where

$$f(B) = P(\omega : S(\omega) = B).$$

If we let $V_i(\omega) = 1_{S(\omega)}(i)$, $i \in U$, then $V_i$ is a random variable taking values in $\{0, 1\}$ with

$$P(\omega : V_i(\omega) = 1) = P(\omega : i \in S(\omega)) = 1 - P(\omega : V_i(\omega) = 0).$$

The distribution of the random vector $V_S = (V_1, V_2, \ldots, V_n)$ is completely determined by that of $S$ and vice versa. Indeed, for any $x = (x_1, \ldots, x_n)$ in $\{0, 1\}^n$, we have

$$P(\omega : V_S(\omega) = x) = P\{\omega : (V_1(\omega), \ldots, V_n(\omega)) = x\} = P(\omega : S(\omega) = B), \quad \text{where } B = \{i \in U : x_i = 1\}.$$

For any $S \in \mathcal{S}(A)$, the cumulative distribution function (cdf) of each $V_i$ is

$$G_{A(i)}(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 - A(i) & \text{if } 0 \le x < 1, \\ 1 & \text{if } 1 \le x. \end{cases}$$

If the fuzzy set $A$ is given, then the marginal cdfs $G_{A(i)}$, $i = 1, \ldots, n$, are known. But then, according to Sklar's theorem, the joint cdf of $V = (V_1, \ldots, V_n)$, each $V_i$ having $G_{A(i)}$ as cdf, is of the form

$$F(x_1, \ldots, x_n) = C(G_{A(1)}(x_1), \ldots, G_{A(n)}(x_n)),$$

where $C$ is an $n$-copula; see, e.g., [21] for background on Sklar's theorem and copulas in statistics. Thus, we have the following.

Theorem. Let $A$ be a fuzzy subset of a finite set $U$. Then all possible distributions for random sets $S$ in $\mathcal{S}(A)$ are of the form $C(G_{A(u)}, u \in U)$, with $C$ being any $|U|$-copula. In particular, the canonical nested random set $A_{\alpha(\cdot)}$ corresponds to the choice of the copula $C(x_1, \ldots, x_{|U|}) = \min\{x_i : i = 1, \ldots, |U|\}$, $x_i \in [0, 1]$.
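To make the role of the copula concrete, the following sketch computes the probability function $f(B) = P(S = B)$ induced by a given copula on a small finite $U$. It is a hedged illustration, not the paper's own algorithm: the function names, the inclusion-exclusion step used to pass from the joint cdf to point masses, and the example fuzzy set are all assumptions of this sketch. Two copulas are compared, the minimum copula and the product (independence) copula.

```python
from fractions import Fraction
from itertools import combinations


def min_copula(us):
    return min(us)


def product_copula(us):
    result = Fraction(1)
    for u in us:
        result *= u
    return result


def random_set_pmf(membership, copula):
    """Return {B: P(S = B)} for the random set induced by `copula`.

    membership maps each point i of U to A(i); V_i = 1 iff i is in S, and
    G_i(0) = 1 - A(i), G_i(1) = 1 are the values of the marginal cdf.
    """
    points = sorted(membership)
    pmf = {}
    for r in range(len(points) + 1):
        for ones in combinations(points, r):
            prob = Fraction(0)
            # Inclusion-exclusion over the coordinates equal to 1.
            for k in range(len(ones) + 1):
                for lowered in combinations(ones, k):
                    args = [Fraction(1) if (i in ones and i not in lowered)
                            else 1 - membership[i] for i in points]
                    prob += (-1) ** k * copula(args)
            pmf[frozenset(ones)] = prob
    return pmf


def covering(pmf, i):
    """Covering function P(i in S) computed from the probability function."""
    return sum(p for B, p in pmf.items() if i in B)


# Hypothetical fuzzy subset of U = {1, 2}.
A = {1: Fraction(1, 2), 2: Fraction(4, 5)}
nested = random_set_pmf(A, min_copula)
independent = random_set_pmf(A, product_copula)
print(nested)
print(independent)
```

Both random sets have $A$ as covering function, but only the one built from the minimum copula concentrates on a chain of sets; the product copula includes each point independently.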


Remarks. The above result can be readily extended to the case of a finite number of fuzzy sets on finite domains. Specifically, let $A^{(j)}$ be fuzzy sets of $U_j$, $j \in J$ (a finite index set). Then the joint distribution of $(S_j, j \in J)$, where each $S_j \in \mathcal{S}(A^{(j)})$, is of the form $C(G_{A^{(j)}(x)}, x \in U_j, j \in J)$, where $C$ is a $\left|\bigcup_{j \in J} (U_j \times \{j\})\right|$-copula.

Random sets can be extended to random fuzzy sets, i.e., random elements taking fuzzy sets as values. This can be achieved within Fréchet's framework, where an extended Hausdorff metric on the space of fuzzy subsets of $\mathbb{R}^d$ can be defined, see [4]. Fuzziness, when appropriate, can be used to improve statistical analysis. This is exemplified by Manton et al. [16]. See also [14] for statistics of fuzzy data.

5.3. Propositional calculus of fuzzy sets

Among the objections to fuzzy theory, perhaps the paper by Charles Elkan [5] has caused some confusion to outsiders as well as to practitioners of fuzzy technology. His result is "we show that as a formal system, a standard version of fuzzy logic collapses mathematically to two-valued logic ..."

Here is Elkan's set-up and result. Consider the "standard" form of fuzzy logic: given simple assertions and logical connectives "and" ($\wedge$), "or" ($\vee$), and "not" ($'$), suppose the truth evaluation $t$ maps composed assertions to $[0, 1]$ according to:

(i) $t(A \wedge B) = \min\{t(A), t(B)\}$; $t(A \vee B) = \max\{t(A), t(B)\}$; $t(A') = 1 - t(A)$;

assume also

(ii) $t(A) = t(B)$ if $A$ and $B$ are logically equivalent in the sense of classical two-valued propositional calculus.

Then for any two assertions $A$ and $B$, either $t(B) = t(A)$ or $t(B) = 1 - t(A)$.

As a consequence, "only two different values are in fact possible in the (above) formal system"!
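One short way to see how assumptions (i) and (ii) can force such a collapse is sketched below. It is offered as an illustration under the set-up just stated and is not claimed to be the argument given in [5]. It uses only the fact that $A \vee A'$ and $B \vee B'$ are both classical tautologies, hence classically equivalent.

```latex
\begin{align*}
t(A \vee A') &= t(B \vee B')
  && \text{by (ii)} \\
\max\{t(A),\, 1 - t(A)\} &= \max\{t(B),\, 1 - t(B)\}
  && \text{by (i)} \\
\tfrac12 + \bigl|t(A) - \tfrac12\bigr| &= \tfrac12 + \bigl|t(B) - \tfrac12\bigr|
  && \text{since } \max\{x,\, 1 - x\} = \tfrac12 + \bigl|x - \tfrac12\bigr| \\
t(B) = t(A) \quad &\text{or} \quad t(B) = 1 - t(A).
\end{align*}
```

The step that does the damage is (ii): it imports the classical law of the excluded middle into a system whose connectives, by (i), do not satisfy it.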

Remarks.

(a) Obviously, it is assumption (ii) which produces the paradoxical result. One might wonder why Elkan considered two-valued logical equivalence for a formal system whose truth values are uncountably infinite. The answer seems to lie in the desirable fact that a practical notion of logical equivalence should be algorithmic, i.e., there should exist a finite algorithm for determining when two assertions are logically equivalent. With this in mind, Nguyen et al. [24] addressed Elkan's paper when the truth space $[0, 1]$ is replaced by the lattice of sub-intervals of $[0, 1]$, i.e., for interval-valued fuzzy logic. More specifically, in the wake of Elkan's paper, they presented a normal form for interval-valued fuzzy logic and thus a finite algorithm for checking logical equivalence in that logic. An interesting connection with logic programming was also established. However, a rigorous discussion of the mathematical logic involved in Elkan's paper, and the proof that logical equivalence in the $[0, 1]$-valued fuzzy logic of Zadeh is in fact algorithmic (and hence that Elkan's assumption (ii) above should be replaced by a correct notion of logical equivalence, so that his "paradoxical phenomenon" disappears), were due to Gehrke et al. [8] one year later.

(b) It is interesting to observe that there are some magic numbers in the above story: the $[0, 1]$-valued fuzzy logic is algorithmic in the sense that its propositional calculus (without implication operators involved) is the same as that of a three-valued logic, whereas the interval-valued fuzzy logic has a four-valued propositional calculus. This reminds us of the famous Kharitonov theorem in robust control theory, see e.g., [2]: in order to check stability of an interval polynomial family, it suffices to check only four canonical polynomials in that family.

The construction of a propositional calculus of any formal system is as follows. We start out with the building blocks, which consist of a set $V$ of entities called variables or primitive propositions, and three connective symbols $\vee, \wedge, {}'$. Then the set of well-formed formulas $\mathcal{F}$ is constructed inductively as follows: if $u$ is a variable, then $u$ is a formula; if $u$, $v$ are formulas, then $u \vee v$, $u \wedge v$, $u'$ are formulas. Note that no meaning has been attached to anything yet. The concept of logical equivalence on $\mathcal{F}$ is defined once we specify a truth set $T$.

$\mathcal{F} = (V, \wedge, \vee, {}', 0, 1)$ is in fact an algebra of type $(2, 2, 1, 0, 0)$, i.e., $\mathcal{F}$ is a pair $\langle V, F \rangle$, where $V$ is a set and $F = \{\wedge, \vee, {}', 0, 1\}$ is a collection of operations on $V$, the type of $\mathcal{F}$ being a list of the arities of the operations in $F$. The truth set $T$ is an algebra of the same type as $\mathcal{F}$. For example, in fuzzy logic, $T = ([0, 1], \wedge = \min, \vee = \max, x' = 1 - x, 0, 1)$, which is a Kleene algebra. Let $\mathrm{Hom}(\mathcal{F}, T)$ denote the set of all homomorphisms from $\mathcal{F}$ to $T$. Then two formulas $a, b \in \mathcal{F}$ are said to be logically equivalent under $T$, in symbols $a \equiv_T b$, iff $\varphi(a) = \varphi(b)$ for any $\varphi \in \mathrm{Hom}(\mathcal{F}, T)$. Note that the equivalence relation $\equiv_T$ is in fact a congruence relation. The quotient algebra $\mathcal{F}/{\equiv_T}$ is defined to be the propositional calculus (or logic) over $V$ with algebra of truth values $T$. From this standard setting of mathematical logic, the formal system of fuzzy sets is not at all the one Elkan had in mind, since $T$ is not the Boolean algebra $\mathbf{2} = (\{0, 1\}, \wedge, \vee, {}', 0, 1)$! Now, consider a three-valued logic $\mathbf{3} = (\{0, u, 1\}, \wedge, \vee, {}', 0, 1)$, where $\{0, u, 1\}$ is a chain, with $u$ standing for "undecided", which could be taken as $1/2$, and the involution ${}'$ sends 0 to 1, $u$ to $u$, and 1 to 0. Gehrke et al. [8] showed that using truth values $\{0, u, 1\}$ and using $[0, 1]$ give the same equivalence relation on $\mathcal{F}$, and thus the same resulting propositional calculi: $\mathcal{F}/{\equiv_T} = \mathcal{F}/{\equiv_{\mathbf{3}}}$, where $T$ is the Kleene algebra $([0, 1], \min, \max, 1 - (\cdot), 0, 1)$.
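The practical content of this result is that logical equivalence can be decided by a finite search over three truth values. The sketch below illustrates the idea under stated assumptions: formulas are encoded as nested tuples, "undecided" is taken to be $1/2$, and the names `evaluate`, `variables` and `equivalent` are hypothetical. A finite grid in $[0, 1]$ is included only as a spot check; it does not replace the proof in [8].

```python
from fractions import Fraction
from itertools import product

# Formulas: a variable name (str), ("not", f), ("and", f, g) or ("or", f, g).


def evaluate(formula, assignment):
    """Evaluate with min, max and 1 - x, as in the truth algebra T."""
    if isinstance(formula, str):
        return assignment[formula]
    op = formula[0]
    if op == "not":
        return 1 - evaluate(formula[1], assignment)
    left = evaluate(formula[1], assignment)
    right = evaluate(formula[2], assignment)
    return min(left, right) if op == "and" else max(left, right)


def variables(formula):
    """Set of variable names occurring in a formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(variables(part) for part in formula[1:]))


def equivalent(f, g, truth_values):
    """True if f and g agree under every assignment into truth_values."""
    names = sorted(variables(f) | variables(g))
    for values in product(truth_values, repeat=len(names)):
        assignment = dict(zip(names, values))
        if evaluate(f, assignment) != evaluate(g, assignment):
            return False
    return True


TWO = [Fraction(0), Fraction(1)]
THREE = [Fraction(0), Fraction(1, 2), Fraction(1)]
GRID = [Fraction(k, 8) for k in range(9)]  # finite sample of [0, 1]

de_morgan = (("not", ("and", "x", "y")), ("or", ("not", "x"), ("not", "y")))
excluded = (("or", "x", ("not", "x")), ("or", "y", ("not", "y")))
for name, (f, g) in (("De Morgan", de_morgan), ("excluded middle", excluded)):
    print(name, [equivalent(f, g, t) for t in (TWO, THREE, GRID)])
```

The two excluded-middle formulas are equivalent over $\{0, 1\}$ but not over $\{0, 1/2, 1\}$, which is exactly the kind of classical equivalence that assumption (ii) wrongly imposes on fuzzy truth values.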


Acknowledgements

In appreciation of significant collaborations throughout many years, I would like to thank I.R. Goodman, Vladik Kreinovich, Carol and Elbert Walker.

References

[1] R.J. Aumann, L.S. Shapley, Values of Non-Atomic Games, Princeton University Press, Princeton, NJ, 1974.

[2] B.R. Barmish, New Tools for Robustness of Linear Systems, Macmillan, New York, 1994.

[3] B.M. Bennett, D.D. Hoffman, P. Murthy, Lebesgue logic for probabilistic reasoning and some applications to perception, J. Math. Psychol. 37 (1993) 63–103.

[4] P. Diamond, P. Kloeden, Metric Spaces of Fuzzy Sets, World Scientific, Singapore, 1994.

[5] C. Elkan, The paradoxical success of fuzzy logic, in: Proceedings of the 11th Conference on Artificial Intelligence, Washington, DC, July 1993, pp. 698–703.

[6] K. Falconer, Fractal Geometry: Mathematical Foundations and Applications, Wiley, New York, 1990.

[7] M. Fréchet, Les éléments aléatoires de nature quelconque dans un espace distancié, Ann. H. Poincaré X (IV) (1948) 215–310.

[8] M. Gehrke, C. Walker, E. Walker, A mathematical setting for fuzzy logic, Int. J. Uncertainty, Fuzziness Knowledge-Based Systems 5 (3) (1997) 223–238.

[9] I.R. Goodman, H.T. Nguyen, G.S. Rogers, On the scoring approach to admissibility of uncertainty measures in expert systems, J. Math. Anal. Appl. 159 (2) (1991) 550–594.

[10] I.R. Goodman, H.T. Nguyen, E. Walker, Conditional Inference and Logic for Intelligent Systems: A Theory of Measure-Free Conditioning, North-Holland, Amsterdam, 1991.

[11] I.R. Goodman, R. Mahler, H.T. Nguyen, Mathematics of Data Fusion, Kluwer Academic Publishers, Dordrecht, 1997.

[12] J. Goutsias, R. Mahler, H.T. Nguyen, Random Sets: Theory and Applications, IMA Volumes in Mathematics and its Applications No. 97, Springer, New York, 1997.

[13] T. Hailperin, Sentential Probability Logic, Lehigh University Press, Bethlehem, 1996.

[14] R. Kruse, D. Meyer, Statistics with Vague Data, Kluwer Academic Publishers, Dordrecht, 1987.

[15] D. Lindley, Scoring rules and the inevitability of probability, Int. Statist. Rev. 50 (1982) 1–26.

[16] K.G. Manton, M.A. Woodbury, H.D. Tolley, Statistical Applications Using Fuzzy Sets, Wiley, New York, 1994.

[17] G. Matheron, Random Sets and Integral Geometry, Wiley, New York, 1975.

[18] A. Meyrowitz, F. Richman, E. Walker, Calculating maximum entropy densities for belief functions, Int. J. Uncertainty, Fuzziness Knowledge-Based Systems 2 (4) (1994) 377–389.

[19] P. Milne, Bruno de Finetti and the logic of conditional events, Brit. J. Philos. Sci. 48 (2) (1997) 195–232.

[20] F. Mosteller, C. Youtz, Quantifying probability expressions, Statist. Sci. 1 (15) (1990) 2–34.

[21] R.B. Nelsen, An Introduction to Copulas, Lecture Notes in Statistics No. 39, Springer, New York, 1999.

[22] H.T. Nguyen, E.A. Walker, On decision-making using belief functions, in: R. Yager, J.K. Kacprzyk, M. Fedrizzi (Eds.), Advances in the Dempster–Shafer Theory of Evidence, Wiley, New York, 1994, pp. 311–330.

[23] H.T. Nguyen, E. Walker, A First Course in Fuzzy Logic, second ed., Chapman & Hall/CRC, Boca Raton, 1999.

[24] H.T. Nguyen, V. Kreinovich, O. Kosheleva, Is the success of fuzzy logic really paradoxical?, Int. J. Intell. Sys. 5 (1996) 295–326.

[25] S.A. Orlowski, Calculus of Decomposable Properties, Fuzzy Sets and Decisions, Allerton Press, 1994.

[26] Z. Pawlak, Rough Sets, Kluwer Academic Publishers, Dordrecht, 1992.

[27] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, 1976.

[28] P. Walley, Statistical Reasoning with Imprecise Probabilities, Chapman & Hall, London, 1991.