Order Structure of Symbolic Assertion Objects



830 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 6, NO. 5, OCTOBER 1994

Concise Papers

Order Structure of Symbolic Assertion Objects

P. Brito

Abstract-We study assertion objects that constitute a particular class of symbolic objects. Symbolic objects constitute a data analysis driven formalism, which can be compared to propositional calculus, but which is oriented toward the duality intension (characteristic properties) versus extension (set of all individuals verifying a given set of properties). The set of assertion objects is endowed with a partial order and a quasi-order. We focus on the property of completeness, which precisely expresses the duality intension-extension. The order structure of complete assertion objects is studied, using notions of lattice theory and Galois connection, and extending Wille’s work to multiple-valued data. Two results are then obtained for particular cases.

Index Terms- Concept, intension, knowledge representation, lattice, symbolic object

I. INTRODUCTION

The need to process data that go beyond the classical tabular model of data analysis has led to the introduction of a new formalism of knowledge representation, based on the notion of symbolic object [7], [11]. Generally speaking, a symbolic object is a description by intension of a set of individuals which constitute its extension. This description is expressed by means of a conjunction of events in terms of the values taken by the variables. Several works have been reported in the area of symbolic data analysis (see, e.g., Diday [8], [9], [10]; Brito, Diday [5]; De Carvalho [6]; Jacq [15]; Lebbe [16]; Sebag [21]; and Vignes [22]). Among the symbolic objects introduced by Diday [7], we shall consider only assertion objects, that is, conjunctions of properties on the variables whose associated functions are applied to a single individual. The set of assertion objects is partially ordered by a “specialization-generalization” relation. This relation has also been considered by Mitchell [20] and Gascuel [14], among others. We have further introduced and studied union and intersection operators on assertion objects.

In this context, Michalski [18], [19] proposed a formalism of knowledge representation that is mainly based on propositional and predicate calculus. Unlike logic-based systems, the formalism presented here allows for an explicit interpretation within its framework by considering the duality intension-extension. This results from the wish to keep a statistical point of view. Nevertheless, some notions are common to both formalisms.

The main aim of this paper is the study of the property of “completeness,” which expresses the duality intension-extension. An assertion object is said to be complete if it describes its extension exhaustively, i.e., by giving all its properties, and if it is minimal, for the introduced order relation, among the objects fulfilling this condition [7], [11]. We formalize this notion in the framework of the theory of Galois connections [2], [3] and study the order structure of complete assertion objects. The notion of c-connection is introduced as being a pair of mappings $(f, g)$ between two partially ordered sets that should fulfill given conditions. A complete assertion object is then defined as a fixed point of the composed mapping $f \circ g$; this mapping is called a “completeness operator,” because it “completes” a given assertion object. We show that the set of complete assertion objects forms a lattice, and we state how suprema and infima are obtained. A point of departure of our approach was the work of the school of Darmstadt [13], [23]. We generalize this approach [23], based on a binary table, already proposed by Barbut and Monjardet [2]. Several different cases are then considered.

Manuscript received July 25, 1991; revised October 14, 1992. The author is with the Universidade de Aveiro, Departamento de Matemática, Campus Universitário de Santiago, 3800 Aveiro, Portugal, and the INRIA-Projet CLOREC, Rocquencourt, 78153 Le Chesnay, France.

IEEE Log Number 9214377.

II. THE SYMBOLIC OBJECTS

In general terms, a symbolic object is defined as a description that is expressed by means of a conjunction of statements on the values taken by the variables.

Let $\Omega$ be the set of observed individuals, $\Omega = \{w_1, \ldots, w_n\} \subseteq \Pi$, where $\Pi$ is the entire population under study. Each individual is characterized by variables $y_i: \Pi \to O_i$, $i = 1, \ldots, p$. Let $Y = (y_1, \ldots, y_p)$. We then have the following mapping:

$$Y: \Pi \to O_1 \times \cdots \times O_p, \qquad w \mapsto (y_1(w), \ldots, y_p(w)).$$

The point $(y_1(w), \ldots, y_p(w)) \in O = O_1 \times \cdots \times O_p$ is called “description of $w$,” and $O = O_1 \times \cdots \times O_p$ the description space (also called “event space” in the literature [18], [19]; the term “description space” has been chosen to stress the fact that we are dealing with descriptions of units of a population $\Pi$).

Given a symbolic object, we may consider the set of elements of $\Omega$ verifying it. This set is called the extension (on $\Omega$) of the symbolic object. A symbolic object can then be apprehended from two points of view: by intension, as the conjunction that defines it, and by extension, when we consider the set of observations that fulfill this conjunction. Hence, a symbolic object can be considered as a representation by intension of its extension (given $\Omega$).

A. The Elementary Events

An elementary event, denoted $e = [y_i = V_i]$, where $1 \le i \le p$ and $V_i \subseteq O_i$, expresses the condition, “Variable $y_i$ takes its values in $V_i$.” We then define the mapping associated to $e$ as follows:

$$e: O \to \{\text{true}, \text{false}\}$$

such that $e(x_1, \ldots, x_p) = \text{true}$ iff $x_i \in V_i$. Let $Y_\Omega$ denote the restriction of $Y$ to $\Omega$, and let us consider the composed mapping $e^* = e \circ Y_\Omega$,

$$e^*: \Omega \to \{\text{true}, \text{false}\}, \qquad w \mapsto e(Y_\Omega(w)).$$

$e^*(w) = \text{true}$ iff $y_i(w)$ is an element of $V_i$. We call extension of $e$ on $\Omega$ the set of elements of $\Omega$ whose description verifies $e$:

$$\mathrm{ext}_\Omega\, e = \{w \in \Omega \mid e^*(w) = \text{true}\} = e^{*-1}(\text{true}).$$

The virtual extension of $e$, on $O$, is defined as follows:

$$\mathrm{ext}_O\, e = \{x \in O \mid e(x) = \text{true}\} = e^{-1}(\text{true}).$$

The virtual extension is in fact the image by $Y$ of all imaginable individuals whose description would verify $e$.

1041-4347/94$04.00 © 1994 IEEE


Example 1: Let $\Omega = \{w_1, w_2, w_3, w_4\}$ be a set of observed objects, described by two variables, $y_1$ = color, $O_1$ = {blue, red, green, white, ...}; $y_2$ = size, $O_2$ = {big, medium, small}:

        y1      y2
w1      blue    medium
w2      white   big
w3      blue    big
w4      green   small

Consider the elementary event $e$ = [size = {big, medium}]. Then $\mathrm{ext}_\Omega\, e = \{w \in \Omega \mid e^*(w) = \text{true}\} = \{w \in \Omega \mid y_2(w) = \text{big or medium}\} = \{w_1, w_2, w_3\}$; $\mathrm{ext}_O\, e = \{x \in O \mid e(x) = \text{true}\}$ = {blue, red, ...} × {big, medium}.
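The extension computed in Example 1 can be reproduced with a minimal Python sketch. The names `OMEGA`, `elementary_event`, and `extension` are illustrative choices and do not come from the paper; an event is modeled as a predicate on descriptions.

```python
# Example 1 data: each observed object maps a variable name to its observed value.
OMEGA = {
    "w1": {"color": "blue", "size": "medium"},
    "w2": {"color": "white", "size": "big"},
    "w3": {"color": "blue", "size": "big"},
    "w4": {"color": "green", "size": "small"},
}


def elementary_event(variable, allowed):
    """Return the predicate e associated with the event [variable = allowed]."""
    allowed = frozenset(allowed)

    def e(description):
        # e(x) is true iff the value taken by `variable` lies in the allowed set.
        return description[variable] in allowed

    return e


def extension(event, omega):
    """ext_Omega e: the observed objects whose description verifies e."""
    return {name for name, description in omega.items() if event(description)}


e = elementary_event("size", {"big", "medium"})
print(sorted(extension(e, OMEGA)))  # ['w1', 'w2', 'w3']
```

The virtual extension is not computed here because $O_1$ is left open-ended in the example.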

B. The Assertion Objects

An assertion object is made up of a conjunction of elementary events and represents an element of $P(O_1) \times \cdots \times P(O_p)$.

Let $Y_1$ be a subset of $\{y_1, \ldots, y_p\}$. With no loss of generality, we may take $Y_1 = \{y_1, \ldots, y_q\}$, $q \le p$. An assertion object $a$, denoted as follows:

$$a = [y_1 = V_1] \wedge \cdots \wedge [y_q = V_q],$$

where $V_i \subseteq O_i$, $i = 1, \ldots, q$, expresses the condition, “Variable $y_1$ takes its values in $V_1$ and ... and variable $y_q$ takes its values in $V_q$.”

The mapping associated to $a$ is as follows:

$$a: O \to \{\text{true}, \text{false}\},$$

such that $a(x_1, \ldots, x_p) = \text{true}$ iff $\forall i = 1, \ldots, q$, $x_i \in V_i$. $a$ is hence the characteristic function of the set $V_1 \times \cdots \times V_q \times O_{q+1} \times \cdots \times O_p \in P(O_1) \times \cdots \times P(O_p)$.

We then consider the composed mapping $a^* = a \circ Y_\Omega$, as follows:

$$a^*: \Omega \to \{\text{true}, \text{false}\}, \qquad w \mapsto a(Y_\Omega(w)).$$

$a^*(w) = \text{true}$ iff its description is an element of the set $V_1 \times \cdots \times V_q \times O_{q+1} \times \cdots \times O_p$.

As in the case of elementary events, we define the extension of $a$ and its virtual extension. The extension of $a$ on $\Omega$ is defined as follows:

$$\mathrm{ext}_\Omega\, a = \{w \in \Omega \mid a^*(w) = \text{true}\} = a^{*-1}(\text{true}).$$

The virtual extension of $a$, on $O$, is defined as follows:

$$\mathrm{ext}_O\, a = \{x \in O \mid a(x) = \text{true}\} = a^{-1}(\text{true}).$$

Note that the absence of an elementary event concerning a variable $y_i$ expresses the idea that there is no condition imposed on it, and hence corresponds to the trivial elementary event $[y_i = O_i]$. So, to simplify, an elementary event of this form can be deleted from the conjunction, the obtained expression being functionally equivalent.

Example 2: Consider again the data of Example 1. We now define the following assertion object:

$a$ = [size = {big, medium}] $\wedge$ [color = {white}].

Then $\mathrm{ext}_\Omega\, a = \{w \in \Omega \mid a^*(w) = \text{true}\} = \{w \in \Omega \mid y_2(w) = \text{big or medium and } y_1(w) = \text{white}\} = \{w_2\}$; $Y(\mathrm{ext}_\Omega\, a)$ = {(white, big)}; $\mathrm{ext}_O\, a$ = {white} × {big, medium} = {(white, big), (white, medium)}.
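Example 2 can be checked with a short sketch that computes both the extension on the observed set and the virtual extension. Two assumptions are made for illustration: an assertion object is stored as a dictionary from variable name to allowed values (a missing variable means "no condition"), and $O_1$ is truncated to the four colors listed in Example 1 so that the description space is finite. All function names are hypothetical.

```python
from itertools import product

# Assumption: a finite description space; the paper leaves O_1 open-ended.
O = {
    "color": ["blue", "red", "green", "white"],
    "size": ["big", "medium", "small"],
}

OMEGA = {
    "w1": {"color": "blue", "size": "medium"},
    "w2": {"color": "white", "size": "big"},
    "w3": {"color": "blue", "size": "big"},
    "w4": {"color": "green", "size": "small"},
}


def verifies(assertion, description):
    """a(x): true iff every constrained variable takes a value in its allowed set."""
    return all(description[var] in allowed for var, allowed in assertion.items())


def ext_omega(assertion, omega):
    """Extension on the observed set Omega."""
    return {name for name, d in omega.items() if verifies(assertion, d)}


def ext_virtual(assertion, space):
    """Virtual extension: every point of the description space that verifies a."""
    variables = list(space)
    points = (dict(zip(variables, values)) for values in product(*space.values()))
    return {tuple(p[v] for v in variables) for p in points if verifies(assertion, p)}


a = {"size": {"big", "medium"}, "color": {"white"}}
print(ext_omega(a, OMEGA))         # {'w2'}
print(sorted(ext_virtual(a, O)))   # [('white', 'big'), ('white', 'medium')]
```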

Diday [7]-[10] only considers the mapping $a^*$ (which is hence, for this author, the assertion object) and the extension on the observed set $\Omega$. In general, the mapping that associates $a^*$ to each $a$ is not injective, since two different mappings $a$ may lead to the same mapping $a^*$, depending on the observed values of the variables. Notice, however, that if $Y(\Omega) = O$, then it is equivalent to consider $a$ or $a^*$.

Other kinds of objects, of higher complexity, have been defined (Diday [7]-[9]); they allow more complex data to be represented.

C. Symbolic Order

We now define a partial order on the set of assertion objects based on the virtual extensions. Let $S$ be the set of all assertion objects.

Definition: For any $a_1, a_2 \in S$, we say that $a_1 \le a_2$ if and only if $\mathrm{ext}_O\, a_1 \subseteq \mathrm{ext}_O\, a_2$. That is, the symbolic order is the partial order induced by the set inclusion relation on $P(O_1) \times \cdots \times P(O_p)$.

Now let “$\le$” be the symbolic order just defined, and let $\Omega$ be the set of observed objects. Consider the quasi-order $\le_\Omega$ defined by the following condition:

$$a_1 \le_\Omega a_2 \iff \mathrm{ext}_\Omega\, a_1 \subseteq \mathrm{ext}_\Omega\, a_2.$$

Notice that “$\le_\Omega$” is generally not an order; we may have $a_1 \le_\Omega a_2$ and $a_2 \le_\Omega a_1$ with $a_1 \ne a_2$. We say that “$\le_\Omega$” is the quasi-order generated by $\Omega$.

From the definitions of $\mathrm{ext}_\Omega\, a$ and $\mathrm{ext}_O\, a$, $a \in S$, it follows that

$$\forall a_1, a_2 \in S, \quad a_1 \le a_2 \implies a_1 \le_\Omega a_2.$$

Hence, “$\le_\Omega$” is a quasi-order compatible with the order “$\le$.”

For any $a_1, a_2 \in S$, we say that $a_1$ inherits from $a_2$ and that $a_2$ is more general than $a_1$ if and only if $a_1 \le a_2$. The assertion objects $a$ such that $a \ge a_1$ ($a \le a_1$) are the ascendants (the descendants) of $a_1$.

Example 3: In the case of Example 1, we have [size = {big}] $\le_\Omega$ [size = {big, medium}] and [size = {big}] $\le$ [size = {big, medium}], but [color = {red, blue}] $\le_\Omega$ [color = {blue}] whereas [color = {blue}] $\le$ [color = {red, blue}].
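The difference between the symbolic order and the quasi-order generated by $\Omega$ in Example 3 can be illustrated as follows. This is a sketch under stated assumptions: $O_1$ is truncated to four colors, and the symbolic order is tested as componentwise inclusion, which agrees with inclusion of virtual extensions as long as no value set is empty. The function names are illustrative.

```python
# Assumption: a finite description space; the paper leaves O_1 open-ended.
O = {
    "color": ["blue", "red", "green", "white"],
    "size": ["big", "medium", "small"],
}

OMEGA = {
    "w1": {"color": "blue", "size": "medium"},
    "w2": {"color": "white", "size": "big"},
    "w3": {"color": "blue", "size": "big"},
    "w4": {"color": "green", "size": "small"},
}


def ext_omega(assertion, omega):
    """Extension on Omega; a variable missing from the assertion is unconstrained."""
    return {
        name
        for name, d in omega.items()
        if all(d[var] in allowed for var, allowed in assertion.items())
    }


def leq_symbolic(a1, a2, space):
    """Symbolic order, tested as componentwise inclusion of the value sets."""
    return all(
        set(a1.get(var, values)) <= set(a2.get(var, values))
        for var, values in space.items()
    )


def leq_omega(a1, a2, omega):
    """Quasi-order generated by Omega: inclusion of the observed extensions."""
    return ext_omega(a1, omega) <= ext_omega(a2, omega)


big = {"size": {"big"}}
big_medium = {"size": {"big", "medium"}}
red_blue = {"color": {"red", "blue"}}
blue = {"color": {"blue"}}

print(leq_symbolic(big, big_medium, O), leq_omega(big, big_medium, OMEGA))  # True True
print(leq_omega(red_blue, blue, OMEGA))   # True: no red object is observed
print(leq_symbolic(red_blue, blue, O))    # False
print(leq_symbolic(blue, red_blue, O))    # True
```

Since `red_blue` and `blue` have the same observed extension, each is below the other for the quasi-order although they differ, which is why the quasi-order is not an order.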

D. Description of Classes

Suppose now that instead of a set of objects $\Omega \subseteq \Pi$, we consider classes of objects, $C_1, \ldots, C_n \in P(\Omega)$, of which we have global descriptions. That is, we do not know the individual description of each $w \in C_j$, $j = 1, \ldots, n$, but for each variable $y_i$, $i = 1, \ldots, p$, we know $y_i(C_j)$, $j = 1, \ldots, n$. For each class $C_j$, $y_1(C_j) \times \cdots \times y_p(C_j)$ can be represented by an assertion object $a_j = [y_1 = y_1(C_j)] \wedge \cdots \wedge [y_p = y_p(C_j)]$.

Now let $a$ be an assertion object, $a = [y_1 = V_1] \wedge \cdots \wedge [y_p = V_p]$, and let $A = \{a_1, \ldots, a_n\}$ be the set of assertion objects describing the $n$ classes, $a_j = [y_1 = W_1^j] \wedge \cdots \wedge [y_p = W_p^j]$. The extension of $a$ on $A$ can be defined as being the set of descendants of $a$ in $A$, $\mathrm{ext}_A\, a = \{a_j \in A \mid \forall i,\ W_i^j \subseteq V_i\}$, or as being the set of ascendants of $a$ in $A$, $\mathrm{ext}_A\, a = \{a_j \in A \mid \forall i,\ V_i \subseteq W_i^j\}$.

E. Union and Intersection of Assertion Objects

Given two assertion objects $a_1$ and $a_2$, we define their union, denoted by $a_1 \cup a_2$, as the minimum $a$ (for the symbolic order) such that $a_1 \le a$ and $a_2 \le a$. Dually, the intersection of $a_1$ and $a_2$, denoted $a_1 \cap a_2$, is the maximum $a$ such that $a \le a_1$ and $a \le a_2$.

Proposition 1: If $a_1 = [y_1 = V_1] \wedge \cdots \wedge [y_p = V_p]$, and $a_2 = [y_1 = W_1] \wedge \cdots \wedge [y_p = W_p]$ (where we may have $V_i = O_i$ and/or $W_i = O_i$ for some $i$), then $a_1 \cup a_2 = [y_1 = V_1 \cup W_1] \wedge \cdots \wedge [y_p = V_p \cup W_p]$ and $a_1 \cap a_2 = [y_1 = V_1 \cap W_1] \wedge \cdots \wedge [y_p = V_p \cap W_p]$.


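Proposition 2: The union and the intersection of assertion objects are commutative and associative. If $a = a_1 \cup a_2$, then $\mathrm{ext}_O\, a \supseteq (\mathrm{ext}_O\, a_1 \cup \mathrm{ext}_O\, a_2)$. But if $a = a_1 \cap a_2$, then $\mathrm{ext}_O\, a = (\mathrm{ext}_O\, a_1 \cap \mathrm{ext}_O\, a_2)$.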
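Propositions 1 and 2 can be illustrated on a two-variable binary description space, $O_1 = O_2 = \{0, 1\}$, which is the example used in the Appendix. The sketch below is only an illustration: the dictionary representation and the function names are assumptions, not the paper's notation.

```python
from itertools import product

# Description space O_1 = O_2 = {0, 1}.
O = {"y1": [0, 1], "y2": [0, 1]}


def full(a, space):
    """Make every variable explicit; a missing variable means [y_i = O_i]."""
    return {var: frozenset(a.get(var, values)) for var, values in space.items()}


def union(a1, a2, space):
    """Componentwise union of the value sets (Proposition 1)."""
    a1, a2 = full(a1, space), full(a2, space)
    return {var: a1[var] | a2[var] for var in space}


def intersection(a1, a2, space):
    """Componentwise intersection of the value sets (Proposition 1)."""
    a1, a2 = full(a1, space), full(a2, space)
    return {var: a1[var] & a2[var] for var in space}


def ext_virtual(a, space):
    """Virtual extension: the Cartesian product of the value sets."""
    a = full(a, space)
    return set(product(*(a[var] for var in space)))


a1 = {"y1": {0}, "y2": {1}}
a2 = {"y1": {1}, "y2": {0}}

u = union(a1, a2, O)
i = intersection(a1, a2, O)

# The union may strictly enlarge the virtual extension ...
print(sorted(ext_virtual(u, O)))                          # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(ext_virtual(a1, O) | ext_virtual(a2, O)))    # [(0, 1), (1, 0)]
# ... whereas the intersection matches it exactly.
print(ext_virtual(i, O) == ext_virtual(a1, O) & ext_virtual(a2, O))  # True
```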

Now let $g$ be the mapping that to a class of assertion objects associates their supremum:

$$g: P(S) \to S, \qquad C \mapsto g(C) = \bigcup_{a_i \in C} a_i.$$

Proposition 3: Let $A \subseteq S$, $B \subseteq S$. Then $g(\{g(A), g(B)\}) = g(A \cup B)$.

Proofs of propositions 1, 2, and 3 can be found in the Appendix.
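As an illustration of Proposition 3, the supremum mapping can be sketched as a componentwise union. The sketch assumes that every assertion object lists all of its variables explicitly; the sample families `A` and `B` are made up for the purpose of the check.

```python
def g(assertions):
    """Supremum of a family of assertion objects: componentwise union."""
    result = {}
    for a in assertions:
        for var, allowed in a.items():
            result.setdefault(var, set()).update(allowed)
    return result


# Hypothetical families of assertion objects, each listing every variable.
A = [
    {"color": {"blue"}, "size": {"big"}},
    {"color": {"red"}, "size": {"big"}},
]
B = [
    {"color": {"white"}, "size": {"small"}},
]

# Taking suprema class by class and then together gives the supremum of the union.
print(g([g(A), g(B)]) == g(A + B))  # True
```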

F. Relation to Other Formalizations

As already mentioned, the formalism of symbolic objects differs from VL1 and APC [18], [19] in its data analysis-driven conception, whereas VL1 and APC present a rather logic-based approach. This is made concrete by considering, together with the definition, the extension on the observed set. However, a parallel can be drawn between some notions of both formalisms: an elementary event corresponds to a selector in the VL1 language (also called “elementary expression” by Zagoruiko and Lbov [24]), and an assertion object, being defined as a conjunction of elementary events, can be associated to an l-complex, its virtual extension corresponding to the s-complex.

III. THE PROPERTY OF COMPLETENESS

A. C-Connections, Completeness, and Complete Assertion Objects

We formalize the property of completeness and study the order structure of complete assertion objects. In general terms, an assertion object is said to be complete if it describes exhaustively its extension, i.e., by giving all of its properties, and if it is minimal, for the symbolic order, to fulfill this condition [7], [11]. This notion is close to the definition of “intension of a concept” as it has been proposed and discussed by several authors (Arnault and Nicole [1], Wille [23], Duquenne [12]). Wille [23] develops Formal Concept Analysis in the framework of lattice theory. The notion of concept is then formalized by using the notions of closure operator and Galois connection [2], [3]. We generalize the notion proposed by Wille [23] for binary data. In this case, the “concepts” that we obtain are those that would be obtained by applying Wille’s approach to the “doubled” data set, resulting from considering the nonoccurrence of each attribute as a new attribute.

Let $\Omega$, as above, be the set of observed objects, and let $Y$ be the vector of variables, $\Omega = \{w_1, \ldots, w_n\}$, $Y = (y_1, \ldots, y_p)$. Let $S$ be the set of all possible assertion objects that can be defined, given the $p$ variables.

Consider, on the one hand, the following mapping:

$$f: S \to P(\Omega), \qquad a \mapsto \mathrm{ext}_\Omega\, a,$$

and, on the other hand, the following family of mappings:

$$g: P(\Omega) \to S, \qquad \{w_1, \ldots, w_k\} \mapsto g(\{w_1, \ldots, w_k\})$$

such that $f \circ g$ is extensive, that is, $\forall P \in P(\Omega)$, $P \subseteq f \circ g(P)$. Notice that $f$ is always isotone.

Consider now the following composed mappings:

$$h = g \circ f: S \to S$$

$$h' = f \circ g: P(\Omega) \to P(\Omega).$$

Definition: $f$ and $g$ are said to form a c-connection if the following conditions are met.

a) $h'$ is a closure operator on $P(\Omega)$, that is, $h'$ is extensive, isotone, and idempotent.

b) $h$ fulfills the following conditions:

b1) $\forall a \in S$, $h(a) \le_\Omega a$;

b2) $\forall a_1, a_2 \in S$, $a_1 \le_\Omega a_2 \implies h(a_1) \le_\Omega h(a_2)$; and

b3) $\forall a \in S$, $h(h(a)) = h(a)$,

where “$\le_\Omega$” denotes, as usual, the quasi-order generated by $\Omega$. That is, $h$ is anti-extensive, isotone, and idempotent with respect to the quasi-order generated by $\Omega$.

Definition: An assertion object $a \in S$ is said to be complete if and only if $h(a) = a$.

Definition: If $h$ fulfills conditions b2) and b3), we say that $h$ is a completeness operator. Since $h$ is idempotent by b3), $h(a)$ is always complete. Hence, $h$ can be seen as a mapping that “completes” a given assertion object.

Proposition 4: Let $a$ be a complete assertion object (i.e., $h(a) = a$), and let $E$ be its extension, $E = f(a)$. Then $g(E) = a$ and $E = h'(E)$. Conversely, if $E = h'(E)$ and $a = g(E)$, then $a$ is complete and $E = f(a)$.

Proof: $a$ complete $\implies a = h(a) = g \circ f(a) = g(f(a)) = g(E)$, and $h'(E) = f \circ g(E) = f(g(E)) = f(a) = E$. Conversely, $E = h'(E) = f \circ g(E) = f(g(E)) = f(a)$, and $h(a) = g \circ f(a) = g(f(a)) = g(E) = a \iff a$ is complete. □

Based on the notion of complete assertion object, we now introduce the definition of a concept. We generalize the definition presented by Wille [23] for the binary case, where a concept is defined as a pair $(A, B)$, where $A$ is a set of objects and $B$ is a set of attributes, such that $A$ is the set of all objects that have all of the attributes in $B$ and $B$ is the set of all attributes valid for all the objects in $A$. In the same line, we define a concept as a couple extension-intension, consisting of a class, together with its description, in the form of a complete assertion object.

Definition: Given a set of observed objects $\Omega$, a concept of $\Omega$ is defined as a couple $(E, a)$ such that $E \subseteq \Omega$, $a \in S$, $a$ is complete and $E = f(a)$.

So far, we have not given a precise expression for $g$. We now consider a particular mapping, which allows us to state some results.

In what follows, we consider the mapping $g$ that to a set of elements of $\Omega$ associates the supremum of their descriptions:

$$g: P(\Omega) \to S, \qquad \{w_1, \ldots, w_k\} \mapsto a = \bigwedge_i \Big[y_i = \bigcup_j y_i(w_j)\Big].$$

Then we can state Theorem 1.

Theorem 1: The set of all complete assertion objects is closed under union; that is, if $a_1$ and $a_2$ are complete, then $a_1 \cup a_2$ is complete.

Proof: In this case, $f$ and $g$ are both isotone, $h' = f \circ g$ is extensive, and $h = g \circ f$ is anti-extensive. It follows that the set of all complete assertion objects constitutes a lattice of open sets and is hence closed under union (cf. [17]). □

From Theorem 1, we know that the union of two complete assertion objects is still complete; the same, however, does not hold for intersection. The following theorem states how the supremum and the infimum of each pair of elements are obtained in the lattice of complete assertion objects.
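To make the completeness operator concrete, the following sketch applies $h = g \circ f$ to the data of Example 1, with $g$ taken as the supremum of the observed descriptions. It is a minimal illustration under assumed names (`f`, `g`, `h`, `OMEGA`, `VARIABLES`); single-valued variables are assumed, so each observed value is treated as a singleton.

```python
OMEGA = {
    "w1": {"color": "blue", "size": "medium"},
    "w2": {"color": "white", "size": "big"},
    "w3": {"color": "blue", "size": "big"},
    "w4": {"color": "green", "size": "small"},
}
VARIABLES = ("color", "size")


def f(a, omega=OMEGA):
    """Extension on Omega; a variable missing from `a` is unconstrained."""
    return frozenset(
        w for w, d in omega.items()
        if all(d[var] in allowed for var, allowed in a.items())
    )


def g(objects, omega=OMEGA):
    """Supremum of the descriptions: per variable, the set of observed values."""
    return {var: frozenset(omega[w][var] for w in objects) for var in VARIABLES}


def h(a):
    """Completeness operator h = g o f."""
    return g(f(a))


a = {"size": frozenset({"big", "medium"})}
completed = h(a)

print(sorted(f(a)))                  # ['w1', 'w2', 'w3']
print(sorted(completed["color"]))    # ['blue', 'white']
print(h(completed) == completed)     # True: h(a) is complete
print(f(completed) == f(a))          # True: completing keeps the extension
```

The completed object adds the condition on color that every object of the extension happens to satisfy, without changing the extension itself.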


Theorem 2: The set of all complete assertion objects constitutes a lattice for the symbolic order, where infima and suprema are given by the following:

$$\inf(a_1, a_2) = h(a_1 \cap a_2) = g \circ f(a_1 \cap a_2)$$

$$\sup(a_1, a_2) = a_1 \cup a_2.$$

Proof: From Theorem 1, we know that $a_1 \cup a_2$ is complete; $g \circ f(a_1 \cap a_2)$ is complete by construction. The following remains to be proved:

$$\nexists\, b \text{ complete} \mid g \circ f(a_1 \cap a_2) < b \le a_1;\ g \circ f(a_1 \cap a_2) < b \le a_2.$$

$b \le a_1$ and $b \le a_2 \implies b \le a_1 \cap a_2$. But $b \le a_1 \cap a_2 \implies h(b) \le h(a_1 \cap a_2)$, because $h$ is isotone and the quasi-order $\le_\Omega$ is compatible with the symbolic order. It follows that $h(a_1 \cap a_2) \le h(b) \le h(a_1 \cap a_2)$, and so $b = h(b) = h(a_1 \cap a_2)$. □

Diday [9] had already established that the set of complete assertion objects constitutes a lattice, but without making the link with the theory of Galois connections.

Theorem 3: The set of concepts of $\Omega$, with the order defined by $(E_1, a_1) \le (E_2, a_2) \iff E_1 \subseteq E_2$, is a lattice where infima and suprema are given by the following:

$$\inf((E_1, a_1), (E_2, a_2)) = (E_1 \cap E_2,\ (g \circ f)(a_1 \cap a_2))$$

$$\sup((E_1, a_1), (E_2, a_2)) = ((f \circ g)(E_1 \cup E_2),\ a_1 \cup a_2).$$

Proof: We need to prove only that a) $g(E_1 \cap E_2) = (g \circ f)(a_1 \cap a_2)$; b) $f(a_1 \cup a_2) = (f \circ g)(E_1 \cup E_2)$.

a) From Proposition 2, we know that $f(a_1 \cap a_2) = E_1 \cap E_2$, so $(g \circ f)(a_1 \cap a_2) = g(E_1 \cap E_2)$.

b) From Proposition 3, we know that $g(E_1 \cup E_2) = g(\{g(E_1), g(E_2)\}) = a_1 \cup a_2$. So, $(f \circ g)(E_1 \cup E_2) = f(a_1 \cup a_2)$. □
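The formulas of Theorems 2 and 3 can be checked by brute force on the small data set of Example 1. The sketch below is not a proof: it enumerates the concepts of that particular $\Omega$ and tests the infimum and supremum formulas on every pair. The representation (an assertion object as a tuple of value sets, one per variable) and all function names are assumptions made for the illustration.

```python
from itertools import combinations

# Example 1 data: descriptions as (color, size) tuples.
OMEGA = {
    "w1": ("blue", "medium"),
    "w2": ("white", "big"),
    "w3": ("blue", "big"),
    "w4": ("green", "small"),
}
P = 2  # number of variables


def f(a):
    """Extension on Omega of the assertion object a (a tuple of value sets)."""
    return frozenset(w for w, d in OMEGA.items() if all(d[i] in a[i] for i in range(P)))


def g(objects):
    """Supremum of the descriptions of a set of observed objects."""
    return tuple(frozenset(OMEGA[w][i] for w in objects) for i in range(P))


def leq(a, b):
    """Symbolic order, tested componentwise."""
    return all(a[i] <= b[i] for i in range(P))


def union(a, b):
    return tuple(a[i] | b[i] for i in range(P))


def intersection(a, b):
    return tuple(a[i] & b[i] for i in range(P))


def concepts():
    """All couples (E, a) with E = f(a) and a complete, found by closing every subset."""
    found = set()
    names = sorted(OMEGA)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            closed = f(g(subset))
            found.add((closed, g(closed)))
    return found


def check_lattice(all_concepts):
    """Test the inf/sup formulas of Theorems 2 and 3 on every pair of concepts."""
    complete = {a for _, a in all_concepts}
    for e1, a1 in all_concepts:
        for e2, a2 in all_concepts:
            sup_a = union(a1, a2)
            inf_a = g(f(intersection(a1, a2)))
            if sup_a not in complete or inf_a not in complete:
                return False
            if f(sup_a) != f(g(e1 | e2)) or f(inf_a) != e1 & e2:
                return False
            # inf_a must dominate every complete lower bound of a1 and a2.
            if any(leq(b, a1) and leq(b, a2) and not leq(b, inf_a) for b in complete):
                return False
    return True


ALL = concepts()
print(check_lattice(ALL))  # True
```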

C. When the Data Units Are Classes

Let us now consider the more general case where the data units are classes of objects $C_1, \ldots, C_n$, represented by assertion objects $a_1, \ldots, a_n$ (as exposed in Section II-D).

Let $A = \{a_1, \ldots, a_n\} \subseteq S$. We consider now the following mappings:

$$f: S \to P(A), \qquad a \mapsto \mathrm{ext}_A\, a$$

$$g: P(A) \to S$$

such that $f \circ g$ is extensive, that is, $\forall P \in P(A)$, $P \subseteq f \circ g(P)$. Let, as above:

$$h = g \circ f: S \to S$$

$$h' = f \circ g: P(A) \to P(A).$$

In fact, $h$ is just the identity in $S$ when $A$ is the set of all possible assertion objects ($A = S$).

A c-connection is now defined by replacing, in the previous definition, the quasi-order generated by $\Omega$, $\le_\Omega$, by the quasi-order generated by $A$, $\le_A$.

By choosing $f$ and $g$ in a suitable way, we obtain two different c-connections and the corresponding completeness operators.

Theorem 4 below states that if $f$ is defined as being the mapping that to an assertion object $a$ associates its descendants in $A$, and if $g$ is the supremum, then $g \circ f$ is a completeness operator.

Theorem 4: Let:

$$f: S \to P(A), \qquad a = \bigwedge_j [y_j = W_j] \mapsto \Big\{a_i = \bigwedge_j [y_j = V_j^i] \ \Big|\ V_j^i \subseteq W_j,\ j = 1, \ldots, p\Big\},$$

and

$$g: P(A) \to S, \qquad \{a_1, \ldots, a_k\} \mapsto a = \bigwedge_j \Big[y_j = \bigcup_i V_j^i\Big].$$

Then $f$ and $g$ constitute a c-connection and $h = g \circ f: S \to S$ is a completeness operator. Notice that $g \circ f \ne \mathrm{Id}$, because, in general, $A \ne S$.

Example 4: Let $p = 2$, $y_1$ = “color,” $y_2$ = “size,” $A = \{a_1, a_2, a_3, a_4\}$, with

$a_1 = [y_1 = \{\text{yellow}\}] \wedge [y_2 = \{\text{big}\}]$,
$a_2 = [y_1 = \{\text{blue, red}\}] \wedge [y_2 = \{\text{big, medium}\}]$,
$a_3 = [y_1 = \{\text{yellow, blue}\}] \wedge [y_2 = \{\text{medium, small}\}]$,
$a_4 = [y_1 = \{\text{red}\}] \wedge [y_2 = \{\text{big, medium}\}]$,

and let $a = [y_1 = \{\text{yellow, red, green}\}] \wedge [y_2 = \{\text{big, medium}\}]$. Then $f(a) = \{a_1, a_4\}$, $h(a) = g(f(a)) = [y_1 = \{\text{yellow, red}\}] \wedge [y_2 = \{\text{big, medium}\}] \ne a$.
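Example 4 can be reproduced with a short sketch of the two mappings of Theorem 4: `descendants` plays the role of $f$ and `supremum` the role of $g$. These names, and the dictionary representation of assertion objects, are illustrative assumptions.

```python
VARIABLES = ("color", "size")

# The four class descriptions of Example 4.
A = {
    "a1": {"color": {"yellow"}, "size": {"big"}},
    "a2": {"color": {"blue", "red"}, "size": {"big", "medium"}},
    "a3": {"color": {"yellow", "blue"}, "size": {"medium", "small"}},
    "a4": {"color": {"red"}, "size": {"big", "medium"}},
}


def descendants(a):
    """f(a): the elements of A whose value sets are all included in those of a."""
    return {name for name, b in A.items() if all(b[v] <= a[v] for v in VARIABLES)}


def supremum(names):
    """g(P): componentwise union of the value sets of the selected elements of A."""
    return {v: set().union(*(A[n][v] for n in names)) for v in VARIABLES}


def h(a):
    """Completeness operator h = g o f for this choice of f and g."""
    return supremum(descendants(a))


a = {"color": {"yellow", "red", "green"}, "size": {"big", "medium"}}

print(sorted(descendants(a)))   # ['a1', 'a4']
print(h(a) == {"color": {"yellow", "red"}, "size": {"big", "medium"}})  # True
print(h(h(a)) == h(a))          # True: h(a) is complete
```

Here $h(a)$ drops the value green, which no descendant of $a$ in $A$ uses.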

Proof: It is obvious that $f$ and $g$ are isotone mappings; that is, for all $a, b \in S$, $a \le b \implies f(a) \subseteq f(b)$, and for all subsets $P_1$ and $P_2$ of $A$, $P_1 \subseteq P_2 \implies g(P_1) \le g(P_2)$.

Let:

$$h = g \circ f: S \to S$$

$$h' = f \circ g: P(A) \to P(A).$$

Let us prove that $h$ is anti-extensive, that is, for all $a \in S$, $h(a) \le a$, and that $h'$ is extensive, that is, for all $P' \in P(A)$, $P' \subseteq h'(P')$.

i) $h$ is anti-extensive: Let $a = \bigwedge_j [y_j = W_j]$; $f(a) = \{a_i = \bigwedge_j [y_j = V_j^i] \mid V_j^i \subseteq W_j,\ j = 1, \ldots, p\}$; $g(f(a)) = \bigwedge_j [y_j = \bigcup_i V_j^i]$. $\forall j, \forall i$, $V_j^i \subseteq W_j$, and hence $\forall j$, $\bigcup_i V_j^i \subseteq W_j$, which implies that $\bigwedge_j [y_j = \bigcup_i V_j^i] \le \bigwedge_j [y_j = W_j]$, which proves that $h(a) \le a$.

ii) $h'$ is extensive. Let $P' = \{a_1, \ldots, a_k\}$, $a_i = \bigwedge_j [y_j = V_j^i]$; $g(P') = \bigwedge_j [y_j = \bigcup_i V_j^i]$; $f \circ g(P') = f(\bigwedge_j [y_j = \bigcup_i V_j^i])$. Let $a_l = \bigwedge_j [y_j = V_j^l] \in P'$. For all $j = 1, \ldots, p$, $V_j^l \subseteq \bigcup_i V_j^i \implies a_l \in f \circ g(P')$. Hence, $P' \subseteq h'(P')$.

It follows that $f$ and $g$ constitute a Galois connection between $(S, \ge)$ and $(P(A), \subseteq)$. $g$ is called the residuated mapping, and $f$ is the corresponding residual (cf. [17]); $h'$ is a closure operator on $P(A)$, and $h$ an anticlosure operator (anti-extensive, isotone, and idempotent) on $S$. In particular, we know that $h$ is idempotent, i.e., $h$ satisfies b3).

$h$ also satisfies b1), because the quasi-order $\le_A$ is, in this case, compatible with the symbolic order; hence, $h(a) \le a \implies h(a) \le_A a$.

The proof that $h$ satisfies b2) also follows from this compatibility; because $h$ is an anticlosure operator, $a \le b \implies h(a) \le h(b)$. Let $a$ and $b \in S$ be such that $a \le_A b$. Then $a \le b \implies h(a) \le h(b) \implies h(a) \le_A h(b)$, which proves b2).

We have hence proved that $f$ and $g$ form a c-connection and that $h = g \circ f: S \to S$ is a completeness operator. □

Theorem 5 states that if f is defined as being the mapping that to an assertion object a associates its ascendants in A, and if g is the infimum, then g o f is a completeness operator.

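Theorem 5: Now let:

$$f: S \to P(A), \qquad a = \bigwedge_j [y_j = W_j] \mapsto \Big\{a_i = \bigwedge_j [y_j = V_j^i] \ \Big|\ W_j \subseteq V_j^i,\ j = 1, \ldots, p\Big\},$$

and

$$g: P(A) \to S, \qquad \{a_1, \ldots, a_k\} \mapsto a = \bigwedge_j \Big[y_j = \bigcap_i V_j^i\Big].$$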


Then $f$ and $g$ constitute a c-connection, and $h = g \circ f: S \to S$ is a completeness operator.

Notice that in this case, we obtain elementary events of the form $[y_j = \emptyset]$, which should be interpreted as “any value or set of values,” because in this case, $f(a)$ is the set of the ascendants of $a$ in $A$, and $\forall V_j$, $\emptyset \subseteq V_j$; hence $\forall a_i \in A$, $a_i \in f([y_j = \emptyset])$.

Example 5: Consider again the data of the previous example, and let $a = [y_1 = \{\text{red}\}] \wedge [y_2 = \{\text{medium}\}]$. Then $f(a) = \{a_2, a_4\}$, $h(a) = g(f(a)) = [y_1 = \{\text{red}\}] \wedge [y_2 = \{\text{big, medium}\}] \ne a$.

Proof: We shall prove that in this case, $f$ and $g$ form a Galois connection between $(S, \le)$ and $(P(A), \subseteq)$, where “$\le$” denotes the symbolic order.

From their definitions, it follows immediately that $f$ and $g$ are antitone; i.e., for all $a, b \in S$, $a \le b \implies f(b) \subseteq f(a)$, and for all subsets $P_1$ and $P_2$ of $A$, $P_1 \subseteq P_2 \implies g(P_2) \le g(P_1)$.

Let, then,

$$h = g \circ f: S \to S$$

$$h' = f \circ g: P(A) \to P(A),$$

and let us prove that $h$ and $h'$ are extensive; i.e., for all $a \in S$, $a \le h(a)$, and for all $P' \in P(A)$, $P' \subseteq h'(P')$.

i) Let $a \in S$, $a = \bigwedge_j [y_j = W_j]$, and let us prove that $a \le h(a)$. $h(a) = g \circ f(a) = g(\{a_1, \ldots, a_k\})$ such that if $a_i = \bigwedge_j [y_j = V_j^i]$, then $W_j \subseteq V_j^i$, $j = 1, \ldots, p$. Then $g(\{a_1, \ldots, a_k\}) = \bigwedge_j [y_j = T_j]$ such that for all $j = 1, \ldots, p$, $T_j \supseteq W_j$. Hence, $a \le g \circ f(a) = h(a)$.

ii) Let $P' \in P(A)$, $P' = \{a_1, \ldots, a_k\}$, $a_i = \bigwedge_j [y_j = V_j^i]$, and let us prove that $P' \subseteq h'(P')$. $g(P') = \bigwedge_j [y_j = \bigcap_i V_j^i]$. Let $a_l = \bigwedge_j [y_j = V_j^l] \in P'$. For all $j = 1, \ldots, p$, $\bigcap_i V_j^i \subseteq V_j^l \implies a_l \in f(g(P'))$. Hence, $P' \subseteq h'(P')$.

We have proved that $f$ and $g$ form a Galois connection; $h$ and $h'$ are hence closure operators [2], [3]. In particular, $h$ is idempotent; that is, $h$ satisfies b3).

$h$ also satisfies b1), because the quasi-order $\le_A$ is, in this case, the reverse of the symbolic order: $a \le b \implies b \le_A a$.

The proof that $h$ satisfies b2) also follows from the fact that the order has been reversed:

Since $h$ is a closure operator, $a \le b \implies h(a) \le h(b)$.

Let, then, $a$ and $b \in S$ be such that $a \le_A b$. Then $b \le a \implies h(b) \le h(a) \implies h(a) \le_A h(b)$, which proves b2).

Hence, $f$ and $g$ constitute a c-connection and $h = g \circ f: S \to S$ is a completeness operator.
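Example 5 can be reproduced in the same way, with `ascendants` in the role of $f$ and `infimum` in the role of $g$ for Theorem 5. The names are illustrative, and the sketch deliberately leaves the infimum of an empty family undefined, since the text does not specify it.

```python
VARIABLES = ("color", "size")

# The four class descriptions of Example 4, reused in Example 5.
A = {
    "a1": {"color": {"yellow"}, "size": {"big"}},
    "a2": {"color": {"blue", "red"}, "size": {"big", "medium"}},
    "a3": {"color": {"yellow", "blue"}, "size": {"medium", "small"}},
    "a4": {"color": {"red"}, "size": {"big", "medium"}},
}


def ascendants(a):
    """f(a): the elements of A whose value sets all contain those of a."""
    return {name for name, b in A.items() if all(a[v] <= b[v] for v in VARIABLES)}


def infimum(names):
    """g(P): componentwise intersection of the value sets of the selected elements."""
    names = list(names)
    if not names:
        # Assumption of this sketch: the empty family is not handled.
        raise ValueError("infimum of an empty family is not covered by this sketch")
    return {v: set.intersection(*(A[n][v] for n in names)) for v in VARIABLES}


def h(a):
    """Completeness operator h = g o f for this choice of f and g."""
    return infimum(ascendants(a))


a = {"color": {"red"}, "size": {"medium"}}

print(sorted(ascendants(a)))   # ['a2', 'a4']
print(h(a) == {"color": {"red"}, "size": {"big", "medium"}})  # True
print(h(h(a)) == h(a))         # True: h(a) is complete
```

In contrast with Theorem 4, this operator enlarges the value sets: $h(a)$ is the most specific description shared by all ascendants of $a$ in $A$.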

IV. CONCLUSION AND PERSPECTIVES

A formalism of knowledge representation is presented, which is based on the duality intension-extension of a symbolic object. The order structure of complete assertion objects is studied in the framework of lattice theory and Galois connection, by introducing the notions of c-connection and completeness operator. Two major results are then obtained.

Perspectives of future development concern mainly the use of the obtained results in clustering methods and the study of equivalent structures for other kinds of symbolic objects.

APPENDIX

Proof of Proposition 1: An assertion object being in fact a characteristic function on $O_1 \times \cdots \times O_p$, the set of assertion objects can be identified with the Cartesian product $P(O_1) \times \cdots \times P(O_p)$. Now infimum and supremum on Cartesian products are defined as Cartesian products of infima and suprema [2]; on the other hand, infimum and supremum with respect to set inclusion are, respectively, intersection and union. That is,

$a_1 \cap a_2 = \inf(a_1, a_2) = \inf([y_1 = V_1] \wedge \cdots \wedge [y_p = V_p], [y_1 = W_1] \wedge \cdots \wedge [y_p = W_p]) = [y_1 = \inf(V_1, W_1)] \wedge \cdots \wedge [y_p = \inf(V_p, W_p)] = [y_1 = V_1 \cap W_1] \wedge \cdots \wedge [y_p = V_p \cap W_p]$,

and

$a_1 \cup a_2 = \sup(a_1, a_2) = \sup([y_1 = V_1] \wedge \cdots \wedge [y_p = V_p], [y_1 = W_1] \wedge \cdots \wedge [y_p = W_p]) = [y_1 = \sup(V_1, W_1)] \wedge \cdots \wedge [y_p = \sup(V_p, W_p)] = [y_1 = V_1 \cup W_1] \wedge \cdots \wedge [y_p = V_p \cup W_p]$.
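The componentwise character of the infimum and supremum can be sketched in a few lines of Python. The representation of an assertion object as a tuple of frozensets, the names `meet` and `join`, and the sample objects are assumptions chosen for illustration only.

```python
from typing import FrozenSet, Tuple

# Illustrative encoding (an assumption): one frozenset of values per variable.
Assertion = Tuple[FrozenSet[int], ...]


def meet(a1: Assertion, a2: Assertion) -> Assertion:
    """Infimum: intersect the value sets variable by variable."""
    return tuple(v & w for v, w in zip(a1, a2))


def join(a1: Assertion, a2: Assertion) -> Assertion:
    """Supremum: unite the value sets variable by variable."""
    return tuple(v | w for v, w in zip(a1, a2))


# Hypothetical objects on p = 2 variables.
a1 = (frozenset({0}), frozenset({1}))
a2 = (frozenset({0, 1}), frozenset({1, 2}))

inf_a = meet(a1, a2)
sup_a = join(a1, a2)
print(inf_a == a1, sup_a == a2)  # here a1 <= a2, so inf is a1 and sup is a2
```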

Proof of Proposition 2: a) Commutativity and associativity derive trivially from commutativity and associativity of set union and intersection.

b) If $a = a_1 \cup a_2$, then $\mathrm{ext}_O a \supseteq \mathrm{ext}_O a_1 \cup \mathrm{ext}_O a_2$, by definition. We may, however, have $\mathrm{ext}_O a \supset \mathrm{ext}_O a_1 \cup \mathrm{ext}_O a_2$ (strictly). Consider the following example. Let $p = 2$, $O_1 = O_2 = \{0, 1\}$, and consider the assertion objects $a_1 = [y_1 = \{0\}] \wedge [y_2 = \{1\}]$ and $a_2 = [y_1 = \{1\}] \wedge [y_2 = \{0\}]$. We then have $a = a_1 \cup a_2 = [y_1 = \{0, 1\}] \wedge [y_2 = \{0, 1\}]$; $\mathrm{ext}_O a = \{(0,0), (0,1), (1,0), (1,1)\} \neq \mathrm{ext}_O a_1 \cup \mathrm{ext}_O a_2 = \{(0,1), (1,0)\}$.

If $a = a_1 \cap a_2$, it follows from the definition that $\mathrm{ext}_O a \supseteq \mathrm{ext}_O a_1 \cap \mathrm{ext}_O a_2$. Conversely, $\mathrm{ext}_O a \subseteq \mathrm{ext}_O a_1 \cap \mathrm{ext}_O a_2$: $a$ is the conjunction of elementary events $e_j$ such that $e_j \le a_i$, $i = 1, 2$ (at most, $a = a_1$ or $a = a_2$). So, $a = \wedge_j e_j \le a_i$, $i = 1, 2$, and hence $\mathrm{ext}_O a \subseteq \mathrm{ext}_O a_1 \cap \mathrm{ext}_O a_2$. □
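The counterexample in part b) can be checked mechanically. The sketch below enumerates the extension of an assertion object in $O = O_1 \times O_2$; the helper `ext`, the tuple-of-sets encoding, and the second pair of objects `b1`, `b2` are assumptions introduced only for this illustration.

```python
from itertools import product


def ext(a, domains):
    """Extension in O = O_1 x ... x O_p: descriptions whose j-th value lies in a[j]."""
    return {x for x in product(*domains)
            if all(x[j] in a[j] for j in range(len(domains)))}


# The example of the proof: p = 2 and O_1 = O_2 = {0, 1}.
domains = [(0, 1), (0, 1)]
a1 = ({0}, {1})
a2 = ({1}, {0})

# Union of assertion objects: componentwise union of the value sets.
union = tuple(v | w for v, w in zip(a1, a2))
ext_union = ext(union, domains)
ext_separate = ext(a1, domains) | ext(a2, domains)
print(ext_union > ext_separate)  # strict inclusion of the separate extensions

# Intersection (hypothetical second pair): the extensions coincide.
b1 = ({0, 1}, {1})
b2 = ({0}, {0, 1})
inter = tuple(v & w for v, w in zip(b1, b2))
print(ext(inter, domains) == ext(b1, domains) & ext(b2, domains))
```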

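Proof of Proposition 3: The proof is based on the associativity of set union. Let $A = \{s_1, s_2, \ldots, s_k\}$ and $B = \{t_1, t_2, \ldots, t_m\}$, where:

$s_i = \wedge_j [y_j = V_j^i]$; $t_h = \wedge_j [y_j = W_j^h]$.

$g(A \cup B) = \wedge_j [y_j = \bigcup_{i,h} (V_j^i \cup W_j^h)]$;

$g(A) = \wedge_j [y_j = \bigcup_{i=1}^{k} V_j^i]$;

$g(B) = \wedge_j [y_j = \bigcup_{h=1}^{m} W_j^h]$;

$g(\{g(A), g(B)\}) = \wedge_j [y_j = (\bigcup_{i=1}^{k} V_j^i) \cup (\bigcup_{h=1}^{m} W_j^h)] = g(A \cup B)$. □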
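The identity $g(A \cup B) = g(\{g(A), g(B)\})$ can be illustrated with a small Python sketch in which $g$ takes the componentwise union of the value sets. The encoding, the parameter `p`, and the sample sets `A` and `B` are assumptions made for this example.

```python
from typing import FrozenSet, Sequence, Tuple

# Illustrative encoding (an assumption): one frozenset of values per variable.
Assertion = Tuple[FrozenSet[int], ...]


def g(objects: Sequence[Assertion], p: int) -> Assertion:
    """Componentwise union of the value sets of all objects (p variables)."""
    result = [frozenset() for _ in range(p)]
    for s in objects:
        result = [r | v for r, v in zip(result, s)]
    return tuple(result)


# Hypothetical sets of assertion objects on p = 2 variables.
A = [(frozenset({0}), frozenset({1})), (frozenset({1}), frozenset({1}))]
B = [(frozenset({2}), frozenset({0})), (frozenset({0}), frozenset({3}))]

direct = g(A + B, 2)               # g applied to A union B
staged = g([g(A, 2), g(B, 2)], 2)  # g applied to {g(A), g(B)}
print(direct == staged)
```

Because set union is associative and commutative, grouping the objects before applying $g$ does not change the result, which is the content of the proposition.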

REFERENCES

[1] A. Arnauld and P. Nicole, La Logique ou l’Art de Penser, Rev. Ed. Paris, France: Flammarion, 1970.

[2] M. Barbut and B. Monjardet, Ordre et Classification, Algèbre et Combinatoire, Vols. I, II. Paris, France: Hachette, 1970.

[3] G. Birkhoff, Lattice Theory, 3rd Ed. Providence, RI: American Mathematical Society Colloquium Publications, 1967.

[4] P. Brito, “Analyse de données symboliques: Pyramides d’héritage,” Thèse, Université Paris IX-Dauphine, Paris, France, 1991.

[5] P. Brito and E. Diday, “Pyramidal representation of symbolic objects,” in M. Schader and W. Gaul, Eds., Knowledge, Data and Computer-Assisted Decisions. Berlin: Springer-Verlag, 1990, pp. 3-16.

[6] F. A. T. De Carvalho, J. Lebbe, R. Vignes, and E. Diday, “Dissimilarity in symbolic data analysis,” in Proc. COMPSTAT, 9th Symp. on Computational Statistics, Dubrovnik, Yugoslavia, 1990.

[7] E. Diday, “The symbolic approach in clustering and related methods of data analysis,” in H. H. Bock, Ed., Classification and Related Methods of Data Analysis. Amsterdam: North-Holland, 1987, pp. 673-684.

[8] E. Diday, “Knowledge representation and symbolic data analysis,” in M. Schader and W. Gaul, Eds., Knowledge, Data and Computer-Assisted Decisions. Berlin: Springer-Verlag, 1990, pp. 17-34.

[9] E. Diday, “Des objets de l’analyse de données à ceux de l’analyse des connaissances,” in Y. Kodratoff and E. Diday, Eds., Induction Symbolique-Numérique à partir de Données. Paris, France: Cepadues, 1991.

[10] E. Diday, “Towards a statistics of intension for knowledge analysis,” in Proc. WOCFAI, 1st World Conf. Fundamentals of Artificial Intell., Paris, France, 1991.

[11] E. Diday and P. Brito, “Introduction to symbolic data analysis,” in O. Opitz, Ed., Conceptual and Numerical Analysis of Data. New York: Springer-Verlag, 1989, pp. 45-84.

[12] V. Duquenne, “Contextual implications between attributes and some representation properties for finite lattices,” Rapport C.A.M.S. P.023, Maison des Sciences de l’Homme, Paris, 1986.

[13] B. Ganter, “Two basic algorithms in concept analysis,” preprint 831, Technische Hochschule Darmstadt, Darmstadt, Germany, 1984.

[14] O. Gascuel, “Inductive learning, numerical criteria and combinatorial optimization: Some results,” in E. Diday, Ed., Data Analysis, Learning Symbolic and Numerical Knowledge. New York: Nova Science, 1989, pp. 417-424.

[15] C. Jacq, “Combining a decision tree with kernel estimates,” in E. Diday and Y. Lechevallier, Eds., Symbolic-Numeric Data Analysis and Learning. New York: Nova Science, 1991.

[16] J. Lebbe, “Représentation des concepts en biologie et en médecine,” Thèse, Université Paris-VI, Paris, 1991.

[17] B. Leclerc, “The residuation model for ordinal construction of dissimilarities and other valued objects,” Rapport C.A.M.S. P.063, Maison des Sciences de l’Homme, Paris, 1990.

[18] R. S. Michalski, “A variable-valued logic system as applied to picture description and recognition,” in Proc. IFIP Working Conf. Graphic Languages, Vancouver, BC, Canada, 1972.

[19] R. S. Michalski, “A theory and a methodology of inductive learning,” in R. S. Michalski, J. G. Carbonell, T. M. Mitchell, Eds., Machine Learning I. Berlin: Springer Verlag, 1984, pp. 83-134.

[20] T. Mitchell, “Generalization as search,” Artificial Intell., vol. 18, pp. 203-226, 1982.

[21] M. Sebag and M. Schoenauer, “Incremental learning of rules and meta rules,” in B. Porter and R. Mooney, Eds., Proc. 7th Int. Conf. Machine Learning. San Mateo, CA: Morgan Kaufmann, 1990.

[22] R. Vignes, “Caractérisation automatique de groupes biologiques,” Thèse, Université Paris-VI, Paris, 1991.

[23] R. Wille, “Restructuring lattice theory: An approach based on hierarchies of concepts,” in Ordered Sets, I. Rival Ed. Dordrecht-Boston: Reidel, 1982, pp. 445-470.

[24] N. G. Zagoruiko and G. S. Lbov, “Algorithms of pattern recognition in a package of applied programs Oteks,” in IJCPR, no 4, Kyoto, Japan, 1978.

Materialization

Robert C. Goldstein and Veda C. Storey

Abstract-A new data abstraction, called materialization, is introduced to model a situation that occurs frequently in the real world and has important implications for database design. Materialization is the relationship between two entity types, one that represents a conceptual object, for example, a TV Model, and one that represents its corresponding concrete objects, in this case, actual TV Sets. The materialization construct is formally defined and contrasted with other well-known data abstractions. Its design implications are presented in terms of the entity-relationship model and its translation into a relational model. Guidelines are offered for the proper employment of this relationship in database design methodologies, and a discussion is provided of why this constitutes an important data modeling construct.

Index Terms-Data abstraction, database design, inheritance, materialization, normalization, semantic data models

Manuscript received July 5, 1991; revised August 14, 1992, and April 8, 1993. This work was supported by the William E. Simon Graduate School of Business Administration, University of Rochester, NY, USA, and by the Information Systems Research Bureau, Faculty of Commerce and Business Administration, University of British Columbia, Vancouver, BC, Canada.

R. C. Goldstein is with the Faculty of Commerce and Business Administration (MIS Division), University of British Columbia, Vancouver, BC V6T 1Z2 Canada; e-mail: [email protected].

V. C. Storey is with the William E. Simon Graduate School of Business Administration, University of Rochester, Rochester, NY 14627 USA.

IEEE Log Number 9212682.

I. INTRODUCTION

The objective of database design is to model some aspect of reality and represent it using the constructs of a database management system. Included in this process is the need to capture and represent the semantics of the real-world application for which the database is being designed. In an attempt to improve the expressiveness of database designs, semantic data models have been developed that use abstractions to represent significant semantic relationships. The most common data abstractions are inclusion (subtype-supertype), aggregation (part-whole), and association (member-set).

A class of relationships that is not accommodated by existing abstractions is the relationship between a conceptual entity and its concrete manifestations. In this research, a new abstraction is introduced to model this type of relationship. Suppose, for example, one were developing an inventory database for a video rental store. The objective of the database is to keep track of information on each video; for example, title, director, year of release, and whether it is out on rental. Although videos are obviously the things that are kept track of in the database, there is also a movie object that is relevant. A customer is interested in renting a particular movie and is not concerned with which copy of the movie he obtains. Furthermore, many of the attributes that would be stored in this database would have the same values for all of the videos of a particular movie. Finally, for some purposes, such as when the customer makes a selection, the various videos corresponding to a movie are interchangeable; for other purposes, such as inventory management, each one must be kept track of separately. Other examples of relationships with these characteristics include those between TV models and TV sets, courses and sections, and books and copies.

This abstraction, which occurs with surprising frequency in the real world, might naturally be called “instance-of.” However, because this term is commonly used for other special relationships, the abstraction discussed here is called materialization.

The objectives of this paper are to introduce and formally define materialization as a new data abstraction, to contrast it with other abstractions in order to illustrate how it effectively represents certain commonly occurring relationships with well-defined semantics that are not satisfactorily captured by these other abstractions, and to discuss issues involved in implementing the materialization abstraction. The paper is divided into five sections. Section II discusses related work. Materialization is presented in detail in Section III, and its design implications are analyzed in Section IV. Section V summarizes and concludes the paper.

II. RELATED WORK

A. Semantic Data Models

One of the goals of database research is to find ways to capture real world information in an appropriate model [21]. A number of semantic data models have been developed that aim at providing increased expressiveness. These models are referred to as semantic, because they allow the representation of more semantic content than the relational model [17]. One tool that is typically employed in these data models is a set of high-level abstractions with very well-defined semantics [12]. That is, these models provide richer, more expressive concepts with which to capture meaning than was possible using classical data models [3].


1041-4347/94$04.00 © 1994 IEEE