Mapping Considerations in the Design of Schemas - CiteSeerX



IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-7, NO. 1, JANUARY 1981

[5] C. Beeri and P. A. Bernstein, "Computational problems related tothe design of normal form relational schemas," ACM Trans. Data-base Syst., vol. 4, no. 1, pp. 30-59, 1979.

[61 C. J. Date, An Introduction to Database Systems. Reading, MA:Addison-Wesley, 1975.

[7] R. Fagin, "The decomposition versus synthetic approach to rela-tional database design," in Proc. 3rd Int. Conf: VLDB, Tokyo,Japan, October 1977, pp. 441-446.

[8] C. Delobel and R. G. Casey, "Decomposition of a database andtheory of boolean switching functions," IBM J. Res. Develop.,vol. 17, pp. 374-386, Sept. 1972.

[9] E. F. Codd, "Further normalization of the database relationalmodel," in Data Base Systems, Courant Computer ScienceSymposia Series, vol. 6. Englewood Cliffs, NJ: Prentice-Hall,1972, pp. 33-64.

[10] M. Vetter, "Database design by applied data synthesis," inProc.3rd Int. Conf. VLDB, Tokyo, Japan, Oct. 1977, pp. 428-440.

[11] Y. E. Lien, "On the semantics of the entity-relationship model,"in Entity-Relationship Approach to Systems Analysis and Design,P. P. S. Chen, Ed. Amsterdam, The Netherlands: North-Holland,1980, pp. 155-168.

[12] M. A. Melkanoff and C. Zaniolo, "Decomposition of relationsand synthesis of entity-relationship diagrams," inEntity-Relation-ship Approach to Systems Analysis and Design, P. P. S. Chen, Ed.Amsterdam, The Netherlands: North-Holland, 1980, pp. 277-294.

[13] H. Sakai, "A unified approach to the logical design of a hierarchi-cal data model," Entity-Relationship Approach to SystemsAnalysis and Design, P. P. S. Chen, Ed. Amsterdam, The Nether-lands: North-Holland, 1980, pp. 6 1-74.

[14] P. A. Ng and J. F. Paul, "A formal definition of entity-relation-ship model." Entity-Relationship Approach to Systems Analysisand Design, P. P. S. Chen, Ed. Amsterdam, The Netherlands:North-Holland, 1980, pp. 211-230.

[15] Entity-Relationship Approach to Systems Analysis and Design,P. P. S. Chen, Ed. Amsterdam: The Netherlands: North-Holland,1980.

Peter A. Ng received the B.Sc. degree in mathematics from St. EdwardUniversity, Austin, TX, in 1969, and the Ph.D. degree in computersciences from the University of Texas, Austin, in 1974.He served two years as Assistant Professor of Computer Sciences at

Hunter College, City University of New York, New York, NY, from1975 to 1976. In August 1976 he joined the Department of ComputerScience, University of Missouri, Columbia, where he is presently anAssociate Professor. His current major research interests are informa-tion systems, database management system, data communications, andseveral aspects of software engineering. He has published over 30 tech-nical papers on these fields in intemational journals and conferenceproceedings. He is an industrial Consultant on these areas.

Dr. Ng is a member of the Association for Computing Machinery andthe IEEE Computer Society.

Mapping Considerations in the Design of Schemas for the Relational Model

SABAH AL-FEDAGHI AND PETER SCHEUERMANN

Abstract-The typical design process for the relational database model develops the conceptual schema and each of the external schemas separately and independently of one another. This paper proposes a new design methodology that constructs the conceptual schema in such a way that overlappings among external schemas are reflected. If the overlappings of external schemas do not produce transitivity at the conceptual level, then with our design method the relations in the external schemas can be realized as a join over independent components. Thus, a one-to-one function can be defined for the mapping from tuples in the external schemas to tuples in the conceptual schema. If transitivity is produced, then we show that no such function is possible, and a new technique is introduced to handle this special case.

Index Terms-Conceptual schema, external schema, functional dependencies, independent components, interferences, logical database design, mapping functions, relational database model.

Manuscript received February 1, 1980; revised June 11, 1980. This work was supported in part by the National Science Foundation under Grant MCS77-03904.

The authors are with the Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60201.

I. INTRODUCTION

In the relational database model, a given external view $V_k$ and the conceptual view $V$ are described by sets of relations. The set of relations in the conceptual view constitutes the conceptual schema and the set of relations in a given external view constitutes an external schema. Given a relational model, let $E = \{E_1, E_2, \ldots, E_n\}$ be the set of the external schemas and $C$ be the conceptual schema of the model. The mapping problem covers several transformations between a given $E_k \in E$ and $C$. Let $E_k = \{e_1^k, e_2^k, \ldots, e_v^k\}$ and $C = \{c_1, c_2, \ldots, c_u\}$, where $e_i^k$, $1 \le i \le v$, is a relation in $E_k$ and $c_j$, $1 \le j \le u$, is a relation in $C$. Several mappings are of interest.

1) Relations Mapping: $\alpha_1(e_i^k) = C' \subseteq C$. That is, $\alpha_1$ constructs the relation $e_i^k \in E_k$ from the relations $C'$ in $C$.

2) Tuples Mapping: $\alpha_2(t(e_i^k))$, where $t(e_i^k)$ is a tuple in $e_i^k$. The function $\alpha_2$ maps a tuple of $e_i^k$ to a tuple or a set of tuples in $C' \subseteq C$.

0098-5589/81/0100-0099$00.75 © 1981 IEEE


3) Operations Mapping: $\gamma(t(e_i^k))$, where $\gamma$ maps the set of operators on $t(e_i^k)$ in $V_k$ (e.g., delete) to a set of operators on the tuples of $C'$ defined by $\alpha_2$. Alternatively, $\gamma$ can be defined in terms of operations on relations instead of tuples (e.g., join).

The normalization process in the relational model [12]-[14], [17] has been developed to avoid certain types of update anomalies in the relations. This normalization is usually performed without regard to its effect on the mapping procedures between external schemas and the conceptual schema. The schemas are designed independently of each other, which makes it very difficult to identify the required mappings between the external schemas and the conceptual schema.

The following example illustrates this difficulty. Suppose that there are two external views represented by their external schemas:

$V_1$: $E_1 = \{e_1^1(\text{employee}, \text{health information})\}$
$V_2$: $E_2 = \{e_1^2(\text{employee}, \text{personnel information})\}$

and the conceptual schema:

$C = \{c_1(\text{employee}, \text{personnel information}, \text{health information})\}$.

Suppose that $V_1$ is the view corresponding to the health department and $V_2$ is the view corresponding to the personnel department in a company. Suppose that a certain employee has retired. Thus, at $V_2$ it is no longer necessary to keep a record of the employee. The user at $V_2$ would input DELETE $t(e_1^2)$, where $t(e_1^2)$ denotes a tuple in $e_1^2$. If the views $V_1$ and $V_2$ are supported by performing the appropriate projections on $c_1$, then the result of the above operation is to delete the corresponding tuple in $c_1$. Thus, an operation in one external view affects information that belongs solely to another external view; in this case the health information that is relevant to $V_1$ is also deleted. This was called an "interference" by Paolini and Pelagatti [23].

Paolini and Pelagatti [23] argued that the "interferences" among the users' views are the reason for the difficulty in defining the mapping. They proposed a formal model of the mapping between an external schema and the conceptual schema using the concept of "many sorted algebras." They specified some mapping functions; however, these mapping functions do not completely characterize the problem. They reached the conclusion that additional information specifying the interference among users is required.

Generally, the conceptual schema in the relational model (in fact, in all database models) is designed independently of, and even before, the design of the external schemas. This method causes mapping problems such as the ones discussed by Paolini and Pelagatti [23]. The resultant design makes it very difficult to identify interferences among external schemas. In this paper a new design methodology is introduced for constructing the various schemas.

This paper is organized as follows. Sections II and III review some relational model terminology and the current schema design practice. Section IV presents our alternative schema design method in which the conceptual schema and the external schemas are developed concurrently, taking into account the overlappings among the various views. Section V shows that as a result of this design method most relations in the external schemas can be realized as a join over independent components; thus a one-to-one function can be defined for the mapping from tuples in the external schema to tuples in the conceptual schema. Section VI restates these results in terms of many-sorted algebras, and in Section VII we introduce a new technique to handle those relations in the external schemas for which no such one-to-one function can be defined.
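The interference in the employee example of this introduction can be reproduced on a toy instance. The following is a minimal sketch in Python, not part of the paper: the row values, the function names, and the naive translation of DELETE into the removal of the whole conceptual tuple are all assumptions made for illustration.

```python
# Conceptual relation c1(employee, personnel information, health information),
# held as a list of dicts. The rows are made-up illustration data.
c1 = [
    {"employee": "E1", "personnel": "grade 7", "health": "no allergies"},
    {"employee": "E2", "personnel": "grade 5", "health": "pollen allergy"},
]


def project(relation, attrs):
    """Projection onto `attrs`, dropping duplicate rows."""
    seen, out = set(), []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def health_view(c):
    """External view V1 (health department), a projection of c1."""
    return project(c, ["employee", "health"])


def personnel_view(c):
    """External view V2 (personnel department), a projection of c1."""
    return project(c, ["employee", "personnel"])


def delete_in_personnel_view(c, employee):
    """Naive translation of DELETE issued at V2: drop the whole tuple of c1."""
    return [row for row in c if row["employee"] != employee]


after = delete_in_personnel_view(c1, "E2")
print(personnel_view(after))  # E2 is gone from V2, as the user intended
print(health_view(after))     # E2's health record has vanished from V1 too
```

The second print shows the interference: the health department never asked for a deletion, yet its view lost a row because both views are projections of the same conceptual tuple.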

II. RELATIONAL MODEL TERMINOLOGY

A relational database model consists of a set of relations defined on certain attributes. The term "relation" is often used to describe both the structure of the relation, called its intension, and the set of tuples in the relation, called its extension. In this paper, the term relation denotes the static structure (i.e., intension) of the relation except when otherwise stated. Furthermore, it will be assumed that all relations in a database are projections of one relation called the universal relation [5]. Attributes and dependencies among attributes describe the information content of the database [9], [12], [13]. A functional dependency $X \to Y$ holds in a relation $r$ iff at every point in time each value of $X$ in $r$ is associated with exactly one value of $Y$, where $X$ and $Y$ are sets of attributes. There are other types of dependencies among attributes; however, in this paper only functional dependencies are considered. The functional dependencies are the basic building blocks by which relations are often described. Furthermore, there are techniques to convert other dependencies into functional dependencies [9]. Nonfunctional dependencies such as multivalued dependencies [17] have been discovered recently; other kinds of dependencies may still be unknown [5].

Armstrong [4] presented several equivalent axiomatizations of functional dependencies. In this paper the following inference rules (axioms) will be used, as in the works of Bernstein [9] and Delobel and Casey [10]:

1) reflexivity: $X \to X$
2) augmentation: if $X \to Y$ then $X, Z \to Y$
3) pseudotransitivity: if $X \to Y$ and $Y, Z \to W$ then $X, Z \to W$

where $W$, $X$, $Y$, and $Z$ are sets of attributes and "," between two sets of attributes in a functional dependency stands for "set union." For simplicity, the third rule will be called "transitivity."

The above set of inference rules is complete, in the sense that if a given functional dependency holds true in an extension of a relation, then this functional dependency can be derived from the inference rules. Given a set of functional dependencies $F$, if there is a functional dependency $f \in F$ such that $f$ can be produced from the set $F - \{f\}$ by applying the transitivity rule, then $f$ is called a redundant dependency due to transitivity.

Given a set of functional dependencies $F$, the closure of $F$, denoted by $F^*$, is the set of functional dependencies derivable from $F$ using the above inference rules. A nonredundant cover of $F$, denoted by $H$, is a set of functional dependencies such that $H^* = F^*$ and no proper subset of $H$ has this property, where $H^*$ is the closure of $H$.

Let $X$ be a subset of attributes in the relation $r$. $X$ is called a key of $r$ if every attribute of $r$ that is not in $X$ is functionally dependent upon $X$ and no proper subset of $X$ has this property. If an attribute $A_i$ appears in any key of $r$ then it is said to be a prime attribute in $r$. Otherwise, it is a nonprime attribute in $r$.

An attribute $A_i$ is transitively dependent upon a set of attributes $X$ if there exists a set of attributes $Y$ such that $X \to Y$, $Y \not\to X$, and $Y \to A_i$, with $A_i$ not belonging to $X$ or $Y$. A relation $r$ is in the third normal form (3NF) if none of its nonprime attributes is transitively dependent on any of its keys.

Let $r_1(X, Y)$ and $r_2(Y, Z)$ be two relations where $X$, $Y$, $Z$ are disjoint sets of attributes. The natural join is defined as $r_1 * r_2 = \{\langle x, y, z \rangle \mid \langle x, y \rangle \in r_1 \text{ and } \langle y, z \rangle \in r_2\}$, where $\langle x, y \rangle$ and $\langle y, z \rangle$ denote tuples in $r_1$ and $r_2$, respectively. The projection of the relation $r_1$ on attributes $X$ is $r_1[X] = \{\langle x \rangle \mid \langle x, y \rangle \in r_1\}$.

An instance $I(r)$ of $r$ is a finite set of tuples that satisfies all functional dependencies in $r$.
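The notions of closure, redundant dependency, nonredundant cover, and key defined in this section can be made concrete with a short sketch. The Python below is illustrative only and rests on stated assumptions: dependencies are encoded as pairs of frozensets, the helper names (`fd`, `closure`, `implies`, `nonredundant_cover`, `is_key`) are hypothetical, and redundancy is tested with the standard attribute-closure procedure, which covers derivation by all three inference rules rather than by transitivity alone.

```python
def fd(lhs, rhs):
    """Build a dependency lhs -> rhs from two iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, dep):
    """True if `dep` can be derived from `fds`."""
    lhs, rhs = dep
    return rhs <= closure(lhs, fds)


def nonredundant_cover(fds):
    """Drop, one at a time, every dependency implied by the remaining ones."""
    cover = list(fds)
    for dep in list(cover):
        rest = [g for g in cover if g != dep]
        if implies(rest, dep):
            cover = rest
    return cover


def is_key(attrs, all_attrs, fds):
    """True if `attrs` determines every attribute and no proper subset does."""
    attrs = frozenset(attrs)
    universe = set(all_attrs)
    if not universe <= closure(attrs, fds):
        return False
    # Closure is monotone, so checking single-attribute removals suffices.
    return all(not universe <= closure(attrs - {a}, fds) for a in attrs)


fds = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(nonredundant_cover(fds))        # A -> C is dropped as redundant
print(is_key({"A"}, "ABC", fds))      # True
print(is_key({"A", "B"}, "ABC", fds)) # False: A alone already suffices
```

Note that the cover returned depends on the order in which dependencies are examined, which is consistent with the remark in Section III that a given set of dependencies may admit several nonredundant covers.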

III. CURRENT SCHEMA DESIGN PRACTICE AND THEMAPPING PROBLEM

The ANSI/X3/SPARC [-] framework for the gross architecture of database management systems (DBMS) consists of

1) any number of external views, each of which is described by an external schema;
2) one conceptual view corresponding to the whole enterprise, described by a conceptual schema;
3) an internal view described by the internal schema.

The term schema refers to a graphical or formal description of the various views. In the relational model each schema consists of a set of relations and, for each relation, the specification of one or more keys.

Generally, in the design of a DBMS it is assumed that the conceptual schema is developed first, and that any external schemas are then derived from the conceptual schema [1], [9], [15]. However, except for new applications, this order of precedence in design is not necessary. There are often external schemas that exist before the design of the conceptual schema [8].

In the relational model, the conceptual schema consists of all the relations of the database and an external schema consists of the relations corresponding to a particular application. Schema design is basically equated with "database normalization." This can be accomplished through synthesis [9] or decomposition [13]. This paper concentrates on the design of a schema through the synthesis approach because of the algorithmic difficulties posed by the decomposition approach [5].

The synthesis approach uses a given set of functional dependencies to construct a set of relations, each in the third normal form [9]. Since the main concern of this paper is the design of schemas and the mapping process among them, treatment of semantic difficulties in the third normal form [5], [7] is beyond the scope of this work.
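The grouping step at the heart of the synthesis approach mentioned above can be sketched as follows. This is a deliberately simplified illustration, not the full synthesis algorithm of [9]: it assumes the input is already a nonredundant cover with single-attribute right-hand sides, and it omits the removal of extraneous attributes and the merging of equivalent keys. The function name `synthesize` and the tuple encoding are assumptions.

```python
from collections import defaultdict


def synthesize(fds):
    """Group dependencies that share a left-hand side into one relation scheme.

    `fds` is an iterable of (lhs, rhs) pairs, where lhs is a tuple of
    attribute names and rhs is a single attribute name. Returns a list of
    (key, attributes) pairs, one per distinct left-hand side.
    """
    groups = defaultdict(set)
    for lhs, rhs in fds:
        groups[tuple(sorted(lhs))].add(rhs)
    return [
        (key, tuple(sorted(set(key) | rhs_set)))
        for key, rhs_set in sorted(groups.items())
    ]


# A1 -> A2 and A2 -> A3 yield two binary relations keyed on A1 and A2.
print(synthesize([(("A1",), "A2"), (("A2",), "A3")]))
# A1 -> A4 and A1 -> A3 share a left-hand side and yield one relation.
print(synthesize([(("A1",), "A4"), (("A1",), "A3")]))
```

Because each relation is built from one left-hand side, different nonredundant covers of the same dependencies can produce structurally different schemas.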

Ideally, the conceptual schema and the external schemashould be obtainable from each other "directly."

Fig. 1 illustrates this concept through two design methods:

1) the conceptual schema is produced from the set of external schemas without going back to the functional dependencies;
2) the external schemas are produced from the conceptual schema without going back to the functional dependencies.

Generally, in current design methodologies, each schema is developed independently, as shown in Fig. 2. This method of design will be called the I-design method. It should be noted that several attempts at developing external schemas from the conceptual schema [method 2) above] have been presented in the literature [11], [16]. However, in the approach of Dayal and Bernstein [16], it is necessary to enforce a number of "correctness criteria" in order to perform the mapping from tuples of a relation in an external schema to tuples of one or more relations in the conceptual schema, and identifying this mapping, if it exists, is not always straightforward.

A framework for logical database design, independent of the data model used, is presented in [22]. Briefly, a requirement analysis step is performed first in order to gather the data requirements and processing requirements of the various users. In the next step, view modeling, each user's view is extracted and represented explicitly, using one of a number of available abstraction techniques [27]. Following this phase, a view integration is performed to represent the global view of the enterprise, and finally the model produced in the last step is analyzed and converted into an optimal conceptual schema. The major difficulty in this process lies in the view integration approach. For example, the design method described by Hubbard and Raver [20] (IBM's DBDA) employs "data elements" and "associations" among them as the view representation technique. These elements and associations are collected from different external views in a "pool" (called a structural model [24]) to construct the conceptual schema. In this pool, however, no record is kept of the original external views from which these data elements and associations originate. Hence, the conceptual schema is developed independently of the external views, corresponding to the I-design method. Therefore, in developing the external schemas, portions of the conceptual schema are chosen for each external view in a way that results in the presence of irrelevant information in the external schemas.

Thus, the I-design method described previously does not tend to produce compatible schemas. In relational model terms, compatibility is a measure of the "structural similarity" of the relations in two schemas. This structural similarity affects the mapping between the two schemas. For example, let $C$ be the conceptual schema and $E_1$, $E_1'$ be two external schemas corresponding to the same external view (i.e., the closure of dependencies embedded in $E_1$ is the same as that of $E_1'$) where

$C = \{c_1(A_1, A_2),\ c_2(A_2, A_3, A_4)\}$
$E_1 = \{e_1(A_1, A_2),\ e_2(A_2, A_3)\}$
$E_1' = \{e_1'(A_1, A_3),\ e_2'(A_2, A_3)\}$.

The schema $E_1$ is more compatible with $C$ than $E_1'$ because each relation in $E_1$ can be mapped to one relation in $C$. In $E_1'$, the relation $e_1'$ can be obtained by executing the following sequence of operations:

1) $r_1 = c_2[A_2, A_3]$, that is, the projection of $c_2$ over attributes $A_2$ and $A_3$;
2) $r_2 = c_1 * r_1$, that is, the natural join of $c_1$ and $r_1$ over attribute $A_2$;
3) $e_1' = r_3 = r_2[A_1, A_3]$.

Fig. 1. Two schema design methods. [Figure labels: functional dependencies of the enterprise; synthesization.]

Fig. 2. Current schema design methods. [Figure labels: functional dependencies of the enterprise, conceptual schema; functional dependencies of external view 1, external schema 1; functional dependencies of external view n, external schema n.]

The reason for the production of incompatible schemas in the I-design method is that from a given set of functional dependencies a number of nonredundant covers can be devised [9], [21], even though the closure of the initial set of functional dependencies is unique [4]. Thus, several equivalent synthesized sets of relations can be produced from a given set of functional dependencies. In the previous example $E_1$ and $E_1'$ are produced from two different nonredundant sets of functional dependencies. The I-design method develops each schema (conceptual or external) separately. Hence, no consideration is given to the mapping process between the external schemas and the conceptual schema.

Another aspect which affects the mapping is the problem of interferences among the external schemas. These interferences cause a great many problems, as discussed by Paolini and Pelagatti [23] and illustrated by our example in the introduction. In the I-design method, no attempt is made to identify these interferences.

In this paper an alternative design methodology will be developed. The schemas will be constructed such that

1) any given external schema and the conceptual schema are compatible;
2) the interferences among external schemas can be identified easily at the conceptual level.

This methodology is presented here for the relational model, and an extension to the hierarchical data model is presented in [3].
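The three-step derivation in the compatibility example of this section (project c2 on A2 and A3, join with c1, project on A1 and A3) can be checked on a small instance. The sketch below is illustrative: the tuples are invented, relations are lists of dicts, and `project` and `natural_join` are hypothetical helpers implementing the projection and natural join defined in Section II.

```python
def project(rel, attrs):
    """r[attrs]: projection with duplicate elimination; rows are dicts."""
    out = []
    for row in rel:
        p = {a: row[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(r1, r2):
    """r1 * r2: combine rows that agree on every shared attribute."""
    out = []
    for a in r1:
        for b in r2:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                row = {**a, **b}
                if row not in out:
                    out.append(row)
    return out


# Made-up instances of the conceptual relations c1(A1, A2) and c2(A2, A3, A4).
c1 = [{"A1": 1, "A2": "x"}, {"A1": 2, "A2": "y"}]
c2 = [{"A2": "x", "A3": 10, "A4": "p"}, {"A2": "y", "A3": 20, "A4": "q"}]

r1 = project(c2, ["A2", "A3"])        # step 1
r2 = natural_join(c1, r1)             # step 2, joined over A2
e1_prime = project(r2, ["A1", "A3"])  # step 3
print(e1_prime)  # [{'A1': 1, 'A3': 10}, {'A1': 2, 'A3': 20}]
```

A relation that maps to a single conceptual relation needs at most one projection, whereas this one needs a projection, a join, and a second projection, which is what makes the second external schema less compatible.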

IV. ALTERNATIVE SCHEMA DESIGN METHOD

Let the set of the external views in a database system be $V = \{V_1, V_2, \ldots, V_n\}$. It is required to design the following.

1) $n$ external schemas $E = \{E_1, E_2, \ldots, E_n\}$, where $E_k$ corresponds to $V_k$, $1 \le k \le n$. Each $E_k = \{e_1^k, e_2^k, \ldots, e_{l_k}^k\}$ is the set of relations corresponding to $V_k$. Each $e_i^k \in E_k$ is to be in the third normal form.

2) One conceptual schema $C = \{c_1, c_2, \ldots, c_l\}$, where $C$ contains the set of relations corresponding to $V$ (i.e., the enterprise view). Each relation $c_j \in C$, $1 \le j \le l$, is to be in the third normal form.

3) A mapping mechanism between $E_k$ and $C$ such that this mapping reflects the integrated database view (i.e., external views may share the same data). The types of mappings required have been discussed in the introduction. A precise definition of this mapping mechanism will be given later.

The 3NF has been shown to exhibit certain undesirable properties [13], [17]. However, it has been chosen here since it is possible to construct 3NF schemas from functional dependencies algorithmically and efficiently, whereas, as shown by Beeri and Bernstein [6], not every set of functional dependencies can be represented by stronger normal forms, such as the Boyce-Codd normal form.

Fig. 3. Example of overlapping of three views. (Arrows denote functional dependencies.)

Algorithm E-C, which will be developed in this section, designs the conceptual schema from disjoint sets of functional dependencies. These sets are called intersections to emphasize the fact that they represent different interferences among external schemas.
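One way to picture these disjoint sets is to partition the dependencies by the exact set of views in which they occur. The sketch below is a hypothetical illustration: the view contents are invented (they are not the dependencies of the paper's figures), the function names are assumptions, and the transitive-closure step that the paper applies to each view beforehand is omitted for brevity.

```python
from collections import defaultdict
from itertools import combinations


def possible_intersections(views):
    """All nonempty subsets of the view names (2**n - 1 of them)."""
    names = sorted(views)
    return [
        frozenset(c)
        for r in range(1, len(names) + 1)
        for c in combinations(names, r)
    ]


def intersections(views):
    """Partition dependencies by the exact set of views that contain them."""
    owners = defaultdict(set)
    for name, fds in views.items():
        for dep in fds:
            owners[dep].add(name)
    groups = defaultdict(set)
    for dep, names in owners.items():
        groups[frozenset(names)].add(dep)
    return dict(groups)


# Invented example: each dependency is a (lhs, rhs) pair of attribute names.
views = {
    "V1": {("A", "B"), ("B", "C")},
    "V2": {("B", "C"), ("D", "E")},
    "V3": {("D", "E"), ("F", "G")},
}

groups = intersections(views)
print(len(possible_intersections(views)))  # 7 possible intersections
for names, deps in sorted(groups.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(names), sorted(deps))
```

With three views there are seven possible intersections, of which only four are nonempty in this invented instance; every dependency lands in exactly one group, so the groups are disjoint by construction.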

Utilizing functional dependencies and the relational model does not imply a limitation on the generality of the methodology that will be introduced next. The extension of this methodology to accommodate different types of dependencies is, however, data model dependent [3]. The following example is a modification of a DBDA example [24] where only functional dependencies (type 1 associations in DBDA terminology) are considered.

Example 1: Suppose there are three external views $V_1$, $V_2$, and $V_3$ with the corresponding functional dependencies shown in Fig. 3. Notice that there is no redundant dependency due to transitivity in the figure. Our methodology will identify the different intersections of views in terms of dependencies. Since there are three views, seven possible intersections could be formed, as shown in Fig. 4. Fig. 5 shows the five nonempty intersections corresponding to our example. Our algorithm structures each intersection independently. In the relational model, each intersection of Fig. 5 contains one relation in 3NF except for intersection "$V_2$ only," which contains two relations (in IMS each relation corresponds to a segment type).

The conceptual schema is the set of all the relations constructed from the different intersections (in IMS, where other types of associations must be taken into consideration, the conceptual schema is the collection of all logical and physical databases). A relation in any external schema can now be mapped uniquely to a set of relations in the conceptual schema. For example, the relation R'(PACKAGE-NO, PROD, MFG, PACK-WT, PACK-VOL) in $V_3$ is mapped to R5 and R6 of Fig. 5. In addition, an operation performed on R', such as DELETE tuple $t_1$ in R', is translated unambiguously to DELETE tuple $t_2$ in R5 and DELETE tuple $t_3$ in R6.

Fig. 4. Intersections among three external views.

Fig. 5. Nonempty intersections of the three external views of Fig. 3.

This process, however, is not always as simple as in the previous example. Suppose there is a fourth view $V_4$ = {SHIP-NO $\to$ DATE}. This dependency is deleted at the conceptual level to eliminate redundancy (i.e., SHIP-NO $\to$ TRIP-NO and TRIP-NO $\to$ DATE imply SHIP-NO $\to$ DATE). Certainly there is an interference between $V_4$ on one hand and $V_1$ and $V_2$ on the other, which complicates the mapping process. This type of difficulty in mapping is caused only by the integration of different views and is not inherent to the relational design approach (e.g., abnormalities of 3NF). Our methodology isolates this type of interference and minimizes its effect.

Suppose the three external views are each represented by a set of dependencies $F(V_1)$, $F(V_2)$, and $F(V_3)$. Three steps are performed to find all nonredundant dependencies in each intersection.

Let $F(V_k)$ denote the given functional dependencies of $V_k$ and $F^+(V_k)$ denote the transitive closure of $F(V_k)$, where $F^+(V_k)$ is obtained by applying only the transitivity rule.

1) The union of the transitive closures of the given sets of

103

VlnV2:R4:

p-NO

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-7, NO. 1, JANUARY 1981

functional dependencies of the external views, UI=1 F'(Vi),is found to identify all dependencies implied by the differentexternal views.2) Each set of dependencies that belongs to the same set of

external views forms an intersection.3) A nonredundant cover of each intersection is found to

eliminate redundant dependencies due to transitivity.Let this set of nonredundant covers be denoted by H = {H1,

$H_2, \ldots, H_u\}$. Clearly, in a simple procedure, the conceptual schema $C$ can be built from $\bigcup_{i=1}^{u} H_i$ after eliminating the dependencies that become redundant due to transitivity when the union of $H_1, \ldots, H_u$ is taken. Each external schema $E_k$ can be constructed from $\bigcup_{j=1}^{u_k} H_j$, where $H_j \in H$ and all dependencies in $H_j$ belong to $V_k$. Furthermore, $\bigcup_{j=1}^{u_k} H_j$ may contain redundant dependencies due to transitivity, and $E_k$ is synthesized only after eliminating these dependencies. The algorithm presented here, however, builds $C$ and $E_k$ from these unions in such a way that the intersections are reflected in the resulting design. First, the redundant dependencies due to transitivity in $\bigcup_{i=1}^{u} H_i$ are removed from $H_1, \ldots, H_u$, and then the relations of $C$ are constructed independently from each $H_i$, $1 \le i \le u$.

For each $H_i$, there is a set of relations in the final conceptual schema that is produced only from $H_i$. Thus, if $V_i$ and $V_j$ intersect in $H_l$, then the interference between $V_i$ and $V_j$ is represented in the set of relations in $C$ that are developed from $H_l$. This works well except when the intersections among external schemas create transitivity. For example, assume that $H = \{H_1, H_2\}$ and there are just two external views $V_1$ and $V_2$, such that $V_1$ contains $H_1$ and $V_2$ contains $H_1$ and $H_2$. Hence, $H_1$ represents the interference between the schemas of $V_1$ and $V_2$. Let $H_1 = \{f_1: A_1 \to A_4,\ f_2: A_1 \to A_3\}$ and $H_2 = \{f_3: A_1 \to A_2,\ f_4: A_2 \to A_3\}$. Then $H_1 \cup H_2 = \{f_1, f_2, f_3, f_4\}$ contains the redundant dependency $f_2$. Eliminating this dependency from $H_1$, the relations of the conceptual schema are developed independently from $H_1 = \{f_1\}$ and $H_2 = \{f_3, f_4\}$. From $H_1$, the relation $c_1(A_1, A_4)$ is produced, and from $H_2$, the relations $c_2(A_1, A_2)$ and $c_3(A_2, A_3)$ are produced. Thus, $C = \{c_1, c_2, c_3\}$. The complete interference between $V_1$ and $V_2$ is not explicitly present in $C$. Notice that $f_2$ is not a redundant dependency due to transitivity in the nonredundant cover of $V_1$; hence if we synthesize $E_1 = \{e_1^1(A_1, A_3, A_4)\}$ and $E_2 = \{e_1^2(A_1, A_2, A_4),\ e_2^2(A_2, A_3)\}$ we have the same update problems we discussed in the introduction.

The solution to this problem is as follows: if a functional dependency $f$ is present in $\bigcup_{j=1}^{u_k} H_j$ of $V_k$ and $f$ becomes a redundant dependency due to transitivity in $\bigcup_{i=1}^{u} H_i$, then $f$ will be built as an independent binary relation in $E_k$, the external schema of $V_k$. This type of dependency will be called an implicit dependency because it is eliminated when $C$ is constructed. The corresponding binary relation in $E_k$ is called an implicit relation. A dependency that is not implicit is called an explicit dependency. The relations in $E_k$ formed from explicit dependencies (not necessarily binary relations) are called explicit relations. The purpose of building an independent binary relation in $E_k$ for each implicit dependency is to isolate and minimize the effect of this type of dependency on the mapping process. If an implicit dependency is present in a nonbinary relation in $E_k$, then the mapping of that relation to a set of relations in $C$ will not be a one-to-one function, as will be shown later.
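The two-view example above can be checked mechanically. The following is a minimal sketch, not the authors' procedure: it approximates "redundant due to transitivity" by the usual test that a dependency follows from the remaining ones, and the helper names `closure` and `is_redundant` are hypothetical.

```python
def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_redundant(fd, fds):
    """True if `fd` can be derived from the other dependencies in `fds`."""
    lhs, rhs = fd
    others = [g for g in fds if g != fd]
    return set(rhs) <= closure(lhs, others)


# Dependencies of the example: H1 is shared by V1 and V2, H2 belongs to V2 only.
f1 = (("A1",), ("A4",))
f2 = (("A1",), ("A3",))
f3 = (("A1",), ("A2",))
f4 = (("A2",), ("A3",))
H1 = [f1, f2]
H2 = [f3, f4]

union = H1 + H2
redundant_in_union = [f for f in union if is_redundant(f, union)]
redundant_in_V1 = [f for f in H1 if is_redundant(f, H1)]

# Implicit for V1: redundant in the union but not within V1's own cover.
implicit_for_V1 = [f for f in H1
                   if f in redundant_in_union and f not in redundant_in_V1]
```

Under these assumptions only $f_2$ is flagged as redundant in the union, while nothing is redundant within $V_1$ alone, which is the situation the text calls an implicit dependency.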

In Algorithm E-C, the transitive dependencies in $\bigcup_{j=1}^{u_k} H_j$ for every $V_k$ and in $\bigcup_{i=1}^{u} H_i$ are eliminated simultaneously [step 6)]; however, $\bigcup_{i=1}^{u} H_i$ may still have transitive dependencies even though they have been eliminated in all $\bigcup_{j=1}^{u_k} H_j$, $1 \le k \le n$. These remaining transitive dependencies are the implicit dependencies discussed above, and they are eliminated before building $C$ [step 7)]. This process makes it possible to identify the implicit dependencies.

A. Algorithm E-C

Input: $F(V_1), F(V_2), \ldots, F(V_n)$.

Output: 1) $E = \{E_1, E_2, \ldots, E_n\}$; 2) $C$.

Steps:

1) (Find the transitive closure $F^+(V_k)$ of each $F(V_k)$ and then take the union of these closures.) Find $F = \bigcup_{i=1}^{n} F^+(V_i) = \{f_1, f_2, \ldots, f_w\}$.

2) (Eliminate augmentation and reflexive dependencies.) For all $f_i \in F$, if $f_i: A_1 \to A_1$, then $F = F - \{f_i\}$. For all $f_i, f_j \in F$, if $f_i: A_1 \to A_2$ and $f_j: A_1 A_3 \to A_2$, then $F = F - \{f_j\}$. Here $A_1$, $A_2$, and $A_3$ denote nonempty sets of attributes.

3) (Identify intersections among sets of dependencies of different external views.) Let $F = \{f_1, f_2, \ldots, f_w\}$ be the set of dependencies produced after performing step 2). Construct $T_1$ as an $n \times w$ 0-1 matrix as follows: row $i$ corresponds to $V_i$ and column $j$ corresponds to $f_j \in F$. In $T_1$, entry $(i, j) = 1$ if $f_j \in F^+(V_i)$; otherwise entry $(i, j) = 0$. Let $F_1, F_2, \ldots, F_u$ be the subsets of $F$ such that each $F_i$, $1 \le i \le u$, contains all functional dependencies in $T_1$ that have identical columns of 0's and 1's.

4) (Find a nonredundant cover of $F_i$.) For all $F_i$, $1 \le i \le u$, perform the following:

i) search for redundant dependencies due to transitivity;

ii) if such redundant dependencies exist, remove them from $T_1$.

Let $T_2$ denote the resultant table and assume that $T_2$ has $v$ columns, each representing a functional dependency. Let $H_1, H_2, \ldots, H_u$ denote the resultant groups of columns in $T_2$ derived from $F_1, F_2, \ldots, F_u$, respectively. Let $H = \bigcup_{i=1}^{u} H_i$.

5) (Identify redundant dependencies due to transitivity in each external view.) Construct table $T_3$, which contains all external views and dependencies of $T_2$; however, the entries of $T_3$ are 1 or 0 depending on whether the corresponding dependency in the appropriate external view is or is not redundant due to transitivity. The set of dependencies $G_i \subseteq H$ is the set of dependencies that belong to the same external view $V_i$.

a) For $i = 1, \ldots, n$ extract the set of dependencies $G_i$ from $H$ such that $f_j \in T_2$ is in $G_i$ if $T_2(i, j) = 1$.

b) Construct $T_3$, an $n \times v$ 0-1 matrix where $n$ and $v$ are as in table $T_2$: entry $(i, j)$ in $T_3$ is equal to 1 if $f_j \in G_i$ is a redundant dependency due to transitivity in $G_i$; otherwise $(i, j) = 0$.


AL-FEDAGHI AND SCHEUERMANN: SCHEMAS FOR THE RELATIONAL MODEL

6) (Remove redundant dependencies due to transitivity from each external view, i.e., from $G_i$, $1 \le i \le n$, and simultaneously from $H$. Keep track of the dependencies removed in the set $S$.) Notice that the conceptual schema will be built from $H$ in the next step. The set $S$ will be used to identify implicit dependencies in the different external views. Let $|f_j|$ be the cardinality of column $j$ of $T_3$. Let $S = \emptyset$ (initialization).

a) Sort the columns of $T_3$ in descending order with respect to $|f_j|$ and in lexicographic order.

b) For $j = 1$ to $v$ (i.e., for each column of $T_3$) perform the following: for $i = 1$ to $n$ (i.e., for each row of $T_3$) perform the following: if $(i, j) = 1$ and the corresponding dependency $f_j$ is a redundant dependency due to transitivity in $G_i$, then delete $f_j$ from $G_i$ and also from $H$ if it has not already been deleted:

$S = S \cup \{f_j\}$; $H = H - \{f_j\}$; $G_i = G_i - \{f_j\}$.

c) For $i = 1$ to $n$, relabel as $G_i'$ the set of dependencies remaining in $G_i$ after step 6b). $G_i'$ is a nonredundant cover of $F(V_i)$.

7) (Remove remaining redundant dependencies due to transitivity from $H$, thus producing a nonredundant cover of the conceptual view $V$. Add the dependencies removed also to the set $S$. Construct the relations which constitute the conceptual schema.) We produce a nonredundant cover of $H$ and denote it by $H'$. For each $H_i \subseteq H$ [see step 4)] let $H_i' \subseteq H_i$ be the set of dependencies of $H_i$ that remains after the deletion operations of steps 6) and 7). The conceptual schema $C$ is constructed from the disjoint sets of dependencies $H_1', H_2', \ldots, H_u'$, where $H' = \bigcup_{i=1}^{u} H_i'$.

a) If $f \in H$ is a redundant dependency due to transitivity in $H$, then delete $f$ from $H$ and add it to $S$:

$H = H - \{f\}$; $S = S \cup \{f\}$.

Repeat the process until no redundant dependency due to transitivity is present in $H$.

b) Let $H'$ be the set of dependencies remaining in $H$ after step 7a). $H'$ is a nonredundant cover of $V$, the conceptual view. Thus, $H' = \bigcup_{i=1}^{u} H_i'$, where $H_i'$ is the set of dependencies corresponding to $H_i$ as mentioned above.

c) For each $H_i' \subseteq H'$, $1 \le i \le u$ and $H_i' \neq \emptyset$, derive $C_i = \{c_1, c_2, \ldots, c_l\}$, where $C_i$ is a set of relations in third normal form. This can be accomplished by utilizing any synthesizing algorithm (e.g., [9]) and using $H_i'$ as a nonredundant cover (e.g., if Bernstein's algorithm [9] is used, then perform the following steps: i) partition, ii) merge equivalent keys, iii) eliminate transitive dependencies, and iv) construct relations).

d) The conceptual schema $C$ is the set of relations produced, $C = \bigcup_{i=1}^{u} C_i$.

8) (Identify implicit dependencies using the set $S$ and construct the external schemas.) For each $V_k$, $1 \le k \le n$, perform the following.

a) Find $S_k = G_k' \cap S$. If $S_k \neq \emptyset$, construct one binary relation from each dependency in $S_k$ (i.e., construct implicit relations).

b) Derive $E_k = \{e_1^k, e_2^k, \ldots, e_j^k\}$, the set of relations in third normal form, using $G_k' - S_k$ as a nonredundant cover and any synthesizing algorithm as discussed in step 7c).

Example 2: Suppose that we are given $V = \{V_1, V_2, V_3\}$, where $F(V_i)$ and the closure $F^+(V_i)$, $1 \le i \le 3$, are given in Fig. 6(a) [steps 1) and 2)]. Table $T_1$ is also shown in Fig. 6(a) [step 3)]. Fig. 6(b) shows table $T_2$, where the nonredundant covers of the intersections $F_1, F_2, F_3, F_4, F_5$, and $F_6$ are found and denoted by $H_1, H_2, H_3, H_4, H_5$, and $H_6$, respectively [step 4)]. The set $H$ contains all dependencies which appear in table $T_2$. To implement step 5), $G_1$, $G_2$, and $G_3$ are found as follows:

$G_1 = \{f_1, f_2, f_4, f_5, f_6, f_7, f_{13}, f_{14}\}$

$G_2 = \{f_1, f_2, f_3, f_4, f_5, f_6, f_7, f_{10}, f_{11}, f_{12}, f_{13}, f_{15}\}$

$G_3 = \{f_1, f_3, f_4, f_5, f_6, f_{11}, f_{16}\}$

Table $T_3$ in Fig. 6(b) gives the redundant dependencies that are candidates for removal. Notice that $f_2$ is redundant in $G_1$ and $G_2$ because of the presence of $f_7$ in $G_1$ and $G_2$, and vice versa. This illustrates the purpose of the simultaneous deletion at step 6). Without it, it is possible to remove $f_2$ from $G_1$ and $f_7$ from $G_2$ in the process of constructing the nonredundant covers $G_1'$ and $G_2'$. This would cause the removal of both $f_2$ and $f_7$ from $H$ [step 6b)]. However, the presence of $f_2$ or $f_7$ causes the other dependency to be a redundant dependency. Hence, only one has to be removed from $H$ and not both.

To illustrate the need for sorting, suppose that $f_7$ is also a candidate for removal from $G_3$. In this case, if $T_3$ is not sorted in terms of the number of 1's in a column, we similarly would incorrectly remove both $f_2$ and $f_7$ from $H$.

At the end of step 6), the algorithm produces the following:

$G_1' = \{f_1, f_4, f_5, f_6, f_7, f_{13}, f_{14}\}$

$G_2' = \{f_1, f_3, f_4, f_5, f_6, f_7, f_{10}, f_{11}, f_{12}, f_{15}\}$

$G_3' = \{f_1, f_3, f_4, f_5, f_6, f_{11}, f_{16}\}$

$H = \{f_1, f_3, f_4, f_5, f_6, f_7, f_{10}, f_{11}, f_{12}, f_{14}, f_{15}, f_{16}\}$

$S = \{f_2, f_{13}\}$.

The dependency $f_2$ is removed from $G_1$, $G_2$, and $H$, and the dependency $f_{13}$ is removed from $G_2$ and $H$.

In step 7), the dependency $f_{16}$ is a redundant dependency in $H$; however, it has not been removed in step 6) because $f_{16}$ is not a redundant dependency in any external view. Therefore, $f_{16}$ is removed from $H$, producing

$S = \{f_2, f_{13}, f_{16}\}$

$H' = \{f_1, f_3, f_4, f_5, f_6, f_7, f_{10}, f_{11}, f_{12}, f_{14}, f_{15}\}$.

The relations of the conceptual schema are constructed from the different $H_i' \subseteq H'$, $1 \le i \le 6$, as illustrated in Fig. 7.

In step 8), notice that $S_1 = S \cap G_1' = \{f_2, f_{13}, f_{16}\} \cap G_1' = \{f_{13}\}$. Hence, the dependency $f_{13}$ is an implicit dependency in $G_1'$. $S_2 = S \cap G_2' = \emptyset$. $S_3 = S \cap G_3' = \{f_{16}\}$; therefore, $f_{16}$ is an implicit dependency in $G_3'$. The relations corresponding to the views $V_1$, $V_2$, $V_3$ also appear in Fig. 7. Notice that the binary relations $e_1^1$ and $e_3^3$ are the implicit relations that correspond to the implicit dependencies $f_{13}$ and $f_{16}$, respectively.
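Step 8a) reduces to a set intersection once $S$ and the covers $G_k'$ are known. The sketch below replays it on the sets reported for Example 2; the function name `split_cover` and the string labels for dependencies are illustrative choices, not part of the paper.

```python
# Dependencies removed in steps 6) and 7) of Example 2.
S = {"f2", "f13", "f16"}

# Nonredundant covers G'_k of the three external views after step 6).
G_prime = {
    "V1": {"f1", "f4", "f5", "f6", "f7", "f13", "f14"},
    "V2": {"f1", "f3", "f4", "f5", "f6", "f7", "f10", "f11", "f12", "f15"},
    "V3": {"f1", "f3", "f4", "f5", "f6", "f11", "f16"},
}


def split_cover(cover, removed):
    """Return (implicit, explicit): S_k = G'_k & S and G'_k - S_k."""
    implicit = cover & removed
    return implicit, cover - implicit


result = {view: split_cover(cover, S) for view, cover in G_prime.items()}

for view, (implicit, explicit) in result.items():
    # Each implicit dependency becomes its own binary relation in E_k;
    # the explicit ones are handed to a 3NF synthesis algorithm.
    print(view, "implicit:", sorted(implicit), "explicit count:", len(explicit))
```

The implicit sets come out as $\{f_{13}\}$ for $V_1$, empty for $V_2$, and $\{f_{16}\}$ for $V_3$, matching the values of $S_1$, $S_2$, and $S_3$ stated in the example.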

[Fig. 6(a): the dependencies $F(V_i)$ and closures $F^+(V_i)$ of the three views, table $T_1$ (rows $V_1$ to $V_3$, columns $f_1$ to $f_{16}$), and the groups $F_1$ to $F_6$. Fig. 6(b): table $T_2$, with columns grouped into $H_1$ to $H_6$, and table $T_3$.]

Fig. 6. (a) Tables constructed for Example 2. (b) Tables constructed for Example 2. All-0's columns are omitted in $T_3$.
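The grouping rule of step 3), collecting dependencies whose 0-1 columns are identical, can be sketched as follows. The column patterns here are derived from the sets $G_1$, $G_2$, $G_3$ listed in the text of Example 2 (i.e., membership in $T_2$), not read off the figure, and the variable names are hypothetical.

```python
from collections import defaultdict

# View membership of each dependency, taken from G1, G2, G3 of Example 2.
G = {
    "V1": {"f1", "f2", "f4", "f5", "f6", "f7", "f13", "f14"},
    "V2": {"f1", "f2", "f3", "f4", "f5", "f6", "f7",
           "f10", "f11", "f12", "f13", "f15"},
    "V3": {"f1", "f3", "f4", "f5", "f6", "f11", "f16"},
}

views = sorted(G)  # row order of the table: V1, V2, V3
deps = sorted(set().union(*G.values()), key=lambda f: int(f[1:]))


def column(dep):
    """0-1 column of `dep`: one entry per view, 1 if the view contains it."""
    return tuple(int(dep in G[v]) for v in views)


# Dependencies with identical columns fall into the same group.
groups = defaultdict(list)
for dep in deps:
    groups[column(dep)].append(dep)

for pattern, members in groups.items():
    print(pattern, members)
```

With these inputs the rule yields six groups, for example $\{f_1, f_4, f_5, f_6\}$ for the column shared by all three views, consistent with the six intersections $H_1$ to $H_6$ mentioned in the example.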

Fig. 8 shows the relationships among relations in the external schemas and relations in the conceptual schema of Example 2. Notice that, except for the implicit relations $e_1^1$ and $e_3^3$, each relation in any external schema can be constructed from its projections, which form relations in the conceptual schema. For example, the arrow "$\to$" going from $e_1^2$ indicates that $e_1^2 = c_1 * c_2 * c_3$, as will be explained later. Furthermore, the set of these projections (relations) in $C$ is unique.

[Fig. 7 lists, for the conceptual view $V$ and each external view, the nonredundant cover and the resultant relations.
Nonredundant cover groups of $V$: $\{f_1, f_4, f_5, f_6\}$, $\{f_7\}$, $\{f_3, f_{11}\}$, $\{f_{10}, f_{12}, f_{15}\}$, $\{f_{14}\}$.
Conceptual schema: $c_1(A_1, A_2, A_7, A_8)$, $c_2(A_2, A_4)$, $c_3(A_1, A_6)$, $c_4(A_3, A_6)$, $c_5(A_3, A_5)$, $c_6(A_7, A_9)$, $c_7(A_5, A_7)$.
$V_1$: $e_1^1(A_5, A_6)$, $e_2^1(A_1, A_2, A_4, A_7, A_8)$, $e_3^1(A_5, A_7)$.
$V_2$: $e_1^2(A_1, A_2, A_4, A_6, A_7, A_8)$, $e_2^2(A_3, A_5, A_6)$, $e_3^2(A_7, A_9)$.
$V_3$: $e_1^3(A_1, A_2, A_6, A_7, A_8)$, $e_2^3(A_3, A_6)$, $e_3^3(A_5, A_9)$.]

Fig. 7. Relations of Example 2.

Let $D = \bigcup_{i=1}^{n} F(V_i)$ be the set of given dependencies in all the external views. The set $D$ is represented as a string of integers and $|D|$ is the length of this representation. Synthesizing $D$ to produce one set of relations in 3NF (i.e., the conceptual schema) is of time complexity $O(|D|^2)$ [6]. Steps 1), 2), 4), 7), and 8) involve processes which are required for the synthesis algorithm; hence, they are of complexity at most $O(|D|^2)$. The other steps involve a number of bookkeeping procedures. Step 3) involves the construction of table $T_1$, whose dimensions are $n$, the number of external views, and $w$, the number of dependencies in $F$, after the elimination of extraneous attributes. If the synthesis algorithm is applied to $D$ to produce $F = \{f_1, f_2, \ldots, f_w\}$, then certainly $|F|$ is bounded by $|D|^2$. Thus, the construction of $T_1$ is of time complexity $O(n|D|^2)$. Identifying $F_1, \ldots, F_u$ is equivalent to sorting the dependencies in $F$, hence is of complexity $O(|F| \log |F|)$. Step 5) involves the construction of $T_3$, which is smaller than $T_1$; hence its time is also bounded by $O(n|D|^2)$. Finally, step 6) again involves sorting and a loop, and runs in time $O(|D|^2)$. Therefore the complexity of the E-C algorithm is $O(n|D|^2)$. This is the worst-case time of the algorithm, but we observe that the average running time will be much better.

V. THE MAPPING BETWEEN THE EXTERNAL SCHEMA AND THE CONCEPTUAL SCHEMA

The operations performed at each external view can be denoted by $\gamma_k(x_1, x_2, \ldots, x_v)$, where $\gamma_k$ could be DELETE, JOIN, etc., over parameters $x_1, \ldots, x_v$. This general form can be categorized according to the type of parameters as follows.

1) $\gamma_k(r_1, r_2, \ldots, r_m, Y, r)$, where $r_1, r_2, \ldots, r_m$, and $r$ are relations and $Y$ is a set of attributes in these relations. For example, $r = r_1 * r_2$ over attribute $A_1$.

2) $\gamma_k(t(r))$, which operates on a tuple in the relation $r \in E_k$. For example, DELETE $\langle a_1, a_2 \rangle$ from a relation in $E_k$.

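[Fig. 8 maps each external relation to the conceptual relations from which it is constructed:
$e_1^1 \to \{c_4, c_5\}$ (implicit), $e_2^1 \to \{c_1, c_2\}$, $e_3^1 \to \{c_7\}$;
$e_1^2 \to \{c_1, c_2, c_3\}$, $e_2^2 \to \{c_4, c_5\}$, $e_3^2 \to \{c_6\}$;
$e_1^3 \to \{c_1, c_3\}$, $e_2^3 \to \{c_4\}$, $e_3^3 \to \{c_6, c_7\}$ (implicit).]

Fig. 8. Relationships among relations in external schemas and conceptual schema of Example 2.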

In this paper the mapping of the second category will be formalized and analyzed. This mapping is the most difficult because performing a certain operation on a tuple in $r \in E_k$ first requires a mapping to the corresponding relations in $C$, which is also required by the first category. Additionally, it is necessary to identify the required tuple(s) in the conceptual schema that correspond(s) to the given tuple in the external schema.

Accordingly, the following two functions are our concern:

1) $\alpha_1: e_j^k \to C'(e_j^k) \subseteq C$, that is, given a relation $e_j^k \in E_k$, $\alpha_1$ maps it to a set of relations in $C$;

2) $\alpha_2: t(e_j^k) \to t(C'(e_j^k))$, that is, given a tuple in $e_j^k \in E_k$, $\alpha_2$ maps it to tuples of $C'(e_j^k)$.

Next, the characteristics of the schema design process illustrated in Algorithm E-C will be discussed in terms of its capability to define the above functions $\alpha_1$ and $\alpha_2$.

Lemma 1: If $e_j^k \in E_k$ is an explicit relation, then there exists a unique set of relations $C' \subseteq C$ such that $F^*(e_j^k) = F^*(C')$.

Proof: Let $G'$ be the nonredundant cover of $E_k$ and let $H'$ be the nonredundant cover of $C$ produced from Algorithm E-C. Let $Z \subseteq G'$ be the set of dependencies that formed $e_j^k$. Since $e_j^k$ is explicit, $f \in Z$ implies $f \in H'$. Let $Z = \{Z_1, Z_2, \ldots, Z_w\}$ be the subsets of $Z$ corresponding to the different intersections found in Algorithm E-C. Each $Z_i \subseteq Z$ produces a relation in $C$. Let $C'$ be the set of relations in $C$ that are produced from $Z_1, Z_2, \ldots, Z_w$. Hence, $F^*(e_j^k) = F^*(C')$. To show that this set is unique, assume that there exists a relation $c''$ such that $c'' \in C$, $c'' \notin C'$, and $c''$ contains a dependency $f \in F^*(e_j^k)$. Because $c'' \in C$ it follows that $f \in H'$. This implies that $f \in Z$ which, in turn, implies that $c'' \in C'$. This contradicts the assumption. Hence the set $C'$ is unique. To show that there is no $C'' \subset C'$ such that $F^*(C'') = F^*(e_j^k)$, assume that there exists such a $C''$. This implies that $Z$ has a redundant dependency due to transitivity, which contradicts our assumption that $G'$ is a nonredundant set.

In Fig. 8, $e_2^2$ is an explicit relation that corresponds to $C' = \{c_4, c_5\}$. Thus, $F^*(e_2^2) = F^*(C') = \{A_3 \to A_5,\ A_5 \to A_3,\ A_3 \to A_6,\ A_5 \to A_6\}$.

We observe here again that the problems associated with 3NF because of consistency difficulties among prime attributes [6] are ignored in this discussion. These difficulties are considered as semantic difficulties rather than mapping difficulties. For example, if the relations $c_1(A_1, A_2, A_3)$ and $c_2(A_3, A_1)$ are present in $C$, then the relation $e_j^k(A_3, A_1) \in E_k$ is mapped only to $c_2$. Such situations can be handled by special techniques similar to the one proposed for the mapping of implicit relations, which we discuss later.

From Lemma 1, $\alpha_1$ for explicit relations in $E_k$ can be defined as

$\alpha_1: e_j^k \to C'(e_j^k)$.

Unfortunately, it is not possible to define such a function for implicit relations. For example, suppose that $C = \{c_1(A_1, A_2),\ c_2(A_2, A_3),\ c_3(A_1, A_4),\ c_4(A_4, A_3)\}$ and $E_k = \{e_1^k(A_1, A_3)\}$. Clearly $e_1^k$ is an implicit relation. It can be mapped to $\{c_1, c_2\}$ or to $\{c_3, c_4\}$. Therefore, a special technique has to be developed to handle this type of mapping for implicit relations.

Let $\triangleq$ denote the construction of a relation from its projections by the natural join operation.

Lemma 2: If $e_j^k \in E_k$ is an explicit relation and $C' \subseteq C$ is the corresponding unique set of relations in $C$, then

$e_j^k \triangleq c_1' * c_2' * \cdots * c_w'$, where $C' = \{c_1', c_2', \ldots, c_w'\}$.

Proof: Since $F^*(e_j^k) = F^*(C')$ by Lemma 1, it is possible to construct $e_j^k$ from $C'$ by the appropriate natural join operation. Every $c_i' \in C'$ is a projection of $e_j^k$.

In Fig. 8, every relation on the left-hand side except for $e_1^1$ and $e_3^3$ can be constructed by the natural join of the relations on the right-hand side.

Notice that Lemma 1 guarantees the satisfaction of "definition representation 2" given by Beeri et al. [6], which states that two schemas represent the same information if they have the same attributes and the same data dependencies. Beeri et al. [6] have given a stronger correspondence definition which states that two schemas represent the same information if there exists a one-to-one mapping between the databases of the two schemas ("definition representation 4"). This mapping involves mapping between tuples in $E_k$ and $C$. The one-to-one mapping $\alpha_1$ between $e_j^k \in E_k$ and $C' \subseteq C$ is not enough to guarantee the one-to-one mapping between the database of $E_k$ and the database of $C$. An additional requirement to the one given in Lemma 1 is the presence of the lossless join property [2].

Let $I(e_j^k)$ be an instance of $e_j^k$ (i.e., a set of tuples) and $I(c_1'), I(c_2'), \ldots, I(c_w')$ be the set of corresponding instances in $C' \subseteq C$ as given in Lemma 2. $C'$ is said to have the lossless join property if for all instances of $e_j^k$, $I(e_j^k) = I(c_1') * I(c_2') * \cdots * I(c_w')$. If $I(e_j^k) \subset I(c_1') * I(c_2') * \cdots * I(c_w')$, then it is said that $e_j^k$ has the lossy join property.

Let $S(r)$ be the set of attributes in the relation scheme $r$ and $K(r)$ be the set of keys in $r$. A join of two relations $r_i$ and $r_j$ such that $S(r_i) \cup S(r_j) = S(r)$ satisfies the lossless property iff $S(r_i) \cap S(r_j) \to S(r_i)$ or $S(r_i) \cap S(r_j) \to S(r_j)$, where $X \to Y$ denotes the fact that the functional dependency $X \to Y$ belongs to or can be derived from the given set of functional dependencies in the relation scheme $r$. Notice that if $S(r_i) \cap S(r_j)$ is a key in $r_i$ or $r_j$, then the join is lossless. We will show that all join operations necessary to construct an explicit relation $e_j^k \in E_k$ from its corresponding set of relations $C' \subseteq C$ are performed over keys. Hence, $C'$ has a lossless join.
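The binary lossless-join condition stated above lends itself to a direct check: compute the attribute closure of the common attributes and see whether it covers either scheme. The following is a minimal sketch; `closure` and `lossless_binary_join` are hypothetical helper names, the dependencies are those given for $c_4$ and $c_5$ after Lemma 1, and the second pair of schemes is invented purely as a lossy counterexample.

```python
def closure(attrs, fds):
    """Attributes derivable from `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary_join(s_i, s_j, fds):
    """True iff S(ri) & S(rj) -> S(ri) or S(ri) & S(rj) -> S(rj) holds under `fds`."""
    reach = closure(s_i & s_j, fds)
    return s_i <= reach or s_j <= reach


# Dependencies underlying c4(A3, A6) and c5(A3, A5) in Example 2.
fds = [({"A3"}, {"A5"}), ({"A5"}, {"A3"}), ({"A3"}, {"A6"})]

c4 = {"A3", "A6"}
c5 = {"A3", "A5"}
joins_losslessly = lossless_binary_join(c4, c5, fds)

# Hypothetical schemes sharing only A6, which determines nothing here.
r1 = {"A5", "A6"}
r2 = {"A6", "A9"}
lossy_case = lossless_binary_join(r1, r2, fds)
```

Here the common attribute $A_3$ is a key of both $c_4$ and $c_5$, so the join passes the test, whereas the invented pair joined over $A_6$ does not.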

Let $C' = \{c_1', c_2', \ldots, c_w'\}$ be the set of relations in $C$ that corresponds to the explicit relation $e_j^k \in E_k$.

Lemma 3: $K(c_i') \subseteq K(e_j^k)$, $1 \le i \le w$.

Proof: Since the set of dependencies from which $c_i'$ is constructed is a subset of the dependencies from which $e_j^k$ is constructed, the key(s) of $c_i'$ is a subset of the key(s) of $e_j^k$.

In Fig. 8, the unique set of relations $C' = \{c_1, c_2, c_3\}$ corresponds to $e_1^2$. The sets of keys of $c_1$, $c_2$, and $c_3$ are subsets of $\{A_1, A_2\}$, the set of keys of $e_1^2$.

Lemma 4: $C'$ has a lossless join.

Proof: We provide an informal proof, since a formal proof following similar arguments as in [2] is quite elaborate. Let $K_1, K_2, \ldots, K_m$ be the elements of $K(e_j^k)$. Without loss of generality we can assume that the following bijective dependencies are in $H'$, the nonredundant cover from which $C$ is constructed:

$K_1 \leftrightarrow K_2,\quad K_2 \leftrightarrow K_3,\quad \ldots,\quad K_{m-1} \leftrightarrow K_m$.

There is a set of relations $C'' \subseteq C'$ such that these bijective dependencies are used in the construction of $C''$. Thus, each relation in $C''$ has at least one distinct pair of keys. It is clear that it is possible to construct a relation $e'$ by performing a sequence of join operations on relations in $C''$ such that $e'$ has $m$ keys. Furthermore, $e'$ is produced losslessly since each join operation is performed over a key. Finally, $e_j^k$ can be produced losslessly by another sequence of join operations between $e'$ and the set of relations $C' - C''$. Each relation in $C' - C''$ has just one key, which is also in $e'$.

Theorem 1: If $e_j^k \in E_k$ is an explicit relation, then $\alpha_2: t(e_j^k) \to t(C')$ is a one-to-one function.

Proof: The function $\alpha_2$ first involves the mapping between the relation $e_j^k$ and the corresponding unique set of relations $C' \subseteq C$. This has been shown to be a one-to-one mapping in Lemma 1. $\alpha_2$ then requires the mapping between tuples in $I(e_j^k)$ (i.e., the extension of $e_j^k$) and tuples in $I(c_1', c_2', \ldots, c_w') = I(C')$. The sequence of lossless join operations described in Lemma 4 can be used to define this mapping. The lossless join property guarantees the unique correspondence between $t(e_j^k) \in I(e_j^k)$ and $t(C') \in I(C')$.

Rissanen [26] has shown that if we want a one-to-one correspondence to hold between tuples of a relation and its projections, then the natural join is the only operation that can be used to reconstruct a tuple from its projections. Notice that Theorem 1 is equivalent to the independence concept of Rissanen [25]. The set $C' \subseteq C$ corresponding to $e_j^k$ is said to be independent if

1) $F^*(e_j^k) = F^*(C')$ and

2) $I(e_j^k) = I(c_1') * I(c_2') * \cdots * I(c_w')$,

which is equivalent to the "definition representation 4" of Beeri et al. [6] if only functional dependencies are considered. The above conditions are guaranteed by the use of explicit relations only.

Fig. 9 illustrates the mapping of Theorem 1 utilizing $e_2^1$, $c_1$, and $c_2$ of Figs. 7 and 8. Notice that $e_2^1$ is a virtual relation constructed by joining $c_1$ and $c_2$ over NAME. Due to the lossless join property, $I(e_2^1) = I(C')$, where $C' = \{c_1, c_2\}$.

Because $\alpha_1$ cannot be defined for implicit relations, $\alpha_2$ cannot be defined for them either. Notice that this can be generalized for a conceptual schema constructed by any method. In general, if $e_j^k$ is implicit then there may be several sets of dependencies in $C$ that can produce $e_j^k$; hence, no function can be defined to map $t(e_j^k)$ to the corresponding tuples in $C$.

It can be concluded from this that a mapping process such as the one developed by Paolini and Pelagatti [24] is feasible only for the explicit relations. (Update anomalies of 3NF are ignored.) They defined the mapping function as a function which derives $E_k$ from $C$. Denote this function by $\alpha': C \to E_k$. The conceptual schema $C$ used in their model is obtained via the I-design method. Paolini and Pelagatti recognized the nonexistence of the inverse function $\alpha'^{-1}$. From our previous analysis, the inverse can exist only for the explicit relations in $E_k$. It cannot be defined for implicit relations in $E_k$. The function $\alpha'$ is inadequate to characterize the mapping between $C$ and $E_k$. Suppose that certain operations are performed on relations in $E_k$. To maintain consistency of mapping in the database it is necessary to map these operations to $\alpha'^{-1}(E_k)$. It is essential to consider the inverse mapping, which may give several subsets each of which corresponds to the given relation in $E_k$ (i.e., for implicit relations). On the other hand, the two functions $\alpha_1$ and $\alpha_2$ define a complete mapping that results in a consistent mapping for explicit relations. We will present later an ad hoc method to deal with implicit relations. First, however, the mapping formalism used by Paolini and Pelagatti [23] will be adopted and applied to the function $\alpha_2$.
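The tuple-level correspondence of Theorem 1 can be illustrated with a small in-memory sketch of the employee relations used in Fig. 9. This is an illustration under assumptions, not the paper's formalism: relations are lists of dictionaries, the helper names are hypothetical, and the salary figures are illustrative values only.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]


def project(rel, attrs):
    """Projection onto `attrs`, with duplicate tuples removed."""
    out = []
    for t in rel:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


# Conceptual relations c1(ID, NAME, AGE, GRADE) and c2(NAME, SALARY).
c1 = [
    {"ID": 1111, "NAME": "A. S. Smith", "AGE": 32, "GRADE": 26},
    {"ID": 2222, "NAME": "L. M. Lee", "AGE": 24, "GRADE": 28},
    {"ID": 3333, "NAME": "A. B. Black", "AGE": 50, "GRADE": 26},
]
c2 = [
    {"NAME": "A. S. Smith", "SALARY": 1200},  # illustrative salaries
    {"NAME": "L. M. Lee", "SALARY": 1000},
    {"NAME": "A. B. Black", "SALARY": 1200},
]

# The external relation is the virtual join of c1 and c2 over NAME.
e = natural_join(c1, c2)

# Lossless: projecting the join gives back exactly the stored relations.
round_trip_ok = (project(e, ["ID", "NAME", "AGE", "GRADE"]) == c1
                 and project(e, ["NAME", "SALARY"]) == c2)


def delete_by_id(c1, c2, emp_id):
    """Translate DELETE of the external tuple with this ID into deletes on c1 and c2."""
    names = {t["NAME"] for t in c1 if t["ID"] == emp_id}
    return ([t for t in c1 if t["ID"] != emp_id],
            [t for t in c2 if t["NAME"] not in names])


c1_after, c2_after = delete_by_id(c1, c2, 1111)
e_after = natural_join(c1_after, c2_after)
```

Because ID and NAME are assumed to be keys, each external tuple corresponds to exactly one tuple in each conceptual relation, so deleting through the conceptual level leaves the join equal to the external relation minus the deleted tuple.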

VI. MAPPING USING MANY-SORTED ALGEBRA

A many-sorted algebra [18], or $\Sigma$-algebra, is a family of sets $T_s$, called carriers, with a collection of operations $\Sigma$, called the signature, among the family of sets. The index $s$ belongs to a set $S$ called the sort set.

Given a $\Sigma$-algebra $T$ and a $\Sigma$-algebra $T'$, a $\Sigma$-homomorphism $h: T \to T'$ is a family of functions $\{h_s: T_s \to T_s'\}$ such that

$h_s(\sigma_T(t_1, \ldots, t_m)) = \sigma_{T'}(h_{s_1}(t_1), \ldots, h_{s_m}(t_m))$,

where $t_i \in T_{s_i}$ and $\sigma \in \Sigma$.

Using the sort sets as names for carriers, let $T = \{\text{relation } r_1(A_1, A_2, A_3)\}$, $T' = \{\text{relation } r_2(A_1, A_2),\ \text{relation } r_3(A_2, A_3)\}$, and $\Sigma = \{\text{INSERT tuple}\}$. The natural join is homomorphic if

$\text{INSERT}\langle a_1, a_2 \rangle * \text{INSERT}\langle a_2, a_3 \rangle = \text{INSERT}(\langle a_1, a_2 \rangle * \langle a_2, a_3 \rangle)$.
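The INSERT condition above can be tested on concrete instances. The sketch below is one possible reading, with relation instances modeled as Python sets of tuples and a hypothetical helper `insert_commutes_with_join`; it checks whether inserting into $r_2$ and $r_3$ and then joining gives the same result as joining first and inserting the joined tuple into $r_1$.

```python
def join(r2, r3):
    """Natural join of r2(A1, A2) and r3(A2, A3) over A2."""
    return {(x1, x2, y3) for (x1, x2) in r2 for (y2, y3) in r3 if x2 == y2}


def insert_commutes_with_join(r2, r3, t2, t3):
    """Check INSERT t2 * INSERT t3 == INSERT (t2 * t3) on the given instances."""
    assert t2[1] == t3[0], "the two tuples must agree on A2"
    insert_then_join = join(r2 | {t2}, r3 | {t3})
    join_then_insert = join(r2, r3) | {(t2[0], t2[1], t3[1])}
    return insert_then_join == join_then_insert


# Case 1: the A2 value being inserted is new to both relations.
fresh = insert_commutes_with_join(
    {("x1", "x2")}, {("x2", "x3")}, ("a1", "a2"), ("a2", "a3"))

# Case 2: r2 already holds another tuple with the same A2 value, so the
# join after insertion produces an extra tuple and the equality fails.
clash = insert_commutes_with_join(
    {("b1", "a2")}, set(), ("a1", "a2"), ("a2", "a3"))
```

The first case satisfies the equality and the second does not, which shows that the homomorphism property is a condition on the instances involved rather than something that holds for arbitrary data.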

Both the external view $V_k$ and the conceptual view $V$ can be described by many-sorted algebras. $V_k$ is described by $\{E_k, \Sigma_{E_k}\}$, where $\Sigma_{E_k}$ is the set of operations in $V_k$. Similarly, $\{C, \Sigma_C\}$ describes the conceptual view. As mentioned previously, this paper concentrates on operations on tuples and it will be assumed that $\Sigma_{E_k} = \Sigma_C = \{\text{DELETE}, \text{INSERT}, \text{UPDATE}\}$.

[Fig. 9 shows the relation $e_2^1$(ID, NAME, SALARY, AGE, GRADE) together with $c_1$(ID, NAME, AGE, GRADE) and $c_2$(NAME, SALARY), where $A_1$ = ID, $A_2$ = NAME, $A_4$ = SALARY, $A_7$ = AGE, and $A_8$ = GRADE. The tuples of $c_1$ are (1111, A. S. Smith, 32, 26), (2222, L. M. Lee, 24, 28), and (3333, A. B. Black, 50, 26); dotted lines connect each tuple of $e_2^1$ to the corresponding tuples of $c_1$ and $c_2$.]

Fig. 9. The dotted lines show an example of correspondence between tuples. An operation such as DELETE tuple in $e_2^1$ with ID = 1111 is mapped to DELETE tuple in $c_1$ with ID = 1111 and DELETE tuple in $c_2$ with NAME = A. S. Smith (assuming that ID and NAME are keys).

Let us define

1) $\alpha_2: t(e_j^k) \to t(C')$, where $e_j^k$ is an explicit relation and $\alpha_2$ is as defined previously;

2) $\gamma_k: \Sigma_{E_k} \to \Sigma_C^+$, where $\Sigma_C^+ = \Sigma_C \cup \Sigma_C^2 \cup \cdots$; this function translates each operation at the external level into a sequence of operations at the conceptual level;

3) the $\Sigma_{E_k}$-algebra $C_k = \{C, \Sigma_{E_k}^c\}$, where $\Sigma_{E_k}^c = \gamma_k(\Sigma_{E_k})$.

Let us denote $\sigma_{E_k}^c = \gamma_k(\sigma_{E_k})$; then we have $\sigma_{E_k}^c \in \Sigma_{E_k}^c$ if and only if $\sigma_{E_k} \in \Sigma_{E_k}$.

Paolini and Pelagatti [23] defined a mapping $\alpha'$ to be correct iff $\alpha'$ is a $\Sigma_{E_k}$-homomorphism between the two algebras $C_k$ and $E_k$. That is, $\sigma_{E_k}(\alpha'(e_j^k)) = \alpha'(\sigma_{E_k}^c(e_j^k))$, where $\sigma_{E_k}^c \in \gamma_k(\Sigma_{E_k})$ and $e_j^k \in E_k$. In this paper we say that $\alpha_2$ is correct iff $\alpha_2$ is a $\Sigma_{E_k}$-homomorphism and $\alpha_2$ is a one-to-one function. That is, $\alpha_2$ is required to be an isomorphism.

By Theorem 1, $\alpha_2$ is defined for explicit relations, and if $E_k$ does not include implicit relations then $\alpha_2$ is correct for all relations in $E_k$.

Example 3: In Example 2, $E_2$ does not include an implicit relation. $\alpha_1$ is defined as follows:

$e_1^2 = c_1 \overset{A_2}{*} c_2 \overset{A_1}{*} c_3$,

where $\overset{A_i}{*}$ denotes a natural join over attribute $A_i$. Because $A_1$ and $A_2$ are keys, the join has the lossless property. Furthermore,

$e_2^2 = c_4 \overset{A_3}{*} c_5$,

which is also lossless for the same reason, and $e_3^2 = c_6$.

Extending the join notation to tuples we get

$t(e_1^2) = t(c_1) * t(c_2) * t(c_3)$ and $t(e_2^2) = t(c_4) * t(c_5)$.

The operational translator $\gamma_k$ can be defined as

$\gamma_2$: $E_2.\text{INSERT}\langle a_1, a_2, a_4, a_6, a_7, a_8 \rangle \in e_1^2 \to C.\text{INSERT}\langle a_1, a_2, a_7, a_8 \rangle \in c_1$ AND $C.\text{INSERT}\langle a_2, a_4 \rangle \in c_2$ AND $C.\text{INSERT}\langle a_1, a_6 \rangle \in c_3$

$\gamma_2$: $E_2.\text{INSERT}\langle a_3, a_5, a_6 \rangle \in e_2^2 \to C.\text{INSERT}\langle a_3, a_6 \rangle \in c_4$ AND $C.\text{INSERT}\langle a_3, a_5 \rangle \in c_5$.

Similarly for DELETE and UPDATE. Notice that if we deal only with explicit relations, then the type of interferences that cause the mapping problems described in the introduction disappear.

VII. HANDLING IMPLICIT RELATIONS

Let $e_j^k \in E_k$ be an implicit relation. As a byproduct of Algorithm E-C it is easy to identify $C_1', C_2', \ldots, C_m' \subseteq C$ that correspond to $e_j^k$, that is, all distinct sets in $C$ such that $f \in F^*(C_i')$, where $f$ is the dependency of the binary relation $e_j^k$. Transitivity in $e_j^k$ causes trouble in translating operations at the external level to operations at the conceptual level. The problem is that operations on the attributes of $e_j^k$ may affect attributes that do not belong to the external view $V_k$. This interference among external views is difficult to deal with. The transitivity problems are well known, and special techniques have been suggested, such as adding an "interrelational constraint" [15]. The main objection to these solutions is that the relations can no longer be updated independently [7]. The following solution introduces redundancy in the conceptual schema, but allows a certain degree of operation independence.

Suppose that the following schemas are given:

$V_1$: $E_1 = \{e_1^1(A_1, A_3)\}$

$V_2$: $E_2 = \{e_1^2(A_1, A_2),\ e_2^2(A_2, A_3)\}$

$V_3$: $E_3 = \{e_1^3(A_1, A_4),\ e_2^3(A_4, A_3)\}$

$V$: $C = \{c_1(A_1, A_2),\ c_2(A_2, A_3),\ c_3(A_1, A_4),\ c_4(A_4, A_3)\}$

where $E_1$, $E_2$, and $E_3$ are the external schemas and $C$ is the conceptual schema. The relation $e_1^1 \in E_1$ is an implicit relation. $e_1^1$ can be constructed in the following alternative ways:

1) $e_1^1 = r_1[A_1, A_3]$, where $r_1 = c_1 \overset{A_2}{*} c_2$;

2) $e_1^1 = r_2[A_1, A_3]$, where $r_2 = c_3 \overset{A_4}{*} c_4$.

The views $V_1$, $V_2$, and $V_3$ overlap in the implicit dependency $A_1 \to A_3$, which does not form an independent relation in $C$. As previously, assume that $\Sigma_{E_1} = \Sigma_{E_2} = \Sigma_{E_3} = \Sigma_C = \{\text{DELETE}, \text{INSERT}, \text{UPDATE}\}$. Suppose that a redundant relation is added to $C$:

$C = \{c_1(A_1, A_2),\ c_2(A_2, A_3),\ c_3(A_1, A_4),\ c_4(A_4, A_3),\ c_5(A_1, A_3)\}$

such that the mapping $\alpha_1: e_1^1 \to c_5$ can be accomplished. This solves the deletion problem as follows. Suppose that the user in $V_1$ deletes a tuple $t(e_1^1)$. This deletion would not affect (i.e., delete) any information belonging to $V_2$ or $V_3$. However, as a result of this operation the set of tuples of $e_1^1$ (i.e., the extension of $e_1^1$) is different from the set of tuples of $r_1[A_1, A_3]$ or $r_2[A_1, A_3]$. Suppose that $\langle a_1, a_2 \rangle$ and $\langle a_2, a_3 \rangle$ are tuples in $c_1$ and $c_2$. Then $\langle a_1, a_3 \rangle$ is a tuple in $r_1[A_1, A_3]$; however, $\langle a_1, a_3 \rangle$ may not exist in $c_5$ because the user in $V_1$ deleted it. The tuples $\langle a_1, a_2 \rangle$ and $\langle a_2, a_3 \rangle$ are not deleted, for the purpose of preserving the information about $A_2$, which does not belong to $V_1$.

Similarly, INSERT and UPDATE operations performed at $V_1$ can be mapped over $e_1^1$ to $c_5$. This may cause inconsistency in the database. For example, there may exist $\langle a_1, a_3 \rangle \in c_5$, $\langle a_1, a_2 \rangle \in c_1$, and $\langle a_2, a_3' \rangle \in c_2$ such that $a_3 \neq a_3'$. To maintain consistency, the following rules have to be enforced.

1) If $\exists \langle a_1, a_2 \rangle \in c_1$ AND $\langle a_2, a_3 \rangle \in c_2 \Rightarrow \nexists \langle a_1, a_3' \rangle \in c_5$ such that $a_3 \neq a_3'$.

2) If $\exists \langle a_1, a_4 \rangle \in c_3$ AND $\langle a_4, a_3 \rangle \in c_4 \Rightarrow \nexists \langle a_1, a_3' \rangle \in c_5$ such that $a_3 \neq a_3'$,

where the notation $\Rightarrow$ denotes "implies." That is, if there exists a tuple $\langle a_1, a_3 \rangle$ in $c_5$, then it is not possible to produce the tuple $\langle a_1, a_3' \rangle$ from $c_1$ and $c_2$ where $a_3 \neq a_3'$.

These rules guarantee the following situation: if a DELETE is performed at $V_1$, then this would not affect the information at $V_2$ or $V_3$, and vice versa; however, INSERT and UPDATE affect all views. In the example, the operations performed at each $V_i$ can be translated as follows.

1) View $V_1$:

$E_1.\text{DELETE}\langle a_1, a_3 \rangle \in e_1^1 \to C.\text{DELETE}\langle a_1, a_3 \rangle \in c_5$

$E_1.\text{INSERT}\langle a_1, a_3 \rangle \in e_1^1 \to C.\text{INSERT}\langle a_1, a_3 \rangle \in c_5$ AND {if $\exists \langle a_1, z \rangle \in c_1$ then $C.\text{INSERT}\langle z, a_3 \rangle \in c_2$, if $\exists \langle a_1, y \rangle \in c_3$ then $C.\text{INSERT}\langle y, a_3 \rangle \in c_4$}

$E_1.\text{UPDATE}\langle a_1, a_3 \rangle \in e_1^1 \to C.\text{UPDATE}\langle a_1, a_3 \rangle \in c_5$ AND {if $\exists \{\langle a_1, y \rangle \in c_1, \langle y, x \rangle \in c_2\}$ then $C.\text{UPDATE}\langle y, a_3 \rangle \in c_2$} AND {if $\exists \{\langle a_1, z \rangle \in c_3, \langle z, x \rangle \in c_4\}$ then $C.\text{UPDATE}\langle z, a_3 \rangle \in c_4$}.

The variables $x$, $y$, and $z$ can have any permissible value. Notice that DELETE is performed at $c_5$ only; however, INSERT and UPDATE propagate to $c_1$, $c_2$, $c_3$, and $c_4$ if required.

2) View $V_2$:

$E_2.\text{DELETE}\langle a_1, a_2 \rangle \in e_1^2 \to C.\text{DELETE}\langle a_1, a_2 \rangle \in c_1$

$E_2.\text{DELETE}\langle a_2, a_3 \rangle \in e_2^2 \to C.\text{DELETE}\langle a_2, a_3 \rangle \in c_2$

$E_2.\text{INSERT}\langle a_1, a_2 \rangle \in e_1^2 \to C.\text{INSERT}\langle a_1, a_2 \rangle \in c_1$ AND {if $\exists \langle a_2, y \rangle \in c_2$, $\exists \langle a_1, y \rangle \in c_5$ then $C.\text{INSERT}\langle a_1, y \rangle \in c_5$}

$E_2.\text{INSERT}\langle a_2, a_3 \rangle \in e_2^2 \to C.\text{INSERT}\langle a_2, a_3 \rangle \in c_2$ AND {if $\exists \langle y, a_2 \rangle \in c_1$, $\exists \langle y, a_3 \rangle \in c_5$ then $C.\text{INSERT}\langle y, a_3 \rangle \in c_5$}

$E_2.\text{UPDATE}\langle a_1, a_2 \rangle \in e_1^2 \to C.\text{UPDATE}\langle a_1, a_2 \rangle \in c_1$

$E_2.\text{UPDATE}\langle a_2, a_3 \rangle \in e_2^2 \to C.\text{UPDATE}\langle a_2, a_3 \rangle \in c_2$ AND {if $\exists \{\langle y, a_2 \rangle \in c_1, \langle y, a_3 \rangle \in c_5\}$ then $C.\text{UPDATE}\langle y, a_3 \rangle \in c_5$} AND {if $\exists \{\langle a_1, x \rangle \in c_3, \langle x, a_3 \rangle \in c_4\}$ then $C.\text{UPDATE}\langle x, a_3 \rangle \in c_4$}

3) View $V_3$: The operations are translated similarly to the ones at view $V_2$.

In general, similar mappings can be developed for all implicit relations. This discussion introduces the idea that interferences among external views can be detected not only among data in the intersections of these external views, but also among operations allowed in each of these views. In this example, the operations INSERT and UPDATE are defined in $V_1$, $V_2$, and $V_3$, but there are two kinds of DELETE operations. These are $\text{DELETE}_{V_1}$ and $\text{DELETE}_{V_2, V_3}$. Hence, a diagram for intersections in terms of operations can be built similar to the one for intersections in terms of dependencies shown in Fig. 4. For example, Fig. 10 shows the operation intersections for the three external views discussed above. Notice that UPDATE and INSERT propagate to other external views, whereas $\text{DELETE}_{V_1}$ is limited to one view, as implied by the mapping definition given previously.

Fig. 10. Operations intersections of $V_1$, $V_2$, and $V_3$.

VIII. CONCLUSION

The design method illustrated in Algorithm E-C presents a number of important advantages in comparison with the I-design method. Our design method identifies the cause of the mapping problems between the external schema and the conceptual schema. No complete solution was previously known for this problem with respect to the relational model [16], [23]. In this paper, it has been shown that if transitivity is produced by the intersections of the external schemas then no possible mapping function can be formulated. Some ad hoc techniques can be used and one such technique has been suggested. Except for the relations in external schemas that cause transitivity in the conceptual schema, the interferences among different external schemas are identifiable as sets of relations in the conceptual schema. Hence, as a result of our design method, the explicit relations in external schemas can be realized as a natural join over independent components from the conceptual schema. This enables us to define a one-to-one function for the tuples in explicit relations to tuples in the conceptual schema. The I-design method fails to identify such functions [23].

The two visible disadvantages of our design method are as follows.

1) Introducing a new external schema $V'$ in $V$, such that $F^*(V') \subseteq F^*(V)$, may cause a change in the conceptual schema. This change may result in "splitting" some relation in $C$ into several components. However, this whole process of modifying the conceptual schema can be performed automatically by the system if the nonredundant covers of the intersections obtained during the original run of Algorithm E-C are saved.

2) The construction of relations for the conceptual schema in Algorithm E-C will produce more relations relative to the I-design method discussed previously. The worst case is when every functional dependency in the conceptual view forms a relation in the conceptual schema. The penalty of redundancy in this context is analogous to the one introduced in the normalization process in order to eliminate undesirable properties. It should be clear, however, that it is not necessary that every relation in the conceptual schema correspond to one

physical file. Thus, the actual cost of redundancy dependson the physical implementation of the relations in the con-

ceptual schema.Thus, we claim that the advantages of the new design method

presented so far clearly outweigh its disadvantages. In addi-tion, this design method has a very important potential appli-

cation for distributed database systems, and we are currentlypursuing this direction of research.
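As an illustration of the View V2 translation rules given before the conclusion, the following is a minimal sketch in Python, not the authors' implementation. It assumes that the conceptual relations c1, c2, and c5 range over the attribute pairs (A1, A2), (A2, A3), and (A1, A3), that each relation can be modeled as a set of pairs, and that the second condition in each INSERT propagation clause means "no such tuple is already in c5". The class name ConceptualSchema and its method names are hypothetical.

```python
# Sketch of the View V2 INSERT and DELETE translations onto the conceptual schema.
# Assumptions (not from the paper): relations are Python sets of pairs, all names
# are hypothetical, and propagation to c5 happens only when the pair is absent.


class ConceptualSchema:
    """Holds the conceptual relations c1(A1, A2), c2(A2, A3), and c5(A1, A3)."""

    def __init__(self):
        self.c1 = set()
        self.c2 = set()
        self.c5 = set()

    def insert_e1(self, a1, a2):
        """E2.INSERT<a1, a2> in e1: insert into c1, then propagate to c5."""
        self.c1.add((a1, a2))
        for b2, y in self.c2:
            # If <a2, y> exists in c2 and <a1, y> is not in c5, insert <a1, y>.
            if b2 == a2 and (a1, y) not in self.c5:
                self.c5.add((a1, y))

    def insert_e2(self, a2, a3):
        """E2.INSERT<a2, a3> in e2: insert into c2, then propagate to c5."""
        self.c2.add((a2, a3))
        for y, b2 in self.c1:
            # If <y, a2> exists in c1 and <y, a3> is not in c5, insert <y, a3>.
            if b2 == a2 and (y, a3) not in self.c5:
                self.c5.add((y, a3))

    def delete_e1(self, a1, a2):
        """E2.DELETE<a1, a2> in e1: delete from c1 only (no propagation)."""
        self.c1.discard((a1, a2))

    def delete_e2(self, a2, a3):
        """E2.DELETE<a2, a3> in e2: delete from c2 only (no propagation)."""
        self.c2.discard((a2, a3))


if __name__ == "__main__":
    schema = ConceptualSchema()
    schema.insert_e2("d10", "m7")    # <a2, a3> goes to c2
    schema.insert_e1("emp1", "d10")  # <a1, a2> goes to c1 and propagates to c5
    print(sorted(schema.c5))         # [('emp1', 'm7')]
    schema.delete_e1("emp1", "d10")  # removes the pair from c1 only
    print(sorted(schema.c5))         # [('emp1', 'm7')]
```

In this sketch the DELETE methods touch only c1 or c2 because the V2 rules list no propagation clause for DELETE, whereas both INSERT rules carry one. The UPDATE rules are omitted; they would follow the same pattern with the additional clause over c3 and c4.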

REFERENCES

[1] ANSI/X3/SPARC, "Study group on data base management systems: Interim report," Bull. ACM-SIGMOD, vol. 7, no. 2, 1975.
[2] A. Aho, C. Beeri, and J. Ullman, "The theory of joins in relational data bases," in Proc. 18th Conf. Foundation of Comput. Sci., 1977, pp. 107-113.
[3] S. Al-Fedaghi and P. Scheuermann, "An alternative design methodology for IMS," Dep. Elec. Eng. Comput. Sci., Northwestern Univ., Evanston, IL, Tech. Rep., 1980.
[4] W. W. Armstrong, "Dependency structures of data base in relations," in Information Processing 74. Amsterdam, The Netherlands: North-Holland, 1974, pp. 580-583.
[5] C. Beeri, P. Bernstein, and N. Goodman, "A sophisticate's introduction to database normalization theory," in Proc. 4th Very Large Data Bases Conf., West Berlin, Germany, 1978, pp. 113-129.
[6] C. Beeri and P. Bernstein, "Computational problems related to the design of normal form relational schemas," ACM Trans. Database Syst., vol. 4, pp. 30-59, Mar. 1979.
[7] C. Beeri, R. Fagin, and J. H. Howard, "A complete axiomatization for functional and multivalued dependencies in database relations," in Proc. ACM-SIGMOD Conf., 1977, pp. 47-61.
[8] E. Benci, F. Bodard, H. Bogaert, and A. Cabanes, "Concepts for the design of a conceptual schema," in Modelling in Database Management Systems, G. M. Nijssen, Ed. Amsterdam, The Netherlands: North-Holland, 1976, pp. 181-200.
[9] P. A. Bernstein, "Synthesizing third normal form relations from functional dependencies," ACM Trans. Database Syst., vol. 1, pp. 317-343, Dec. 1976.
[10] R. G. Casey and C. Delobel, "Decomposition of a data base and the theory of Boolean switching functions," IBM J. Res. Develop., vol. 17, pp. 374-386, Sept. 1977.
[11] C. Carlson and A. Arora, "The updatability of relational views based on functional dependencies," in Proc. COMPSAC Conf., 1979, pp. 415-420.

[12] E. F. Codd, "A relational model of data for large shared data banks," Commun. Ass. Comput. Mach., vol. 13, pp. 377-387, June 1970.
[13] E. F. Codd, "Further normalization of the data base relational model," in Courant Computer Science Symposium 6, Data Base Systems. Englewood Cliffs, NJ: Prentice-Hall, 1972, pp. 33-64.
[14] E. F. Codd, "Recent investigation in relational data base systems," in IFIP Proc., 1974, pp. 1017-1021.

[15] C. J. Date, An Introduction to Database Systems, 2nd ed. Reading, MA: Addison-Wesley, 1977.
[16] U. Dayal and P. Bernstein, "On the updatability of relational views," in Proc. 4th Very Large Data Bases Conf., West Berlin, Germany, Sept. 1978, pp. 368-377.
[17] R. Fagin, "Multivalued dependencies and a new normal form for relational databases," ACM Trans. Database Syst., vol. 2, pp. 262-278, Sept. 1977.
[18] J. Goguen, J. Thatcher, E. Wagner, and J. Wright, "Abstract data types as initial algebra and the correctness of data representation," in Proc. Conf. Comput. Graphics Pattern Recognition and Data Structures, Beverly Hills, CA, 1975, pp. 89-93.
[19] L. R. Gotlieb, "Computing joins of relations," in Proc. ACM-SIGMOD Conf., 1975, pp. 55-63.
[20] G. Hubbard and N. Raver, "Automatic logical file design," in Proc. 1st Very Large Data Bases Conf., Framingham, MA, 1975, pp. 227-253.
[21] E. Lewis, L. Sekino, and P. Ting, "A canonical representation for the relational schema and logical data independence," in Proc. COMPSAC Conf., 1977, pp. 276-280.
[22] S. Navathe and M. Schkolnick, "View representation in logical database design," in Proc. ACM-SIGMOD Conf., 1978, pp. 144-156.
[23] P. Paolini and G. Pelagatti, "Formal definition of mapping in a data base," in Proc. ACM-SIGMOD Conf., 1977, pp. 40-46.
[24] N. Raver and G. U. Hubbard, "Automated logical data base design: Concepts and applications," IBM Syst. J., vol. 16, no. 3, pp. 287-312, 1977.
[25] J. Rissanen, "Independent components of relations," ACM Trans. Database Syst., vol. 2, pp. 317-325, Dec. 1977.
[26] J. Rissanen, "Independent components of relations," IBM, San Jose, CA, IBM Res. Rep. RJ1899, 1977.
[27] J. M. Smith and D. C. Smith, "Database abstraction: Aggregation and generalization," ACM Trans. Database Syst., vol. 2, pp. 105-133, June 1977.

Sabah Al-Fedaghi received the B.S. degree in computer science from Arizona State University, Tempe, in 1973, and the M.S. degree in computer science from Northwestern University, Evanston, IL, in 1977.

Currently, he is a Ph.D. candidate at Northwestern University. Between 1974 and 1976 he worked as a Programmer at the Computing Center of Kuwait Oil Company. His current research interests include conceptual modeling in databases and distributed databases.

Peter Scheuermann received the B.S. degree in applied mathematics from Tel-Aviv University, Tel-Aviv, Israel, in 1969, and the M.S. and Ph.D. degrees in computer science from the State University of New York, Stony Brook, in 1972 and 1976, respectively.

From 1975 to 1976 he taught at the College of William and Mary, Williamsburg, VA, and since then he has been on the faculty of Northwestern University, Evanston, IL, where he is an Assistant Professor in the Department of Electrical Engineering and Computer Science. His current research interests include database systems, performance evaluation, and programming languages. He has served as a Consultant for Sperry Research, Hahn-Meitner Institute, Berlin, Germany, and other organizations. He is Associate Editor for the Journal of Simulation.

Dr. Scheuermann is a member of the Association for Computing Machinery.
