Object-oriented database support for software project management environments: data-modelling issues


S J Ochuodho

To keep pace with the stringent requirements of emergent applications, database technology has to change. Conventional databases have been successful inasmuch as the application domain was restricted to traditional data banks. Enhancement of existing database management systems (DBMSs) has played its part, namely, stretching the capabilities of their predecessors. Unfortunately, this extension is not unlimited. Of necessity, entirely new modelling concepts must be explored. The paper surveys conventional DBMSs, particularly with regard to their support for integrated project support environments (IPSEs). Their strengths and limitations are discussed. The problems posed by such nontraditional applications are identified. Emerging ideas that seem poised to meet this challenge are analysed. Shortcomings of these 'advanced' methodologies are identified, with specific implementations considered. Based on the survey, fundamental requirements of any IPSE database are presented. The object-oriented (OO) approach seems to give a good handle for satisfying these requirements. A proposal is made to investigate further how an OO-based data model suitable for IPSE support can be evolved.

databases, database management systems, object-oriented, software engineering environments, IPSEs

MOTIVATION

Emerging database applications, e.g., software engineering environments (SEEs*), computer-aided design/very

*Several terminologies have been used to refer to software engineering environments, among them IPSE (for integrated project support environment), SEE, SDE (for software development environment), and ICASE (for integrated computer-aided software engineering). The author favours the first two terms, IPSE and SEE, and in most cases uses them interchangeably. However, he prefers to use SEE when the emphasis is on the environment component of SEEs, and IPSE when emphasizing the integration aspect of IPSEs (as in integrated project support environment). Literally, software development excludes maintenance. 'Maintenance' is as much an issue of software engineering as is 'development', and ideally the author would prefer to talk of software evolution in favour of development. The use of SDE has therefore been discouraged in this paper. Finally, ICASE is not used because it invokes the 'CASE' connotations, which many researchers associate with nothing but 'tools'. This paper is no more interested in tools than in methodologies, theoretical frameworks, and so on.

Department of Computer Science, University of York, Heslington, York YO1 5DD, UK

large-scale integration (CAD/VLSI), geographical and office information systems (GISs/OISs), pose extraordinary requirements on database technology. Traditional database management systems (DBMSs) were designed with a view largely to supporting 'simple' data-processing (DP) activities. Traditional DP artefacts were basically simple, if related, files of records containing fixed-format fields. Conventional DBMSs have therefore evolved from primitive file-management systems, with only a simple programming interface, to systems that manage data across several files or tables.

The newer applications differ from conventional applications and from one another in several significant ways. First, artefacts (or objects) in these environments are by no means 'simple'. It may no longer be meaningful to talk of data types such as 'integers' and 'strings' as the typical granules of computation. New applications are characterized, for example, by spatial and temporal data and other forms of data with complex structure and semantics. Second, each application tends to require customized modelling constructs and tools. The types of entities and associations between them, which must be described by a VLSI circuit-design database (for example), may be quite different from the requirements of a DP application or a GIS environment. However, by and large, these new applications seem to share more commonalities than differences, as shall be shown.

This paper singles out SEEs for further investigation. The author strongly believes, however, that several of the ideas expressed in this paper will apply to the other areas as well. In fact, Appendix 1, which discusses some of the major emerging application areas and summarizes their nature, clearly shows strong similarities in the requirements and challenges posed by such applications to database technology.

This study further reveals that it is not adequate simply to model the 'objects' (i.e., structural information, traditionally called 'data'), but that it is equally important to capture the processes responsible for their creation or consumption. A separate paper1 discusses process-modelling issues with regard to their support for SEEs.

The rest of this paper is divided as follows. The next section gives an overview of a typical database system, summarizing database fundamentals: what they are and what they provide. The third section reports a survey of existing and proposed data-modelling techniques. The fourth section discusses a few selected commercial or experimental engineering databases. Conclusions drawn from this study, and projections for further work, then appear. At the end of the paper, an extensive bibliography of related works is given.

Vol 34 No 5 May 1992 0950-5849/92/050283-25 © 1992 Butterworth-Heinemann Ltd 283

OVERVIEW OF DATABASE SYSTEM

Consider a simple conference database. The planning and administration of a conference involves the management of a substantial amount of information. Normally, this task is divided between two cooperating teams: an organizing committee (whose concern is the content, or programme, of the conference) and an administrative centre (whose concern is everything else).

Suppose the current interest concentrates on the latter. It is assumed that a conference programme, in terms of sessions, speakers, and chairpersons, has been decided on, enabling concentration on resources management. If it is assumed that the administration is required to host an international database conference, then it is possible to identify (at least) the following distinct information groupings: delegate, session, accommodation, and facilities information. Typical queries that may be sought from such an information base include booking (e.g., In which room was, or will be, the object-oriented database group's session on the afternoon of Thursday 24 August held?), personal details (Does the chairperson of session 2A have any sight impairments, special dietary requirements, etc.?), and staff (How many support technicians are qualified to service the Xerox-710 copier, and when are they available?). All such queries can be readily handled using a simple database or information system.
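The queries above can be sketched directly in code. The following Python sketch (all names and sample data are illustrative assumptions, not taken from the paper) holds two of the information groupings as plain records and answers a personal-details query and a staff query against them:

```python
# Two of the conference information groupings, held as plain records
# (illustrative data only).
delegates = [
    {"delegate_no": 1, "name": "A N Other", "dietary": "vegetarian", "chairs": "2A"},
    {"delegate_no": 2, "name": "J Smith", "dietary": None, "chairs": None},
]

technicians = [
    {"name": "P Jones", "qualified_for": ["Xerox-710"], "available": "Thu pm"},
    {"name": "R Brown", "qualified_for": ["OHP"], "available": "Fri am"},
]

def chair_dietary_requirements(session_id):
    """Personal-details query: dietary requirements of the chair of a session."""
    for d in delegates:
        if d["chairs"] == session_id:
            return d["dietary"]
    return None

def technicians_for(equipment):
    """Staff query: who is qualified to service a given piece of equipment?"""
    return [t["name"] for t in technicians if equipment in t["qualified_for"]]
```

For example, chair_dietary_requirements("2A") answers the dietary query for the chairperson of session 2A, and technicians_for("Xerox-710") answers the staff query.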

An information system comprises functional components responsible for the transmission and processing of data for the benefit of some recipient. In the example the user could be the conference manager, the data-entry clerk, or even the system's programmer. The database component can be viewed as that subsystem of an information system responsible for the storage, retrieval, and maintenance of information for the benefit of other system components.

As evident from the above example, such data need to be logically coherent and must have an inherent meaning. For example, it may make little sense to talk about 'accommodation' without associating it with a delegate or a session. A number of authors believe that an integrated, self-describing repository of data is crucial to a database2,3.

The set of programs that allows one or more people to define, construct, and/or modify (including deletion) the database constitutes the DBMS. Among other things, the DBMS facilitates redundancy control, sharing of data, restriction of unauthorized access, provision of multiple interfaces, representation of complex relationships among data, enforcement of integrity constraints, and provision of back-up and recovery from hardware or software failures.

Figure 1. Simplified database system environment (users/programmers submit application programs and queries; the DBMS software processes queries/programs and accesses the stored data held in the database system)

Figure 1 represents a simplified database system. The data bank and the data dictionary (also called schema) are shown. The schema is a meta-definition of specific data types or categories and their high-level conceptual relationships. Data and their schema are accessible to the user via the DBMS utilities.
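The schema-versus-data distinction can be made concrete with a small sketch. Here the schema is a meta-definition (hypothetical type and field names, not from the paper) against which instances are checked, much as a DBMS enforces its data dictionary:

```python
# A schema as a meta-definition of the data, separate from the instances
# it describes (names are illustrative).
schema = {
    "DELEGATE": {"delegate_no": int, "name": str, "room_no": str},
}

delegate = {"delegate_no": 7, "name": "S Jones", "room_no": "B12"}

def conforms(record, record_type):
    """A DBMS-style integrity check: does the instance match its schema?"""
    fields = schema[record_type]
    return set(record) == set(fields) and all(
        isinstance(record[f], t) for f, t in fields.items()
    )
```

A record with a missing field or a wrongly typed value would fail the check, which is the kind of constraint enforcement the DBMS utilities provide over the data dictionary.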

DATA-MODELLING TECHNIQUES: SURVEY

The general trend evident from the progression in database technology over the years is summarized in this section. For a comprehensive history of the evolution of database technology, the interested reader is referred to Fry and Sibley4 and Whittington5. As far as possible, the conference database example introduced in the previous section is used to illustrate fundamental functions of a DBMS, and data modelling in particular.

The next two subsections investigate the so-called 'classical' and semantic data models, respectively, each time highlighting their strengths and limitations with regard to capturing engineering data. The third subsection looks at emerging database concepts and evaluates their performance, particularly against those of traditional DBMSs (especially their modelling power).

Conventional (or classical?) data models

The following considerations are normally essential for qualitative classification and comparison of DBMSs:

• classes of users assumed, and support given to each (i.e., user interface)
• functional components and internal interfaces
• levels of data abstraction that they support
• models of data supported
• external application interface provided
• class of database system configuration that they support

DELEGATE ---SPONSORED_BY---> MAJOR_SPONSOR

Figure 2. Simple association between owner and member record types

Of particular interest in this section are data abstraction and data models.

The database approach provides some level of data abstraction by hiding details of data storage not needed by certain users. A data model is the tool that provides for this abstraction. Three categories of data models are:

• high-level or conceptual models
• record-based or implementation models
• low-level or physical models

It is not the intention of this paper to explore the low-level models in any detail; suffice it to say that in this category, emphasis is on the actual physical file structure in which the database instances are stored. The conceptual models form the basis of a later section. The record-based models are briefly discussed below. Because of its wide acceptance, the relational model (which falls under this category) is treated individually. Indeed, several database researchers see the relational model as the de facto standard at present.

Navigational models
The navigational models are largely attributed to the CODASYL Database Task Group (DBTG). The group originally reported its findings5, with subsequent periodic updates of these ideas6. The two not-quite-unrelated models that resulted from the navigational concept are the network and hierarchical models. They are discussed below.

Network model
This data model has two basic data structures: records and sets. Data (or entity sets) are represented directly by logical record types. Attributes of an entity set become fields of the logical record format. Where an entity is determined uniquely only through a relationship with another entity, a further field (i.e., the serial-number notion) is added for the entity set, uniquely identifying each entity. Unlike the relational and hierarchical models, which allow only simple attributes, the network model allows complex data items to be defined. Repeating groups allow the inclusion of a set of composite values for a data item in a single record. A set type is a description of a 1:n relationship between two record types. A set type has a name (e.g., SPONSORED_BY in Figure 2), an owner record type (DELEGATE), and a member record type (MAJOR_SPONSOR). Member records of a set instance are ordered, i.e., an owner-coupled set or coset.

A set instance is represented as a ring (circular linked list), linking the owner record and all member records of the set. Each record type has a type field (to distinguish between owner and member records) and a pointer field (to identify NEXT, FIRST, LAST, ... member records). For more efficient set implementations, double-linking, owner-pointer representation, (physically) contiguous member records, pointer arrays in owner records, indexed representation, etc., are normally used.
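The ring representation described above can be sketched as follows; the Record class, its fields, and the sample data are illustrative assumptions, not drawn from any actual CODASYL implementation:

```python
# One set instance (coset) as a ring: the owner record and its member
# records form a circular linked list, each record carrying a type field
# and a NEXT pointer.
class Record:
    def __init__(self, kind, data):
        self.kind = kind          # 'owner' or 'member': the type field
        self.data = data
        self.next = self          # NEXT pointer; a lone owner points to itself

def insert_member(owner, data):
    """Link a new member record into the ring, directly after the owner."""
    member = Record("member", data)
    member.next = owner.next
    owner.next = member
    return member

def members(owner):
    """Navigate the ring from the owner round to the owner again."""
    out, rec = [], owner.next
    while rec is not owner:
        out.append(rec.data)
        rec = rec.next
    return out

delegate = Record("owner", "DELEGATE: J Smith")
insert_member(delegate, "MAJOR_SPONSOR: ACME Ltd")
insert_member(delegate, "MAJOR_SPONSOR: Widget plc")
```

Navigating with members(delegate) visits every member and terminates back at the owner, which is exactly the circular-list property the model relies on.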

Only binary and simple one-to-many relationships are representable directly by such links; m:n relationships are treated by creating an intermediary linking (or dummy) record type, and 1:1 relationships are not permitted (see the entity-relationship approach later).

Although the navigation diction may be as powerful as the relational 'join', relational languages have natural dictions that are more powerful than the network languages. On the other hand, navigation through the network model can be implemented more easily than relational operations like 'join'. However, insulation from implementation issues is perhaps the major strength of semantic data models over traditional models. It is the author's view that implementation considerations should be secondary when reviewing data models.

Many papers have been written on the network model. Olle7, for example, exhaustively discusses this model. Notable realisations based on the network model are Total8 and IDMS9. The following implementations have also been realised: DMS (by Unisys), DBMS (Digital Equipment), and Image (Hewlett-Packard). Although the relational model has long been recognised by some as a standard, and emergent data models are gaining ground quite rapidly, most of the world's data still reside in either network or hierarchical DBMSs.

Hierarchical model
This model represents data as hierarchical tree structures. Each hierarchy represents a number of related records. The model represents hierarchical data in a direct and natural way, but has problems handling nonhierarchical data. Typical is the idea of the parent-child relationship (PCR). An instance of a PCR type consists of one record of the parent record type and a number (possibly zero) of records of the child record type. Consider record types SPONSOR and DELEGATE, with a PCR type (SPONSOR, DELEGATE). This is a 1:n relationship. Problems arise with m:n relationships, e.g., (SESSION, DELEGATE). These are usually handled by allowing duplication of child record instances, or by specifying several hierarchical schemas in the same database.
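A PCR instance, and the duplication forced on it by an m:n relationship, can be sketched as follows (the Node class and the record names are illustrative assumptions):

```python
# A PCR (parent-child relationship) instance: one parent record with its
# (possibly zero) child records, a direct 1:n tree.
class Node:
    def __init__(self, record_type, key):
        self.record_type = record_type
        self.key = key
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

# 1:n is natural: one SPONSOR parent, several DELEGATE children.
sponsor = Node("SPONSOR", "ACME Ltd")
sponsor.add_child(Node("DELEGATE", "J Smith"))
sponsor.add_child(Node("DELEGATE", "A N Other"))

# m:n by duplication: the same delegate attends two sessions, so the
# child record instance reappears under each SESSION parent.
s1 = Node("SESSION", "2A")
s1.add_child(Node("DELEGATE", "J Smith"))
s2 = Node("SESSION", "3B")
s2.add_child(Node("DELEGATE", "J Smith"))
```

The duplicated "J Smith" child under both sessions is precisely the redundancy the model incurs when forced to represent nonhierarchical (m:n) data.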

In fact, the hierarchical model is simply a 'network' in which all links point in the direction from child to parent; hence the author's justification for grouping the network and hierarchical models under the same umbrella. Ullman2 describes the hierarchical model in detail. Notable implementations of this model are Information Management System (IMS)10,11 and MRI's System 200012. Both IMS and System 2000 have been evaluated13,14.



(a) Table DELEGATE: delegate #, name, affiliation, room #, address

(b) Table DEL_RM_ALLOC: delegate #, name, room #

Figure 3. Relations (a) and views (b)

Relational data model
The relational model represents data as a collection of time-varying relations. Different people hold different perceptions of relations: as tables, sets, predicates, or even functions. The relations, possibly of different types, consist of tuples. All tuples of a given relation are of the same type. Tuples are themselves composed of attributes, also possibly of different types. By way of qualification operators, other relations (or views15) can be derived from one or more base relations.

As an example, reconsider the DELEGATE record type, or relation. A delegate may be assigned a unique attribute delegate# (for delegate's number, shown in bold type), and other attributes, e.g., name, room#, etc. (see Figure 3(a)). The room-allocations committee may not be interested in the delegate's affiliation or address. They may therefore create a simpler relation, Del_Rm_Alloc, which shows only the delegate's number (and possibly name) and the number of the room allocated (room#; see Figure 3(b)). Del_Rm_Alloc is a 'view' of the base relation Delegate.
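The derivation of Del_Rm_Alloc from the base relation can be sketched as a projection. Representing tuples as Python dictionaries is an illustrative choice, not how a relational DBMS stores them, and the sample data are invented:

```python
# The Delegate base relation as a list of same-typed tuples (dictionaries).
delegate_relation = [
    {"delegate#": 1, "name": "J Smith", "affiliation": "York",
     "room#": "B12", "address": "York, UK"},
    {"delegate#": 2, "name": "A N Other", "affiliation": "Leeds",
     "room#": "C03", "address": "Leeds, UK"},
]

def del_rm_alloc(base):
    """Derive the view: project each tuple onto delegate#, name, and room#."""
    return [{k: t[k] for k in ("delegate#", "name", "room#")} for t in base]
```

Note that the result of the query is itself a relation (a list of same-typed tuples), which is the closure property that the relational algebra exhibits.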

Views are just one basic underlying concept of the relational model, the others being data manipulation, query processing, and integrity checking. This is not to say that these issues do not arise in the navigational models. Of course they do. However, with a firm theory to support the relational model (i.e., relational algebra and calculus), these ideas are perhaps best exemplified in the relational model. Their discussion is, however, beyond the scope of this paper.

The relational data model forms the basic concept that underlies the relational approach to data definition, manipulation, query, and integrity.

In May 1979, the ANSI/X3/SPARC DBS-SG Relational Database Task Group (RTG) was chartered to investigate the justifiability of proposing a relational standard to ANSI. An objective of the RTG was to develop a relational data model system (RDMS) characterization that was consistent with existing relational technology. The group analysed 14 of the then existing so-called relational DBMSs16. One thing that was clear was that not one of the DBMSs could claim to be fully relational, in the context of Codd's 12 rules17. Nevertheless, the following points seemed to put the relational model ahead of the navigational models:

• A strong theoretical foundation: relations are based on set theory and first-order predicate calculus.
• Simplicity: a consequence of the above simple and well-understood mathematical concepts.
• Case for uniformity: relational algebra and/or calculus exhibit closure, i.e., a query on a relation returns a relation as a result18.
• Data independence: relational schemas and languages are free of many representational details such as access paths, ordering, and indexing.
• Conceptual and internal schemas are usually indistinguishable.
• A basis for high-level interfaces.
• Multiple views of data may be supported; a dynamic definition of new user views is permitted.
• A basis for database semantics (perhaps more of an advantage of semantic data models than the relational model; see later).

Two major breakthroughs of the relational model evident in the early 1980s were that productivity increased by a significant factor (5-20 times), and that end-users could now handle their own problems without the need for application programmers to mediate between the users and the systems. But the relational model is not without problems. Nevertheless, its most noticeable problems are true of the navigational models as well, and are therefore not discussed until the next subsection, where these drawbacks are summarized.

Prominent examples of the relational model are Ingres, a product of the University of California, Berkeley, USA, and Relational Technology19,20, IBM's DB2 and System R21, and Oracle (Oracle, Inc.)22. Other major commercial products are Unify (Unify, Inc.), Empress (originally MRS, University of Toronto, Canada)23, and Query-By-Example (IBM).

Problems with traditional models
In summarizing, a terminology table (see Table 1) is used to compare the major hypotheses of the three traditional data models. Both navigational models are founded on the premise of a record type. Each record type is characterized by the values of its fields. Every field entry, or data item, must be of a particular data type (e.g., integer, character, string, etc.). Related record types share a navigational link.

The relational model exists in two variants: a more user-oriented, informal one, and a more developer-oriented, formal one. The major notion is that of a relation (or table). A relation has properties (called attributes). Like data items, attributes must be of a particular data type. A primary key (or set of keys) can be used to identify uniquely one instance (or tuple) of the relation. The relational model closely resembles the entity-relationship model (see later).

286 Information and Software Technology

Table 1. Comparative terminology of traditional data models

Network model | Hierarchical model | Relational model (formal) | Relational model (informal) | Entity-relationship model
Record type description | Record type description | Relation schema | Table description | Entity type schema
Record type | Record type (or segment) | Relation | Table | Entity type
Record occurrence | Record occurrence | Tuple | Row | Entity instance
Set type | PCR type | a | a | 1:n relationship type
Set occurrence | PCR occurrence | a | a | 1:n relationship instance
Field, data item | Field, data item | Attribute | Column | Attribute
Data type | Data type | Domain | Data type | Value set
b | b | Candidate key | Same | Key
Key, unique field | Sequence key | Key, unique field | b | b
Vector | b | b | b | Multivalued attribute
Repeating group | b | b | b | Composite attribute

a: No corresponding concept exists; relationship established using foreign keys.
b: No equivalent concept or term.

Below is a summary of the problems of using the traditional models to capture software-environment artefacts.

The network model:

(1) does not support 1:1 relationships, which are not as rare in software engineering environments as they are in typical DP applications (where they are almost nonexistent). Later, it shall be seen that in SEEs the number of object instances may not be as much a problem as the number of object categories.

(2) supports m:n relationships only unnaturally. Again, the assumption is that most relationships of vital interest are 1:n.

(3) is largely influenced by implementation issues, thereby compromising semantism.

The hierarchical model:

(1) has similar problems to (1) and (2) above.
(2) breaks down when used to represent nonhierarchical data. Fortunately, most data-intensive applications have an inherent hierarchical nature. Indeed, the object-oriented models are highly hierarchical, prompting some pessimists to claim that object orientation is simply a rebirth of the hierarchical model. But see (3) below.
(3) although hierarchical, barely supports reuse. For example, there is no direct relationship between a grandparent and a grandchild. Later, it is shown that reusability is perhaps the one major strength of object orientation.

The following drawbacks are common to all traditional data models. However, as they are perhaps most conspicuous in the relational model, and are the same ones advanced previously by proponents of semantic models, they are discussed under the relational model. It must be understood that they are just as much a problem in the network and hierarchical models.

Thus the relational model:

(1) separates 'instances' from the data 'schema'. This separation is not natural in the sense of the real world of discourse, which the database models.
(2) separates 'structure' from 'behaviour'. Operations that manipulate instances (themselves seen simply as data objects) are separately held as application programs. Again, this separation is artificial.
(3) does not support m:n relationships and n-ary (i.e., involving more than two relations) relationships in a straightforward manner.
(4) is limited in semantics. Even when it is possible to represent the atypical relationship types, relations (or tables) do not say a lot about the semantism of an association between two or more relation types. Two simple examples may help to elaborate. First, consider two relations DELEGATE and SPEAKER. The two may be held in separate tables, with little to suggest that SPEAKER is merely a special class of DELEGATE. Second, 'room #' may appear as an attribute of the relation DELEGATE; nothing whatsoever suggests the actual significance of this attribute. It can reasonably be assumed that this is the room in which the delegate is accommodated during the conference. But why can it not be the room in which the session chaired by this delegate is held? And why not the usual room (or office) in which the delegate works (while not away at a conference)?

It is true that to expect a database to capture every minor detail of the universe it models is unrealistic. However, it has been shown that data models can be evolved which exhibit (even if only minimally) richer semantics. Subsequent sections discuss some of these models. Lack of semantics is perhaps the greatest drawback of traditional data models.
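The 'room #' ambiguity above can be made concrete in a small sketch: the flat relational attribute says nothing about its own meaning, whereas explicitly named associations carry the intended semantics. The names below are hypothetical, invented only for the contrast:

```python
# Flat relational tuple: the meaning of 'room#' is invisible to the model.
flat_delegate = {"delegate#": 1, "name": "J Smith", "room#": "B12"}

# The same facts with the associations named explicitly, so each room
# reference carries its intended semantics.
semantic_delegate = {
    "delegate#": 1,
    "name": "J Smith",
    "accommodated_in": "B12",    # room where the delegate is lodged
    "chairs_session_in": "A7",   # room where the session they chair is held
    "office": "CS/114",          # usual room while not away at a conference
}
```

Nothing in flat_delegate distinguishes the three possible readings of 'room#'; in semantic_delegate each reading is a separately named association, which is the kind of expressiveness semantic data models pursue.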

Existing semantic data models

As has been mentioned, a DBMS must ensure reliable and secure management, modification, and retrieval of data in the database. The schema and the actual data (stored as instances) capture (or at least attempt to capture) the miniworld semantics. The ultimate goal of a DBMS is to keep the semantic gap between the miniworld and the database as narrow as possible.

Integrity constraints are crucial in any data model. The class of semantic data models discussed herein addresses these constraints in general, and semantic integrity especially. As data are a symbolic representation of something else, semantic integrity requires a relationship of truth (and exactness) between the data and what they actually represent. Sound, faithful data representation must accurately reflect the 'real world' it represents, as seen by the database user or designer. The issue here is the expressive power of the data model that underpins the schema description. The more real-world semantics the model can capture, the greater the degree of integrity the database can be accredited with. Perhaps the highest degree of expressive power that can be expected of a model is at the level of a programming language. Atkinson and Buneman25 liken a data model to a type constructor in a programming language.

In addition to expressiveness, a model should be general enough to allow the designer to describe a wide range of information systems. Expressiveness and generality are, however, often incompatible; in many cases one must be compromised for the other.

Classical databases (hierarchical, network, or relational) tend to deal with simplistic entities and therefore cannot incorporate complex objects, as is the norm in computer-aided designs. To represent a complex object, one conceptual entity has to be represented by a number of database objects, e.g., records, tuples, etc. Semantic data models attempt to close this gap. This section discusses some of the more widely used semantic models. Later, object-oriented databases, which are based on a model that allows representation of one miniworld entity by exactly one object, shall be considered.

Entity-relationship (ER) model

The ER approach adopts the view that the universe of discourse can be described in terms of entities and entity types, relationships and relationship types, and attributes. An entity is a distinguishable object of interest; it might be a physically tangible object, for example, DELEGATE, or it might be abstract, as for SESSION. Entity types can be subtypes or supertypes of other types. The subtyping aspect is similar to what Codd has chosen to call categories.

Whereas subtyping allows the capture of one form of association between entities, relationships allow the capture of other forms. Relationship types are characterized by their degree of functionality, which could be one-to-one (1:1), one-to-many (1:n), or many-to-many (m:n). Entities play roles within relationships and are characterized by their attributes (same concept as in the relational model). Each attribute is associated with a named value set, which defines the allowable values that it can assume. Identification key attributes are used to identify entities uniquely. Occasionally, an entity set is not distinguishable by its attributes, but rather by its relationship with entities of another type. A most important kind of 'built-in' relationship is the is_a relationship.

Refer to Table 1 for a comparison of ER terminology with that of conventional data models.

ER diagrams (ERDs) play a key role in the ER model. Perhaps the apparent success of the ER model can be

Figure 4. Simple examples of ERDs: (a) m:n relationship; (b) Chen's ERD notation

attributed to this graphical notation. To give an example of an ERD, Figure 2 is reproduced, slightly revised, as Figure 4(a). Recall that the network model, which Figure 2 emulated, does not support m:n relationships. The relationship between DELEGATE and SPONSOR is, strictly speaking, an m:n relationship. A sponsor (institution, company, bursary organization, etc.) can sponsor several participants to attend a conference. Conversely (albeit rarely!), one delegate can be cosponsored by several sponsoring bodies. It is not uncommon to find a delegate partially funded by an agency or department to attend a conference, especially in cases where either the overall conference expenditure exceeds the vote allotted to one individual by an organization, or where attendance at a conference is of mutual benefit to more than one agency. Figure 4 reflects this general state of affairs: a sponsor can support several delegates, and a given delegate may also be supported by one or more sponsors.

m:n relationships are shown in an ERD using a forked edge linking two (or more) entity types. Entities (or entity types) appear inside rectangular boxes. For instance, DELEGATE and SPONSOR are entity types.

To show a 1:n association, the fork at the corresponding end of a linking arc is removed. In a similar way, a 1:1 relationship can be depicted. It may even sometimes be necessary to show that participation in a relationship is optional. For example, depending on what semantics is chosen to be assigned to SPONSORED_BY, there can be an 'optional' association between DELEGATE and SPONSOR -- considering that there may be some delegates who do not have a sponsor or who have sponsored themselves. There are graphical notations to distinguish 'optional' from 'mandatory' relationships. Indeed, in Chen's original notation, a relationship itself even had a unique diamond shape (e.g., as in Figure 4(b)). However, for a detailed description of these diagrams, together with other concepts of the ER model, the reader is referred to Chen 26,27. It may be worth mentioning that, since Chen first published his proposals, many different authors have preferred to use slightly varied flavours of the ERD notations.
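The m:n and optional-participation semantics of SPONSORED_BY can be sketched concretely. In the illustration below (Python, with invented delegate and sponsor names), the relationship is simply a set of pairs, which constrains neither direction's cardinality and allows a delegate to participate in no pair at all:

```python
# SPONSORED_BY as an m:n relationship: a set of (delegate, sponsor)
# pairs. All names are invented for illustration.
sponsored_by = {
    ("Okoth", "ACME Ltd"),
    ("Okoth", "City Bursary"),   # one delegate, several sponsors
    ("Wanjiru", "ACME Ltd"),     # one sponsor, several delegates
}
delegates = {"Okoth", "Wanjiru", "Njoroge"}  # Njoroge is self-sponsored

def sponsors_of(delegate):
    return {s for d, s in sponsored_by if d == delegate}

# m:n in both directions, and participation is optional:
assert len(sponsors_of("Okoth")) == 2
assert sponsors_of("Njoroge") == set()   # optional participation
```

A 1:n or 1:1 functionality would be recovered by constraining how often a delegate (or sponsor) may appear in the pair set, which is exactly what removing a fork in the ERD expresses.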

At present, the ER model is used mainly during the analysis and design stages. A number of notations have been proposed over the past decade or so for conceptual analysis. But the ER model, first proposed by Chen in 1976, and later enhanced through the work of others 28,29, has emerged dominant. Chen 30 contends that a major reason for the evolution of the ER approach was the need to unify existing data models. Each of the then existing

288 Information and Software Technology

Table 2. Summary of mapping ER model concepts to relational, network, or hierarchical models

Entity type
  Relational: as relation
  Network: as record type
  Hierarchical: as record type

Weak entity type
  Relational: as relation, but include primary key of identifying relation
  Network: as record type that is member in set type with identifying record type as owner (or as repeating group)
  Hierarchical: as record type that is child of identifying record type

1:1 relationship type
  Relational: include primary key of one relation as foreign key of other relation, or merge into single relation
  Network: use set type whose instances are restricted to having one member record, or merge into single record type
  Hierarchical: use PCR type whose instances are restricted to having single child record, or merge into single record type

1:n relationship type
  Relational: include primary key of '1-side' relation as foreign key in 'n-side' relation
  Network: use set type
  Hierarchical: use PCR type

m:n relationship type
  Relational: set up new relation that includes as foreign keys the primary keys of participating relations
  Network: set up linking record type and make it member in set types owned by participating record types
  Hierarchical: (a) use single hierarchy and duplicate records; (b) use multiple hierarchies and VPCR types

n-ary relationship type
  Relational: same as m:n
  Network: same as m:n
  Hierarchical: (a) same as m:n; (b) make relationship as parent and participating entity types as children in single hierarchy
three models had its own unique pros and cons. Moreover, there was a need to develop a logical database design methodology that was independent of existing commercial DBMSs. It is no wonder, then, that the ER approach has been used during the design stage by users and proponents of all three (somewhat orthogonal) traditional data models.

A summary of a mapping of concepts of the ER model on to the classical models is given in Table 2 24. It is to be hoped that, in the near future, DBMSs will evolve that have the capacity to implement directly a database described by a high-level conceptual schema. A lot of research is being done in this direction 31,32.

Other semantic data models

Extensive work has been done on semantic data models by various researchers. Acknowledgement in this respect would not be complete without mention of the contributions of Brodie 33,34 and Codd 29. In this section, some


Figure 5. Example of binary relationship structure (DELEGATE linked by access functions, e.g., cleared_on and is_charged, to ROOM, IN_DATE, OUT_DATE, and RATE_PER_NIGHT)

notable implementations (or suggestions) are discussed. Certain models whose principal hypotheses have been used in subsequent, more advanced models have been deliberately excluded. Two notable examples in this category are the basic semantic model proposed by Schmid and Swenson 35 and the Semantic Data Model (SDM) 36. Semantic data models have been surveyed 37.

There are essential features that a semantic data model must possess in one way or another. Of course, it must have the three basic elements expected of any data model, i.e., structural definition, well defined operations, and integrity rules. In addition, the model should:

• support identification of individual objects based on nothing but existence of the object in the system

• possess a notion of grouping similar objects into types or classes

• have a set of built-in relationships, e.g., subtype of, instance of

• provide for user-defined relationships
• have at least the basic operations of create, delete, and update

Binary model

The binary model 38 aims to provide a formal specification tool for expressing relationships. The model addresses the problem of object identification. When a new object enters the perception field, it must be identified as a new or regular object. Although it is permissible to associate an object with an external name, it is preferred to keep the identification procedure within the model and give the model the responsibility for naming objects by some internal mechanism.

This model is founded on the premise that the binary relationship is the most basic unit into which complex relationships must be broken for their complete and efficient representation.

The structure of the binary model consists of two access functions (or links) forming a binary relationship between two categories (or types). The model is perhaps best illustrated with an example (see Figure 5). In this case a DELEGATE was accommodated in room ROOM between the dates IN_DATE and OUT_DATE at the daily fee of RATE_PER_NIGHT.
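As an illustrative sketch (in Python, with invented delegate identifiers; cleared_on and is_charged follow Figure 5, while occupies is a hypothetical name), each access function of the binary model can be seen as a mapping from one category to another, with its inverse giving the reverse link:

```python
# Binary-model sketch: each fact is held as a binary relationship,
# realized here as a pair of access functions between two categories.
cleared_on = {"d1": "1992-05-14"}        # DELEGATE -> OUT_DATE
is_charged = {"d1": 35.00}               # DELEGATE -> RATE_PER_NIGHT
occupies   = {"d1": "R114"}              # DELEGATE -> ROOM (invented name)

def inverse(access_fn):
    """The reverse access function of a 1:1 binary relationship."""
    return {v: k for k, v in access_fn.items()}

assert inverse(occupies)["R114"] == "d1"
```

Even this tiny example hints at the unwieldiness noted below: four facts about one delegate already require four separate relationships.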

The integrity constraints of this model are described in


terms of the cardinalities of the access functions and by programs. Operations available include generate (for creating new objects) and kill, which corresponds to the relational 'delete'. Other set operations are possible by using access functions in combination with the usual set operators.

The binary model suffers from two major problems. First, it gets unwieldy even for a simple relationship (consider, for example, the relationship between a DELEGATE and his/her ADDRESS, disallowing composite attributes!). Second, and perhaps more crucial, the model assumes that all relationships of significance are binary. What about relationships of arbitrary degree, e.g., unary or n-ary (n > 2)? Their representation is not straightforward. However, because of its binary nature, the model should intuitively be easy to implement using typical digital techniques: VLSI technology is founded on binary (or Boolean) arithmetic. That said, there is no evidence to support this claim, nor is it relevant in the context of the current discussion.

Semantic hierarchical model (SHM, SHM+)

Smith and Smith 28 justify why it is essential to incorporate the abstractions of aggregation and generalization*. Some of the points supporting the case for aggregation and generalization include:

• Abstractions (or views) pertinent to different database users can be effectively integrated and consistently maintained.

• Data independence can be provided under several kinds of evolutionary control.

• A more systematic approach to database design, particularly of database procedures, can be developed.

• More efficient implementations are possible as more assumptions can be made about higher-level structures.

In SHM, the generalization attributes (G-attributes) of an object are those that are 'relevant' to the entire membership of the class it is considered to belong to. In addition, a class member (or instance) or member-group (i.e., subclass) may have attributes that are specific to that particular instance or subclass. Thus the attributes of an instance comprise the G-attributes together with instance-specific attributes.
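The split between G-attributes and instance-specific attributes maps naturally onto class-level versus instance-level properties, as the following sketch shows (Python, with invented attribute names):

```python
# SHM sketch: G-attributes hold for the entire membership of a class;
# a subclass or an individual instance may add attributes of its own.
class Delegate:                      # class with its G-attributes
    g_attributes = {"name", "affiliation"}

class Speaker(Delegate):             # subclass adds subclass-specific ones
    g_attributes = Delegate.g_attributes | {"session"}

d = Speaker()
d.dietary_needs = "vegetarian"       # instance-specific attribute

# attributes of an instance = G-attributes + instance-specific attributes
all_attrs = Speaker.g_attributes | set(vars(d))
assert "session" in all_attrs and "dietary_needs" in all_attrs
```

The subclass relation here illustrates generalization/specialization; aggregation would correspond to composing such objects into a larger one (e.g., a CONFERENCE holding delegates and sessions).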

*Although precise definitions of these and related terminologies (and whether they are really novel ideas) are still a bone of much debate 39,40, the following definitions will suffice:
• Specialization -- further classification of a class of objects into more specialized subclasses.
• Generalization -- the inverse process (of specialization) of generalizing several classes into a higher-level abstract class that includes the objects in all constituent classes.
• Aggregation -- an abstraction concept for building composite objects from their component parts, e.g., a conference is composed of delegates, sessions, presented papers, etc.
• Association -- for associating objects from otherwise independent classes, e.g., delegates attend sessions. When an association link is deleted, participating objects may continue to exist. Notice that with aggregation, when a component object is dropped, the composite object can no longer be whole, in the original sense.

A set of relational integrity rules together with (often complex) cascading operations are supported.

On the other hand, the extended version (SHM+) provides both structural and operational primitives, constructors, and hierarchies. Its approach is based on:

• data abstraction (to hide nonessential details)
• localization (to ensure that each property is independently designed)

The operation primitives analogous to create, delete, and update are, respectively, store, drop, and modify, while the operational constructors (no direct equivalents in the relational model) are functional composition, choice, and repetition.

Brodie and Dzenan 34 express concern over two potentially conflicting database philosophies in this model:

• semantic relativism (relationships are not distinguished from objects)
• relationships are distinguished parts of the model

They observe that the latter provides simple operations in that integrity is easily preserved.

The lack of behavioural (or operational) properties is a definite drawback of the original SHM. SHM+ introduces a limited way of incorporating operations; however, the set of operations supported is too restrictive. Besides, it is sometimes necessary to impose constraints on certain operations or operands; neither SHM nor SHM+ captures constraints. That SHM+ supports limited behavioural information nevertheless gives it an edge over the ER model in that regard. However, the author considers the support for aggregation and generalization to be the major contributions of SHM with respect to the ER model. On the other hand, the ER model gives a simpler and more powerful means of capturing general associations. More recently, some researchers 24,41,42 have attempted to integrate the concepts of the ER model with those of aggregation and generalization to full fruition.

Two other shortcomings of the SHM models are:

• A model must have concepts, tools, and a methodology to be useful. The author is unaware of any tools that support SHM or SHM+.

• It is sometimes necessary for the data schema to evolve dynamically, if possible. The SHM models do not specifically address schema-altering transactions.

Later the paper discusses the object-oriented (OO) paradigm, which subsumes some of the key features exhibited by SHM/SHM+, i.e., support for aggregation and generalization, abstraction and encapsulation (or information hiding and localization), and support for behaviour. Indeed, object orientation provides such a general way to conceptualize operations that even operations specific to entire generic classes or schemas can be simply defined. Unfortunately, object orientation is not so successful at supporting 'normal' associations. Full discussion of object orientation is delayed until the next


section. Next a further model is presented that attempts to incorporate aggregation, etc., into a relational model.

Extended relational model (RM/T, RM/T++)

As already mentioned, the languages currently in use for the relational model are restricted versions of the relational algebra and relational calculus. Although quite satisfactory for general DP applications, the relational model and its aforementioned languages are rather inadequate for other 'data-intensive' domains, especially for knowledge-based and engineering applications 43,44. This has led to a concerted effort to 'extend' existing models, the relational being no exception 45,46.

The first extensions of the relational model that were studied deal with hierarchically structured objects, often referred to as nested relations or complex objects. The shortcomings of the first normal form (1NF) proposed by Codd were first brought to light by Makinouchi 47. Numerous models that partially or completely remove the 1NF restriction, with associated languages, have been proposed 48-50. Beeri 45 classifies the proposed models roughly as supporting either nested relations or general complex objects. In a nested relation, an attribute value may be either an atomic value or a relation. The constructors used in the model are the set and tuple constructors, without any restriction on the order in which they are used, allowing sets of tuples, sets as tuple components, etc. He treats nested relations (also called ¬1NF, NFNF, or NF2 relations) as a special case of complex objects in which the set and tuple constructors alternate.

Stonebraker 51 proposed the following extensions to QUEL (yielding QUEL+): let F be the construct 'QUEL-col-1 ... QUEL-col-n.field'; then F can appear wherever a field of a relation can appear, and the construct 'tuple-variable.F' can appear wherever a tuple variable or a relation name can. Clauses of the form G1 newop G2 are allowable if G1 and G2 are tuple variables or the construct 'tuple-variable.F', and newop is in the set {U, !!, <<, >>, =-, <>, JJ, OJ, ()}, where symbols have the same meaning as in QUEL 52. EXECUTE and EXECUTE-ONE are added as commands. An operator, in, is added, accepting an indirectly referenced column as a left operand and a relation name as a right operand. A keyword, with, usable with the EXECUTE command to indicate the presence of a parameter list, is also introduced.

Stonebraker's ideas of database extension with procedures are implemented in Postgres 53. Postgres is a relational model (a spinoff from the Ingres work 54) that has been extended with abstract data types (ADTs), including user-defined operators and procedures, relation attributes of type procedure, and attribute and procedure inheritance. Indeed, the Postgres query language is a generalized version of QUEL, called POSTQUEL. QUEL was extended in several directions. First, POSTQUEL has a from clause to define tuple variables rather than a range command. Second, arbitrary relation-valued expressions may appear in QUEL. Third, transitive closure and execute commands have been added to


the language. And, last, Postgres maintains historical data so POSTQUEL allows queries to be run on past database states or on any data that were in the database at any instant of time.

Supporters of Postgres argue that semantic and functional data models 28,36,55 do not provide the flexibility that extended relational models (such as Postgres) offer. The issue is that semantic and functional models cannot easily represent data with uncertain structure (for example, objects with shared subobjects that have different types). However, the author contends that the notion of aggregation may be redefined to deal with such inhomogeneity.

Another dimension to relational model extension is presented by Biskup and Bruggemann 56. They are content with the universal relation view concept as an effective way to extend the relational model. The universal relation is a view (external conceptual schema) on top of the relational database schema. It shows the whole database as a single fictitious 'whole' relation. The main goal of the universal relation view is to allow queries with no navigation whatsoever (as far as possible, of course). Thus queries are stated only by means of attributes, without mentioning the database relations. In this paper, the universal relation view is not discussed in any detail.
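The navigation-free flavour of the universal relation view can be sketched as follows (Python, with invented relation and attribute names): the stored relations are joined into one fictitious 'whole' relation, over which a query names attributes only:

```python
# Universal-relation sketch: the database appears as one fictitious
# 'whole' relation, so queries never mention the stored relations.
delegate = [{"name": "Okoth", "room": "R114"}]
room     = [{"room": "R114", "rate": 35.00}]

# Build the universal view by joining on the shared attribute name.
universal = [{**d, **r} for d in delegate for r in room
             if d["room"] == r["room"]]

def query(attr, where):
    """Navigation-free query: stated purely in terms of attributes."""
    return [t[attr] for t in universal
            if all(t[k] == v for k, v in where.items())]

assert query("rate", {"name": "Okoth"}) == [35.00]
```

The user asks for 'rate where name = Okoth' without knowing (or caring) that two relations had to be joined to answer it.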

In general, investigations have been aimed at capturing (in a reasonably formal way) more of data meaning, while preserving independence of implementation. Codd 17 summarized the goals as:

• the search for meaningful units that are as small as possible (atomic semantics)

• the search for meaningful units that are larger than the usual n-ary relation (molecular semantics)

Daly 57 gives a survey of constraints on realisation of the above. Several attempts aimed at achieving these goals have been cited earlier. However, Codd's RM/T 17 stands out as a distinctive example of an extensible relational model.

The following questions motivated the evolution of RM/T:

• Is it possible to be more precise about what constitutes a simple assertion?

• What other regularities can be exploited in a formatted database?

• To what extent can these additional regularities be represented in readily analysable data structures as opposed to procedures?

In his quest to answer these questions, Codd incorporated the concepts of system-defined surrogates, E- and P-relations, association, aggregation, generalization, and event precedence into the original relational model. RM/T contains its own extensible catalogue and has an extended operator set. Systematic use of entity domains enables RM/T to support widely divergent viewpoints on atomic semantics, ranging from the extreme position that the minimal meaningful unit is always a binary relation to other, more moderate positions.


In RM/T, every entity in the database is an instance of at least one entity type, with all entities of a given type sharing the common properties of that type. Through a type hierarchy, an entity can be an instance of more than one type, and an instance may inherit properties of its supertype(s). Associated with every entity is a unique, system-generated identifier called a surrogate, which can always be used within a database to identify the entity uniquely. A mechanism does exist for associating external names with entities 58.

All entity types are classified as one of kernel, associative, or characteristic, and each type may in addition be designative. Through this classification, integrity constraints can be applied over and above those normally possible with traditional relational models. An integral part of the model, called the catalogue, is a schema that defines the internal structure of the database. RM/T has the relational model as its core, and therefore maximally exploits the powerful relational mathematical concepts and query languages. Additional operators have also been defined at the entity level that extend the algebra of operations.
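Two of the RM/T ideas just described, system-generated surrogates and property inheritance through the type hierarchy, can be sketched as follows (Python; the counter-based surrogate scheme and the property names are invented stand-ins, not RM/T's actual mechanism):

```python
# RM/T sketch: every entity receives a unique, system-generated
# surrogate, and may inherit properties via the type hierarchy.
import itertools
_surrogate = itertools.count(1)

type_hierarchy = {"SPEAKER": "DELEGATE"}            # SPEAKER is_a DELEGATE
properties = {"DELEGATE": {"name"}, "SPEAKER": {"session"}}

def new_entity(entity_type):
    return {"surrogate": next(_surrogate), "type": entity_type}

def all_properties(entity_type):
    """Properties of a type, including those inherited from supertypes."""
    props = set()
    while entity_type:
        props |= properties.get(entity_type, set())
        entity_type = type_hierarchy.get(entity_type)
    return props

e = new_entity("SPEAKER")
assert e["surrogate"] == 1                          # unique, system-generated
assert all_properties("SPEAKER") == {"name", "session"}
```

The surrogate, unlike a user-supplied key, never changes even if every visible attribute of the entity does, which is what lets it identify the entity 'always'.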

The Aspect project 15,58 used RM/T for its information base. An attempt was even made to extend RM/T itself further 46, to get what was called RM/T++. A distinctive feature of RM/T++ is the self-referential nature of its catalogue. RM/T++ has uniform surrogates. It distinguishes between the semantic and relational levels, and it has a well defined set of operations dealing with the semantic level. Unlike RM/T, RM/T++ 'attempts' to support null values. The set of RM/T++ operators closely follows the presentation of the Peterlee Relational Test Vehicle (PRTV 59). Additionally, five operations are inherited from Codd's RM/T, namely, COMPRESS, APPLY, Partition-by-Attribute (PATT), Partition-by-Tuple (PTUPLE), and Partition-by-Relation (PREL). The CLOSE 'graph operator' is also offered by RM/T++. RM/T++ further provides three database manipulation operators (DENOTE, TAG, and SETREL), which cannot be specified without reference to an encompassing database, plus manipulative database operations, such as create/destroy relation, create value set, and create value subtype. Earl calls the three identifier-relation-associating operators name operators.

In summary, whereas the extensions to the original relational model (RM) have resulted in models (RM/T and clones) with a richer variety of objects and a more powerful algebra, they have probably made the algebra overly complicated. Codd himself acknowledges this sad predicament. Furthermore, as RM/T is intended primarily for database designers and sophisticated users, the need for research into alternative representations aimed towards the general end-user has perhaps never been so great.

For its part, RM/T++ proposes a few changes and extensions to RM/T 46. Omissions that are insignificant (in Earl's opinion) are also suggested. As with RM/T, perhaps the additions themselves have only succeeded in turning what was a simple relational model into an overly complex data model. But what is probably more fundamental

is that, like the semantic models, the RM/T models also separate data instances from their schemas. And although RM/T extends the spectrum of permissible 'operations', operations are not 'embedded' within entities. Next, functional data models are discussed, which not only uphold the significance of 'behaviour modelling', but also see entities simply as operands to be passed on to operations (or functions) as parameters.

Functional data model

Functional Query Language (FQL) 60,61 and Daplex 55 are two notable examples of the several proposed (and, in some cases, implemented) functional data models (FDMs). In the view of FDM proponents like Shipman and Buneman, this model is a more 'natural' representation of the universe of discourse and is likely to be more easily grasped by a user. Of these two models, several authors agree that Daplex exemplifies the functional concepts best; it is therefore used here for illustration.

Daplex's basic notions are those of an entity and a function. At present, Daplex is embedded in Ada; it might be interesting to see Daplex 'naturally' embedded in a functional language like Glide or Miranda 62. To establish functions in the system, the DECLARE statement is used:

DECLARE Delegate() =>> ENTITY
DECLARE Name(Delegate) =>> STRING

which declares entity type 'Delegate' with one attribute, 'Name'. The multivalued property of the Delegate function is emphasized by the double-headed arrow. Moreover, functions may take several arguments. To define inverse functions, derived data, or views, the DEFINE statement is used:

DEFINE ROOM(Delegate) =>> INVERSE OF Delegate(Room), and
DEFINE Delegate(Room) =>> Delegate(Room(Rate_Per_Night))

This composition of functions can be seen as the traversal of a graph, with entities at the nodes and function applications as arcs.

Daplex also provides for the much-talked-about super- and subtypes. The DECLARE statement is used in this regard; for example, the following 'declares' that a Speaker is a Delegate:

DECLARE Delegate() =>> ENTITY
DECLARE Nationality(Delegate) => STRING
DECLARE Speaker() =>> Delegate

For retrieval, the FOR statement is invoked (cf. select in the relational model). EXCLUDE and INCLUDE are the equivalents of 'delete' and 'append', respectively. Although Daplex is based on the ideas of functional application and composition, it retains a predicate calculus outlook by virtue of the FOR statement.
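The functional view, entities as opaque values and all information held in (possibly multivalued) functions over them, can be sketched outside Daplex as follows (Python, with invented entity and function names):

```python
# Functional-model sketch, loosely after Daplex: entities are opaque
# values; functions map entities to values or to other entities.
d1, d2 = object(), object()          # two Delegate entities

Name = {d1: "Okoth", d2: "Wanjiru"}  # Name(Delegate) -> STRING
Room = {d1: "R114", d2: "R114"}      # Room(Delegate) -> STRING

def inverse(fn):
    """Derived multivalued inverse, cf. Daplex's DEFINE ... INVERSE OF."""
    inv = {}
    for entity, value in fn.items():
        inv.setdefault(value, set()).add(entity)
    return inv

Delegates_in = inverse(Room)          # Room value -> set of Delegates
# Function composition as graph traversal: names of delegates in R114.
names = {Name[d] for d in Delegates_in["R114"]}
assert names == {"Okoth", "Wanjiru"}
```

The final query composes two functions (inverse of Room, then Name), which is exactly the graph traversal described above: entities at the nodes, function applications as arcs.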

Daplex is simply one of several proposals 36,63 aimed at higher-level features for greater semantic expressiveness.


A pragmatic survey of these proposals' development has been done 64.

Overall, FDMs and ER models are at opposite ends of a semantic-data-modelling scale. The former emphasize 'behaviour', the latter 'structure'. Different circumstances will obviously require a shift in the focal point. Object orientation attempts to reconcile the two extremes, but only just.

Summary

Samples of representative existing data-modelling techniques have been considered. Not all data models have been covered; such a target would be both unachievable and outside the scope of this paper.

Some of the major problems with existing models have been discussed, and these are now summarized:

• Limited semantics: this is a major drawback of traditional models, and hence the main driving force behind the evolution of semantic models. But even semantic models do not offer as much semantics as some users would like to see. That more recent work seems to have concentrated on incorporating more semantism confirms this deficiency.

• Lack of support tools: a useful database system should provide a conceptual model, methodology, and tools. DBMSs have evolved that support the traditional data models. Few tools exist today to support semantic models. It can only be hoped that with the advent of computer-aided software engineering (CASE) tools, this situation may be reversed in the near future.

• To date, the relational model seems to have been accepted by several database technologists as a standard. Apart from its sound theoretical foundation, the relational model is simple and straightforward, especially in the way it represents entities as tables, with rows and columns, etc. However, this only works as long as the domain of attributes is limited to simplistic, flat records. The extended relational models attempt to augment the original model to capture more complex attributes. But this has only helped to take away the very simplicity that underpinned the relational model.

• It has been said that it is important to capture both the 'structural' aspects of a universe and its dynamic (or 'behavioural') ones. Of the semantic models discussed, the ER and functional models appear to be at opposite ends of a spectrum, emphasizing one or the other (structure or behaviour). A good model should capture both axes, yet remain simple enough to be comprehended by even the most inexperienced end-user of the database. The OO paradigm gives insight into some possible solutions.

• All the models discussed so far separate schemas from instances (also sometimes called extensionals). Intuitively, there may be good reasons to 'encapsulate' instances together with schemas. Perhaps more concretely, the separation of the two is seen as an artificial decision.

• Finally, most of the models discussed so far only support some high-level forms of constraints -- more specifically, integrity and referential constraints. They do not, however, support constraints at arbitrary levels of granularity, e.g., at the level of an instance or an attribute. Next some more recent models, which not only support explicit constraints, but also have marginal self-reasoning capabilities, are discussed.

Emerging data models

In addition to the relatively more established data models dealt with so far, newer modelling concepts are surfacing. This trend has resulted from the emergence of database applications too complex to be handled effectively by traditional DBMSs. Examples of such applications include computer-aided design/computer-aided manufacturing (CAD/CAM) databases, software engineering databases, imaging and graphics, cartographic and geological databases, multimedia databases, and knowledge bases for artificial intelligence (AI). Indeed, this list is not exhaustive. The semantic data models discussed in the previous section address one of the major requirements of these applications, that of expressive (or semantic) power. But as has already been shown, even the semantic models do not go far enough in expressiveness. For example, they do not have an inferencing capability, and most of them do not model behaviour.

Now two more recent models with enhanced capabilities are discussed: knowledge-based and object-oriented systems.

Knowledge-based representation
Knowledge-based representation (KBR) aims to model some domain of discourse accurately by storing, manipulating, and using knowledge to draw inferences, make decisions, or just answer queries. Like semantic models, KBR uses an abstraction mechanism, but additionally provides for constraints and operations. The scope of KBR is, in general, wider, and includes different forms of knowledge, such as rules, incomplete and default knowledge, and temporal and spatial knowledge. It provides for reasoning mechanisms, and often mixes schemas with instances for flexibility in representing exceptions, though at the cost of some inefficiency. KBR also allows for meta-classes, i.e., a class may be an instance of another class (cf. the super-/subclass relationship in extended ER), in addition to simple class/instance relationships. Identification is of course necessary to distinguish objects and classes, and also for relating them to the real world that they represent.

A knowledge-based system is characterized by an architecture that distinguishes:

• a knowledge base, containing facts, rules, and skills
• an inference engine, which contains a set of strategies and guidelines for managing the knowledge base
• a 'friendly' external interface by which the user interacts with the system, for example, to impart acquired knowledge
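The three-part architecture above can be sketched in a few lines. This is an illustrative toy, not taken from the paper: the facts, rules, and function names are all invented, and the inference engine is a naive forward chainer over tuples with '?'-prefixed variables.

```python
# Toy knowledge-based system: a knowledge base of facts and rules, and a
# minimal inference engine that derives new facts by forward chaining.
# All predicate and variable names are invented for illustration.

facts = {("parent", "tom", "bob"), ("parent", "bob", "ann")}

rules = [
    # parent(X, Y) -> ancestor(X, Y)
    ([("parent", "?x", "?y")], ("ancestor", "?x", "?y")),
    # ancestor(X, Y), parent(Y, Z) -> ancestor(X, Z)
    ([("ancestor", "?x", "?y"), ("parent", "?y", "?z")], ("ancestor", "?x", "?z")),
]

def match(pattern, fact, bindings):
    """Unify one pattern with one fact under the existing bindings."""
    if len(pattern) != len(fact):
        return None
    env = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if env.get(p, f) != f:      # conflicting binding
                return None
            env[p] = f
        elif p != f:
            return None
    return env

def satisfy(premises, known, bindings):
    """Yield every binding environment that satisfies all premises."""
    if not premises:
        yield bindings
        return
    for fact in known:
        env = match(premises[0], fact, bindings)
        if env is not None:
            yield from satisfy(premises[1:], known, env)

def infer(facts, rules):
    """Apply rules repeatedly until no new facts are produced."""
    known = set(facts)
    while True:
        new_facts = set()
        for premises, conclusion in rules:
            for env in satisfy(premises, known, {}):
                derived = tuple(env.get(t, t) for t in conclusion)
                if derived not in known:
                    new_facts.add(derived)
        if not new_facts:
            return known
        known |= new_facts

kb = infer(facts, rules)
print(("ancestor", "tom", "ann") in kb)  # the transitive fact was derived
```

The 'friendly' external interface of a real system would sit on top of `infer`, turning user queries into lookups against the derived knowledge base.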

Vol 34 No 5 May 1992 293


Knowledge is recognised basically to be of two forms: rules and facts. Normally, semantic networks (similar to those used for data representation) represent facts, while production rules represent application constraints and design rules. Prolog 65 seems to have been widely used in the latter role. Prolog, in addition to having speeds comparable with other logic programming languages such as Lisp, supports essentially higher-level features, for example, nondeterminism.

Logic databases exploit the expressive power of first-order logic to develop clear, concise, and readable programs. Logic programs are inherently modular, as the information they contain is inevitably broken down into small, independently meaningful units (called molecules). This means that databases, or selected 'views' of them, may be shared by various users, and also that very large databases can be partitioned into different, smaller modules, with a consequent gain in retrieval speed.

Dahl 66 developed one of the earliest database systems implemented in Prolog. Today, there are two distinct categories of researchers interested in logic programming for database access. One group has centred on the integration of existing DBMSs with a logic programming component 67,68, while the other has simply taken a widely accepted logic programming language such as Prolog and tried to use it for database access 66,69. With the latter technique, the programming language not only serves to define the data, but also to compute it. There are various limitations to this approach, notably:

• the backtracking behaviour of Prolog is inadequate even for relatively undemanding database applications

• the inability of Prolog to manipulate data stored on secondary memory

• the treatment of negation as nonprovability: closed-world assumptions may lead to inconsistencies
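The last point is easy to demonstrate. The sketch below is illustrative only (the relation and its contents are invented): under the closed-world assumption, any fact absent from the database is treated as false, so an incomplete database silently yields 'negative' answers that may contradict reality.

```python
# Closed-world assumption (CWA) sketch: negation as nonprovability.
# The flight data is invented; the point is that absence of a tuple is
# interpreted as falsity, even when the database is merely incomplete.

flights = {("nairobi", "london"), ("london", "york")}

def connected(a, b):
    return (a, b) in flights

def not_connected(a, b):
    # Negation as nonprovability: 'not stored' is read as 'false'.
    return not connected(a, b)

# CWA answers 'true' here, even if a direct flight exists but was
# simply never recorded:
print(not_connected("nairobi", "york"))  # True
```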

The basic issues that this approach must therefore resolve can be summarized in these three somewhat related (and yet each unveiling a major potential technicality!) questions:

• How can a Prolog system store a very large knowledge base in secondary memory?

• How can a large knowledge base be used by multiple users?

• How can a Prolog system obtain desired facts from secondary memory in the shortest time possible?

Until these problems are satisfactorily tackled, the author is convinced that the alternative approach is more promising, and therefore the rest of this section is devoted to systems being developed along that path. First, a brief background to this approach is provided.

Logic has been proposed as an underlying data model for relational and other representation schemes. The relational calculus is based on first-order predicate calculus, and hence logic is already used to characterize relational queries. Logic provides a formalism that can be used for, e.g., query languages, integrity modelling, query evaluation, treatment of null values, and dealing with incomplete information. Moreover, logic leads to a formal understanding of deduction in databases. On the other hand, Prolog implementations, like other logic programming languages, simply assume random access to the objects that they manipulate and rely on the virtual-memory support of the underlying computer system when large volumes of data are involved. Consequently, Prolog employs a tuple-at-a-time nested-loop join strategy, which is not suitable for data access in secondary storage.
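The tuple-at-a-time nested-loop strategy can be contrasted with the set-oriented strategy a relational optimizer would choose. The sketch below is illustrative (the relations and counts are invented): the nested loop rescans the whole inner relation for every outer tuple, which is ruinous when the inner relation lives on disk, whereas a hash join builds the inner table once.

```python
# Tuple-at-a-time nested-loop join versus a set-oriented hash join.
# Example relations are invented: (id, name) and (id, department).

employees = [(1, "ada"), (2, "grace"), (3, "edsger")]
departments = [(1, "compilers"), (2, "databases")]

def nested_loop_join(outer, inner):
    probes = 0
    result = []
    for e_id, name in outer:          # one outer tuple at a time
        for d_id, dept in inner:      # full inner scan per outer tuple
            probes += 1
            if e_id == d_id:
                result.append((name, dept))
    return result, probes

def hash_join(outer, inner):
    # Set-oriented alternative: build a hash table on the inner relation
    # once, then probe it once per outer tuple.
    table = {d_id: dept for d_id, dept in inner}
    return [(name, table[e_id]) for e_id, name in outer if e_id in table]

joined, probes = nested_loop_join(employees, departments)
print(joined)
print(probes)  # 6 inner probes for a 3 x 2 input; cost grows as |R| * |S|
```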

In relational databases, control of execution of query languages is the responsibility of the system, which, via query optimization and compilation techniques, ensures efficient performance over a wide range of storage structures and database demographics. Relational systems are further superior to Prolog in simplicity of use, data independence, and suitability for parallel processing, to name but a few. However, the expressive power (as already discussed) and functionality offered by database query languages are limited compared with logic programming languages. Moreover, query languages do not support recursion or general unification, which entail the computation of closures and the use of complex structures. In contrast to Prolog, which can be used as a general-purpose procedural programming language, query languages have to be embedded in traditional programming languages. This method has the drawback of 'impedance mismatch' between the relational query language and imperative languages.

A first aspect of this mismatch is the conflict between the prescriptive (imperative) paradigm, typically used by existing programming languages, and the descriptive (declarative) nature favoured by database query languages. A second aspect is that the DBMS and the programming languages may manipulate single records with complex internal structures, while relational systems support sets of unstructured tuples.
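Both aspects of the mismatch can be shown side by side. The sketch is illustrative (the record fields and data are invented): the declarative form states *what* is wanted over a set of tuples, while the imperative form walks individual records through a cursor, the style a host language forces on an embedded query result.

```python
# Impedance mismatch sketch: declarative set-oriented query versus
# imperative record-at-a-time processing. Data and fields are invented.

rows = [
    {"name": "parser.c", "size": 1200},
    {"name": "lexer.c", "size": 300},
    {"name": "main.c", "size": 2500},
]

# Declarative/set-oriented: describe the answer set in one expression.
large_decl = {r["name"] for r in rows if r["size"] > 1000}

# Imperative/record-at-a-time: drive a cursor over single records,
# as a query embedded in a host language typically must.
large_imp = set()
cursor = iter(rows)
while True:
    try:
        rec = next(cursor)            # fetch one record at a time
    except StopIteration:
        break
    if rec["size"] > 1000:
        large_imp.add(rec["name"])

print(large_decl == large_imp)  # True: same answer, two clashing styles
```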

Nevertheless, the benefits of both logic programming and relational databases can be combined, as in LDL 70 and CPD 68, for maximized benefit. CPD's architecture is reviewed in Appendix 2.

Current research in knowledge-based management systems has yielded a number of specific proposals, but few implementations. In general, a relational database system is coupled with a simple deductive component. The data and knowledge components are tightly linked together. The database acts as a back-end data repository for the knowledge base. It stores the multilevel object data, for example, logic and symbolic circuit representations, and the various object instances. The knowledge base stores admissible operating rules concerning:

• the modifications of the object structures
• the specific design rules
• specific heuristics for the automatic consistency controls implemented

To achieve integration, there is a need to generalize, on the one hand, the database techniques to cope with the irregularity of knowledge and, on the other hand, to expand the knowledge-processing techniques to deal with large volumes of data. Many researchers (especially AI optimists) are working on one or other of these issues 71-73.

294 Information and Software Technology

Object-oriented data models
Object orientation is yet another concept that promises to solve some drawbacks of classical databases. The characteristic notions of OO languages are objects, classes, and inheritance 74. Objects with common characteristics (i.e., of the same type) are grouped into classes. Class/subclass hierarchies may be formed by further subgrouping objects that exhibit or share additional, specialized properties. An object identity, OID, is useful for sharing and updating objects. Inheritance allows for reuse of existing objects or classes, and for the definition of new specialized objects or classes. Some of the OO terminology used in this paper is described in Appendix 3.
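The three notions (objects, classes, inheritance) plus object identity can be rendered in a few lines. This is a generic sketch, not tied to any of the systems surveyed; the class and attribute names are invented.

```python
# Minimal OO sketch: objects grouped into classes, a class/subclass
# hierarchy, inheritance of behaviour, and an OID independent of state.
import itertools

_oid_counter = itertools.count(1)   # monotonic source of identities

class DesignObject:
    def __init__(self, name):
        self.oid = next(_oid_counter)   # identity: survives state changes
        self.name = name

    def describe(self):                 # behaviour shared via inheritance
        return f"{type(self).__name__}({self.name})"

class SoftwareModule(DesignObject):     # subclass: shares and specializes
    def __init__(self, name, language):
        super().__init__(name)
        self.language = language        # additional, specialized property

m = SoftwareModule("scheduler", "Ada")
print(m.describe())                     # inherited method, specialized type
print(m.oid != DesignObject("other").oid)  # two objects, two identities
```

Note how updating `m.name` would leave `m.oid` untouched: identity, not state, is what allows sharing and updating of objects.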

For each object class (or type), it is necessary to understand both its static and dynamic aspects, i.e., in addition to the object's attributes, it is also necessary to know about the events that may occur during the object's life.

Two major breeds of OO databases (OODBs) seem to be emerging. On the one hand are those systems built on existing relational or other traditional databases, adding a programming interface capable of manipulating more ADTs and providing other features pertinent to the OO paradigm. Most existing OODBs fall in this category. On the other hand, there are systems that have evolved from OO programming languages (OOPLs) by adding persistence. The trend seems to suggest that in future most OODBs will be of this latter type. It must be emphasized, however, that only a few pilot implementations of OODBs have been realised, though a plethora of them can be expected in the next ten years. Several examples of recently announced commercial products, or proposed prototypes, will be discussed later. However, two systems, Exodus and OOPS, are now introduced largely to exemplify the two main contrasting approaches mentioned above.

Exodus 75 extends a traditional DBMS by adopting a rule-based* approach to query optimization, so that one query optimizer may be extended to handle new database operators, new methods for existing operators, and new data types.

[Figure 6. Main components of OOPS: the user/programmer, the programming system (OOPL), an interface layer, and the database component (ODBS)]

*A similar approach has been adopted by Probe 76. Probe uses an extended version of Daplex 77 as its query language.

Exodus is designed as a modular (and incremental) system rather than as a 'complete' database system, and is intended to be flexible enough to support the needs of a wide range of potential applications. The designers of Exodus emphasize object and file management as the most crucial component of an extensible database system. At the top level, Exodus hopes to provide facilities for generating application-specific, high-level query-language interfaces, while allowing applications to interact with the system at lower levels, should this be necessary. The Exodus approach may be characterized as the 'DBMS generator' approach, with the overall goals being to provide:

• a storage system
• tools to support development of appropriate ADTs, access methods, operations, and version support
• a rule-based optimizer
• a flexible query-interface generator

A library of useful routines (and rules) for the extensible components of the system is also provided to help application-specific developers.

OOPS 78 adopts a somewhat different methodology to solve modelling problems. An integrated programming and data-management system is used. OOPS consists of two major parts, the OO programming language OOPL and the database component (ODBS). ODBS forms the basis of the programming environment, and OOPL is grown on it (see Figure 6). The interface between OOPL and ODBS is not visible to the user. Thus, from the user's perspective, OOPS is nothing more than an OO programming environment with a persistent repository.

OOPL is strongly typed, and it allows the formulation of triggers and constraints. In OOPS, properties are called roles. A special kind of object, the dependent object, is 'embedded' in other objects, i.e., it does not exist by itself, but only in the context of its parent objects. As OOPL supports repeating group types, like SetOf or ListOf, all types of (binary) relationships can be expressed, i.e., 1:1, 1:n, and n:m relationships. The ODBS, on the other hand, is responsible for the maintenance of all persistent objects. A version mechanism allows easy modification of the conceptual schema of object descriptions.
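These OOPL ideas can be sketched generically. The following is not OOPS syntax; the class names (`Document`, `Version`) and the constraint are invented, but they illustrate a dependent object that exists only inside its parent, a SetOf-style repeating group giving a 1:n relationship, and a constraint enforced at update time in the spirit of OOPL's triggers.

```python
# Sketch of OOPS-style notions: dependent objects, repeating groups,
# and update-time constraints. Names are invented for illustration.

class Version:
    """Dependent object: created and held only within a parent."""
    def __init__(self, parent, number):
        self.parent = parent
        self.number = number

class Document:
    def __init__(self, title):
        self.title = title
        self.versions = []          # SetOf-like repeating group (1:n)

    def new_version(self):
        # The dependent object is only ever built in its parent's context.
        v = Version(self, len(self.versions) + 1)
        self.versions.append(v)
        return v

    def rename(self, title):
        if not title:               # constraint checked on every update
            raise ValueError("title must be nonempty")
        self.title = title

d = Document("design notes")
d.new_version()
d.new_version()
print(len(d.versions), d.versions[-1].number)  # 2 2
```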

Advantage has been taken of the availability of large main-memory sizes in modern hardware. A version of the database cache is implemented. The technique integrates buffer management and recovery methods and therefore leads to an essential performance enhancement.

What follows summarizes the pros and cons of object orientation.

Case for OO approach
As may be clear by now, the interest in objects seems to have arisen mainly due to recent concerns with:


• the behavioural descriptions of the universe of discourse
• the dynamic aspects of databases
• the manipulation of rather complex entities in engineering (and related) databases

Below is a summary of some reasons that have convinced the author that 'Object orientation may be the first step towards the ultimate solution to DBMS problems' 104.

Data abstraction Complex information is 'hidden' as a consequence of abstraction.

Extensibility Object types and their methods can be modified as required. Such changes are localized and hence much easier to maintain than in record-based systems where many records may be affected. New object classes and their methods can be incorporated.

Behavioural constraints Because of encapsulation, the behaviour of each object type is predetermined by a fixed set of methods.

Flexibility of type definition The user is not limited to modelling constructs of a data model, but can define a wide range of data types, each with unique properties.

Modelling power 'Near' 1:1 correspondence between the universe of discourse and the model is implementable; inheritance of both attributes and methods is a powerful tool for modelling.

Incremental development Reusability of code and data is achieved by the inheritance property.

To crown it all, theoretical foundations of OO languages have already been proposed 79-81. These models formalize standard OO features such as inheritance and genericity (also called type polymorphism).

Disadvantages of OO approach
Like any other model proposed so far, object orientation is not without drawbacks. The following stand out distinctly as key issues that OO adherents must address before it can be universally accepted (if ever).

Lack of associations Association is not directly supported, and is only achieved indirectly by allowing inter-object references. More powerful constructs for modelling associations (or relationships) must evolve.

Behavioural rigidity The notion of predetermining and prespecifying all operations by a fixed set of methods is a rigid constraint that is contrary to the trend of 'evolution' in database and software technologies.

No high-level query language At least for current* OO data models!

Performance An efficient implementation is still an area of practical concern. The comparatively poor performance of Smalltalk-80, for example, is partly due to the fact that the language is interpreted rather than compiled. Thus a deeply nested inheritance hierarchy may have to be traversed at run-time to fetch the original definition of a method 83, with an obvious impact on performance. Performance could be improved by using compilation, as much of this search could be done at compile-time. Unfortunately, this would weaken the benefits of overloading 84.
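The run-time traversal cost can be made concrete. The sketch below is illustrative (the class chain and selector are invented): method lookup walks the inheritance chain towards the root, so the deeper the hierarchy, the more steps per call; a lookup cache, which is roughly what compilation buys, removes the walk on subsequent calls.

```python
# Sketch of dynamic method lookup along an inheritance chain, with an
# explicit step counter, and a cache that short-circuits the traversal.

class Node:
    """One class in an explicit hierarchy; methods may be deferred up."""
    def __init__(self, name, parent=None, methods=None):
        self.name, self.parent = name, parent
        self.methods = methods or {}

    def lookup(self, selector, steps=0):
        # Walk towards the root until a definition of 'selector' appears.
        if selector in self.methods:
            return self.methods[selector], steps
        if self.parent is None:
            raise AttributeError(selector)
        return self.parent.lookup(selector, steps + 1)

root = Node("Object", methods={"print": lambda: "generic"})
mid = Node("Shape", parent=root)
leaf = Node("Circle", parent=mid)

fn, steps = leaf.lookup("print")
print(steps)            # 2 levels traversed before the definition is found

cache = {(leaf.name, "print"): fn}   # memoized lookup, as a compiler or
print(cache[(leaf.name, "print")]())  # method cache would arrange: generic
```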

All in all, object orientation seems to promise a brighter future, as an engineering database, than other data models. Indeed, the author sees deductive models (or KBR) as the only other way to achieve comparable capabilities. In what follows, however, it is summarized why the author is further convinced that object orientation is a more realistic approach than KBR, in the light of existing (and foreseeable) technology.

Object orientation versus KBR
As is evident from the previous sections, the future is likely to see a further proliferation of OODBs, continued use of logic for deductive databases, and a merging of KBR, programming languages, and data models. Classical DBMSs (hierarchical, network, or relational) tend to deal with simplistic entities and therefore cannot cope with complex objects, e.g., as used in CAD. With conventional models, one conceptual entity has to be represented by a number of database objects, e.g., records, tuples, or relations. (An OODB is based on a model that allows the representation of one miniworld entity by exactly one object.)

While semantic and logic-based databases take a step towards overcoming the inadequacies of existing DBMSs, they do not entirely do so. The OO approach, on the other hand, is aimed at not only providing enhanced semantics, but also supporting the functional properties of objects, integrating meta-data and data, and, in short, moving away completely from the rigidity of record-based systems. Note that even expert DBMSs (implemented so far) still rely heavily on conventional concepts in one way or another.

Current logic databases (mostly Prolog based) are founded on the premise that a Prolog program is a collection of both facts and rules and can thus be used to integrate meta-data and data and to define constraints and inference rules. The integration of, and uniform access to, meta-data and data would enable sophisticated user interfaces to be constructed more easily. Nevertheless, there are still major problems in the field of logic-based, expert databases to be solved, as has already been discussed above. In the author's view, the OO paradigm seems to be a more feasible 'immediate future' solution. Therefore, in the remainder of this paper, the OO approach is assumed as the basis of further study.

*At the time of revising this paper, at least two OO query languages had been reported: OSQL and OQLSL.

Object-based models: conclusions
The question of the boundary between OO and object-based systems is still a topic of much debate. However, the following levels of object orientation can be identified. It is generally agreed that an implementation must provide them (in one way or another) to pass for an OODBMS.

• Structural: allows for capturing complex data structures.
• Operational: a data model must include operators to deal with complex objects in their entirety.
• Behavioural: incorporates the notion of methods from OOPLs.

Merely constructing a database as a set of ADT instances (in the author's opinion) does not constitute an OODBMS; nor does simply presenting to the user/application programmer an interface inspired by the OO programming paradigm. While the latter fits well with the last of the three requirements above, it may equally well be grafted onto any other DBMS, not simply an OODB.

In summary, the following can be identified as potential issues that the OO approach must tackle. Addressing them will go a long way towards endearing 'object orientation' to sceptics.

• Versioning: multiple representations of the same semantic entity, to account for different stages, different times of validity, etc.
• New concepts are needed to deal with large amounts of data, spatial and temporal data, long-duration transactions, recovery, and consistency controls.
• Protection (based on the object notion), archiving, and related issues: just some of the usual DBMS functionalities.
• Specialized access paths for complex objects, storage structures, and main-memory buffering.
• Appropriate development tools: compatible CASE tools are desirable.
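The first issue, versioning, can be sketched in miniature. This is a generic illustration, not drawn from any system in the survey; the entity, stage names, and state fields are invented: one semantic entity keeps several representations, each stamped with the stage for which it is valid.

```python
# Versioning sketch: multiple representations of one semantic entity,
# each tagged with its stage of validity. Names are invented.

class VersionedEntity:
    def __init__(self, name):
        self.name = name
        self._versions = []          # ordered history of representations

    def commit(self, state, stage):
        # Snapshot the state rather than aliasing the caller's dict.
        self._versions.append({"stage": stage, "state": dict(state)})

    def at(self, stage):
        # Most recent representation valid for the given stage, if any.
        for v in reversed(self._versions):
            if v["stage"] == stage:
                return v["state"]
        return None

spec = VersionedEntity("module-spec")
spec.commit({"loc": 0}, stage="design")
spec.commit({"loc": 450}, stage="implementation")
print(spec.at("design"))          # {'loc': 0}
print(len(spec._versions))        # 2
```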

Critics of the OO approach argue that many of its concepts are nothing new. 'The class concept is hierarchical in nature, and has been identified with the hierarchical model for many years, for example', they argue. They further contend that something like inheritance is just another word for universal polymorphism (as in Miranda 62), or genericity (as in Ada-like systems 85). They also question whether 'object orientation' is a 'revolution' or an 'evolution'. (Anyway, what does it matter?) In the author's opinion, object orientation seems to 'deliver the goods' and gives a handle for conceptualizing engineering data with a satisfaction unsurpassed by any other known model. This is what should (and indeed does) count.

SOME SELECTED COMMERCIAL/EXPERIMENTAL OODBS

This section presents a few (proposed or existing) implementations that seem to promise to satisfy the notion of object orientation spelled out earlier. Of the three models discussed, Iris is perhaps the most widely publicized.

Vbase has been chosen for a number of reasons. It supports most of the OO concepts expressed elsewhere in this paper. In addition, it reasonably supports versioning by use of persistent objects. A couple of interactive utilities are also provided. It incorporates strong typing, and thus enjoys the advantages thereof, e.g., errors are resolved at compile-time, not at run-time. Besides, it is likely to be the first truly 'OO' DBMS that will be available to the author for research purposes.

Yobos has also been selected for not dissimilar reasons. Furthermore, as the author's future research will evolve hand in hand with Yobos, there is doubtless a lot that these two developments stand to offer each other. Results of one group will, hopefully, serve as a driving tool for the other.

Apart from its wide acceptability within the OODB community, the inclusion of Iris has been influenced by two fundamental reasons:

• Unlike other equally popular models, Iris relies on a functional model, Daplex.

• It is one system that has treated 'multimedia' management with the emphasis it deserves. The need to provide multimedia support has been discussed.

Time and again, comparison will be made with other existing implementations, as the need arises. In particular, two other models seem to have attracted much attention, Taxis and Orion, and this paper will not deviate from the tradition of turning to them for comparison.

Vbase integrated system

Vbase is an OODB that provides persistence for objects in C++ programs. It is aimed at the special needs of CAD applications (i.e., CAD, CASE, CAM, etc.). It can be used as a general-purpose DBMS as well.

Below, its main features are outlined. It supports:

• distributed, multiuser access in a homogeneous network
• a rich transaction model
• explicit and transparent object access
• dynamic class creation and procedure invocation
• a library of commonly used data abstractions
• utilities, including a browser and a 'smart make'*, and provides for back-up and recovery

Vbase provides a convenient interface to C++ applications via a set of class definitions and functions. It is based on a client/server architecture. The server provides controlled access to shared persistent objects. The client provides a level of abstraction around the communication between applications and servers. Each server manages one or more storage 'directories'. Directories

*Similar to the simple make utility proposed by Feldman for maintaining versions of programs.


may further be split into segments, which are the smallest units of data transfer between a server and a client. An 'object', being the smallest granule for manipulation purposes, is a contiguous block of space within a segment. Normal operations like association, aggregation, etc., are possible on these objects. Moreover, they can be (singly or multiply) 'activated' or 'deactivated' via standard C++ programming (e.g., using triggers). Precisely how Vbase allocates (or 'deallocates') memory for activated objects has been described 87,88. A simple Vbase type hierarchy is given in Figure 7.

[Figure 7. Type hierarchy in Vbase: 'entity' at the root; 'value specification' and 'method' are its instantiable subtypes]

Everything is an 'entity'. In general, entities are not instantiable (shown in Figure 7 as ovals). However, 'value specification' and 'method' are two subtypes of 'entity' that may be instantiated (i.e., may be populated). The assumption is that entities such as 'number', 'aggregate', etc., are atomic and therefore uninstantiable.

In the Vbase notion, 'objects' means instances. Moreover, unlike the Smalltalk convention of 'message passing', Vbase adopts the philosophy of operation invocation. ('Messages' are distinctly missing in Figure 7.) Association is provided through properties, which can be identified using OIDs. Exception types are also generated to handle errors at run-time. Methods implement operations.

The Vbase model suffers from at least three major problems:

• It fails to separate clearly 'instances' from 'types'. Why, for example, is an 'entity' uninstantiable, whereas a 'method' or 'value spec' (both subtypes of 'entity') are instantiable?
• How objects 'communicate' is not well defined. What distinguishes a method from an operator type or property type? Is a method not as much an operator as it is a property?
• Vbase seems to emphasize the type/subtype association to the exclusion of the other, more typical, associations. Normally, it is preferable, for example, to see properties and operations first as attributes_of (rather than as subtypes of) a type. In any case, as previously pointed out, one major weakness of object orientation is its representation of general associations, and Vbase is no exception.

Yobos model

[Figure 8. Yobos type hierarchy (kernel types and association types among operator, property, and exception types)]

This is a proposed model in its early stages of develop- ment at the University of York, UK 89. In this model all conceptual entities are modelled as objects. An ordinary integer or string is as much an object as is a complex assembly of parts, such as an aircraft or a submarine. An object consists of some private memory that holds its state. The private memory is made up of the values for a collection of attributes. The value of an attribute itself is an object, and therefore it has its own private memory for its state. A primitive object, such as an integer or a string, has no attribute. It only has a value, which is the object itself. More complex objects contain attributes, through which they reference other objects, which in turn may contain further attributes, and so on.

Figure 8 illustrates the type hierarchy of a Yobos object. The behaviour of an object is encapsulated in methods. Methods consist of code that manipulates or returns the state of an object. Methods, like attributes, are not visible from outside the object. Objects can communicate with one another by way of messages. Messages constitute the public interface of an object. For each message understood by an object, there is a corresponding method that executes the message. An object reacts to a message by executing the corresponding method and returning another object.

Of course, if every object were to carry its own attribute names and methods, the amount of information to be specified would become unmanageably large. Similar objects are therefore grouped together into classes. A class describes the form (attributes) of its instances, and the operations (methods) applicable to them. A class hierarchy is a hierarchy of classes in which an edge between a pair of nodes represents the is-a relationship; i.e., the lower-level node is a specialization of the higher-level node (and, conversely, the higher-level node is a generalization of the lower-level node). The attributes and methods (collectively called properties) of a supertype are inherited (shared) by all its subclasses. The domain (which corresponds to a data type in conventional programming) of an attribute is a class. The kerneltype forms the base types that are central to the schema.

In Yobos (as in a few other OO models), multiple inheritance is allowed, i.e., a class can have more than one supertype. This class lattice, however, may give rise to conflicts in the names of attributes and methods. Although Yobos has not specifically addressed how to resolve this conflict, existing systems resolve it by giving precedence to the definition within the class over that of its superclasses 78,90,91. Name conflicts among superclasses are resolved by superclass ordering.
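The two resolution rules just described happen to be exactly the ones a modern OO language applies; Python's method resolution order makes them easy to observe. The class names below are invented.

```python
# Multiple-inheritance name-conflict resolution, as described above:
# (1) a definition in the class takes precedence over its superclasses;
# (2) conflicts among superclasses are resolved by superclass ordering.

class Vehicle:
    def kind(self):
        return "vehicle"

class Weapon:
    def kind(self):
        return "weapon"

class Tank(Vehicle, Weapon):        # superclass ordering: Vehicle first
    pass

class NamedTank(Vehicle, Weapon):
    def kind(self):                 # local definition wins outright
        return "named tank"

print(Tank().kind())       # 'vehicle': resolved by superclass ordering
print(NamedTank().kind())  # 'named tank': class beats its superclasses
```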

The conception of Yobos was inspired by the failure of most OO systems to treat 'type' and 'substance' issues separately. Inasmuch as it advocates encapsulation, Yobos recognises the need to identify important issues in the design and construction of these two 'elements'. It aims to provide a simple set of facilities and to investigate the benefits (if any) of this approach in support for IPSEs.

The Yobos model is the one closest to 'the' OO model so much cherished earlier.

Iris: multimedia management

Iris 92 is yet another OODBMS targeted at nontraditional database applications like CAD and SEEs. It has the usual DBMS capabilities to support 'permanence' of data, controlled sharing, back-up, and recovery. It is also venturing into rich data-modelling constructs, inference support, novel and extensible data types (e.g., graphic images, voice, text, vectors and matrices), long-duration transactions, and versioning. It basically consists of:

• a query processor
• a relational storage subsystem (RSS) to manage memory
• a host of programs and interactive interfaces, one of which is an extension to SQL (OSQL)

The query processor translates Iris queries and operations into an internal relational algebra format, which is then interpreted against the stored database. The storage manager provides tuple-at-a-time processing, with the usual relational operations of update, delete, etc. Extensions to the manager are under investigation, for example, to include support for multimedia objects. Like many other DBMSs, Iris is accessible from any number of programming languages, and by standalone interactive interfaces. A set of C subroutines constitutes the object-manager interface.

OODBMSs generally support at least one type of basic object. Iris supports 'entity' and 'attribute' objects, which are referred to as nonliteral and literal objects, respectively. Iris is based on the FDM and thus uses functions for specifying primary and inverse relationships. Relationships can be defined between any two object types. Thus it is possible to map the entity relationships of a conceptual data model explicitly. Not all systems support the independent definition of structural data, but most support derived object types, which enable aggregate objects to be defined. The most important form of aggregate object is the entity aggregate. The object schema for this consists of a list of data properties, similar to the list of attributes in a relation.

In Iris, the data properties are considered to be functions representing primary relationships, but the domain object of the functions is the entity aggregate itself (there is no difference between an entity aggregate and an entity; the former provides a shorthand mechanism for defining a collection of functions that have the same domain object).

Data properties in Taxis are similar to Iris data properties, but the relationships implied by the data properties cannot be defined independently; they can only be defined in entity aggregate or subtype definitions. In KBZ 93, the data properties are relationships based on defined structural objects, but the entity aggregate object is considered as a separate object, although based on the object in the domain of the data properties. In other systems, data properties are simply attribute objects (or equivalent), and the entity aggregate represents an entity in a similar manner to the way a relation represents an entity. ODM is an exception in that it does not support entity aggregates; all derived objects are based on structural objects (i.e., the domain or range object in a three-tuple may itself be a structural object). Thus ODM is a binary relationship-based OODBMS, rather than an entity-based one.

Functional properties may be associated with different object types. Update rules can be provided for creating, modifying, and deleting objects, and other operations on the objects may also be defined. In KBZ, update rules and other operations form named functional properties in the relevant object schema, thus enabling the object schema to provide a complete definition of each object. In other systems, procedure-like objects are defined separately, but are associated with the relevant aggregate objects. For example, in Taxis they are referred to as transaction-classes, which name the objects (entity aggregates) involved in the transactions and enable a precondition and postcondition, as well as the main processing procedure, to be identified in each transaction if required. In ODM, which is based on binary relationships, 'operations' provide a similar facility, but in this case the objects associated with an operation are basic objects (entities and attributes).
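The Taxis-style transaction class, a named operation over entity aggregates with an optional precondition and postcondition, can be sketched as follows. This is a hypothetical Python rendering, not Taxis notation; all names are invented.

```python
# Sketch of a Taxis-like transaction class: a named operation that
# identifies the objects it involves and attaches an optional
# precondition and postcondition to the main processing procedure.
# Hypothetical names; Taxis uses its own declarative notation.

class TransactionClass:
    def __init__(self, name, body, pre=None, post=None):
        self.name, self.body = name, body
        self.pre, self.post = pre, post

    def run(self, *objects):
        if self.pre and not self.pre(*objects):
            raise ValueError(f'{self.name}: precondition failed')
        result = self.body(*objects)
        if self.post and not self.post(*objects):
            raise ValueError(f'{self.name}: postcondition failed')
        return result

# Example: a hiring transaction over an 'employee' entity aggregate,
# represented here as a plain dict.
hire = TransactionClass(
    'hire',
    body=lambda emp: emp.update(status='hired'),
    pre=lambda emp: emp.get('status') == 'candidate',
    post=lambda emp: emp['status'] == 'hired',
)
```

The attraction of the scheme is that the update rules travel with the schema, so every use of the operation is checked against the same conditions.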

Support for other, more complex, derived objects also varies. Orion additionally supports composite objects that reflect 'is-part-of' relationships. Iris supports complex relationships between objects by means of derived functions. In Taxis, the transaction classes provide the means of defining complex properties. KBZ supports cover aggregation; views and queries are also naturally supported as derived objects whose data properties may be associated with more than one domain object.

Vol 34 No 5 May 1992 299

GemStone is yet another popular OODBMS. It has been designed as a database interface to the OO programming environment of Smalltalk-80. Its object model is similar to that of Smalltalk. It supports hierarchies, methods, and multivalued attributes. It is stated without proof that an OODB maps more easily and naturally on to an OO base programming language (PL) than on to a functional or other type of PL. The author is therefore, in general, not in favour of OODBMSs based on functional PLs or data models, e.g., Iris. However, the way such OODBs implement relationships as functions will no doubt serve as good experience for implementing in tomorrow's OODBs not only relationships, but also methods.

Some observations

From the foregoing discussions, it should be clear by now that OO systems vary in their degree of 'object orientation', and hence in the extent to which they eliminate the inadequacies mentioned. However, the following features are supported by most existing OO models:

• data abstraction using the 'object' notion
• property inheritance via type hierarchies
• association of functional properties with specific objects

The provision of natural language, browsing, and graphical user-interface facilities depends on these features, which can currently be supported only by building a system interface that manages its own knowledge base and interacts with an existing DBMS and the database that it manages. Other features also being incorporated that will go a long way in increasing the flexibility and functionality of OODBSs are:

• support for versions and time
• text and graphics as primitive object types

Versioning issues are investigated in the next section.

CONCLUSIONS AND FURTHER WORK

From the previous sections, it is evident that no single model can claim to satisfy all the requirements of emerging database applications. There is need either to incorporate certain aspects of other models into specific models or simply to integrate models altogether. A lot of issues still remain unanswered. Which kind of model (object oriented, functional, or logic based) is (dis)advantageous, and under which circumstances? All conceptual systems seem to have weaknesses. For example, knowledge-based models do not seem to scale up to data-based models in several ways. Why, for example, is it so difficult to expand expert systems beyond a few hundred rules? Moreover, is there any difference at all between an expert system and a decision tree? Are current DBMS models sufficiently rich to be the basis for knowledge-based systems, or are more elaborate models needed? What are general methodologies for interworking KBMSs and DBMSs?

Other things aside, object orientation and KBR appear to offer the more feasible solutions. However, the author is not particularly optimistic about KBR. In contrast to AI, which focuses on the principles and techniques necessary to build systems that will mimic human intelligence, database technology is concerned with systems that can handle a large volume of information. As useful AI systems often require a large volume of information, the common ground is quite obvious. However, there is a fundamental difference between the two approaches: AI enshrines purity and faithfulness of representation, whereas databases emphasize reality and pragmatism of implementation. They start from different ends of the same spectrum to achieve the same goal 94.

The areas of DBMSs, PLs, and AI overlap in many new applications, as for instance in OISs, CAD/CAM, and expert systems. Available software systems do not satisfy the requirements of these new applications (see Appendix 1). Some systems are pursuing the goal of ultimate integration by seeking a 'grand unification' of database systems, logic languages, and OO programming. To operate with intelligence and expertise, a system must make good use of the available knowledge: in knowledge lies power (so they claim). The aim should be to achieve an OODBS with a single language for data manipulation and application programming. Clearly, this would eliminate the problem of impedance mismatch.

From Appendix 1, it is apparent that two other issues that object orientation must address itself to are:

• Space: spatial semantics captured by solid, boundary, or abstract (e.g., 'above', 'near') representation. Spatial requirements could be 2- or 3-dimensional (2D or 3D).
• Time: a special case of 1D spatial information. Temporal aspects built into databases must include time points, time intervals, and abstract time relationships, e.g., during, concurrently, etc. Current systems do not distinguish between registration time and logical time. There is need to separate time-varying information from time-invariant information, for which versioning is not crucial. One approach used in the temporal relational model is tuple-time stamping, with the start and end times of the logical time shown. Attribute-time stamping has also been proposed. See, for example, the discussion on Arcadia by the author 1.
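Tuple-time stamping, mentioned above, can be illustrated with a toy temporal relation. The relation and attribute names below are invented; the point is only that each tuple carries the start and end of its logical time, so queries can be posed 'as of' a given moment.

```python
# Sketch of tuple-time stamping in a temporal relational model:
# each tuple carries the start and end of its logical (valid) time.
# Table and attribute values are invented for illustration.

END_OF_TIME = float('inf')   # stand-in for 'still current'

# Salary history: (employee, salary, start, end)
salary = [
    ('ann', 30000, 1988, 1990),
    ('ann', 34000, 1990, END_OF_TIME),
]

def as_of(relation, time):
    """Select the tuples whose logical-time interval covers 'time'."""
    return [t for t in relation if t[2] <= time < t[3]]
```

Note that this stamps logical time only; a system that also recorded registration time would need a second pair of timestamps per tuple.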

The author believes that in addition to advances in data models, the following areas also have great implications for the future of database technology:

• user interfaces
• database interoperability
• integration with software technology
• integration with AI technology
• database machines and architectures

300 Information and Software Technology

• interfacing with advanced programming languages

Figure 9. Future integration of technologies

These inter-relationships are summarized in Figure 9, which recognises the central and crucial role played by the database component. The inter-relationships are further discussed elsewhere 1.

It is also recognised that good design methodologies come from experience, and experience comes from both good and bad designs. As for those researchers who hold views contrary (or otherwise) to those expressed in this section, and especially those attempting to interpret those ideas into reality, they should be hailed, not condemned. Physicists have hardly forgotten Copernicus, persecuted in his time for proposing that the sun, not the earth, was central to the universe. Who else has time ever proved so right?

For the remainder of this project, it is proposed to investigate further three major issues: version management, 'object' modelling, and process modelling. This shall be limited, however, to the software engineering application domain. All three topics are outlined below.

Version management

The previous sections identified a number of major issues that an IPSE database must address. It is realised that addressing all of these is a task that cannot be accomplished within the scope of a single project. For the author's purposes, version modelling and management is identified as the most crucial area that evolving software environments and data-modelling technologies must explore more seriously than has been the case to date.

Software environments require support for managing the evolution of objects over time. In particular, design processes involve a high degree of trial and error, backtracking to earlier stages, and testing of hypotheses, and they may extend over days, or even years. Consequently, design data tend to be transient, volatile, tentative, and dependent on individual designers. As a result of the development of alternative solutions for a given task, or of revisions due to changing requirements, error corrections, etc., the 'same' object may exist in multiple forms, with only slight variations. Thus versions of representation objects have to be dealt with. The designer has to choose a configuration of the (sub)system by selecting a consistent set of versions, one for each representation desired.

Version management addresses the challenges posed by such evolution. An object may be considered as a partially ordered set of versions. By embedding change semantics in the database model, the design engineer can be relieved of managing the detailed effects of changes. However, mechanisms are needed to limit the scope of change propagation and to identify unambiguously the objects to which propagated changes should apply.
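The view of an object as a partially ordered set of versions can be sketched directly. The following Python fragment is a hypothetical illustration, not a proposal for an implementation: each version records its predecessors, so revisions form chains, alternatives form branches, and merges join them again.

```python
# Sketch of an object as a partially ordered set of versions.
# Each version records its predecessors; the partial order is the
# transitive 'derives-from' relation. Version labels are invented.

class Version:
    def __init__(self, label, predecessors=()):
        self.label = label
        self.predecessors = tuple(predecessors)

    def ancestors(self):
        """All versions this one (transitively) derives from."""
        seen = set()
        stack = list(self.predecessors)
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(v.predecessors)
        return seen

v1 = Version('parser-1')
v2 = Version('parser-2', [v1])        # revision of v1
v2a = Version('parser-2-fast', [v1])  # alternative branch from v1
v3 = Version('parser-3', [v2, v2a])   # merge of the two lines
```

Selecting a configuration then amounts to picking one version per object such that the chosen set is mutually consistent, which is exactly the designer's task described above.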

It is proposed to study version management further. In a way, it appears that at the core of the problems of version management are data and process modelling. The two must therefore be investigated hand in hand with version management. This report has concentrated on data modelling. A separate paper will survey the current status of process modelling.

Data and process modelling

It is generally recognised that 'data' and 'process' are two important facets of an information system, and that to study one without the other ignores the intricate relationships between them. However, the focus of SEEs may be either on the processes or tools that are provided by the environment, or on the objects or data that are produced and manipulated by these tools. Which viewpoint is most appropriate is, at the moment, a point of debate. Supporters of the process notion argue that the object notion is doomed to constraints similar to those imposed by CADES 95. They say, for example, that the granularity of the database is product oriented and thus the granularity of a supported process is of necessity constrained to the level of product modularity. Some merits of the latter paradigm have been put forward 96. It is the author's premise that whether to adopt the 'process' or the 'object' viewpoint depends on the particular environment. For interactive tools that maintain a continuous dialogue with the user, the process paradigm would be ideal; in a browsing tool, for example, it is the process of gathering information that is of paramount interest to the user. On the other hand, for mainly 'noninteractive' tools that function with little or no user intervention, the object viewpoint might be more appropriate. For such tools, it is the data (or objects) produced by the tools that are essential, and not how they are produced. For example, when test data regression analysis is being performed, what is of prime interest is the output data resulting from running the modified programs on existing test data, not 'how' these output data are computed.

It is proposed to study further this subtle relationship between 'data' and 'process', and how well they are supported by existing models. As a result of the survey and conclusions arrived at in this paper, the author specifically proposes to identify and define a data model suitable for representing software artefacts, i.e., programs, documents, users, tools, etc. Gradually, it is proposed to evolve a process model to deal with the more dynamic concerns of SEEs.


ACKNOWLEDGEMENTS

Many thanks to the editor of YCS Yellow Report Series, Professor Ian Wand, for several useful comments on an earlier draft of this article. To Alan Brown and Peter Hitchcock: this work could never have happened without your support. The conference database example so extensively used in this paper came about from discussions with Dick Whittington. Finally, to Chris Higgins and Bill Daly, it can never be too late nor too early to thank you for being so helpful.

REFERENCES

1 Ochuodho, S J 'Object-oriented database support for software project management - process modelling and related issues' Technical report University of York, UK (1991)

2 Ullman, J D Principles of database systems Computer Science Press (1982)

3 Whittington, R P Database systems engineering Oxford University Press (1988)

4 Fry, J P and Sibley, E H 'Evolution of database management systems' Comput. Surv. Vol 8 No 1 (March 1976) pp 186-213

5 CODASYL Group 'An information algebra: phase 1 report of the language structure group' Commun. ACM Vol 5 No 4 (April 1962) pp 190-204

6 CODASYL Group COBOL J. development Material Data Management Centre, Quebec, Canada (1978)

7 Olle, T W The CODASYL approach to database management John Wiley (1980)

8 Cincom Systems OS Total: reference manual Cincom Systems, OH, USA (1978)

9 Cullinane Corporation IDMS DML programmer's reference guide Cullinane Corporation, MA, USA (1981)

10 IBM IMS/VS publications IBM, White Plains, NY, USA (1978)

11 Date, C J An introduction to database systems - volume II Addison-Wesley (1983)

12 MRI Systems Systems 2000 reference manual MRI Systems Corp., Austin, TX, USA (1978)

13 Tsichritzis, D C and Lochovsky, F H Data base management systems Academic Press (1977)

14 Cardenas, A F Data base management systems (2nd ed) Allyn and Bacon (1985)

15 Brown, A W 'A view mechanism for an integrated project support environment' PhD thesis University of Newcastle upon Tyne, UK (January 1988)

16 RTG Relational database systems: analysis and comparison Springer-Verlag (1983)

17 Codd, E F 'Relational completeness of data base sublanguages' in Rustin, R (ed) Data Base Systems, Courant Comput. Sci. Symp. 6th Prentice Hall (1972) pp 65-98

18 Gray, P M D Logic, algebra, and databases Ellis-Horwood (1984)

19 McDonald, N and McNally, J 'Feature analysis of Ingres' in Relational database systems Springer-Verlag (1983) pp 158-181

20 Stonebraker, M The Ingres papers: anatomy of a relational database system Addison-Wesley (1986)

21 Kinsley, K C 'Feature analysis of System R' in Relational database systems Springer-Verlag (1983) pp 529-560

22 Driver, B H 'Feature analysis of Oracle' in Relational database systems Springer-Verlag (1983) pp 288-331

23 Kornatowski, J Z, Ladd, I and Robertson, C M 'MRS system evaluation' in Relational database systems Springer-Verlag (1983) pp 221-257

24 Elmasri, R and Navathe, S B Fundamentals of database systems Benjamin/Cummings (1989)

25 Atkinson, M P and Buneman, P 'Types and persistence in database programming languages' Comput. Surv. Vol 19 No 2 (June 1987) pp 105-190

26 Chen, P P 'The entity-relationship model - toward a unified view of data' ACM Trans. Database Syst. Vol 1 No 1 (March 1976) pp 9-36

27 Chen, P P Entity-relationship approach to information modelling and analysis North-Holland (1981)

28 Smith, J M and Smith, D C P 'Database abstractions: aggregation and generalization' ACM Trans. Database Syst. Vol 2 No 2 (June 1977)

29 Codd, E F 'Extending the database relational model to capture more meaning' ACM Trans. Database Syst. Vol 4 No 4 (December 1979) pp 397-434

30 Chen, P P 'ER - a historical perspective and future directions' in Entity-relationship approach to software engineering North-Holland (1983) pp 71-77

31 Elmasri, R and Wiederhold, G 'Gordas: a formal high-level query language for the entity-relationship model' in Entity-relationship approach to information modelling and analysis North-Holland (1983)

32 Benneworth, R L 'The implementation of GERM, an entity-relationship data base management system' in Proc. Seventh Very Large Data Base Conf. Cannes, France (September 1981)

33 Brodie, M L 'Specification and verification of database semantic integrity' PhD dissertation (1978)

34 Brodie, M L and Ridjanovic, D Fundamental concepts for semantic modelling of objects Computer Corporation of America (October 1984)

35 Schmid, H A and Swenson, J R 'On the semantics of the relational data model' in Proc. ACM SIGMOD (May 1975) pp 211-223

36 Hammer, M and McLeod, D Database description with SDM: a semantic database model (1981)

37 Daly, W 'Meaning in databases: its capture and constraint' Research proposal University of York, UK (1987)

38 Abrial, J R 'Data semantics' in Klimbie, J W (ed) Data base management North-Holland (1974) pp 1-64

39 Blair, G S, Gallagher, J J and Malik, J 'GENERICITY vs INHERITANCE vs DELEGATION vs CONFORMANCE vs ... (towards a unifying understanding of objects)' Technical report University of Lancaster, UK (1988)

40 Meyer, B 'Genericity versus inheritance' in Proc. OOPSLA '86 Portland, OR, USA (1986)

41 Elmasri, R, Weeldreyer, J and Hevner, A 'The category concept: an extension to the entity-relationship model' Data Knowl. Eng. Vol 1 No 1 (1985)

42 Teorey, T J, Yang, D and Fry, J P 'A logical design methodology for relational databases using the extended ER model' Comput. Surv. Vol 18 No 2 (June 1986) pp 197-222

43 Lorie, R and Plouffe, W 'Complex objects and their use in design transactions' in Proc. ACM-SIGMOD Conf. Engineering Design Applications (May 1983) pp 115-122

44 Ullman, J D 'Implementation of logical query languages for databases' ACM Trans. Database Syst. Vol 10 No 3 (1985) pp 289-321

45 Beeri, C 'Data models and languages for databases' in Proc. 2nd Int. Conf. Database Theory Bruges, Belgium (August/September 1988) pp 19-40

46 Earl, A N 'The specification and implementation of an extended relational model and its application within an integrated project support environment' PhD thesis University of York, UK (May 1988)

47 Makinouchi, A 'A consideration on normal form of not-necessarily normalized relations in the relational model' in Proc. 3rd Int. Conf. Very Large Data Bases (October 1977)

48 Abiteboul, S and Bidoit, N 'Non-first normal form relations to represent hierarchically organized data' in Proc. ACM SIGACT-SIGMOD Symp. Database Systems Ontario, Canada (April 1984) pp 191-200

49 Klug, A 'Equivalence of relational algebra and calculus query languages having aggregate functions' J. ACM Vol 29 No 3 (July 1982)

50 Ozsoyoglu, G, Ozsoyoglu, Z M and Matos, V 'Extending relational algebra and relational calculus with set-valued attributes and aggregate functions' ACM Trans. Database Syst. Vol 12 No 4 (December 1987) pp 566-592

51 Stonebraker, M, Anton, J and Hanson, E 'Extending a database system with procedures' ACM Trans. Database Syst. Vol 12 No 3 (September 1987) pp 350-376

52 Held, G 'Ingres - a relational database system' in Proc. 1975 Nat. Computer Conf. Anaheim, CA, USA (May 1975) pp 409-416

53 Rowe, L A and Stonebraker, M R 'The Postgres data model' in Proc. 13th Int. Conf. Very Large Data Bases (September 1987) pp 83-96

54 Stonebraker, M R, Wong, E and Kreps, P 'The design and implementation of Ingres' ACM Trans. Database Syst. Vol 2 No 3 (September 1977) pp 189-222

55 Shipman, D W 'The functional data model and data language Daplex' ACM Trans. Database Syst. Vol 6 No 1 (March 1981) pp 140-173

56 Biskup, J and Bruggemann, H H 'Universal relation views: a pragmatic approach' in Proc. Ninth Int. Conf. Very Large Data Bases (November 1983) pp 172-185

57 Daley, W G 'A graphical management system for semantic multimedia databases' PhD thesis University of York, UK (1990)

58 Hall, J A, Hitchcock, P and Took, R 'An overview of the Aspect architecture' in Integrated project support environments Peter Peregrinus (1985) pp 86-99

59 Hall, P A, Hitchcock, P and Todd, S J 'An algebra of relations for machine computation' UKSC 0066 IBM UK Ltd, Peterlee, UK (January 1975)

60 Buneman, O P and Frankel, R E 'FQL - a functional query language' in Proc. SIGMOD Int. Conf. Management of Data Boston, MA, USA (1979)

61 Buneman, O P and Nikhil, R 'The functional data model and its uses for interaction with databases' in On conceptual modelling Springer-Verlag (1984)

62 Turner, D A 'Miranda: a non-strict functional language with polymorphic types' in Proc. Conf. Functional Programming Languages and Computer Architecture Nancy, France (September 1985)

63 King, R and McLeod, D 'A unified methodology for conceptual database design' in On conceptual modelling Springer-Verlag (1983)

64 Brodie, M L 'On the development of data models' in On conceptual modelling Springer-Verlag (1984)

65 Clocksin, W F and Mellish, C S Programming in Prolog (LNCS) Springer-Verlag (1984)

66 Dahl, V 'On database systems development through logic' ACM Trans. Database Syst. Vol 7 No 1 (March 1982) pp 102-123

67 Sciore, E and Warren, D S 'Towards an integrated database-Prolog system' in Proc. First Int. Workshop on Expert Database Systems (1984) pp 801-815

68 Zhang, Y 'Extending the functions of Prolog with Db++' Research proposal University of Newcastle upon Tyne, UK (1988)

69 Mendelzon, A D 'Functional dependencies in logic programs' in Proc. 11th Int. Conf. Very Large Data Bases Stockholm, Sweden (August 1985) pp 324-330

70 Tsur, S and Zaniolo, C 'LDL: a logic-based data-language' in Proc. 12th Int. Conf. Very Large Data Bases Kyoto, Japan (August 1986) pp 33-41

71 Buchmann, A P 'Current trends in CAD databases' Comput.-Aided Des. Vol 16 No 3 (May 1984) pp 123-126

72 Fauvet, M C and Rieu, D 'CADB: un systeme de gestion de bases de donnees et de connaissances pour la CAO' in Proc. MICAD'87 Conf. Paris, France (February 1987)

73 Nguyen, G T and Rieu, D 'Expert database support for consistent dynamic objects' in Proc. 13th Int. Conf. Very Large Data Bases Brighton, UK (September 1987) pp 493-500

74 Wegner, P 'The object-oriented classification paradigm' in Research directions in object-oriented programming MIT Press (1987)

75 Carey, M J, DeWitt, D J, Richardson, J E and Shekita, E J 'Object and file management in the Exodus extensible database system' in Proc. 12th Int. Conf. Very Large Data Bases Kyoto, Japan (August 1986) pp 91-100

76 Dayal, U et al. 'PROBE: a knowledge-oriented database management system' in On knowledge base management systems: integrating artificial intelligence and database technologies Springer-Verlag (1986)

77 Manola, F and Orenstein, J A 'Toward a general spatial data model for an object-oriented DBMS' in Proc. 12th Int. Conf. Very Large Data Bases Kyoto, Japan (August 1986) pp 328-335

78 Schlageter, G, Unland, R, Wilkes, W et al. 'OOPS - an object oriented programming system with integrated data management facility' in Proc. 4th Int. Conf. Data Engineering Los Angeles, CA, USA (February 1988) pp 118-125

79 Bruce, K B 'An algebraic model of subtypes in object oriented languages' SIGPLAN Notices Vol 21 No 10 (1986)

80 Cardelli, L and Wegner, P 'On understanding types, data abstraction, and polymorphism' Comput. Surv. Vol 17 No 4 (1985)

81 Lecluse, C and Richard, P 'Modelling inheritance and genericity in object-oriented databases' in Proc. 2nd Int. Conf. Database Theory Bruges, Belgium (1988) pp 223-238

82 Alashqur, A M, Su, S Y W and Lam, H 'OQL: a query language for manipulating object-oriented databases' in Proc. 15th Very Large Data Bases Conf. Amsterdam, The Netherlands (August 1989) pp 433-442

83 Borning, A and Ingalls, D H H 'A type declaration and inference system for Smalltalk' Technical report 81-08-02a University of Washington, USA (November 1981)

84 Zaniolo, C and Ait-Kaci, H 'Object-oriented database systems and knowledge systems' in Kerschberg, L (ed) Expert database systems: Proc. First Int. Workshop Benjamin/Cummings (1984)

85 Barnes, J G P Programming in Ada (1984)

86 Feldman, S I 'Make - a program for maintaining computer programs' Soft. Pract. Exper. Vol 9 No 4 (April 1979) pp 255-265

87 Ontologic, Inc. 'Vbase+: an object database for C++' Report 12-22-88 Ontologic, Inc., USA (1988)

88 Andrews, T and Harris, C 'Combining language and database advances in an object-oriented development environment' in Proc. OOPSLA '87 (October 1987) pp 430-440

89 Brown, A W 'Yobos: the design of a simple object-oriented database' Report University of York, UK (1989)

90 Woelk, D, Luther, W and Kim, W 'An object-oriented approach to multimedia applications and database requirements' in Proc. ACM SIGMOD Int. Conf. Management of Data (May 1986) pp 311-325

91 Carey, M J, DeWitt, D J and Vandenberg, S L 'A data model and query languages for Exodus' Technical report University of Wisconsin-Madison, WI, USA (December 1987)

92 Fishman, D H et al. 'Iris: an object-oriented database management system' ACM Trans. Office Inf. Syst. Vol 5 No 1 (January 1987) pp 49-69

93 Oxborrow, E and Ismail, H 'KBZ: an object-oriented approach to the specification and management of knowledge bases' Report No 51 Computing Laboratory, University of Kent, UK (1988)

94 Deen, S M 'Anyone for a VLDB in the year 2000?' in Proc. 12th Int. Conf. Very Large Data Bases Kyoto, Japan (August 1986)

95 Pearson, D 'CADES' Computer Weekly (26 July, 2 August, 9 August 1973)

96 Clemm, G M 'The Odin system: an object manager for extensible software environments' PhD thesis (February 1986)

97 Brown, A W 'The relationship between CAD and software development support: a database perspective' Comput.-Aided Eng. J. Vol 5 No 6 (December 1988) pp 226-232

98 Bernstein, P A 'Database system support for software engineering' in Proc. 9th Software Engineering Conf. ACM (March 1989) pp 166-178

99 Ochuodho, S J 'Electronic telephone monitoring unit' MSc thesis University of Nairobi, Kenya (1987)

100 Clarke, L A, Richardson, D J and Zeil, S J 'Team: a support environment for testing, evaluation, and analysis' Soft. Eng. Notes Vol 13 No 5 (November 1988) pp 153-162

101 Agnew, M and Ward, R Users' guide to the db++ relational database management system (1st ed) Software and Consulting GmbH, Frankfurt, Germany (1988)

102 Agnew, M and Ward, R 'db++ relational database management system' in The Raccess library routines and the C-language interface reference (3rd ed) Software and Consulting GmbH, Frankfurt, Germany (1984)

103 Tanaka, K and Yoshikawa, M 'Towards abstracting complex database objects: generalization, reduction and unification of set-type objects (extended abstract)' in Proc. 2nd Int. Conf. Database Theory Bruges, Belgium (1988) pp 252-266

104 IEEE Proc. 4th Int. Conf. Data Engineering IEEE (February 1988)

APPENDIX 1: EMERGING DATABASE APPLICATIONS

The main body of this paper has tried to justify interest in OO ideas with a view to facing up to the challenges of new database applications. This Appendix discusses some of these potential application areas in greater detail. These are by no means the only applications that stand to benefit from OODBMSs. Other uses are bound to surface with time. Besides, a DBMS that does not support current DP and business applications is not likely to gain much ground.

Engineering design and manufacturing

An integrated approach to managing manufacturing information requires uniform representation and manipulation of information in different phases of the product's life-cycle. Engineering design is an exploratory and iterative process. Intermittently, designs are cross-checked against other independently evolving designs and, finally, acceptable designs or versions are agreed and stored. It is not unusual in engineering design to forfeit a new design for an older one!

Two major areas of current investigation are the design of VLSI and the design of mechanical structures and systems (geometric modelling). It is a traumatic experience for a user to transport large amounts of data among different CAD systems, or from a CAD system to a manufacturing (CAM) site. Database technology can play an important role in storing such information centrally, for use with analytical tools (e.g., finite-element analysis and 'Karnaugh-map-like' reduction), graphics (typically 3D), simulations, and optimal design algorithms.

Two main reasons for using a centralized* database system are:

• Part of the design can be synthesized, analysed, coordinated, and documented while individual project teams work on other parts of the design.
• Constraints related to identical standards, designs, and specifications and other physical properties, design style, and topological relationships can be verified and automatically enforced.

Figure 10. Use of DBMS in design and applications

Figure 10 is a simple illustration of how a DBMS can be used in engineering design. The diagram deviates from the tradition of having the database at the bottom, to emphasize the role that such an information base plays in engineering (and related) designs. A summary of why conventional databases cannot capture engineering information effectively (see below) also serves to identify and isolate the uniqueness that underlies such information.

What makes engineering data unique? The following are but a few reasons that render 'classical' databases inadequate for manipulating engineering data. Most of them apply to other nontraditional applications (discussed below) as well.

(1) Engineering design data contain nonhomogeneous objects.
(2) Classical DBMSs are intended for formatted data, short strings, and fixed-length records; engineering designs, on the other hand:
    (a) include long strings
    (b) have variable-length or textual information
    (c) often contain vectors and matrices
(3) Temporal and spatial relationships are essential.
(4) Design data are characterized by a large number of types, each with a small number of instances.
(5) Schemas must be allowed to evolve constantly.
(6) Transactions in design databases are of long duration.
(7) Versioning is crucial.
(8) Making a design 'eternal', releasing it to production, archiving it, etc., is also essential.
(9) Design data need not be duplicated at lower levels (e.g., 'bolt'- or 'gate'-like details).

*'Centralized' as used here is only conceptual. In general, such databases would actually be physically distributed, but the distribution is made transparent to the user.

From the aforementioned, the strong points of the OO approach are evident. The following OO concepts support this view:

• Common model: the miniworld can be mapped by a 1:1 correspondence to the database objects.
• Uniform interface: all objects may be treated uniformly by accessing them through methods.
• Support for complex (as well as for simple) objects.
• Information hiding and support for abstractions: generalization and aggregation are readily supported.
• Modularity, flexibility, extensibility, and tailorability: schema evolution, new objects, or new operations are easily incorporated.
• Versioning may be more easily supported.
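Two of the abstractions listed above, generalization and aggregation, can be sketched briefly. The Python fragment below is illustrative only; the design-object names are invented.

```python
# Sketch of generalization (a type hierarchy) and aggregation
# (an 'is-part-of' composite object). All names are hypothetical.

class DesignObject:                      # generalization root
    def __init__(self, name):
        self.name = name
        self.parts = []                  # aggregation: is-part-of

    def add_part(self, part):
        self.parts.append(part)

    def all_parts(self):
        """Transitively enumerate every component of the composite."""
        for p in self.parts:
            yield p
            yield from p.all_parts()

class Gate(DesignObject):                # specialization inherits
    pass                                 # structure and behaviour

chip = DesignObject('alu')
adder = DesignObject('adder')
xor_gate = Gate('xor-gate')
adder.add_part(xor_gate)
chip.add_part(adder)
```

A change to `DesignObject` (say, adding a version attribute) would be inherited by every specialization, which is one reason schema evolution is claimed to be easier in OO models.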

Extensions of the relational and network models to accommodate CAD applications have been reported, but have not been very successful.

A proposal has been made 97 to tailor the Aspect environment for VLSI design requirements. The Aspect IPSE was designed primarily for a software environment. As this is an area of direct concern in this research, we now examine how CAD and software design relate from a database perspective. In any case, is not 'software technology' just as much 'engineering' as any other discipline?

Software engineering environments (SEEs) The increase in the size and complexity of software projects in recent times has led to the need for better planned and managed software development environments. They are primarily intended to provide assistance in the development of large systems, involving many people and evolving over a long time. Normally teams of developers are scattered. Unfortunately, the distribution of computer systems introduces several problems that make it difficult to maintain data consistency and integrity. Data distribution does not make concurrent access or version and configuration management any easier, either.

Data-handling requirements of an SEE include:

• storing and manipulating atomic as well as multiple versions of data objects
• storing large, variable-length objects
• multimedia data representations
• flexible and powerful operators, such as operators on directed graphs
• flexible data types
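Two of these requirements, multiple versions per object and operators on directed graphs, can be sketched in a toy repository. Everything below is the author's hypothetical illustration; the names and structure are not taken from any SEE described in this paper:

```python
from collections import defaultdict

class Repository:
    """Toy SEE repository: versioned objects plus a dependency graph."""
    def __init__(self):
        self.versions = defaultdict(list)   # object name -> list of versions
        self.depends = defaultdict(set)     # directed graph: module -> modules it uses

    def store(self, name, content):
        """Append a new version of a (possibly large, variable-length) object."""
        self.versions[name].append(content)

    def add_dependency(self, user, used):
        self.depends[user].add(used)

    def transitive_deps(self, name, seen=None):
        """A directed-graph operator: everything 'name' transitively depends on."""
        seen = set() if seen is None else seen
        for d in self.depends[name]:
            if d not in seen:
                seen.add(d)
                self.transitive_deps(d, seen)
        return seen

repo = Repository()
repo.store("parser.c", "v1 source text ...")
repo.store("parser.c", "v2 source text ...")
repo.add_dependency("main.c", "parser.c")
repo.add_dependency("parser.c", "lexer.c")
```

A query such as `repo.transitive_deps("main.c")` then answers a typical configuration-management question: which modules must be rebuilt when one changes.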

Bernstein 98 gives an excellent survey of the capabilities a DBMS must possess to support an SEE satisfactorily. He also gives a host of literature that identifies issues that render existing DBMSs inadequate for SEE support.

An SEE is inherently multilingual. The DBMS must accommodate program products and tools written in different languages. It must also cope with heterogeneous hardware. Target machines for SEE products may themselves be different from the machines on which they were developed (as is the norm). It must be able to store large variable-length objects, such as documents and many lines of source code. Traditional database applications deal with a large number of elements but a small number of data types; software engineering DBMSs deal with a relatively smaller number of instances for a much larger number of types. All these facts mirror those discussed under CAD/VLSI. It is tempting, therefore, to believe that an excellent CAD/VLSI DBMS may be equally good for SEEs.

Brown 97 illustrates how a database schema intended for CAD/VLSI applications can be customized for SEEs with minimal effort. For example, he makes a direct comparison between a VLSI circuit, represented as a design with an associated physical circuit layout, and a software system design with its implementation in terms of code in some programming language. In the same way that the circuit design is represented as a decomposition of smaller design cells (e.g., programmable logic arrays, counters, gates, and half-adders), each with a corresponding physical layout cell, so a software design can be modelled as a decomposition of smaller design blocks (subroutines, functions, etc., targeted at specific logical functionalities), each with its own implementation in source code. This relationship is even more apparent if it is recognised that a 'program' itself is executed by actual 'physical discrete components'. The concept of stored-program control (SPC) to replace or augment hardware design in telecommunications is not anything new 99. After all, the program 'XOR' and the 'XOR' gate (for example) are logically the same thing! Besides, the questions of documentation, simulation, management, archiving, etc., are basically common requirements for both design technologies. The author therefore agrees with Brown's view that databases for CAD/CAM and for SEEs share a lot in common.
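Brown's analogy can be sketched as a single decomposition schema serving both domains. The class and attribute names below are the author's invention, not Brown's schema; the point is only that one structure accommodates both a layout cell and a block of source code as the 'implementation':

```python
class Design:
    """A design decomposes into sub-designs; each may carry an implementation."""
    def __init__(self, name, implementation=None):
        self.name = name
        self.implementation = implementation   # a layout cell OR source code
        self.children = []

    def decompose(self, child):
        self.children.append(child)
        return child

# VLSI use of the schema: design cells with physical layout cells
adder = Design("half-adder")
adder.decompose(Design("XOR gate", implementation="xor layout cell"))
adder.decompose(Design("AND gate", implementation="and layout cell"))

# Software use of the same schema: design blocks with source code
system = Design("payroll system")
system.decompose(Design("tax module", implementation="tax.c source"))
```

The same `Design` type serves both uses unchanged, which is what makes the shared-database argument plausible.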

Office systems and decision-support systems

Much office work is programmable work (cf. emergent 'creative' work). Like CAD, office information systems (OISs) place a higher demand on databases while centralizing information in a given office environment. Characteristics of OIS applications are:

• Semantic richness: support is required for unstructured messages, letters, annotations, oral communication, etc. There are stereotypical information groupings, such as business letters or quarterly progress reports, which may maintain standard formats.
• Time factor is crucial.
• Lack of structure: instructions are often incomplete and irregular; constant communication and dialogue is necessary.
• High interconnectivity: functions can be complex.
• Office constraints and evolution: constant evolution in which personnel, management, and responsibilities change. Functionality may be influenced by both internal and external forces.
• Interactive interface: to cater for all levels of staff.
• Filtering of information: for consumption by the often pyramidal organizational structure.
• Priorities, scheduling, reminders: a variety of interrupts is generated and must be constantly handled.

What exist today are islands of mechanization in offices, rather than total automation. Major activities in offices are document processing, storage, and retrieval. (These can be treated as operations in an OO model.)

OISs should also support decision making, e.g., in the analysis of data, controlling the state of a system itself, analytical decision-making tools, organizational design and operating-system design changes, etc. Decision-support systems are today being designed with a database as their central component.

Vol 34 No 5 May 1992 305


Statistical and scientific databases

Statistical and scientific databases (SSDBs) are broadly of two types:

• micro, containing records of individual entities, such as raw census data
• macro, which contain a summary of the former, derived by performing statistical operations such as cross-tabulations or regression analysis
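The micro-to-macro derivation can be illustrated with a few census-style records. This is the author's own sketch of the idea, not an SSDB facility; the field names are invented:

```python
from collections import Counter

# micro SSDB: raw records of individual entities (cf. raw census data)
micro = [
    {"region": "north", "employed": True},
    {"region": "north", "employed": False},
    {"region": "south", "employed": True},
    {"region": "north", "employed": True},
]

# macro SSDB record: a cross-tabulation of region against employment status,
# i.e., a summary derived by a statistical operation over the micro records
macro = Counter((r["region"], r["employed"]) for r in micro)
```

Each entry of `macro` is one cell of the cross-tabulation, e.g., the count of employed persons in the north, which is exactly the kind of derived summary a conventional DBMS handles poorly.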

Conventional databases cannot conveniently handle the latter type. Data-compression techniques have been proposed for sparse matrix types of data. Inferencing and security control techniques have also been proposed. SSDBs have also been proposed for statistical sampling and testing in manufacturing, together with real-time control of test sequences for quality control. However, serious problems still exist. Elmasri and Navathe 24 summarize the major problems as:

• Data definition: micro SSDBs tend to have hundreds of attributes per record. Schema changes must be supported without reloading a database. A variety of data types, e.g., time series, vectors, matrices, are needed. A construct supporting cross-products of attributes is also necessary for macro SSDBs.
• Data manipulation: a greater variety of statistical operations is desirable.
• User interface: meta-data-browsing support is essential for users to determine what database portion to look at.
• Physical modelling: compression of usually sparse microdata is required.
• Data dictionary: extensive and exhaustive dictionaries are necessary.
• System facilities: elaborate logging of searches will facilitate backtracking.

As in the previous examples, existing DBMSs are not sufficient to handle SS data meaningfully. Apart from the inherently 'miniature' nature of SS data, the problems of all these application areas are basically the same. A breakthrough in database support for one area will surely be good news for the others.

Real-time systems

Recently, real-time systems (RTSs) have also attracted the attention of the database community as a potential DBMS application area. Perhaps this has been a result of RTS developers realising the central role that a tailored DBMS is likely to play in the development, evaluation, and maintenance of such systems. In particular, OO and KBR concepts will have several implications for RTS data. RTSs are mostly found in life-critical environments; therefore they require nothing short of a powerful and dependable DBMS. The marriage between DBMSs and RTSs is recent, and there is therefore not yet much literature in this multidisciplinary problem area.

Though not primarily intended for RTSs, the Team environment 100 would be useful for handling one of the key issues of RTSs, namely, testing and evaluation. Team has been designed to support integration of, and experimentation with, an ever-growing number of software testing and analysis tools. Team results have demonstrated how modularity, genericity, and language independence all foster extensibility and interoperability.

Database systems are increasingly being used in life-critical applications. The question of software reliability is becoming more crucial as well. Software testing and analysis that can guarantee high reliability is essential. That the earlier an error is detected and corrected the better is not anything new. The implication is that testing tools are required that span the entire software life-cycle. Enormous amounts of data (versions and archives) must therefore be effectively stored and managed. This requirement makes the OO approach a front-runner.

Conclusions

Other areas where DBMSs will find application in the near future are geographical and land information systems, medical and clinical information systems, AI and natural-language processing, human-intelligence mimicry and human-computer interaction (HCI), expert and legal information systems, and astrophysical and environmental studies. Indeed, DBMS technology, with advances thereof, is bound to impact on most, if not all, aspects of life. Unfortunately, the author does not have to hand any significant strides in these areas to highlight in this paper.

However, it is important to note that the requirements and challenges posed by all these emerging applications are similar, in the main.

APPENDIX 2: COUPLING LOGIC PROGRAMMING WITH RELATIONAL DATABASE - EXAMPLE

This Appendix reviews CPD (Coupled-Prolog/db++), a prototype database system under development at the University of York, UK. Some of the reasons mentioned earlier inspired CPD's conception. Additionally, its developers were motivated by the dire need to produce a database system tailored to the requirements of (yet not limited to) CAD/CAM, VLSI, office automation, military applications, and expert systems. In CPD, large parts of Prolog programs are stored in a relational database, db++ 101, 102. Unfortunately, not all Prolog rules can be stored in relational form, as procedures with variables have no direct equivalent relational form.

Figure 11 shows the CPD structure. A Prolog predicate is passed to the predicate-transforming program (TPP), and the resulting output is a standard db++ command. The system sends this command to db++ through the calldb command. Once db++ has retrieved the queried data from the data file, tuples are relayed to the tuples-formatting program (FTP), which converts the tuples into standard Prolog assertions (facts). Thus FTP (not necessarily written in C) transforms a db++ output file into a Prolog input file. TPP is written in C. C was chosen as the programming language for convenience and consistency: both db++ and C-Prolog are written in C.
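The data flow just described can be sketched end to end. The sketch below is the author's illustration of the pipeline only: the table contents, the query representation, and the function names `tpp`, `calldb`, and `ftp` are hypothetical stand-ins for the real TPP, calldb interface, and FTP of CPD, and no attempt is made to reproduce actual db++ command syntax:

```python
# A toy relational store standing in for the db++ data file
TABLE = {"client": [("jones", "york"), ("smith", "leeds")]}

def tpp(predicate):
    """Predicate-transforming program: map a predicate to a query command."""
    name = predicate.split("(")[0]
    return ("select", name)

def calldb(command):
    """Stand-in for the calldb interface: run the command, return tuples."""
    _op, relation = command
    return TABLE.get(relation, [])

def ftp(relation, tuples):
    """Tuples-formatting program: turn tuples into Prolog assertions (facts)."""
    return ["%s(%s)." % (relation, ", ".join(t)) for t in tuples]

# One pass through the pipeline: predicate -> command -> tuples -> facts
facts = ftp("client", calldb(tpp("client(Name, City)")))
```

Each stage corresponds to one arrow in Figure 11: TPP produces the command, calldb retrieves tuples, and FTP emits facts ready for reconsultation by Prolog.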

APPENDIX 3: SELECTED TERMINOLOGIES OF THE OO PARADIGM

This Appendix explains some of the terminologies normally used in OO data modelling.

A database is considered to comprise a collection of objects. An object can be a time-evolving or invariant entity. Consider, for example, the following three object types in a simple entrepreneur's world: CLIENT, STOCK, and ORDER. CLIENT is clearly a persistent kind of object, while the other two are transient. It is usual to implement persistent objects as records in the database. On the other hand, transient objects tend to be implemented as transactions.

An object can be a physical entity, concept, idea, or even event. In classical record-oriented data models, data are viewed as a collection of record types (or relations), each having a collection of records (or tuples) stored in a file. OO modelling aims at maintaining a direct correspondence between the real world and its database objects. In addition to representing semantic structures of the real-world data, OO models try to capture behavioural aspects as well. The notion of OODBs is similar to that of OOPLs, except that OODBs require the existence of persistent objects stored permanently in secondary storage.

[Figure 11. Architecture of CPD: expert systems 1 to N (Prolog programs) communicate, through the predicate-transforming program and the tuples-formatting program, with DB++ programs and a shared data file under the Unix operating system.]

Data abstraction and encapsulation is the ability to define a set of operations (methods) that can be applied to the objects of a particular class (object type). All access to an object is constrained to be via some predefined methods. An object has an interface part, which is public (to other objects/users), and an implementation part, which is private and may be changed without affecting the interface. All access to an object is via its interface. Object classes can be organized into type hierarchies.

Object identity is independent of attributes and is generated by the system. Thus any object attribute can be updated without destroying its identity.

Inheritance means a subclass inherits both attributes and methods from its supertype. Partial inheritance is possible.

Complex objects: composite objects can be formed from previously defined objects through association or aggregation.

Message passing: objects communicate and perform all operations, including retrieval, computations, and updates, via messages. A message consists of an object (or several objects) followed by a method to be applied to these objects.

Operator overloading: a common operator name may be used to denote different operations when applied to different object types. The meaning of the operation is then 'overloaded' and can be resolved only when the said object is provided.
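The terminology above can be drawn together in one small sketch, reusing the entrepreneur's CLIENT example. Everything here is the author's illustration; the class and method names are invented:

```python
class DBObject:
    """Base of the type hierarchy; identity is system-generated."""
    _next_id = 0
    def __init__(self):
        DBObject._next_id += 1
        self.oid = DBObject._next_id   # identity independent of attributes

class Client(DBObject):
    def __init__(self, name):
        super().__init__()
        self._name = name              # private implementation part

    def rename(self, name):            # public interface: a predefined method
        self._name = name              # attributes change; identity does not

    def describe(self):                # operation name shared with subclass
        return "client " + self._name

class CorporateClient(Client):         # inheritance: attributes and methods
    def describe(self):                # overloading: same name, different
        return "corporate client " + self._name   # operation, resolved by type

c = Client("Jones")
old_id = c.oid
c.rename("Jones & Son")                # message: object followed by a method
```

Sending `describe` to a `Client` and to a `CorporateClient` yields different operations under the same name, which is the overloading resolution described above.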

Tanaka and Yoshikawa 103 have more recently introduced two additional concepts:

• Reduction: an operation to remove 'redundant' elements from a set-type element.
• Unification: an operation to unify two or more tuple-type objects into a more generic tuple type by generalizing two set-type objects.

These two concepts, however, do not seem to have received much attention overall. The author, still unconvinced of their key significance, introduces them here simply for completeness.
