A Stack-Based Approach to Query Languages


PRACE IPI PAN • ICS PAS REPORTS

Kazimierz Subieta, Catriel Beeri, Florian Matthes, Joachim W. Schmidt

A Stack-Based Approach to Query Languages

738

INSTYTUT PODSTAW INFORMATYKI POLSKIEJ AKADEMII NAUK
INSTITUTE OF COMPUTER SCIENCE, POLISH ACADEMY OF SCIENCES
Warszawa, December 1993

Report submitted by Piotr Dembiński

Authors' addresses:

Kazimierz Subieta
Institute of Computer Science
Polish Academy of Sciences
Ordona 21
01-237 Warszawa, Poland
e-mail: [email protected]

Catriel Beeri
Hebrew University
Computer Science Department
Givat Ram, Jerusalem 91904, Israel

Florian Matthes, Joachim W. Schmidt
University of Hamburg
Department of Computer Science
Vogt-Kölln-Straße 30
D-2000 Hamburg 54, Germany

CR: D.3.2, H.2.3

Printed as a manuscript (Na prawach rękopisu)

ISSN: 0138-0648

Abstract

We follow a new paradigm of programming languages in which imperative programming constructs and programming abstractions are built around a declarative query language. Seamless integration of queries with programming constructs implies a new approach to query languages, in which we employ the classical notions of naming, scoping and binding. We define a simple abstract storage model, onto which the modelling primitives of data models, including relational and object-oriented models, can be mapped. Our main concern is iterations encapsulated in the form q1 θ q2, where q1 and q2 are arbitrary queries, and θ is an operator having some tradition in the QL domain: θ can be selection, projection, navigation, join, quantifier, sorting, transitive closure, etc. Such operators are formally defined by an abstract machine, which iteratively evaluates q2 in new environments determined by the tuples returned by q1. The machine is based on two stacks: the result stack, storing partial results of evaluation, and the environment stack, determining scoping and binding. The approach allows us to consider, in a consistent semantic frame, query constructs which resemble well-known approaches (tuple and domain relational calculi, SQL, and object-oriented query languages). It directly corresponds to a real implementation and is already implemented in the system LOQIS. The approach supports complex objects, object identities, null values and variants. Query operators can be seamlessly integrated with imperative constructs such as updating, for each, etc. We discuss procedures and functional procedures (views) based on queries, and object-oriented concepts. Finally, a new optimization method based on the proposed approach is presented.

Podejście do języków zapytań oparte o stos (A Stack-Based Approach to Query Languages)

Streszczenie (Summary)

The paper concerns a new paradigm of programming languages in which imperative constructs and programming abstractions are built around a declarative query language. Seamless integration of queries with programming constructs implies a new approach to query languages, in which we apply the classical solutions known as naming, scoping and binding. We define a simple abstract model of object storage, which makes it possible to map the elementary notions of data models, in particular the relational model and object-oriented models. We deal mainly with iterations hidden in the form q1 θ q2, where q1 and q2 are arbitrary queries and θ is an operator having some tradition in query languages: θ may be a selection, projection, navigation, quantifier, sorting operator, transitive closure operator, etc. Such operators are formally defined through an abstract machine which iteratively evaluates q2 in new environments determined by the tuples returned as the result of q1. The machine operates on two stacks: the result stack, which stores partial results of computations, and the environment stack, which determines scoping and binding. The approach allows us to consider, within a semantically consistent formal framework, query language constructs resembling well-known approaches (tuple and domain relational calculus, SQL, and object-oriented query languages). It corresponds directly to a real implementation and has already been implemented in the LOQIS system. The approach covers complex objects, object identifiers, null values and variants. Query language operators can be seamlessly integrated with imperative constructs such as updates, for each, etc. In the paper we discuss procedures and functional procedures (views) based on queries, and notions related to the object-oriented approach. Finally, we present a new query optimization method based on the proposed approach.

1 Introduction

Recently the domain of query languages (QLs) has been moving towards analogies and integration with programming languages and environments, since advanced database applications require sophisticated programming rather than simple querying. Attempts to combine querying with programming led to the impedance mismatch, which has undermined the importance of QLs in database programming. However, as argued in [Beer89, SRH90, SRLG+90], QLs are a successful achievement of the database domain. For big and sophisticated applications QLs are imperative for both technical and ergonomic reasons. Declarative queries are much easier to optimize than sequences of procedural statements, and make parallel computation possible [BBKV87]. At a certain scale of complexity, automatic optimization almost always results in better performance than manual optimization done in a procedural language. Moreover, QLs based on data independence, conceptual data views and macroscopic operations allow for increasing the programmer's productivity and program reliability, readability and modifiability by an order of magnitude.

In database systems QLs have two main roles. The first is ad hoc interactive querying and updating of a database by less experienced users (see SQL, QUEL, Query-By-Example, QBF [Ingr89] and many others, e.g. [CMW87, PPT91, ZhMe83]). In the second role, queries are used as high-level programming constructs, with various applications: fetching, updating, inserting and deleting database data, determining integrity constraints, determining views, snapshots, database procedures, scripts in 4GLs, rules, active capabilities, subschema definitions, access restrictions, etc.

This paper is mainly devoted to the second role of QLs. The popular classification distinguishes between the embedded and the integrated approach. SQL can be considered an important example of embedded QLs. Embedding, however, inherently suffers from aesthetic, technical and ergonomic drawbacks.
The majority of database programming languages (DBPLs) follow the integrated approach, for example, Pascal/R [Schm77], Galileo [ACO85], DBPL [ScMa92, MRSS92a], Napier88 [MBCD89], Machiavelli [OBB89], Taxis [MBW80], Adaplex [SFL81], LOQIS [Subi91], O2C [O2Ma92], etc. Also, commercial products such as Ingres Windows 4GL [Ingr90] and Oracle PL/SQL [Orac91] integrate SQL with imperative statements.

There are two philosophies of integration of QLs with PLs. The first one assumes a procedural PL with added-on QL constructs. The majority of current DBPLs follow this philosophy. The second one is just the reverse: a QL is the basis, and procedural constructs and programming abstractions are add-ons. This is implemented in Ingres Windows 4GL, in Oracle PL/SQL, in POSTGRES [SRH90], and in some AI developments [Mant91]. Advantages of the first philosophy concern full computational and pragmatic universality, clean semantics due to the long tradition of PL development, dealing with well-established programming abstractions (procedures, functions, ADTs, modules, classes, etc.), and the systematic treatment of strong and static type checking. This is in contrast with the second line, whose main advantages are user friendliness, macroscopic programming, declarativeness, and data independence. As a matter of fact, although this line resulted in less "smooth" PLs, it has enjoyed great commercial success.

In this paper we would like to combine both philosophies. We build a foundation of a QL-centered programming language according to the traditional paradigms of the PL domain; the idea is called "seamless" integration of a QL with a PL [Daya89]. The essence of our method is a modification of the classical PL methods and mechanisms, aimed at query processing.

Before we start to present our idea we must explain a subtle point concerning relationships between data models and data structures. Data models, especially conceptual data models, address human understanding of data semantics on some abstract level. In contrast, QLs and PLs are strongly related to data structures; for example, some query maps data objects into elements of some other set, e.g. into {TRUE, FALSE}. The formal definition of QLs and PLs is impossible without formalization of the data structures which are to be queried or manipulated. Hence in our approach we distinguish a data model and an abstract storage model. The latter is to a large extent orthogonal to data models. The same storage model can be used to map relational, nested relational, functional, entity-relationship, object-oriented, etc. data models. We start directly from the definition of the storage model, assuming that there is a mapping from a particular data model into the storage model; as we will see, it is not difficult to develop informal principles of such a mapping. Then, our definitions of QL operators address the storage model only; again, we assume that there is (or could be) an informal mapping of these operators into some QL concepts of a particular data model. This approach is illustrated in Fig. 1.

Figure 1: Relationships between data models and the abstract storage model (diagram labels: Formal part; Informal part of our approach; Data model; Abstract storage model; Query in the data model; Query addressing the abstract storage model; Query result; Interpretation of the result in the data model; Machine program)

Both QLs and PLs are based on the relationship between names occurring in the language's statements and data structures. The relationship is called binding. An essential property of the binding is locality of names. The scope of a name in PLs is restricted to some well-defined context; for example, the scope of a variable local to a procedure. Scoping is also relevant to queries. For example, in embedded SQL queries may contain names of (volatile) host variables. In a QL integrated with a PL, persistent and volatile variables have equal rights, thus in general they are subject to scoping rules. Moreover, it is intuitively clear that e.g. names of attributes have a somewhat different scope than names of relations. Thus in our approach we put PLs and QLs into a common semantic frame referred to as naming-scoping-binding.

The run-time mechanism of classical programming languages (at least, the Pascal family) is based on two stacks: the result stack, storing partial results of arithmetic and other expressions (footnote 1), and the environment stack, being the main memory allocation mechanism and used for binding of names occurring in a program. The stack-based approach to PLs is motivated by such requirements as orthogonality and unlimited nesting of the language's constructs, locality of programming objects (variables, procedure parameters), and recursion concerning procedures and functions.

The stack-based mechanism is also relevant to QLs. Even more, we believe that any serious development of a QL integrated with imperative constructs and programming abstractions must lead to the naming, scoping, binding and stacking issues. The definition of QLs in the spirit of PLs requires, however, changes to the mechanism. Stacks assumed in classical PLs are not prepared for associative processing of bulk data structures, nor for uniform treatment of persistent and volatile (complex) programming objects. In the paper we present an extended construction of the stacks (on the conceptual level) and definitions of the semantics of QL operators through operations on these stacks.
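
The classical two-stack mechanism described above can be illustrated with a small sketch. The following Python fragment is a toy machine, not the extended construction presented later in the paper: the class TwoStackMachine, its method names and the expression encoding (numbers, name strings, and (operator, left, right) tuples) are assumptions made purely for illustration. It shows partial results passing through a result stack, and names being bound by searching an environment stack from its top.

```python
import operator
from typing import Any

# Hypothetical operator table for the toy expression language.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


class TwoStackMachine:
    """Toy machine with a result stack and an environment stack (illustrative only)."""

    def __init__(self) -> None:
        self.result_stack: list[Any] = []          # partial results of expressions
        self.env_stack: list[dict[str, Any]] = []  # one section per scope

    def bind(self, name: str) -> Any:
        # Search the sections from the top (most local scope) downwards.
        for section in reversed(self.env_stack):
            if name in section:
                return section[name]
        raise NameError(name)

    def evaluate(self, expr: Any) -> None:
        """Evaluate expr, leaving exactly one new value on the result stack."""
        if isinstance(expr, (int, float)):
            self.result_stack.append(expr)             # literal
        elif isinstance(expr, str):
            self.result_stack.append(self.bind(expr))  # name: bind it
        else:
            op, left, right = expr
            self.evaluate(left)
            self.evaluate(right)
            r = self.result_stack.pop()
            l = self.result_stack.pop()
            self.result_stack.append(OPS[op](l, r))

    def call(self, local_env: dict[str, Any], body: Any) -> Any:
        """Evaluate body in a new local scope, as for a procedure call."""
        self.env_stack.append(local_env)
        try:
            self.evaluate(body)
        finally:
            self.env_stack.pop()                       # scope ends with the call
        return self.result_stack.pop()


machine = TwoStackMachine()
machine.env_stack.append({"x": 10, "y": 2})            # global section
outer = machine.call({}, ("+", "x", "y"))              # binds the global x
inner = machine.call({"x": 1}, ("+", ("*", "x", 5), "y"))  # local x hides global x
```

In the second call the name x is bound in the local section while y is still found in the global one, which is the locality of names discussed above.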
Then, we show how the mechanism can be used to integrate query constructs with imperative statements and programming abstractions.

The main motivation for the new approach to QLs is pragmatic universality (footnote 2). Pragmatic universality is frequently in opposition to conceptual simplicity and user-friendliness; achieving a good trade-off is perhaps the major problem of QLs. On the one hand, QLs are to be used by humans, and thus should introduce features which are easy to understand, use and combine in a variety of programming situations. On the other hand, these features should form a minimal and consistent set, with formal (machine) semantics.

Footnote 1: The result stack is usually "hidden" in recursive definitions of the language's constructs. In this paper, however, our goal is to present the mechanism before any formalization of it.

Footnote 2: Note that the frequently used term computational universality (in the Turing sense) is not very relevant to QLs, since the goal of QLs is not expressing a class of mathematical functions (as the Turing machine does), but serving data structures for all required purposes.

There are a number of factors which contribute to the pragmatic universality of a PL based on a QL. Among them are the admissible data structures, introduced for various needs in programming and in conceptual modelling. Besides the classical relation, other (bulk) type constructors are considered: sets, bags, sequences, arrays, variants, etc. Modern DBPLs assume the possibility of orthogonal combination of all type constructors; this, in particular, leads to the NF2 concept and, in general, to arbitrarily complex data objects. They also assume orthogonality of types and persistence; in particular, bulk data may be stored as volatile variables and individual variables may be stored in the database. This implies a uniform treatment of all such structures by a QL. Pragmatic universality means a large collection of retrieval operators: selections, projections, joins, navigation down hierarchical objects, navigation via references, quantifiers, resolving name conflicts, comparisons, arithmetic and string operators/functions, aggregate functions, grouping, ordering, transitive closures, etc. Then, pragmatic universality requires a variety of imperative statements: creating new data objects, updating (assignment), inserting, deleting, if statements, loops, for each, etc. Current programming technologies also require a collection of programming abstractions: procedures, functional procedures, views, snapshots, classes, modules, etc. Usually programmers expect that procedures/functions may have parameters, can be recursive, and make it possible to update data objects via their parameters and via side effects. QLs as PLs introduce a new quality: since queries can be considered as generalized PL expressions, it is natural to expect that queries will be used as actual parameters of procedures and will determine the output from functional procedures (views); the latter property makes it possible to use calls of functional procedures within queries.

The above factors, which determine the level of pragmatic universality, can be confronted with the current QL theories, such as the relational algebra, relational calculus, predicate logic, and others. They had a big influence on the development of QLs; however, the level of abstraction assumed in them causes difficulties with the formal treatment of many of the features mentioned above. Recently the database domain has been influenced by novel ideas (complex objects, object identity, classes and inheritance, roles, deduction, etc.) which cause growth of the complexity of theories, especially if they extend the traditional relational or logic concepts. (An excellent survey of this line can be found in [Cruz89].)
Thus theories frequently abstract from vital properties, such as updating. Data models and QLs, however, are getting more and more sophisticated, and their precise semantics can no longer be explained by intuitive extrapolation of the semantics of basic retrieval capabilities.

Thus we would like to reconstruct the QL concept from the uniform PL perspective, which covers the majority of phenomena that can be found in practical QLs and in theoretical issues. Although our orientation is pragmatic (a consequence of implementation experience), and the presentation is semi-formal, it is not difficult to see how to make it fully formal. In this paper we define QL constructs via operational semantics based on an abstract machine. Declarative semantics can be obtained by building a denotational model; preliminaries of it are presented in [Subi85, SuMi86].

The approach presented in this paper is most of all relevant to object-oriented databases, but at the beginning we avoid direct associations with this idea. Our concept of QLs works for a slightly more general model than the object-oriented one, and we will show that a variety of object-oriented QLs can be obtained by specialization and modification of the presented definitions. Our considerations are to a large extent orthogonal to data models: some topics considered in this paper are relevant to the relational model, extended relational models, nested relational models, entity-relationship models, functional models, and data models assumed in DBPLs.

We made a big effort to simplify the presentation, but, unfortunately, the approach has little in common with known approaches to QLs. Some notions and examples may require a lot of imagination and concentration from the reader. The paper, however, is "self-contained": we reduced references to the PL literature as much as possible. The reader who is patient enough to follow through (we hope) not very sophisticated concepts can enjoy the generality of our approach and, simultaneously, its correspondence to a real implementation. It enables us to consider, in one simple and consistent framework, queries in the SQL style, in the tuple calculus (QUEL, DBPL) style, in the domain-calculus style, and in object-oriented styles. It deals with selections, projections, navigations, quantifiers, and very general joins, and introduces a general variant of transitive closure; it supports complex objects, references to objects, object identities, null values, variants, updating, for each statements, (database) procedures, (updatable) views, etc. We will also show that the approach leads to a powerful query optimization method.

The rest of the paper is organized as follows. In Section 2 we discuss the abstract storage model and present preliminary formal definitions. In Section 3 we discuss the abstract machine model, introducing details of the operational semantics used for the specification of QL constructs. In Section 4 we present and discuss various constructs of QLs (operators and comparisons, selection, projection, navigation, path expressions, join, quantifiers, calculus-like variables, transitive closure, ordering, null values and variants). In Section 5 we discuss programming constructs: assignments, for each statements, procedures and views, and discuss issues concerning object-orientation. Section 6 presents a new method of query optimization. The paper ends with a conclusion.

2 An Abstract Storage Model

The range of type constructors and data structures which have to be served by query languages is very wide. It includes individual variables, records (tuples), relations, sets, bags, sequences, arrays, complex attributes, repeating attributes, references (pointers), variants with and without explicit discrimination, null values, recursive data types, etc.
In database programming languages, for example in DBPL [ScMa92], queries can combine access to persistent and volatile data structures. Object-orientation introduced more aspects: behavioral aspects, classes, object identities, structural and behavioral inheritance, and encapsulation.

The large set of conceivable features of data structures causes growth of the potential options which are necessary to serve them. This spectrum of design choices causes problems for theoretical approaches, which need some pure and homogeneous, but general data concept. In our formal model of data structures we tried to achieve some level of abstraction, minimality, and completeness. Before introducing the details of the model, let us note some features that we want to exclude, and some that we want to include. The model should not include a description of storage hierarchies and buffer management, or details of physical organization and indices. We also want the model to be general, so that query languages for various conceptual models, from value-based to object-based, can be described in it. This calls for abstracting from the details of conceptual models. On the other hand, although we are not interested in details of physical organization, we do want to reflect what the user sees, in particular the fact that in a conceptual model an entity can include pointers to other entities: a basic feature of object-oriented models that allows for implicit joins by using dot notation.

The formal definition of the binding process requires establishing a relationship between names occurring in a program (as seen by the programmer) and the stored data the program deals with. Classical semantics of imperative PLs assumes two name spaces. The first space contains symbolic names invented by the programmer. Usually they have an informal descriptive role supporting the conceptual view of the program; for example, DEPARTMENT, EMPLOYEE, SALARY, ACCOUNT, PART. The second space contains names which are internally used for identification of data. In the simplest case of assemblers they are physical addresses, in denotational semantics they are "locations", in relational databases they are tids, and in object-oriented databases they are object identities. Complex objects (for instance, records) associate external names with their components, which leads to nesting of the binding relationship. For bulk types it is necessary to associate an internal identifier with each element of a bulk data structure, rather than one identifier with the whole, as for other structures, since it may be necessary to identify internally particular elements of a bulk structure.

There are several candidate formal models of data structures reflecting both name spaces and covering the concepts of bulk data and complex objects. We propose one of them. Intuitively, a stored database, at a given point of time, consists of a collection of stored entities, which we call storage objects; these should not be confused with objects of a conceptual model. There are three components that we care about in a storage object:
• the value stored there, i.e., its content;
• its location or internal identifier;
• the external name invented by the programmer or by the database designer.

Since we want to abstract away from physical organization, we represent internal identifiers abstractly and are not interested in their pragmatic nature; they can be physical addresses, symbolic addresses, object identifiers, primary key values, etc.

The content of a storage object can be one of three kinds:
• an atomic value, for instance, 5, "I am a string", etc.
• a complex value.
We will represent a complex value as a set of storage objects.
• a reference to another storage object, i.e. its internal identifier.

There are a number of aspects that we would like to abstract from in our definitions. In particular, we neglect types. Definitions of query operators have very little to do with types; indeed, the majority of current QLs are untyped. On the other hand, typed languages are a central research area in programming languages, and a successful facility for automatic program checking, especially important for large programs. Usually queries are not so large, but Cardelli warns [Card89]: "a surprisingly common mistake consists in designing languages under the assumption that only small programs will be written". We do not take a stand in this paper about whether query languages should be typed, and whether this is a prerequisite for a successful integration with programming languages. However, we realize that programs written in integrated QLs can be large, thus lack of strong typing may be a reason for low reliability and low programmer productivity. In the paper we will show another argument in favour of typing. In some cases the untyped framework leads to semantic ambiguities connected with scoping rules; thus a kind of typing information cannot be avoided in the proposed QLs. We would like to consider types after recognizing the properties of QLs. This makes it possible to discuss which of them can or cannot be discarded because of the typing system, and which typing systems are relevant for QLs (footnote 3).

We also abstract from persistence. Assuming orthogonality of data structures and persistence, QLs are independent of this feature. In particular, the variables defined in a program or a procedure are also storage objects, albeit of a more temporary nature compared to those in the database. The notion of storage object thus covers both volatile and persistent variables. For the subject of this paper, the term "storage object" seems to be more appropriate than the traditional "variable".

In attempts to eliminate secondary features, we unify records, tuples, arrays, and all bulk structures; indeed, all of them are collections of elements. This has led us to the following simple (but sufficiently rich) definition.

Formally, let I be the set of internal identifiers, N be the set of external data names, and V be the set of atomic values. We do not assume any specific nature of V; in particular it may contain numerals, strings, texts, graphics, compiled procedures, and so on. Atomicity means that we do not assume the existence of operations that refer to parts of an atomic value. We assume I ∩ V = ∅. N ∩ V need not be ∅. It is possible, even common, that programs compute some values, then use them as names, e.g. integers that are used as array indices.

A storage object is a triple <i, n, v>, where i ∈ I and n ∈ N are its identifier and name, respectively, and v is its value. We often say that i identifies this object. The value can be one of the following:
• An atomic value from V;
• An identifier j from I.
This identifier, as the value of the storage object, serves as a (logical) pointer to another storage object.
• A set of storage objects.

We refer to these three types of objects as value-objects, pointer-objects, and set-objects, respectively. The first two kinds are also called atomic, and those of the third kind are called complex. Below is an example of a complex storage object.

<i5, EMP, {<i6, NAME, Smith>, <i7, SAL, 2000>, <i8, WORKS_IN, i17>}>

A database instance is a set of storage objects. We assume that it satisfies some obvious constraints. An element i ∈ I is used in it at most once as an identifier: there is a one-to-one correspondence between storage objects and identifiers. If an identifier is used as a pointer, then it also identifies some object (referential integrity).

The above definition is not unique. Note also that we could adopt an approach where instead of a set of storage objects we would allow only a set of identifiers in the last case. Further generalization of this idea leads to the model presented in [Subi85, SuMi86, SuRz87], where a database instance is a relation being a subset of I × N × (V ∪ I). Our approach allows for a somewhat more direct modelling of database objects as bounded units, and makes the definition of the semantics of updating operations, e.g. delete, easier. We do not consider potential extensions of the storage object concept, for example, objects without a name, and objects with more than one name (as we will see in the following, for some purposes the latter extension is reasonable).

Footnote 3: We already have some ideas concerning typing of QLs; they will be the subject of a subsequent paper.

We emphasize that the identifiers in I are internal identifiers. They are not used in queries or programs and are not printable. Consequently, a one-to-one mapping of all identifiers to another collection of identifiers (a permutation of the identifiers) cannot be recognized from outside. A database instance is a representative of the class of databases that can be obtained from it by one-to-one identifier mappings. In contrast, the names are external in the sense that they are used in queries; they are a part of the user's model. There is no requirement of uniqueness on names; as seen clearly in our examples, this freedom allows for representing bulk data.

From now on, "object" always means storage object, unless specifically noted otherwise. Below we present an example database instance. Note that each storage object has a unique identifier and there are no dangling pointers.

Example

MyDatabase:
<i1, EMP, {<i2, NAME, Brown>, <i3, SAL, 2500>, <i4, WORKS_IN, i13>}>,
<i5, EMP, {<i6, NAME, Smith>, <i7, SAL, 2000>, <i8, WORKS_IN, i17>}>,
<i9, EMP, {<i10, NAME, Jones>, <i11, SAL, 1500>, <i12, WORKS_IN, i17>}>,
<i13, DEPT, {<i14, DNAME, Toy>, <i15, LOC, Paris>, <i16, LOC, London>}>,
<i17, DEPT, {<i18, DNAME, Sales>, <i19, LOC, Berlin>}>

In Figure 2 we illustrate this example graphically.

Having defined the model, let us now consider how it can be used to represent databases that are given in one of the common conceptual models. Let us start with the relational
model. In this model, when we want to deal with employees, we may have a relation called EMP. This name is thus associated with the relation, i.e. the set of tuples. However, we are interested in names in the context of binding. Consider a typical query like "select SAL from EMP". During execution of this query, the name EMP gets bound to each tuple of the relation. Therefore, in our model, the name EMP is associated not with the relation, but rather with its tuples. In the example above, if in the object WORKS_IN we store department names (instead of pointers) and exclude the possibility of repeating LOC attributes, we obtain a relational database.

Figure 2: Nesting of storage objects (diagram of MyDatabase showing the EMP objects i1, i5, i9 and the DEPT objects i13, i17 with their components)

Now, consider the representation of tuples. In the example, we have five tuples: three employees and two departments. We first note that each tuple is a storage object and has an identifier. This identifier has nothing to do with the conceptual model; it is essentially an abstract representation of the internal identity of the tuple. (Many relational systems indeed use tuple-ids internally.) Also note that the tuple's value is a set of storage objects. Each of these represents one attribute value within the tuple. Note that the attribute is simply a name that, in the context of this tuple, is bound to a certain stored value. The same attribute also appears in the other tuples of the relation. Finally note that we do not have the tuple as a data structure in our model: a tuple is represented by the set of its components. Indeed, for our task, a tuple is simply a local environment, i.e. a set of names and the stored values that can be bound to them. The concept of a tuple as a data structure, and the associated constraint that all tuples (in a given relation) have the same attributes, are irrelevant in our context.

To generalize to the nested relational, or complex object models, we need to be able to represent sets, and obviously we have that in the model. Note that we do not have the possibility of a set of values as the value (i.e., third component) of an object. While this possibility has advantages for conceptual modeling, it would violate the storage principle [SuMi86] saying that each stored value that can be distinguished in the structure should have a unique internal identifier. This principle is strongly motivated by engineering requirements. Thus we view a set as a collection of stored values, and each of these is represented as a storage object. Also note that, just as for relations, if the set has a name in the conceptual model, this name is associated with each element of the set.

We can easily represent lists. For example, a list is a set of storage objects

<i1, N1, {<i2, N2, value>, <i3, N3, i4>}>

where each pointer, such as i4, leads to the next list element; the last element does not contain the pointer. Similarly one can represent trees and, generally, values of recursively defined data types.

Our storage model allows for representing stored bags. For example, the storage object

<i1, N, {<i2, M, 5>, <i3, M, 5>}>

represents the bag {5, 5}.

Names can be numbers, for example,

<i1, A, {<i2, 1, Monday>, <i3, 2, Tuesday>, ..., <i8, 7, Sunday>}>

In the terminology of programming languages such a structure is called an array. Note that the names used in arrays are also values, and are routinely calculated at run-time. In comparison to other data structures, e.g. records, arrays have only one essential property: names of their elements can be calculated at run-time (they are first-class citizens). This property, however, is not essential for our presentation.
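
The list, bag and array encodings above can be tried out in a few lines of Python. This is only a sketch under stated assumptions: storage objects are written as plain (identifier, name, value) tuples, a complex value is a Python list, and Ref, bind and list_values are hypothetical helper names; the list contents "a" and "b" are made-up sample values.

```python
from typing import NamedTuple


class Ref(NamedTuple):
    """A pointer value: the internal identifier of another storage object."""
    target: str


def bind(complex_obj, name):
    """Return the values of all components of complex_obj that carry `name`."""
    _ident, _name, components = complex_obj
    return [value for (_i, n, value) in components if n == name]


# Bag {5, 5}: two distinct storage objects with the same name and value.
bag = ("i1", "N", [("i2", "M", 5), ("i3", "M", 5)])

# Array: the element names are the numbers 1..7 (identifiers i2..i8).
days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
week = ("i1", "A", [(f"i{k + 1}", k, day) for k, day in enumerate(days, start=1)])

# List: each element holds a value (N2) and, except the last one,
# a pointer (N3) to the next element.
cells = {
    "i1": ("i1", "N1", [("i2", "N2", "a"), ("i3", "N3", Ref("i4"))]),
    "i4": ("i4", "N1", [("i5", "N2", "b")]),
}


def list_values(store, head_id):
    """Follow the N3 pointers from head_id and collect the N2 values."""
    values = []
    current = head_id
    while current is not None:
        cell = store[current]
        values.extend(bind(cell, "N2"))
        pointers = bind(cell, "N3")
        current = pointers[0].target if pointers else None
    return values


x = 5
bag_values = bind(bag, "M")        # both elements of the bag
seventh = bind(week, x + 2)        # element name computed at run-time
chain = list_values(cells, "i1")   # values in list order
```

Binding the name M yields two values because names need not be unique, and the array element is selected by a name that is computed rather than written literally.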
In LOQIS each data name is first-class; we assumed that the construct [e] denotes a data name obtained by evaluating the expression e. For example, if x = 5 then [x + 2] denotes the name 7, and A.[x + 2] identifies the Sunday element of the array presented above. In consequence, the difference between arrays and other type constructors is neglected.

We have already stated that we do not impose the constraint that objects with the same name have the same structure. This allows us to represent variants, as in the following database fragment, again without having this concept in the model:

    <i1, EMP, {<i2, NAME, Brown>, <i3, SAL, 2500>}>
    <i6, EMP, {<i7, NAME, Smith>, <i9, IS_STUDENT, true>}>

The existence of a field to serve as a discriminator of a variant is not obligatory.

For the same reason, we can easily represent null values, when they stand for lack of

stored information. The main reason for dealing with null values is that a particular piece of information does not exactly fit the predefined format. Because the relational model is heavily based on predefined formats of tuples, the problem of null values has received a lot of attention. Null values cannot be avoided in real-life database systems, and thus cannot be avoided in QLs. Since the definition of database instances does not include any concept of format, such a null value is represented simply by the absence of information. For example,

    <i1, EMP, {<i2, NAME, Brown>, <i3, JOB, programmer>}>
    <i6, EMP, {<i7, NAME, Smith>}>

means that JOB for Smith is null-valued. Note that we do not associate any semantics with this absence. As far as the above example is concerned, Smith may have no job at all, or his job may be unknown, or his job may be determined in another way. For example, assume that almost all employees are clerks; thus the JOB information is explicitly written only for those who are not clerks. The interpretation of the absence of information belongs to the realm of conceptual modeling, and is outside the scope of our model. Similarly, if the conceptual model stores nulls as special values (marked nulls), these will simply be storage objects for us. Of course, we will need the ability to test for the lack of a stored object with a given name, so that the query evaluator can interpret this and apply the appropriate semantics, as determined by the conceptual model.

In summary, this simple model allows one to represent a variety of data structures and concepts, including records/tuples, arrays, (nested) relations, sets and bags, and their combinations; variants and null values; (complex) objects, object sharing, and instances of recursive types. As shown in Fig. 1, our model is only a formal tool which makes it possible to represent conceptual modelling primitives from a particular data model. In many cases this representation is quite straightforward; this concerns the relational model and NF2 models. For semantic models such as the entity-relationship model, functional models, IFO, etc., there are several methods of representing their primitives. For example, an instance of a binary relationship can be represented as two pointer objects inserted into the entities to be connected, as a special object containing two pointer objects, or in another way.

The main advantage of the model is its level of abstraction, which allows us to explain properties of QLs without going into the details of various special cases. Some issues, in particular generalization/specialization relationships, object-oriented concepts, and ordered bulk data (sequences), require further features of the model, without violating its basic assumptions. Some extensions will be discussed later.

3 An Abstract Machine Model

In this section we present an abstract machine model, on which query language expressions can be evaluated. One component of a state of the machine is a database instance represented in the storage model, as described in the previous section. We describe here the other components of the state, that is, the structures used for computation and for query evaluation. These include an environment stack and a query result stack. The environment stack, as usual, determines scoping and binding. The result stack is a storage for intermediate query

results, used both for the evaluation of query operators and for the evaluation of arithmetic-style expressions (we do not make a distinction between these cases). We discuss some basic commands of the machine, and its facility for parallel execution.

3.1 The Environment Stack

When a query is evaluated, we first need to evaluate its atomic components (names, constants, etc.), and then build the meaning of larger constructs (arithmetic expressions, subqueries, queries) from the meaning of their components. The evaluation of names means binding them to data units. Such a binding depends on the context. Just as the denotation of identifiers depends on the procedure in which they appear, the denotation of attributes in a select list depends on the bindings of the relation names listed in the from clause. As a simple example, consider a query in a relational database that asks for the names of employees satisfying some condition:

    select NAME from EMP where ...

Semantically, the query implies a loop, where the binding to EMP is iteratively changed; each binding for NAME is determined by a previous binding to EMP. For each of these, NAME is bound to the appropriate subobject that corresponds to the NAME attribute value for that employee.

While bindings for simple relational queries, such as the one above, seem to be obvious, this is not the case for more complex queries. For example, in a query that selects the salaries of employees that earn more than their managers, SAL is used three times, and there is a need to define how each occurrence is bound. The problem is aggravated in models that, unlike the relational model, admit deeply nested structures, possibly with repeated (sub)attributes. The binding to a name (i.e. its meaning) may then depend both on the position of the identical name in the data structure and on its use in the query.

Using the terminology of programming languages, the issue here is essentially that of scope rules. To deal with it, we propose a mechanism that is well known in programming languages, namely that of an environment. In typical PLs, an environment is an association of names and objects, and we have reflected that in our definition of the storage model. Because for various reasons we need to restrict the scope of a particular name occurring in a program, the environment is subdivided into parts (called sections), which form a stack. Binding a name implies a procedure which looks for a proper object, starting from the top of the stack and omitting irrelevant stack sections. In classical PLs the search is done at compile time, thus the stack exists in two versions: during compilation (the so-called static environment) and at run time. The run-time stack consists of object values only, since after binding explicit names need not be stored.

Following PLs, we represent the environment by a stack, which we call the environment stack and denote ES. In comparison to PLs, we have several reasons to change its construction. Because we would like to abstract from compilation, and because of late binding, we need to have full information about the data structure and data names at run time, and this is already taken into account in the storage model. In classical PLs a stack section has a

fixed format at run time, but this is not the case for QLs. For example, bulk data (relations, repeated attributes), text or multi-media data lead to variable formats. The stack is usually a main memory structure, while the data are stored in secondary storage; this means that some data on the stack must be represented by pointers to the secondary storage. The last reason for changing the stack construction ultimately leads to the pointer variant: storage objects can be shared between different stack sections. Indeed, for example, a stack section may contain a storage object EMP, but during the evaluation of some query we need to build another section (a local environment) with the storage objects NAME, SAL and WORKS_IN, which are attributes of the EMP object. Therefore, for uniformity, we assume that storage objects are stored in some pool independent of the stack, and that the stack stores only pointers to them. Hence, each stack section is a set of data identifiers. The structure is illustrated in Fig. 3.

[Figure 3, diagram. Pool of storage objects: MyDatabase contains i1 EMP (with i2 NAME Brown, i3 SAL 2500, i4 WORKS_IN i13), i5 EMP, i9 EMP, i13 DEPT and i17 DEPT; the volatile objects (local to procedures, modules, ...) are i127 X, i128 Y, i129 Z and i130 T. The environment stack, from top to bottom, holds the sections {i129, i130}, {i127, i128}, ..., {i2, i3, i4}, ..., {i1, i5, i9, i13, i17}.]

Figure 3: Storage objects and the environment stack

The environment stack is presented at a level which supports conceptual clarity and uniformity. We abstract from methods aimed at performance, ease of programming and reliability. In particular, in an implementation some objects can be stored directly on the stack, which removes a level of indirection. The run-time search in the stack can be partly avoided, since the section of the stack which is relevant for a particular binding can be determined during compilation. Since the stack is usually a main memory structure, while data are stored in secondary storage, it may be reasonable to introduce redundancy: the stack stores pairs <name, identifier> rather than the identifiers alone. This makes it possible to avoid many disk operations. In LOQIS we assumed some abbreviations in the representation of identifiers; in particular, the identifiers of all objects that are subordinated to the object having identifier i are represented by i with a special flag. Some additional pointers, leading from a section to the next section to be visited during the search, can also improve performance. In LOQIS we avoid a direct representation of sets of identifiers of objects having the same name, for example, the identifiers of all employees or all departments. This is implemented as an additional indirection level in the data structure.

We assume that at the beginning of the evaluation process the environment stack consists of one section containing the identifiers of all database "records", i.e. objects belonging to the top hierarchy level in the database instance, such as EMP and DEPT in MyDatabase. (As noted above, simple methods make it possible to avoid long lists of identifiers.) Some other assumptions were tested in LOQIS. For example, in the entity-relationship model there is a concept of weak entities, which exist only together with their super-entities (e.g. children of an employee). Removing an entity implies removing all weak entities subordinated to it, but for retrieval weak entities behave as ordinary entities. Such an entity can be modelled as a sub-object of another object, with its identifier included in the initial section of the environment stack. We also tested the situation in which the initial section also contains identifiers of relationships; this leads to somewhat different pragmatic rules of the query language. In real systems the structure of data repositories and their behaviour can be complex, thus the rules for the initial filling of the environment stack may be more sophisticated.

In summary, our proposal changes the environment stack as used in programming languages in two major ways: (1) the stack contains pointers to storage objects rather than the objects themselves; (2) a search necessary to bind a name occurring in a query/program may return multiple bindings, i.e. many objects can be bound to a single name. This is intimately related to the parallelism inherent in the semantics of queries.

3.2 Binding and Opening a New Scope

Binding a particular name n occurring in a query implies a search for object(s) named n in the environment stack. The search follows scope rules. For the typical case they are as follows. The search starts from the top of ES, and it terminates when the object(s) are found or the bottom of the stack is reached. The name is bound to all objects that have the given name and whose identifiers are in the ES section where the search terminated. Note that all objects named n from one ES section are bound to the name n.

The scope rules must sometimes be more sophisticated because of various locality concepts in PLs. For example, if procedure p1 calls procedure p2, then the local objects of p1 should not be visible during the binding of names occurring in p2. This means that the ES section containing the identifiers of the local objects of p1 should be omitted during the search. Further rules are the consequence of nested program blocks, modules (distinguishing specification

and implementation objects), ADTs, classes and inheritance, viewers [SMSRW93], specific methods of parameter passing in procedures, and perhaps other techniques. We discuss these problems in more detail later.

While a search may go down into the stack ES, updates of the stack can be performed only at the top. We allow only the traditional operations on a stack, namely push(s) and pop, where s is a section. Since a section represents a scope, they correspond to opening a new scope and closing a scope, respectively.

In programming languages, opening a new scope corresponds, e.g., to the activation of a block or a procedure. In a query language, it corresponds additionally to the need to evaluate a query component in a context determined by another component, and is related both to the structure of the query and to the structure of the data. For example, in the query N1.N2, each possible binding for N2 is determined by a binding for N1. That is, first N1 is bound, in general to many identifiers i1, i2, ..., ik. Each such identifier ij defines a scope, consisting of the identifiers of the objects that are nested in the ij object. The semantics of the dot operation is that the binding for N2 is iteratively determined in these scopes.

As another example, the semantics of the query R θ S, where θ is some join operation involving names of attributes and a comparison, can be described using the nested loop approach by the following procedure: (1) Determine a binding for R; this is an identifier of a tuple (tid) of the relation R. This binding determines a new scope with bindings for the attribute names of R. (2) Using the same environment, determine a binding for S (i.e. a tid of a tuple of S). (3) Evaluate the condition θ in an environment where these two scopes are the top ES sections; if it is true, compute the result tuple and add it to the join result. Both the condition and the result are defined in terms of attribute names of R and S, and the bindings for these names should therefore be found in the top two sections of the stack (see footnote 4).

Let i be an identifier. We denote by nested(i) the following set of identifiers: if i identifies a set object, then nested(i) contains the identifiers of all objects in the set. If i identifies a pointer object <i, n, j>, then nested(i) = {j}. For uniformity we assume that if v ∈ V then nested(v) = ∅. Then we extend the function to arguments that are sets/sequences of identifiers: the result is the union of the partial results. For example (see MyDatabase), nested(i1) = {i2, i3, i4}, nested(i4) = {i13}, nested(2500) = ∅, and nested(<i1, i13>) = {i2, i3, i4, i14, i15, i16}. Formally, the function nested also has an (implicit) argument, the database instance; for brevity we omit it.

In this paper our main concern is queries of the form q1 θ q2, where q1 and q2 are atomic or compound queries, and θ is an operator having some tradition in the QLs domain. θ can be where (selection), "." (projection, navigation), ⋈ (a variant of join), ∀ and ∃ (quantifiers), order by (sorting), closed by (transitive closure), etc. In our approach all these operators are defined by applying the same formal mechanism. The idea of the mechanism consists in the iterative evaluation of q2 in new environments, which are determined by the tuples returned by q1. Thus q2 returns as many results as there are rows returned by q1. The final result of q1 θ q2 is some combination of the result returned by q1 and the results returned by q2.
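The definition of nested can be checked against the examples just given with a small Python sketch. The dictionary encoding of the database fragment, the kind tags, and the names and values attached to i14 and i15 are assumptions made for this illustration; the text fixes only their identifiers. The sketch returns ordered lists instead of sets so that the output is easy to read, and it treats the database instance as an explicit argument.

```python
SET, POINTER, ATOM = "set", "pointer", "atom"

# identifier -> (name, kind, value); a hypothetical encoding of part of MyDatabase
db = {
    "i1": ("EMP", SET, ["i2", "i3", "i4"]),
    "i2": ("NAME", ATOM, "Brown"),
    "i3": ("SAL", ATOM, 2500),
    "i4": ("WORKS_IN", POINTER, "i13"),
    "i13": ("DEPT", SET, ["i14", "i15", "i16"]),
    "i14": ("DNAME", ATOM, "Toys"),   # name and value assumed
    "i15": ("LOC", ATOM, "Paris"),    # name and value assumed
    "i16": ("LOC", ATOM, "London"),
}


def nested(x, db):
    """Identifiers visible 'inside' x, following the definition in the text."""
    if isinstance(x, (tuple, list)):
        # Sets/sequences of identifiers: union of the partial results.
        result = []
        for element in x:
            for j in nested(element, db):
                if j not in result:
                    result.append(j)
        return result
    if x not in db:
        return []                      # x is a value from V
    _name, kind, value = db[x]
    if kind == SET:
        return list(value)             # all objects in the set
    if kind == POINTER:
        return [value]                 # the single pointed-to object
    return []                          # atomic object: assumed to open an empty scope


print(nested("i1", db))
print(nested("i4", db))
print(nested(2500, db))
print(nested(("i1", "i13"), db))
```

Note that a pointer object yields the identifier of the object it points to, so pushing nested(i4) makes the DEPT object i13 itself visible, not its attributes.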
The role which we assume for the function nested is the following: for the evaluation of some query operator acting on a tuple t returned by q1 (where t consists of identifiers), we open a new scope with the identifiers of the objects nested in the objects identified by t; that is, we push nested(t) on ES. Then q2 is evaluated in this new environment.

Footnote 4: Note that relational structures can easily be modelled in our model if we assume that a tid is a pair <relation name, primary key value(s)>, and an attribute value identifier is a pair <attribute name, tid>.

3.3 The Query Result Stack

In addition to the environment stack ES, we also assume a distinguished place where results are kept, called the query result stack (QRES). The concept is well known in traditional PLs; sometimes it is called the arithmetic stack. The result stack is used to store the results of all subqueries (in particular, of arithmetic operators), and at the end of evaluation it holds the answer to the query.

The result stack can store numerical values or strings, but it can also store complex values, e.g. relations. The decision about which kinds of values can be represented on the result stack is an important design decision, especially in the context of complex objects and object-oriented databases. Note that the stack is also used to prepare actual parameters of procedures, and to store the output of functional procedures. The designers of a query language are faced with a spectrum of choices, which include the following: storing complex objects directly on the stack vs. storing only pointers to them; storing nested structures (sets, relations) of values, objects and pointers vs. storing flat structures only; storing multi-media data (texts, graphics) vs. storing only pointers to them; etc. These assumptions are correlated with the operators assumed in the query language being defined. The choice of operators, in turn, is constrained by human factors: the formal semantics should be in accordance with the intuitive meaning of a query.

Because of contradictory requirements, the designers must make trade-offs, aiming at low complexity, full universality or simple semantics. The proposal below is an example of such a trade-off: it is simple enough, yet non-trivial enough, to present the general features of our approach to QLs.

We assume that the result of a query is represented on QRES as a table, where a table is a multi-set (bag) of rows (tuples), all of the same width. Rows may contain atomic values or identifiers, i.e. their elements belong to V ∪ I. In particular, a table can be a relation over values, thus in this part we cover the relational model as a special case. Note that the table does not contain names. For elements that are identifiers, names can be derived from the database instance, but in general both tables and their columns are unnamed. This may be a disadvantage for some queries; we return to this later. As in many programming languages, the values TRUE and FALSE are distinguished.

An example table, referring to MyDatabase, is presented below; it may represent an answer to the query "Get employees, their departments, and 10% of their salary". Such a table is a single element of the QRES stack.

    i2    i14   250
    i6    i18   200
    i10   i18   150

For uniformity, in this paper we do not make a distinction between single values or single identifiers and tables having one row and one column (1 × 1 tables).
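One possible reading of these assumptions, written as a Python sketch: a table is a list of equal-width tuples (a list keeps duplicates, so it behaves as a bag), and QRES is a list used as a stack. The helper names `make_table` and `scalar` are hypothetical and exist only for this illustration.

```python
def make_table(rows):
    """Build a table: a bag of rows, all of the same width."""
    rows = [tuple(r) for r in rows]
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows of a table must have the same width")
    return rows


def scalar(v):
    """A single value or identifier is treated as a 1 x 1 table."""
    return make_table([(v,)])


QRES = []  # the query result stack; the top is the last element

# The example answer table from the text is ONE element of the stack.
QRES.append(make_table([
    ("i2", "i14", 250),
    ("i6", "i18", 200),
    ("i10", "i18", 150),
]))

# A constant result is pushed as a 1 x 1 table.
QRES.append(scalar(1800))

print(len(QRES))   # number of stack elements, not the number of rows
print(QRES[-1])    # the top of the stack
```

Columns carry no names in this representation, which mirrors the remark above that tables and their columns are in general unnamed.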

3.4 Parallelism

An important characteristic of query languages is their capability for parallelism. The procedure for evaluating a join above is to be performed for each tuple of R and each tuple of S. In principle, it can be done in any order. This means that the semantics should be defined in terms of parallel execution. Of course, sometimes sequential execution cannot be avoided.

Many operators that are defined in this paper have the semantic potential for parallel execution. As we will see later, the for each operator of the abstract machine does not determine the order in which the elements of some set/bag are visited. This makes it possible to execute the body of the operator on many processors. The straightforward implementation assumes that one processor is assigned to each element of the set, and that each processor includes the computational structures introduced above, ES and QRES. This may be unrealistic for current hardware technologies, thus some optimization techniques are required.

In this paper we do not contribute to the problem of parallel machines. Besides the above problem, we have recognized other problems. As for other approaches, for example FAD [BBKV87], the problem of parallelism is conceptual rather than semantic. It concerns an efficient architecture which makes it possible to distribute the task among many parallel processors, and to compose the final result from partial results. For QLs with large selective power and with other functionalities (e.g. updating) the problem seems to be very hard.

4 The Language SBQL

In this section we illustrate how the semantics of query language primitives, and hence also of complete query languages, can be defined on our machine. We define here a language, called SBQL (Stack-Based Query Language), which is an untyped, query-centralized programming language in the 4GL style.
We also discuss alternative definitions of the language's constructs, and modifications of the language and of the stack-based mechanism that are necessary to achieve desired properties.

4.1 A Running Example

For illustration purposes, we use one database schema as a running example throughout (Figure 4). There is an arbitrary number of DEPT and EMP objects. A DEPT object contains sub-objects named DNO, DNAME and MGR, having some atomic values; for the name LOC, it may contain several value objects (thus LOC is a set-valued attribute); it also has several sub-objects named EMPLOYS containing pointers to EMP objects (thus EMPLOYS models a 1:n relationship); and it has a unique pointer sub-object named MANAGER, containing a pointer to an EMP object. Similarly, an EMP object contains some simple value sub-objects, several complex sub-objects PREV_JOB, and a single pointer object WORKS_IN.

[Figure 4, diagram. DEPT has DNO, DNAME, MGR and LOC (1..n); EMP has ENO, NAME, SAL, JOB, EDNO and PREV_JOB (1..n), where PREV_JOB has COMPANY, FROM, TILL and JOB; the pointer attributes MANAGER (1), EMPLOYS (1..n) and WORKS_IN (1) connect DEPT and EMP.]

Figure 4: Schema of data structures used in examples

As defined, the database has redundancy, introduced on purpose to allow us to demonstrate different styles of querying (e.g. relational vs. navigational). For example, the attributes EDNO and WORKS_IN of EMP represent the same information by using a foreign key (i.e., relational style) and a pointer (i.e., navigational, or object-oriented style). The attribute EMPLOYS of DEPT also represents the same information. Similarly, the value attribute MGR and the pointer attribute MANAGER of DEPT contain equivalent information. We can convert this database to a simple object-oriented database by removing the redundant value attributes. It can also be transformed to a value-based database by removing the redundant pointers and replacing other pointers by foreign keys; but note that after such a transformation it is still not relational, as it contains set-valued attributes.

In LOQIS this schema can be described as follows (repeating denotes the iteration known from regular expressions, ↑ denotes "pointer to"):

    repeating DEPT ( DNO(string) DNAME(string)
                     MGR(string) repeating LOC(string)
                     MANAGER( ↑ EMP )
                     repeating EMPLOYS( ↑ EMP ) )
    repeating EMP ( ENO(string) NAME(string)
                    SAL(integer) JOB(string) EDNO(string)
                    repeating PREV_JOB( COMPANY(string) FROM(date)
                                        TILL(date) JOB(string) )
                    WORKS_IN( ↑ DEPT ) )

For the examples, we assume that the environment stack initially contains one section with the identifiers of all objects with the names DEPT and EMP. Thus, for example, PREV_JOB objects are not accessible from the initial environment (hence, no query can start with this name).

4.2 Syntax and Semantics: General Assumptions

SBQL is defined by simple syntactic rules. We assume parenthesized expressions and the following productions:

• Let C ⊂ V be a set of constants; if q ∈ C ∪ N then q is a query;
• If q is a query and Δ is a unary operator, then Δq is a query;
• If q1, q2 are queries, and θ is a binary operator, then q1 θ q2 is a query.

In contrast to classical QLs, which are based on big syntactic patterns with a lot of sugar (e.g. the select clause of SQL), the atomic queries of SBQL are constants and names. For example, the single name EMP is a query, the name SAL is a query, and the constant 1800 is also a query. From such queries we build more complex queries by applying unary or binary operators and parentheses, for example, EMP where (SAL > 1800). Indeed, queries of SBQL are a generalization of PLs' expressions; e.g. 2 × 2 is a query. We assume full orthogonality of operators, i.e. they can be used in parenthesized expressions to any level of nesting. To improve readability, we avoid some parentheses according to the typical precedence rules for arithmetic expressions. In many cases we apply the rule saying that evaluation is performed from left to right. For example, a.b.c.d.e should be understood as (((a.b).c).d).e. We also assume that the dot operator has the highest precedence, and we usually omit parentheses around a predicate written after the operator where.

To deal with the semantics of SBQL we introduce the procedure eval. There are two views of this procedure, denotational and operational. In the denotational view, eval is a function

    eval : SBQL → (DBI × ES → ℜ)

where SBQL is the set of all strings forming syntactically correct queries, DBI is the set of all possible database instances, ES is the set of all possible states of the environment stack, and ℜ is the set of all possible query results. The function eval maps each character string forming a syntactically correct query into a function which maps a state into a query result; the state consists of a database instance and a state of the environment stack. To obtain the denotational model of the language we have to define all the above mathematical objects: SBQL via abstract BNF syntax; DBI, ES and ℜ via (fixed-point) domain equations; and the mappings via semantic clauses (also fixed-point equations). This view of the QL semantics is presented in [SuMi86], for a slightly different model.

In this paper we prefer to deal with the operational view of eval, because it is closer to the real implementation and much easier for readers without a strong mathematical background. In the operational view, eval is a recursive procedure whose argument is a syntactic construct of SBQL. The procedure has side effects on the stacks that we have introduced, ES and QRES. As previously, the result of a query belongs to some domain ℜ, but this time the result is always accumulated on QRES. During evaluation the procedure

may change the environment stack; but the state of the stack after evaluation is always the same as before evaluation. The argument(s) of a query operator is (are) stored on QRES (the right one at its top, and the left one below the top), and the final result of a query is stored at the top of QRES.

As usual for syntax-driven semantics, we assume that there exists a parsing procedure which efficiently subdivides the input string (forming a query) into smaller strings having a well-defined meaning. Then we compose the meaning of the whole string from the meanings of its parts. (This is a basic principle of modularity, common to all popular approaches to mathematical semantics, including algebraic, denotational and operational.) Assuming the existence of such a parser, we define the semantics of the language via top-down recursion, according to the syntactic tree of a query: eval recursively scans the tree, collecting the meaning from its leaves up to the root.

4.3 Atomic Queries

We start by considering the definition of eval for atomic queries, namely constants and names. Neither kind of query changes ES.

For a query consisting of a single constant c, c ∈ C, eval(c) pushes a 1 × 1 table with this constant onto the top of QRES, and terminates.

For a query consisting of a single name n, n ∈ N, eval(n) performs a search in the environment stack, according to the scoping rules described above. The result of the search is a single-column table of the identifiers of objects named n, which is pushed onto the top of QRES; then the procedure terminates.

In general, we do not assume that an unsuccessful search in ES causes a run-time error. If a database instance contains empty bulk data, optional data (null values), or variants, then it may be impossible to bind some names. If the search is unsuccessful, then the empty table is pushed onto the top of QRES.
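The two atomic cases can be sketched in Python as follows. This is a simplified illustration under stated assumptions: only the typical scope rule is implemented (search from the top, stop at the first section containing the name), Python numbers stand for constants and Python strings for names, and the `names` dictionary is a hypothetical stand-in for looking up the name of a storage object in the database instance.

```python
# identifier -> name, for a fragment of MyDatabase (illustrative encoding)
names = {
    "i1": "EMP", "i5": "EMP", "i9": "EMP", "i13": "DEPT", "i17": "DEPT",
    "i2": "NAME", "i3": "SAL", "i4": "WORKS_IN",
}

# Environment stack: bottom section first, top section last.
ES = [["i1", "i5", "i9", "i13", "i17"], ["i2", "i3", "i4"]]
QRES = []


def bind(name):
    """Search ES from the top; return ALL identifiers named `name`
    in the first section where at least one is found."""
    for section in reversed(ES):
        found = [i for i in section if names.get(i) == name]
        if found:
            return found
    return []  # unsuccessful search is not an error


def eval_atomic(q):
    """Evaluate a constant or a name; ES is never modified."""
    if isinstance(q, (int, float)):
        QRES.append([(q,)])                      # 1 x 1 table with the constant
    else:
        QRES.append([(i,) for i in bind(q)])     # single-column table of identifiers


eval_atomic("SAL")    # found in the top section
eval_atomic("EMP")    # found in the bottom section: multiple bindings
eval_atomic("JOB")    # not found anywhere: empty table
eval_atomic(1800)
print(QRES)
```

The third call shows why an empty table, rather than an error, is a natural result: with optional data or variants, a missing name is an ordinary situation.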
An unsuccessful search can also be caused by an incorrect name; we assume that such cases are detected by typing (in this paper we do not deal with it).

4.4 Compound queries: algebraic operators

Queries composed through algebraic operators can be of the form Δq or q1 θ q2, where Δ is an operator such as count, sum, max, sqrt, sin, log, etc., and θ denotes comparisons, boolean and arithmetic operators, concatenation, cartesian product, union, etc. Algebraic operators have nothing to do with the environment stack: they take their argument(s) from QRES, process them, and send the result back to QRES. For example, the function sum takes a single-column table of numbers from the top of QRES, computes their total, removes the argument table, and stores the final number at the top of QRES. Similarly, the semantics of the query construct q1 + q2, where q1, q2 are atomic or complex queries, is determined by the following part of the definition of eval:

    procedure eval( query: string);
    begin
        parse(query); (* Subdivide the query into largest components *)
        if query is recognized as q1 + q2 then
        begin
            var RESULT : real;
            eval(q1);
            eval(q2);
            RESULT := top(QRES);
            pop(QRES);
            RESULT := plus(top(QRES), RESULT);
            pop(QRES);
            push(QRES, RESULT);
        end
        else ...
    end (*eval*);

After the evaluation of q1 and q2 the result stack contains two more elements at the top, with the results of q1 and q2. The final result is computed by the procedure plus and accumulated in the temporary variable RESULT. Then the partial results are removed from the stack, and the final result is pushed onto it.

There is no conceptual difference between the above definition and the definition of algebraic operators processing complex tables, e.g. a union, set comparisons, a cartesian product, etc. For example, the definition of a union requires only another type for the internal RESULT variable, and another functional procedure computing the final result.

The following algebraic operators can be considered as candidates to be built into a particular query language:

• Numerical comparisons, operators and functions: <, ≤, =, ≠, >, ≥, +, −, *, /, **, sin, log, sqrt, etc. Since their arguments are tables, which may contain many values, we can extend these functions in the APL style to bags or vectors of arbitrary length. In SQL, for example, there are such comparisons, qualified by the key words "all" and "any". In general, for some operators the generalization can be ambiguous, and thus may require an explicit syntactic distinction. In this paper we do not apply operators defined in this style.
• String comparisons, operators and functions: equality of strings, lexical order relation, is-a-substring, is-a-superstring, concatenation, head, tail, the number of characters, etc.
• Boolean operators and, or, not, etc.
• Comparisons, operators and functions concerning other atomic data types, e.g. text, graphics, etc.
• Comparison of identifiers for equality/non-equality.
• Coercion operators for changing types, e.g. a string into an integer.

• Aggregate arithmetic functions sum, min, max, avg, median, standard deviation, etc. They act on a single-column table with numeric values and return a numeric value. There has been a lot of discussion concerning aggregate functions, because of two properties: (1) some of them require preserving duplicate tuples in their argument, which is inconvenient for pure relational theories; (2) they have to be associated with grouping, as in SQL. Since we do not make the assumption that the tables on the result stack are sets of rows, we avoid the discussion concerning the first problem. The second problem also does not exist in our framework, since we achieve the desired effects by an orthogonal combination of aggregate functions with other operators. (In this paper we do not deal with grouping, as conceptually it is a particular case of our variant of a join.)
• Function unique or distinct, removing duplicate rows (as in SQL).
• Function exists, mapping a non-empty table to TRUE and an empty one to FALSE (see SQL), and function count, mapping a table into the number of its rows.
• Dereferencing operator. It changes an identifier into the value stored at the object with this identifier.
• "Set-theoretic" operators and comparisons: cartesian product, union, intersection, difference, is-equal-set, is-subset, in, contains, etc. Definitions of these operators depend upon the mathematical model for the elements of the result stack; for example, if we assume that an element is a sequence of rows, then we can define is-equal-sequence, is-subsequence, etc.; the same holds for bags. If results of queries can be sets, bags or sequences, then additional coercion operators may be necessary.

Some operators, e.g. +, implicitly call the dereferencing operator. For set-oriented operators the definition may require some non-trivial concept of equivalence of rows, e.g. for the operator removing duplicates, is-equal-set, intersection, minus, etc.
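The stack discipline of the eval fragment for q1 + q2 given earlier in this section, together with the implicit dereferencing just mentioned, can be sketched in Python. Several simplifications are assumptions of this sketch: queries arrive already parsed as nested tuples (so no parse step is shown), the form ("ids", [...]) is a hypothetical stand-in for a name that has already been bound, and `store` maps identifiers to atomic stored values.

```python
store = {"i3": 2500, "i7": 2000}   # hypothetical: identifier -> stored atomic value
QRES = []


def deref(x):
    """Dereferencing: replace an identifier by the value stored at its object."""
    return store.get(x, x) if isinstance(x, str) else x


def eval_query(q):
    if isinstance(q, tuple) and q[0] == "+":
        _, q1, q2 = q
        eval_query(q1)
        eval_query(q2)
        right = QRES.pop()            # result of q2 is at the top
        left = QRES.pop()             # result of q1 is just below it
        # Both arguments are assumed to be 1 x 1 tables here.
        QRES.append([(deref(left[0][0]) + deref(right[0][0]),)])
    elif isinstance(q, tuple) and q[0] == "sum":
        eval_query(q[1])
        arg = QRES.pop()              # single-column table
        QRES.append([(sum(deref(row[0]) for row in arg),)])
    elif isinstance(q, tuple) and q[0] == "ids":
        QRES.append([(i,) for i in q[1]])
    else:
        QRES.append([(q,)])           # a constant: 1 x 1 table


eval_query(("+", ("ids", ["i3"]), 100))
print(QRES)        # one element: the arguments were popped, the result pushed
eval_query(("sum", ("ids", ["i3", "i7"])))
print(QRES[-1])
```

Neither operator touches an environment stack, which is the defining property of algebraic operators in this framework.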
Presence ofidenti�ers imply the problem if for these operators the dereferencing operator should beformerly applied, and how to compare complex data.In this paper we neither deal with (usually obvious) de�nitions of algebraic operators.(But we will use them in examples, assuming | we hope correctly | that the readerhas similar associations concerning their meaning.) We also do not discuss the pragmaticmeaning and necessity of them in QLs.4.5 Compound queries: non-algebraic operatorsThe main contribution of this paper concerns binary operators such as selection, projection,join, quanti�ers, ordering, transitive closure, etc. We call them \non-algebraic", since theyare di�cult to adopt by algebraic theories 5.5To be more precise, algebraization is possible, perhaps in the spirit of Tarski's cylindric algebras, butnot in the spirit of the classical relational algebra. 26
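For contrast with the non-algebraic operators, the evaluation pattern of an algebraic operator (evaluate both arguments, pop them from the result stack, push the combined value) can be sketched in Python. This is only an illustration under our own assumptions: queries are encoded as nested tuples, literals are Python numbers or lists (lists standing for bags), and the names eval_query and OPERATORS are hypothetical.

```python
import operator

QRES = []  # the result stack

# Hypothetical operator table: each algebraic operator only needs its own
# combining function; bags are modelled as Python lists (duplicates kept).
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "union": lambda a, b: a + b,
}

def eval_query(q):
    """Evaluate a literal or a tuple (op, q1, q2) on the result stack."""
    if not isinstance(q, tuple):        # a literal: push it as it is
        QRES.append(q)
        return
    op, q1, q2 = q
    eval_query(q1)
    eval_query(q2)
    result = QRES.pop()                          # result of q2
    result = OPERATORS[op](QRES.pop(), result)   # combine with result of q1
    QRES.append(result)                          # push the final result

eval_query(("-", 10, ("+", 2, 3)))
print(QRES.pop())   # 5
```

Changing the operator changes only the entry in the table, which mirrors the remark that a union needs nothing more than another result type and another combining procedure.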

Consider the query q1 θ q2, where θ is a non-algebraic operator. Let r be a row of the table returned by q1. Our intention is that the row is to be processed by q2; for example, q1 is EMP (returning a single-column table of identifiers), q2 is SAL = 2500, and the operator is where. We expect that q2 contains names of sub-objects of the objects pointed to by elements of r. Thus, we make them visible for binding by pushing nested(r) onto the top of the environment stack, and then evaluate q2. This is repeated for each row of the table returned by q1.

More formally, eval(q1 θ q2) is defined by the following piece of the eval procedure (slightly modified for the transitive closure and ordering):

procedure eval(query: string);
begin
  ...
  if query is recognized as q1 θ q2 then
  begin
    Initialization of the temporary RESULT table;
    eval(q1);
    for each r ∈ top(QRES) do
    begin
      push(ES, nested(r));   (* Open a new scope on ES *)
      eval(q2);
      Update RESULT by r and/or by top(QRES), depending on θ;
      pop(QRES);             (* Cancel the result of q2 *)
      pop(ES);               (* Restore the previous state of ES *)
    end;
    pop(QRES);               (* Cancel the result of q1 *)
    push(QRES, RESULT);
  end
  else ...
end (*eval*);

Note that the evaluation is not symmetric in the two subqueries, since q2 is evaluated in environments determined by the result of q1. The evaluation process is illustrated in Figure 5.

[Figure 5: Behaviour of the stack-based mechanism. The diagram shows the result stack and the environment stack. For the current row r on the result stack, a new scope nested(r) is opened on the environment stack. A name n occurring in a query is bound by searching the environment stack, according to the scoping rules, for identifiers of objects named n.]

4.6 Selection

The syntax for the selection operation is q where p, where q is an atomic or complex query, and p is a boolean-valued query, i.e., a condition. The definition of the semantics of q where p is the following:

procedure eval(query: string);
begin
  ...
  if query is recognized as q where p then
  begin
    var RESULT : Table;
    RESULT := ∅;
    eval(q);
    for each r ∈ top(QRES) do
    begin
      push(ES, nested(r));   (* Open a new scope on ES *)
      eval(p);
      if top(QRES) = TRUE then
        insert r into RESULT;
      pop(QRES);             (* Cancel the result of p *)
      pop(ES);               (* Restore the previous state of ES *)
    end;
    pop(QRES);               (* Cancel the result of q *)
    push(QRES, RESULT);
  end
  else ...
end (*eval*);

The procedure evaluates q; then, for each tuple returned by q, it opens a new scope and evaluates p. The result contains those tuples returned by q for which p returns TRUE.

Example (cf. MyDatabase).
Consider the query "List employees earning more than 1800". In our language, this query is expressed by

  EMP where (SAL > 1800)
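The selection semantics can be tried out with a small Python sketch. It is an illustration under our own assumptions, not the LOQIS implementation: the store is a hypothetical miniature of MyDatabase keeping only the SAL sub-objects (the salary of i9 and the identifier i11 are invented), queries are encoded as nested tuples, and the helper names are ours.

```python
# Hypothetical miniature of MyDatabase: identifier -> (name, value).
# The value of a complex object is the list of identifiers of its sub-objects.
STORE = {
    "i1": ("EMP", ["i3"]), "i3": ("SAL", 2500),
    "i5": ("EMP", ["i7"]), "i7": ("SAL", 2000),
    "i9": ("EMP", ["i11"]), "i11": ("SAL", 1500),   # invented values
}
ES = [["i1", "i5", "i9"]]   # environment stack; base section holds the EMP objects
QRES = []                   # result stack; a table is a list of rows (tuples)

def nested(row):
    """Identifiers of the sub-objects of all objects referenced in the row."""
    out = []
    for ident in row:
        value = STORE[ident][1]
        if isinstance(value, list):
            out.extend(value)
    return out

def deref(x):
    """Replace an identifier by the value stored at its object."""
    return STORE[x][1] if x in STORE else x

def eval_query(q):
    if isinstance(q, (int, float)):             # numeric literal
        QRES.append([(q,)])
    elif isinstance(q, str):                    # atomic query: bind a name
        for section in reversed(ES):            # search ES from the top down
            hits = [(i,) for i in section if STORE[i][0] == q]
            if hits:
                QRES.append(hits)
                return
        QRES.append([])
    elif q[0] == ">":                           # algebraic comparison
        eval_query(q[1])
        eval_query(q[2])
        right, left = QRES.pop(), QRES.pop()
        QRES.append(deref(left[0][0]) > deref(right[0][0]))
    elif q[0] == "where":                       # q where p
        result = []
        eval_query(q[1])
        for r in QRES[-1]:
            ES.append(nested(r))                # open a new scope on ES
            eval_query(q[2])
            if QRES[-1] is True:
                result.append(r)
            QRES.pop()                          # cancel the result of p
            ES.pop()                            # restore the previous state of ES
        QRES.pop()                              # cancel the result of q
        QRES.append(result)

eval_query(("where", "EMP", (">", "SAL", 1800)))
print(QRES[-1])   # [('i1',), ('i5',)]
```

After the call, ES is back to its single base section, and QRES holds one table with the rows for i1 and i5.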

We follow the execution of the query. Initially, the environment stack contains one section containing identifiers of all database records (to be more explicit, we augment them with names; these names need not be physically present on the stack):

  i1(EMP), i5(EMP), i9(EMP), i13(DEPT), i17(DEPT)

First, the eval procedure evaluates q, that is, the atomic query EMP. The name EMP is bound to three identifiers, which are put in a single-column table on QRES; the stack will contain one element, the table:

  i1
  i5
  i9

Each row of this table is processed by the body of for each. For the row <i1> we have nested(<i1>) = {i2, i3, i4}. This set is pushed onto ES:

  i2(NAME), i3(SAL), i4(WORKS_IN)
  i1(EMP), i5(EMP), i9(EMP), i13(DEPT), i17(DEPT)

Then, SAL > 1800 is evaluated by eval. The predicate does not contain an operator which can change ES; thus both atomic queries SAL and 1800 are evaluated with the same environment stack. SAL is evaluated, 1800 is evaluated, then the comparison (an algebraic operator) is performed. The comparison implicitly calls the dereferencing operator. This leads to the following states of the result stack (each state is listed from the top of the stack downwards; "i1 i5 i9" stands for the single-column table returned by EMP):

  After eval(SAL)   After eval(1800)   After dereferencing   After comparison
  i3                1800               1800                  TRUE
  i1 i5 i9          i3                 2500                  i1 i5 i9
                    i1 i5 i9           i1 i5 i9

Since the predicate returns TRUE, the row <i1> is included in the result.

The same action is repeated for the row <i5>, for which the predicate is also TRUE, and for <i9>, for which the predicate returns FALSE; thus this row is not included in the result. At the end, upon exit from the for each, the environment stack will be the same as at the beginning, and QRES will contain the following table:

  i1
  i5

Defining joins. Using the product and the selection, we can define joins. Consider the query

  (DEPT × EMP) where DNO = EDNO

As described above, the query DEPT × EMP leaves in QRES a two-column table with identifiers of DEPT and EMP objects, being the cartesian product of the single-column tables returned by the queries DEPT and EMP. Each step in the evaluation of the where clause pushes nested(<i, j>) onto ES for one such pair, then evaluates the condition. The top section then contains identifiers of objects named DNO, ..., LOC (many), MANAGER, and EMPLOYS (many) taken from a DEPT object, and identifiers of objects named ENO, NAME, ..., EDNO, PREV_JOB (many) and WORKS_IN taken from an EMP object. The condition DNO = EDNO is thus evaluated on attributes taken from two objects, for each combination of DEPT and EMP objects. The final result is a set of pairs of identifiers, one from each collection, that identify pairs of objects for which the condition is true.

4.7 Projection, Navigation, Path Expressions

We need the ability to retrieve subobjects of already identified objects, identifying the subobjects by their names. As usual, this is provided by the `dot' operator. When used more than once, it allows navigation in an object graph (e.g. path expressions). Our definition is more general than typical definitions. The syntax is q1.q2, where q1 and q2 are atomic or complex queries. Let ⊔ denote the union of bags. The definition of the semantics is the following:

procedure eval(query: string);
begin
  ...
  if query is recognized as q1.q2 then
  begin
    var RESULT : Table;
    RESULT := ∅;
    eval(q1);
    for each r ∈ top(QRES) do
    begin
      push(ES, nested(r));   (* Open a new scope on ES *)
      eval(q2);
      RESULT := RESULT ⊔ top(QRES);
      pop(QRES);             (* Cancel the result of q2 *)
      pop(ES);               (* Restore the previous state of ES *)
    end;
    pop(QRES);               (* Cancel the result of q1 *)
    push(QRES, RESULT);
  end
  else ...
end (*eval*);

Compare this definition with the definition for q where p and note that the only difference concerns how the final result is created. For the dot operator the result is the union of

all tables returned by q2.

Examples

  EMP.SAL

The query returns a single-column table of identifiers of value objects with the name SAL nested in EMP objects.

  (EMP where (NAME = "Smith")).WORKS_IN

The query returns a table of identifiers of WORKS_IN objects nested in the EMP objects that have the name Smith. Note that this combines selection and projection. In terms of the environment, we start with ES containing a section with bindings for EMP objects (also for DEPT objects, but these are irrelevant here). The evaluation of the selection transfers the identifiers to QRES; then, for each such identifier, it pushes the nested identifiers (i.e. a local environment) onto ES, evaluates the condition, and finally leaves in QRES only the identifiers for which the condition is true. The evaluation of the `dot' is similar, but it leaves in QRES the set of identifiers of WORKS_IN objects nested in the objects whose identifiers were left in QRES by the selection.

  EMP where SAL > ((EMP where NAME = "Smith").SAL)

This query combines nested where clauses with `dot'. It expresses the query "List employees earning more than Smith". It is of interest to examine its evaluation in detail, since it contains two occurrences of each of EMP and SAL, and it is illuminating to see how they get the right bindings.

Let us denote the condition by SAL > q2, so q2 refers to the subquery that retrieves Smith's salary. When eval is first called, it calls eval(EMP). Since ES initially contains the bindings for the three employees (as shown above), the following table is put in QRES:

  i1
  i5
  i9

Now eval evaluates the condition for each i in QRES and, if it is true, leaves this i in the result. Let us see what it does for i1. It first pushes nested(i1) onto ES, then evaluates the condition. The top of ES now contains

  i2(NAME), i3(SAL), i4(WORKS_IN)

Hence, evaluation of SAL pushes i3 onto QRES. Then eval is called for q2. According to the scoping rules, the evaluation of the atomic query EMP will cause a top-down search in ES, which again succeeds in the bottom section of ES, and will therefore leave in QRES a table with the three employees. The ES stack is not changed. eval(NAME = "Smith") is again called for each of the three employees, with the ES stack containing three elements, for example:

  i6(NAME), i7(SAL), i8(WORKS_IN)
  i2(NAME), i3(SAL), i4(WORKS_IN)
  i1(EMP), i5(EMP), i9(EMP), i13(DEPT), i17(DEPT)

In this case the evaluation of the atomic query NAME will return i6. After the for each loop implied by the second where operator, the ES stack will be reduced to two elements, and the state of the result stack will be the following (listed from the top downwards; "i1 i5 i9" stands for the single-column table with these three rows):

  i5          The result of evaluation of EMP where (NAME = "Smith")
  i3          The result of evaluation of the first occurrence of SAL
  i1 i5 i9    The result of evaluation of the first EMP

The `dot' operator initializes a loop through the elements of the top table of QRES; this time the table contains one element. Again, the ES stack will contain three elements, as shown above. Binding of the second occurrence of SAL will be accomplished at the top stack section, returning i7. Thus, before and after evaluation of the second occurrence of SAL, after dereferencing, and after comparison, the state of QRES will change as follows:

  Before eval(SAL)   After eval(SAL)   After dereferencing   After comparison
  i5                 i7                2000                  TRUE
  i3                 i3                2500                  i1 i5 i9
  i1 i5 i9           i1 i5 i9          i1 i5 i9

Thus, i1, the current employee, will be included in the final result of the query.

  (EMP where NAME = "Smith").WORKS_IN.DEPT.MANAGER.EMP.NAME

returns identifiers of value objects with the name NAME nested within the EMP objects describing Smith's manager. (See below for a discussion of why such a long path expression is necessary.)

  (EMP where NAME = "Smith").WORKS_IN.DEPT.EMPLOYS.EMP.(NAME × JOB)

returns a two-column table, where the columns contain identifiers of the objects NAME and JOB for each employee working in Smith's department.

  (EMP where WORKS_IN.DEPT.MANAGER.EMP.NAME = "Brown").NAME

returns identifiers of NAME for employees managed by Brown.

Nested dot notation, or path expressions, for addressing deeply nested components of complex objects or for navigation between objects is introduced in many proposals and

systems, for example, FAD [BBKV87], Exodus [CDV88], O2 [Deux+90], Orion [KGBW90], Taxis [MBW80], Postgres [SRH90], Iris [WLH90]⁶ and GEM [Zani83]. In [KKS92] path expressions, equipped with sophisticated additional features, are the basis of the functionality of the XSQL language. In our proposal path expressions are a "side effect": in fact, the dot operator is binary.

In the proposal for O2 [ClDe92] navigation through pointer-objects is slightly different from ours: if an attribute is pointer-valued, then attributes of the object pointed to can be directly applied. Thus, our query

  (EMP where NAME = "Smith").WORKS_IN.DEPT.MANAGER.EMP.NAME

would be written in their language as

  (EMP where NAME = "Smith").WORKS_IN.MANAGER.NAME

The change may be caused by another storage model implicitly assumed in this language. Assuming our storage model, we can be compatible with this idea by changing the function nested into nested′, which behaves as nested, except that for a pointer object <i1, n, i2>, nested′(i1) = nested(i2).

The advantage is that the model is apparently more conceptual, and queries are shorter. A disadvantage concerns updating of references, which is a useful feature in real systems; see the example ChangeDept in the next section. To update a pointer object, we must return its identifier rather than the identifier being its value. This makes it necessary to distinguish between the output of

  (EMP where NAME = "Smith").WORKS_IN

and that of

  (EMP where NAME = "Smith").WORKS_IN.DEPT

We can also apply a combined solution, assuming the function nested″(r) = nested(r) ∪ nested′(r). In this case we obtain all the mentioned capabilities, but such an idea introduces some ambiguity and may be more difficult for a typing system.
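The difference between nested, nested′ and nested″ can be illustrated with a small Python sketch of the dot operator. Everything concrete here is an assumption made for the illustration: the store is a hypothetical fragment with one employee and one department (the sub-object i14 and the name DNAME value are invented), a pointer value is modelled by a Ref tuple, and we assume that nested applied to a pointer object yields the identifier of the object pointed to.

```python
from collections import namedtuple

Ref = namedtuple("Ref", "target")        # value of a pointer object

# Hypothetical fragment of a store: identifier -> (name, value).
STORE = {
    "i5": ("EMP", ["i6", "i8"]),
    "i6": ("NAME", "Smith"),
    "i8": ("WORKS_IN", Ref("i13")),      # pointer object <i8, WORKS_IN, i13>
    "i13": ("DEPT", ["i14"]),
    "i14": ("DNAME", "Toys"),            # invented sub-object
}

def nested(row):
    out = []
    for ident in row:
        value = STORE[ident][1]
        if isinstance(value, list):      # complex object: its sub-objects
            out.extend(value)
        elif isinstance(value, Ref):     # pointer object: the object pointed to
            out.append(value.target)
    return out

def nested_prime(row):
    """Like nested, but for a pointer object <i1, n, i2> it yields nested(i2)."""
    out = []
    for ident in row:
        value = STORE[ident][1]
        out.extend(nested((value.target,) if isinstance(value, Ref) else (ident,)))
    return out

def nested_both(row):
    """The combined variant: union of nested and nested_prime, without repeats."""
    return list(dict.fromkeys(nested(row) + nested_prime(row)))

def run(q, nest):
    """Evaluate a name or a tuple ("dot", q1, q2) using the given nesting function."""
    es, qres = [["i5", "i13"]], []
    def ev(q):
        if isinstance(q, str):                       # bind a name, top-down
            for section in reversed(es):
                hits = [(i,) for i in section if STORE[i][0] == q]
                if hits:
                    qres.append(hits)
                    return
            qres.append([])
            return
        _, q1, q2 = q
        result = []
        ev(q1)
        for r in qres[-1]:
            es.append(nest(r))                       # open a new scope
            ev(q2)
            result += qres[-1]                       # bag union
            qres.pop()
            es.pop()
        qres.pop()
        qres.append(result)
    ev(q)
    return qres.pop()

long_path = ("dot", ("dot", ("dot", "EMP", "WORKS_IN"), "DEPT"), "DNAME")
short_path = ("dot", ("dot", "EMP", "WORKS_IN"), "DNAME")
print(run(long_path, nested), run(short_path, nested_prime))
```

In this sketch the long path under nested and the short path under nested′ reach the same DNAME object, while EMP.WORKS_IN still returns the identifier of the pointer object itself in both variants, which is what an update of the reference would need.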
We underline here that (in contrast to other formal approaches) our framework makes it possible to consider such semantic details.

⁶ Followers of functional approaches to databases may use the syntax n(q) instead of q.n, for consistency with standard functional notation.

4.8 Navigational Join

The join defined through the cartesian product followed by a selection allows us to create pairs of identifiers of objects that satisfy some condition. A case of particular interest is the navigational join, where the result is, again, a set of pairs, but now the second object in each pair is reachable by some path (i.e. by using `dot') from the first. For example, we might want to create a set of EMP and DEPT object pairs that represent the WORKS_IN

relationship. Such a query can be written as a product followed by a selection (assuming that some additional operators are available). However, it is useful to have a more direct expression that navigates via WORKS_IN rather than using a product. Thus we modify the definition of the dot operator so that it traverses a link and returns both endpoints. According to our definitional pattern for eval, we present below a definition which (as we will see) covers a more general case.

The syntax is q1 ⋈ q2. Let r̄ denote a single-row table obtained from the row r, and let the symbol ⊗ denote the "horizontal" composition of bags, "each row with each row" (it is a natural generalization of the cartesian product). The semantics of the construct is determined by the following part of the eval procedure:

procedure eval(query: string);
begin
  ...
  if query is recognized as q1 ⋈ q2 then
  begin
    var RESULT : Table;
    RESULT := ∅;
    eval(q1);
    for each r ∈ top(QRES) do
    begin
      push(ES, nested(r));   (* Open a new scope on ES *)
      eval(q2);
      RESULT := RESULT ⊔ (r̄ ⊗ top(QRES));
      pop(QRES);             (* Cancel the result of q2 *)
      pop(ES);               (* Restore the previous state of ES *)
    end;
    pop(QRES);               (* Cancel the result of q1 *)
    push(QRES, RESULT);
  end
  else ...
end (*eval*);

For each tuple r returned by q1 we combine the tuple with each tuple returned by q2 for this r. The result is the union of all such combinations. Note that, in comparison to the previous definitions, the change concerns only how the final result is formed.

Examples

  EMP ⋈ WORKS_IN

The query returns a two-column table, where each row contains the identifier of an EMP object and the identifier of a WORKS_IN object nested in that EMP object.

  EMP ⋈ (WORKS_IN.DEPT)

returns the two-column table that we wanted at the beginning; the operator ⋈, however, is fairly general and covers many other interesting cases. Note that in the above example, during the binding of DEPT, the environment stack will contain three elements, and DEPT object(s) are pointed to from the first and third sections of the stack; the name DEPT is bound to a single DEPT object pointed to from the third stack section.

  EMP ⋈ (DEPT where EDNO = DNO)

is another (relational) variant of the previous example.

  EMP ⋈ (WORKS_IN.DEPT.(DNAME × LOC))

returns a three-column table, where identifiers of EMP are associated with identifiers of DNAME and LOC nested in the proper DEPT. Note that the cartesian product acts on a 1 × 1 table and a single-column table of identifiers. Identifiers in the first two columns may be repeated; this is caused by a repetition of LOC.

  DEPT ⋈ avg(EMPLOYS.EMP.SAL)

returns a two-column table, where each row associates an identifier of DEPT with the number being the average salary in this department. In SQL this query requires the group by operator; our definitions allow us to avoid it (as well as having predicates).

4.9 Quantifiers

The first idea for dealing with quantifiers is to consider them as aggregate functions: the existential quantifier is a generalized or operator, and the universal quantifier is a generalized and. For example, the query "Is it true that each employee earns more than 1500?" can be expressed as ∀(EMP.(SAL > 1500)). The query EMP.(SAL > 1500) returns a single-column table of truth values, which can be processed by ∀, ∃, or another quantifier. This idea can be generalized as pump [BBKV87], i.e. a higher-level polymorphic operator, which is an encapsulated iteration taking a function and a base value as arguments.

We follow here another idea, which is syntactically closer to the traditional quantifier concept in the predicate calculus and follows our definitional style. The syntax is ∀q(p) and ∃q(p), where q returns a table and p returns a boolean value.
Then the semantics for ∀q(p) is defined as:

procedure eval(query: string);
begin
  ...
  if query is recognized as ∀q(p) then
  begin
    var RESULT : Boolean;
    RESULT := TRUE;
    eval(q);
    for each r ∈ top(QRES) do
    begin
      push(ES, nested(r));   (* Open a new scope on ES *)
      eval(p);
      RESULT := RESULT ∧ top(QRES);
      pop(QRES);             (* Cancel the result of p *)
      pop(ES);               (* Restore the previous state of ES *)
    end;
    pop(QRES);               (* Cancel the result of q *)
    push(QRES, RESULT);
  end
  else ...
end (*eval*);

The definition for ∃q(p) can be obtained from the above by changing two lines: the RESULT variable should be initialized to FALSE, and the line collecting the final result should be RESULT := RESULT ∨ top(QRES).

Examples

Give departments where all programmers used to work for IBM:

  DEPT where ∀ ((EMPLOYS.EMP) where JOB = "programmer") (∃ PREV_JOB (COMPANY = "IBM"))

Consider the supplier-part database with the schema

  SUPP(SNO, SNAME, ...)   PART(PNO, PNAME, ...)   SP(SPSNO, SPPNO, ...)

Give names of suppliers supplying all parts:

  (SUPP where ∀ PART (∃ SP (PNO = SPPNO and SNO = SPSNO))).SNAME

We see here some limitations. Since quantifiers are not associated with (bounded) variables, many predicate calculus queries are impossible to express. Moreover, if the relation SP were defined as SP(SNO, PNO, ...), we would have a conflict between names of attributes. The next sub-section addresses this problem.

4.10 Bounded Variables, "Correlation" Variables, Synonyms

In this sub-section we investigate the concept of variable inherited by QLs from the predicate calculus. Although in mathematics the concept is semantically clear, this is not so in computer languages. If any name is introduced in a computer language, it raises the binding problem: what kind of data structures it implies, how these structures are manipulated, and how the name will be bound to them. A query language implemented in DBPL [ScMa92], based on the predicate calculus, presents an excellent example of this kind of problem. For the construct

  FOR EACH x IN EMP : x.JOB = "clerk" DO x.SAL := 3000; END;

the calculus variable x becomes a mutable programming object, which allows updating, as shown above. The variable has an untypical "copy" semantics: it stores a copy of a tuple,

which at the end of a loop is flushed to the original relation. This semantics has consequences which go far beyond the meaning assumed in the relational calculus.

Auxiliary names greatly increase the selective power. The possibility of naming structures to be processed (or parts of queries) also supports the level of abstraction and conceptual programming. Auxiliary names are associated not only with quantifiers. In SQL, if a relation is to be joined with itself, we must use "correlation variables" or "synonyms" because of the name conflict. The problem is more complex if we would like to apply such variables to any queries (not only to stored bulk data), we would like to make updating through them possible (as in DBPL), and simultaneously we would like to avoid the copy semantics (which leads to undesirable effects).

The problem has many solutions, having various properties and consequences. One of them (the DBPL case presented above) is based on the observation that we already have variables (i.e. named storage objects), and they can be adopted as "calculus" variables. This approach was also experimentally implemented in a predecessor of LOQIS (with pointer semantics), but we then abandoned it; in this paper we do not discuss its negative consequences.

We invented and implemented another method, which seems to be free of these disadvantages. We adopted the idea that an auxiliary name temporarily "overwrites" the original object name. The new name should be valid only in some context determined by a query; outside this context the auxiliary name should have no meaning. As we can expect, this assumption leads to scoping and binding issues. Since a new name is locally associated with an object, the name should be a property of the environment stack rather than a property of storage objects. So far, we have not assumed that ES involves names, thus we have to change this assumption. Moreover, we must smoothly incorporate this novelty into the semantic definitions that we have presented so far.

We have made some effort to raise the idea to a reasonable generality, which allows us (as we will see in further examples) to achieve additional interesting effects. As usual, we distinguish two kinds of occurrences of auxiliary names in a query: declaration and application. The syntax for the declaration is n ∈ q, where n ∈ N and q is an arbitrary query. The name n can be applied in a query after the declaration, and its scope is syntactically unlimited (the scope is a semantic property).

The formal semantics is very simple. The construct n ∈ q formally associates the name n with each row r of the table produced by q; the new table contains elements which we will denote n(r)⁷. Such a table is considered as single-column. To be consistent with the previous definitions, we make improvements to the function nested and to the binding:

• nested(<row>): Atomic values and identifiers occurring in the <row> are treated as before. For elements of the form n(e1 e2 ... ek), nested is the identity function. This means that such elements are copied to the environment stack without changes.
• The binding operator: it works according to the previous principles and scoping rules, but binding the name n to an element n(e1 e2 ... ek) on the environment stack returns <e1 e2 ... ek> to the result stack; the name n is not propagated to the result.

⁷ The semantics of ∈ is essentially different from the classical one, but we keep this symbol because of some tradition in QLs and associations concerning how to use it.
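The two modifications, to nested and to binding, can be sketched in Python as follows. This is a sketch under our own assumptions: elements n(r) are modelled by a Named tuple, the declaration n ∈ q is encoded as ("as", n, q), the store is a hypothetical miniature with three employees (the salary of i9 and the identifier i11 are invented), and all helper names are ours.

```python
from collections import namedtuple

Named = namedtuple("Named", "name row")    # the element n(e1 ... ek)

STORE = {   # hypothetical store: identifier -> (name, value)
    "i1": ("EMP", ["i3"]), "i3": ("SAL", 2500),
    "i5": ("EMP", ["i7"]), "i7": ("SAL", 2000),
    "i9": ("EMP", ["i11"]), "i11": ("SAL", 1500),
}
ES = [["i1", "i5", "i9"]]   # environment stack
QRES = []                   # result stack

def nested(row):
    out = []
    for e in row:
        if isinstance(e, Named):
            out.append(e)                      # identity on elements n(...)
        elif e in STORE and isinstance(STORE[e][1], list):
            out.extend(STORE[e][1])            # sub-objects, as before
    return out

def bind(name):
    for section in reversed(ES):               # top-down search, as before
        hits = []
        for e in section:
            if isinstance(e, Named):
                if e.name == name:
                    hits.append(e.row)         # the name is not propagated
            elif STORE[e][0] == name:
                hits.append((e,))
        if hits:
            return hits
    return []

def deref(x):
    return STORE[x][1] if x in STORE else x

def eval_query(q):
    if isinstance(q, (int, float)):
        QRES.append([(q,)])
    elif isinstance(q, str):
        QRES.append(bind(q))
    elif q[0] == "as":                         # declaration n ∈ q
        eval_query(q[2])
        QRES.append([(Named(q[1], r),) for r in QRES.pop()])
    elif q[0] == ">":
        eval_query(q[1])
        eval_query(q[2])
        right, left = QRES.pop(), QRES.pop()
        QRES.append(deref(left[0][0]) > deref(right[0][0]))
    elif q[0] in ("where", "dot"):
        result = []
        eval_query(q[1])
        for r in QRES[-1]:
            ES.append(nested(r))
            eval_query(q[2])
            if q[0] == "dot":
                result += QRES[-1]
            elif QRES[-1] is True:
                result.append(r)
            QRES.pop()
            ES.pop()
        QRES.pop()
        QRES.append(result)

query = ("where", ("as", "x", "EMP"), (">", ("dot", "x", "SAL"), 1800))
eval_query(query)
print(QRES.pop())
```

With this encoding, the query (x ∈ EMP) where (x.SAL > 1800) leaves rows that still carry the name x, and appending .x to it strips the name again, because binding x returns only the row stored under it.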

Such semantics of auxiliary variables also has consequences which go beyond the meaning assumed in the predicate calculus. We will employ some untypical consequences of this semantics in the examples of transitive closures and views. We illustrate this feature by examples.

Examples

Cf. MyDatabase. Consider the query

  x ∈ EMP

As before, the atomic query EMP returns a single-column table with the identifiers i1, i5, i9. The operator ∈ associates the name x with each identifier; as a result, QRES will contain the following table:

  x(i1)
  x(i5)
  x(i9)

Consider the query

  (x ∈ EMP) where (x.SAL > 1800)

The table presented above is processed by the operator where, followed by the predicate (x.SAL) > 1800. According to the semantic rule for where, for each element of the table we have to create a new scope. For the first element x(i1) we have nested(x(i1)) = x(i1), hence the element is written without changes at the top of ES. Thus the predicate is evaluated with the following environment stack:

  x(i1)
  i1(EMP), i5(EMP), i9(EMP), i13(DEPT), i17(DEPT)

The name x occurring in the predicate is bound to the top element of the stack. In effect, the element is written back to QRES, but this time the name x associated with this element is cut off. After the binding the state of QRES is the following:

  i1
  x(i1) x(i5) x(i9)

The top table of the stack, consisting of the one element i1, is now processed by the dot operator, followed by the atomic query SAL. According to the semantics of the dot operator, nested(i1) is pushed onto the top of ES; its state will be the following:

  i2(NAME), i3(SAL), i4(WORKS_IN)
  x(i1)
  i1(EMP), i5(EMP), i9(EMP), i13(DEPT), i17(DEPT)

Now the name SAL is bound as usual, returning i3 to QRES. Below we present the states of QRES until the end of the evaluation (each state is listed from the top of the stack downwards; "x(i1) x(i5) x(i9)" stands for the single-column table with these three rows):

  After eval(x.SAL)    After eval(1800) and dereferencing    After comparison
  i3                   1800                                  TRUE
  x(i1) x(i5) x(i9)    2500                                  x(i1) x(i5) x(i9)
                       x(i1) x(i5) x(i9)

x(i1) is accepted. The final result of the whole query is:

  x(i1)
  x(i5)

Note that, in comparison to the previously analysed query EMP where SAL > 1800, the final result is slightly different: each element of the result table is equipped with the name x. The reader can check that the query

  ((x ∈ EMP) where x.SAL > 1800).x

removes x from the above final result, thus it will be exactly the same as in the previous case.

Cf. the running example. Give names and department names for employees earning more than their manager:

(a) The relational model, the tuple-calculus (QUEL and SQL) style:

  (((e ∈ EMP) × (m ∈ EMP) × (d ∈ DEPT))
    where e.EDNO = d.DNO and d.MGR = m.ENO and e.SAL > m.SAL).(e.NAME × d.DNAME)

(b) The relational model, the domain-calculus style:

  ((EMP.((ed ∈ EDNO) × (en ∈ NAME) × (es ∈ SAL))
    × EMP.((m ∈ ENO) × (ms ∈ SAL))
    × DEPT.((d ∈ DNO) × (dn ∈ DNAME) × (dm ∈ MGR)))
    where ed = d and m = dm and es > ms).(en × dn)

Consider the supplier-part database. Give suppliers supplying all parts (the DBPL style):

  (s ∈ SUPP) where ∀(p ∈ PART)(∃(q ∈ SP)(s.SNO = q.SNO and q.PNO = p.PNO))

Note that in the last example the variables p and q are "bound", i.e. they do not occur anywhere after the query is evaluated, but s is "unbound", i.e. it occurs in the result returned by the query. Consequences, in particular for the for each construct, will be illustrated in the next section. (End of examples.)

Now a table on the result stack may contain not only elements of I ∪ V, but also elements of the form n(...), where n ∈ N. Because of the assumed orthogonality of operators, there is no reason to forbid cartesian products of such tables. Hence we can obtain rows that mix

elements of I, V and n(...). Again, the operator ∈ can be applied to such rows, which means that the result stack can contain elements e.g. of the form n1(i1, v1, n2(...)). Extending this way of thinking, we must allow arbitrarily nested labelled lists as elements of the structures stored on both stacks. In particular, the classical concept of a tuple <a1 : v1, ..., ak : vk> belonging to a relation named r can be represented as r(a1(v1) ... ak(vk)). In this unusual way we have come to the concept of a complex value which can be manipulated directly on stacks. During the development of LOQIS we considered other extensions to the semantic domains, in particular the case when an auxiliary name is assigned and bound to a whole table (not only to a row). This extension allows us to consider grouping and nested relations, and there are examples when such queries are reasonable. However, we also have to achieve some trade-off between the complexity of the language and its universality, because too complex a semantics is not accepted by users. For this reason programming languages, as a rule, restrict the class of elements which can be manipulated on stacks; nevertheless, they are sufficiently universal because of other capabilities.

4.11 Transitive Closures

The transitive closure makes it possible to process recursive data structures and to encapsulate some non-trivial iterations; thus it greatly extends the power of QLs [AhUl79]. There are two approaches to defining the operator. The first one assumes that the relation to be closed is explicitly stored as a permanent or temporary table. Since computing its closure can be expensive, many papers are devoted to efficient algorithms. The second approach concerns how to make the operator computationally powerful. Examples of tasks requiring such a transitive closure concept are the following: (1) for a data structure describing parts and subparts, which contains information about quantities of subparts and weights of atomic parts, get the total weight of a given complex part; (2) calculate the least fixed point of some numerical equation x = f(x); (3) define the aggregate function sum; (4) calculate the shortest path in a graph; etc. This concept of the transitive closure assumes that the relation to be closed is implicitly defined by some complex expressions; it may happen that the relation cannot be physically stored, since it is too large or infinite.

Below we follow the second approach (which does not exclude the first one, if the relation can be efficiently stored in extenso). The transitive closure "explodes" a set of initial elements according to some relation r. An element b is inserted into the set if there is already an element a in the set such that <a, b> ∈ r. Both the initial elements and the relation can be determined by queries. Let q1 be a query determining the initial set. The elements collected in the first step of the explosion can be determined by a query q1.q2. Query q2 navigates from the initial elements to their direct successors in the closure. Analogously, a query q1.q2.q2 determines the elements collected in the second step of the explosion, q1.q2.q2.q2 determines the elements collected in the third step, and so on. We can therefore represent the transitive closure as an infinite union

  q1 ∪ q1.q2 ∪ q1.q2.q2 ∪ q1.q2.q2.q2 ∪ ...

This union is denoted by q1 closed by q2. The semantics follows our standard method, through a modification of the definition for the dot operator. Recall that q1.q2 repeats

the evaluation of q2 for each row returned by q1. So does q1 closed by q2, but the result of the evaluation of q2 is added not to a temporary table, but to the table returned by q1. In this way rows returned by q2 will be further processed in the same way as the original rows of q1. The process terminates when, for the last row of the processed table, the table returned by q2 is empty.

The definition of the semantics is the following:

procedure eval(query: string);
begin
  ...
  if query is recognized as q1 closed by q2 then
  begin
    var NEXTSTEP : Table;
    eval(q1);                (* Leave the table returned by q1 on QRES *)
    for each r ∈ top(QRES) do
    begin
      push(ES, nested(r));   (* Open a new scope on ES *)
      eval(q2);
      NEXTSTEP := top(QRES); (* Store the result of q2 *)
      pop(QRES);             (* Cancel the result of q2 *)
      top(QRES) := top(QRES) ⊔ NEXTSTEP;
                             (* Update the table returned by q1 *)
      pop(ES);               (* Restore the previous state of ES *)
    end;
  end
  else ...
end (*eval*);

Examples

We take the example from [AtBu87]. The database schema is the following (or denotes exclusive variants):

  repeating Part(
    Name(string)
    ( Base( Mass(real) ... )
      or
      Composite( repeating MadeFrom( Uses(↑ Part) Quantity(integer) ... ) ) ) )

The database contains information on base and composite parts. All parts have the attribute Name. A base part additionally has the attribute Mass, and a composite part has information about its direct components: they are determined as a collection of pairs <p, q>, where p is a pointer to a component, and q is the quantity of the component in the part.
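The closure semantics, in which rows produced by q2 are appended to the very table that the loop is traversing, can be sketched in Python. The sketch rests on several assumptions of ours: the schema is simplified so that a Part directly contains Name and repeating Uses pointers (Composite, MadeFrom, Quantity and Mass are dropped), the four parts and their identifiers are invented, string literals are encoded as ("lit", s), and the initial table is assumed to be a fresh list owned by the closure.

```python
from collections import namedtuple

Ref = namedtuple("Ref", "target")   # value of a pointer object

STORE = {   # hypothetical, simplified parts database: identifier -> (name, value)
    "p1": ("Part", ["p1n", "p1u1", "p1u2"]), "p1n": ("Name", "engine"),
    "p1u1": ("Uses", Ref("p2")), "p1u2": ("Uses", Ref("p3")),
    "p2": ("Part", ["p2n", "p2u1"]), "p2n": ("Name", "piston"),
    "p2u1": ("Uses", Ref("p3")),
    "p3": ("Part", ["p3n"]), "p3n": ("Name", "bolt"),
    "p4": ("Part", ["p4n"]), "p4n": ("Name", "wheel"),
}
ES = [["p1", "p2", "p3", "p4"]]
QRES = []

def nested(row):
    out = []
    for ident in row:
        value = STORE[ident][1]
        if isinstance(value, list):
            out.extend(value)
        elif isinstance(value, Ref):
            out.append(value.target)
    return out

def bind(name):
    for section in reversed(ES):
        hits = [(i,) for i in section if STORE[i][0] == name]
        if hits:
            return hits
    return []

def deref(x):
    return STORE[x][1] if x in STORE else x

def eval_query(q):
    if isinstance(q, str):                       # atomic query: bind a name
        QRES.append(bind(q))
        return
    op = q[0]
    if op == "lit":
        QRES.append([(q[1],)])
    elif op == "=":
        eval_query(q[1])
        eval_query(q[2])
        right, left = QRES.pop(), QRES.pop()
        QRES.append(deref(left[0][0]) == deref(right[0][0]))
    elif op in ("where", "dot"):
        result = []
        eval_query(q[1])
        for r in QRES[-1]:
            ES.append(nested(r))
            eval_query(q[2])
            if op == "dot":
                result += QRES[-1]
            elif QRES[-1] is True:
                result.append(r)
            QRES.pop()
            ES.pop()
        QRES.pop()
        QRES.append(result)
    elif op == "closed_by":
        eval_query(q[1])
        table = QRES[-1]                         # the table returned by q1
        k = 0
        while k < len(table):                    # appended rows are visited too
            ES.append(nested(table[k]))
            eval_query(q[2])
            next_step = QRES.pop()               # store and cancel the result of q2
            table.extend(next_step)              # update the table returned by q1
            ES.pop()
            k += 1

engine = ("where", "Part", ("=", "Name", ("lit", "engine")))
eval_query(("closed_by", engine, ("dot", "Uses", "Part")))
print(QRES[-1])
```

In this data the bolt is reached along two paths and therefore appears twice in the result, since duplicates are not removed; on cyclic data the loop in this sketch would not terminate.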

Get all parts recursively composing a part named \engine":Part where (Name = "engine") closed by (Composite:MadeFrom:Uses:Part)Compute the total mass of the engine:sum((((Part where Name = "engine")� (q 2 1)) closed byComposite:MadeFrom:(Uses:Part� (q 2 (q ? Quantity)))):((Composite:0 [ Base:Mass) ? q))The sub-query from the 2-nd line returns a 2-column table consisting of one row, wherethe �rst element is an identi�er of the required part, and the second is the quantity of thispart denoted q and equal 1. In the 3-rd line we navigate from the part to its sub-parts,counting simultaneously their quantities. This process is repeated for each sub-part, thusin the result we obtain a two-column table with identi�ers of all parts participating in the\engine", together with their quantities. In the 4-th line each row of this table is projectedto the multiplication of weight and quantity; the result is a single-column table of numbers.The aggregate function sum counts the total sum of these numbers.Assume that query q returns a number. Give a query counting pq, according to the �xed-point equation x = (q=x + x)=2, starting from x = 1 and making 15 iterations.((((x 2 1)� (c 2 1)) closed by(((x 2 ((q=x+ x)=2)) � (c 2 (c + 1))) where c � 15))where c = 15): xThe auxiliary name c represents the counter of iterations. Constructs such as c 2 (c + 1)remind assignments from programming languages, but their semantics is di�erent. The right-hand c refers to an actually processed (existing) row, and the left-hand c participates in theconstruction of a new row.This variant of the transitive closure we implemented in LOQIS, thus the above examplesand many others were carefully checked. After the implementation we realized that the op-erator is insu�cient for some tasks; thus we implemented other variants. 
The last example shows the case when only the last produced row is essential for the next step of the closure; intermediate rows can be removed "on-the-fly", which results in shorter queries and better performance. Thus we introduced the syntactic variant q1 leaves by q2. The last example can be formulated as follows:

(((x ∈ 1) × (c ∈ 1)) leaves by
  (((x ∈ ((q/x + x)/2)) × (c ∈ (c + 1))) where c ≤ 15)).x

The next problem concerns duplicate rows. In the example with parts they should not

be removed, but in examples where the graph to be closed contains cycles they must be removed; otherwise the process would never terminate. Theoretically, we could apply the function distinct, which removes duplicates, but this leads to unsafe computations. For example, distinct(1 closed by 2) should return a table with 1 and 2, but the evaluation will never terminate. Hence we introduced a special syntax for the case when duplicates should be removed "on-the-fly".

4.12 Ordering

In the relational model ordering of data is not considered a conceptual issue. In contrast, object-oriented systems (e.g. O2 [Deux+90]) and DBPLs (e.g. Galileo [ACO85]) deal with ordering (introducing lists or sequences). Ordering is extremely important in real systems: benchmarks for typical data processing systems have shown that more than half of the processing time is devoted to sorting. The DBTG CODASYL proposal [CODA71] made it possible to keep several orders of the same record type simultaneously (via special "sets"), which may greatly improve performance.

Most QL features are independent of data ordering, and for relational database theories ordering is an inconvenient feature. This is perhaps the reason for the popular belief that ordering is not a conceptual issue and is necessary only for forming the final result. Ordering, however, is important for data modelling, performance, visualization of the output, and querying. Many reasonable queries require ordering, e.g. "Give departments where all 50 best-paid employees are clerks". The query is easy to formulate if a QL contains the ordering operator and an operator that selects the first n rows from a table.

Taking and generalizing the ordering concept of SQL and QUEL, we introduce the operator order by as a modification of the operator ⋈. Assume the syntax q1 order by q2. Semantically, we make the join of q1 and q2, sort the result according to the columns produced by q2, and then project onto the columns produced by q1.
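The three steps just described (join, sort, project) can be sketched in Python. The sketch assumes that tables are lists of rows, that q2 is given as a Python function returning exactly one tuple of sort keys per row of q1, and that `first` behaves as a simple prefix operator; the names and the sample data are illustrative only.

```python
def order_by(q1_rows, q2):
    """Sketch of 'q1 order by q2' as join, sort, project."""
    joined = [(r, q2(r)) for r in q1_rows]   # join: q2 evaluated for each row of q1
    joined.sort(key=lambda pair: pair[1])    # sort on the columns produced by q2
    return [r for r, _ in joined]            # project onto the columns of q1


def first(n, table):
    """Return the first n rows of a table."""
    return table[:n]


EMP = [
    {"NAME": "Smith", "SAL": 2500, "JOB": "clerk"},
    {"NAME": "Brown", "SAL": 1800, "JOB": "analyst"},
    {"NAME": "Adams", "SAL": 3100, "JOB": "clerk"},
]

by_name = order_by(EMP, lambda e: (e["NAME"],))            # EMP order by NAME
best_paid = first(2, order_by(EMP, lambda e: (-e["SAL"],)))  # order by (-SAL)
all_clerks = all(e["JOB"] == "clerk" for e in best_paid)

print([e["NAME"] for e in by_name])
print([e["NAME"] for e in best_paid], all_clerks)
```

Sorting on a negated salary gives descending order, which is how a "best-paid" prefix can be taken with an ascending sort.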
Let col_nbr(q) denote the number of columns of the table returned by the query q. The definition of the semantics of q1 order by q2 is the following:

procedure eval( query: string);
begin
  ...
  if query is recognized as q1 order by q2 then
  begin
    eval(q1 ⋈ q2);
    sort the top(QRES) table according to col_nbr(q2) last columns;
    remove from top(QRES) col_nbr(q2) last columns;
  end
  else ...
end (*eval*);

Examples

EMP order by NAME

DEPT order by count(EMPLOYS)

Let the function first(n : integer; t : Table) : Table return the first n rows of the table t. "Get departments where all 50 best-paid employees are clerks" can be formulated as the following query:

DEPT where ∀ (first(50, EMPLOYS.EMP order by (−SAL)))(JOB = "clerk")

In [Ott92] similar capabilities are proposed for SQL. In LOQIS the order by operator is supported by another facility: a table returned by a query is equipped with a standard additional column storing elements of the form number(<row_nbr>), where number is an auxiliary name, and <row_nbr> is a successive row number, starting from 1. (The column is virtual; it is not physically stored.) This feature appears to be convenient for low-level operations on tables. In combination with sorting and transitive closures it seems to be more powerful than the pump operator of FAD [BBKV87] and the hom operator of Machiavelli [OBB89]; in particular, we can show that it can be used to define all popular aggregate functions.

Examples

Give the median of salaries (cannot be expressed by pump or hom).

((EMP order by SAL) where number = entier(count(EMP)/2)).SAL

In statistics we frequently need to remove extreme observations. Give the average salary, removing the 5 lowest and 5 highest salaries (cannot be expressed by pump or hom):

avg(((EMP order by SAL) where number > 5 ∧ number ≤ count(EMP) − 5).SAL)

Ordering implies a change in the understanding of database instances and the output of queries. Previously we assumed that sub-objects of a complex object form a set. Now we must assume that they may form a sequence, or a set of sequences. This also concerns results of queries stored at QRES.

Since sequences are the most informative (i.e. they conceptually cover bags and sets), they could be chosen as the single structure from the above three. Such an idea simplifies the semantic problem, but it has disadvantages.
Many collections of real objects behave as sets or bags; thus the system would not support this aspect of conceptual modelling. Another disadvantage concerns query optimization: some methods do not work if we assume that the order of tuples in the result must be preserved.

4.13 Null Values and Variants

Null values and variants imply problems for QLs. The first concerns the necessity of special care in queries. Consider the query EMP where SAL > 1800. If for some employee SAL is null-valued, then binding of the name SAL returns the empty table, which has to be compared

by the comparison > with 1800. Both results of the comparison, TRUE and FALSE, are unacceptable. On the other hand, the situation is quite normal, thus it should be possible to avoid a run-time error. Special approaches were proposed in connection with this problem, see for example [Codd79, Zani83]. In SQL such cases are handled by the operators is [not] null:

EMP where is not null(SAL) and SAL > 1800
EMP where is null(SAL) or SAL > 1800

The first query does not include employees with a null-valued salary in the output, and the second does. In both cases the boolean operators and, or have a special semantics (corresponding to the Ada operators "and then" and "or else") avoiding redundant evaluation, thus the comparison of the empty table with the numerical value will never occur.

Since null values are coded by the write-nothing rule, we can already avoid special operators acting on null values or a special many-valued logic. Equivalents of the above two queries can be expressed by quantifiers:

EMP where ∃(s ∈ SAL)(s > 1800)
EMP where ∀(s ∈ SAL)(s > 1800)

or by the function exists:

EMP where exists(SAL) and SAL > 1800
EMP where not exists(SAL) or SAL > 1800

An example of the treatment of variants was shown previously in the query "The total weight of the part". Thus the problem is not conceptually challenging. Note that our treatment of null values within arguments of aggregate functions (null values do not influence the result) is the same as in SQL. The useful SQL function ifnull(q, "when empty") can be expressed as q ∪ ("when empty" where (not exists(q))).

The second problem connected with null values and variants is more serious and concerns bindings and scoping rules. Consider the query "Get employees earning the same salary as Brown does":

EMP where SAL = ((EMP where NAME = "Brown").SAL)

What will happen if Brown's salary is null-valued? Intuitively the query should cause a run-time error, but let us follow the formal semantics.
During binding of the second occurrence of the symbol SAL the environment stack will be the following:

Identifiers of objects ENO, NAME, JOB, ... contained in Brown's EMP object
Identifiers of objects ENO, NAME, SAL, JOB, ... contained in the currently tested EMP object
Identifiers of objects DEPT and EMP
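A minimal Python sketch of name binding over a stack like the one listed above may help to see what the search does. It assumes that each section of ES is a dictionary from names to lists of identifiers and that the identifiers ("i12" etc.) are invented; `bind_static` is a hypothetical variant that looks only in one fixed section instead of searching downwards.

```python
def bind(es, name):
    """Dynamic binding: search the sections of ES from the top down."""
    for section in reversed(es):
        if name in section:
            return section[name]
    return []                      # nothing found: the empty table


def bind_static(es, name, depth):
    """Binding confined to one chosen section (depth 0 is the top of ES)."""
    return es[-1 - depth].get(name, [])


# The last list element is the top of the stack.
es = [
    {"DEPT": ["d1", "d2"], "EMP": ["e1", "e2"]},                        # database
    {"ENO": ["i10"], "NAME": ["i11"], "SAL": ["i12"], "JOB": ["i13"]},  # tested EMP
    {"ENO": ["i20"], "NAME": ["i21"], "JOB": ["i23"]},                  # Brown, no SAL
]

found_dynamically = bind(es, "SAL")
found_at_top = bind_static(es, "SAL", 0)
print(found_dynamically)
print(found_at_top)
```

With the downward search the name SAL is resolved in the second section, that is, in the currently tested EMP object, whereas the lookup confined to the top section returns the empty table.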

Since the top does not contain a pointer to a SAL object, the search will be continued in lower sections of the stack. Unfortunately, the second section contains a pointer to SAL, thus the binding will be successful. As a result, the predicate after the first where will be TRUE for any EMP having a SAL sub-object; obviously, this is a wrong result.

The problem is caused by the scoping rules. The search should be finished at the top of the stack, but the model contains no information about the semantic qualification of the second occurrence of the name SAL in the query. This information should be given explicitly, and should influence the scoping rules. The information can be introduced by types. Names of sub-objects of an object (possibly not actually present in the object) can be deduced from the type.

Hence some elements of types and static binding cannot be avoided in the presented approach. This modification of the binding mechanism also has a performance advantage. For each name occurring in a query we can statically determine the section of the environment stack where the binding has to be done; thus the dynamic search down the stack can be avoided. In the following we employ this property for query optimization.

The same considerations can be repeated for auxiliary variables.

5 Procedural Constructs

Many PL constructs can be adopted to extend the power of a QL: data creation, assignments, insertions, deletions, control commands (if...then...else, for, while, repeat, case, etc.). Variants of them can be found in DBPL and LOQIS. Here we present two constructs which are important for procedural many-data-at-a-time processing, namely assignments (updating) and for each.

5.1 Assignments

The semantics of an assignment l := r in programming languages assumes that the left-hand side l is evaluated to an identifier (l-value), and the right-hand side r is evaluated to a value (r-value); then the operator assigns the value to the object with this identifier.
The problem is slightly more complicated if the language deals with pointers and complex values.

This syntax can be extended to QLs. We can discuss three methods for such an extension:

• The APL method: l and r are vectors of equal size, and the i-th value of r is assigned to the object identified by the i-th identifier of l.
• The DBPL method: l and r are sets of identifiers and values, respectively; all data pointed to by l are deleted and then a new collection of data is created with values determined by r.
• The SQL method: first, a query returns tuples determining a context for updating. Then, l and r concern updating of attributes for each tuple inside this context.

The APL and DBPL methods have disadvantages. The APL method relies on data order, which in many cases is irrelevant. Both APL and DBPL cause syntactic redundancy;

for example, the assignment "Raise by 100 the salary of all suppliers working in the Toy department" must be coded as

((DEPT where DNAME = "Toy").EMPLOYS.EMP where JOB = "supplier").SAL :=
((DEPT where DNAME = "Toy").EMPLOYS.EMP where JOB = "supplier").(SAL + 100)

For the APL method we must sometimes predict the size of tables, which is harmful. For example, "Give salary 5000 to all clerks" cannot be formulated as

(EMP where JOB = "clerk").SAL := (5000 ⊔ 5000 ⊔ 5000 ⊔ ...)

Similarly, if the assignment concerns different tables or null-valued data, it may be hard to ensure that the tables have equal size. The DBPL method causes difficulties with object-orientation, which assumes that the identity of an object is invariant during its life.

Below we generalize the update statement of SQL, which avoids these disadvantages. Assume the syntax := q, where q is a query returning a two-column table; the first column stores identifiers (it corresponds to the l-value) and the second column stores values (it corresponds to the r-value). Consequently, the last two examples can be written as

:= ((DEPT where (DNAME = "Toy")).EMPLOYS.EMP where (JOB = "supplier")).(SAL × (SAL + 100))

and

:= (EMP where (JOB = "clerk")).(SAL × 5000)

To increase readability, in the typical 1:1 case we will also use the traditional syntax.

The assignment can concern complex objects. The naturally extended semantics assumes deleting the sub-objects of the object pointed to by the left-hand side, and then copying "into" it the sub-objects determined by the right-hand side. It is necessary to distinguish syntactically the assignment of complex values from the assignment of pointers. This distinction is usually determined by types. Since so far our semantic framework is untyped, we use an ad hoc syntax: := pointer means assignment of a pointer.
For example,

:= pointer Y × (EMP where NAME = "Smith")

means that the identifier of Smith's object is assigned as the value of the object Y.

5.2 For each statements

Statements for each can be introduced into the language, with the syntax

for each q do s

where s is a statement or a sequence of statements enclosed in the parentheses begin and end. The semantics follows our definitional pattern, in which s is iteratively executed in

the environments determined by tuples returned by q. Similarly to eval, execute(s : string) is a recursive procedure with side effects on the state (a database instance, ES, and QRES), which determines the semantics of s (i.e. it "executes" s). The semantics of the construct is the following:

procedure execute( statement: string);
begin
  ...
  if statement is recognized as for each q do s then
  begin
    eval(q);
    for each r ∈ top(QRES) do
    begin
      push(ES, nested(r));   (* Open a new scope on ES *)
      execute(s);
      pop(ES);   (* Restore the previous state of ES *)
    end;
    pop(QRES);   (* Cancel the result of q *)
  end
  else ...
end (*execute*);

We show by examples that this definition is quite powerful and free of disadvantages such as copy semantics, non-orthogonality and limited capabilities.

Examples

for each EMP where JOB = "clerk" do
begin SAL := SAL + 100; newline; print(NAME); end

for each (x ∈ (2 ⊔ 3 ⊔ 5 ⊔ 7)) ⋈ (y ∈ sqrt(x)) do
begin print(x × y); newline; end

for each DEPT ⋈ (MANAGER.EMP) do
begin print("Department :" × DNAME × "Manager :" × NAME); newline; end

for each (s ∈ SUPP) where ∀(p ∈ PART)(∃(q ∈ SP)(s.SNO = q.SNO and q.PNO = p.PNO)) do
s.SAL := s.SAL + 100;

The last example resembles DBPL, but the semantics of s is essentially different: a copy of a SUPP object is not made. This makes it possible to avoid other anomalies.
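The definition of for each above can be sketched in Python as follows. The sketch makes simplifying assumptions: objects are dictionaries, the result of q is an ordinary list, nested(r) is the row itself (passed by reference, so no copy is made), and the statement s is a Python function that receives the newly opened scope explicitly instead of binding names through ES.

```python
ES = []   # environment stack: a list of scopes


def nested(row):
    """The scope opened for a row: the row's own sub-objects, not a copy."""
    return row


def for_each(q_rows, body):
    """Sketch of 'for each q do s'."""
    for r in list(q_rows):        # rows of the evaluated query q
        ES.append(nested(r))      # open a new scope on ES
        try:
            body(ES[-1])          # execute(s) in the new environment
        finally:
            ES.pop()              # restore the previous state of ES


EMP = [
    {"NAME": "Smith", "JOB": "clerk", "SAL": 1500},
    {"NAME": "Brown", "JOB": "analyst", "SAL": 3000},
    {"NAME": "Adams", "JOB": "clerk", "SAL": 1700},
]
printed = []


def raise_and_print(scope):
    scope["SAL"] = scope["SAL"] + 100   # SAL := SAL + 100
    printed.append(scope["NAME"])       # print(NAME)


# for each EMP where JOB = "clerk" do begin SAL := SAL + 100; print(NAME); end
for_each([e for e in EMP if e["JOB"] == "clerk"], raise_and_print)
print(printed)
print([e["SAL"] for e in EMP])
```

Because the scope refers to the stored object itself, the update is visible in EMP after the loop, which mirrors the remark that no copy of the processed object is made.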

5.3 Procedures

In the database domain views and database procedures have a special conceptual and pragmatic meaning: views are understood as virtual data derived from stored data (usually determined by a query), and database procedures are understood as stored procedures written in a QL. There are many other terms denoting similar concepts, for example, virtual (derived) attributes, methods, rules, selectors, constructors, etc. Essentially all these concepts can be considered particular cases of the well-known concept of procedure. Unfortunately, this analogy is not well recognized in the database domain. Approaching database procedures and views from this side can be very fruitful, since programming procedures have well-established auxiliary concepts, theory, and implementation state-of-the-art, see e.g. [WaGo84].

Procedures, especially in the context of complex data, pointer-valued data, and declarative QLs, present a variety of ideas. Before fixing our proposal, we discuss some possibilities.

Semantics of procedure calls. Historically, the earliest semantics of procedure calls was based on textual substitution: when the program control reaches a procedure call, the call is textually substituted by the procedure body, with simultaneous substitution of formal parameters by actual parameters. (This is retained in some languages, e.g. in C macros.) Because of obvious disadvantages this semantics has been abandoned. We mention it only for one reason: processing and optimization of queries involving views (so-called query modification or rewriting techniques) are essentially based on this technique.

The most popular semantics is stack based.
It means that local environments and actual parameters are stacked on the environment stack, which makes it possible to introduce recursive procedures and, through scoping rules and static binding, supports locality of identifiers and good performance.

Recently, in connection with logic programming, another kind of semantics has been considered, called fixed-point semantics. A procedure name p denotes a set of mathematical objects satisfying a system of fixed-point equations and constraints. The fixed-point semantics can be considered for any expression-oriented language, in particular, for a QL from the class we are dealing with⁸. The idea is accomplished in constructors of DBPL [ScMa92, ERMS91] and in other languages integrating procedural and declarative programming [Mant91, HFLP89]. Consistent integration of this idea with locality of objects, with procedural constructs (for updating, programming interactive scenarios, etc.), and with fine programming abstractions may result in higher programming comfort, power and conceptual modelling support. So far, the idea leads to problems, e.g. how to ensure good performance, safe computations, and pragmatic universality.

⁸ Ullman claims that this kind of semantics can be consistently introduced only for logic-based (value-oriented) languages, see e.g. [Ullm91]; the claim is the subject of some criticism in the database literature.

We follow the classical stack-based semantics. In the particular case when there are no local environments, this semantics is equivalent to the textual substitution.

Local environments and scoping rules. A procedure can introduce local objects, which are removed when the procedure terminates. As shown in Fig.3 this can be consistently

implemented by storing the objects in the independent pool and stacking at ES only their identifiers. The stacking supports nested procedure calls and recursion.

Parameter transmission. There are several semantically different methods of dealing with parameters. We mention call-by-name, call-by-value, call-by-reference, and call-by-need. Call-by-name, an old method discussed in the context of Algol-60, is essentially a technique of macro-substitution, where formal parameters inside a procedure body are textually substituted by the actual parameters. As in the case of the semantics of procedure calls, we mention this method only because of query optimization based on rewriting. In the Pascal family two techniques are applied: call-by-value, where values of parameters are evaluated before being passed to the procedure body, and call-by-reference, where the parameter after evaluation is a reference. The call-by-need method (sometimes called lazy evaluation) is a technique which postpones evaluation of a parameter to the moment when it is needed in the body, which sometimes prevents unnecessary computations.

An interesting approach to parameter passing is implemented in INGRES/Windows 4GL [Ingr90]. We call it call-by-union. Parameters are local objects of a procedure, visible in call statements. Thus parameter passing is a normal assignment to these objects. The call statement therefore has an atypical scope: it is the union of the calling environment and some objects from the local environment of the called procedure. In the context of QLs this approach has many advantages.
In particular, it makes the situation clean when parameters are complex structures, does not require new syntax and semantic models for parameter passing, makes programs more readable, supports conceptual modelling (since names of "formal parameters" are written together with their values), allows us to avoid ordering of parameters, and allows us to omit actual parameters whose values are inessential for a particular call (it deals with a variable number of actual parameters). The method introduces a change to typing systems, which are traditionally based on the functional view of the procedure concept.

In LOQIS we use queries as parameters of procedures; as in Algol-68 we assumed the strict call-by-value technique, where references are values. To avoid a syntactic distinction between "mutable" and "immutable" elements of parameters we follow the idea of lazy application of the dereferencing operator. This makes updating through parameters possible. Parameters are treated as local constants. They are evaluated on QRES, then shipped to a special stack, different from ES and QRES. The assumption that columns of tables returned by queries are unnamed is sometimes inconvenient; we provided a special construct which makes it possible to name the columns, as does the operator ∈ described above.

Side effects of functional procedures. In all popular programming languages the output of a procedure can depend upon values of global objects, and functional procedures can update global objects. Side effects of functional procedures called in queries are considered dangerous and lead to problems with query optimization (which may change the execution order). Thus we see the need to introduce the attribute "side-effects-free" to the typing system.
Otherwise two extreme approaches are possible: to forbid procedure calls inside queries (as in DBPL), which is inconsistent with the orthogonality principle and considerably reduces the power of the language, or to leave everything in the hands of the programmer, as in almost all programming

languages.

Output from functional procedures. Because we would like to combine procedure calls with queries, we assume that the output of functional procedures belongs to the same semantic domain as the output of queries. As for parameters, we assume that the dereferencing operator is lazy. This causes a problem for local objects: returning pointers to them should be considered an error, since these objects are cancelled when the procedure terminates. In programming languages the problem is solved by typing; here we apply explicit dereferencing.

Views as functional procedures, view updating. Some authors observed that views can be recursive and can have parameters [Toya86]. It can be shown by examples that functional procedures with a complex output are conceptually equivalent to views in the SQL sense. All properties of views discussed in [Daya89] can be explained through well-known programming concepts. Since the output can contain pointers, it is possible to update the database through a view. DBPL selectors [ERMS91] are examples of such updatable views.

There is, however, some undesirable homonymy in the terminology. A view, as understood in conceptual modelling, is a virtual data structure; this structure can be determined, in particular, by a view understood as a functional procedure. Updating through a view (a functional procedure) is not equivalent to updating a view (a virtual data structure). Mapping of updates of this structure into updates of stored data is ambiguous, sometimes impossible, and depends upon data semantics and the user's intention. Ideally, view updating should be transparent for the programmer: there should be no syntactic or pragmatic difference between updating stored data and views.
The problem of transparent view updates received a lot of attention in the relational model, and this is a promising research direction in our approach.

We use the following syntax:

• Declaration of a procedure:
    procedure <procedure name> ( <formal parameters separated by ';'> )
    begin <sequence of statements separated by ';'>
    end <procedure name>;
  The list of formal parameters may be empty.
• Procedure call (we associate names of formal parameters with actual parameters):
    <procedure name> (p1 : q1; p2 : q2; ... )
  where pi is the name of the i-th formal parameter and qi is a query associated with this parameter. For procedures without parameters we use the syntax <procedure name>( ) or simply <procedure name>.

• Statement return: we assume two forms: return and return <query>.
• Local objects: we introduce a statement for declaration/creation of an object, with the syntax
    create local <specification of the object>

[Figure 6: Binding of a name in LOQIS. The figure shows two stacks and the scoping rules for binding of name n; some sections of each stack are marked as invisible. The environment stack contains, as listed in the figure: identifiers of objects ENO, NAME, SAL, ... nested in the currently tested EMP object; identifiers of local objects of p1; identifiers of local objects of p2; ...; identifiers of global objects of the module of p1; identifiers of global objects of the module of p2; ...; identifiers of global objects of the main module. The stack of parameters contains: actual parameters of p1; actual parameters of p2; ...]

In this paper we do not consider further details of the syntax and semantics of procedures. In Figure 6 we present the environment stack of LOQIS during binding of a name n occurring in a query EMP where f(n), which is inside the body of a procedure p1, which is called from a procedure p2. The figure does not present modifications of the scope rules by viewers and external interfaces of modules.

Examples (all of them can be written in the LOQIS syntax)

A functional procedure `poor' has a list of jobs as a parameter. It returns pointers to names, salaries, and department names of employees who do one of the specified jobs and earn less than the average.

procedure poor( JOBS )
begin
  create local AVERAGE( avg( EMP.SAL ) );

  (* Creating 0 or more pointer objects POOR pointing to the proper EMPs *)
  create local POOR( pointer to EMP where JOB in JOBS and SAL < AVERAGE );
  return POOR.EMP.( (N ∈ NAME) × (S ∈ SAL) × (D ∈ WORKS_IN.DEPT.DNAME));
end poor;

Give names of poor clerks and programmers from the department 'Sales'.

(poor( JOBS: "clerk" ⊔ "programmer" ) where (D = "Sales")).N

Increase salaries of poor programmers from the department 'Sales' by 100:

:= (poor( JOBS: "programmer" ) where (D = "Sales")).(S × (S+100))

A procedure `ChangeDept' has parameters EMPS, storing pointers to employees, and DEP, storing a pointer to a department. It moves the specified employees to the specified department.

procedure ChangeDept( EMPS; DEP )
begin
  for each e ∈ EMPS do
  begin
    delete e.WORKS_IN.DEPT.EMPLOYS where EMP = e;
    create and insert the object EMPLOYS( pointer to e ) into DEP;
    := pointer (e.WORKS_IN) × DEP;
    e.EDNO := DEP.DNAME
  end
end ChangeDept;

Let Brown take all designers working for Smith:

ChangeDept( EMPS: EMP where JOB = "designer" and
                  WORKS_IN.DEPT.MANAGER.EMP.NAME = "Smith";
            DEP: DEPT where (MANAGER.EMP.NAME = "Brown" ))

Define a view MyView(Dname, AvgSal, Mgr(Name, Sal)) containing information about department names, average salaries, and manager names and salaries for departments located in Paris.

procedure MyView( )
begin
  return (DEPT where "Paris" in LOC).((Dname ∈ DNAME) ×
         (AvgSal ∈ avg(EMPLOYS.EMP.SAL)) ×
         (Mgr ∈ (MANAGER.EMP.( (Name ∈ NAME) × (Sal ∈ SAL)))))
end MyView;
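The idea that a view is a functional procedure whose output may contain pointers can be illustrated with a rough Python analogue of MyView. The sketch assumes that stored objects are dictionaries, that a Python object reference stands in for a pointer, and that the sample departments and employees are invented; lazy dereferencing is not modelled, and AvgSal is simply a computed value.

```python
def avg(values):
    values = list(values)
    return sum(values) / len(values)


smith = {"NAME": "Smith", "SAL": 4000}
jones = {"NAME": "Jones", "SAL": 2000}
brown = {"NAME": "Brown", "SAL": 3500}

DEPT = [
    {"DNAME": "Sales", "LOC": ["Paris", "Lyon"],
     "MANAGER": smith, "EMPLOYS": [smith, jones]},
    {"DNAME": "Toy", "LOC": ["Rome"],
     "MANAGER": brown, "EMPLOYS": [brown]},
]


def my_view():
    """Analogue of MyView: one row per department located in Paris."""
    return [
        {"Dname": d["DNAME"],
         "AvgSal": avg(e["SAL"] for e in d["EMPLOYS"]),   # derived value
         "Mgr": d["MANAGER"]}     # a reference to the stored object, not a copy
        for d in DEPT if "Paris" in d["LOC"]
    ]


# Updating through the view: raise the salary of the Sales manager by 200.
for row in my_view():
    if row["Dname"] == "Sales":
        row["Mgr"]["SAL"] += 200

print(smith["SAL"])
print(my_view())
```

Since the Mgr column refers to the stored employee, the update reaches the database object, and a later call of the view reflects the new salary in AvgSal as well.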

Give the manager name for the department with the highest average salary:

(MyView where AvgSal = max(MyView.AvgSal)).Mgr.Name

Increase by 200 the salary of the manager of the Sales department:

:= (MyView where Dname = "Sales").Mgr.(Sal × (Sal + 200))

In this example updating of the view implies no ambiguities and side effects.

See [AtBu87]. The database description is the following:

repeating Part( Name(string)
    ( Base( Cost(real) Mass(real) ...)
      or
      Composite( AssemblyCost(real) MassIncrement(real)
        repeating MadeFrom( Uses(↑ Part)
                            Quantity(integer) ...))))

Define a recursive view costAndMass( name, cost, mass ) with the parameter PARTS being a single-column table of identifiers of objects Part; for these parts it returns the name, the total cost, and the total mass.

procedure costAndMass( PARTS )
begin
  return PARTS.((name ∈ Name) ×
    (cost ∈ (Base.Cost ∪ Composite.(AssemblyCost +
       sum(MadeFrom.(Quantity * costAndMass(PARTS: Uses.Part).cost))))) ×
    (mass ∈ (Base.Mass ∪ Composite.(MassIncrement +
       sum(MadeFrom.(Quantity * costAndMass(PARTS: Uses.Part).mass))))))
end costAndMass;

This procedure can be optimized in the style shown in [AtBu87].

5.4 Object Orientation

Some features of object-oriented databases are already assumed in our framework, in particular object identifiers, complex objects, object sharing, path expressions, complex values, and integration of QLs with PLs. The idea of object-orientation includes other concepts that are relevant to QLs, namely, classes, inheritance, methods, and encapsulation.

Classes. There are many definitions of the class concept and of the relationships between classes and types. We do not consider the subtleties of these definitions and their dependency on a concrete system or theory. For our idea the following view is convenient: a class is an object storing

invariants for other objects (being elements of the class). These invariants are also objects; they may include methods/procedures, default attributes [WKS89], common values of attributes, constraints, typing information, etc. Invariants are inherited by elements of the class and elements of its subclasses. Two kinds of inheritance can be considered: static and dynamic. In the static case (e.g. C++) classes are not first-class citizens; hence the inheritance of invariants works at compile time only. For systems with late binding we can also consider dynamic inheritance of the invariants. A promising concept for dynamic inheritance, which we called viewers [SMSRW93], is implemented in LOQIS.

Classes imply the following consequences for QLs:

• Structural inheritance. Objects from sub-classes inherit structural properties of their super-classes. For instance, an object from the class STUDENT is considered to be in the class PERSON; hence queries addressing objects of the class STUDENT can contain names of attributes that are defined for the class PERSON. Moreover, queries addressing objects of the class PERSON need to take into account the STUDENT objects. This property leads to semantic ambiguity; in POSTGRES [SRH90] such queries require an explicit syntactic distinction.
  Semantics of structural inheritance can be easily incorporated into the proposed mechanism. A concrete solution depends on the citizenship status of the class concept. When classes are not first-class objects, we can store the graph of relationships between classes; then we can apply rewriting rules to some queries before executing them. The rules will substitute names referring to some class C by the union of names of all subclasses of C; for example, PERSON will be substituted by (PERSON ∪ STUDENT ∪ EMPLOYEE ∪ ...). Then, names occurring in a query can be bound as usual.
  When classes are first-class objects, the solution depends on the storage model assumed for objects and classes.
For example, we can extend our storage model by assuming that each object can have many names; an object with names n1, n2, ... is bound to the name n occurring in a query if some ni = n. Then, we assume that each object has the name of its class, and the names of all its superclasses as synonyms. For example, an object STUDENT has both names STUDENT and PERSON. In this case too we can apply the standard binding mechanism. Note that the storage model with synonymous object names is more powerful than typical object-oriented models; in particular, it allows us to model dynamically changing object roles [RiSc91]. During the implementation of LOQIS we also considered another storage model, with special pointers connecting a class with all its members (including members of its subclasses); in this case the binding rules must be changed.

• Behavioral inheritance or inheritance of invariants. Classes contain invariants (methods, procedures, default values, common values, etc.) which are inherited by elements of the class and by elements of its sub-classes. In our framework this means that objects are connected not only to their "own" sub-objects (i.e. attributes), but also to objects stored inside their classes. This can be easily taken into account in the definition of a QL by a change of the function nested, introduced previously, and a change of the scope rules to resolve possible name conflicts (i.e. to do

"overriding"). Such a modified mechanism is implemented in LOQIS. For example, (EMP where (NAME = "Smith")).fire() denotes an application of a method fire, which is an object stored inside the EMP class. Assume that nested is changed into nested*, where

nested*(r) = nested(r) ∪ nested(class(r)) ∪ nested(super(class(r))) ∪ nested(super(super(class(r)))) ∪ ...

The functions class and super return identifiers of the class and the superclass(es) of the object(s) identified by r, respectively. Such a change of the function nested allows us to consider fire as an attribute of the object, thus we can apply the usual binding rules.

Methods. Methods are procedures stored within classes; "message passing" is a terminological and syntactic variant of a procedure call. An example of a call of the method fire is shown above. Queries can be used as parameters of a method, and within its body. A method, as a functional procedure, can be called within a query, and a query can be used to determine the context in which the method is to be applied. For example, consider the construct (EMP where (JOB = "analyst")).fire(), with the semantics "Fire all analysts". The procedure fire is applied many times and it has an implicit parameter: the current analyst. To deal with implicit parameters object-oriented languages introduce the symbol self. Its semantics is very simple in our framework: the symbol self is bound to the row from the top of QRES that is currently processed by an operator where, dot, ⋈, etc.

Encapsulation. An orthodox approach to encapsulation [ABDD+89] assumes that objects can be accessed and processed only by methods, which leaves little room for QLs. There are examples (see e.g. [Daya89]) showing that this orthodoxy is contrary to common sense, and perhaps no object-oriented database approach follows it, see e.g. [Beer89, ClDe92, Cruz89, Daya89, Kim89, KKS92].
We can also argue that in the database domain the concept of a schema has a well-founded tradition as a tool for representing data semantics for the application programmer. A schema describes mainly static data; their behavioral aspects (methods) are add-ons only. Hence, data should be visible to users and programmers at the proper level of data independence.

Encapsulation is also a principle of programming languages such as Modula-2 or DBPL, where access to modules' properties (in particular, objects, procedures and types) is restricted by export/import lists, and the specification part of the module is separated from the implementation part. In this way inessential properties of data and procedures can be hidden or shifted to the lower implementation level. Classes, besides the above property, introduce a structure supporting conceptual modelling and inheritance (multiple inheritance). We can therefore consider a class as a concept sharing the properties of modules and supporting inheritance. A class, as a module, exports some properties, in particular methods and some "projection" of objects (the projection hides internal object properties, e.g. attributes). On the other hand, classes are connected into a structure of inheritance relationships, where methods (in general, invariants of objects) are inherited by subclasses. With this understanding we see no contradiction between QLs and encapsulation.
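The modified function $nested^{*}$ discussed in this section can be illustrated with a minimal Python sketch. It is not the LOQIS implementation: the classes Obj and Cls, the helper bind, and the rule that binders found earlier in the chain win (own attributes first, then the class, then its superclasses) are assumptions made for illustration only; the text states merely that the scope rules must resolve name conflicts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cls:
    """Hypothetical class object holding invariants (methods, defaults)."""
    name: str
    invariants: dict = field(default_factory=dict)
    superclass: Optional["Cls"] = None


@dataclass
class Obj:
    """Hypothetical stored object with its own sub-objects (attributes)."""
    name: str
    children: dict = field(default_factory=dict)
    cls: Optional[Cls] = None


def nested(obj):
    """Binders for the object's own sub-objects only."""
    return list(obj.children.items())


def nested_star(obj):
    """nested(r), then nested(class(r)), then nested(super(class(r))), ..."""
    binders = nested(obj)
    c = obj.cls
    while c is not None:
        binders.extend(c.invariants.items())
        c = c.superclass
    return binders


def bind(name, binders):
    """Assumed conflict rule: the first binder in the chain overrides later ones."""
    for binder_name, value in binders:
        if binder_name == name:
            return value
    raise KeyError(name)


person = Cls("PERSON", {
    "describe": lambda self: "person " + self.children["NAME"],
    "fire": lambda self: "not applicable",
})
emp = Cls("EMP", {"fire": lambda self: self.children["NAME"] + " fired"},
          superclass=person)
smith = Obj("EMP", {"NAME": "Smith", "JOB": "analyst"}, cls=emp)

# fire is found like an attribute; the object plays the role of self.
print(bind("fire", nested_star(smith))(smith))
```

In this sketch the method fire of EMP hides the one stored in PERSON, while describe is still reached through the superclass, so the usual name lookup suffices for both attributes and methods.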

6 Query Optimization

In this paper we do not discuss many of the methods and aspects of query optimization that are relevant for the class of QLs that we have proposed. In particular, many rules for the equivalent transformation of queries could be presented. It is also obvious that the typical optimizations known from the relational model (selections through indices, selections before joins, efficient join methods) are applicable to some queries of a QL developed according to the stack-based framework.

In general, current optimization methods are very sensitive to particular query patterns: even a small violation of a pattern makes the optimization method inapplicable. This problem is addressed in Exodus [GrDe87] and Postgres [SRH90], where parameterized query optimizers are developed. Parameterization allows the optimizer to be adapted to new query patterns and data organizations.

Below we describe an easy-to-implement method which is to a large extent independent of query patterns. We explain it using examples, then present the general rules.

The method is based on observations concerning bindings. Consider the query

(e ∈ EMP) where e.SAL = (EMP where NAME = "Smith").SAL

It is non-optimal, since the nested subquery (EMP where NAME = "Smith").SAL will be evaluated as many times as there are employees. The nested query should be evaluated in advance (or in the first loop performed by the outer query). In [Kim82] queries of this kind are called type-A or type-N. Consider another query:

(e ∈ EMP) where e.SAL = max((EMP where DNO = e.DNO).SAL)

This time the inner query cannot be executed in advance because it depends upon the outer object e. In [Kim82] queries of this kind are called type-JA or type-J.
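The difference between the two queries can be made concrete with a small Python sketch over a hypothetical in-memory EMP collection. The data, the function names, and the assumption that exactly one employee is named Smith are illustrative only. The first query returns the same result whether its inner subquery is evaluated once per employee or once in advance; the second cannot be treated this way because its inner part refers to e.

```python
# Hypothetical sample data; not taken from the paper.
EMP = [
    {"NAME": "Smith", "SAL": 3000, "DNO": 1},
    {"NAME": "Jones", "SAL": 3000, "DNO": 1},
    {"NAME": "Brown", "SAL": 2500, "DNO": 2},
    {"NAME": "White", "SAL": 2000, "DNO": 2},
]


def smith_sal(counter):
    """(EMP where NAME = "Smith").SAL; counter[0] records each evaluation."""
    counter[0] += 1
    return next(x["SAL"] for x in EMP if x["NAME"] == "Smith")


def query1_naive():
    """Inner subquery re-evaluated inside the loop of the outer where."""
    counter = [0]
    result = [e["NAME"] for e in EMP if e["SAL"] == smith_sal(counter)]
    return result, counter[0]


def query1_hoisted():
    """Inner subquery evaluated once, before the outer loop starts."""
    counter = [0]
    s = smith_sal(counter)
    result = [e["NAME"] for e in EMP if e["SAL"] == s]
    return result, counter[0]


def query2():
    """The inner part mentions e.DNO, so it must stay inside the loop."""
    return [
        e["NAME"] for e in EMP
        if e["SAL"] == max(x["SAL"] for x in EMP if x["DNO"] == e["DNO"])
    ]


print(query1_naive())
print(query1_hoisted())
print(query2())
```

Both variants of the first query select the same employees, but the naive one evaluates the subquery once per element of EMP.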
We need criteria to recognize that the inner query is independent of the outer part.

As we argued in the section devoted to variants and null values, each name occurring in a query that has to be bound should be statically relativized to the environment stack. That is, during static analysis of the query each name can be associated with two relative numbers: the size of the stack when the name is bound, and the level of the stack where the corresponding pointers have to be found. We use the term relative since (as usual in languages supporting procedure nesting and recursion) stack sections are relativized according to scoping rules: we are interested only in sections that are visible through scoping rules, and at each program point we may abstract from invisible sections. Thus, without loss of generality we can assume that the execution of a query (including queries nested in procedures) starts with the environment stack having only one section. With this assumption, the relative numbers (StackSize, BindingLevel) for the names occurring in the first query are the following:

(e ∈ EMP) where e.SAL = (EMP where NAME = "Smith").SAL

EMP     e       SAL     EMP     NAME    SAL
(1,1)   (2,2)   (3,3)   (2,1)   (3,3)   (3,3)

and for the second query

(e ∈ EMP) where e.SAL = max((EMP where DNO = e.DNO).SAL)

EMP     e       SAL     EMP     DNO     e       DNO     SAL
(1,1)   (2,2)   (3,3)   (2,1)   (3,3)   (3,2)   (4,4)   (3,3)

The reason why the subquery in the second query is not independent of the outer query is the following: the name e is bound on the 2nd level of the stack, but precisely this level is continually changed by the loop implied by the operator where of the outer query. In the first query no name occurring in the inner subquery is bound on the 2nd level, hence it is independent.

These examples illustrate the principle, which can easily be generalized. We observe that in making the above inference: (1) we are not interested in how complex the part of the outer query before and after where is; (2) we are not interested in how complex the inner query is and how it is constructed: only formal properties of bindings are essential; (3) we are not interested in which operators connect the inner query with the outer query; (4) we can repeat the inference without change if instead of where we consider projection, join, quantifiers, transitive closures, or order by.

We now formulate the method in more general terms. Assume the syntax $q_1 \, \theta \, f(q_2)$, where $q_1$ is a query and $\theta$ is one of the operators where, dot, ⋈, ∃, ∀ (for uniformity, we consider quantifiers as infix operators), closed by, order by, etc.; the semantics of these operators is based on iteratively opening a new scope on the environment stack. $q_2$ is an inner query, and $f$ represents a syntactic construct involving $q_2$.

Because we make no assumptions concerning $f$, the inner subquery $q_2$ may participate in several loops implied by the above operators. Let $\langle n_1, n_2, \ldots, n_k \rangle$ denote the sequence of names occurring in $q_2$, and let $\langle (s_1, b_1), (s_2, b_2), \ldots, (s_k, b_k) \rangle$ denote the corresponding sequence of pairs of numbers associated with these names; $s_i$ denotes the stack size when $n_i$ is bound, and $b_i$ denotes the stack level where $n_i$ is bound. Let $StackSize_{q_2}$ denote the size of the stack when the evaluation of $q_2$ starts.
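The inference made on the two example queries can be sketched as a check over the (StackSize, BindingLevel) pairs of the names in the inner subquery. This is a hedged illustration, not the LOQIS optimizer: the names Binding, names_bound_in_loop_sections and is_independent are hypothetical, and the sketch assumes that sections 2 through StackSize of the inner query are the ones replaced by enclosing loops, while section 1 is the fixed base section and higher sections are opened by the inner query itself. The pairs are copied from the two annotated queries.

```python
from typing import List, NamedTuple


class Binding(NamedTuple):
    name: str
    stack_size: int      # size of ES when the name is bound
    binding_level: int   # section of ES in which the name is bound


def names_bound_in_loop_sections(bindings: List[Binding],
                                 stack_size_q2: int) -> List[str]:
    """Names bound in sections that the enclosing loops keep replacing."""
    return [b.name for b in bindings
            if 1 < b.binding_level <= stack_size_q2]


def is_independent(bindings: List[Binding], stack_size_q2: int) -> bool:
    """True if no name of the inner query is bound in such a section."""
    return not names_bound_in_loop_sections(bindings, stack_size_q2)


# Inner subquery (EMP where NAME = "Smith").SAL of the first query.
Q1_INNER = [Binding("EMP", 2, 1), Binding("NAME", 3, 3), Binding("SAL", 3, 3)]

# Inner subquery (EMP where DNO = e.DNO).SAL of the second query.
Q2_INNER = [Binding("EMP", 2, 1), Binding("DNO", 3, 3), Binding("e", 3, 2),
            Binding("DNO", 4, 4), Binding("SAL", 3, 3)]

# Both inner subqueries start when the stack has two sections.
print(is_independent(Q1_INNER, 2))
print(names_bound_in_loop_sections(Q2_INNER, 2))
```

For the second query the check singles out the name e, which is exactly the binding that ties the inner subquery to the loop of the outer where.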
Usually $StackSize_{q_2} = \min_{1 \le i \le k} s_i$ holds (but in general $q_2$ may contain no names).

The inner query $q_2$ is totally independent of the outer query and may be evaluated in advance (or, lazily, once when needed) if:
- it does not contain data names, or
- for each $b_i$ it holds that $b_i = 1$ or $b_i > StackSize_{q_2}$.

That is, the inner query is totally independent of the outer query if no name occurring in the inner query is bound in a section of ES that can be changed by the operator $\theta$ or by similar operators occurring in $f$.

Consider another example ("For each department give the number of employees earning more than their manager"):

DEPT ⋈ count(EMPLOYS.EMP where (SAL > MANAGER.EMP.SAL))

Intuitively, the nested subquery MANAGER.EMP.SAL is independent of the where loop processing EMP objects, but it depends on the external ⋈ loop processing DEPT objects. Thus the subquery is partly independent: it can be evaluated once in the external loop and need not be evaluated inside the internal loop. As before, partial independence can be expressed in general terms. Evaluation of $q_2$ can be shifted outside $j$ internal nested loops if for each $b_i$ it holds that:

$b_i \le (StackSize_{q_2} - j)$ or $b_i > StackSize_{q_2}$

The partial independence of queries generalizes the well-known rule "do selections before joins". Indeed, consider the query

DEPT ⋈ (EMPLOYS.EMP where "Paris" in LOC and JOB = "clerk")

The inner subquery "Paris" in LOC is partially independent and can be moved out of the internal where loop. Thus it will participate only in the loop over DEPT objects. Taking into account the additivity of the ⋈ operator with respect to its first argument, we can transform the query to a more optimal form:

(DEPT where ("Paris" in LOC)) ⋈ (EMPLOYS.EMP where JOB = "clerk")

We can also show that the decomposition method for conjunctive queries [WoYo76] can be generalized in our framework through the concept of the partial independence of subqueries.

7 Conclusion

In this paper we have presented an approach to query languages based on a modification of concepts known from programming languages. We believe that the approach makes it possible to achieve a proper level of pragmatic universality and precision in the specification of semantics. Although our presentation is semi-formal, it is easy to see that the approach makes it possible to build powerful mathematical models. We have shown that the classical stack-based mechanism of programming languages can be modified in order to process declarative queries. Various query language concepts and operators are formally defined by simple stack operations, without referring to theoretical frameworks such as relational algebra, calculus or logic. This allows us to avoid some limitations implied by these frameworks. The proposed approach allows powerful query languages to be built for a variety of data models, in particular for relational and object-oriented models. Following a strong definitional discipline of as-few-concepts-as-possible, we have achieved a level of uniformity which creates a new potential for query optimization.
The presented approach has been implemented in the system LOQIS, and this experience is quite encouraging.

References

[AhUl79] A.V. Aho, J.D. Ullman. Universality of Data Retrieval Languages. Proc. of 6th ACM Symposium on Principles of Programming Languages, San Antonio, TX, Jan. 1979, ACM New York, pp. 110-117.
[ACO85] A. Albano, L. Cardelli, R. Orsini. Galileo: A Strongly-Typed, Interactive Conceptual Language. ACM Transactions on Database Systems, Vol. 10, No. 2, 1985, pp. 230-260.

[ABDD+89] M. Atkinson, F. Bancilhon, D. DeWitt, K. Dittrich, D. Maier, and S. Zdonik. The Object-Oriented Database System Manifesto. Proc. 1st DOOD Conf., Kyoto, pp. 40-57, 1989.
[AtBu87] M.P. Atkinson, O.P. Buneman. Types and Persistence in Database Programming Languages. ACM Computing Surveys, Vol. 19, No. 2, pp. 105-190, 1987.
[BBKV87] F. Bancilhon, T. Briggs, S. Khoshafian, and P. Valduriez. FAD, a Powerful and Simple Database Language. Proc. 13th VLDB Conf., Brighton, pp. 97-105, 1987.
[Beer89] C. Beeri. Formal Models for Object-Oriented Databases. Proc. 1st DOOD Conf., Kyoto, pp. 370-395, 1989.
[Card89] L. Cardelli. Typeful Programming. DIGITAL Systems Research Center, Palo Alto, Report No. 45, May 1989.
[CDV88] M.J. Carey, D.J. DeWitt, S.L. Vandenberg. A Data Model and Query Language for EXODUS. Proc. ACM SIGMOD Annual Conf., pp. 413-423, 1988.
[ClDe92] S. Cluet, C. Delobel. A General Framework for the Optimization of Object-Oriented Queries. Proc. ACM SIGMOD Conf., pp. 383-392, 1992.
[CODA71] CODASYL Database Task Group Report, ACM, New York, 1971.
[Codd79] E.F. Codd. Extending Database Relations to Capture More Meaning. ACM Transactions on Database Systems, Vol. 4, No. 4, 1979, pp. 397-434.
[Cruz89] I.F. Cruz. Declarative Query Languages for Object-Oriented Databases. Office and Data Base Systems Research '89 (Ed. F.H. Lochovsky), Technical Report CSRI-238, Computer Systems Research Institute, University of Toronto, pp. 92-130, June 1990.
[CMW87] I.F. Cruz, A.O. Mendelzon, P.T. Wood. A Graphical Query Language Supporting Recursion. Proc. ACM SIGMOD Conf., pp. 323-330, 1987.
[Daya89] U. Dayal. Queries and Views in an Object-Oriented Data Model. Proc. of 2nd DBPL Workshop, Gleneden Beach, Oregon, pp. 80-102, 1989.
[Deux+90] O. Deux et al. The Story of O2. IEEE Transactions on Knowledge and Data Engineering, 2:1, pp. 91-108, 1990.
[ERMS91] J. Eder, A. Rudloff, F. Matthes, J.W. Schmidt. Data Construction with Recursive Set Expressions. Next Generation Information System Technology. Proc. of 1st East/West Database Workshop, Kiev, USSR, Oct. 1990, Springer Lecture Notes in Computer Science 504, 1991, pp. 271-293.
[GrDe87] G. Graefe, D.J. DeWitt. The EXODUS Optimizer Generator. Proc. of ACM SIGMOD 87 Conf., 1987, pp. 160-172.

[HFLP89] L.M. Haas, J.C. Freytag, G.M. Lohman, H. Pirahesh. Extensible Query Processing in Starburst. Proc. ACM SIGMOD Conf., pp. 377-388, 1989.
[Ingr89] Using INGRES Through Forms and Menus for the UNIX and VMS Operating Systems. INGRES Release 6, Relational Technology, June 1989.
[Ingr90] Language Reference Manual for INGRES/Windows 4GL for the UNIX and VMS Operating Systems. INGRES Release 6, Ingres Corporation, August 1990.
[Kim82] W. Kim. On Optimizing an SQL-like Nested Query. ACM Transactions on Database Systems, Vol. 7, No. 3, 1982, pp. 443-469.
[Kim89] W. Kim. A Model of Queries for Object-Oriented Databases. Proc. of 15th VLDB Conf., Amsterdam, The Netherlands, pp. 423-432, 1989.
[KKS92] M. Kifer, W. Kim, Y. Sagiv. Querying Object-Oriented Databases. Proc. ACM SIGMOD Conf., pp. 393-402, 1992.
[KGBW90] W. Kim, J.F. Garza, N. Ballou, D. Woelk. Architecture of the ORION Next-Generation Database System. IEEE Transactions on Knowledge and Data Engineering, Vol. 2, No. 1, 1990, pp. 109-124.
[Mant91] R. Manthey. Declarative Languages - Paradigm of the Past or Challenge of the Future? Proc. 1st Intl. East/West Database Workshop on Next Generation Information System Technology, Kiev, USSR, 1990, Springer Lecture Notes in Computer Science, Vol. 504, pp. 1-16, 1991.
[MRSS92a] F. Matthes, A. Rudloff, J.W. Schmidt, K. Subieta. The Database Programming Language DBPL, User and System Manual. FIDE, ESPRIT BRA Project 3070, Technical Report Series, FIDE/92/47, 1992.
[MBCD89] R. Morrison, F. Brown, R. Connor, A. Dearle. The Napier88 Reference Manual. Universities of St Andrews and Glasgow, Departments of Comp. Science, Persistent Programming Report 77, July 1989.
[MBW80] J. Mylopoulos, P.A. Bernstein, H.K.T. Wong. A Language Facility for Designing Database-Intensive Applications. ACM Transactions on Database Systems, Vol. 5, No. 2, 1980, pp. 185-207.
[O2Ma92] The O2 User Manual, Version 4.1. O2 Technology, Versailles, France, October 1992.
[OBB89] A. Ohori, P. Buneman, V. Breazu-Tannen. Database Programming in Machiavelli - a Polymorphic Language with Static Type Inference. Proc. of ACM SIGMOD 89 Conf., 1989, pp. 46-57.
[Orac91] PL/SQL, User Guide and Reference, Version 1.0, June 1991. Oracle Corporation, 1991.

[Ott92] N. Ott. Aspects of the Automatic Generation of SQL Statements in a Natural Language Query Interface. Information Systems 17, 2, pp. 147-159, 1992.
[PPT91] J. Paredaens, P. Peelman, L. Tanca. G-Log: A Declarative Graphical Query Language. Proc. 2nd Intl. Conf. on Deductive and Object-Oriented Databases, Munich, Germany. Springer LNCS 566, pp. 108-128, 1991.
[RiSc91] J. Richardson, P. Schwarz. Aspects: Extending Objects to Support Multiple, Independent Roles. Proc. of ACM SIGMOD 91 Conf., 1991, pp. 298-307.
[Schm77] J.W. Schmidt. Some high level language constructs for data of type relation. ACM Transactions on Database Systems, Vol. 2, No. 3, 1977, pp. 247-261.
[ScMa92] J.W. Schmidt, F. Matthes. The Database Programming Language DBPL, Rationale and Report. FIDE, ESPRIT BRA Project 3070, Technical Report Series, FIDE/92/46, 1992.
[SFL81] J.M. Smith, S. Fox, T. Landers. Reference manual for ADAPLEX. Technical Report CCA-81-02, Computer Corporation of America, 1981.
[SRH90] M. Stonebraker, L.A. Rowe, and M. Hirohama. The Implementation of POSTGRES. IEEE Transactions on Knowledge and Data Engineering, 2:1, pp. 125-142, 1990.
[SRLG+90] M. Stonebraker, L.A. Rowe, B. Lindsay, J. Gray, M. Carey, M. Brodie, P. Bernstein, D. Beech: The Committee for Advanced DBMS Function. Third-Generation Data Base System Manifesto. ACM SIGMOD Record 19(3), pp. 31-44, 1990.
[Subi85] K. Subieta. Semantics of Query Languages for Network Databases. ACM Transactions on Database Systems, 10:3, pp. 347-394, 1985.
[SuMi86] K. Subieta, M. Missala. Semantics of query languages for the Entity-Relationship Model. Proc. 5th Conf. on Entity-Relationship Approach, Dijon, France, pp. 197-216, 1986.
[SuRz87] K. Subieta and W. Rzeczkowski. Query Optimization by Stored Queries. Proc. 13th VLDB Conf., Brighton, England, 1987, pp. 369-380.
[SMA90] K. Subieta, M. Missala, and K. Anacki. The LOQIS System. Institute of Computer Science, Polish Academy of Sciences, Report 695, 1990.
[Subi91] K. Subieta. LOQIS: The Object-Oriented Database Programming System. Proc. 1st Intl. East/West Database Workshop on Next Generation Information System Technology, Kiev, USSR, 1990, Springer Lecture Notes in Computer Science, Vol. 504, pp. 403-421, 1991.

[SMSRW93] K. Subieta, F. Matthes, J.W. Schmidt, A. Rudloff, I. Wetzel. Viewers: A Data-World Analogue of Procedure Calls. Proc. 19th VLDB Conf., Dublin, Ireland, 1993, pp. 269-277.
[Toya86] M. Toyama. Parameterized view definitions and recursive relations. Proc. of Conf. on Data Engineering, Los Angeles, IEEE Computer Society, pp. 707-712, 1986.
[Ullm91] J.D. Ullman. A Comparison of Deductive and Object-Oriented Database Systems. Proc. 2nd Intl. Conf. on Deductive and Object-Oriented Databases, Munich, Germany. Springer LNCS 566, pp. 263-277, 1991.
[WaGo84] W.M. Waite, G. Goos. Compiler Construction. Springer, 1984.
[WKS89] W. Wilkes, P. Klahold, and G. Schlageter. Complex and composite objects in CAD/CAM databases. Proc. 5th Conf. on Data Engineering, Los Angeles, California, pp. 443-450, 1989.
[WLH90] K. Wilkinson, P. Lyngbæk, W. Hasan. The Iris Architecture and Implementation. IEEE Transactions on Knowledge and Data Engineering, Vol. 2, No. 1, 1990, pp. 63-75.
[WoYo76] E. Wong, K. Youssefi. Decomposition - A Strategy for Query Processing. ACM Transactions on Database Systems, Vol. 1, No. 3, 1976, pp. 223-241.
[Zani83] C. Zaniolo. The Database Language GEM. Proc. ACM SIGMOD Conf., pp. 423-434, 1983.
[ZhMe83] Z.G. Zhang, A.O. Mendelzon. A Graphical Query Language for Entity-Relationship Databases. In: Entity-Relationship Approach to Software Engineering. North-Holland, Amsterdam, 1983.