Composite regions in topological queries


Pergamon Information Systems Vol. 20, No. 7, pp. 579-594, 1995

Copyright © 1995 Elsevier Science Ltd. Printed in Great Britain. All rights reserved

0306-4379(95)00031-3 0306-4379/95-$9.50 + 0.00

COMPOSITE REGIONS IN TOPOLOGICAL QUERIES*

ELISEO CLEMENTINI, PAOLINO DI FELICE and GIANLUCA CALIFANO

Department of Electrical Engineering, University of L’Aquila, 67040 Poggio di Roio, L’Aquila, Italy

(Received 31 March 1994; in final revised form 7 July 1995)

Abstract - Spatial data are at the core of many scientific information systems. The design of suitable query languages for spatial data retrieval and analysis is still an open research issue. The primary requirement of these languages is to support spatial operators. Unfortunately, current systems support only simplified abstractions of geographic objects based on simple regions, which are usually not sufficient to deal with the complexity of geographic reality. Composite regions, which are regions made up of several components, are necessary to overcome those limits. The paper introduces a two-level formal model suitable for representing topological relationships among composite regions. The contribution gives the formal background needed for adding composite regions to a spatial query language, with the purpose of answering topological queries on complex geographic objects.

Key words: Composite Regions, Topological Relationships, Spatial Query Languages, Extensible Database Systems

1. INTRODUCTION

The interest of the scientific community in systems that are able to manage large amounts of spatial data has grown over the years. Spatial data have a crucial role in many scientific applications. In such systems a primary activity is spatial data analysis and processing [15]. Integration between Geographical Information Systems (GIS) and spatial analysis can take the form of a language whose primitive elements represent the fundamental operations of spatial analysis [11]. Spatial relationships among geographic objects must be supported by spatial query languages. There is considerable evidence that people think about geographic space in terms of the objects in it and the relationships among them, rather than in terms of systematic coordinates [16]. Researchers have assessed the importance and the difficulty of mapping the linguistic description of geographic facts into a digital database that contains geographic objects and relationships.

The main body of the literature on spatial relationships deals with simple regions and lines [4, 5, 7, 8, 12, 13, 14]. Simple regions are two-dimensional point sets homeomorphic to a disk and simple lines are one-dimensional features embedded in the plane with only two end-points. The variety and complexity of geographic entities can hardly be modeled with simple geometric features. With regard to regions, the two main extensions are towards separations of the exterior (holes) and separations of the interior (multiple components). Both extensions are common in geography. Countries, for example, are made up of separations (islands, exclaves, external territories) and holes (enclaves).

In this paper, regions with multiple components are taken into account. So far, composite regions have been treated only marginally in the spatial data modeling literature. In current GIS, composite regions are not directly implemented as data types. In the GEO++ system [20], three spatial data types are available: points, polylines, and polygons. In ARC/INFO [9], composite regions do not exist as single entities. An extension for supporting them has been planned in the forthcoming version 7.0 [19], although composite regions are seen as a way of overcoming the limitations of single coverage GIS. Ideally, composite regions are handled more naturally in object-centered spatial databases, where complex objects are not spread through several layers but can be represented directly in the data model.

In [19], the importance of having a new feature class is recognized, providing the capability to manage non-contiguous areas with identical attributes as a single region. Spatial relationships embedded in a database query language must apply to composite regions, providing the tools to answer spatial queries such as: “Is the cloud covering Italy?” or “Which are the states bordering on Michigan state?”. Notice that Italy has two big islands and that the state of Michigan is composed of two parts. In a query such as “In which country is Anchorage?”, the answer is “in the USA”, considering the USA as a single entity and without specifying in which land separation the city is.

*Recommended by Maurizio Lenzerini

In the paper, we restrict our attention to topological relationships. Topology is important because non-topological transformations do not preserve any of the original geometry, while changes preserving topological invariants are still easily recognizable by people. Topological relationships are preserved under translation, rotation, and zooming. It is difficult to learn non-topological changes [1].

The remainder of this paper is organized as follows. In Section 2, geometric definitions for composite regions are given (the spatial model) in terms of point-set topology. In Section 3, we briefly recall the Calculus-Based Method (CBM) for topological relations between simple regions [4]. Such a method considers a mutually exclusive and complete set of topological relationships suitable as a basis for spatial reasoning and for being integrated effectively in a spatial query language.

In Section 4, we propose the Topological Relationships for Composite Regions (TRCR) model. Such a model is structured in two levels of description: the coarse level and the detailed level. At the coarse level, the same four relationships of the CBM can be stated for composite regions without changing the point-set definition; only the boundary operator needs to be redefined. These relationships treat composite regions as a whole and therefore give no information about the specific topological relationship holding between each pair of components. To get a detailed description of the topological configuration involving two composite regions, two operators are necessary to extract single components and count the number of components. To link the two levels of description, rules for mapping the detailed relationships to the coarse ones are given.

While at the data model level it is worthwhile to have powerful abstraction primitives of the reality, at the query processing level the primary need is efficiency. The approach taken in this paper, and reflected in the TRCR model, satisfies both these needs. Specifically, the TRCR model allows us to refer to composite regions as a whole instead of being forced to think in terms of their components, while the processing of topological queries is managed by relying on classical algorithms of computational geometry among simple regions. In Section 5, we illustrate the algorithms to assess the topological relationships among composite regions and describe the implementation of the theory in the GIS prototype GEO++ [20], based on the POSTGRES extensible database system [18]. The composite regions are implemented as a new geometric data type in POSTGRES, while the TRCR relationships and operators are implemented as POSTGRES functions. GEO++ acts as a front-end for the visualization of queries. Section 6 presents a small set of topological queries concerning countries and lakes. Concluding remarks close the paper.

2. THE SPATIAL MODEL

The aim of this section is to introduce some basic concepts of point-set topology related to the definition of geometric two-dimensional objects, which are the usual model for representing geographic entities. The definition of geometric objects is purely topological since we disregard all the shape and metric properties and concentrate on the study of topological relationships.

Point-set topology deals with open and closed sets. We use the concepts of continuity, boundary, interior, closure, and dimension that are defined in terms of the neighborhood relation [17]. If $A$ is a point-set, then $\partial A$, $A^\circ$, $\overline{A}$, and $\dim(A)$ denote the boundary, the interior, the closure, and the dimension of $A$, respectively.

The study of topological relationships between objects also depends on the embedding space, which we assume to be $\mathbb{R}^2$. All kinds of complex geometric objects may be considered as extensions of simple ones, which can be defined as follows:

(a) simple regions are closed (non-empty) connected two-dimensional point-sets with no holes, such that they are equal to the closure of their interior (i.e., $A = \overline{A^\circ}$);

(b) simple lines are closed connected one-dimensional point-sets embedded in $\mathbb{R}^2$ with no self-intersections and with only two end-points;

(c) points are zero-dimensional sets consisting of only one element of $\mathbb{R}^2$.

Fig. 1: A composite region made up of four components.

Complex regions are far more common than simple ones in real applications. In the following, we remove the constraint that the region be connected, thereby building the concept of composite region.

The notion of composite region relies on the topological notions of connectedness and component [17]:

• If $Y \subseteq \mathbb{R}^2$, a separation of $Y$ is a pair of disjoint non-empty open sets $A$ and $B$ whose union is $Y$. $Y$ is connected if there exists no separation of $Y$; otherwise it is disconnected.

• Given $Y \subseteq \mathbb{R}^2$, let us define an equivalence relation on $Y$ by setting a point $x$ equivalent to a point $y$ if there is a connected subset of $Y$ containing both $x$ and $y$. The equivalence classes are called the components of $Y$. From the notion of equivalence class it follows that the components of $Y$ are connected disjoint subsets of $Y$ whose union is $Y$.

Separations and components refer to interiors of regions, while geographic objects are usually represented by closed sets, so that a region contains its boundary. The following definition holds:

Definition 1 A composite region is a closed two-dimensional subset $A$ of $\mathbb{R}^2$ such that, if we define the components $A_1 \ldots A_n$ of $A$ as the closures of the corresponding components of $A^\circ$, then:

1. each $A_i$ is a simple region;

2. $A_i^\circ \cap A_j^\circ = \emptyset, \forall i \neq j$;

3. $\partial A_i \cap \partial A_j = \emptyset$ or equal to a finite set of points $\{p_1 \ldots p_k\}$.

In Figure 1, there is an example of a composite region made up of four components $A_1 \ldots A_4$. The definition allows the boundaries of components to be connected at some points (but not along an edge). Such a constraint preserves the property $A = \overline{A^\circ}$ usually assumed in other papers [6, 21].

Let $CA$ indicate the difference set between the embedding space $\mathbb{R}^2$ and the composite region $A$ ($CA = \mathbb{R}^2 - A$). The definitions for boundary and interior of composite regions (the same as those for simple regions) are:

• the boundary $\partial A$ of a composite region $A$ is defined as: $\partial A = \overline{A} \cap \overline{CA}$. Since $A$ is a closed set, $\partial A = A \cap \overline{CA}$;

• the interior $A^\circ$ of $A$ is defined by difference: $A^\circ = \overline{A} - \partial A = A - \partial A$.

The boundary of each component of a composite region is a closed curve homeomorphic to a 1-sphere; therefore, the boundary of a composite region is made up of the union of $n$ such curves, which can be disjoint or touching at some points.


3. THE CALCULUS-BASED METHOD

In [4], the authors introduced a model (the Calculus-Based Method - CBM) for classifying topological relationships among simple regions. The basic idea underlying such a method is to provide a grouping of topological relationships under appropriate names in order to cover all topological configurations. The relationships are five (touch, in, cross, overlap, and disjoint) and they apply to simple points, lines and regions. Additional granularity is provided by two boundary operators, which combined with the relationships are able to distinguish among a higher number of topological cases [3]. Only four of the relationships above apply to simple regions, namely touch, in, overlap, and disjoint, together with a boundary operator (b); their definitions are recalled below:

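Definition 2 The topological relationships for simple regions are:

• touch: $(A, \mathit{touch}, B) \Leftrightarrow (A^\circ \cap B^\circ = \emptyset) \land (A \cap B \neq \emptyset)$ (1)

• in: $(A, \mathit{in}, B) \Leftrightarrow (A \cap B = A) \land (A^\circ \cap B^\circ \neq \emptyset)$ (2)

• overlap: $(A, \mathit{overlap}, B) \Leftrightarrow (\dim(A^\circ) = \dim(B^\circ) = \dim(A^\circ \cap B^\circ)) \land (A \cap B \neq A) \land (A \cap B \neq B)$ (3)

• disjoint: $(A, \mathit{disjoint}, B) \Leftrightarrow A \cap B = \emptyset$ (4)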
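As an illustration of Definition 2, the sketch below evaluates the four relationships for the special case of closed axis-aligned rectangles, where the point-set conditions reduce to coordinate comparisons. The `Rect` type, the function name `relate`, and the extra labels "contains" (B in A but not A in B) and "equal" (both A in B and B in A) are assumptions introduced here for illustration; they are not part of the CBM or of the paper's implementation.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Closed axis-aligned rectangle; assumes xmin < xmax and ymin < ymax."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def relate(a: Rect, b: Rect) -> str:
    """Classify the relationship of rectangle a to rectangle b."""
    # The closed sets share at least one point (A ∩ B is non-empty).
    closed_meet = (a.xmin <= b.xmax and b.xmin <= a.xmax and
                   a.ymin <= b.ymax and b.ymin <= a.ymax)
    if not closed_meet:
        return "disjoint"
    # The interiors share a point only when the inequalities are strict.
    interiors_meet = (a.xmin < b.xmax and b.xmin < a.xmax and
                      a.ymin < b.ymax and b.ymin < a.ymax)
    if not interiors_meet:
        return "touch"
    # A ∩ B = A holds exactly when a lies inside b.
    a_in_b = (b.xmin <= a.xmin and a.xmax <= b.xmax and
              b.ymin <= a.ymin and a.ymax <= b.ymax)
    b_in_a = (a.xmin <= b.xmin and b.xmax <= a.xmax and
              a.ymin <= b.ymin and b.ymax <= a.ymax)
    if a_in_b and b_in_a:
        return "equal"      # both (A, in, B) and (B, in, A) hold
    if a_in_b:
        return "in"
    if b_in_a:
        return "contains"   # hypothetical label for the inverse of in
    return "overlap"
```

For general polygons the same case analysis applies, but the intersection tests would have to be replaced by polygon algorithms from computational geometry.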

Definition 3 The boundary operator for a simple region $A$ returns the closed curve $\partial A$:

$(A, b) = \partial A$ (5)

When boundaries are involved in a relationship, the cross relationship also needs to be considered. Its definition is:

$(A, \mathit{cross}, B) \Leftrightarrow (\dim(A^\circ \cap B^\circ) = \max(\dim(A^\circ), \dim(B^\circ)) - 1) \land (A \cap B \neq A) \land (A \cap B \neq B)$ (6)

Figure 2 gives a pictorial representation of the relationships of the CBM for simple regions. Some cases require the application of the b operator to give conditions on boundaries:

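1. $(A, \mathit{in}, B) \land ((A, b), \mathit{overlap}, (B, b))$

2. $(A, \mathit{in}, B) \land ((A, b), \mathit{cross}, (B, b))$

3. $(A, \mathit{in}, B) \land ((A, b), \mathit{disjoint}, (B, b))$

4. $(A, \mathit{touch}, B) \land ((A, b), \mathit{cross}, (B, b))$

5. $(A, \mathit{touch}, B) \land ((A, b), \mathit{overlap}, (B, b))$

6. $(A, \mathit{disjoint}, B)$

7. $(A, \mathit{overlap}, B) \land ((A, b), \mathit{cross}, (B, b))$

8. $(A, \mathit{overlap}, B) \land ((A, b), \mathit{overlap}, (B, b))$

9. $(A, \mathit{in}, B) \land (B, \mathit{in}, A)$.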

Fig. 2: The topological cases for simple regions identified by the CBM.

4. THE TRCR MODEL

From a cognitive science perspective, it is recognized that people use a conceptual abstraction of space at different levels of detail, organizing spatial knowledge in a hierarchical manner [2]. It is essential to understand how such hierarchies are structured in order to enhance GIS development and use [10]. Hierarchies are efficient for representing spatial knowledge and they are well suited for inference making. With regard to topology, it is reasonable to start from a small set of topological relationships to give broad categories of topological configurations; if more detail is needed, additional operators are used to go deeper into the hierarchy.

Below, we introduce the TRCR model suitable for describing topological relationships among composite regions at two levels of granularity called, respectively, coarse level and detailed level. At the coarse level, the general relationship between two composite regions is given, while at the detailed level the single components are taken into account.

A set of rules suitable for linking the two levels of description is also given; specifically, the rules clarify how the detailed relationships map to the coarse ones. Those rules constitute the formal basis for the processing of topological queries among composite regions.

4.1. The Two Levels

First, we introduce the coarse level and later the detailed one. At the coarse level the relationship between two composite regions can be given in terms of point-set topology as for the case of simple regions (same names and definitions of Section 3 for in, overlap, touch, and disjoint), while the boundary operator needs to be redefined. For simple regions the boundary operator returns a simple closed line homeomorphic to a 1-sphere (Definition 3). When dealing with a composite region made up of $n$ components, the boundary is made up of $n$ closed curves that may be disjoint or touch at several points. Each such closed curve is the boundary of a simple region.

Definition 4 The boundary operator for a composite region $A$ with $n$ components $A_1 \ldots A_n$ returns the following point-set:

$(A, b) = \bigcup_{i=1}^{n} \partial A_i$ (7)

Table 1: The matrix describing the topological scene for the composite regions in Figure 3 (rows and columns: A, A1, A2, B, B1, B2).

At the detailed level of description, besides using the topological relationships in, overlap, touch, and disjoint valid for simple regions, two further operators are needed to extract single components from a composite region and count the number of separate components. The TRCR model provides the operators i and N to achieve the goal. They are defined as follows.

Definition 5 Given a composite region $A$ with $n$ components $A_1 \ldots A_n$ and the index $i$ defined over the range $1 \ldots n$, the $i$ operator returns the $i$-th component of $A$:

$(A, i) = A_i$ (8)

Definition 6 Given a composite region $A$, the $N$ operator returns the number of components of $A$:

$(A, N) = n$ (9)
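The two operators of Definitions 5 and 6 can be sketched over a simple in-memory container. The class name `CompositeRegion`, the method names, and the encoding of a component as a list of vertex coordinates are assumptions made for this illustration only; the index is 1-based, following the range given in Definition 5.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed encoding: one simple region is a ring of (x, y) vertices.
Polygon = List[Tuple[float, float]]


@dataclass
class CompositeRegion:
    """A composite region stored as the list of its components."""
    components: List[Polygon]

    def n(self) -> int:
        """(A, N): the number of components of A."""
        return len(self.components)

    def component(self, i: int) -> Polygon:
        """(A, i): the i-th component of A, with i in 1..n."""
        if not 1 <= i <= self.n():
            raise IndexError(f"component index {i} outside 1..{self.n()}")
        return self.components[i - 1]
```

Iterating `i` from 1 to `region.n()` and calling `region.component(i)` enumerates every component once.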

All the topological relationships and operators that are part of the TRCR model are intended to have a counterpart at the query language level. Section 5 discusses the implementation of the model.

4.2. Mapping Between the Two Levels

Given a scene of two composite regions, we may be interested in the topological relationship between the two composite regions, in the topological relationship holding between specific pairs of their components, or in both. From the theoretical point of view, these two cases can be treated independently of each other, and the TRCR model tells us how to solve each of them separately.

However, at the query processing level a satisfactory way of computing the topological relationship between two composite regions is to divide the problem into sub-problems concerning their components, basically because of the availability of efficient geometric algorithms (e.g., the point-in-polygon algorithm) that compute the topological relationships in, overlap, touch, and disjoint in the case of simple regions.

In this section, we introduce a set of rules suitable for deriving the coarse relationship between two composite regions in terms of the relationships holding among their components.

Given two composite regions $A$ and $B$, having the components $A_1 \ldots A_n$ and $B_1 \ldots B_m$ respectively, the topological relationship between $A$ and $B$ is fully described by the relationships that link a set of $m + n + 2$ regions ($m + n$ simple and two composite). The number of such relationships is $\mu^2$, with $\mu = m + n + 2$. The whole topological scene can be represented by means of a matrix in which the generic element in position $(i, j)$ gives the relationship between the $i$-th row's region and the $j$-th column's region. Table 1 gives the matrix describing the configuration of Figure 3. Short notations are used to indicate relationships (O=overlap, I=in, T=touch, D=disjoint). Relationships overlap, touch, and disjoint are symmetric, while the in relationship is not symmetric. In the matrix, the inverse of the in relationship is indicated with J. If both in and its inverse are satisfied, the corresponding entry in the matrix is indicated with E.

Fig. 3: A topological configuration between two composite regions (<A, overlap, B>).

Table 2: The reduced topology matrix for the composite regions in Figure 3.

The number of topological relationships necessary for describing a scene is less than $\mu^2$, as stated by the equation:

$\nu = n \times m + \binom{n}{2} + \binom{m}{2}$ (10)

where $n \times m$ is the number of relationships between A's components and B's components, $\binom{n}{2}$ the number of relationships among A's components, and $\binom{m}{2}$ the number of relationships among B's components.

The reduction from $\mu^2$ to $\nu$ is due to the symmetry and reflexivity of relationships and, also, to the fact that the relationship between a region and each of its components is always J. Other relationships that can be inferred are the relationships between a composite region and the components of the other composite region; the coarse relationship between $A$ and $B$ can also be derived from the relationships between the components (through the forthcoming rules).
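To make the size of the reduction concrete, the following sketch evaluates both counts for given numbers of components. The function names are hypothetical and chosen only for this example.

```python
from math import comb


def full_matrix_size(n: int, m: int) -> int:
    """Entries of the full matrix: mu^2 with mu = m + n + 2."""
    mu = m + n + 2
    return mu * mu


def needed_relationships(n: int, m: int) -> int:
    """Equation (10): nu = n*m + C(n, 2) + C(m, 2)."""
    return n * m + comb(n, 2) + comb(m, 2)
```

For two regions with two components each, `full_matrix_size(2, 2)` gives 36 entries while `needed_relationships(2, 2)` gives 6, of which only the 4 component-to-component entries between the two regions form the reduced topology matrix.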

We call reduced topology matrix (RTM) the matrix containing the $n \times m$ relationships between the components of the composite regions (see the example in Table 2). Such relationships are those used to infer the relationships at the coarse level (see also the algorithms in Section 5). The rules mentioned previously are now introduced. Each of them is illustrated in Figure 4.

Rule 1 (in) A composite region $A$ is contained in a composite region $B$ if and only if each component of $A$ is contained in some component of $B$, i.e.:

$(A, \mathit{in}, B) \Leftrightarrow \forall i \in 1 \ldots n, \exists j \in 1 \ldots m \mid (A_i, \mathit{in}, B_j)$ (11)

Rule 2 (overlap) An overlap between two composite regions holds in one of the following cases:

• if there is an overlap between at least two components:

$(A, \mathit{overlap}, B) \Leftarrow \exists i \in 1 \ldots n, \exists j \in 1 \ldots m \mid (A_i, \mathit{overlap}, B_j)$ (12)

• if a component of $A$ is in a relation touch or disjoint with all of B's components and a component of $B$ is in a relation touch or disjoint with all of A's components and, at the same time, there is a relation in between a component of $A$ and a component of $B$:

$(A, \mathit{overlap}, B) \Leftarrow \exists i \in 1 \ldots n, \exists j \in 1 \ldots m \mid ((A_i, \mathit{in}, B_j) \lor (B_j, \mathit{in}, A_i)) \land$
$\exists r \in 1 \ldots n, r \neq i \mid \forall s \in 1 \ldots m, ((A_r, \mathit{touch}, B_s) \lor (A_r, \mathit{disjoint}, B_s)) \land$
$\exists s \in 1 \ldots m, s \neq j \mid \forall r \in 1 \ldots n, ((A_r, \mathit{touch}, B_s) \lor (A_r, \mathit{disjoint}, B_s))$ (13)

• if a component of $A$ is in the relation in with a component of $B$ and, at the same time, another component of $B$ is in the relation in with a component of $A$:

$(A, \mathit{overlap}, B) \Leftarrow \exists i \in 1 \ldots n, \exists j \in 1 \ldots m \mid (A_i, \mathit{in}, B_j) \land \lnot (B_j, \mathit{in}, A_i) \land$
$\exists r \in 1 \ldots n, r \neq i, \exists s \in 1 \ldots m, s \neq j \mid (B_s, \mathit{in}, A_r) \land \lnot (A_r, \mathit{in}, B_s)$ (14)

Fig. 4: The four relationships of the coarse level (panel labels: <A, in, B>, <A, touch, B>, <A, disjoint, B>, <A, overlap, B>).

Rule 3 (touch) Two composite regions touch each other if there exists at least one touch relationship between their components, while the remaining relationships, if any, are disjoint, i.e.:

$(A, \mathit{touch}, B) \Leftarrow \exists i \in 1 \ldots n, \exists j \in 1 \ldots m \mid \forall r \in 1 \ldots n, \forall s \in 1 \ldots m, (A_i, \mathit{touch}, B_j) \land ((A_r, \mathit{disjoint}, B_s) \lor (A_r, \mathit{touch}, B_s))$ (15)

Rule 4 (disjoint) Two composite regions are disjoint if each component of $A$ is disjoint from all of B's components, i.e.:

$(A, \mathit{disjoint}, B) \Leftarrow \forall i \in 1 \ldots n, \forall j \in 1 \ldots m, (A_i, \mathit{disjoint}, B_j)$ (16)
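The four rules can be read as predicates over a reduced topology matrix. The sketch below is a direct, unoptimized transcription of Equations (11) to (16), assuming the RTM is given as a list of rows of one-letter codes (O, I, J, E, T, D) with row i describing component A_(i+1); the function names and this encoding are assumptions of the example, not the paper's implementation.

```python
from typing import List

RTM = List[List[str]]  # rtm[i][j]: code relating A_(i+1) to B_(j+1)


def coarse_in(rtm: RTM) -> bool:
    """Rule 1: every component of A is in some component of B."""
    return all(any(c in {"I", "E"} for c in row) for row in rtm)


def coarse_disjoint(rtm: RTM) -> bool:
    """Rule 4: every pair of components is disjoint."""
    return all(c == "D" for row in rtm for c in row)


def coarse_touch(rtm: RTM) -> bool:
    """Rule 3: at least one touch, all other pairs touch or disjoint."""
    cells = [c for row in rtm for c in row]
    return "T" in cells and all(c in {"T", "D"} for c in cells)


def coarse_overlap(rtm: RTM) -> bool:
    """Rule 2: any of the three cases of Equations (12), (13), (14)."""
    n, m = len(rtm), len(rtm[0])
    cells = [(i, j) for i in range(n) for j in range(m)]
    # Equation (12): some pair of components overlaps.
    if any(rtm[i][j] == "O" for i, j in cells):
        return True
    # Equation (14): a strict "in" at (i, j) and a strict inverse "in"
    # at (r, s), with r != i and s != j.
    if any(rtm[i][j] == "I" and rtm[r][s] == "J"
           for i, j in cells for r, s in cells if r != i and s != j):
        return True
    # Equation (13): an in/inverse-in pair plus one other row and one
    # other column made only of touch or disjoint entries.
    row_td = [all(c in {"T", "D"} for c in rtm[r]) for r in range(n)]
    col_td = [all(rtm[r][s] in {"T", "D"} for r in range(n))
              for s in range(m)]
    return any(rtm[i][j] in {"I", "J", "E"}
               and any(row_td[r] for r in range(n) if r != i)
               and any(col_td[s] for s in range(m) if s != j)
               for i, j in cells)
```

For instance, the hypothetical matrix `[["I", "D"], ["D", "D"]]` satisfies `coarse_overlap` through Equation (13): A1 lies in B1, while A2 and B2 are disjoint from every component of the other region.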

Figure 5 closes the section by pictorially summarizing the features of the TRCR model and their connection with both the query language and the query processing.

5. PROCESSING TOPOLOGICAL QUERIES

The TRCR model was implemented on top of the POSTGRES/GEO++ system. GEO++ [20] is a GIS front-end for POSTGRES. POSTGRES is an extensible relational database system providing support for new advanced applications [18]. Native spatial data types of POSTGRES are points, polylines, and polygons. One of the prominent features of POSTGRES is that it is an extensible system, that is, it allows the definition of new data types and of operations that are applicable to them. In order to add a new data type and functions in POSTGRES, it is necessary to define an internal (C code) and an external (character string) representation for the new data type, provide conversion functions from internal to external representation and vice versa, and register the new types and functions (with associated operators) using the POSTQUEL query language.

Fig. 5: The connection of the TRCR model with the query language/query processing (diagram labels: the rules, relationships, operators).

A new data type, which we call multipolygon, has been introduced for handling composite regions. It has the following internal representation:

    typedef struct {
        long length;    /* storing space */
        int npoly;      /* no. of polygons */
        int npts;       /* total no. of points */
        POINT2 p[1];    /* coordinates vector */
    } MULTIPOLYSTRUCT;
    typedef MULTIPOLYSTRUCT MULTIPOLYGON;

The external representation is a string of the following kind:

where np is the number of polygons of the multipolygon, ni is the number of points of the i-th polygon, and xi, yj are the coordinates of the j-th point of the i-th polygon.

The topological relationships among multipolygons, those among polygons, and the operators (b, i, and N) of the TRCR model have all been implemented as functions in the POSTGRES/GEO++ environment. The relationships between simple polygons are computed by applying common algorithms of computational geometry (like the polygon-in-polygon algorithm), while the relationships between multipolygons are assessed by reusing the algorithms on single polygons. The derivation of the coarse relationships from the detailed ones relies on the rules of Section 4.2.

Below we give four algorithms, each assessing whether, at the coarse level, one of the four topological relationships of the TRCR model holds between two composite regions. In these algorithms, A and B are two composite regions, with n and m components, respectively. The basic operation corresponds to the activation of a function, called RTM(Ai, Bj), that given two components returns the topological relationship between them.

Algorithm 1 (for assessing the topological relationship in between two composite regions)

    IN: (A, B) → Boolean ≡
    { i ← 1; found ← true;
      while found and (i<=n) do
      { j ← 1; found ← false; possible ← true;
        while (not found) and possible and (j<=m) do
        { case RTM(Ai,Bj) of
            'I','E': found ← true;
            'T','D': ;
            'J','O': possible ← false
          endcase;
          j ← j+1
        }
        i ← i+1
      }
      IN ← found
    }

Algorithm 2 (for assessing the topological relationship disjoint between two composite regions)

    DISJOINT: (A, B) → Boolean ≡
    { i ← 1; possible ← true;
      while possible and (i<=n) do
      { j ← 1;
        while possible and (j<=m) do
        { case RTM(Ai,Bj) of
            'J','T','I','O','E': possible ← false;
            'D': ;
          endcase;
          j ← j+1
        }
        i ← i+1
      }
      DISJOINT ← possible
    }

Algorithm 3 (for assessing the topological relationship touch between two composite regions)

    TOUCH: (A, B) → Boolean ≡
    { i ← 1; possible ← true; found ← false;
      while possible and (i<=n) do
      { j ← 1;
        while possible and (j<=m) do
        { case RTM(Ai,Bj) of
            'T': found ← true;
            'J','I','O','E': possible ← false;
            'D': ;
          endcase;
          j ← j+1
        }
        i ← i+1
      }
      if possible then TOUCH ← found else TOUCH ← false
    }

Algorithm 4 (for assessing the topological relationship overlap between two composite regions)

    OVERLAP: (A, B) → Boolean ≡
    { OVERLAP ← false;
      /* checks the cases described by Rule 2 (Equations 12 and 14) */
      i ← 1; foundI ← false; foundJ ← false; foundO ← false;
      while (not foundI or not foundJ) and (not foundO) and (i<=n) do
      { continue ← true; j ← 1;
        while continue and (not foundO) and (j<=m) do
        { case RTM(Ai,Bj) of
            'I': {foundI ← true; continue ← false};
            'J': {foundJ ← true; continue ← false};
            'O': foundO ← true;
          endcase;
          j ← j+1
        }
        i ← i+1
      }
      if (foundI and foundJ) or foundO then OVERLAP ← true
      else /* checks the case described by Rule 2 (Equation 13) */
      { i ← 1; foundIJE ← false; foundRowTD ← false;
        while (not foundIJE or not foundRowTD) and (i<=n) do
        { countTD ← 0; possible ← true; j ← 1;
          while possible and (j<=m) do
          { case RTM(Ai,Bj) of
              'T','D': countTD ← countTD+1;
              'I','J','E': {foundIJE ← true; possible ← false};
            endcase;
            j ← j+1
          }
          if countTD=m then foundRowTD ← true;
          i ← i+1
        }
        j ← 1; foundColTD ← false;
        while (not foundColTD) and (j<=m) do
        { countTD ← 0; possible ← true; i ← 1;
          while possible and (i<=n) do
          { case RTM(Ai,Bj) of
              'T','D': countTD ← countTD+1;
              'I','J','E': possible ← false;
            endcase;
            i ← i+1
          }
          if countTD=n then foundColTD ← true;
          j ← j+1
        }
        if foundIJE and foundRowTD and foundColTD then OVERLAP ← true;
      } /* if */
    }

The worst case complexity of the algorithms above is $O(n \times m)$. Notice that $O(n \times m)$ is also the complexity with respect to the average case, even though the actual running time can be shorter. For example, the in relationship between $A$ and $B$ holds when each component of $A$ is contained inside one (and only one) component of $B$. It follows that the execution of the inner while-do ends as soon as the condition $(A_i, \mathit{in}, B_j)$ becomes true, that is, when the logical variable found becomes true. The complexity of the function IN ranges from $O(n \times m)$ (whenever $A_1, A_2, \ldots, A_n$ are all contained in $B_m$) to $O(n)$ (in the lucky case where $A_1, A_2, \ldots, A_n$ are all contained in $B_1$).
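The two extremes quoted for the function IN can be checked with a small transcription of Algorithm 1 that counts how many times the component-level function is invoked. In this sketch the callback `rtm(i, j)` stands in for RTM(Ai, Bj) and returns a one-letter code; the function name `coarse_in_counted`, the callback signature, and the call counter are assumptions added for illustration.

```python
from typing import Callable, Tuple


def coarse_in_counted(n: int, m: int,
                      rtm: Callable[[int, int], str]) -> Tuple[bool, int]:
    """Algorithm 1 with 1-based indices; returns (result, rtm calls)."""
    calls = 0
    i, found = 1, True
    while found and i <= n:
        j, found, possible = 1, False, True
        while not found and possible and j <= m:
            calls += 1
            code = rtm(i, j)
            if code in ("I", "E"):
                found = True          # A_i is inside B_j: stop this row
            elif code in ("J", "O"):
                possible = False      # A_i cannot be inside any B_j
            # 'T' and 'D' leave both flags unchanged
            j += 1
        i += 1
    return found, calls
```

With `n = 3` and `m = 4`, a callback that answers "I" only for `j == 1` needs 3 calls, while one that answers "I" only for `j == m` (and "D" otherwise) needs 12, matching the $O(n)$ and $O(n \times m)$ cases described above.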

Table 5 gives an overview of the functions implemented and the related operators as they are used in the POSTQUEL query language. Note that all operator symbols start with an equal sign, in order to distinguish them from the other operators available in POSTGRES/GEO++. The operators are overloaded: e.g., all touch relationships are represented by =*.

function name      operator  input                       output
Boundary2Mpgn                MULTIPOLYGON                POLYLINE2
Boundary2Pgn                 POLYGON2                    POLYLINE2
IOperator                    MULTIPOLYGON, Integer       POLYGON2
Npoly                        MULTIPOLYGON                Integer
Touch2MpgnMpgn     =*        MULTIPOLYGON, MULTIPOLYGON  Boolean
In2MpgnMpgn        =@        MULTIPOLYGON, MULTIPOLYGON  Boolean
Overlap2MpgnMpgn   =&        MULTIPOLYGON, MULTIPOLYGON  Boolean
Disjoint2MpgnMpgn  =!        MULTIPOLYGON, MULTIPOLYGON  Boolean
Touch2PgnPgn       =*        POLYGON2, POLYGON2          Boolean
In2PgnPgn          =@        POLYGON2, POLYGON2          Boolean
Overlap2PgnPgn     =&        POLYGON2, POLYGON2          Boolean
Disjoint2PgnPgn    =!        POLYGON2, POLYGON2          Boolean

Table 5: Functions and operators.

6. EXAMPLES OF TOPOLOGICAL QUERIES

For the purposes of the present paper let us refer to a database containing European countries and lakes. The classes COUNTRY and LAKE have, respectively, a multipolygon and a polygon as spatial attribute; both classes were created with the following POSTQUEL code:

create COUNTRY (name=text, geo_mpg=MULTIPOLYGON)
create LAKE (name=text, geo_pgn=POLYGON2)

It is possible to formulate spatial queries either with POSTQUEL or the GEO++ graphical interface. Some examples of POSTQUEL queries follow.

Query 1 Retrieve the names of countries bordering on Italy.

retrieve (C1.name)
from C1 in COUNTRY, C2 in COUNTRY
where (C1.geo_mpg =* C2.geo_mpg) and (C2.name = "Italy")

Of the two conditions that are part of the where clause, the first one is of topological nature and it concerns pairs of composite regions (coarse level of the TRCR model).

Query 2 Retrieve the separate components of France.

retrieve (geo_pgn=IOperator(C1.geo_mpg,I.i))
from C1 in COUNTRY, I in INDEX
where (C1.name="France") and (I.i<=Npoly(C1.geo_mpg))

The above query uses the operators i and N of the TRCR model, implemented by the functions IOperator and Npoly, respectively. The function IOperator returns the i-th component of a multipolygon, given the index I.i. The index is taken from a class INDEX containing integer values. Npoly returns the number of components of a multipolygon and serves to limit the index range to the actual number of France’s components.
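The interplay between the two functions and the INDEX class can be mimicked in a few lines of Python. This is only a sketch of the semantics described above: the list-of-rings representation, the 1-based indexing, and the helper `components` are assumptions made for illustration, and the lowercase names `ioperator` and `npoly` merely echo IOperator and Npoly.

```python
from typing import Iterable, List, Tuple

Polygon = List[Tuple[float, float]]   # a ring of (x, y) vertices
MultiPolygon = List[Polygon]          # a composite region as a list of components


def npoly(mpg: MultiPolygon) -> int:
    """Number of components of a multipolygon."""
    return len(mpg)


def ioperator(mpg: MultiPolygon, i: int) -> Polygon:
    """The i-th component of a multipolygon (indices assumed to start at 1)."""
    if not 1 <= i <= npoly(mpg):
        raise IndexError(f"component index {i} out of range 1..{npoly(mpg)}")
    return mpg[i - 1]


def components(mpg: MultiPolygon, index_class: Iterable[int]) -> List[Polygon]:
    """Mimic Query 2: join with INDEX and keep only indices up to Npoly."""
    return [ioperator(mpg, i) for i in index_class if i <= npoly(mpg)]


# Toy data: two triangular components standing in for a two-part country.
part_a: Polygon = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
part_b: Polygon = [(6.0, 0.0), (7.0, 0.0), (6.5, 1.0)]
country: MultiPolygon = [part_a, part_b]
INDEX = range(1, 11)                  # stand-in for the class of integer values
```

Calling `components(country, INDEX)` returns the two components in order. The filter on `npoly` plays the same role as the second condition of the where clause: without it, indices 3 to 10 would be passed to `ioperator` and rejected.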

Query 3 Retrieve the names of lakes contained in the Italian peninsula.


retrieve (L.name)
from C in COUNTRY, L in LAKE
where (L.geo_pgn =@ IOperator(C.geo_mpg,1)) and (C.name = "Italy")

The topological condition in the where clause concerns a lake and a component of a composite region, namely two simple regions (detailed level of the TRCR model). The code of Query 3 has been simplified by assuming that the Italian peninsula is the first component of the composite region Italy.

Query 4 Retrieve Italian and French islands.

To process Query 4, the query processor has to take into account all the components of the multipolygons representing the geometry of Italy and France. This could be done by using nested queries, but to simplify the where clause, we assume that the components of the two countries have already been selected (similarly to Query 2) and maintained in the views PARTS_OF_FRANCE and PARTS_OF_ITALY. With this assumption, the POSTQUEL code of Query 4 is the following:

retrieve (P1.geo_pgn, P2.geo_pgn)
from C in COUNTRY, P1 in PARTS_OF_FRANCE, P2 in PARTS_OF_ITALY
where (P1.geo_pgn =! C.geo_mpg) or (P2.geo_pgn =! C.geo_mpg)

Query 4 uses the disjoint relationship since, topologically, islands are the country components disjoint from all other countries. In this case the topological relationship is tested between a polygon and a multipolygon. Figure 6 shows the result of Query 4. Islands are shown in black in the window.
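The definition of an island used here (a component disjoint from all other countries) can be illustrated with a toy Python sketch. It follows the wording of the definition rather than the literal where clause of Query 4, and it makes strong simplifying assumptions: components are axis-aligned rectangles, countries are stored in a dictionary, and all names (`Rect`, `disjoint_rect`, `disjoint_pgn_mpgn`, `islands`) are hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def disjoint_rect(a: Rect, b: Rect) -> bool:
    """True if the two closed rectangles share no point (touching is not disjoint)."""
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def disjoint_pgn_mpgn(p: Rect, mpg: List[Rect]) -> bool:
    """Polygon versus multipolygon: disjoint from every component."""
    return all(disjoint_rect(p, q) for q in mpg)


def islands(country: str, countries: Dict[str, List[Rect]]) -> List[Rect]:
    """Components of `country` that are disjoint from all other countries."""
    others = [mpg for name, mpg in countries.items() if name != country]
    return [p for p in countries[country]
            if all(disjoint_pgn_mpgn(p, other) for other in others)]


# Toy scene: X and Y share a border along x = 4; each also has one detached part.
scene: Dict[str, List[Rect]] = {
    "X": [(0, 0, 4, 4), (0, 6, 1, 7)],
    "Y": [(4, 0, 8, 4), (10, 10, 11, 11)],
}
```

In this scene `islands("X", scene)` keeps only the detached rectangle `(0, 6, 1, 7)`, because the first component of X touches Y and is therefore not disjoint from it. Note that, under this definition, the single component of a country without neighbours would also be reported.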

7. CONCLUSIONS

The variety and complexity of geographic entities call for a general theory that models complex features. With regard to regions, the two main extensions that researchers are working on concern separations of the exterior (holes) and separations of the interior (multiple components). While specific models for regions with holes have already appeared in the spatial modeling literature [6, 21], there is still a lack of models for composite regions.

The TRCR model aims to formalize topological relationships between composite regions, which are geographic objects usually not offered in current GIS, but extremely useful in several applications whenever a complex region made up of disjoint parts (e.g., a country, a state, an archipelago) needs to be treated as a single object.

The model is hierarchically structured and allows a coarse and a detailed description. At the coarse level, the description of the topological relationship between two composite regions is done in terms of one of the four relationships: in, overlap, touch, and disjoint, which constitute a mutually exclusive and complete set of relationships. At the detailed level, the corresponding description is done in terms of a matrix summarizing the detailed relationships between their components.

The integration, within the TRCR formal framework, of disconnected regions and holes is an open problem in the field. In the Appendix, we show that holes could be treated as a side effect of our model, but this direction needs further investigation from the efficiency point of view.

A spatial query language must support composite regions and allow users to ask spatial queries involving topological relationships both among composite regions as a whole and among their components. The present contribution gives the needed formal background for solving these two problems.

The theory presented in the paper has been implemented on top of the POSTGRES/GEO++ environment. Besides points, polylines, and polygons, we introduced the multipolygon spatial type necessary for implementing composite regions. Furthermore, the relationships and operators of the TRCR model were incorporated in the POSTQUEL query language.

Outside the scope of the present contribution are the investigation of methodologies for mapping the query language onto the physical data structure, the definition of optimization strategies, and the analysis of experimental results when different data volumes and query categories are taken into account.

Fig. 6: A GEO++ window showing the answer to a topological query.

APPENDIX. MODELING REGIONS WITH HOLES AS COMPOSITE REGIONS

In Section 4, we proposed the TRCR model for describing topological relationships among composite regions both at the coarse and the detailed level. In this appendix, we show that the TRCR model can be used to treat regions with holes as composite regions.

Let us split a composite region A with holes (having $n_c$ components and $n_h$ holes) into two composite regions $A_c$ and $A_h$ such that $A_c$ represents the regions without holes and $A_h$ represents the holes (Figure 7). The coarse topological relationship holding between the two regions is always $(A_h, in, A_c)$. From Equation 10, at the detailed level, the topology (for two composite regions with $n_c$ and $n_h$ components, respectively) is described by a number of relationships equal to:

$$v = n_c \times n_h + \binom{n_c}{2} + \binom{n_h}{2}. \qquad (18)$$

The $n_c \times n_h$ relationships can be either in or disjoint, depending on whether or not a hole is contained in a component. $\binom{n_c}{2}$ are the relationships among the components. $\binom{n_h}{2}$ are the relationships among the holes and, similarly to components, can be only touch or disjoint.

Fig. 7: Modeling a composite region with holes as two composite regions.

Let us consider now a scene of two composite regions with holes (A and B). Let $n_c$ and $m_c$ be the number of A’s and B’s components, and $n_h$ and $m_h$ their number of holes, respectively. Since the coarse relationship holding between a region and its holes is always in, it turns out that four global relationships are sufficient for expressing the coarse level of description for two composite regions with holes, namely $(A_c, r, B_c)$, $(A_h, r, B_h)$, $(A_h, r, B_c)$, and $(B_h, r, A_c)$, where r stands for one of the four relationships of Section 4.1. At the detailed level, the number of necessary relationships is equal to:

$$v = (n_c + n_h) \times (m_c + m_h) + v_1 + v_2, \qquad (19)$$

where $v_1$ and $v_2$ (defined according to Equation 18) are the numbers of relationships describing the internal structure of A and B.
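The two counts can be checked numerically with a short Python sketch. It assumes, following the description of the terms in this appendix, that the internal structure of a region with $n_c$ components and $n_h$ holes contributes $n_c \times n_h$ hole-component relationships plus the pairwise relationships among components and among holes. The function names are hypothetical.

```python
from math import comb


def internal_relationships(n_c: int, n_h: int) -> int:
    """Relationships describing one region with n_c components and n_h holes.

    n_c * n_h hole-to-component relationships, plus all unordered pairs of
    components and all unordered pairs of holes.
    """
    return n_c * n_h + comb(n_c, 2) + comb(n_h, 2)


def detailed_relationships(n_c: int, n_h: int, m_c: int, m_h: int) -> int:
    """Equation 19: relationships between A and B plus their internal structure."""
    v1 = internal_relationships(n_c, n_h)   # internal structure of A
    v2 = internal_relationships(m_c, m_h)   # internal structure of B
    return (n_c + n_h) * (m_c + m_h) + v1 + v2
```

For a region A with two components and one hole and a region B with one component and one hole, the internal counts are 3 and 1, and Equation 19 gives 3 × 2 + 3 + 1 = 10 relationships at the detailed level.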

Acknowledgements - This work was supported by the Italian MURST project “Basi di dati evolute: modelli, metodi e sistemi” and CNR project no. 95.00460.CT12 “Modelli e sistemi per il trattamento di dati ambientali e territoriali”. We are grateful to Peter van Oosterom and Tom Vijlbrief of TNO Physics and Electronics Laboratory (The Netherlands) for helping us in the implementation of composite regions and for making the GEO++ system available to us. We thank the anonymous referees for helping us to improve the quality and the organization of the paper.

REFERENCES

[1] Felice L. Bedford. Perceptual and cognitive spatial learning. Journal of Experimental Psychology: Human Perception and Performance, 19(3):517-530 (1993).

[2] William G. Chase and Michelene T.H. Chi. Cognitive skill: Implications for spatial skill in large-scale environments. In John H. Harvey, editor, Cognition, Social Behavior, and the Environment, pp. 111-136. Lawrence Erlbaum Associates, Hillsdale, NJ (1981).

[3] Eliseo Clementini and Paolino Di Felice. A comparison of methods for representing topological relationships. Information Sciences, 3:149-178 (1995).

[4] Eliseo Clementini, Paolino Di Felice, and Peter van Oosterom. A small set of formal topological relationships for end-user interaction. In David Abel and Beng Chin Ooi, editors, Advances in Spatial Databases - Third International Symposium, SSD’93, volume 692 of Lecture Notes in Computer Science, pp. 277-295. Springer-Verlag, Singapore (1993).

[5] Zhan Cui, Anthony G. Cohn, and David A. Randell. Qualitative and topological relationships in spatial databases. In David J. Abel and Beng Chin Ooi, editors, Advances in Spatial Databases - Third International Symposium, SSD’93, volume 692 of Lecture Notes in Computer Science, pp. 296-315. Springer-Verlag, Singapore (1993).

[6] Max J. Egenhofer, Eliseo Clementini, and Paolino Di Felice. Topological relations between regions with holes. International Journal of Geographical Information Systems, 8(2):129-142 (1994).

[7] Max J. Egenhofer and Robert D. Franzosa. Point-set topological spatial relations. International Journal of Geographical Information Systems, 5(2):161-174 (1991).

[8] Max J. Egenhofer and Jayant Sharma. Topological relations between regions in R2 and Z2. In David Abel and Beng Chin Ooi, editors, Advances in Spatial Databases - Third International Symposium, SSD’93, volume 692 of Lecture Notes in Computer Science, pp. 316-336. Springer-Verlag, Singapore (1993).


[9] ESRI. Understanding GIS - The ARC/INFO Method. Longman, London (1993).

[10] Reginald G. Golledge. Do people understand spatial concepts: The case of first-order primitives. In Andrew U. Frank, Irene Campari, and Ugo Formentini, editors, Theories and Models of Spatio-Temporal Reasoning in Geographic Space, volume 639 of Lecture Notes in Computer Science, pp. 1-21. Springer-Verlag, Berlin (1992).

[11] Michael F. Goodchild. Geographical information science. International Journal of Geographical Information Systems, 6(1):31-45 (1992).

[12] Daniel Hernández. Maintaining qualitative spatial knowledge. In Andrew U. Frank and Irene Campari, editors, Spatial Information Theory: A Theoretical Basis for GIS - European Conference, COSIT’93, volume 716 of Lecture Notes in Computer Science, pp. 36-53. Springer-Verlag, Berlin (1993).

[13] Zhexue Huang and Per Svensson. Neighborhood query and analysis with GeoSAL, a spatial database language. In David J. Abel and Beng Chin Ooi, editors, Advances in Spatial Databases - Third International Symposium, SSD’93, volume 692 of Lecture Notes in Computer Science, pp. 413-436. Springer-Verlag, Singapore (1993).

[14] Wolfgang Kainz, Max J. Egenhofer, and Ian Greasley. Modelling spatial relations and operations with partially ordered sets. International Journal of Geographical Information Systems, 7(3):215-229 (1993).

[15] Robert Laurini and Derek Thompson. Fundamentals of Spatial Information Systems. Academic Press, New York (1992).

[16] Matthew McGranaghan. Matching representations of geographic locations. In David Mark and Andrew Frank, editors, Cognitive and Linguistic Aspects of Geographic Space, pp. 387-402. Kluwer Academic, Dordrecht (1991).

[17] James R. Munkres. Topology: a first course. Prentice-Hall Inc., Englewood Cliffs, NJ (1975).

[18] Michael Stonebraker and Greg Kemnitz. The POSTGRES next-generation database management system. Communications of the ACM, 34(10):78-92 (1991).

[19] Jan van Roessel and David Pullar. Geographic regions: A new composite GIS feature type. In Robert B. McMaster and Marc P. Armstrong, editors, AUTO-CARTO 11, pp. 145-156 (1993).

[20] Tom Vijlbrief and Peter van Oosterom. The GEO++ system: An extensible GIS. In 5th International Symposium on Spatial Data Handling, pp. 40-50. International Geographical Union IGU (1992).

[21] Michael F. Worboys and Petros Bofakos. A canonical model for a class of areal spatial objects. In David J. Abel and Beng Chin Ooi, editors, Advances in Spatial Databases - Third International Symposium, SSD’93, volume 692 of Lecture Notes in Computer Science, pp. 36-52. Springer-Verlag, Singapore (1993).