Querying Geographical Data Warehouses with GeoMDQL

XXII Simpósio Brasileiro de Banco de Dados, SBBD 2007

Joel da Silva¹, Ausberto S. Castro Vera², Anjolina G. de Oliveira¹, Robson do N. Fidalgo¹, Ana C. Salgado¹, Valéria C. Times¹

¹ Centro de Informática - Universidade Federal de Pernambuco (UFPE)
P. O. Box 7851 – 50.732-970 – Cidade Universitária – Recife – PE – Brazil

² Centro Universitário Adventista de São Paulo (UNASP)
São Paulo – SP – Brazil

[email protected], [email protected], {ago,rdnf,acs,vct}@cin.ufpe.br

Abstract. Integrating geographical and multidimensional processing has been proposed in several studies in the database literature. One of the most important issues in this integration is data querying. However, most current approaches do not provide a query language that allows multidimensional and spatial operators to be specified simultaneously. In this paper, we present GeoMDQL, a geographical-multidimensional query language specifically designed for SOLAP environments. GeoMDQL is based on well-known standards such as the MDX language and the OGC Simple Features Specification for SQL. We present the GeoMDQL grammar and discuss the taxonomy and syntax of GeoMDQL query types. Additionally, some aspects of the implementation of the GeoMDQL architecture are shown, together with a case study that illustrates the proposed query language syntax.

1. Introduction

Several studies have addressed the integration of multidimensional and geographical processing [1, 16]. However, as both environments were originally conceived for different purposes, this integration is not trivial. The main objective is to provide an open and extensible environment with integrated functionalities for the manipulation, querying and analysis not only of conventional data, but of geographical data as well. This new type of environment, referred to as SOLAP (Spatial OLAP) [22], benefits the strategic decision-making process of an organization by widening the analysis universe. Nevertheless, to date, this integration either has not been fully achieved or relies on proprietary technologies, which often raises costs and makes both the development and the reuse of the proposed solution more difficult.

We believe that an ideal SOLAP environment must be open, extensible and platform-independent, and must bring together: 1) tools for extracting, transforming and reading conventional and geographical data found in different sources; 2) a Geographical Data Warehouse (GDW) as a repository for integrated multidimensional and geographical data; 3) metamodels for the GDW and for the set of integration metadata; 4) a mechanism for multidimensional and geographical processing; 5) a query language with an integrated syntax for the simultaneous use of both multidimensional and spatial operators; and 6) a client application with a friendly interface for elaborating and submitting queries and for manipulating and visualizing the results. However, none of the integration approaches proposed so far provides all of the items listed. Thus, in order to provide a satisfactory SOLAP environment, the GOLAPA (Geographical Online Analytical Processing Architecture) project [23] has studied ways to decrease the complexity usually involved in querying multidimensional and geographical data for decision-making. An important expected result of this project is the development of a geographical-multidimensional query language that allows the simultaneous use of both multidimensional and spatial operators. This would make it possible to retrieve information relevant to the decision-making process while removing as much of the complexity of this task as possible. It is in this sense that GeoMDQL (Geographical and Multidimensional Query Language) is presented in this article, along with its architecture. The specification of a query language that allows the simultaneous use of multidimensional and geographical operators will provide wide and satisfactory support in the strategic decision-making context.

This paper is organized as follows. Section 2 gives an overview of the development of query languages for geographical and multidimensional data. Then, section 3 presents our query language proposal, namely GeoMDQL, including some formal definitions, a short description of the GeoMDQL grammar and a discussion regarding the taxonomy of GeoMDQL query types. Subsequently, some GeoMDQL query examples are given in section 4 to illustrate the proposed query language syntax and some application ideas. Finally, section 5 outlines our final considerations on the work reported in this paper and points out some important issues for future work.

2. Languages for Spatial and Multidimensional Querying

An important component of a decision support application is its query language. Dozens of approaches have been proposed to query geographical and multidimensional data. However, because of space limitations, we list only some of them.

2.1. Multidimensional Query Languages

One of the most important approaches for querying multidimensional data is MDX (Multidimensional Expressions) [25]. With MDX, users can perform many complex queries over a multidimensional data cube and, by using multidimensional operators, view configurable data from different angles and at different aggregation levels. Despite being similar to traditional SQL, MDX is not an SQL extension. MDX is a special-purpose query language with many analytical functions and has been optimized for querying multidimensional data. Currently, MDX is supported by most OLAP vendors. In the same context, the ISO SQL OLAP [11] specification was published in 2000. It extends the traditional SQL query language with OLAP operators, thus offering many functions for querying multidimensional databases, and is also supported by some of the existing OLAP suppliers.

MD-CAL (Multidimensional Calculus) [2] is another study that aims at providing a multidimensional query language. MD-CAL performs calculus operations over a fact table in a multidimensional data source, supporting high-level analysis of multidimensional data and allowing scalar and aggregate functions to be built into its expressions. Another approach for multidimensional querying is the Data Cube [13], which supports multidimensional data grouping, sub-totals, cross-tabulation, and roll-up and drill-down operators. The last study listed in this section is SQL-M [18]. This approach is composed of a data model, a formal algebra and a multidimensional query language for multidimensional data analysis. The authors of SQL-M highlight that one of the greatest advantages of their study is its capability of manipulating complex and irregular hierarchies.

2.2. Spatial Query Languages

Some work has also been carried out on geographical query languages. Spatial SQL, proposed by Egenhofer [4], comprises two modules: (i) a query language and (ii) a presentation language. The first module is based on traditional SQL and preserves the SELECT-FROM-WHERE clause, while the presentation language, named GPL (Graphical Presentation Language), allows users to customize how spatial objects are presented. Another relevant study is GeoSQL [6], which is also based on traditional SQL and is similar to Spatial SQL. The GeoSQL query language has been implemented as a module of an object-oriented GIS prototype named YH-GIS. In a query expression, the non-spatial constraints are expressed using logical and comparison operators, similarly to the SQL WHERE clause. The spatial constraints are expressed using logical expressions and spatial predicates, which are based on the relationships found among the geographical features.

SQL/SDA [15] extends traditional SQL for spatial analysis and is based on the OGC SFS4SQL (OGC Simple Feature Specification For SQL) specification [3]; it thus offers many spatial functions for the management and querying of geographical features. A graphical user interface, written in Java, provides icons that represent the most common spatial functions to help users express their queries. In the same context, a visual query language for spatial databases, also based on the OGC SFS4SQL specification, was proposed in [17]. This approach translates queries expressed as flow diagrams into an SQL-based spatial extension. An improved graphical user interface has also been developed for users to express their queries. One of the most important studies related to spatial query languages is ISO SQL MM [14]. This specification is an effort to include spatial processing functionalities in traditional SQL and is also based on OGC SFS4SQL, providing many functions for the management, storage, analysis and retrieval of geographical features.

2.3. Some Considerations

As we can see, none of the studies outlined above provides a query language capable of integrating both multidimensional and spatial operators in a single syntax. The proposal presented in [20] comes close by presenting an object-oriented geographical database that has been extended to support links to analytical data stored in a multidimensional cube. Thus, starting from a spatial query result, it is possible to retrieve the multidimensional data related to these links. However, a query language that fully integrates multidimensional and spatial operators is not discussed. Moreover, the geographical database and the multidimensional data sources are not fully integrated, because a geographical DW is not used.


3. The GeoMDQL Query Language

As shown in the previous section, there are many proposals in the database literature. However, none of them offers a query language whose syntax allows the use of both spatial and multidimensional operators for querying a geographical data warehouse. Thus, in this section we present GeoMDQL (Geographical and Multidimensional Query Language), which is based on both MDX [25] and OGC SFS4SQL [3] and offers a unified syntax for stating queries that contain both multidimensional and spatial operators.

3.1. Formal Definitions

Some formal definitions concerning DWs and data cubes have already been given by other researchers. However, the formal work presented in this section aims to account for the existence of geographical objects in a DW. To achieve this, we now introduce the concepts of Dimension Table, Fact Table and Geographical Data Warehouse, where each data type is an element of a set of types $T$. For example, an element $t$ of $T$ may be Integer, Real, String or Geometry.

Definition 3.1 Dimension Table
A dimension table is a relation over $K \times S \times A_1 \times \dots \times A_{n-1}$, where $K$ is the set of primary keys, $S$ is the set of foreign keys, and each $A_i$, $1 \leq i \leq n-1$, is a set of attributes. An attribute may have any data type, even Geometry.

Definition 3.2 Fact Table
Given a set $DT$ of dimension tables, a fact table is an $n$-ary relation over $K_1 \times K_2 \times \dots \times K_m \times M_1 \times M_2 \times \dots \times M_r$, where $n = m + r$, each $K_i$, $1 \leq i \leq m$, is a primary key of a dimension table $dt_i \in DT$, and each $M_j$, $1 \leq j \leq r$, is a set of measures of the fact table. The elements of a fact table $ft$ are $n$-tuples, which are called facts or measures.

Definition 3.3 Geographical Data Warehouse
A Geographical Data Warehouse (GDW) is a collection of at least one dimension table and at least one fact table.

Several data cubes may be instantiated from a GDW. We therefore present next some further formalizations related to data cubes, including definitions of hierarchy, dimension, geographical dimension and data cube.

Definition 3.4 Hierarchy
A hierarchy $h$, denoted by $h = \langle L, \preceq \rangle$, is a partially ordered set (poset) over the set of levels $L$, so that for $l_x, l_y \in L$, $l_x \preceq l_y$ defines the relationship between the levels $l_x$ and $l_y$. Each level $l_x$ is a collection of members of a data type $t$.

Definition 3.5 Dimension
A dimension $d$ is a set of hierarchies.

Definition 3.6 Geographical Dimension
Let $d$ be a dimension with at least one hierarchy $h = \langle L, \preceq \rangle$ such that there is an $l_i \in L$ of type Geometry. We say that $d$ is a geographical dimension, denoted by $GD$.

Definition 3.7 Data Cube
A data cube (or cube) $C$ is a pair $\langle D, F \rangle$, where $D$ is a set of dimensions and $F$ is a set of facts or measures defined from a fact table $ft$. The set $F$ is represented by a set of pairs $\langle f_i, m_i \rangle$, where $f_i : M_t \to M$ is a function by which a measure $m_i$ is associated with the tuples defined over the combination of measures of $ft$ (denoted by $M_t$). The dimension of a cube $C$ is equal to $|D|$, i.e. the cardinality of $D$. If $|D| = n$, we write $C^n$ to denote a cube $C$ of dimension $n$.

Definition 3.8 Geographical Data Cube
Let $C = \langle D, F \rangle$ be a data cube. If $D$ has at least one geographical dimension, we say that $C$ is a geographical data cube, denoted by $GC$.
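The definitions of hierarchy, dimension and (geographical) data cube can be sketched in code. The following is only an illustrative model, not part of GeoMDQL: the class names, the `is_geographical` methods and the example `Location` and `Time` dimensions are assumptions made here, and the order relation of each hierarchy is omitted for brevity.

```python
from dataclasses import dataclass, field

GEOMETRY = "Geometry"  # one of the data types in the set T


@dataclass(frozen=True)
class Level:
    name: str
    data_type: str  # e.g. "Integer", "Real", "String" or "Geometry"


@dataclass
class Hierarchy:
    levels: list  # the set of levels L (the partial order is omitted here)


@dataclass
class Dimension:
    name: str
    hierarchies: list = field(default_factory=list)

    def is_geographical(self) -> bool:
        # Definition 3.6: some hierarchy has a level of type Geometry.
        return any(
            level.data_type == GEOMETRY
            for hierarchy in self.hierarchies
            for level in hierarchy.levels
        )


@dataclass
class Cube:
    dimensions: list
    measures: list

    def is_geographical(self) -> bool:
        # Definition 3.8: the cube has at least one geographical dimension.
        return any(d.is_geographical() for d in self.dimensions)


# Hypothetical example data, used only for illustration.
location = Dimension(
    "Location",
    [Hierarchy([Level("State", GEOMETRY), Level("City", GEOMETRY)])],
)
time = Dimension("Time", [Hierarchy([Level("Year", "Integer")])])
cube = Cube([location, time], ["Population"])
```

Under these assumptions, `cube.is_geographical()` holds because `location` has a level of type Geometry, while a cube built only from `time` would not qualify.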

There are several operations that can be used to handle a GC. These include conglomerating/dispersing, selecting/projecting and navigating operations, which have been implemented in GeoMDQL. However, due to space restrictions, only the main GeoMDQL operators are formally defined in this paper. To formalize them, we use the following notion: in a poset $\langle S, \preceq \rangle$, an element $y \in S$ covers an element $x \in S$ if $x \prec y$ and there is no element $z \in S$ such that $x \prec z \prec y$. We also say that $y$ is an immediate successor of $x$ and $x$ is an immediate predecessor of $y$. Now, we can define the aggregation operation Roll-up and the disaggregation operation Drill-Down as follows.

Definition 3.9 Aggregation Operation Roll-up
Given a level $l_x$ in a hierarchy $h = \langle L, \preceq \rangle$, the aggregation operation Roll-up from $h$ to $L_i \cup \{\top\}$ is defined by $\mathit{Roll\text{-}up}(l_x) = \{l_1, \dots, l_m\}$, so that each $l_j$, $1 \leq j \leq m$, covers $l_x$. When there is no element in $L$ that covers $l_x$, the symbol $\top$ is returned. In other words, the Roll-up operation applied to a level $l_x \in L$ returns all the immediate successors of $l_x$.

Definition 3.10 Disaggregation Operation Drill-Down
Given a level $l_y$ in a hierarchy $h = \langle L, \preceq \rangle$, the disaggregation operation Drill-Down from $h$ to $L_i \cup \{\bot\}$ is defined by $\mathit{Drill\text{-}Down}(l_y) = \{l_1, \dots, l_n\}$, so that each $l_j$, $1 \leq j \leq n$, is an immediate predecessor of $l_y$. When there is no immediate predecessor for $l_y$, the symbol $\bot$ is returned. In other words, the Drill-Down operation applied to a level $l_y \in L$ returns all the immediate predecessors of $l_y$.
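A minimal sketch of the Roll-up and Drill-Down definitions, assuming the strict order of a hierarchy is given as a transitively closed set of (lower, upper) pairs. The function names, the `TOP` and `BOTTOM` markers and the three-level example hierarchy are assumptions made for illustration only.

```python
TOP = "TOP"        # stands in for the symbol returned when nothing covers the level
BOTTOM = "BOTTOM"  # stands in for the symbol returned when the level covers nothing


def covers(levels, order, y, x):
    """Return True if y covers x: x < y and there is no z with x < z < y.

    `order` is assumed to be a transitively closed set of (lower, upper) pairs.
    """
    if (x, y) not in order:
        return False
    return not any((x, z) in order and (z, y) in order for z in levels)


def roll_up(levels, order, lx):
    """Return all immediate successors of lx, or TOP if there are none."""
    result = {l for l in levels if covers(levels, order, l, lx)}
    return result if result else TOP


def drill_down(levels, order, ly):
    """Return all immediate predecessors of ly, or BOTTOM if there are none."""
    result = {l for l in levels if covers(levels, order, ly, l)}
    return result if result else BOTTOM


# Hypothetical hierarchy: City < State < Region (transitively closed).
levels = {"City", "State", "Region"}
order = {("City", "State"), ("City", "Region"), ("State", "Region")}
```

With this hierarchy, rolling up from `City` yields only `State`, because `Region` is above `City` but does not cover it.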

Definition 3.11 Slice Operation
Given a cube $C^n = \langle D, F \rangle$ and $X \subseteq D$ such that $|X| = m$, the Slice operation from $C^n \times P(D)$ to $C^{n-m}$ is defined by $\mathit{Slice}(\langle D, F \rangle, X) = \langle D - X, F \rangle$, where $P(D)$ is the power set of $D$. In other words, the slice operation removes a subset of the dimensions of a cube $C^n$ and creates a sub-cube $C^{n-m}$.
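The Slice definition can be illustrated with a short sketch in which a cube is modelled as a pair of frozen sets (dimensions, facts). The function name `slice_cube` and the example dimension and measure names are hypothetical.

```python
def slice_cube(cube, removed):
    """Slice(<D, F>, X) = <D - X, F>, following Definition 3.11."""
    dimensions, facts = cube
    removed = frozenset(removed)
    if not removed <= dimensions:
        raise ValueError("X must be a subset of D")
    return (dimensions - removed, facts)


# Hypothetical three-dimensional cube with a single measure.
cube = (frozenset({"Location", "Time", "Product"}), frozenset({"Sales"}))
sub_cube = slice_cube(cube, {"Time"})
```

Here `sub_cube` keeps the facts unchanged and has one dimension fewer than `cube`, matching the sub-cube of dimension n − m in the definition.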

3.2. A Taxonomy of GeoMDQL Query Types

The GeoMDQL operators are based on the spatial operators given in [5, 3], on the OLAP operators found in [26] and on a combination of these. The GeoMDQL query types have accordingly been classified into the groups described below.

GEO: A request of type GEO contains only geographical parameters for performing a spatial query. For this query type, users can use spatial operators such as distance, intersects, contains, over and crosses to evaluate spatial relationships between two geographical features. As an example, consider a land use data cube in which users wish to know which farms intersect a specific river or which farms are contained in a specific hydrographic basin. The result of these queries will always be a geographical feature or a set of geographical features displayed on a map.

MD: A query of type MD contains only multidimensional parameters and allows the execution of a multidimensional query against the geographical data cube. For example, in a retail data cube, users may wish to know the list of the 10 best-selling products for each category in a specific month of the year. To formulate this query, the user can use well-known multidimensional operators such as rank, roll-up, slice or dice, and the result will always be a multidimensional data table.

GEOMD: A GEOMD request consists of a combination of the two previous query types. This request can be further classified into two types: (1) Mapping GEOMD, a multidimensional request whose results with geographical correspondences are displayed on a map, and (2) Integration GEOMD, in which both multidimensional and spatial restrictions are specified and used in the request processing. As an example of a Mapping GEOMD query, consider users querying a GC to know where their best customers live. In this case, a query similar to an MD query is formulated using multidimensional operators only, and the location of the query results is then displayed on a map. For this, the GeoMDQL query processor automatically executes the queries needed against the GC to identify the MD query results that have geographical correspondences. In this case, the query results are always displayed using both maps and tables. In an Integration GEOMD query, on the other hand, users can use both multidimensional and spatial operators to formulate their requests. In a land use analysis, for example, users may wish to know which farms intersect a specific river and produced more than 10 tons of rice in 2005. For an Integration GEOMD query, the results can be displayed using maps and/or tables.

Notice that the results of any user query can be given as an input parameter to further queries, which can always be mapped to one of the query types given above.
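The taxonomy above can be read as a simple decision rule over the operators a request uses. The sketch below is only an illustration of that reading: the paper does not describe how the query processor classifies requests, so the operator sets, the `on_map` flag and the function `classify_query` are all assumptions.

```python
# Illustrative operator sets; these are assumptions, not the full GeoMDQL operator list.
SPATIAL_OPERATORS = {"DISTANCE", "INTERSECTS", "CONTAINS", "CROSSES", "WITHIN"}
MD_OPERATORS = {"RANK", "TOPCOUNT", "FILTER", "DRILLDOWNLEVEL", "DRILLUPLEVEL"}


def classify_query(operators, on_map=False):
    """Map the operators used by a request to one of the query types of Section 3.2."""
    used = {op.upper() for op in operators}
    has_spatial = bool(used & SPATIAL_OPERATORS)
    has_md = bool(used & MD_OPERATORS)
    if has_spatial and has_md:
        return "Integration GEOMD"
    if has_spatial:
        return "GEO"
    if has_md and on_map:
        # A multidimensional request whose results are shown on a map.
        return "Mapping GEOMD"
    if has_md:
        return "MD"
    raise ValueError("no known operator found in the request")
```

For instance, a request that uses only `TOPCOUNT` would be classified as MD, and the same request with `on_map=True` as Mapping GEOMD.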

3.3. The GeoMDQL Grammar and Syntax

Figure 1. GeoMDQL Language Grammar

We have defined the GeoMDQL grammar using EBNF [12] because this formalism is found in many academic works, has been applied in some commercial areas and, finally, because the original MDX grammar was developed using this formal representation as well. Figure 1 shows the specification of the main elements of a GeoMDQL query. The main element of the GeoMDQL grammar is geomdql_statement, which has been defined as a select_statement. This select_statement may contain the following definitions: a) (WITH formula_specification)?; b) SELECT (axis_specification_list)?; c) FROM cube_specification; d) (WHERE slicer_specification)?; e) (cell_props)? and f) (ON MAP)?. According to the EBNF syntax, the character "?" indicates that an element is optional, while the uppercase elements (e.g., WITH, SELECT, FROM, WHERE and ON MAP) are terminal elements of the GeoMDQL language.

Operator: DrilldownLevel
Description: Drills down the members of a set, at a specified level, to one level below. Alternatively, drills down on a specified level in the hierarchy.
Syntax: DrilldownLevel(<Set>)
        DrilldownLevel(<Set>, <Level>)

Operator: DrilldownMember
Description: Drills down the members in a set that are found in a second set.
Syntax: DrilldownMember(<Set>, <Set>)

Operator: DrillupLevel
Description: Drills up the members of a set, at a specified level, to one level above. Alternatively, drills up on a specified level in the hierarchy.
Syntax: DrillupLevel(<Set>)
        DrillupLevel(<Set>, <Level>)

Operator: DrillupMember
Description: Drills up the members in a set that are found in a second set.
Syntax: DrillupMember(<Set>, <Set>)

Operator: Ancestor
Description: Returns the ancestor of a member at a certain level.
Syntax: Ancestor(<Member>, <Level>)
        Ancestor(<Member>, <Numeric Expression>)

Operator: Descendants
Description: Returns the set of descendants of a member at a specified level.
Syntax: Descendants(<Member>)
        Descendants(<Member>, <Level>)
        Descendants(<Member>, <Numeric Expression>)

Operator: Ascendants
Description: Returns the set of the ascendants of a given member.
Syntax: Ascendants(<Member>)

Operator: Members
Description: Returns the set of members of a dimension, hierarchy or level.
Syntax: <Dimension>.Members
        <Hierarchy>.Members
        <Level>.Members

Operator: Children
Description: Returns the children of a member.
Syntax: <Member>.Children

Operator: Siblings
Description: Returns the siblings of a given member, including the member itself.
Syntax: <Member>.Siblings

Operator: All
Description: Returns the top level of a specified hierarchy.
Syntax: <Hierarchy>.All

Table 1. GeoMDQL Operators Overloaded from MDX
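The clause structure of the select statement (optional WITH, SELECT with an optional axis list, mandatory FROM, optional WHERE, optional cell properties and optional ON MAP) can be sketched as a small query builder. This is only an illustration of the clause order listed above: the function `build_query`, its parameters and the cube name `[Health]` are hypothetical, and no validation of the individual clauses is attempted.

```python
def build_query(cube, axes=(), formula=None, slicer=None, cell_props=None, on_map=False):
    """Assemble a query string following the clause order of the select statement."""
    clauses = []
    if formula is not None:
        clauses.append("WITH " + formula)      # (WITH formula_specification)?
    select = "SELECT"
    if axes:
        select += " " + ", ".join(axes)        # SELECT (axis_specification_list)?
    clauses.append(select)
    clauses.append("FROM " + cube)             # FROM cube_specification (mandatory)
    if slicer is not None:
        clauses.append("WHERE " + slicer)      # (WHERE slicer_specification)?
    if cell_props is not None:
        clauses.append(cell_props)             # (cell_props)?
    if on_map:
        clauses.append("ON MAP")               # (ON MAP)?
    return "\n".join(clauses)


# Hypothetical cube name; the axis expression reuses operators named in the paper.
query = build_query(
    cube="[Health]",
    axes=["{[Localization].[State].Members} ON ROWS"],
    on_map=True,
)
print(query)
```

Only the FROM clause is required by the builder, mirroring the fact that every other element in the list above is marked optional with "?".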

The formula_specification element keeps the original definition of the MDX [25] language and allows the specification of formulas for creating sets of calculated members based on the values stored in a data cube. For instance, the statement WITH MEMBER Measures.[Profit Percent] AS '(Measures.[Store Sales] - Measures.[Store Cost]) / (Measures.[Store Cost])' makes use of two measures stored in a data cube (i.e. [Store Sales] and [Store Cost]) to create a new calculated member named Profit Percent, whose value is calculated at run time.
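As a worked illustration of the Profit Percent formula, the calculated member can be evaluated cell by cell from the two stored measures. The sketch below assumes each cell is available as a pair of numbers; the function name and the sample values are made up for illustration.

```python
def profit_percent(store_sales, store_cost):
    """Evaluate (Store Sales - Store Cost) / Store Cost for one cell."""
    if store_cost == 0:
        raise ZeroDivisionError("Store Cost is zero, so Profit Percent is undefined")
    return (store_sales - store_cost) / store_cost


# Made-up cell values: (Store Sales, Store Cost) per hypothetical member.
cells = {"Store A": (150.0, 100.0), "Store B": (90.0, 120.0)}
calculated = {member: profit_percent(s, c) for member, (s, c) in cells.items()}
```

A cell whose sales exceed its cost gets a positive value (0.5 for the first made-up store), while a cell sold below cost gets a negative one.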

Note that axis_specification_list represents the axis definitions of a GeoMDQL query and is similar to the original MDX syntax. In an axis specification, multidimensional and spatial operators can be used for navigating and handling the members of a data cube hierarchy. Some of the GeoMDQL multidimensional operators are inherited from MDX, as shown in the following examples. The operator FILTER returns the set that results from filtering another set according to a search condition. Thus, the statement FILTER({[Localization].[City].Members}, (Measures.[Population]) > 500000) ON ROWS may be used to display on the ROWS axis of a query all members of the City level whose associated Population measure is larger than 500000. Similarly, the TOPCOUNT operator returns a specified number of items taken from the top of an optionally ordered set. The statement TOPCOUNT({[Localization].[State].Members}, 5, Measures.[Neonate Deaths]) ON COLUMNS may then be used to display on the COLUMNS axis the members of the State level that are related to the top five Neonate Deaths measure values.

Operator: Distance
Description: Returns the Cartesian distance between two members of a geographical dimension.
Syntax: Distance(<Member> | <MemberGeometry>, <Member> | <MemberGeometry>)

Operator: Positional Operators
Description: These operators verify whether a member is at a certain position relative to another member. For example, At_North_Of verifies whether a member is to the north of another member.
Syntax: At_North_Of(<Member> | <MemberGeometry>, <Member> | <MemberGeometry>)

Operator: Topological Operators
Description: These operators verify topological relationships between two members. For example, Intersects verifies whether a member intersects another member.
Syntax: Intersects(<Member> | <MemberGeometry>, <Member> | <MemberGeometry>)

Operator: Intersection
Description: Returns the intersection area between two members.
Syntax: Intersection(<Member> | <MemberGeometry>, <Member> | <MemberGeometry>)

Operator: Union
Description: Returns a geometry that is the union of two members or of a set of members.
Syntax: Union(<Member> | <MemberGeometry>, <Member> | <MemberGeometry>)
        Union(<Set>)

Operator: Buffer
Description: Returns a geometry that represents all points whose distance from this geometry is less than or equal to a specified distance.
Syntax: Buffer(<Member>, <Numeric Expression>)

Operator: Area
Description: Returns the member area.
Syntax: Area(<Member>)

Operator: Length
Description: Returns the member length.
Syntax: Length(<Member>)

Operator: Equals
Description: Verifies whether a member is equal to another member.
Syntax: Equals(<Member> | <MemberGeometry>, <Member> | <MemberGeometry>)

Table 2. GeoMDQL Operators based on OGC SFS4SQL
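The behaviour described for FILTER and TOPCOUNT can be mimicked over an in-memory mapping from members to measure values. This is a sketch of the semantics described in the text only, not of how a GeoMDQL or MDX engine evaluates them; the function names and all numbers below are made up for illustration.

```python
def filter_members(members, measure, condition):
    """Keep the members whose measure value satisfies the search condition."""
    return [m for m in members if condition(measure[m])]


def top_count(members, count, measure):
    """Return the `count` members with the largest measure values."""
    return sorted(members, key=lambda m: measure[m], reverse=True)[:count]


# Made-up measure values, for illustration only.
population = {"City A": 1_500_000, "City B": 400_000, "City C": 650_000}
neonate_deaths = {"State X": 30, "State Y": 50, "State Z": 40}

large_cities = filter_members(list(population), population, lambda v: v > 500_000)
top_two_states = top_count(list(neonate_deaths), 2, neonate_deaths)
```

With these values, `large_cities` contains the two cities above the threshold, and `top_two_states` lists the two states with the highest measure values in descending order.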

Besides the original MDX operators, users can select any of the overloaded operators listed in Table 1. Based on MDX, these operators have been overloaded in GeoMDQL to provide a means of navigating and analyzing a geographical data cube. For instance, to change the data granularity of a given dimension, the aggregation and disaggregation operations discussed in section 3.1, namely Roll-up and Drill-Down, may be chosen. These correspond to the DRILLUPLEVEL and DRILLDOWNLEVEL operators, respectively. For example, the following statement drills down from the State level to the next level of the Localization hierarchy, increasing the level of detail: DRILLDOWNLEVEL([Localization].[State].Members). Similarly, the statement DRILLUPLEVEL([Localization].[State].Members) may be used to drill up from the State level to the level of the Localization hierarchy located above it. The DRILLDOWNMEMBER and DRILLUPMEMBER operators are also used for navigating a data cube hierarchy, by reducing or increasing the data granularity of certain members of a given level. For example, the statement DRILLDOWNMEMBER([Localization].[Region].Members, {[Localization].[Region].[Sul], [Localization].[Region].[Norte]}) drills down the members of a set that are found in a second set, applying the drill-down operation only to the members Sul and Norte of the Region level.

GeoMDQL also provides a set of OGC-based operators, which are listed in Table 2 and are used to solve spatial queries based on the geometrical data of geographical dimension tables. While the positional operators verify whether two members stand in a given cardinal direction relationship to one another (e.g. At_South_Of, At_East_Of, At_West_Of, At_North_East_Of, At_South_East_Of, At_North_West_Of, At_South_West_Of), the topological operators verify whether a topological relationship (e.g. Intersects, Touches, Crosses, Within, Overlaps and Contains) holds between two members.

Operator: DrillOut
Description: For a given level, returns the neighboring members of a certain member.
Syntax: DrillOut(<Member>)

Operator: TopDistance
Description: Ranks all members that are within a given distance from a certain member and shows the results in descending order.
Syntax: TopDistance(<Member>, <Set>)
        TopDistance(<Member>, <Numeric Expression>, <Set>)

Operator: LowDistance
Description: Ranks all members that are within a given distance from a certain member and shows the results in ascending order.
Syntax: LowDistance(<Member>, <Set>)
        LowDistance(<Member>, <Numeric Expression>, <Set>)

Operator: RankArea
Description: Ranks all members according to their area.
Syntax: RankArea(<Set>)

Operator: RankLength
Description: Ranks all members according to their length.
Syntax: RankLength(<Set>)

Operator: Point
Description: Returns all centroid points for a given set of polygons.
Syntax: Point(<Set>)

Table 3. Particularly Designed Operators for GeoMDQL

Table 3 shows the operators that may be used in an axis specification and have been specially designed for the GeoMDQL language, such as TOPDISTANCE and LOWDISTANCE. For instance, the statement SELECT LOWDISTANCE([Localization].[City].[Recife], 5, [Localization].[City].Members) ON ROWS may be used to display on the ROWS axis the five City level members closest to the city of Recife. In this case, Localization is clearly a geographical hierarchy. Also, the DRILLOUT operator may be used to identify all the neighbors of a member of a given geographical level. For example, the following query specification may be used to retrieve all of the cities neighboring Recife: DRILLOUT([Location].[City].[Recife]). As TOPDISTANCE is just the inverse operator of LOWDISTANCE, an example of its application is not given here.
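The ranking performed by LOWDISTANCE and TOPDISTANCE can be sketched as sorting members by their distance to a reference member. The sketch below follows the example in the text, where the numeric argument is read as the number of members to return; this reading, the use of planar point coordinates, the function names and the coordinates themselves are assumptions made for illustration.

```python
import math


def rank_by_distance(origin, members, coords, descending=False):
    """Order the other members by Euclidean distance from `origin`."""
    others = [m for m in members if m != origin]
    return sorted(
        others,
        key=lambda m: math.dist(coords[origin], coords[m]),
        reverse=descending,
    )


def low_distance(origin, count, members, coords):
    """Return the `count` members closest to `origin`, nearest first."""
    return rank_by_distance(origin, members, coords)[:count]


def top_distance(origin, count, members, coords):
    """Return the `count` members farthest from `origin`, farthest first."""
    return rank_by_distance(origin, members, coords, descending=True)[:count]


# Made-up planar coordinates, not real geographical positions.
coords = {
    "Recife": (0.0, 0.0),
    "Olinda": (0.0, 1.0),
    "Caruaru": (-3.0, 0.0),
    "Petrolina": (-10.0, -2.0),
}
closest = low_distance("Recife", 2, list(coords), coords)
farthest = top_distance("Recife", 1, list(coords), coords)
```

A real implementation would compute distances between the stored geometries rather than between single points, so this only conveys the ordering idea.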

The cube_specification element keeps the syntax of the original MDX definition and indicates which data cube is being queried. While cell_props keeps the original MDX specification, slicer_specification corresponds to specific GeoMDQL restrictions that are formulated using any of the spatial operators listed in Table 2. To illustrate this, consider the following GeoMDQL example: AT_NORTH_OF([Location].[State], [Location].[State].[Pernambuco]). This shows that a spatial operator may be used in the slicer specification of a GeoMDQL query to retrieve the members of the State level that are located to the north of the State of Pernambuco. ON MAP, in turn, is a specific GeoMDQL clause used to display the query results on a map. Finally, using the GeoMDQL query language syntax presented in this section, any query type discussed in section 3.2 can be formulated and executed by our prototype system.

Additionally, if users wish to query an ad hoc area selected on a given map, a GeoMDQL operator is available to identify the sketched area, whose geometry is given as a parameter to the GeoMDQL slicer specification clause. For example, with the statement (WITHIN([Localization].[City],(POLYGON((-55.51 -9.00,...,-55.51 -9.00))))) users can restrict the query context by retrieving the cities located within the polygon given as the second parameter of the spatial operator WITHIN. Such queries clearly call for a graphical user interface that allows users to interact with previous query results and perform new queries simply by clicking and selecting objects. However, such a graphical user interface is outside the scope of the work presented in this paper.
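The effect of a WITHIN restriction with a user-sketched polygon can be illustrated with a standard ray-casting point-in-polygon test. This is a swapped-in, simplified technique: it assumes that each city is represented by a single point and that the polygon is a closed ring of planar coordinates, which may differ from how the GeoMDQL prototype evaluates WITHIN. The names and coordinates are made up.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if `point` lies strictly inside the closed ring `polygon`."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Consider only edges that cross the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def within(members, coords, polygon):
    """Return the members whose point lies inside the polygon."""
    return [m for m in members if point_in_polygon(coords[m], polygon)]


# Made-up square area and member positions; the ring repeats its first vertex.
area = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (0.0, 0.0)]
coords = {"City A": (2.0, 2.0), "City B": (5.0, 2.0), "City C": (1.0, 3.5)}
selected = within(list(coords), coords, area)
```

Points lying exactly on the polygon boundary are not handled specially here, which a production spatial engine would need to define precisely.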

3.4. The GeoMDQL Architecture Implementation

The GeoMDQL architecture makes use of a Geographical Data Warehouse (GDW), which is based on the GeoDWFrame [7] guidelines. This GDW is similar to a traditional data warehouse. Our approach also takes into account other concepts such as dimensions, hierarchies, levels and members.

Figure 2. Geographical-Multidimensional Architecture

Our system prototype is shown in Figure 2 and can be considered an instance of the GOLAPA architecture [8]. This architecture is composed of three layers (I, II and III), which provide data, services and the graphical user interface, respectively.

The first layer (I) contains the Geographical Data Warehouse (GDW), which is based on the GeoDWFrame definitions [7] and on the GeoDWM metamodel [10]. While GeoDWFrame was proposed as a set of guidelines for designing spatial and multidimensional schemas, GeoDWM provides a set of classes, UML stereotypes and pictograms to help in the creation of a GDW. A GDW schema is similar to traditional DW schemas (e.g. star schema) [26] and is based on well-known concepts like fact tables and dimension tables, except that the geometries of the geographical objects are stored in the GDW as well. Some important characteristics of this GDW are as follows: 1) it does not apply spatial measures, 2) it normalizes the geometrical data, 3) it provides geographical data at any dimensional level and 4) it stores the descriptive data of geographical features. To support them, the GDW makes use of two types of dimensions, namely geographical and hybrid. The first type is classified into primitive and composed dimensions, while the second is grouped into micro, macro and joint dimensions. The primitive and composed dimensions hold only geographical data at all levels (or fields) (e.g. client addresses and their geo-references), while the others deal with both geographical and conventional data (e.g. client addresses and their geo-references plus ages and genders). The open source DBMS used for creating the GDW is PostgreSQL with its spatial extension PostGIS, previously mentioned in this paper. For the extraction, transformation and loading of the geographical and multidimensional data, we have used scripts written in the PostgreSQL PL/pgSQL language.
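The distinction between geographical and hybrid dimensions described above can be illustrated with a small classifier. This is a simplified sketch: the Level class and its has_geometry flag are hypothetical, the sub-types (primitive, composed, micro, macro, joint) are not modeled, and the Client dimension is an invented example based on the client address illustration in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Level:
    name: str
    has_geometry: bool  # assumption: True when the level holds geo-referenced data


def classify_dimension(levels: List[Level]) -> str:
    """Return 'geographical', 'hybrid' or 'conventional' for a dimension."""
    if not levels:
        raise ValueError("a dimension needs at least one level")
    flags = [level.has_geometry for level in levels]
    if all(flags):
        return "geographical"  # only geographical data at all levels
    if any(flags):
        return "hybrid"  # geographical and conventional data together
    return "conventional"  # no geographical data at all


localization = [Level("State", True), Level("City", True)]
client = [Level("Address", True), Level("Age", False)]  # hypothetical dimension
time = [Level("Year", False)]

print(classify_dimension(localization))
print(classify_dimension(client))
print(classify_dimension(time))
```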

The second layer (II) implements the Geographical Online Analytical Processing Engine (GOLAPE) component of the GOLAPA architecture. This component is responsible for receiving and processing geographical and/or multidimensional requests. This layer contains software modules for query processing, query optimization and query management. The engine has been implemented by extending the Mondrian OLAP server [19] to support the processing of spatial queries. All three types of queries listed in Section 3.2 can be processed in this layer. Requests of type GEO, MD or GEOMD are expressed in the GeoMDQL query language presented in Section 3. When an Integration GEOMD query is handed over to the system, the application program has to estimate the execution plans of the sub-queries and decide how the best performance can be achieved. After this, the engine redirects the sub-queries to the extended Mondrian OLAP server, which collects the partial results, integrates them and sends the information to the client. The graphical user interface (GUI) uses JPivot to provide the infrastructure needed to visualize the query results through charts, tables and maps. The result set is then shown in a web browser on the client side.
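To make the request types concrete, the sketch below labels a GeoMDQL query as MD, GEO, Mapping GEOMD or Integration GEOMD. The rules are an assumption inferred from the examples in Table 4, not the logic of the GOLAPE engine, which would rely on the GeoMDQL grammar rather than on keyword matching. The operator list covers only the spatial operators that appear in this paper's examples.

```python
# Assumption: only the spatial operators used in this paper's examples are listed.
SPATIAL_OPERATORS = ("WITHIN(", "AT NORTHOF(", "AT NORTHEASTOF(")


def classify_query(query: str) -> str:
    """Guess the query type from keywords (heuristic, not a parser)."""
    text = " ".join(query.upper().split())  # normalize case and whitespace
    has_spatial = any(op in text for op in SPATIAL_OPERATORS)
    has_measures = "[MEASURES]" in text
    on_map = text.endswith("ON MAP")
    if has_spatial and has_measures:
        return "Integration GEOMD"
    if has_spatial:
        return "GEO"
    if has_measures and on_map:
        return "Mapping GEOMD"
    return "MD"


print(classify_query(
    "SELECT [Measures].[Neonate Deaths] ON COLUMNS, "
    "[Location].[State].MEMBERS ON ROWS FROM [BrPublicHealth] "
    "WHERE (AT NORTHOF([Location].[State], [Location].[State].[Pernambuco])) "
    "ON MAP"
))
```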

In the third layer (III), the graphical user interface is provided. This component has been implemented by extending the JPivot [21] client application and is responsible for submitting geographical and/or multidimensional requests to the GOLAPE engine. Once the response document is obtained, the query result viewer module graphically displays the results in charts, tables and/or maps using the HTML language and the SVG (Scalable Vector Graphics) [24] technology.
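As a rough illustration of map output in SVG, the sketch below renders one polygon ring as an SVG document. It is not the viewer module of the prototype: the function ring_to_svg, the fill color and the sample coordinates are hypothetical. The y coordinate is negated because the SVG y axis grows downwards, while latitude grows upwards.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def ring_to_svg(ring: Sequence[Point], fill: str = "#cc0000") -> str:
    """Render a polygon ring (x, y) as a minimal standalone SVG document."""
    xs = [x for x, _ in ring]
    ys = [-y for _, y in ring]  # flip the vertical axis for SVG
    min_x, min_y = min(xs), min(ys)
    width, height = max(xs) - min_x, max(ys) - min_y
    points = " ".join(f"{x:g},{y:g}" for x, y in zip(xs, ys))
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="{min_x:g} {min_y:g} {width:g} {height:g}">'
        f'<polygon points="{points}" fill="{fill}"/></svg>'
    )


# Hypothetical geometry (longitude, latitude); not data from the paper.
svg = ring_to_svg([(-40.0, -10.0), (-38.0, -10.0), (-38.0, -8.0), (-40.0, -8.0)])
print(svg)
```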

The metadata source (METADATA) plays an important role in this work. The integration metadata are accessed by the GOLAPE engine whenever a GEOMD request is received. Thus, the GOLAPE component can find out whether the multidimensional data have geographical correspondences. A geographical correspondence is the information representing the geometry of a spatial object. The metadata source implementation is currently based on XML and on both the GAM (Geographical and Analytical Metamodel) and GeoMDM (Geographical Multidimensional Metamodel) metamodels, which are detailed in [8]. The CWM OLAP, GAM and GeoMDM metadata are stored in this repository and accessed by the GOLAPE engine through the DOM API. We also highlight that an exclusively geographical request or an exclusively multidimensional request usually does not need to access the GOLAPA metadata, while a geographical and multidimensional request always needs to access them.
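The lookup of a geographical correspondence through a DOM API can be sketched as follows. The XML layout is purely hypothetical: the element and attribute names (integrationMetadata, correspondence, geometrySource) are invented for illustration and do not reproduce the GAM or GeoMDM metamodels.

```python
from typing import Optional
from xml.dom.minidom import parseString

# Hypothetical integration metadata; not the actual GOLAPA repository format.
METADATA_XML = """
<integrationMetadata>
  <correspondence dimension="Localization" level="State" geometrySource="state_geometry"/>
  <correspondence dimension="Localization" level="City" geometrySource="city_geometry"/>
</integrationMetadata>
"""


def find_geometry_source(xml_text: str, dimension: str, level: str) -> Optional[str]:
    """Return the geometry source of a level, or None if it has no correspondence."""
    document = parseString(xml_text.strip())
    for node in document.getElementsByTagName("correspondence"):
        if (node.getAttribute("dimension") == dimension
                and node.getAttribute("level") == level):
            return node.getAttribute("geometrySource")
    return None


print(find_geometry_source(METADATA_XML, "Localization", "State"))
print(find_geometry_source(METADATA_XML, "Time", "Year"))
```

A level without an entry, such as Year in the conventional Time dimension, yields None, which mirrors the case where the multidimensional data have no geographical correspondence.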

As we can see, the GeoMDQL system prototype architecture presented in this section is based on open and extensible standards. Thus, it is suitable for the low-cost development of decision support environments that follow current market standards. In the next section, we describe a case study in the public health area to validate the ideas proposed in this paper.

4. A Case Study on Public Health System

In order to illustrate the GeoMDQL syntax and applicability, this section lists some GeoMDQL query examples designed for querying a Geographical Data Warehouse with data obtained from the Brazilian public health system. This GDW structure is responsible for managing significant amounts of historical data that include geo-referenced locations, and enables us to explore the capabilities of multidimensional and geographical systems by improving the manipulation and evaluation of the considered data. The chosen data have annual information about infant mortality, women's health, disease control and oral health, providing us with a means of analyzing the health and life conditions of the Brazilian population through the use of tables, charts and maps to detect geographic and secular variations. Hence, this GDW may help authorities in the planning, management and evaluation of public health care policies, and serve as an efficient means of elaborating goals to improve the existing public health services offered to the Brazilian population. The GDW schema (see Figure 3) was designed using the GeoDWCASE tool [9].

Figure 3. A GDW for Public Health Analysis

Query: For the year 2000, show the population and the number of neonate deaths for each Brazilian State, grouped by regions.
Query Type: MD
GeoMDQL Syntax: SELECT [Measures].[Population], [Measures].[Neonate Deaths] ON COLUMNS, DRILLDOWNLEVEL([Localization].[Region].MEMBERS) ON ROWS FROM [BrPublicHealth] WHERE [Time].[Year].[2000]

Query: Select all cities located within an ad hoc chosen area.
Query Type: GEO
GeoMDQL Syntax: SELECT [Localization].[City].MEMBERS FROM [BrPublicHealth] WHERE (WITHIN([Location].[City],(POLYGON((-55.51 -9.00,...,-55.51 -9.00))))) ON MAP

Query: For the year 2002, show the ten Brazilian States having the greatest number of neonate deaths, highlighting the results on a map.
Query Type: Mapping GEOMD
GeoMDQL Syntax: SELECT [Measures].[Neonate Deaths] ON COLUMNS, TOPCOUNT([Localization].[States].Members,10,Measures.[Neonate Deaths]) ON ROWS FROM [BrPublicHealth] WHERE [Time].[Year].[2002] ON MAP

Query: Show the number of successful births and of neonate deaths for the year 2002 and for the States that are located to the northeast of the Federal District.
Query Type: Integration GEOMD
GeoMDQL Syntax: SELECT [Measures].[Borns Alive], [Measures].[Neonate Deaths] ON COLUMNS, [Location].[State].MEMBERS ON ROWS FROM [BrPublicHealth] WHERE (([Time].[Year].[2002]) and (AT NORTHEASTOF([Localization].[State], [Location].[State].[DISTRITO FEDERAL]))) ON MAP

Table 4. Examples of GeoMDQL Queries for the Public Health Data Cube

Based on this GDW schema, many data cubes can be created. For this case study, we have designed a geographical and multidimensional data cube named BrPublicHealth. This data cube has measures such as Population, Under One Year Old Deaths, Successful Births, Born Successfully and Under Weight, Neonate Deaths, People Attended by the Family Health Program and Maternal Deaths. Time is a conventional dimension (i.e. it does not contain geographical data) and is composed of a hierarchy with the level Year. Localization is a geographical dimension (i.e. it contains geographical data) composed of a hierarchy with the following levels: Country, State, Meso Region, Micro Region and City. Moreover, Climate is a geographical dimension containing a level named Zone, which stores the geometries of all climatic zones found in Brazil.

Figure 4. A Sample of GeoMDQL Query Result

To exploit the data cube outlined above, several GeoMDQL queries can be defined to help in the monitoring of Brazilian public health. To illustrate this, Table 4 presents some of them, grouped according to the taxonomy of query types discussed in Section 3.2 of this paper. For the query listed in the last row of this table, Figure 4 shows the results generated by running our system prototype [23], which implements the GeoMDQL query language proposed here. Figure 4-(a) shows a table with the rates of successful births and the number of neonate deaths for each Brazilian State located to the northeast of the Distrito Federal, Figure 4-(b) displays the map resulting from this query and Figure 4-(c) displays the chart that users can optionally enable to graphically exhibit the multidimensional data.

5. Conclusions

Although there are several proposed approaches to integrating multidimensional and geographical processing, none of them offers a query language with a single syntax for the simultaneous use of multidimensional and spatial operators. This article presented the GeoMDQL language, a new language based on MDX and specified for use in SOLAP environments to retrieve geographical and multidimensional data stored in a GDW (Geographical Data Warehouse). Some of the multidimensional and geographical operators found in the language syntax were formally presented. The GeoMDQL language is part of the GOLAPA project [8], which aims at providing an integrated environment for multidimensional and geographical processing.

To demonstrate the application of the GeoMDQL language, some points related to a public health case study were also briefly presented. For this case study, a GDW making available information from a national health department was built and may be used for monitoring and evaluating some actions and services related to the Brazilian public health system. Many other similar applications can be developed and their results are important to the Brazilian economy and government as well. It is important to mention that the implementations are all based on open and extensible standards, which simplify the evolution and reuse of the solution. Some planned directions for future work are given as follows. In layer II of the GOLAPA architecture, some implementations will be carried out for the query optimization module. Moreover, in layer III, for the graphical user interface, some improvements will be added to achieve a better user interaction with charts, tables and maps. Finally, GeoMDQL will also be extended to handle complex hierarchies that represent partial containment of spatial objects.

References

[1] S. Bimonte, A. Tchounikine, and M. Miquel. Towards a spatial multidimensional model. In 8th ACM International Workshop on Data Warehousing and OLAP, 2005.

[2] L. Cabibbo and R. Torlone. Querying multidimensional databases. In 6th International Workshop on Database Programming Languages, pages 319–335, 1998.

[3] Open Geospatial Consortium. Simple features specification for SQL, http://portal.opengeospatial.org/files/?artifactid=829. Technical report, 1999.

[4] M. J. Egenhofer. Spatial SQL: A query and presentation language. IEEE Transactions on Knowledge and Data Engineering, 6(1):86–95, 1994.

[5] M. J. Egenhofer and J. R. Herring. Categorizing binary topological relationships between regions, lines and points in geographic databases. Technical report, Department of Surveying Engineering, University of Maine, 1991.

[6] F. Wang, J. Sha, H. Chen, and S. Yang. GeoSQL: A spatial query language of object-oriented GIS. In Proceedings of the 2nd International Workshop on Computer Science and Information Technologies, pages 215–219, 2000.

[7] R. N. Fidalgo, V. C. Times, J. Silva, et al. GeoDWFrame: A framework for guiding the design of geographical dimensional schemas. In Data Warehousing and Knowledge Discovery (DaWaK), pages 26–37, 2004.

[8] R. N. Fidalgo, V. C. Times, J. Silva, et al. Providing multidimensional and geographical integration based on a GDW and metamodels. In Brazilian Symposium on Databases (SBBD), pages 148–162, 2004.

[9] R. L. Fonseca, R. N. Fidalgo, J. Silva, and V. C. Times. GeoDWCASE: Uma ferramenta para projeto de data warehouses geográficos. In Brazilian Symposium on Databases (SBBD), Demos Session, 2007.

[10] R. L. Fonseca, R. N. Fidalgo, J. Silva, and V. C. Times. Um metamodelo para a especificação de data warehouses geográficos. In Brazilian Symposium on Databases (SBBD), 2007.

[11] International Organization for Standardization. Database Language SQL - Amendment 1: On-Line Analytical Processing (SQL/OLAP). 1999.

[12] L. M. Garshol. BNF and EBNF: What are they and how do they work?, www.garshol.priv.no/download/text/bnf.html, 2006.

[13] J. Gray, A. Bosworth, et al. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Mining and Knowledge Discovery, 1(1):29–53, 1997.

[14] ISO. ISO/IEC WD - SQL Multimedia and Application Packages Part 3: Spatial. 2003.

[15] H. Lin and B. Huang. SQL/SDA: A query language for supporting spatial data analysis and its web-based implementation. IEEE Transactions on Knowledge and Data Engineering, 13(4):671–682, 2001.

[16] E. Malinowski and E. Zimányi. Spatial hierarchies and topological relationships in the Spatial MultiDimER model. In BNCOD, pages 17–28, 2005.

[17] A. J. Morris, A. I. Abdelmoty, B. A. El-Geresy, et al. A filter flow visual querying language and interface for spatial databases. GeoInformatica, 8(2):107–141, 2004.

[18] D. Pedersen, K. Riis, and T. B. Pedersen. A powerful and SQL-compatible data model and query language for OLAP. In 13th Australasian Database Conference, 2002.

[19] Pentaho. Mondrian, http://mondrian.pentaho.org/, Last Visit Nov 2006.

[20] E. Pourabbas and M. Rafanelli. A pictorial query language for querying geographic databases using positional and OLAP operators. SIGMOD Record, 31(2):22–27, 2002.

[21] JPivot Project. JPivot, http://jpivot.sourceforge.net/, Last Visit Nov 2006.

[22] Sonia Rivest et al. SOLAP technology: Merging business intelligence with geospatial technology for interactive spatio-temporal exploration and analysis of data. Journal of Photogrammetry and Remote Sensing, pages 17–33, November 2005.

[23] J. Silva, V. C. Times, A. C. Salgado, et al. An open source and web based framework for geographic and multidimensional processing. In SAC '06: Proceedings of the 2006 ACM Symposium on Applied Computing, pages 63–67, 2006.

[24] W3C. Scalable Vector Graphics, http://www.w3.org/tr/svg11/, Last Visit Nov 2006.

[25] M. Whitehorn, R. Zare, and M. Pasumansky. Fast Track to MDX. Springer, 2005.

[26] R. Wrembel and C. Koncilia. Data Warehouses and OLAP: Concepts, Architectures and Solutions. IRM Press, 2006.
