Towards the new open source GIS platform AEGIS

Roberto GIACHETTA a, István LASZLÓ b, and Csaba Levente BÁLINT a

a Eötvös Loránd University (ELTE), Faculty of Informatics, Budapest, Hungary
b Institute of Cartography, Geodesy and Remote Sensing (FÖMI), Budapest, Hungary

Abstract. In recent years, geographical information systems have undergone spectacular development. Besides traditional applications, new areas have been opened by the spread of navigation systems and the publication of geoinformation via the Internet. These areas need efficient spatio-temporal data handling, because both the location and the descriptive data of objects change over time.

This article presents the open-source AEGIS framework, a spatio-temporal data management system currently under development at the Eötvös Loránd University Faculty of Informatics. The framework introduces a data model that aims to uniformly represent raster and vector data, and to efficiently retrieve changing spatio-temporal objects. This model introduces a new indexing structure based on the MV3R-tree and the B-tree to track changes of spatial and descriptive data over time. To demonstrate the usage of this model, a simple agent-based traffic simulation has been developed, which is also presented in the article.

Keywords. Spatio-temporal data, geospatial information systems, indexing structures, agent-based simulation, traffic modeling

Introduction

Geographical information systems (GIS) have undergone spectacular development in recent years. Besides the traditional areas of GIS applications, rapid development has taken place in the world of navigation systems as well. Google Maps and Google Earth, together with their Application Programming Interfaces, are common tools in the global handling of spatial data. The world of open source software has also evolved considerably. There is a rising need for professionals whose practice covers both information technology and geography. This paradigm shift has to be taken into account both by professionals and by academics.

At the Eötvös Loránd University, Faculty of Informatics (ELTE IK) the informal association called Creative University GIS Workshop (TEAM) deals with several related research topics, e.g. Intelligent Rasterimage Interpretation System (IRIS), University Digital Map Library (EDIT), Virtual Globes Museum (VGM), segment-based analysis of remote sensing images and the development of a GIS framework called AEGIS.

In education and research an important collaboration takes place with the Institute of Geodesy, Cartography and Remote Sensing (FÖMI). This governmental institution is responsible for the research, development and applications of remote sensing, mainly in the areas of agriculture and environmental protection.

Continuous improvement of image analysis methods has always been an important part of operational projects, in which FÖMI works together with ELTE IK. The other side of this connection is education. The course Remote Sensing Image Analysis is taught by the scientists of FÖMI.

The main stream of joint research is segment-based image analysis. As a result of many years of research, a fully segment-based classification method has been developed, including clustering and final classification working on segments [1]. In three running, operational applications of FÖMI, an alternative approach using segment-based classification has been introduced [2]. The first application is the automatic delineation of (ineligible) scattered trees and bushes in (otherwise eligible) pastures, as part of updating the Land Parcel Identification System. The second one is the surveying of the red mud spill after the industrial disaster that happened in October 2010 at Ajka. The third one is the monitoring of ragweed, which causes serious health problems in Hungary.

Based upon the experience gained in research and education (especially lab seminars), the plan of a standalone, open source, multi-platform geographic framework called AEGIS has been outlined. This framework will serve as the future platform of GIS education and research at the ELTE Faculty of Informatics. It is currently in development: the data model is nearly complete, but most features are only outlined and will be implemented in the near future. The following sections concentrate on the current status of development and on functions that have already been implemented.

The rest of the paper is arranged as follows. In Section 1 we outline the concept of our system. Section 2 explains details of spatio-temporal data modeling in AEGIS. Section 3 describes the first application using this architecture, an agent-based traffic simulation. We conclude in Section 4.

1. Concept of the AEGIS system

The purpose of AEGIS is to provide a multi-platform, open source geographic information system with a client-server architecture for spatio-temporal data management. It is designed for broad functionality and efficiency, and to allow students to skip the learning curve and concentrate on their main tasks in lab projects and thesis work without having to build auxiliary functionality from scratch.

The implementation is carried out using the Microsoft .NET Framework. This choice is due to the wide possibilities and simple usage of this development platform, the strong .NET education carried out at ELTE, and the previous success of .NET based GIS software products like DotSpatial [3] and SharpMap [4].

The main features of AEGIS can be summarized as follows.

• Multiple client environments on several platforms. The main client runs on desktop computers, while limited access is available to users through browser, mobile and tablet interfaces.

• Project based storage of spatio-temporal data in both raster and vector format. The data model is based upon ISO and OGC standards and incorporates revision control to follow changes during editing. Data are stored in a central database using a pyramid architecture with vector data simplification and raster resolution reduction.

• Service based, encrypted online communication between subsystems. Changes made to database data are automatically visible to all clients.

• Import and export of data in any standard GIS format. Supported file formats include ESRI ShapeFiles, Erdas Machine Independent Format and raw satellite image data. Online import enables access to OGC web services and map providers (OpenStreetMap, Google Maps, etc.). Data can also be shared online by providing OGC web service channels.

• Extensible processing library. Operations for raster and vector data can be added on-the-fly and provide new possibilities for data analysis, process modeling and simulation, and the production of spatial and temporal statistics. These plugins can be implemented in .NET using the framework’s API. Resource demanding operations can be carried out in a computational cloud. A simple to use scripting language is provided to enable batch processing of data.

• Comprehensive user rights control with task scheduling and activity management. All actions performed on server data are logged and can be reverted using revision control. Logging also enables the automated revision and auditing of performed tasks.

1.1. System components

The AEGIS system is made up of four main components, as seen in Figure 1. These components are the following.

• Thick Client: A fully functional desktop GIS browser and editor application with support for local file system and web access of spatial data. It features a three-dimensional graphics view (implemented in XNA) with editing, analysis and simulation possibilities. Operations can be performed using the local machine or the processing services.

• Thin Client: A simplified application for browser, mobile and tablet platforms, built in Silverlight, with reduced operating possibilities. This client has a separate interface for each supported platform, all with the same functionality. Operations are performed using the processing services. This client can only access data stored on the server.

• Processing Services: Realize the computational cloud used for performing distributed operations on both the server and the client side. The server side distributes operations and data among the running nodes, which execute the operations and return the results to the server. Nodes can be installed separately from the clients.

• Server Services: The server provides connection between client machines using encrypted channels through WCF (Windows Communication Foundation) and also provides OGC web services.

1.2. Data management

The system features centralized storage of spatio-temporal data in both raster and vector format, with the possibility of interacting with multiple databases. The primary database backend is the MongoDB document-oriented database management system, which enables the schema-free storage of hierarchical data [5] and provides faster editing speed than SQL based databases [6]. Support is planned for PostGIS and other databases as well.

Figure 1. AEGIS system components

Spatial references are stored in one preferred coordinate system. Raster and vector data may be imported into the system from several supported formats, and data are reprojected upon import, if necessary. However, due to the possible data loss caused by raster reprojection, the original images are also stored in the database and are used whenever reprojection is needed.

Spatial data is built up in a multi-resolution pyramid structure. Raster images are gradually reduced to several lower resolutions, whilst vector objects are generalized using proven methods [7,8]. This enables more dynamic access to maps (also in low-bandwidth conditions) without the need for on-the-fly image resampling. Using this feature, multiple access levels can be assigned to different levels of the pyramid. Read operations may be performed at any level of the pyramid, while write operations need permission at the lowest level (full resolution data).
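The raster side of such a pyramid can be sketched as follows. This is a minimal illustration only: the paper does not state which resampling method AEGIS uses, so the 2x2 block averaging, the function names and the list-of-lists raster representation are all assumptions made for the example.

```python
from typing import List

Raster = List[List[float]]


def halve_resolution(raster: Raster) -> Raster:
    """Average each 2x2 block into one pixel (assumed resampling method).

    Blocks on an odd-sized edge are averaged over the cells that exist.
    """
    rows, cols = len(raster), len(raster[0])
    reduced: Raster = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            block = [raster[rr][cc]
                     for rr in range(r, min(r + 2, rows))
                     for cc in range(c, min(c + 2, cols))]
            out_row.append(sum(block) / len(block))
        reduced.append(out_row)
    return reduced


def build_pyramid(raster: Raster, min_size: int = 1) -> List[Raster]:
    """Return all levels; level 0 is the full resolution image."""
    levels = [raster]
    # Keep halving until either dimension reaches the minimum size.
    while len(levels[-1]) > min_size and len(levels[-1][0]) > min_size:
        levels.append(halve_resolution(levels[-1]))
    return levels
```

With the levels precomputed in this way, a client on a slow connection can be served a coarse level directly, which is the behaviour the paragraph above describes as avoiding on-the-fly resampling.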

The data model features both two- and three-dimensional spatial objects (object refers to both vector features and remotely sensed images) with time intervals. The primary purpose of the time variable is to store data validity and to track spatial changes of objects. Versioning is also applied to data, to enable rollback of any modification. Objects are grouped into layers, which define dimensional and reference parameters. Since this complex data structuring is not supported at database level, the data access layer of the system is responsible for properly transforming stored data into revision controlled spatio-temporal entities and for indexing data for fast retrieval. The data modeling environment is presented in Section 2.

The database is separated into two main parts, as seen in Figure 2.

• Published data contains finalized objects that can be accessed via external channels, including web services. These objects do not contain all editing information and changes cannot be revoked, but it is possible to switch between any published versions of a spatial object.

• Project data contains spatial objects that have not yet been finalized and are still being edited. These data are gathered under spatial projects that maintain all editing and version changes to enable rollback of modifications.

Figure 2. Data management

2. Unified modeling and indexing of spatio-temporal geospatial data

Spatio-temporal data modeling and indexing has been a frequent topic among researchers since the 90s, and many data models have been introduced so far [9,10]. However, neither common solutions have emerged nor have standards been developed for spatio-temporal models.

In our approach the basis of the data model is the Open Geospatial Consortium (OGC) Simple Features Specification (SFS), which defines an ISO based modeling environment for spatial vector features [11]. In this solution, the central item is the Geometry class, which has several specialized versions (including collections) in the object inheritance taxonomy. Geometry defines the interface for spatial properties and operations among any spatial objects. The model focuses on two and three dimensional vector data, without any temporal references. In our implementation of this standard, several interfaces have been introduced beside the class structure, to enable flexible and effective implementation of vector features (see Figure 3). Also, collections contain indexing structures for fast retrieval of objects, as described in Section 2.2. All Geometry objects may contain multiple resolutions of the spatial data and any number of descriptive data.

For operations to work uniformly on vector and raster data, we require both to be handled in the same manner. In current geospatial systems the representation and operations of raster and vector data are usually independent of each other. In our system, we include raster data as an extension to the SFS. Therefore all spatial operations previously defined on vector objects (e.g. intersection, translation, projection, etc.) can also be performed on raster data (for example, an orthophoto can be intersected with a land parcel polygon). Also, restrictions can be applied to any operation, to prevent raster functions (e.g. intensity transformations, image filtering) from working on vector data.

Figure 3. Data model based on the Simple Features Specification (new classes are displayed in red)

The data model has been enhanced in two steps. First, temporal properties were added to all geometries, and extensions were made to model raster datasets and special vector objects. This extended data model can be seen in Figure 3. In the second step, indexing structures have been implemented to access spatio-temporal data faster, and also to provide temporally variable information not stored at data level.

2.1. Extending simple features with complex objects

Our current model concentrates only on two dimensional spatial objects, but it will shortly be extended to 3D.

All geometries have been extended by storing the time interval in which they are considered to be valid. Generally, items in a collection do not need to exist in the same time interval, so the interval of a collection is the closure of the intervals of all items, but restrictions can be made to force temporal equality among all items of the collection. GeometryCollection has also been extended with location and time based querying abilities made possible by the inner indexing structure. Besides storing reference system properties, geometries store descriptive data (metadata) as well. Metadata stores properties not related to space and time, including user access, copyright information, sensor data (in case of remotely sensed images), etc.
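The interval rule for collections can be illustrated with a short sketch. It reads "closure" as the smallest single interval that encloses every item interval, which is an interpretation rather than something the text spells out; the TimeInterval class and both function names are hypothetical and not part of the AEGIS API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TimeInterval:
    """Validity interval of a geometry (hypothetical helper type)."""
    start: float
    end: float

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("interval end precedes start")


def closure(intervals: Iterable[TimeInterval]) -> TimeInterval:
    """Smallest interval enclosing all item intervals of a collection."""
    items = list(intervals)
    if not items:
        raise ValueError("an empty collection has no validity interval")
    return TimeInterval(min(i.start for i in items),
                        max(i.end for i in items))


def temporally_equal(intervals: Iterable[TimeInterval]) -> bool:
    """Check the optional restriction that all items share one interval."""
    items = list(intervals)
    return all(i == items[0] for i in items)
```

For items valid over (0, 5), (3, 9) and (12, 15), the collection interval under this reading is (0, 15), even though no item is valid between 9 and 12.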

Several new vector formats are introduced as geometry descendants. The Rectangle is introduced mainly to support simple operations, like bounding box queries, but it also provides an intermediate geometry for the representation of raster images. The GeometryNetwork and descendant collections are introduced to store topology related information on data contained in one collection. For example, the LineNetwork class stores points connected with Line objects in a graph structure based upon [12]. It is mainly intended to store the structure of road networks. The IGeometryNetwork interface provides operations to query neighbor geometries. Also, a SimpleGeometryCollection and descendant classes were added. These classes do not contain any indexing options, therefore they only serve as simple collections of elements.

Figure 4. The MV3R-tree extended with metadata variability

Concerning raster data, the Image, ImageCollection, ImageBand and ImageBandCollection classes are introduced. ImageBand is a descendant of Rectangle and contains one band of a raster image, while Image is a collection of all bands of an image. These classes are significantly extended with operations and properties related to raster imagery and image metadata. Images can be stored in several radiometric resolutions (from 8 bit up to 64 bit for every band), and a mask is applied to all images to indicate actual image pixels. The collection classes serve as accumulators of data with common attributes, e.g. several images from a satellite’s path. Spatial operations can be executed between raster and vector data, with the result becoming raster data. For example, intersecting a Polygon with an ImageBand generates an ImageBand in which pixels outside the polygon are left blank.
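The polygon and band intersection mentioned at the end of the paragraph can be sketched as below. This is not the AEGIS implementation: it assumes that the polygon is already expressed in pixel coordinates, that a pixel belongs to the result when its centre lies inside the polygon, and that blank pixels are represented by None. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Band = List[List[float]]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting test against a simple polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross
        # the ray, so y2 - y1 is never zero in the division below.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_band(band: Band,
              polygon: Sequence[Point]) -> List[List[Optional[float]]]:
    """Return a band of the same size; pixels outside the polygon are None."""
    return [[value if point_in_polygon(c + 0.5, r + 0.5, polygon) else None
             for c, value in enumerate(row)]
            for r, row in enumerate(band)]
```

The output keeps the shape of the input band, so the result is again raster data and the None entries play the role of the pixel mask described above.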

2.2. Indexing data with temporal variability

To ensure fast queries on spatio-temporal data, all collections (except the simple ones) store indexing structures based on Multi Version 3D R-trees (MV3R-trees). This structure was chosen because of its good performance on interval queries [13], but future research plans include testing several available indexing structures or developing new ones for our purposes.

The MV3R-tree is a combination of the MVR-tree and the 3D R-tree. It stores multiple R-trees with different time stamps, each having a spatial bounding box, and uses multiple heuristics to enhance the performance of tree updates. To enhance the usage of indexing, an auxiliary tree has been added at leaf level that contains metadata variables. These variables contain descriptive information that changes in time and is not contained at data level, therefore it is only reachable through collections. With this extension, not only spatial changes of objects can be monitored, but also changes of non-spatial information. This indexing structure can be seen in Figure 4.

Figure 5. Visualization of the agent based traffic simulation: (a) running agents, (b) traffic congestion measure

The metadata variables are built up using a B-tree data structure based on time intervals. Leaves can contain any number of metadata variables beside the geometry object, and all leaf pointers refer to the same object. Variables consist of (key, value, modification type) triplets. These triplets define metadata variability for any descriptive property of a geometry. Temporal or geometry properties cannot be altered this way; this is handled at the MV3R-tree level. The key refers to the property name, and the modification type defines how the value is applied, e.g. override, add, multiply, etc. During the query of a geometry, the variable properties are gathered from the structure and are considered during the evaluation of the object. In Section 3 we demonstrate an example of such metadata variable usage.
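How the triplets are applied at query time can be illustrated with a small sketch. For brevity it scans a plain list of variables instead of the interval based B-tree described above, so it shows only the evaluation rule and not the index; the MetadataVariable class, the resolve function and the half-open interval convention are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class MetadataVariable:
    """A (key, value, modification type) triplet valid on [start, end)."""
    start: float
    end: float
    key: str
    value: float
    modification: str  # "override", "add" or "multiply"


def resolve(metadata: Dict[str, float],
            variables: Iterable[MetadataVariable],
            time: float) -> Dict[str, float]:
    """Return the metadata of an object as seen at the given time.

    The stored metadata is left untouched; variables whose interval
    contains the query time are applied to a copy, in the order given.
    """
    result = dict(metadata)
    for var in variables:
        if not (var.start <= time < var.end):
            continue
        if var.modification == "override":
            result[var.key] = var.value
        elif var.modification == "add":
            result[var.key] = result[var.key] + var.value
        elif var.modification == "multiply":
            result[var.key] = result[var.key] * var.value
        else:
            raise ValueError(f"unknown modification type: {var.modification}")
    return result
```

A road segment with a stored travel time of 60 and a multiply variable of 2.0 valid between hours 7 and 9 would therefore be evaluated with a travel time of 120 at hour 8 and of 60 at hour 12, without any change to the stored object.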

The data access layer enables the storage of entire collections within the database, so the metadata variables can also be stored and retrieved later, when the collection is accessed.

3. Applications of spatio-temporal modeling: Agent-based traffic simulation

The first application based on the AEGIS core architecture is a simplified agent-based traffic simulation model of the city of Budapest. Due to the early status of development, only the data model of AEGIS was used in this project, and a separate operations and display environment was built on top of it. See Figure 5.

In this simulation independent agents have several target addresses to drive to during the day. The two main targets are the workplace and home. Multiple random targets (like shopping centers or restaurants) can also occur. Agents travel to all locations in their own car. Locations are built up using the metadata of building objects in the Budapest map. The agents in this simulation are primitive: they use simple random-based algorithms to make decisions based on statistics. The goal of the simulation is to present daily traffic information and congestion possibilities for every hour of the day.

At the start of the simulation, agents plan their routes and drive according to the plan. However, road traffic is constantly monitored, and road travel times can vary depending on traffic density. If an agent rates its arrival time at the destination as unacceptable, it replans its routes with the updated travel times the next day. All agents are in possession of the entire map and the accurate travel times. Simulations show that with a constant agent count these routes usually stabilize at about 92% within 80 days.

Agents plan their routes using the A* algorithm working on the LineNetwork representation of the map. The routing algorithm calculates the journey based on road travel time, which is available as geometry metadata and is multiplied by a metadata variable. This multiplied value is aggregated during the routing calculation, so different measures are taken into account for every hour. The algorithm follows the current time at the routing position, and the results are accurate to the hour. The travel times are constantly monitored during the simulation, and metadata variables are updated for every hour. This can be seen in Figure 6.

Figure 6. Routing with metadata variables in LineNetwork
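The routing step can be sketched as an A* search whose edge cost depends on the hour at which the agent enters a segment. Everything concrete here is an assumption made for illustration: the paper does not state the heuristic, so straight-line distance divided by a maximum speed is used, and the dictionary based graph, the function name and the parameter names are hypothetical. The sketch also assumes that multipliers never make a segment faster than this maximum speed and that waiting at a node never shortens a journey.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Node = str


def route(coords: Dict[Node, Tuple[float, float]],
          edges: Dict[Node, List[Tuple[Node, float]]],
          multipliers: Dict[Tuple[Node, Node], Dict[int, float]],
          start: Node, goal: Node, depart_min: float,
          max_speed: float) -> Optional[Tuple[float, List[Node]]]:
    """Return (arrival time in minutes, path), or None if unreachable.

    edges maps a node to (neighbour, base travel minutes) pairs and
    multipliers maps an edge to per-hour factors (default 1.0).
    """
    def heuristic(node: Node) -> float:
        (x1, y1), (x2, y2) = coords[node], coords[goal]
        return math.hypot(x2 - x1, y2 - y1) / max_speed

    best = {start: depart_min}          # earliest known arrival per node
    parent: Dict[Node, Node] = {}
    queue = [(depart_min + heuristic(start), start)]
    while queue:
        _, node = heapq.heappop(queue)
        if node == goal:
            path = [node]
            while path[-1] != start:
                path.append(parent[path[-1]])
            return best[goal], path[::-1]
        now = best[node]
        hour = int(now // 60) % 24      # hour in which the segment is entered
        for nxt, base in edges.get(node, []):
            factor = multipliers.get((node, nxt), {}).get(hour, 1.0)
            arrival = now + base * factor
            if arrival < best.get(nxt, math.inf):
                best[nxt] = arrival
                parent[nxt] = node
                heapq.heappush(queue, (arrival + heuristic(nxt), nxt))
    return None
```

Because the multiplier is looked up with the clock time reached so far, the same origin and destination can yield different routes at different departure times, which matches the hour-level accuracy described above. A segment whose multiplier is infinite for some hour is simply never relaxed during that hour.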

Using this approach there was no need to copy Line objects to store different travel speeds for every hour, and it has also proven to be more effective than using some other storage for this variability, since every line segment has different values for every hour. However, it was not necessary to split the temporal variability for all line segments and for every hour, since several consecutive hours have the same speed values (therefore intervals can be merged).
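The interval merging mentioned above amounts to run-length grouping of the hourly values. A minimal sketch, with a hypothetical function name and half-open hour ranges chosen for the example:

```python
from typing import List, Tuple


def merge_hours(hourly: List[float]) -> List[Tuple[int, int, float]]:
    """Collapse per-hour values into (start_hour, end_hour, value) runs.

    end_hour is exclusive, so (7, 9, 2.5) covers hours 7 and 8.
    """
    runs: List[Tuple[int, int, float]] = []
    for hour, value in enumerate(hourly):
        if runs and runs[-1][2] == value:
            start, _, _ = runs[-1]
            runs[-1] = (start, hour + 1, value)   # extend the current run
        else:
            runs.append((hour, hour + 1, value))  # start a new run
    return runs
```

A segment that is congested only in the morning and evening peaks then needs a handful of interval entries instead of 24 hourly ones.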

In this application, all Line objects have the same time interval, so the temporal properties of the MV3R-tree were not used. Later in development, we allowed the blocking of any road section during any hour of the day. This was accomplished in two ways. In the first, the metadata variable of travel time was increased to infinity for the given hours. This did not alter the indexing of the MV3R-tree, and did not cause any changes in the performance of the simulation. In the second, the Line was split into two different objects with limited time intervals. This caused an update of the MV3R-tree, but without the need to change the metadata variables. During simulation no measurable change in performance was seen. However, further testing is needed to measure the performance change when a large number of road sections are modified during the simulation.

To test the efficiency of the model, a separate representation was also implemented, in which no metadata variability is used, but multiple instances of Line objects are created and temporal changes are resolved by the MV3R-tree. This solution resulted in overwhelming memory usage in exchange for slightly improved routing times, so it was abandoned.

4. Conclusion and future work

In the previous sections the concept of the AEGIS geospatial framework and the goals of the authors’ research have been introduced. This system is based on the spatio-temporal data model described in Section 2, which uses complex data structures and MV3R-tree based indexing with temporal variability to enable more flexible management and maintenance of data. The authors’ first application, the agent-based traffic simulation, has shown the viability of this model. However, more research is needed to measure its performance and its competitiveness with other solutions.

Further research includes the testing of other indexing structures, both in combination with temporal variance and without it, and their performance measurement using the agent-based simulation. Further applications of the AEGIS framework are planned to be implemented. Also, in the long term, possibilities for enhancing MongoDB spatial support with a low-level implementation of revision control, the OpenGIS SFS and the indexing of spatio-temporal data are to be examined.

Acknowledgements

Research projects presented in this article are supported by the European Union and co-financed by the European Social Fund (grant agreement no. TÁMOP 4.2.1./B-09/1/KMR-2010-0003).

References

[1] I. László, B. Dezso, I. Fekete, T. Pröhle: A Fully Segment-based Method for the Classification of Satellite Images, Annales Univ. Sci. Budapest, Sectio Computatorica, 30 (2009), 157-174.

[2] I. László, K. Ócsai, D. Gera, R. Giachetta, I. Fekete: Object-based Image Analysis of Pasture with Trees and Red Mud Spill, 31st EARSeL Symposium, Prague, Czech Republic (2011).

[3] DotSpatial - Open Source Geospatial Framework in .NET, http://dotspatial.codeplex.com/.

[4] SharpMap - Geospatial Application Framework for the CLR, http://sharpmap.codeplex.com/.

[5] C. Chodorow: Introduction to MongoDB, Free and Open Source Software Developers’ European Meeting (FOSDEM), Brussels, Belgium (2010).

[6] R. Giachetta, Zs. Máriás: Performance Evaluation of Storing Inhomogeneous Descriptive Data of Digital Maps, Conference of PhD students in Computer Science (CSCS), Szeged, Hungary (2011).

[7] R. Weibel: Generalization of spatial data, Lecture Notes in Computer Science, 1340 (1997), 99-152.

[8] D. L. Paul Hardy: GIS-Based Generalization and Multiple Representation of Spatial Data, Proceedings of the International Symposium on Generalization of Information (ISGI), Berlin, Germany (2005).

[9] M. F. Mokbel, T. M. Ghanem, W. G. Aref: Spatio-Temporal Access Methods, IEEE Data Engineering Bulletin, 26 (2003), 40-49.

[10] T. Abraham, John F. Roddick: Survey of Spatio-Temporal Databases, GeoInformatica, 3 (1999), 61-99.

[11] J. R. Herring (ed.): OpenGIS Implementation Standard for Geographic Information: Simple Feature Access - Common Architecture (2011).

[12] B. George, S. Shekhar: Time-Aggregated Graphs for Modeling Spatio-temporal Networks, Journal on Data Semantics XI (2008), 191-212.

[13] Y. Tao, D. Papadias: The MV3R-Tree, A Spatio-Temporal Access Method for Timestamp and Interval Queries, Proceedings of 27th International Conference on Very Large Data Bases (VLDB), (2001), 431-440.