Satellite Image Processing Applications in MedioGRID


Ovidiu Muresan∗ Florin Pop† Dorian Gorgan‡ Valentin Cristea§

Abstract

This paper presents a high-level architectural specification of MedioGRID, a research project aiming at implementing a real-time satellite image processing system for extracting relevant environmental and meteorological parameters on a Grid system. The presentation focuses on the key architectural decisions of the GRID-aware satellite image processing system, highlighting the technologies for each of the major components. An essential part of managing a global Data Grid is a monitoring system that is able to monitor and track all the site facilities, networks, and tasks in progress, all in real time. Considering this issue, the paper analyzes the possible grid monitoring approaches, proposes a solution and presents a set of monitoring results for the MedioGRID data management subsystem.

Keywords

Monitoring, Grid architecture, Satellite image processing.

1. Introduction

The MedioGRID project [15] aims at implementing a real-time satellite image processing system for extracting relevant environmental and meteorological parameters. The proposed architecture is modular, providing for easy addition of further processing plug-ins or support for other types of satellite imagery.

This paper is structured as follows: Section 2 discusses the possible approaches related to GRID middleware, and recommends a solution for the MedioGRID project. Next, Section 3 proposes a high-level architectural specification, presents some general design principles and identifies candidate technologies for each of the main components.

∗Email: [email protected], Computer Science Department, Technical University of Cluj-Napoca, Romania

†Email: [email protected], Computer Science Department, University "Politehnica" of Bucharest, Romania

‡Email: [email protected], Computer Science Department, Technical University of Cluj-Napoca, Romania

§Email: [email protected], Computer Science Department, University "Politehnica" of Bucharest, Romania

One very important aspect of running a successful computing grid is the generation and availability of real-time monitoring information. In order to address this issue, Section 4 presents an overview of the possible monitoring technologies, highlighting the most efficient approaches. Section 5 presents some preliminary monitoring results for the MedioGRID data management subsystem running on the high-speed LAN and WAN links interconnecting the MedioGRID sites. Finally, Section 6 presents the conclusions and some future development directions.

2. GRID infrastructure in MedioGRID

Currently there are several research projects developing grid middleware solutions. This section briefly presents the main alternatives and proposes a solution for the MedioGRID project [16].

Considered "the de facto standard for Grid computing", Globus is an open source software toolkit that facilitates the construction of computational grids and grid-based applications [5]. It is an R&D project conducted by the "Globus Alliance", which includes Argonne National Laboratory, the Information Sciences Institute and others. Globus allows sharing of computing power, databases, and other tools securely online across corporate, institutional and geographic boundaries without sacrificing local autonomy. The core services, interfaces and protocols in the Globus toolkit allow users to access remote resources seamlessly while simultaneously preserving local control over who can use resources and when. The Globus architecture has three main groups of services accessible through a security layer: Resource Management, Data Management and Information Services. The success of this project is assured by some vital areas that are covered by the Globus project: packaging and distribution, collaboration, data, computation, monitoring and discovery, security, an ecosystem of Grid components and a community of developers. The major success is assured by the main goal of Globus: to work with industry, developers and standards bodies to define a grid, and also to make the software infrastructure of the grid available under an open source license.

Proceedings of The Fifth International Symposium on Parallel and Distributed Computing (ISPDC'06) 0-7695-2638-1/06 $20.00 © 2006

The UNICORE project represents a vertically integrated Java-based Grid computing environment that provides seamless and secure access to distributed resources. The success of the project is assured by features such as a user-friendly interface, easy and uniform access to distributed computing resources, and support for running scientific and engineering applications. Another advantage is that UNICORE users (usually scientists and engineers) can exploit the computational power of supercomputers without having to learn details of the target operating system and without being experts in access or security policies. The project had a three-phase structure: phase I handled self-contained, independent jobs; phase II managed access to data at other UNICORE sites; phase III dealt with interdependent tasks executing at different UNICORE sites.

The Gridbus project is involved in the creation of open-source specifications, architecture and a reference Grid toolkit implementation of service-oriented grid and utility computing technologies for eScience and eBusiness applications. Gridbus emphasizes the end-to-end quality of services driven by computational economy at various levels - clusters, peer-to-peer (P2P) networks, and the Grid - for the management of distributed computational, data, and application services. The commoditization of grid services is supported at various levels: raw resource level (e.g., selling CPU cycles and storage resources), application level (e.g., molecular docking operations for drug design applications) and aggregated services (e.g., brokering and reselling of services across multiple domains).

Legion is a vertically integrated, object-based metasystem that helps in combining large numbers of independently administered heterogeneous hosts, storage systems, databases, legacy codes and user objects distributed over wide-area networks into a single, object-based metacomputer that accommodates high degrees of flexibility and site autonomy. The success of the Legion project depends on the 10 objectives of its developers: site autonomy; extensible core; scalable architecture; easy-to-use, seamless computational environment; high performance via parallelism; single, persistent name space; security for users and resource owners; management and exploitation of resource heterogeneity; multiple language support and interoperability; fault tolerance.

The MedioGRID project will use an infrastructure based on GLOBUS. This paper describes the data management model, the architecture and some general design guidelines for the satellite image processing system, all based on GLOBUS and related GRID technologies. For details on the hardware architecture of the MedioGRID project, please refer to [14].

3. Developing GRID-aware environmental and meteorological applications - high level architecture and general design guidelines

The MedioGRID project aims at implementing a real-time satellite image processing system for extracting relevant environmental and meteorological parameters. The initial version of the system will implement fire detection [13] and water coverage detection (used for flooded area extent estimation) by using real-time MODIS imagery. The proposed architecture is modular, providing for easy addition of further processing plug-ins or support for other types of satellite imagery.

As described in Section 2, MedioGRID will use an infrastructure based on GLOBUS and related GRID technologies. In this context, we identify the components necessary for implementing such a system and propose a high-level architectural specification. This section also presents some general design principles of the system and identifies candidate technologies for each of the main components.

3.1. MODIS satellite imagery - characteristics and applications

MODIS is an acronym for MODerate Resolution Imaging Spectroradiometer, a sensor on board the Terra and Aqua satellites of the Earth Observing System (EOS) operated by the U.S. National Aeronautics and Space Administration (NASA) [11]. Terra and Aqua are polar orbiting satellites: they revolve around the Earth along paths passing over the polar regions. The advantage of polar satellites is that the Earth turns below them as they move along their orbit, so they eventually pass over almost every location on Earth. Each of the two satellites transmits approximately two daily images for each area of the Earth.

MODIS has 36 observational channels covering a wide frequency spectrum from visible to infra-red radiation, optimized for imaging specific surface and atmospheric features. The spatial resolution ranges from 250m to 1km. MODIS is generally used for providing measurements in large-scale global dynamics including changes in Earth's cloud cover, radiation budget and processes occurring in the oceans, on land, and in the lower atmosphere. In order to provide access to real-time MODIS imagery NASA has founded the Distributed Active Archive Center (DAAC) within the Goddard Earth Sciences Data and Information Services Center (GES DISC) [6], providing an online data pool with the most recent two weeks of MODIS data. In order to access longer term archives, specific applications would need to archive this data by using private resources. MedioGRID aims at creating a cache with the most recent data for Romania and surrounding areas, which can then be processed by using GRID computing resources.

3.2. Data management - fetching, archival and transparent GRID access

The data management system handles all issues related to data fetching, archival and transparent access in the context of GRID technologies. The data management system should provide means of implementing the following characteristics:

• Robustness - The data management system should include support for distributed storage and data replication.

• Efficiency - The data should be stored as close as possible to the components accessing it. If multiple copies are available the component should access the closest replica of the data.

• Transparency - The data consumers should not be concerned with how/where the data is stored.
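The efficiency requirement can be illustrated with a minimal Python sketch. The replica URLs, site names and cost values below are hypothetical, and since the text does not say how "closest" is measured, a static per-site access cost is assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    url: str   # hypothetical physical file name
    site: str  # site hosting this copy of the data


def closest_replica(replicas, site_cost):
    """Return the replica hosted at the site with the lowest access cost.

    `site_cost` maps a site name to an assumed cost (for example a measured
    latency); sites missing from the map are treated as infinitely far away.
    """
    if not replicas:
        raise ValueError("no replica available")
    return min(replicas, key=lambda r: site_cost.get(r.site, float("inf")))


replicas = [
    Replica("gsiftp://storage.site-a.example/modis/g1.hdf", "site-a"),
    Replica("gsiftp://storage.site-b.example/modis/g1.hdf", "site-b"),
]
# Assumed costs as seen from a client located at site-b.
best = closest_replica(replicas, {"site-a": 12.0, "site-b": 0.5})
```

The consumer only asks for the data and receives one copy, which also matches the transparency requirement: the choice of location stays hidden behind the selection function.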

In order to comply with all these core requirements the data management system will be based on GridFTP [7] and Replica Location Service (RLS) [8], both standard GLOBUS technologies, with the option of using more advanced data management tools such as Reliable File Transfer (RFT) [9] and Data Replication Service (DRS) [8]. The Data Management System has the following components, described in the next sections:

• Data Mirroring and Indexing Component

• Metadata Catalog Service

• Data Access Component

Data Mirroring and Indexing Component. This component creates a local cache for the MODIS data granules corresponding to a specified area of interest. The initial area of interest will be Romania and its surroundings.

The data granules will be fetched and indexed in real time, as they are generated by the NASA GES DAAC. The data granules are selected by using the associated XML metadata, which is then indexed for use by the MODIS Metadata Catalog.

The fetching of the MODIS data granules is followed by the following set of data processing operations:

• Split each data granule into the 36 composing spectral bands. This would allow clients to only download and process the necessary data for specific operations.

• Index the associated XML metadata.

• Generate a full color JPEG representation for the MODIS data granules. These representations are used by the Command and Result Dissemination Component for generating result dissemination maps.

Metadata Catalog Service. Each MODIS data granule has an associated XML metadata file [10], describing characteristics such as: image type (spatial resolution, size), location (spatial extent), timeframe and satellite characteristics. The Metadata Catalog Service indexes this XML information and provides an interface for answering queries such as: "The list of all the granules in area X, within the D1-D2 timeframe, during the daytime, of resolution R".

The MedioGRID MCS will be implemented as an extension to the Metadata Catalog Service (MCS) [12], which has been developed as a part of the GriPhyN and NVO projects, and will be integrated with the GLOBUS Replica Location Service (RLS) in order to provide support for distributed data storage and access over the GRID.
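The example query above can be sketched as a filter over indexed metadata records. This is only an in-memory illustration in Python: the record fields, granule names and the approximate bounding box are assumptions made for the example, not the MCS schema or interface.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class GranuleMeta:
    logical_name: str
    bbox: tuple         # (west, south, east, north) in degrees
    acquired: datetime  # acquisition time
    daytime: bool
    resolution_m: int   # spatial resolution in meters


def intersects(a, b):
    """True if two (west, south, east, north) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query(catalog, area, start, end, daytime, resolution_m):
    """Logical names of granules in `area`, within [start, end], matching the flags."""
    return [
        g.logical_name
        for g in catalog
        if intersects(g.bbox, area)
        and start <= g.acquired <= end
        and g.daytime == daytime
        and g.resolution_m == resolution_m
    ]


# Hypothetical records; names, extents and times are made up for the example.
catalog = [
    GranuleMeta("granule-a", (18, 40, 32, 50), datetime(2006, 3, 1, 9, 30), True, 1000),
    GranuleMeta("granule-b", (18, 40, 32, 50), datetime(2006, 3, 1, 21, 30), False, 1000),
    GranuleMeta("granule-c", (-10, 35, 0, 45), datetime(2006, 3, 1, 10, 0), True, 1000),
]
area_of_interest = (20.0, 43.5, 30.0, 48.5)  # rough box around Romania (assumed)
hits = query(catalog, area_of_interest,
             datetime(2006, 3, 1), datetime(2006, 3, 2), True, 1000)
```

The result is a list of logical file names, which is the form the later operational steps pass on to the Replica Location Service.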

Data Access Component. The Data Access Component (DAC) provides access to MODIS data granules which are used as input for the GRID processing nodes. One issue that needs to be considered is that MODIS data granules contain 36 spectral bands of variable spatial resolution. However, depending on the specific application (such as fire detection or water coverage detection) only a small subset (generally 3 bands) of the 36 bands needs to be used. Considering this situation, the Data Access Component optimizes network transfers by individually providing access to all the spectral bands for each MODIS granule. This is implemented by separately storing layer data for each MODIS granule.

Figure 1. GRID processing application - general architecture. [Diagram components: Client 1 ... Client N; Command and result dissemination component (sends jobs; accesses topological data and processing results on the spatial database server); GLOBUS Gatekeeper; Job Manager; Condor Manager (assigns jobs, monitors jobs and returns results); Condor Pool; Data Management Component, comprising the Replica Location Service, the MODIS MetaData Catalog and GridFTP servers at Location 1 and Location 2 (accessed for satellite imagery archives).]

The data access component is implemented by using GridFTP in conjunction with the Replica Location Service, protocols developed by the GLOBUS Alliance. The next section details the steps involved in accessing MODIS granule data by using these technologies.
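The per-band access described for the Data Access Component can be sketched as follows. The file naming scheme and the band numbers are hypothetical placeholders chosen for the example; the text only states that an application generally needs about 3 of the 36 bands.

```python
TOTAL_BANDS = 36  # number of MODIS spectral bands, as stated in the text


def band_file(granule, band):
    """Hypothetical naming scheme for the separately stored band of a granule."""
    if not 1 <= band <= TOTAL_BANDS:
        raise ValueError(f"band out of range: {band}")
    return f"{granule}.band{band:02d}"


def files_for(granule, bands):
    """Per-band files an application has to fetch, without duplicates."""
    return [band_file(granule, b) for b in sorted(set(bands))]


# Placeholder band numbers, not the bands actually used for fire detection.
needed = files_for("granule-a", [3, 1, 2])
fraction_of_files = len(needed) / TOTAL_BANDS
```

With 3 bands the client requests 3 of 36 files, that is 1/12 of the per-band files. The saving in bytes differs from this ratio because the bands have variable spatial resolution.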

3.3. GRID processing application architecture and general design guidelines

A simplified architectural view of the GRID processing application is presented in Figure 1. In order to better understand the architectural components and the way they communicate, let us consider the following simple scenario: a user would like to perform a fire detection operation, with a given set of parameters, for a specified timeframe and within an area of interest. The client connects to the Command and Result Dissemination Component (CRDC) and generates the request. This component generates a set of jobs which will be scheduled for execution on local GRID resources or, depending on the number of jobs, on other execution sites. When the jobs finish execution a notification process is initiated and the client is presented with the result of the requested processing on the GRID. This is a greatly simplified scenario and only exemplifies the desired functionality of the system. The GRID processing application is complex and many issues need to be addressed, such as:

• design a flexible architecture for efficiently implementing general purpose satellite imagery processing on GRID systems

• create, query and access MODIS satellite imagery archives

• implement a GRID processing notification system

• implement a GIS component which will integrate satellite imagery, existing topological data and processing results; based on this component, implement a GRID command and result dissemination component.

This paper only analyzes the issues relevant to the high level architecture of the system.

Note that Figure 1 references GLOBUS and the Condor job manager as GRID middleware solutions. In order to use the GLOBUS facilities mentioned above, the use of the GLOBUS Gatekeeper is highly recommended. However, in the case of the job manager component Condor is only a suggestion. Changing this component does not imply major architectural changes, and other solutions, such as PBS or Sun Grid Engine, could be used instead. In order to further clarify the architecture of the GRID processing application, Figure 2 revisits the presented scenario, detailing the operational steps involved in the processing operations for one GRID computing resource. The operations are similar for all the computing resources involved in the processing operation.

Figure 2. GRID processing application - operational steps. [Diagram: Client 1 commands a GRID processing operation (1) through the Command and result dissemination component, which queries the metadata service (2), i.e. the MODIS MetaData Catalog with its MCS Web Server and MCS Database Server, and receives a list of logical file names (3). The GRID Scheduling Server schedules GRID jobs (4) and assigns them to computing resources. Each GRID Computing Resource queries the RLS service (5a; Replica Index Node and Local Replica Catalog), receives a list of physical file names (5b), accesses data from a specific GridFTP location (5c; GridFTP servers at Locations 1, 2 and 3), stores/updates spatial data on the spatial database server as a result of the GRID processing operations (5d) and sends a job completion notification (5e).]

The following steps are performed:

1. Using a front-end of the Command and Result Dissemination Component (CRDC) the user commands a processing operation.

2. Based on the user specified parameters the CRDC generates a query to the Metadata Catalog (MC).

3. The MC replies with a list of the logical file names corresponding to the attributes. This list indicates the MODIS granules that need to be processed.

4. The CRDC schedules a separate job to be run on GRID resources for each data granule that needs to be processed.

5. Each GRID Computing Resource: (a) generates a query to the Replica Location Service in order to obtain the list with the closest physical file names corresponding to the received list of logical file names, (b) receives a reply with a list of physical file names (indicating the specific GridFTP server to be used) from the Local Replica Catalog of the corresponding RLS server, (c) accesses the MODIS data granules from the specific GridFTP servers, (d) performs the actual data processing and exports the results to a spatial database server, (e) generates a notification informing the CRDC that data processing has been completed for the specific granule.
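The operational steps can be summarized in a short Python sketch. It is not the MedioGRID implementation: every service is replaced by a plain callable, the jobs run sequentially instead of being scheduled on GRID resources, and all names, URLs and the assumption that the first physical name is the closest one are made up for the illustration.

```python
def process_request(params, metadata_catalog, rls, fetch, process, notify):
    """Run steps 2-5 for one request and return the processed logical names."""
    logical_names = metadata_catalog(params)  # steps 2-3: query MC, get LFNs
    done = []
    for lfn in logical_names:                 # step 4: one job per granule
        physical_names = rls(lfn)             # steps 5a-5b: LFN -> physical names
        data = fetch(physical_names[0])       # step 5c: first entry assumed closest
        process(lfn, data)                    # step 5d: process and store result
        notify(lfn)                           # step 5e: completion notification
        done.append(lfn)
    return done


# In-memory stand-ins for the real services (all values are hypothetical).
catalog = {"fire": ["lfn-1", "lfn-2"]}
replicas = {
    "lfn-1": ["gsiftp://a.example/lfn-1"],
    "lfn-2": ["gsiftp://b.example/lfn-2", "gsiftp://a.example/lfn-2"],
}
results, notifications = {}, []

done = process_request(
    "fire",
    metadata_catalog=lambda p: catalog[p],
    rls=lambda lfn: replicas[lfn],
    fetch=lambda url: f"data from {url}",
    process=lambda lfn, data: results.__setitem__(lfn, data),
    notify=notifications.append,
)
```

Passing the services in as callables keeps the control flow visible and mirrors the modular design: any stand-in can be replaced without changing the sequence of steps.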

4. GRID monitoring solutions

Operating a successful grid, network or computing facility requires vast amounts of monitoring information. Monitoring systems also have to support the automatic troubleshooting and optimization of very large grid and network systems. While the initial target field of these applications was networks and Grid systems supporting data processing and analysis for global high energy and nuclear physics collaborations, monitoring tools are broadly applicable to many fields of "data intensive" science, and to the monitoring and management of major research and education networks.

An essential part of managing a global Data Grid is a monitoring system that is able to monitor and track the many site facilities, networks, and the many tasks in progress, in real time. The monitoring information gathered is also essential for developing the required higher level services and components of the Grid system that provide decision support, and eventually some degree of automated decisions, to help maintain and optimize workflow through the Grid.

GridICE [2] is a distributed monitoring tool designed for Grid systems. It promotes the adoption of de-facto standard Grid Information Service interfaces, protocols and data models. Further, different aggregations and partitions of monitoring data are provided based on the specific needs of different user categories, each of them dealing with a different abstraction level of a Grid: the Virtual Organization level, the Grid Operation Center level, the Site Administration level and the End-User level.

The distribution of monitoring data follows a two-level hierarchy (local site collection, grid-wide collection). The global monitoring information can be accessed in different ways: a web-based interface offering both textual and graphical representation, an XML representation over HTTP for application consumption, and publish/subscribe for the notification of events of interest.

The Grid Monitoring Architecture (GMA) [4] consists of three components: Consumers, Producers and a directory service (which we prefer to call a Registry). Consumers can query the Registry to find out what type of information is available and locate Producers that provide such information. Once this information is known the Consumer can contact the Producer directly to obtain the relevant data.
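The interaction between the three GMA components can be sketched in a few lines of Python. The class and method names, node names and metric values are invented for the illustration and do not correspond to any GMA or R-GMA API.

```python
class Registry:
    """Directory service: records which producers offer which information type."""

    def __init__(self):
        self._producers = {}

    def register(self, info_type, producer):
        self._producers.setdefault(info_type, []).append(producer)

    def info_types(self):
        return sorted(self._producers)

    def lookup(self, info_type):
        return list(self._producers.get(info_type, []))


class Producer:
    def __init__(self, name, values):
        self.name = name
        self._values = values  # info type -> current value

    def publish(self, registry):
        for info_type in self._values:
            registry.register(info_type, self)

    def get(self, info_type):
        return self._values[info_type]


class Consumer:
    def __init__(self, registry):
        self._registry = registry

    def collect(self, info_type):
        # Locate producers through the registry, then contact each one directly.
        return {p.name: p.get(info_type) for p in self._registry.lookup(info_type)}


registry = Registry()
Producer("node1", {"cpu_load": 0.3}).publish(registry)
Producer("node2", {"cpu_load": 0.7, "free_disk_gb": 120}).publish(registry)
loads = Consumer(registry).collect("cpu_load")
```

The registry stores only references to producers, never the monitoring data itself, which is the point of the pattern: data flows directly from producer to consumer once the lookup is done.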

The Relational Grid Monitoring Architecture (R-GMA) is an implementation of GMA with two special properties. Anyone supplying or obtaining information from R-GMA does not need to know about the Registry; the Consumer and Producer handle the Registry behind the scenes. APIs are available in various languages for interaction with R-GMA. R-GMA is currently being developed as part of the "Enabling Grids for E-science in Europe" project. Previously it was developed as part of the European DataGrid Project.

Ganglia [1] is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.

Ganglia is based on a hierarchical design targeted at federations of clusters. It relies on a multicast-based listen/announce protocol to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. Within each cluster, Ganglia uses heartbeat messages on a well-known multicast address as the basis for a membership protocol. Membership is maintained by using the reception of a heartbeat as a sign that a node is available and the non-reception of a heartbeat over a small multiple of a periodic announcement interval as a sign that a node is unavailable.
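The membership rule described above can be expressed as a minimal Python sketch. It does not use Ganglia code or its wire protocol: heartbeats are plain method calls, time is passed in explicitly, and the interval and the multiple are assumed values.

```python
class Membership:
    """Tracks node availability from heartbeat arrival times."""

    def __init__(self, interval, multiple=4):
        # A node is considered unavailable once no heartbeat has been seen
        # for `multiple` announcement intervals (the multiple is an assumption).
        self.timeout = interval * multiple
        self._last_seen = {}

    def heartbeat(self, node, now):
        """Record a heartbeat received from `node` at time `now` (seconds)."""
        self._last_seen[node] = now

    def available(self, now):
        """Nodes whose last heartbeat is no older than the timeout."""
        return sorted(
            node for node, seen in self._last_seen.items()
            if now - seen <= self.timeout
        )


m = Membership(interval=10.0, multiple=3)  # timeout of 30 seconds
m.heartbeat("node-a", now=0.0)
m.heartbeat("node-b", now=25.0)
```

Passing `now` explicitly instead of reading the system clock keeps the sketch deterministic, so the timeout behavior can be checked without waiting.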

Each node monitors its local resources and sends multicast packets containing monitoring data on a well-known multicast address whenever significant updates occur. Ganglia distinguishes between built-in metrics and application-specific metrics through a field in the multicast monitoring packets being sent. All nodes listen for both types of metrics on the well-known multicast address and collect and maintain monitoring data for all other nodes. Ganglia is an open-source project that grew out of the University of California, Berkeley Millennium Project, which was initially funded in large part by the National Partnership for Advanced Computational Infrastructure (NPACI) and the National Science Foundation.

The MonALISA [3] system is designed as an ensemble of autonomous multi-threaded, self-describing agent-based subsystems which are registered as dynamic services, and are able to collaborate and cooperate in performing a wide range of information gathering and processing tasks. These agents can analyze and process the information, in a distributed way, to provide optimization decisions in large scale distributed applications.

An agent-based architecture provides the ability to invest the system with increasing degrees of intelligence, to reduce complexity and make global systems manageable in real time. The scalability of the system derives from the use of a multithreaded execution engine to host a variety of loosely coupled self-describing dynamic services or agents, and from the ability of each service to register itself and then to be discovered and used by any other services or clients that require such information. The system is designed to easily integrate existing monitoring tools and procedures and to provide this information in a dynamic, customized, self-describing way to any other services or clients.

The framework integrates many existing monitoring tools and procedures to collect parameters describing computational nodes, applications and network performance. Specialized mobile agents are used in the MonALISA framework to perform global optimization tasks or to help and improve the operation of large distributed systems by performing supervising tasks for different applications or real time parameters.

The MedioGRID solution combines MonALISA and Ganglia. With Ganglia we have access to each node in the cluster and can request all the information about the state of a node: load, CPU usage, etc. To centralize this data on a single node (assumed to be a server in the cluster) we use MonALISA, because it provides support for collecting this type of data. We can also collect information about job state or other job parameters, including user-defined parameters created with ApMon [3], and send them to the MonALISA database. From the MonALISA database, the history of the cluster can be viewed in a repository.

The MedioGRID project will develop applications for satellite image processing and for extracting relevant environmental and meteorological parameters. Grid applications can use registered services and tools (query, monitoring, discovery, factory, notification, security, registration, management, scheduling) along with the grid infrastructure. Based on the presentation of the GRID infrastructure and GRID monitoring solutions, we conclude that designing an application for grid computing is much easier when the expected behavior and the main work items are known in advance. We plan to use a development environment or toolkit specifically designed for grid applications: the Globus Toolkit, MonALISA, Ganglia and other middleware resources.

5. Monitoring results

The MedioGRID project is still at an early stage of development. At the current stage a dedicated network architecture connecting all the participants has been implemented.

In order to derive the optimal network transfer parameters for both the high-speed Gigabit Ethernet networks in each location and for the WAN links connecting the individual locations, a series of tests has been performed. All the data transfers in MedioGRID are performed using the GLOBUS GridFTP protocol, which provides means of tweaking several performance parameters. The parameters which directly influence the overall data transfer performance are the GridFTP parallelism degree (number of parallel data connections used at a time) and the file size.
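As an illustration of the parallelism parameter, the Python sketch below assembles a command line for `globus-url-copy`, the GridFTP client shipped with the Globus Toolkit, using its `-p` option for the number of parallel data connections. The host name and paths are hypothetical, and the sketch only builds the argument list; it is not the test harness used for the measurements.

```python
def build_copy_command(source_url, dest_url, parallelism):
    """Argument list for a GridFTP transfer with `parallelism` data connections."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return ["globus-url-copy", "-p", str(parallelism), source_url, dest_url]


# Hypothetical source and destination, for illustration only.
cmd = build_copy_command(
    "gsiftp://storage.example.org/data/set1.bin",
    "file:///tmp/set1.bin",
    4,
)
```

On a machine with the Globus client tools and valid credentials, such a list could be passed to `subprocess.run(cmd, check=True)`; repeating the transfer for several parallelism values and file sizes is one way to collect measurements of the kind reported in this section.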

Figure 3. Transfer rates in MBps between the storage server and

a high speed client at the UTCN site

The following figure presents the transfer rates in

MB/s between the storage server and a high speedclient at the UTCN site. In creating the graphics4500Mb data sets with different file sizes (1x500Mb,5x100Mb, 50x10Mb, 100x5Mb) have been used.

Figure 4. The data transfer performance

This graphic (Figure 3) demonstrates that the optimal performance is achieved when using 3 to 15 GridFTP parallel data streams.

The WAN data transfer characteristics in MedioGRID have been measured between the central storage server and a data processing client located at the UPB site. Due to the lower WAN speeds, three 50 MB data sets (1x50 MB, 10x5 MB, 50x1 MB) have been used. As Figure 4 demonstrates, the data transfer performance is directly influenced by the number of parallel data streams.

The graphic demonstrates that, depending on the file size, larger values for the number of parallel connections offer better performance when transferring data over the WAN links.
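One way to turn such measurements into a per-file-size recommendation is sketched below. This is a minimal sketch under stated assumptions: the function name, the data layout and the tie-breaking rule are all hypothetical, and the numbers in the usage example are synthetic placeholders, not results from the MedioGRID tests.

```python
def best_parallelism(measurements):
    """Pick the stream count with the highest rate for each file size.

    measurements maps (file_size_mb, streams) to a rate in MB/s.
    Returns a dict mapping file_size_mb to (streams, rate).
    """
    best = {}
    for (file_size_mb, streams), rate in measurements.items():
        current = best.get(file_size_mb)
        # Prefer the higher rate; on a tie prefer fewer streams, since
        # they put less load on the server and the network.
        if (current is None or rate > current[1]
                or (rate == current[1] and streams < current[0])):
            best[file_size_mb] = (streams, rate)
    return best


if __name__ == "__main__":
    # Synthetic placeholder values, NOT measured data.
    synthetic = {
        (50, 1): 1.0, (50, 4): 2.0, (50, 8): 2.0,
        (1, 1): 0.5, (1, 4): 0.8, (1, 8): 1.1,
    }
    print(best_parallelism(synthetic))
```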

One of the MonALISA modules holds statistical information about the FTP traffic for GridFTP transfers. The outputs of the module are the input traffic, the output traffic and the rates.

These values are displayed in the MonALISA client. Figure 5 shows the history of transfers between the storage server and high speed clients at UTCN (Technical University of Cluj-Napoca) and UPB (University "Politehnica" of Bucharest). We used MonALISA as part of the VDT.

The graphic shows the evolution of the ftpRateOut_utcn.ro and ftpRateOut_upb.ro parameters. These parameters represent the transfer of image files between the UTCN and UPB sites. The values of these parameters (~45 MBps) demonstrate that the link between these sites offers the best support for the data transfers involved in MedioGRID applications.
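To illustrate how a rate parameter such as ftpRateOut can be derived from traffic counters, the sketch below converts successive cumulative byte counts into rates in MB/s. It is a generic illustration and does not reproduce the MonALISA module: the function name and sample layout are hypothetical, and 1 MB is assumed to be 10^6 bytes.

```python
BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def rates_from_counters(samples):
    """Convert cumulative byte counters into per-interval rates.

    samples is a list of (timestamp_seconds, cumulative_bytes) pairs
    sorted by time. Returns a list of (timestamp_seconds, rate_mb_per_s),
    one entry per valid interval, stamped with the interval end time.
    """
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0 or b1 < b0:
            # Skip duplicate or out-of-order samples and counter resets.
            continue
        rates.append((t1, (b1 - b0) / dt / BYTES_PER_MB))
    return rates


if __name__ == "__main__":
    # Synthetic counter samples, for illustration only.
    print(rates_from_counters([(0, 0), (10, 450_000_000), (20, 900_000_000)]))
```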


Figure 5. Monitoring traffic with MonALISA

6. Conclusions and future development directions

This paper has presented the high level architectural specification of MedioGRID, a research project aiming at implementing a real-time satellite image processing system for extracting relevant environmental and meteorological parameters. The presentation focuses on the key architectural decisions of the GRID-aware satellite image processing system, highlighting the technologies for each of the major components.

One very important aspect of running a successful computing grid is the generation and the availability of real-time monitoring information. In order to address this issue, the paper also presents an overview of the possible monitoring technologies, highlighting the most efficient approaches. A set of preliminary monitoring results for the MedioGRID data management subsystem is also presented, deriving the optimal GridFTP transfer parameters for the high-speed LAN and WAN links interconnecting the MedioGRID sites.

Future development directions are based on integrating the MODIS real-time processing system with a powerful GIS engine for efficient result dissemination. The system should provide means of easily identifying affected areas by integrating data from multiple sources. The data sources can include detailed road networks, natural landmarks, inhabited places, and so on.

Another future objective is the creation of a specialized GIS tool for evaluating the evolution of floods and fires over time. This could be the starting point for creating an integrated system which can be remotely accessed by key decision makers and can support the decision process within the teams handling natural disasters, such as floods and forest fires.

Acknowledgments

The research described in this paper was supported by the Romanian Education and Research Ministry, under Contract 19CEEX-I03.

References

[1] Ganglia project. http://ganglia.sourceforge.net, Accessed 10th March 2006.

[2] GridICE project. http://infnforge.cnaf.infn.it/gridice, Accessed 11th March 2006.

[3] MonALISA project. http://monalisa.cacr.caltech.edu, Accessed 15th March 2006.

[4] R-GMA project. http://www.r-gma.org, Accessed 6th March 2006.

[5] Globus Toolkit homepage. http://globus.org, Globus Alliance 2006.

[6] Goddard Earth Sciences Data and Information Services Center homepage. http://daac.gsfc.nasa.gov, National Aeronautics and Space Administration 2006.

[7] GridFTP. http://globus.org/toolkit/data/gridftp/, Globus Alliance 2006.

[8] GT 4.0 Data Replication Service. http://www-unix.globus.org/toolkit/docs/4.0/techpreview/datarep/, Globus Alliance 2006.

[9] GT data management: Reliable File Transfer (RFT). http://globus.org/toolkit/data/rft/, Globus Alliance 2006.

[10] MODIS Level 1B metadata specification. http://daac.gsfc.nasa.gov/MODIS/FAQ/A metadata, National Aeronautics and Space Administration 2006.

[11] NASA MODIS homepage. http://modis.gsfc.nasa.gov/, National Aeronautics and Space Administration 2006.

[12] E. Deelman. Grid-based metadata services, 16th International Conference on Scientific and Statistical Database Management (SSDBM04). http://www.isi.edu/ deelman/MCS/, Santorini Island, Greece, 2004.

[13] C. Melenti, M. Ordean, D. Gorgan, and S. Oancea. Grid computing-based satellite image processing for fire detection. International Conference on Advances in the Internet, Processing, Systems and Interdisciplinary Research, IPSI 2004, Prague, Czech Republic, pp. 101-107, ISBN: 86-7466-117-3, 2004.

[14] O. Muresan and D. Gorgan. Arhitectura retelei mediogrid. Atelier de Lucru MEDIOGRID vol 1, ISBN: 973-713-090-1, Ed MEDIAMIRA Cluj-Napoca, 2006.

[15] M. Ordean, C. Melenti, and D. Gorgan. Mediogrid system in meteorological and environment applications. International Conference on Advances in the Internet, Processing, Systems and Interdisciplinary Research, IPSI 2005, Amalfi, Italy, pp. 203-207, ISBN: 86-7466-117-3, 2005.

[16] F. Pop and V. Cristea. Tehnologii actuale in sistemele grid. Atelier de Lucru MEDIOGRID vol 1, ISBN: 973-713-090-1, Ed MEDIAMIRA Cluj-Napoca, 2006.
