The Anatomy of the Grid: Enabling Scalable Virtual Organizations
Ian Foster, Carl Kesselman and Steven Tuecke
International Journal of High Performance Computing Applications 2001; 15; 200
DOI: 10.1177/109434200101500302
The online version of this article can be found at: http://hpc.sagepub.com/cgi/content/abstract/15/3/200
Published by: http://www.sagepublications.com
Additional services and information for International Journal of High Performance Computing Applications can be found at:
Email Alerts: http://hpc.sagepub.com/cgi/alerts
Subscriptions: http://hpc.sagepub.com/subscriptions
Reprints: http://www.sagepub.com/journalsReprints.nav
Permissions: http://www.sagepub.com/journalsPermissions.nav
© 2001 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. Downloaded from http://hpc.sagepub.com at PENNSYLVANIA STATE UNIV on February 6, 2008



THE ANATOMY OF THE GRID: ENABLING SCALABLE VIRTUAL ORGANIZATIONS

Ian Foster¹

Carl Kesselman²

Steven Tuecke³

Summary

“Grid” computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high performance orientation. In this article, the authors define this new field. First, they review the “Grid problem,” which is defined as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources, referred to as virtual organizations. In such settings, unique authentication, authorization, resource access, resource discovery, and other challenges are encountered. It is this class of problem that is addressed by Grid technologies. Next, the authors present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing. The authors describe requirements that they believe any such mechanisms must satisfy and discuss the importance of defining a compact set of intergrid protocols to enable interoperability among different Grid systems. Finally, the authors discuss how Grid technologies relate to other contemporary technologies, including enterprise integration, application service provider, storage service provider, and peer-to-peer computing. They maintain that Grid concepts and technologies complement and have much to contribute to these other approaches.

1 Introduction

The term Grid was coined in the mid-1990s to denote a proposed distributed computing infrastructure for advanced science and engineering (Foster and Kesselman, 1998a). Considerable progress has since been made on the construction of such an infrastructure (e.g., Beiriger et al., 2000; Brunett et al., 1998; Johnston, Gannon, and Nitzberg, 1999; Stevens et al., 1997), but the term Grid has also been conflated, at least in popular perception, to embrace everything from advanced networking to artificial intelligence. One might wonder whether the term has any real substance and meaning. Is there really a distinct Grid problem and hence a need for new Grid technologies? If so, what is the nature of these technologies, and what is their domain of applicability? While numerous groups have interest in Grid concepts and share, to a significant extent, a common vision of Grid architecture, we do not see consensus on the answers to these questions.

Our purpose in this article is to argue that the Grid concept is indeed motivated by a real and specific problem and that there is an emerging, well-defined Grid technology base that addresses significant aspects of this problem. In the process, we develop a detailed architecture and road map for current and future Grid technologies. Furthermore, we assert that while Grid technologies are currently distinct from other major technology trends, such as Internet, enterprise, distributed, and peer-to-peer computing, these other trends can benefit significantly from growing into the problem space addressed by Grid technologies.

The real and specific problem that underlies the Grid concept is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations. The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering. This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions defined by such sharing rules form what we call a virtual organization (VO).

The International Journal of High Performance Computing Applications, Volume 15, No. 3, Fall 2001, pp. 200-222. © 2001 Sage Publications

Address reprint requests to Ian Foster, Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Avenue, MCS/221, Argonne, IL 60439; phone: (630) 252-4619; fax: (630) 252-9556; e-mail: [email protected].

¹ MATHEMATICS AND COMPUTER SCIENCE DIVISION, ARGONNE NATIONAL LABORATORY, ARGONNE, ILLINOIS, AND DEPARTMENT OF COMPUTER SCIENCE, UNIVERSITY OF CHICAGO
² INFORMATION SCIENCES INSTITUTE, UNIVERSITY OF SOUTHERN CALIFORNIA
³ MATHEMATICS AND COMPUTER SCIENCE DIVISION, ARGONNE NATIONAL LABORATORY, ARGONNE, ILLINOIS

The following are examples of VOs: the application service providers, storage service providers, cycle providers, and consultants engaged by a car manufacturer to perform scenario evaluation during planning for a new factory; members of an industrial consortium bidding on a new aircraft; a crisis management team and the databases and simulation systems that they use to plan a response to an emergency situation; and members of a large, international, multiyear, high-energy physics collaboration. Each of these examples represents an approach to computing and problem solving based on collaboration in computation- and data-rich environments.

As these examples show, VOs vary tremendously in their purpose, scope, size, duration, structure, community, and sociology. Nevertheless, careful study of underlying technology requirements leads us to identify a broad set of common concerns and requirements. In particular, we see a need for highly flexible sharing relationships, ranging from client-server to peer-to-peer; for sophisticated and precise levels of control over how shared resources are used, including fine-grained and multistakeholder access control, delegation, and application of local and global policies; for sharing of varied resources, ranging from programs, files, and data to computers, sensors, and networks; and for diverse usage modes, ranging from single user to multiuser and from performance sensitive to cost sensitive and hence embracing issues of quality of service, scheduling, co-allocation, and accounting.

Current distributed computing technologies do not address the concerns and requirements just listed. For example, current Internet technologies address communication and information exchange among computers but do not provide integrated approaches to the coordinated use of resources at multiple sites for computation. Business-to-business exchanges (Sculley and Woods, 2000) focus on information sharing (often via centralized servers). So do virtual enterprise technologies, although here sharing may eventually extend to applications and physical devices (e.g., Barry et al., 1998). Enterprise distributed computing technologies such as CORBA and Enterprise Java enable resource sharing within a single organization. The Open Group’s Distributed Computing Environment (DCE) supports secure resource sharing across sites, but most VOs would find it too burdensome and inflexible. Storage service providers (SSPs) and application service providers (ASPs) allow organizations to outsource storage and computing requirements to other parties, but only in constrained ways: for example, SSP resources are typically linked to a customer via a virtual private network (VPN). Emerging “distributed computing” companies seek to harness idle computers on an international scale (Foster, 2000) but, to date, support only highly centralized access to those resources. In summary, current technology either does not accommodate the range of resource types or does not provide the flexibility and control on sharing relationships needed to establish VOs.

It is here that Grid technologies enter the picture. Over the past 5 years, research and development efforts within the Grid community have produced protocols, services, and tools that address precisely the challenges that arise when we seek to build scalable VOs. These technologies include security solutions that support management of credentials and policies when computations span multiple institutions; resource management protocols and services that support secure remote access to computing and data resources and the co-allocation of multiple resources; information query protocols and services that provide configuration and status information about resources, organizations, and services; and data management services that locate and transport data sets between storage systems and applications.

Because of their focus on dynamic, cross-organizational sharing, Grid technologies complement rather than compete with existing distributed computing technologies. For example, enterprise distributed computing systems can use Grid technologies to achieve resource sharing across institutional boundaries; in the ASP/SSP space, Grid technologies can be used to establish dynamic markets for computing and storage resources, hence overcoming the limitations of current static configurations. We discuss the relationship between Grids and these technologies in more detail below.

In the rest of this article, we expand on each of these points in turn. Our objectives are to (1) clarify the nature of VOs and Grid computing for those unfamiliar with the area, (2) contribute to the emergence of Grid computing as a discipline by establishing a standard vocabulary and defining an overall architectural framework, and (3) define clearly how Grid technologies relate to other technologies, explaining both why emerging technologies do not yet solve the Grid computing problem and how these technologies can benefit from Grid technologies.

It is our belief that VOs have the potential to change dramatically the way we use computers to solve problems, much as the Web has changed how we exchange information. As the examples presented here illustrate, the need to engage in collaborative processes is fundamental to many diverse disciplines and activities: it is not limited to science, engineering, and business activities. It is because of this broad applicability of VO concepts that Grid technology is important.

2 The Emergence of Virtual Organizations

Consider the following four scenarios:

1. A company needing to reach a decision on the placement of a new factory invokes a sophisticated financial forecasting model from an ASP, providing it with access to appropriate proprietary historical data from a corporate database on storage systems operated by an SSP. During the decision-making meeting, what-if scenarios are run collaboratively and interactively, even though the division heads participating in the decision are located in different cities. The ASP itself contracts with a cycle provider for additional “oomph” during particularly demanding scenarios, requiring of course that cycles meet desired security and performance requirements.

2. An industrial consortium formed to develop a feasibility study for a next-generation supersonic aircraft undertakes a highly accurate multidisciplinary simulation of the entire aircraft. This simulation integrates proprietary software components developed by different participants, with each component operating on that participant’s computers and having access to appropriate design databases and other data made available to the consortium by its members.

3. A crisis management team responds to a chemical spill by using local weather and soil models to estimate the spread of the spill, determining the impact based on population location as well as geographic features such as rivers and water supplies, creating a short-term mitigation plan (perhaps based on chemical reaction models), and tasking emergency response personnel by planning and coordinating evacuation, notifying hospitals, and so forth.

4. Thousands of physicists at hundreds of laboratories and universities worldwide come together to design, create, operate, and analyze the products of a major detector at CERN, the European high-energy physics laboratory. During the analysis phase, they pool their computing, storage, and networking resources to create a “Data Grid” capable of analyzing petabytes of data (Chervenak et al., 2001; Hoschek et al., 2000; Moore et al., 1998).

These four examples differ in many respects: the number and type of participants, the types of activities, the duration and scale of the interaction, and the resources being shared. But they also have much in common, as discussed in the following (see also Figure 1).

In each case, a number of mutually distrustful participants with varying degrees of prior relationship (perhaps none at all) want to share resources to perform some task. Furthermore, sharing is about more than simply document exchange (as in “virtual enterprises”) (Camarinha-Matos et al., 1998): it can involve direct access to remote software, computers, data, sensors, and other resources. For example, members of a consortium may provide access to specialized software and data and/or pool their computational resources.

Resource sharing is conditional: each resource owner makes resources available, subject to constraints on when, where, and what can be done. For example, a participant in VO P of Figure 1 might allow VO partners to invoke their simulation service only for “simple” problems. Resource consumers may also place constraints on properties of the resources they are prepared to work with. For example, a participant in VO Q might accept only pooled computational resources certified as “secure.” The implementation of such constraints requires mechanisms for expressing policies, establishing the identity of a consumer or resource (authentication), and determining whether an operation is consistent with applicable sharing relationships (authorization).

Fig. 1 An actual organization can participate in one or more virtual organizations (VOs) by sharing some or all of its resources. We show three actual organizations (the ovals) and two VOs: P, which links participants in an aerospace design consortium, and Q, which links colleagues who have agreed to share spare computing cycles, for example, to run ray-tracing computations. The organization on the left participates in P, the one to the right participates in Q, and the third is a member of both P and Q. The policies governing access to resources (summarized in “quotes”) vary according to the actual organizations, resources, and VOs involved.

Sharing relationships can vary dynamically over time, in terms of the resources involved, the nature of the access permitted, and the participants to whom access is permitted. And these relationships do not necessarily involve an explicitly named set of individuals but rather may be defined implicitly by the policies that govern access to resources. For example, an organization might enable access by anyone who can demonstrate that he or she is a “customer” or a “student.”
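The conditional and implicitly defined sharing just described can be sketched in a few lines of Python. This is an illustrative sketch only, not part of any Grid toolkit: the names `Principal`, `Request`, `member_of`, `only_simple_problems`, and `authorize`, the attribute string "vo-p-partner", and the "simple"/"complex" labels are all hypothetical. Authentication is assumed to have already happened, so a `Principal` simply carries the attributes it has demonstrated.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence


@dataclass(frozen=True)
class Principal:
    """An already-authenticated identity plus the attributes it has demonstrated."""
    name: str
    attributes: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Request:
    """A request by a principal to perform an operation on a shared resource."""
    principal: Principal
    operation: str
    problem_size: str  # hypothetical label: "simple" or "complex"


# In this sketch a policy is any predicate over a request (an assumption).
Policy = Callable[[Request], bool]


def member_of(attribute: str) -> Policy:
    """Membership defined implicitly: anyone holding the attribute qualifies."""
    return lambda req: attribute in req.principal.attributes


def only_simple_problems(req: Request) -> bool:
    """Owner constraint: the service may be invoked only for simple problems."""
    return req.problem_size == "simple"


def authorize(req: Request, policies: Sequence[Policy]) -> bool:
    """Authorization: the request must satisfy every applicable policy."""
    return all(policy(req) for policy in policies)


# Policies a participant in VO P might attach to its simulation service.
simulation_policies = [member_of("vo-p-partner"), only_simple_problems]

alice = Principal("alice", frozenset({"vo-p-partner"}))
mallory = Principal("mallory", frozenset({"student"}))

print(authorize(Request(alice, "simulate", "simple"), simulation_policies))    # True
print(authorize(Request(alice, "simulate", "complex"), simulation_policies))   # False
print(authorize(Request(mallory, "simulate", "simple"), simulation_policies))  # False
```

Because no list of named individuals appears anywhere in the policy, the set of permitted users changes as soon as principals gain or lose attributes, which is one way to picture sharing relationships that vary over time.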

The dynamic nature of sharing relationships means that we require mechanisms for discovering and characterizing the nature of the relationships that exist at a particular point in time. For example, a new participant joining VO Q must be able to determine what resources it is able to access, the “quality” of these resources, and the policies that govern access.

Sharing relationships are often not simply client-server but peer to peer: providers can be consumers, and sharing relationships can exist among any subset of participants. Sharing relationships may be combined to coordinate use across many resources, each owned by different organizations. For example, in VO Q, a computation started on one pooled computational resource may subsequently access data or initiate subcomputations elsewhere. The ability to delegate authority in controlled ways becomes important in such situations, as do mechanisms for coordinating operations across multiple resources (e.g., coscheduling).

The same resource may be used in different ways, depending on the restrictions placed on the sharing and the goal of the sharing. For example, a computer may be used only to run a specific piece of software in one sharing arrangement, while it may provide generic compute cycles in another. Because of the lack of a priori knowledge about how a resource may be used, performance metrics, expectations, and limitations (i.e., quality of service) may be part of the conditions placed on resource sharing or usage.

These characteristics and requirements define what we term a virtual organization, a concept that we believe is becoming fundamental to much of modern computing. VOs enable disparate groups of organizations and/or individuals to share resources in a controlled fashion, so that members may collaborate to achieve a shared goal.


3 The Nature of Grid Architecture

The establishment, management, and exploitation of dynamic, cross-organizational VO sharing relationships require new technology. We structure our discussion of this technology in terms of a Grid architecture that identifies fundamental system components, specifies the purpose and function of these components, and indicates how these components interact with one another.

In defining a Grid architecture, we start from the perspective that effective VO operation requires that we be able to establish sharing relationships among any potential participants. Interoperability is thus the central issue to be addressed. In a networked environment, interoperability means common protocols. Hence, our Grid architecture is first and foremost a protocol architecture, with protocols defining the basic mechanisms by which VO users and resources negotiate, establish, manage, and exploit sharing relationships. A standards-based open architecture facilitates extensibility, interoperability, portability, and code sharing; standard protocols make it easy to define standard services that provide enhanced capabilities. We can also construct application programming interfaces and software development kits (see the appendix for definitions) to provide the programming abstractions required to create a usable Grid. Together, this technology and architecture constitute what is often termed middleware (“the services needed to support a common set of applications in a distributed network environment”; Aiken et al., 2000), although we avoid that term here due to its vagueness. We discuss each of these points in the following.

Why is interoperability such a fundamental concern? At issue is our need to ensure that sharing relationships can be initiated among arbitrary parties, accommodating new participants dynamically across different platforms, languages, and programming environments. In this context, mechanisms serve little purpose if they are not defined and implemented so as to be interoperable across organizational boundaries, operational policies, and resource types. Without interoperability, VO applications and participants are forced to enter into bilateral sharing arrangements, as there is no assurance that the mechanisms used between any two parties will extend to any other parties. Without such assurance, dynamic VO formation is all but impossible, and the types of VOs that can be formed are severely limited. Just as the Web revolutionized information sharing by providing a universal protocol and syntax (HTTP and HTML) for information exchange, so we require standard protocols and syntaxes for general resource sharing.

Why are protocols critical to interoperability? A protocol definition specifies how distributed system elements interact with one another to achieve a specified behavior and the structure of the information exchanged during this interaction. This focus on externals (interactions) rather than internals (software, resource characteristics) has important pragmatic benefits. VOs tend to be fluid; hence, the mechanisms used to discover resources, establish identity, determine authorization, and initiate sharing must be flexible and lightweight, so that resource-sharing arrangements can be established and changed quickly. Because VOs complement rather than replace existing institutions, sharing mechanisms cannot require substantial changes to local policies and must allow individual institutions to maintain ultimate control over their own resources. Since protocols govern the interaction between components and not the implementation of the components, local control is preserved.

Why are services important? A service (see the appendix) is defined solely by the protocol that it speaks and the behaviors that it implements. The definition of standard services (for access to computation, access to data, resource discovery, coscheduling, data replication, and so forth) allows us to enhance the services offered to VO participants and also to abstract away resource-specific details that would otherwise hinder the development of VO applications.

Why do we also consider application programming interfaces (APIs) and software development kits (SDKs)? There is, of course, more to VOs than interoperability, protocols, and services. Developers must be able to develop sophisticated applications in complex and dynamic execution environments. Users must be able to operate these applications. Application robustness, correctness, development costs, and maintenance costs are all important concerns. Standard abstractions, APIs, and SDKs can accelerate code development, enable code sharing, and enhance application portability. APIs and SDKs are an adjunct to, not an alternative to, protocols. Without standard protocols, interoperability can be achieved at the API level only by using a single implementation everywhere (infeasible in many interesting VOs) or by having every implementation know the details of every other implementation. (The Jini approach [Arnold et al., 1999] of downloading protocol code to a remote site does not circumvent this requirement.)


In summary, our approach to Grid architecture emphasizes the identification and definition of protocols and services, first, and APIs and SDKs, second.

4 Grid Architecture Description

Our goal in describing our Grid architecture is not to provide a complete enumeration of all required protocols (and services, APIs, and SDKs) but rather to identify requirements for general classes of components. The result is an extensible, open architectural structure within which can be placed solutions to key VO requirements. Our architecture and the subsequent discussion organize components into layers, as shown in Figure 2. Components within each layer share common characteristics but can build on capabilities and behaviors provided by any lower layer.

In specifying the various layers of the Grid architecture, we follow the principles of the “hourglass model” (Realizing the Information Future, 1994). The narrow neck of the hourglass defines a small set of core abstractions and protocols (e.g., transmission control protocol [TCP] and HTTP in the Internet), onto which many different high-level behaviors can be mapped (the top of the hourglass) and which themselves can be mapped onto many different underlying technologies (the base of the hourglass). By definition, the number of protocols defined at the neck must be small. In our architecture, the neck of the hourglass consists of resource and connectivity protocols, which facilitate the sharing of individual resources. Protocols at these layers are designed so that they can be implemented on top of a diverse range of resource types, defined at the fabric layer, and can in turn be used to construct a wide range of global services and application-specific behaviors at the collective layer, so called because they involve the coordinated (“collective”) use of multiple resources.
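The hourglass shape can be illustrated with a small Python sketch. Everything here is hypothetical and chosen only for illustration: `ResourceProtocol` stands in for the narrow neck, `BatchCluster` and `Workstation` for two dissimilar fabric elements beneath it, and `least_loaded_broker` for one collective-layer behavior above it. None of these names correspond to actual Grid protocols or toolkit interfaces.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ResourceProtocol(ABC):
    """The narrow neck: a deliberately small set of operations (hypothetical)."""

    @abstractmethod
    def enquire(self) -> Dict[str, Any]:
        """Return structure and state information about the resource."""

    @abstractmethod
    def submit(self, task: str) -> str:
        """Start a task on the resource and return an identifier for it."""


class BatchCluster(ResourceProtocol):
    """Base of the hourglass, example 1: a cluster fronted by a local queue."""

    def __init__(self, name: str, nodes: int) -> None:
        self.name = name
        self.nodes = nodes
        self.queue: List[str] = []

    def enquire(self) -> Dict[str, Any]:
        return {"name": self.name, "kind": "cluster",
                "nodes": self.nodes, "queued": len(self.queue)}

    def submit(self, task: str) -> str:
        self.queue.append(task)
        return f"{self.name}/job-{len(self.queue)}"


class Workstation(ResourceProtocol):
    """Base of the hourglass, example 2: one machine that runs tasks at once."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.started = 0

    def enquire(self) -> Dict[str, Any]:
        return {"name": self.name, "kind": "workstation",
                "nodes": 1, "queued": 0}

    def submit(self, task: str) -> str:
        self.started += 1
        return f"{self.name}/proc-{self.started}"


def least_loaded_broker(resources: List[ResourceProtocol], task: str) -> str:
    """Top of the hourglass: a collective behavior written only against the neck."""
    target = min(resources, key=lambda r: r.enquire()["queued"])
    return target.submit(task)


pool = [BatchCluster("cluster-a", nodes=64), Workstation("ws-1")]
print(least_loaded_broker(pool, "render frame 1"))  # cluster-a/job-1
print(least_loaded_broker(pool, "render frame 2"))  # ws-1/proc-1
```

The broker never inspects which kind of resource it is talking to, so a new fabric element can be added by implementing the two neck operations, and a new collective behavior can be added without touching any fabric code.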

Our architectural description is high level and places few constraints on design and implementation. To make this abstract discussion more concrete, we also list, for illustrative purposes, the protocols defined within the Globus Toolkit (Foster and Kesselman, 1998b) and used within such Grid projects as the National Science Foundation’s (NSF’s) National Technology Grid (Stevens et al., 1997), NASA’s Information Power Grid (Johnston, Gannon, and Nitzberg, 1999), the Department of Energy’s (DOE’s) DISCOM (Beiriger et al., 2000), GriPhyN (www.griphyn.org), NEESgrid (www.neesgrid.org), Particle Physics Data Grid (www.ppdg.net), and the European Data Grid (www.eu-datagrid.org). More details will be provided in a subsequent paper.

4.1 FABRIC: INTERFACES TO LOCAL CONTROL

The Grid fabric layer provides the resources to which shared access is mediated by Grid protocols: for example, computational resources, storage systems, catalogs, network resources, and sensors. A “resource” may be a logical entity, such as a distributed file system, computer cluster, or distributed computer pool; in such cases, a resource implementation may involve internal protocols (e.g., the NFS storage access protocol or a cluster resource management system’s process management protocol), but these are not the concern of Grid architecture.

Fabric components implement the local, resource-specific operations that occur on specific resources (whether physical or logical) as a result of sharing operations at higher levels. There is thus a tight and subtle interdependence between the functions implemented at the fabric level, on one hand, and the sharing operations supported, on the other. Richer fabric functionality enables more sophisticated sharing operations; at the same time, if we place few demands on fabric elements, then deployment of Grid infrastructure is simplified. For example, resource-level support for advance reservations makes it possible for higher level services to aggregate (coschedule) resources in interesting ways that would otherwise be impossible to achieve. However, as in practice few resources support advance reservation “out of the box,” a requirement for advance reservation increases the cost of incorporating new resources into a Grid.

The ability to build large, integrated systems just in time by aggregation (coscheduling and comanagement) is a significant new capability provided by these Grid services.
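To make the link between advance reservation and coscheduling concrete, the following Python sketch searches for the earliest window in which every required resource is free and then reserves that window on all of them. It is a simplified illustration under stated assumptions: each resource is assumed to expose its reservation calendar as a list of half-open intervals in integer time units, and the function names (`is_free`, `earliest_common_slot`, `co_reserve`) and resource names are hypothetical. A real system would also need to handle concurrent requests and failures part way through the reservation step.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in arbitrary time units


def is_free(busy: List[Interval], start: int, end: int) -> bool:
    """True if [start, end) overlaps none of the existing reservations."""
    return all(end <= b_start or start >= b_end for b_start, b_end in busy)


def earliest_common_slot(calendars: Dict[str, List[Interval]],
                         duration: int, horizon: int) -> Optional[int]:
    """Return the earliest start at which every resource is free, if any."""
    for start in range(0, horizon - duration + 1):
        if all(is_free(busy, start, start + duration)
               for busy in calendars.values()):
            return start
    return None


def co_reserve(calendars: Dict[str, List[Interval]],
               duration: int, horizon: int) -> Optional[Interval]:
    """Reserve the same window on all resources, or reserve nothing."""
    start = earliest_common_slot(calendars, duration, horizon)
    if start is None:
        return None
    window = (start, start + duration)
    for busy in calendars.values():
        busy.append(window)
    return window


# Hypothetical calendars for three resources needed at the same time.
calendars = {
    "supercomputer": [(0, 4), (6, 8)],
    "storage": [(2, 5)],
    "network": [(9, 12)],
}
print(co_reserve(calendars, duration=2, horizon=24))  # (12, 14)
```

Without calendars to consult, that is, without fabric-level advance reservation, the search step has nothing to work with, which is why the capability matters to higher level services.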

Experience suggests that at a minimum, resources should implement enquiry mechanisms that permit discovery of their structure, state, and capabilities (e.g., whether they support advance reservation), on one hand, and resource management mechanisms that provide some control of delivered quality of service, on the other. The following brief and partial list provides a resource-specific characterization of capabilities.

• Computational resources: Mechanisms are required for starting programs and for monitoring and controlling the execution of the resulting processes. Management mechanisms that allow control over the resources allocated to processes are useful, as are advance reservation mechanisms. Enquiry functions are needed for determining hardware and software characteristics as well as relevant state information such as current load and queue state in the case of scheduler-managed resources.

• Storage resources: Mechanisms are required for putting and getting files. Third-party and high performance (e.g., striped) transfers are useful (Tierney et al., 1996). So are mechanisms for reading and writing subsets of a file and/or executing remote data selection or reduction functions (Beynon et al., 2000). Management mechanisms that allow control over the resources allocated to data transfers (space, disk bandwidth, network bandwidth, CPU) are useful, as are advance reservation mechanisms. Enquiry functions are needed for determining hardware and software characteristics as well as relevant load information such as available space and bandwidth utilization.

• Network resources: Management mechanisms that provide control over the resources allocated to network transfers (e.g., prioritization, reservation) can be useful. Enquiry functions should be provided to determine network characteristics and load.

• Code repositories: This specialized form of storage resource requires mechanisms for managing versioned source and object code: for example, a control system such as CVS.

• Catalogs: This specialized form of storage resource requires mechanisms for implementing catalog query and update operations: for example, a relational database (Baru et al., 1998).

[Figure 2: Grid protocol architecture layers (Fabric, Connectivity, Resource, Collective, Application) shown beside Internet protocol architecture layers (Link, Internet, Transport, Application).]

Fig. 2 The layered Grid architecture and its relationship to the Internet protocol (IP) architecture. Because the IP architecture extends from network to application, there is a mapping from Grid layers into Internet layers.

Globus Toolkit. The Globus Toolkit has been designed to use (primarily) existing fabric components, including vendor-supplied protocols and interfaces. However, if a vendor does not provide the necessary fabric-level behavior, the Globus Toolkit includes the missing functionality. For example, enquiry software is provided for discovering structure and state information for various common resource types, such as computers (e.g., OS version, hardware configuration, load [Dinda and O’Hallaron, 1999], scheduler queue status), storage systems (e.g., available space), and networks (e.g., current and predicted future load) (Lowekamp et al., 1998; Wolski, 1997), and for packaging this information in a form that facilitates the implementation of higher level protocols, specifically at the resource layer. Resource management, on the other hand, is generally assumed to be the domain of local resource managers. One exception is the General-Purpose Architecture for Reservation and Allocation (GARA) (Foster, Roy, and Sander, 2000), which provides a “slot manager” that can be used to implement advance reservation for resources that do not support this capability. Others have developed enhancements to the Portable Batch System (PBS) (Papakhian, 1998) and Condor (Litzkow, Livny, and Mutka, 1988; Livny, 1998), which support advance reservation capabilities.
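The idea of layering advance reservation over a resource that lacks it can be illustrated with a small admission-control sketch. This is not GARA's implementation or interface; the names `Reservation`, `SlotManager`, `reserved_at`, and `reserve` are hypothetical, and the model assumes a single resource with a fixed integer capacity and half-open time intervals.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reservation:
    """A claim on `amount` units over the half-open interval [start, end)."""
    start: float
    end: float
    amount: int


@dataclass
class SlotManager:
    """Hypothetical advance-reservation front end for a fixed-capacity resource."""
    capacity: int
    reservations: list = field(default_factory=list)

    def reserved_at(self, t: float) -> int:
        """Enquiry function: units already promised at time t."""
        return sum(r.amount for r in self.reservations if r.start <= t < r.end)

    def reserve(self, start: float, end: float, amount: int):
        """Admit the request only if capacity is never exceeded in [start, end)."""
        if end <= start or amount <= 0:
            raise ValueError("reservation needs a positive interval and amount")
        # Promised usage can only rise at a reservation's start time, so it is
        # enough to check the requested start and each existing start inside
        # the requested window.
        checkpoints = [start] + [
            r.start for r in self.reservations if start < r.start < end
        ]
        if any(self.reserved_at(t) + amount > self.capacity for t in checkpoints):
            return None  # rejected; the caller may try a different window
        granted = Reservation(start, end, amount)
        self.reservations.append(granted)
        return granted
```

The enquiry function and the management function share the same table of promises, which mirrors the pairing of enquiry and management mechanisms asked of fabric resources in this section.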

4.2 CONNECTIVITY: COMMUNICATING EASILY AND SECURELY

The connectivity layer defines core communication and authentication protocols required for Grid-specific network transactions. Communication protocols enable the exchange of data between fabric-layer resources. Authentication protocols build on communication services to provide cryptographically secure mechanisms for verifying the identity of users and resources.

Communication requirements include transport, routing, and naming. While alternatives certainly exist, we assume here that these protocols are drawn from the TCP/IP protocol stack: specifically, the Internet (IP and ICMP), transport (TCP, UDP), and application (DNS, OSPF, RSVP, etc.) layers of the Internet layered protocol architecture (Baker, 1995). This is not to say that in the future, Grid communications will not demand new protocols that take into account particular types of network dynamics.

With respect to security aspects of the connectivity layer, we observe that the complexity of the security problem makes it important that any solutions be based on existing standards whenever possible. As with communication, many of the security standards developed within the context of the IP suite are applicable.

Authentication solutions for VO environments should have the following characteristics (Butler et al., 2000):

• Single sign-on. Users must be able to “log on” (authenticate) just once and then have access to multiple Grid resources defined in the fabric layer, without further user intervention.

• Delegation (Foster et al., 1998; Gasser and McDermott, 1990; Howell and Kotz, 2000). A user must be able to endow a program with the ability to run on that user’s behalf, so that the program is able to access the resources on which the user is authorized. The program should (optionally) also be able to conditionally delegate a subset of its rights to another program (sometimes referred to as restricted delegation).

• Integration with various local security solutions. Each site or resource provider may employ any of a variety of local security solutions, including Kerberos and Unix security. Grid security solutions must be able to interoperate with these various local solutions. They cannot, realistically, require wholesale replacement of local security solutions but rather must allow mapping into the local environment.

• User-based trust relationships. For a user to use resources from multiple providers together, the security system must not require each of the resource providers to cooperate or interact with each other in configuring the security environment. For example, if a user has the right to use sites A and B, the user should be able to use sites A and B together without requiring that A’s and B’s security administrators interact.

Grid security solutions should also provide flexible support for communication protection (e.g., control over the degree of protection, independent data unit protection for unreliable protocols, support for reliable transport protocols other than TCP) and enable stakeholder control over authorization decisions, including the ability to restrict the delegation of rights in various ways.
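Restricted delegation, as characterized in the list above, can be sketched as a chain of credentials in which each link carries a subset of its issuer's rights. This is a toy model under strong assumptions: nothing is signed or verified cryptographically, and the names `Credential`, `delegate`, `chain`, and `authorize` are hypothetical rather than part of any real security library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Credential:
    """Toy stand-in for a proxy credential; a real system would sign each link."""
    subject: str
    rights: frozenset
    issuer: Optional["Credential"] = None

    def delegate(self, to: str, rights) -> "Credential":
        """Issue a credential for `to` carrying a subset of this one's rights."""
        requested = frozenset(rights)
        if not requested <= self.rights:
            extra = sorted(requested - self.rights)
            raise PermissionError(f"cannot delegate rights not held: {extra}")
        return Credential(subject=to, rights=requested, issuer=self)

    def chain(self) -> list:
        """Subjects from the original user down to the current holder."""
        links, cred = [], self
        while cred is not None:
            links.append(cred.subject)
            cred = cred.issuer
        return list(reversed(links))


def authorize(cred: Credential, right: str) -> bool:
    """A resource's local check: the presented credential must carry the right."""
    return right in cred.rights
```

Because every delegation step can only narrow the set of rights, a program deep in the chain can never hold more authority than the user who signed on once at the top of it.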

Globus Toolkit. The Internet protocols listed above are used for communication. The public key–based Grid security infrastructure (GSI) protocols (Butler et al., 2000; Foster et al., 1998) are used for authentication, communication protection, and authorization. GSI builds on and extends the transport layer security (TLS) protocols (Dierks and Allen, 1999) to address most of the issues listed above: in particular, single sign-on, delegation, integration with various local security solutions (including Kerberos) (Steiner, Neuman, and Schiller, 1988), and user-based trust relationships. X.509-format identity certificates are used. Stakeholder control of authorization is supported via an authorization toolkit that allows resource owners to integrate local policies via a generic authorization and access (GAA) control interface. Rich support for restricted delegation is not provided in the current toolkit release (v1.1.4) but has been demonstrated in prototypes.

4.3 RESOURCE: SHARING SINGLE RESOURCES

The resource layer builds on connectivity-layer communication and authentication protocols to define protocols (and APIs and SDKs) for the secure negotiation, initiation, monitoring, control, accounting, and payment of sharing operations on individual resources. Resource-layer implementations of these protocols call fabric-layer functions to access and control local resources. Resource-layer protocols are concerned entirely with individual resources and hence ignore issues of global state and atomic actions across distributed collections; such issues are the concern of the collective layer, discussed next.

Two primary classes of resource-layer protocols can be distinguished:

• Information protocols are used to obtain information about the structure and state of a resource, for example, its configuration, current load, and usage policy (e.g., cost).

• Management protocols are used to negotiate access to a shared resource, specifying, for example, resource requirements (including advance reservation and quality of service) and the operation(s) to be performed, such as process creation or data access. Since management protocols are responsible for instantiating sharing relationships, they must serve as a “policy application point,” ensuring that the requested protocol operations are consistent with the policy under which the resource is to be shared. Issues that must be considered include accounting and payment. A protocol may also support monitoring the status of an operation and controlling (e.g., terminating) the operation.

While many such protocols can be imagined, the resource (and connectivity) protocol layers form the neck of our hourglass model and as such should be limited to a small and focused set. These protocols must be chosen to capture the fundamental mechanisms of sharing across many different resource types (e.g., different local resource management systems) while not overly constraining the types or performance of higher level protocols that may be developed.

The list of desirable fabric functionality provided in Section 4.1 summarizes the major features required in resource-layer protocols. To this list, we add the need for “exactly once” semantics for many operations, with reliable error reporting indicating when operations fail.
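One common way to approximate “exactly once” behavior over an unreliable channel is to tag each request with a client-chosen identifier and have the server replay the recorded outcome for a repeated identifier. The sketch below illustrates that technique only; it is not how any particular Grid protocol achieves it, the class `ManagementEndpoint` and its method are hypothetical, and a real service would need to persist the outcome table across restarts.

```python
class ManagementEndpoint:
    """Hypothetical resource-layer endpoint that deduplicates requests by ID."""

    def __init__(self):
        self._outcomes = {}  # request_id -> ("ok", value) or ("error", message)
        self.started = []    # side effects actually performed

    def create_process(self, request_id: str, executable: str):
        # A retransmitted request returns the recorded outcome instead of
        # performing the side effect a second time.
        if request_id in self._outcomes:
            return self._outcomes[request_id]
        if not executable:
            # Failures are recorded too, so error reporting is repeatable.
            outcome = ("error", "no executable given")
        else:
            self.started.append(executable)
            outcome = ("ok", f"pid-{len(self.started)}")
        self._outcomes[request_id] = outcome
        return outcome
```

A client that times out can safely resend the same request identifier: it either learns the original result or the original error, and the process is started at most once.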

Globus Toolkit. A small and mostly standards-based set of protocols is adopted, particularly the following:

• A Grid resource information protocol (GRIP, currently based on the lightweight directory access protocol [LDAP]) is used to define a standard resource information protocol and associated information model. An associated soft-state resource registration protocol, the Grid resource registration protocol (GRRP), is used to register resources with Grid index information servers, discussed in the next section (Czajkowski et al., 2001).

• The HTTP-based Grid resource access and management (GRAM) protocol is used for allocation of computational resources and for monitoring and control of computation on those resources (Czajkowski et al., 1998).

• An extended version of the file transfer protocol, GridFTP, is a management protocol for data access; extensions include use of connectivity-layer security protocols, partial file access, and management of parallelism for high-speed transfers (Allcock et al., 2001). FTP is adopted as a base data transfer protocol because of its support for third-party transfers and because its separate control and data channels facilitate the implementation of sophisticated servers.

• LDAP is also used as a catalog access protocol.

The Globus Toolkit defines client-side C and Java APIs and SDKs for each of these protocols. Server-side SDKs and servers are also provided for each protocol to facilitate the integration of various resources (computational, storage, network) into the Grid. For example, the Grid resource information service (GRIS) implements server-side LDAP functionality, with callouts allowing for publication of arbitrary resource information (Czajkowski et al., 2001). An important server-side element of the overall toolkit is the “gatekeeper,” which provides what is in essence a GSI-authenticated “inetd” that speaks the GRAM protocol and can be used to dispatch various local operations. The generic security services (GSS) API (Linn, 2000) is used to acquire, forward, and verify authentication credentials and to provide transport-layer integrity and privacy within these SDKs and servers, enabling substitution of alternative security services at the connectivity layer.
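The soft-state registration idea mentioned for GRRP in the protocol list above can be illustrated with a registry whose entries lapse unless they are refreshed. This sketch shows the general soft-state pattern only, not the GRRP message format; `SoftStateRegistry`, `register`, and `live` are hypothetical names, and time is passed in explicitly so the behavior is deterministic.

```python
class SoftStateRegistry:
    """Index that forgets a resource unless its registration is refreshed."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._expires = {}  # resource name -> time at which the entry lapses

    def register(self, name: str, now: float) -> None:
        """Initial registration and refresh are the same message."""
        self._expires[name] = now + self.ttl

    def live(self, now: float) -> list:
        """Drop lapsed entries, then return the names still registered."""
        self._expires = {n: t for n, t in self._expires.items() if t > now}
        return sorted(self._expires)
```

The appeal of the pattern is that no explicit deregistration is needed: a resource that crashes or is withdrawn simply stops refreshing and disappears from the index after one time-to-live period.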

4.4 COLLECTIVE: COORDINATING MULTIPLE RESOURCES

While the resource layer is focused on interactions with a single resource, the next layer in the architecture contains protocols and services (and APIs and SDKs) that are not associated with any one specific resource but rather are global in nature and capture interactions across collections of resources. For this reason, we refer to the next layer of the architecture as the collective layer. Because collective components build on the narrow resource- and connectivity-layer “neck” in the protocol hourglass, they can implement a wide variety of sharing behaviors without placing new requirements on the resources being shared. For example,

• Directory services allow VO participants to discover the existence and/or properties of VO resources. A directory service may allow its users to query for resources by name and/or by attributes such as type, availability, or load (Czajkowski et al., 2001). Resource-level GRRP and GRIP protocols are used to construct directories.

• Co-allocation, scheduling, and brokering services allow VO participants to request the allocation of one or more resources for a specific purpose and the scheduling of tasks on the appropriate resources. Examples include AppLeS (Berman, 1998; Berman et al., 1996), Condor-G (Frey et al., 2001), Nimrod-G (Abramson et al., 1995), and the DRM broker (Beiriger et al., 2000).

• Monitoring and diagnostics services support the monitoring of VO resources for failure, adversarial attack (“intrusion detection”), overload, and so forth.

• Data replication services support the management of VO storage (and perhaps also network and computing) resources to maximize data access performance with respect to metrics such as response time, reliability, and cost (Allcock et al., 2001; Hoschek et al., 2000).

• Grid-enabled programming systems enable familiar programming models to be used in Grid environments, using various Grid services to address resource discovery, security, resource allocation, and other concerns. Examples include Grid-enabled implementations of the message-passing interface (Foster and Karonis, 1998; Gabriel et al., 1998) and manager-worker frameworks (Casanova et al., 2000; Goux et al., 2000).

• Workload management systems and collaboration frameworks, also known as problem-solving environments (PSEs), provide for the description, use, and management of multistep, asynchronous, multicomponent workflows.

• Software discovery services discover and select the best software implementation and execution platform based on the parameters of the problem being solved (Casanova et al., 1998). Examples include NetSolve (Casanova and Dongarra, 1997) and Ninf (Nakada, Sato, and Sekiguchi, 1999).

• Community authorization servers enforce community policies governing resource access, generating capabilities that community members can use to access community resources. These servers provide a global policy enforcement service by building on resource information and resource management protocols (in the resource layer) and security protocols in the connectivity layer. Akenti (Thompson et al., 1999) addresses some of these issues.

• Community accounting and payment services gather resource usage information for the purpose of accounting, payment, and/or limiting of resource usage by community members.

• Collaboratory services support the coordinated exchange of information within potentially large user communities, whether synchronously or asynchronously. Examples are CAVERNsoft (DeFanti and Stevens, 1998; Leigh, Johnson, and DeFanti, 1997), Access Grid (Childers et al., 2000), and commodity groupware systems.

These examples illustrate the wide variety of collective-layer protocols and services that are encountered in practice. Notice that while resource-layer protocols must be general in nature and are widely deployed, collective-layer protocols span the spectrum from general purpose to highly application or domain specific, with the latter existing perhaps only within specific VOs.
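The directory services in the list above, which let users query for resources by attributes such as type or load, can be sketched as a filter over registered attribute sets. This is an in-memory illustration, not an LDAP query; the function `query`, the layout of `directory`, and the convention that a constraint may be a value or a predicate are all assumptions made for the example.

```python
def query(directory, **constraints):
    """Return names of resources whose attributes satisfy every constraint.

    `directory` maps a resource name to a dict of attributes. A constraint
    may be a plain value (tested for equality) or a one-argument predicate
    (useful for ranges, such as load below a threshold).
    """
    def matches(attrs):
        for key, wanted in constraints.items():
            if key not in attrs:
                return False
            value = attrs[key]
            ok = wanted(value) if callable(wanted) else value == wanted
            if not ok:
                return False
        return True

    return sorted(name for name, attrs in directory.items() if matches(attrs))
```

For instance, with entries for two compute hosts and one storage system, `query(directory, type="compute", load=lambda x: x < 0.5)` would return only the lightly loaded hosts.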


Collective functions can be implemented as persistent services, with associated protocols, or as SDKs (with associated APIs) designed to be linked with applications. In both cases, their implementation can build on resource-layer (or other collective-layer) protocols and APIs. For example, Figure 3 shows a collective co-allocation API and SDK (the middle tier) that uses a resource-layer management protocol to manipulate underlying resources. Above this, we define a co-reservation service protocol and implement a co-reservation service that speaks this protocol, calling the co-allocation API to implement co-allocation operations and perhaps providing additional functionality, such as authorization, fault tolerance, and logging. An application might then use the co-reservation service protocol to request end-to-end network reservations.

Collective components may be tailored to the requirements of a specific user community, VO, or application domain: for example, an SDK that implements an application-specific coherency protocol or a co-reservation service for a specific set of network resources. Other collective components can be more general purpose: for example, a replication service that manages an international collection of storage systems for multiple communities or a directory service designed to enable the discovery of VOs. In general, the larger the target user community, the more important it is that a collective component’s protocol(s) and API(s) be standards based.
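The layering described above, in which a collective co-allocation SDK drives per-resource management operations, can be sketched as an all-or-nothing allocation that rolls back on failure. This is a simplified illustration and not the DUROC library; `ResourceManager` stands in for a resource reached through a resource-layer management protocol, all names are hypothetical, and the sketch ignores concurrency between competing requests.

```python
class ResourceManager:
    """Stand-in for one resource behind a resource-layer management protocol."""

    def __init__(self, name: str, free: int):
        self.name, self.free = name, free

    def allocate(self, amount: int) -> bool:
        if amount > self.free:
            return False
        self.free -= amount
        return True

    def release(self, amount: int) -> None:
        self.free += amount


def co_allocate(requests) -> bool:
    """Allocate every (manager, amount) pair, or undo partial work and fail."""
    granted = []
    for manager, amount in requests:
        if manager.allocate(amount):
            granted.append((manager, amount))
        else:
            # One refusal invalidates the whole request: release in reverse order.
            for done_manager, done_amount in reversed(granted):
                done_manager.release(done_amount)
            return False
    return True
```

Note that the individual resources need no knowledge of the collective operation; they see only ordinary allocate and release calls, which is the sense in which collective components place no new requirements on the resources being shared.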

[Figure 3: Diagram spanning the collective, resource, and fabric layers. Labeled elements: Application; Co-reservation Service API & SDK; Co-reservation Protocol; Co-reservation Service; Co-Allocation API & SDK; Resource Mgmt API & SDK; Resource Mgmt Protocol; two Network Resources and one Compute Resource.]

Fig. 3 Collective- and resource-layer protocols, services, application programming interfaces (APIs), and software development kits (SDKs) can be combined in a variety of ways to deliver functionality to applications.

Globus Toolkit. In addition to the example services listed earlier in this section, many of which build on Globus connectivity and resource protocols, we mention the Meta Directory Service (MDS). The MDS introduces Grid index information servers (GIISs) to support arbitrary views of resource subsets. The LDAP information protocol is used to access resource-specific GRISs to obtain resource state, and GRRP is used for resource registration. We also mention replica catalog and replica management services, which support the management of data set replicas in a Grid environment (Allcock et al., 2001). An online credential repository service (“MyProxy”) provides secure storage for proxy credentials (Novotny, Tuecke, and Welch, 2001). The DUROC co-allocation library provides an SDK and API for resource co-allocation (Czajkowski, Foster, and Kesselman, 1999).

4.5 APPLICATIONS

The final layer in our Grid architecture comprises the user applications that operate within a VO environment. Figure 4 illustrates an application programmer’s view of Grid architecture. Applications are constructed in terms of, and by calling on, services defined at any layer. At each layer, we have well-defined protocols that provide access to some useful service: resource management, data access, resource discovery, and so forth. At each layer, APIs may also be defined whose implementations (ideally provided by third-party SDKs) exchange protocol messages with the appropriate service(s) to perform desired actions.

[Figure 4: Diagram with labeled elements: Applications; Languages & Frameworks; Collective APIs & SDKs; Collective Service Protocols; Collective Services; Resource APIs & SDKs; Resource Service Protocols; Resource Services; Connectivity APIs; Connectivity Protocols; Fabric. Key: API/SDK, Service.]

Fig. 4 Application programming interfaces (APIs) are implemented by software development kits (SDKs), which in turn use Grid protocols to interact with network services that provide capabilities to the end user. Higher level SDKs can provide functionality that is not directly mapped to a specific protocol but may combine protocol operations with calls to additional APIs as well as implement local functionality. Solid lines represent a direct call; dashed lines represent protocol interactions.

We emphasize that what we label “applications” and show in a single layer in Figure 4 may in practice call on sophisticated frameworks and libraries, such as the Common Component Architecture (Armstrong et al., 1999), SciRun (Casanova et al., 1998), CORBA (Gannon and Grimshaw, 1998; Lopez et al., 2000), Cactus (Benger et al., 1999), and workflow systems (Bolcer and Kaiser, 1999), and feature much internal structure that would, if captured in our figure, expand it out to many times its current size. These frameworks may themselves define protocols, services, and/or APIs (e.g., the simple workflow access protocol) (Bolcer and Kaiser, 1999). However, these issues are beyond the scope of this article, which addresses only the most fundamental protocols and services required in a Grid.

5 Grid Architecture in Practice

We use two examples to illustrate how Grid architecture functions in practice. Table 1 shows the services that might be used to implement the multidisciplinary simulation and cycle-sharing (ray-tracing) applications introduced in Figure 1. The basic fabric elements are the same in each case: computers, storage systems, and networks. Furthermore, each resource speaks standard connectivity protocols for communication and security and resource protocols for enquiry, allocation, and management. Above this, each application uses a mix of generic and more application-specific collective services.

Table 1
The Grid Services Used to Construct the Two Example Applications of Figure 1

Layer                              Multidisciplinary Simulation               Ray Tracing
Collective (application specific)  Solver coupler, distributed data archiver  Checkpointing, job management, failover, staging
Collective (generic)               Resource discovery, resource brokering, system monitoring, community authorization, certificate revocation
Resource                           Access to computation; access to data; access to information about system structure, state, performance
Connectivity                       Communication (Internet protocol), service discovery (DNS), authentication, authorization, delegation
Fabric                             Storage systems, computers, networks, code repositories, catalogs

Rows with a single entry apply to both applications.

In the case of the ray-tracing application, we assume that this is based on a high-throughput computing system (Frey et al., 2001; Livny, 1998). To manage the execution of large numbers of largely independent tasks in a VO environment, this system must keep track of the set of active and pending tasks, locate appropriate resources for each task, stage executables to those resources, detect and respond to various types of failure, and so forth. An implementation in the context of our Grid architecture uses both domain-specific collective services (dynamic checkpoint, task pool management, failover) and more generic collective services (brokering, data replication for executables and common input files), as well as standard resource and connectivity protocols. Condor-G represents a first step toward this goal (Frey et al., 2001).
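The task pool management and failover duties described above can be sketched as a loop that tracks pending tasks, places each on a worker, and requeues it elsewhere when execution fails. This is a minimal single-process illustration and not Condor-G; `run_task_pool`, the round-robin placement, and the `execute` callback (which the caller supplies and which raises an exception to signal failure) are assumptions made for the example.

```python
from collections import deque


def run_task_pool(tasks, workers, execute, max_attempts=3):
    """Run each task on some worker, retrying elsewhere when execution fails.

    `execute(worker, task)` returns a result or raises to signal failure.
    Tasks must be hashable because results are keyed by task.
    """
    pending = deque((task, 0) for task in tasks)
    results, failed = {}, []
    turn = 0
    while pending:
        task, attempts = pending.popleft()
        worker = workers[turn % len(workers)]  # simple round-robin placement
        turn += 1
        try:
            results[task] = execute(worker, task)
        except Exception:
            if attempts + 1 < max_attempts:
                pending.append((task, attempts + 1))  # failover: requeue
            else:
                failed.append(task)  # give up and report the failure
    return results, failed
```

A real high-throughput system would add resource discovery, staging of executables, and checkpointing; the sketch isolates only the bookkeeping of active, pending, and failed tasks.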

In the case of the multidisciplinary simulation application, the problems are quite different at the highest level. Some application framework (e.g., CORBA, CCA) may be used to construct the application from its various components. We also require mechanisms for discovering appropriate computational resources, for reserving time on those resources, staging executables (perhaps), providing access to remote storage, and so forth. Again, a number of domain-specific collective services will be used (e.g., solver coupler, distributed data archiver), but the basic underpinnings are the same as in the ray-tracing example.

6 “On the Grid”: The Need for Intergrid Protocols

Our Grid architecture establishes requirements for the protocols and APIs that enable sharing of resources, services, and code. It does not otherwise constrain the technologies that might be used to implement these protocols and APIs. In fact, it is quite feasible to define multiple instantiations of key Grid architecture elements. For example, we can construct both Kerberos- and PKI-based protocols at the connectivity layer, and access these security mechanisms via the same API, thanks to GSS-API (see the appendix). However, Grids constructed with these different protocols are not interoperable and cannot share essential services, at least not without gateways. For this reason, the long-term success of Grid computing requires that we select and achieve widespread deployment of one set of protocols at the connectivity and resource layers (and, to a lesser extent, at the collective layer). Much as the core Internet protocols enable different computer networks to interoperate and exchange information, these intergrid protocols (as we might call them) enable different organizations to interoperate and exchange or share resources. Resources that speak these protocols can be said to be “on the Grid.” Standard APIs are also highly useful if Grid code is to be shared. The identification of these intergrid protocols and APIs is beyond the scope of this article, although the Globus Toolkit represents an approach that has had some success to date.
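The distinction drawn above between a shared API and a shared protocol can be made concrete with a toy example: two mechanisms implement the same programming interface but emit different wire formats, so code is portable across them while mixed deployments cannot talk to each other. This is not the GSS-API; the classes and token formats below are hypothetical and perform no real authentication.

```python
from abc import ABC, abstractmethod


class Mechanism(ABC):
    """Common API; each subclass speaks its own (toy) wire format."""

    @abstractmethod
    def make_token(self, user: str) -> bytes: ...

    @abstractmethod
    def accept_token(self, token: bytes) -> str: ...


class TicketMechanism(Mechanism):
    PREFIX = b"TICKET:"

    def make_token(self, user):
        return self.PREFIX + user.encode()

    def accept_token(self, token):
        if not token.startswith(self.PREFIX):
            raise ValueError("unrecognized token format")
        return token[len(self.PREFIX):].decode()


class CertificateMechanism(Mechanism):
    PREFIX = b"CERT:"

    def make_token(self, user):
        return self.PREFIX + user.encode()

    def accept_token(self, token):
        if not token.startswith(self.PREFIX):
            raise ValueError("unrecognized token format")
        return token[len(self.PREFIX):].decode()


def authenticate(client: Mechanism, server: Mechanism, user: str) -> str:
    """Application code sees only the shared API, not the protocol beneath it."""
    return server.accept_token(client.make_token(user))
```

The `authenticate` function works unchanged with either mechanism, yet a client and server configured with different mechanisms fail at the protocol level, which is why a common API alone does not put two Grids “on the Grid” together.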

7 Relationships with Other Technologies

The concept of controlled, dynamic sharing within VOs is so fundamental that we might assume that Grid-like technologies must surely already be widely deployed. In practice, however, while the need for these technologies is indeed widespread, in a wide variety of different areas, we find only primitive and inadequate solutions to VO problems. In brief, current distributed computing approaches do not provide a general resource-sharing framework that addresses VO requirements. Grid technologies distinguish themselves by providing this generic approach to resource sharing. This situation points to numerous opportunities for the application of Grid technologies.

7.1 WORLD WIDE WEB

The ubiquity of Web technologies (i.e., IETF and W3C standard protocols such as TCP/IP, HTTP, and SOAP, and languages such as HTML and XML) makes them attractive as a platform for constructing VO systems and applications. However, while these technologies do an excellent job of supporting the browser-client to Web-server interactions that are the foundation of today’s Web, they lack features required for the richer interaction models that occur in VOs. For example, today’s Web browsers typically use transport layer security (TLS) for authentication but do not support single sign-on or delegation.

Clear steps can be taken to integrate Grid and Web technologies. For example, the single sign-on capabilities provided in the GSI extensions to TLS would, if integrated into Web browsers, allow for single sign-on to multiple Web servers. GSI delegation capabilities would permit a browser client to delegate capabilities to a Web server so that the server could act on the client’s behalf. These capabilities, in turn, make it much easier to use Web technologies to build “VO portals” that provide thin client interfaces to sophisticated VO applications. WebOS addresses some of these issues (Vahdat et al., 1998).

7.2 APPLICATION AND STORAGE SERVICE PROVIDERS

Application service providers, storage service providers, and similar hosting companies typically offer to outsource specific business and engineering applications (in the case of ASPs) and storage capabilities (in the case of SSPs). A customer negotiates a service-level agreement that defines access to a specific combination of hardware and software. Security tends to be handled by using VPN technology to extend the customer’s intranet to encompass resources operated by the ASP or SSP on the customer’s behalf. Other SSPs offer file-sharing services, in which case access is provided via HTTP, FTP, or WebDAV with user IDs, passwords, and access control lists controlling access.

From a VO perspective, these are low-level building-block technologies. VPNs and static configurations make many VO sharing modalities hard to achieve. For example, the use of VPNs means that it is typically impossible for an ASP application to access data located on storage managed by a separate SSP. Similarly, dynamic reconfiguration of resources within a single ASP or SSP is challenging and, in fact, is rarely attempted. The load sharing across providers that occurs on a routine basis in the electric power industry is unheard of in the hosting industry. A basic problem is that a VPN is not a VO: it cannot extend dynamically to encompass other resources and does not provide the remote resource provider with any control of when and whether to share its resources.

The integration of Grid technologies into ASPs and SSPs can enable a much richer range of possibilities. For example, standard Grid services and protocols can be used to achieve a decoupling of the hardware and software. A customer could negotiate an SLA for particular hardware resources and then use Grid resource protocols to dynamically provision that hardware to run customer-specific applications. Flexible delegation and access control mechanisms would allow a customer to grant an application running on an ASP computer direct, efficient, and secure access to data on SSP storage, and/or to couple resources from multiple ASPs and SSPs with their own resources when required for more complex problems. A single sign-on security infrastructure able to span multiple security domains dynamically is, realistically, required to support such scenarios. Grid resource management and accounting/payment protocols that allow for dynamic provisioning and reservation of capabilities (e.g., amount of storage, transfer bandwidth, etc.) are also critical.


7.3 ENTERPRISE COMPUTING SYSTEMS

Enterprise development technologies such as CORBA, Enterprise Java Beans, Java 2 Enterprise Edition, and DCOM are all systems designed to enable the construction of distributed applications. They provide standard resource interfaces, remote invocation mechanisms, and trading services for discovery and hence make it easy to share resources within a single organization. However, these mechanisms address none of the specific VO requirements listed above. Sharing arrangements are typically relatively static and restricted to occur within a single organization. The primary form of interaction is client-server, rather than the coordinated use of multiple resources.

These observations suggest that there should be a role for Grid technologies within enterprise computing. For example, in the case of CORBA, we could construct an object request broker (ORB) that uses GSI mechanisms to address cross-organizational security issues. We could implement a portable object adapter that speaks the Grid resource management protocol to access resources spread across a VO. We could construct Grid-enabled naming and trading services that use Grid information service protocols to query information sources distributed across large VOs. In each case, the use of Grid protocols provides enhanced capability (e.g., interdomain security) and enables interoperability with other (non-CORBA) clients. Similar observations can be made about Java and Jini. For example, Jini’s protocols and implementation are geared toward a small collection of devices. A “Grid Jini” that employed Grid protocols and services would allow the use of Jini abstractions in a large-scale, multi-enterprise environment.

7.4 INTERNET AND PEER-TO-PEER COMPUTING

Peer-to-peer computing (as implemented, for example, in the Napster, Gnutella, and Freenet [Clarke et al., 1999] file-sharing systems) and Internet computing (as implemented, for example, by the SETI@home, Parabon, and Entropia systems) are examples of the more general (“beyond client-server”) sharing modalities and computational structures that we referred to in our characterization of VOs. As such, they have much in common with Grid technologies.

In practice, we find that the technical focus of work in these domains has not overlapped significantly to date. One reason is that peer-to-peer and Internet computing developers have so far focused entirely on vertically integrated (“stovepipe”) solutions, rather than seeking to define common protocols that would allow for shared infrastructure and interoperability. (This is, of course, a common characteristic of new market niches, in which participants still hope for a monopoly.) Another is that the forms of sharing targeted by various applications are quite limited: for example, file sharing with no access control and computational sharing with a centralized server.

As these applications become more sophisticated and the need for interoperability becomes clearer, we will see a strong convergence of interests between peer-to-peer, Internet, and Grid computing (Foster, 2000). For example, single sign-on, delegation, and authorization technologies become important when computational and data-sharing services must interoperate and the policies that govern access to individual resources become more complex.

8 Other Perspectives on Grids

The perspective on Grids and VOs presented in this article is of course not the only view that can be taken. We summarize here, and critique, some alternative perspectives (given in italics).

The Grid is a next-generation Internet. “The Grid” is not an alternative to “the Internet”: it is rather a set of additional protocols and services that build on Internet protocols and services to support the creation and use of computation- and data-enriched environments. Any resource that is “on the Grid” is also, by definition, “on the Net.”

The Grid is a source of free cycles. Grid computing does not imply unrestricted access to resources. Grid computing is about controlled sharing. Resource owners will typically want to enforce policies that constrain access according to group membership, ability to pay, and so forth. Hence, accounting is important, and a Grid architecture must incorporate resource and collective protocols for exchanging usage and cost information, as well as for exploiting this information when deciding whether to enable sharing.
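As a rough illustration of controlled sharing, the following minimal sketch shows a resource owner's policy that admits a request only when both group membership and ability to pay are satisfied, and that records usage for accounting. The class names, fields, and pricing rule are hypothetical and are not part of any Grid protocol or toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class Requester:
    """Hypothetical requester: a name, VO group memberships, and a budget."""
    name: str
    groups: frozenset
    budget: float


@dataclass
class ResourcePolicy:
    """Hypothetical owner policy: permitted groups and a price per CPU hour."""
    allowed_groups: frozenset
    cost_per_cpu_hour: float
    usage_log: list = field(default_factory=list)  # accounting records

    def authorize(self, who: Requester, cpu_hours: float) -> bool:
        """Grant access only if group membership and ability to pay both hold."""
        cost = cpu_hours * self.cost_per_cpu_hour
        if not (who.groups & self.allowed_groups):
            return False  # requester belongs to no permitted group
        if cost > who.budget:
            return False  # requester cannot pay for the request
        who.budget -= cost
        self.usage_log.append((who.name, cpu_hours, cost))
        return True
```

In this sketch the usage log stands in for the usage and cost information that, in the architecture described here, would be exchanged through resource and collective protocols.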

The Grid requires a distributed operating system. In this view (e.g., see Grimshaw and Wulf, 1996), Grid software should define the operating system services to be installed on every participating system, with these services providing for the Grid what an operating system provides for a single computer, namely, transparency with respect to location, naming, security, and so forth. Put another way, this perspective views the role of Grid software as defining a virtual machine. However, we feel that this perspective is inconsistent with our primary goals of broad deployment and interoperability. We argue that the appropriate model is rather the Internet protocol suite, which provides largely orthogonal services that address the unique concerns that arise in networked environments. The tremendous physical and administrative heterogeneities encountered in Grid environments mean that the traditional transparencies are unobtainable; on the other hand, it does appear feasible to obtain agreement on standard protocols. The architecture proposed here is deliberately open rather than prescriptive: it defines a compact and minimal set of protocols that a resource must speak to be “on the Grid”; beyond this, it seeks only to provide a framework within which many behaviors can be specified.

The Grid requires new programming models. Programming in Grid environments introduces challenges that are not encountered in sequential (or parallel) computers, such as multiple administrative domains, new failure modes, and large variations in performance. However, we argue that these are incidental, not central, issues and that the basic programming problem is not fundamentally different. As in other contexts, abstraction and encapsulation can reduce complexity and improve reliability. But, as in other contexts, it is desirable to allow a wide variety of higher level abstractions to be constructed, rather than enforcing a particular approach. So, for example, a developer who believes that a universal distributed shared-memory model can simplify Grid application development should implement this model in terms of Grid protocols, extending or replacing those protocols only if they prove inadequate for this purpose. Similarly, a developer who believes that all Grid resources should be presented to users as objects needs simply to implement an object-oriented API in terms of Grid protocols.

The Grid makes high performance computers superfluous. The hundreds, thousands, or even millions of processors that may be accessible within a VO represent a significant source of computational power, if they can be harnessed in a useful fashion. This does not imply, however, that traditional high performance computers are obsolete. Many problems require tightly coupled computers, with low latencies and high communication bandwidths; Grid computing may well increase, rather than reduce, demand for such systems by making access easier.

9 Conclusions

We have provided in this article a concise statement of the “Grid problem,” which we define as controlled and coordinated resource sharing and resource use in dynamic, scalable virtual organizations. We have also presented both requirements and a framework for a Grid architecture, identifying the principal functions required to enable sharing within VOs and defining key relationships among these different functions. Finally, we have discussed in some detail how Grid technologies relate to other important technologies.

We hope that the vocabulary and structure introduced in this document will prove useful to the emerging Grid community, by improving understanding of our problem and providing a common language for describing solutions. We also hope that our analysis will help establish connections among Grid developers and proponents of related technologies.

The discussion in this paper also raises a number of important questions. What are appropriate choices for the intergrid protocols that will enable interoperability among Grid systems? What services should be present in a persistent fashion (rather than being duplicated by each application) to create usable Grids? And what are the key APIs and SDKs that must be delivered to users to accelerate development and deployment of Grid applications? We have our own opinions on these questions, but the answers clearly require further research.

APPENDIX Definitions

We define here four terms that are fundamental to the discussion in this article but are frequently misunderstood and misused: protocol, service, SDK, and API.

Protocol. A protocol is a set of rules that endpoints in a telecommunication system use when exchanging information, for example, the following:

• The Internet protocol (IP) defines an unreliable packet transfer protocol.

• The transmission control protocol (TCP) builds on IP to define a reliable data delivery protocol.

• The transport layer security (TLS) protocol (Dierks and Allen, 1999) defines a protocol to provide privacy and data integrity between two communicating applications. It is layered on top of a reliable transport protocol such as TCP.

• The lightweight directory access protocol (LDAP) builds on TCP to define a query-response protocol for querying the state of a remote database.

An important property of protocols is that they admit to multiple implementations: two endpoints need only implement the same protocol to be able to communicate. Standard protocols are thus fundamental to achieving interoperability in a distributed computing environment.

A protocol definition also says little about the behavior of an entity that speaks the protocol. For example, the FTP protocol definition indicates the format of the messages used to negotiate a file transfer but does not make clear how the receiving entity should manage its files.

As the above examples indicate, protocols may be defined in terms of other protocols.
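The layering of one protocol on another can be sketched in a few lines. In the minimal example below, a toy query-response protocol is defined in terms of a toy length-prefixed framing protocol, which in turn runs over a reliable byte stream. Both protocols, the "GET" verb, and all function names are invented for illustration; they are not LDAP or any Grid protocol.

```python
import socket
import struct


# Lower layer: a framing protocol over a reliable byte stream.
# Each message is a 4-byte big-endian length followed by that many bytes.
def send_frame(sock: socket.socket, payload: bytes) -> None:
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since a stream may deliver data in pieces."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_frame(sock: socket.socket) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


# Upper layer: a query-response protocol defined in terms of frames.
# A request frame is "GET <key>"; the reply frame is the value, or empty.
def send_query(sock: socket.socket, key: str) -> None:
    send_frame(sock, b"GET " + key.encode("utf-8"))


def read_reply(sock: socket.socket) -> str:
    return recv_frame(sock).decode("utf-8")


def serve_one(sock: socket.socket, table: dict) -> None:
    """Answer a single query from the given key-value table."""
    verb, _, key = recv_frame(sock).decode("utf-8").partition(" ")
    reply = table.get(key, "") if verb == "GET" else "ERROR"
    send_frame(sock, reply.encode("utf-8"))
```

The upper layer never touches raw bytes on the stream; it relies only on the framing rules. Note also that the sketch fixes message formats only and says nothing about how the responder obtains the contents of its table.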

Service. A service is a network-enabled entity that provides a specific capability, for example, the ability to move files, create processes, or verify access rights. A service is defined in terms of the protocol one uses to interact with it and the behavior expected in response to various protocol message exchanges (i.e., “service = protocol + behavior”). A service definition may permit a variety of implementations, for example, the following:

• An FTP server speaks the file transfer protocol and supports remote read-and-write access to a collection of files. One FTP server implementation may simply write to and read from the server’s local disk, while another may write to and read from a mass storage system, automatically compressing and uncompressing files in the process. From a fabric-level perspective, the behaviors of these two servers in response to a store request (or retrieve request) are very different. From the perspective of a client of this service, however, the behaviors are indistinguishable; storing a file and then retrieving the same file will yield the same results regardless of which server implementation is used.

• An LDAP server speaks the LDAP protocol and supports response to queries. One LDAP server implementation may respond to queries using a database of information, while another may respond to queries by dynamically making SNMP calls to generate the necessary information on the fly.

A service may or may not be persistent (i.e., always available), be able to detect and/or recover from certain errors, run with privileges, and/or have a distributed implementation for enhanced scalability. If variants are possible, then discovery mechanisms that allow a client to determine the properties of a particular instantiation of a service are important.

Note also that one can define different services that speak the same protocol. For example, in the Globus Toolkit, both the replica catalog (Allcock et al., 2001) and information service (Czajkowski et al., 2001) use LDAP.
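The relation “service = protocol + behavior” can be made concrete with a small sketch modeled loosely on the file-server example above. Two hypothetical implementations answer the same toy message protocol; one stores data as received, while the other compresses it. The message tuples and class names are assumptions made for this illustration and do not correspond to FTP.

```python
import zlib


class FileService:
    """Toy protocol shared by every implementation (hypothetical):
    ("STORE", name, data) -> ("OK",)
    ("RETRIEVE", name)    -> ("DATA", data) or ("ERROR", reason)
    """

    def __init__(self):
        self._files = {}  # fabric-level storage, invisible to clients

    def _put(self, name, data):
        raise NotImplementedError

    def _get(self, name):
        raise NotImplementedError

    def handle(self, message):
        verb = message[0]
        if verb == "STORE":
            _, name, data = message
            self._put(name, data)
            return ("OK",)
        if verb == "RETRIEVE":
            _, name = message
            if name not in self._files:
                return ("ERROR", "no such file")
            return ("DATA", self._get(name))
        return ("ERROR", "unknown request")


class LocalDiskService(FileService):
    """Stores the bytes exactly as received."""

    def _put(self, name, data):
        self._files[name] = data

    def _get(self, name):
        return self._files[name]


class CompressingService(FileService):
    """Compresses on store and uncompresses on retrieve."""

    def _put(self, name, data):
        self._files[name] = zlib.compress(data)

    def _get(self, name):
        return zlib.decompress(self._files[name])
```

A client that sends a store request followed by a retrieve request receives identical replies from either class, although what each one keeps internally differs.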

API. An application program interface (API) defines a standard interface (e.g., set of subroutine calls or objects and method invocations in the case of an object-oriented API) for invoking a specified set of functionality, for example, the following:

• The generic security service (GSS) API (Linn, 2000) defines standard functions for verifying the identity of communicating parties, encrypting messages, and so forth.

• The message-passing interface API (Gropp, Lusk, and Skjellum, 1994) defines standard interfaces, in several languages, to functions used to transfer data among processes in a parallel computing system.

An API may define multiple language bindings or use an interface definition language. The language may be a conventional programming language such as C or Java, or it may be a shell interface. In the latter case, the API refers to a particular definition of command-line arguments to the program, the input and output of the program, and the exit status of the program. An API normally will specify a standard behavior but can admit to multiple implementations.

It is important to understand the relationship between APIs and protocols. A protocol definition says nothing about the APIs that might be called from within a program to generate protocol messages. A single protocol may have many APIs; a single API may have multiple implementations that target different protocols. In brief, standard APIs enable portability; standard protocols enable interoperability. For example, both public key and Kerberos bindings have been defined for the GSS-API (Linn, 2000). Hence, a program that uses GSS-API calls for authentication operations can operate in either a public key or a Kerberos environment without change. On the other hand, if we want a program to operate in a public key and a Kerberos environment at the same time, then we need a standard protocol that supports interoperability of these two environments (see Figure 5).
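The distinction between portability and interoperability can be illustrated with a minimal sketch. One hypothetical API is implemented by two toy mechanisms whose wire formats differ. The mechanisms below are keyed-hash stand-ins invented for this example; they are not Kerberos, PKI, or the GSS-API, and every name is an assumption.

```python
import hashlib
import hmac
import json
from abc import ABC, abstractmethod
from typing import Optional


class AuthMechanism(ABC):
    """Hypothetical API: the only calls an application uses."""

    def __init__(self, key: bytes):
        self._key = key

    @abstractmethod
    def make_token(self, principal: str) -> bytes:
        """Produce a wire-format token naming the principal."""

    @abstractmethod
    def verify_token(self, token: bytes) -> Optional[str]:
        """Return the principal named in a valid token, else None."""

    def _mac(self, principal: str, digest) -> str:
        return hmac.new(self._key, principal.encode("utf-8"), digest).hexdigest()

    @staticmethod
    def _same(a: str, b: str) -> bool:
        return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))


class PipeTokenMechanism(AuthMechanism):
    """Toy wire format: b"<principal>|<sha256 mac>"."""

    def make_token(self, principal):
        return f"{principal}|{self._mac(principal, hashlib.sha256)}".encode("utf-8")

    def verify_token(self, token):
        try:
            principal, mac = token.decode("utf-8").rsplit("|", 1)
        except (UnicodeDecodeError, ValueError):
            return None  # not in this mechanism's wire format
        ok = self._same(mac, self._mac(principal, hashlib.sha256))
        return principal if ok else None


class JsonTokenMechanism(AuthMechanism):
    """Toy wire format: a JSON object with "principal" and "mac" (sha512)."""

    def make_token(self, principal):
        mac = self._mac(principal, hashlib.sha512)
        return json.dumps({"principal": principal, "mac": mac}).encode("utf-8")

    def verify_token(self, token):
        try:
            fields = json.loads(token.decode("utf-8"))
            principal, mac = fields["principal"], fields["mac"]
        except (UnicodeDecodeError, ValueError, KeyError, TypeError):
            return None  # not in this mechanism's wire format
        if not isinstance(principal, str) or not isinstance(mac, str):
            return None
        ok = self._same(mac, self._mac(principal, hashlib.sha512))
        return principal if ok else None


def login(mechanism: AuthMechanism, principal: str) -> bool:
    """Application code written once against the API (portability)."""
    return mechanism.verify_token(mechanism.make_token(principal)) == principal
```

The `login` function runs unchanged with either mechanism, which is the portability that a standard API provides. A token produced by one mechanism is nevertheless rejected by the other, so the two environments cannot interoperate until they agree on a common protocol.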

SDK. The term software development kit (SDK) denotes a set of code designed to be linked with and invoked from within an application program to provide specified functionality. An SDK typically implements an API. If an API admits to multiple implementations, then there will be multiple SDKs for that API. Some SDKs provide access to services via a particular protocol, for example, the following:

• The OpenLDAP release includes an LDAP client SDK, which contains a library of functions that can be used from a C or C++ application to perform queries to an LDAP service.

• JNDI is a Java SDK, which contains functions that can be used to perform queries to an LDAP service.

• Different SDKs implement GSS-API using the TLS and Kerberos protocols, respectively.

There may be multiple SDKs, for example, from multiple vendors, which implement a particular protocol. Furthermore, for client-server-oriented protocols, there may be separate client SDKs for use by applications that want to access a service and server SDKs for use by service implementers that want to implement particular, customized service behaviors.

An SDK need not speak any protocol. For example, an SDK that provides numerical functions may act entirely locally and not need to speak to any services to perform its operations.


[Figure 5 labels: GSS-API; Kerberos or PKI; Domain A, Domain B; GSP]

Fig. 5 On the left, an application programming interface (API) is used to develop applications that can target either Kerberos or PKI security mechanisms. On the right, protocols (the Grid security protocols provided by the Globus Toolkit) are used to enable interoperability between Kerberos and PKI domains.


ACKNOWLEDGMENTS

We are grateful to numerous colleagues for discussions on the topics covered here (particularly Bill Allcock, Randy Butler, Ann Chervenak, Karl Czajkowski, Steve Fitzgerald, Bill Johnston, Miron Livny, Joe Mambretti, Reagan Moore, Harvey Newman, Laura Pearlman, Rick Stevens, Gregor von Laszewski, Rich Wellner, and Mike Wilde) and participants in the “Clusters and Computational Grids for Scientific Computing” workshop (Lyon, September 2000) and the 4th Grid Forum meeting (Boston, October 2000), at which early versions of these ideas were presented.

This work was supported in part by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under contract W-31-109-Eng-38; by the Defense Advanced Research Projects Agency under contract N66001-96-C-8523; by the National Science Foundation; and by the NASA Information Power Grid program.

BIOGRAPHIES

Ian Foster is a senior scientist and associate director of the Mathematics and Computer Science Division at Argonne National Laboratory, professor of computer science at the University of Chicago, and senior fellow in the Argonne/University of Chicago Computation Institute. He has published four books and more than 100 papers and technical reports in parallel and distributed processing, software engineering, and computational science. He currently coleads the Globus project with Dr. Carl Kesselman of the University of Southern California/Information Sciences Institute, which was awarded the 1997 Global Information Infrastructure “Next Generation” award and provides protocols and services used by many distributed computing projects worldwide. He cofounded the influential Global Grid Forum and recently coedited a book on this topic, titled The Grid: Blueprint for a New Computing Infrastructure.

Carl Kesselman is a senior project leader at the Information Sciences Institute and a research associate professor of computer science, both at the University of Southern California. He is also a visiting associate in computer science at the California Institute of Technology. He received a Ph.D. in computer science from the University of California, Los Angeles in 1991. He currently coleads the Globus project with Dr. Ian Foster of Argonne National Laboratory and the University of Chicago, which was awarded the 1997 Global Information Infrastructure “Next Generation” award and provides protocols and services used by many distributed computing projects worldwide. He recently coedited a book on this topic, titled The Grid: Blueprint for a New Computing Infrastructure.

Steven Tuecke is a software architect in the Distributed Systems Laboratory in the Mathematics and Computer Science Division at Argonne National Laboratory and a fellow with the Computation Institute at the University of Chicago. He plays a leadership role in many of Argonne’s research and development projects in the area of high performance, distributed, and Grid computing and directs the efforts of both Argonne staff and collaborators in the implementation of the Globus Toolkit. He is also the cochair of the Grid Forum Security Working Group. He received a B.A. in mathematics and computer science from St. Olaf College.

REFERENCES

Abramson, D., Sosic, R., Giddy, J., and Hall, B. 1995. Nimrod: A tool for performing parameterized simulations using distributed workstations. In Proc. 4th IEEE Symp. on High Performance Distributed Computing.

Aiken, R., Carey, M., Carpenter, B., Foster, I., Lynch, C., Mambretti, J., Moore, R., Strassner, J., and Teitelbaum, B. 2000. Network policy and services: A report of a workshop on middleware [Online]. Available: http://www.ietf.org/rfc/rfc2768.txt.

Allcock, B., Bester, J., Bresnahan, J., Chervenak, A. L., Foster, I., Kesselman, C., Meder, S., Nefedova, V., Quesnel, D., and Tuecke, S. 2001. Secure, efficient data transport and replica management for high-performance data-intensive computing. In Mass Storage Conference.

Armstrong, R., Gannon, D., Geist, A., Keahey, K., Kohn, S., McInnes, L., and Parker, S. 1999. Toward a common component architecture for high performance scientific computing. In Proc. 8th IEEE Symp. on High Performance Distributed Computing.

Arnold, K., O’Sullivan, B., Scheifler, R. W., Waldo, J., and Wollrath, A. 1999. The Jini Specification. Boston: Addison-Wesley.

Baker, F. 1995. Requirements for IP Version 4 Routers, IETF, RFC 1812 [Online]. Available: http://www.ietf.org/rfc/rfc1812.txt.

Barry, J., Aparicio, M., Durniak, T., Herman, P., Karuturi, J., Woods, C., Gilman, C., Ramnath, R., and Lam, H. 1998. NIIIP-SMART: An investigation of distributed object approaches to support MES development and deployment in a virtual enterprise. In 2nd International Enterprise Distributed Computing Workshop.

Baru, C., Moore, R., Rajasekar, A., and Wan, M. 1998. The SDSC storage resource broker. In Proc. CASCON ’98 Conference.

Beiriger, J., Johnson, W., Bivens, H., Humphreys, S., and Rhea, R. 2000. Constructing the ASCI grid. In Proc. 9th IEEE Symposium on High Performance Distributed Computing.

Benger, W., Foster, I., Novotny, J., Seidel, E., Shalf, J., Smith, W., and Walker, P. 1999. Numerical relativity in a distributed environment. In Proc. 9th SIAM Conference on Parallel Processing for Scientific Computing.

Berman, F. 1998. High-performance schedulers. In The Grid: Blueprint for a New Computing Infrastructure, edited by I. Foster and C. Kesselman, 279-309. San Francisco: Morgan Kaufmann.


Berman, F., Wolski, R., Figueira, S., Schopf, J., and Shao, G. 1996. Application-level scheduling on distributed heterogeneous networks. In Proc. Supercomputing ’96.

Beynon, M., Ferreira, R., Kurc, T., Sussman, A., and Saltz, J. 2000. DataCutter: Middleware for filtering very large scientific datasets on archival storage systems. In Proc. 8th Goddard Conference on Mass Storage Systems and Technologies/17th IEEE Symposium on Mass Storage Systems, pp. 119-133.

Bolcer, G. A., and Kaiser, G. 1999. SWAP: Leveraging the Web to manage workflow. IEEE Internet Computing 3 (1): 85-88.

Brunett, S., Czajkowski, K., Fitzgerald, S., Foster, I., Johnson, A., Kesselman, C., Leigh, J., and Tuecke, S. 1998. Application experiences with the Globus Toolkit. In Proc. 7th IEEE Symp. on High Performance Distributed Computing, pp. 81-89.

Butler, R., Engert, D., Foster, I., Kesselman, C., Tuecke, S., Volmer, J., and Welch, V. 2000. Design and deployment of a national-scale authentication infrastructure. IEEE Computer 33 (12): 60-66.

Camarinha-Matos, L. M., Afsarmanesh, H., Garita, C., and Lima, C. 1998. Towards an architecture for virtual enterprises. J. Intelligent Manufacturing 9 (2): 189-199.

Casanova, H., and Dongarra, J. 1997. NetSolve: A network server for solving computational science problems. International Journal of Supercomputer Applications and High Performance Computing 11 (3): 212-223.

Casanova, H., Dongarra, J., Johnson, C., and Miller, M. 1998. Application-specific tools. In The Grid: Blueprint for a New Computing Infrastructure, edited by I. Foster and C. Kesselman, 159-80. San Francisco: Morgan Kaufmann.

Casanova, H., Obertelli, G., Berman, F., and Wolski, R. 2000. The AppLeS Parameter Sweep Template: User-level middleware for the Grid. In Proc. SC ’2000.

Chervenak, A., Foster, I., Kesselman, C., Salisbury, C., and Tuecke, S. 2001. The Data Grid: Towards an architecture for the distributed management and analysis of large scientific data sets. J. Network and Computer Applications 23: 187-200.

Childers, L., Disz, T., Olson, R., Papka, M. E., Stevens, R., and Udeshi, T. 2000. Access Grid: Immersive group-to-group collaborative visualization. In Proc. 4th International Immersive Projection Technology Workshop.

Clarke, I., Sandberg, O., Wiley, B., and Hong, T. W. 1999. Freenet: A distributed anonymous information storage and retrieval system. In ICSI Workshop on Design Issues in Anonymity and Unobservability.

Czajkowski, K., Fitzgerald, S., Foster, I., and Kesselman, C. 2001. Grid information services for distributed resource sharing. In Proceedings of the 10th IEEE International Symposium on High-Performance Distributed Computing, pp. 181-184.

Czajkowski, K., Foster, I., Karonis, N., Kesselman, C., Martin, S., Smith, W., and Tuecke, S. 1998. A resource management architecture for metacomputing systems. In The 4th Workshop on Job Scheduling Strategies for Parallel Processing, pp. 62-82.

Czajkowski, K., Foster, I., and Kesselman, C. 1999. Co-allocation services for computational grids. In Proc. 8th IEEE Symposium on High Performance Distributed Computing.

DeFanti, T., and Stevens, R. 1998. Teleimmersion. In The Grid: Blueprint for a New Computing Infrastructure, edited by I. Foster and C. Kesselman, 131-155. San Francisco: Morgan Kaufmann.

Dierks, T., and Allen, C. 1999. The TLS Protocol Version 1.0, IETF, RFC 2246 [Online]. Available: http://www.ietf.org/rfc/rfc2246.txt.

Dinda, P., and O’Hallaron, D. 1999. An evaluation of linear models for host load prediction. In Proc. 8th IEEE Symposium on High-Performance Distributed Computing.

Foster, I. 2000. Internet computing and the emerging Grid. In Nature Web Matters [Online]. Available: http://www.nature.com/nature/webmatters/grid/grid.html.

Foster, I., and Karonis, N. 1998. A Grid-enabled MPI: Message passing in heterogeneous distributed computing systems. In Proc. SC ’98.

Foster, I., and Kesselman, C., eds. 1998a. The Grid: Blueprint for a New Computing Infrastructure. San Francisco: Morgan Kaufmann.

Foster, I., and Kesselman, C. 1998b. The Globus Project: A status report. In Proc. Heterogeneous Computing Workshop, pp. 4-18.

Foster, I., Kesselman, C., Tsudik, G., and Tuecke, S. 1998. A security architecture for computational grids. In ACM Conference on Computers and Security, pp. 83-91.

Foster, I., Roy, A., and Sander, V. 2000. A quality of service architecture that combines resource reservation and application adaptation. In Proc. 8th International Workshop on Quality of Service.

Frey, J., Foster, I., Livny, M., Tannenbaum, T., and Tuecke, S. 2001. Condor-G: A Computation Management Agent for Multi-Institutional Grids. Madison: University of Wisconsin–Madison.

Gabriel, E., Resch, M., Beisel, T., and Keller, R. 1998. Distributed computing in a heterogeneous computing environment. In Proc. EuroPVM/MPI ’98.

Gannon, D., and Grimshaw, A. 1998. Object-based approaches. In The Grid: Blueprint for a New Computing Infrastructure, edited by I. Foster and C. Kesselman, 205-236. San Francisco: Morgan Kaufmann.

Gasser, M., and McDermott, E. 1990. An architecture for practical delegation in a distributed system. In Proc. 1990 IEEE Symposium on Research in Security and Privacy, pp. 20-30.

Goux, J.-P., Kulkarni, S., Linderoth, J., and Yoder, M. 2000. An enabling framework for master-worker applications on the computational grid. In Proc. 9th IEEE Symp. on High Performance Distributed Computing.

Grimshaw, A., and Wulf, W. 1996. Legion: A view from 50,000 feet. In Proc. 5th IEEE Symposium on High Performance Distributed Computing, pp. 89-99.


Gropp, W., Lusk, E., and Skjellum, A. 1994. Using MPI: Portable Parallel Programming with the Message Passing Interface. Cambridge, MA: MIT Press.

Hoschek, W., Jaen-Martinez, J., Samar, A., Stockinger, H., and Stockinger, K. 2000. Data management in an international data Grid project. In Proc. 1st IEEE/ACM International Workshop on Grid Computing.

Howell, J., and Kotz, D. 2000. End-to-end authorization. In Proc. 2000 Symposium on Operating Systems Design and Implementation.

Johnston, W. E., Gannon, D., and Nitzberg, B. 1999. Grids as production computing environments: The engineering aspects of NASA’s information power grid. In Proc. 8th IEEE Symposium on High Performance Distributed Computing.

Leigh, J., Johnson, A., and DeFanti, T. A. 1997. CAVERN: A distributed architecture for supporting scalable persistence and interoperability in collaborative virtual environments. Virtual Reality: Research, Development and Applications 2 (2): 217-237.

Linn, J. 2000. Generic Security Service Application Program Interface Version 2, Update 1, IETF, RFC 2743 [Online]. Available: http://www.ietf.org/rfc/rfc2743.

Litzkow, M., Livny, M., and Mutka, M. 1988. Condor: A hunter of idle workstations. In Proc. 8th Intl Conf. on Distributed Computing Systems, pp. 104-111.

Livny, M. 1998. High-throughput resource management. In The Grid: Blueprint for a New Computing Infrastructure, edited by I. Foster and C. Kesselman, 311-337. San Francisco: Morgan Kaufmann.

Lopez, I., Follen, G., Gutierrez, R., Foster, I., Ginsburg, B., Larsson, O., Martin, S., and Tuecke, S. 2000. NPSS on NASA’s IPG: Using CORBA and Globus to coordinate multidisciplinary aeroscience applications. In Proc. NASA HPCC/CAS Workshop.

Lowekamp, B., Miller, N., Sutherland, D., Gross, T., Steenkiste, P., and Subhlok, J. 1998. A resource query interface for network-aware applications. In Proc. 7th IEEE Symposium on High-Performance Distributed Computing.

Moore, R., Baru, C., Marciano, R., Rajasekar, A., and Wan, M. 1998. Data-intensive computing. In The Grid: Blueprint for a New Computing Infrastructure, edited by I. Foster and C. Kesselman, 105-129. San Francisco: Morgan Kaufmann.

Nakada, H., Sato, M., and Sekiguchi, S. 1999. Design and implementations of Ninf: Towards a global computing infrastructure. Future Generation Computing Systems 15 (5-6): 649-658.

Novotny, J., Tuecke, S., and Welch, V. 2001. Initial experiences with an online certificate repository for the Grid: MyProxy. In Proceedings of the 10th IEEE International Symposium on High-Performance Distributed Computing, pp. 104-111.

Papakhian, M. 1998. Comparing job-management systems: The user’s perspective. IEEE Computational Science & Engineering, April-June [Online]. Available: http://pbs.mrj.com.

Realizing the Information Future: The Internet and Beyond. 1994. Washington, D.C.: National Academy Press.

Sculley, A., and Woods, W. 2000. B2B exchanges: The Killer Application in the Business-to-Business Internet Revolution. New York: ISI Publications.

Steiner, J., Neuman, B. C., and Schiller, J. 1988. Kerberos: An authentication system for open network systems. In Proc. Usenix Conference, pp. 191-202.

Stevens, R., Woodward, P., DeFanti, T., and Catlett, C. 1997. From the I-WAY to the national technology Grid. Communications of the ACM 40 (11): 50-61.

Thompson, M., Johnston, W., Mudumbai, S., Hoo, G., Jackson, K., and Essiari, A. 1999. Certificate-based access control for widely distributed resources. In Proc. 8th Usenix Security Symposium.

Tierney, B., Johnston, W., Lee, J., and Hoo, G. 1996. Performance analysis in high-speed wide area IP over ATM networks: Top-to-bottom end-to-end monitoring. IEEE Network 10 (3).

Vahdat, A., Belani, E., Eastham, P., Yoshikawa, C., Anderson, T., Culler, D., and Dahlin, M. 1998. WebOS: Operating system services for wide area applications. In 7th Symposium on High Performance Distributed Computing.

Wolski, R. 1997. Forecasting network performance to support dynamic scheduling using the network weather service. In Proc. 6th IEEE Symp. on High Performance Distributed Computing, Portland, OR.
