
In: Proc. 6th Heterogeneous Computing Workshop (HCW'97), Geneva, pp. 17-31

The MOL Project: An Open, Extensible Metacomputer*

A. Reinefeld, R. Baraglia†, T. Decker, J. Gehring, D. Laforenza†, F. Ramme, T. Römke, J. Simon

PC2 - Paderborn Center for Parallel Computing, Germany
† CNUCE - Institute of the Italian National Research Council

Abstract

Distributed high-performance computing, so-called metacomputing, refers to the coordinated use of a pool of geographically distributed high-performance computers. The user's view of an ideal metacomputer is that of a powerful monolithic virtual machine. The implementor's view, on the other hand, is that of a variety of interacting services implemented in a scalable and extensible manner.

In this paper, we present MOL, the Metacomputer Online environment. In contrast to other metacomputing environments, MOL is not based on specific programming models or tools. It has rather been designed as an open, extensible software system comprising a variety of software modules, each of them specialized in serving one specific task such as resource scheduling, job control, task communication, task migration, user interface, and much more. All of these modules exist and are working. The main challenge in the design of MOL lies in the specification of suitable, generic interfaces for the effective interaction between the modules.

1 Metacomputing

"Eventually, users will be unaware they are using any computer but the one on their desk, because it will have the capabilities to reach out across the national network and obtain whatever computational resources are necessary" [41]. This vision, published by Larry Smarr and Charles Catlett in their seminal CACM article on metacomputing, sets high expectations: "The metacomputer is a network of heterogeneous, computational resources linked by software in such a way that they can be used as easily as a personal computer."

* This work is partly supported by the EU ESPRIT Long Term Research Project 20244 (ALCOM-IT) and by the Northrhine-Westphalian Initiative "Metacomputing: Distributed Supercomputing".

The advantages of metacomputing are obvious: metacomputers provide true supercomputing power at little extra cost, they allow better utilization of the available high-performance computers, and they can be flexibly upgraded to include the latest technology. It seems, however, that up to now no system has been built that rightfully deserves the name 'metacomputer' in the above sense. From the user's point of view, the main obstacles are seen at the system software level, where non-standardized resource access environments and incompatible programming models make it difficult for non-experts to exploit the available heterogeneous systems. Many obstacles in the cooperative use of distributed computing systems can be overcome by providing homogeneous, user-friendly access to a reliable and secure virtual metacomputing environment that is used in a similar way as a conventional monolithic computer today.

Some of these issues are addressed by "Metacomputer Online (MOL)", an initiative founded with the goal of designing and implementing the nucleus of a practical distributed metacomputer. MOL is part of the Northrhine-Westphalian Metacomputer Initiative established in 1995, and it is embedded in several other European initiatives [31].

The MOL group has implemented a first, incomplete 'condensation point' of a practical metacomputer, which is now being extended and improved. Clearly, we could not tackle all relevant obstacles at the same time. We initially concentrated on the following issues that are deemed most important in the design of a first prototype:

- provision of a generic, user-friendly resource access interface,
- support of interoperability of existing codes using different message passing standards,

- effective global scheduling of the subsystems for high throughput and reduced waiting times,
- support of concurrent use in interactive and batch mode,
- mechanisms for automatic remote source code compilation and transparent data distribution,
- automatic selection of adequate compute nodes from a pool of resources to be assigned to the tasks of a parallel application,
- dynamic re-placement of user tasks by means of performance prediction of the source code on the heterogeneous nodes,
- provision of data management libraries and programming frames to help non-expert users in program design and optimal system utilization.

Clearly, this list is not complete. Further items can and should be added for a full metacomputing environment. Depending on their attitude and current task, users might have different expectations of the services a metacomputer should provide. As a consequence, a metacomputer cannot be regarded as a closed entity; it is rather a highly dynamical software system that needs continuous adaptation to the current user demands.

The MOL project aims at integrating existing software modules in an open, extensible environment. Our current implementation supports PVM, MPI and PARIX applications running on LAN- or WAN-connected high-performance computers, such as Parsytec GC, Intel Paragon, IBM SP2, and UNIX workstation clusters.

This paper presents the current status of MOL. It gives an overview of the system architecture and illustrates how the separate modules interact with each other. Section 2 reviews related work which influenced the design of MOL or which can be integrated into MOL at a later time. Section 3 gives a conceptual view and discusses the general MOL architecture. In Section 4, we elaborate on the interface design and show how the MOL components interact with each other. Sections 5 to 7 describe the MOL components in more detail, and Section 8 gives a brief summary and outlook.

2 Previous Work

In the past few years, several research projects have been initiated with the goal of designing a metacomputer

environment. Some of them follow the top-down approach, starting with a concrete metacomputing concept in mind. While this approach seems compelling, it usually results in a "closed world metacomputer" that provides a fixed set of services for a well-defined user community.

The second category of projects follows the bottom-up approach, initially focusing on some selected aspects which are subsequently extended towards a full metacomputing environment. In the following, we review some projects falling into this category.

Parallel programming models have been a popular starting point. Many research projects targeted at extending existing programming models towards a full metacomputing environment. One such example is the Local Area Multicomputer LAM developed at the Ohio Supercomputer Center [13]. LAM provides a system development and execution environment for heterogeneous networked computers based on the MPI standard. This has the advantage that LAM applications are source code portable to other MPI systems. The wide-area metacomputer manager WAMM developed at CNUCE, Italy, is a similar approach based on PVM [5, 6]. Here, the PVM programming environment has been extended by mechanisms for parallel task control, remote compilation and a graphical user interface. Later, WAMM was integrated into MOL and extended to support other programming environments, as shown below.

Object oriented languages have also been proposed as a means to alleviate the difficulties of developing architecture independent parallel applications. Charm [26], Trapper [38], Legion and Mentat [25], for example, support the development of portable applications by object oriented parallel programming environments. In a similar approach, the Dome project [2] at CMU uses a C++ object library to facilitate parallel programming in a heterogeneous multi-user environment. Note, however, that such systems can hardly be used in an industrial setting, where large existing codes (usually written in Fortran) are to be executed on parallel environments.

Projects originating in the management of workstation clusters usually emphasize topics such as resource management, task mapping, checkpointing and migration. Existing workstation cluster management systems like Condor, Codine, or LSF are adapted for managing large, geographically distributed 'flocks' of clusters. This approach is taken by the Iowa State University project Batrun [42], the Yale University Piranha project [14], and the Dutch Polder initiative [34], all emphasizing the utilization of

idle workstations for large-scale computing and high-throughput computing. Complex simulation applications, such as air pollution and laser atom simulations, have been run in the Nimrod project [1], which supports multiple executions of the same sequential task with different parameter sets.

These projects could benefit from the use of special metacomputing schedulers, such as the Application-Level Scheduler AppLeS developed at the University of California in San Diego [9]. Here, each application has its own AppLeS scheduler to determine a performance-efficient schedule and to implement that schedule in coordination with the underlying local resource scheduler.

Networking is also an important issue, giving impetus to still another class of research projects. I-WAY [22], as an example, is a large-scale wide area computing testbed connecting several U.S. supercomputing sites with more than a hundred users. Aspects of security, usability and network protocol are among the primary research issues. Distributed resource brokerage is currently investigated in the follow-up I-Soft project. In a bilateral project, Sandia's massively parallel Intel Paragon is linked with the Paragons located at Oak Ridge National Laboratories using GigaNet ATM technology. While also termed a metacomputing environment, the consortium initially targets running specific multi-site applications (e.g., a climate model) in distributed mode.

The well-known Berkeley NOW project [33] emphasizes using networks of workstations mainly for improving virtual memory and file system performance by using the aggregate main memory of the network as a giant cache. Moreover, highly available and scalable file storage is provided by using redundant arrays of workstation disks, and, of course, the multiple CPUs are used for parallel computing.

3 MOL Architecture

We regard a metacomputer as a dynamic entity of interacting modules. Consequently, interface specifications play a central role in the MOL architecture, as illustrated in Figure 1. There exist three general module classes:

1. programming environments,
2. resource management & access systems,
3. supporting tools.

(1) Programming environments provide mechanisms for communication between threads and they allow the use of specific system resources.

[Figure 1: MOL architecture. A generic user interface sits on top of two abstract interface layers: one covering the programming environments (MPI, PVM, PARIX, HPF) and one covering the resource management systems (CCS, NQS and related, CODINE). The supporting tools (WARP, MARS, WAMM, DAISY, FRAMES) interact only with these abstract interface layers.]
MPI and PVM, for example, are popular programming environments supported by MOL. More importantly, MOL also supports the communication and process management of heterogeneous applications comprising software components that use different programming environments. The MOL library "PLUS" makes it possible, for example, to establish communication links between MPI and PVM code that are part of one large application. PLUS is almost transparent to the application, requiring only a few modifications in the source code (see Sec. 5).

(2) Resource management & access systems provide services for starting an application and for controlling its execution until completion. Currently there exists a large number of different resource management systems, each with its specific advantages [3]. Codine, for example, is well suited for managing (parallel) jobs on workstation clusters, while NQS is specialized in serving HPC systems in batch mode. Because the specific advantages of these systems supplement each other, we have developed an abstract interface layer with a high-level resource description language that provides unified access to several (possibly overlapping) resource management systems.

(3) Supporting tools provide optional services to metacomputer applications. Currently, the MOL toolset includes software modules for dynamic task migration (MARS), for easy program development using programming templates (FRAMES), and also data libraries for virtual shared memory and load balancing (DAISY). The wide-area metacomputer manager WAMM plays a special role in MOL: on the one hand, it may be used just as a resource management system for assigning tasks to resources; on the other hand, it also provides automatic source code compilation and cleanup after task execution.

4 Interface Layers

As described, MOL facilitates the interaction between different resource management systems and different programming environments. However, this is not enough: a metacomputer application may also require some interaction between these two groups. As an example, consider a PVM application running on a workstation cluster that is about to spawn a process on a parallel computer. For this purpose, the application makes a call to the parallel system's management software in order to allocate the necessary resources. Likewise, a metacomputer application may need a mechanism to inform the communication environment about the temporary location of the participating processes.

In addition, tools are needed for efficient task mapping, load balancing, performance prediction, program development, etc. There is a wealth of supporting tools available to cover most aspects of metacomputing. However, as these tools are typically very complex, it is hard to adapt their interfaces for our specific needs. In the MOL framework shown in Figure 1, the supporting tools only need to interact with two abstract interface layers rather than with all systems below. Thus, new software must be integrated only once, and updates to new releases affect only the tool in question.

4.1 MOL User Interface

Metacomputing services will typically be accessed via Internet or Intranet world-wide-web browsers, such as the Netscape Navigator or the Microsoft Explorer. MOL provides a generic graphical user interface (GUI) based on HTML and Java for interactive use and for submitting batch jobs. At the top level, the GUI displays general information on the status and availability of the system services. Users may select from a number of alternatives by push-down buttons. In doing so, context sensitive windows are opened to ask for the context specific parameters required by the requested services. Figure 2 shows the MOL window used for submitting interactive or batch jobs.

This concept is called 'interface hosting': the generic interface acts as a host for the sub-interfaces of the underlying modules. Interface hosting gives the user control over the complete system without the need to change the environment when the user wants to deal with another feature or service of the metacomputer.

[Figure 2: Prototype GUI of the Paragon/Parsytec metacomputer]

The different skills of users are met by a Java-based graphical interface and a shell-based command oriented interface. In addition, an API library allows applications to directly interact with the metacomputer services. All three interfaces,

- the graphical user interface (Java),
- the shell-based command line interface,
- the API library,

provide the same functions; a sketch of the API usage is given below.

Figure 2 illustrates the graphical user interface used to submit a job to a Parsytec GC and a Paragon located in Paderborn and Jülich. The two machines are installed 300 kilometers apart. They are interconnected via the 34 Mbit/s German Research WAN. Note that the same interface is used for submitting batch jobs as well as for starting interactive sessions.

A later version of the interface will include a graph editor, allowing the user to describe more complex interconnection topologies and hierarchical structures.
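As an illustration of the third access path, the following hypothetical C fragment sketches how an application might submit a request through the API library. All mol_* names and the session/request types are assumptions made for this sketch; the paper does not document the actual API.

    /* hypothetical MOL API declarations, assumed for illustration only */
    typedef struct mol_session mol_session;
    typedef struct mol_request mol_request;

    extern mol_session *mol_connect(const char *server);
    extern mol_request *mol_request_new(mol_session *s);
    extern void mol_request_set(mol_request *r, const char *key, const char *val);
    extern int  mol_submit(mol_request *r, const char *prog, const char *args);

    int main(void) {
        mol_session *s = mol_connect("mol.uni-paderborn.de");  /* hypothetical */
        mol_request *r = mol_request_new(s);
        mol_request_set(r, "CPU", "MPC601");   /* resource class, not a host name */
        mol_request_set(r, "NODES", "64");
        mol_request_set(r, "MEMORY", "16");    /* MB per node */
        mol_request_set(r, "MODE", "batch");
        int job = mol_submit(r, "mc_appl", "-grid 8x8");
        return job < 0;                        /* non-zero on failure */
    }

The same request could equally be issued through the GUI or the command line interface, since all three entry points expose the same functions.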

[Figure 3: Simple heterogeneous computing example (MC_appl): a user terminal is linked to a grid of 64 MPC 601 compute nodes, whose results feed a pipelined post-processing stage writing to disk.]
When the actual system configuration has been determined by the resource scheduler, the resulting topology is presented to the user by a graph-drawing tool [32].

4.2 A MOL User Session

In this section, we describe the specification and execution of an example user application in the MOL environment. Suppose that a large grid structured application shall be run on a massively parallel system with 64 PowerPC 601 nodes, each of them having at least 16 MB of main memory. As shown in Figure 3, the result of this computation shall be post-processed by another application with a pipeline structure. Depending on the available resources, the resource management system may decide to collapse the pipeline structure to be run on a single processor, or to run the grid and the pipeline on the same machine. For interactive control, the user terminal is linked to the first node of the grid part of the application. For the concrete resource description used to specify this metacomputer scenario, see Figure 6 below.

Within MOL, such resource specifications are generated by tools that are integrated into the GUI. When submitting a resource request, the request data is translated into the internal format used by the abstract resource management (RM) interface layer. The result is then handed over to the scheduler and configurator. The RM interface layer schedules this request and chooses a time slot when all resources are available. It contacts the target management systems and takes care that all three programs are started at the same time. Furthermore, it generates a description file informing the abstract programming environment interface where the programs will be run and which addresses are to be used for establishing the first communication links.

At program run time, the programming environment interface layer establishes the interconnection between the application domains and takes care of the necessary data conversion.

Though this example is not yet completely realized, it gives an idea of how the modules interact with each other by means of abstract layers and how the results are presented to the user by the generic interface.

5 Programming Environments Layer

In contrast to many other heterogeneous computing environments, MOL is not restricted to a specific programming environment. Applications may use any programming model supported by the underlying hardware platforms. This could be either a vendor supplied library or a standard programming model such as PVM or MPI.

However, many programming models are homogeneous, that is, applications can only be executed on the system architecture they have been compiled (linked) for. While there also exist some heterogeneous programming libraries, they usually incur significant performance losses due to their high level of abstraction. Portable environments are typically implemented on top of the vendors' libraries, making them an order of magnitude slower than the vendor supplied environments. Clearly, users are not willing to accept any slow-down in their applications just because the application might need some infrequent communication with a remote system. This problem is addressed by PLUS.

5.1 PLUS - A Linkage Between Programming Models

PLUS stands for Programming Environment Linkage by Universal Software Interfaces [12]. It provides a lightweight interface between different programming environments, which is only used for communicating with remote hosts. The efficiency of the native environment remains untouched.

The interfaces are embedded into the various environments. A PVM application, for example, uses ordinary PVM-IDs for addressing non-PVM processes. Consequently, a PARIX process reachable via PLUS is represented within a PVM application by an ordinary PVM-ID. Of course, PLUS takes care that these 'pseudo PVM-IDs' do not conflict with the real PVM-IDs.

Furthermore, PLUS allows creating processes by overlaying existing routines (PVM) or by extending the available function set with its own procedure (in the case of MPI). Thereby, PLUS adds a dynamic process model to programming environments like MPI-1, which do not provide such facilities.
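For illustration, the following fragment sketches the PVM side of such a mixed application. The pvm_* calls are the standard PVM 3 interface; plus_lookup() is a hypothetical stand-in for however the application obtains the pseudo PVM-ID of the remote PARIX process.

    #include "pvm3.h"   /* standard PVM 3 header */

    /* hypothetical helper: yields the pseudo PVM-ID that PLUS uses to
       represent a remote PARIX process within this PVM application */
    extern int plus_lookup(const char *remote_process_name);

    void send_bounds(double lo, double hi) {
        int dest = plus_lookup("parix_worker");  /* pseudo PVM-ID, assumed */
        pvm_initsend(PvmDataDefault);   /* default encoding; the PLUS servers
                                           handle any further data conversion */
        pvm_pkdouble(&lo, 1, 1);
        pvm_pkdouble(&hi, 1, 1);
        pvm_send(dest, 42);             /* an ordinary PVM send, message tag 42 */
    }

From the application's point of view this is plain PVM code; only the destination ID marks the message as one that PLUS will route to the PARIX system.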

[Figure 4: PLUS architecture. A user program on a PVM workstation cluster links libpvm.a and libplus.a; a user program on a PARIX MPP system links libmpi.a and libplus.a. Both sides use homogeneous task IDs. PVM-internal and PARIX-internal communication proceed natively, while cross-environment PLUS communication is routed via cooperating PLUS servers.]
PLUS has no internal resource management functionality; it forwards such requests to the abstract resource management layer, which offers more powerful methods than could be integrated into PLUS.

With this concept, PLUS allows program developers to integrate existing codes into the metacomputer environment, thereby lowering the barriers to the use of metacomputers.

PLUS Architecture. Fig. 4 depicts the architecture of the abstract interface layer for programming environments developed in the PLUS project. Note that regular PVM communication is not affected by PLUS. Only when accessing a PVM-ID that actually represents an external process (a PARIX process in this example) is the corresponding PLUS routine invoked. This routine performs the requested communication via the fastest available protocol between the PVM cluster and the PARIX system. Usually, communication is done via UDP.

PLUS libraries are linked to each other via one or more PLUS servers. Since most of the PLUS code is contained in these servers, the link library could be kept rather small. The servers manage the translation of different data representations, the message routing along the fastest network links, the dynamic creation of new processes, and much more. The number of active PLUS servers and their configuration may change dynamically at run time according to the needs of the user's application.

PLUS Performance. On a 34 Mbit/s WAN interconnection, PLUS communication has been shown to be even faster than TCP [12]. This is because the UDP based communication protocol of PLUS builds a virtual channel on which the messages are multiplexed. A sliding window technique allows assembling and re-ordering the acknowledgements in the correct sequence.
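The following minimal C sketch shows the generic form of this reordering technique, a textbook sliding-window receive buffer; it is our illustration of the mechanism, not PLUS source code, and sequence numbers are assumed not to wrap.

    #include <string.h>

    #define WIN 32                        /* window size */

    typedef struct { int valid; int len; char data[1024]; } slot;

    static slot window[WIN];
    static unsigned next_seq = 0;         /* next sequence number to deliver */

    extern void deliver(const char *data, int len);  /* hand to application */

    void on_packet(unsigned seq, const char *data, int len) {
        if (seq < next_seq || seq >= next_seq + WIN)
            return;                       /* duplicate or outside window */
        slot *s = &window[seq % WIN];     /* buffer out-of-order packet */
        memcpy(s->data, data, (size_t)len);
        s->len = len;
        s->valid = 1;
        while (window[next_seq % WIN].valid) {   /* deliver in-order prefix */
            slot *d = &window[next_seq % WIN];
            deliver(d->data, d->len);
            d->valid = 0;
            next_seq++;
        }
    }

Because delivery is decoupled from arrival order, several messages can be in flight on the virtual channel at once, which is what lets the UDP-based protocol outperform TCP on the WAN link.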

[Figure 5: PLUS versus PVM on a 10 Mbit/s LAN. Throughput in Mbit/s (up to about 3.5) as a function of packet size (1 to 32 KByte).]
Figure 5 shows a less favorable case, where the PLUS communication is outperformed by PVM. A parallel Parsytec CC system under AIX was interconnected to a SUN-20 via 10 Mbit/s Ethernet. In both runs, the same PVM code was used: once communicating via the internal PVM protocol, and the other time via the PLUS protocol. Here, PLUS is about 10 to 20% slower, because the PLUS communication needs two Internet hops, one hop from the CC to the PLUS daemon, and another from the daemon to the target SUN. The PVM tasks, in contrast, need only one Internet hop for communicating between the two corresponding PVM daemons. Moreover, since the PLUS communication libraries are designed in an open and extensible way, they do not contain routing information.

Note that the example shows a worst case, because PLUS would only be used to communicate between different programming environments. Any internal communication, e.g. within PVM, is not affected by PLUS.

Extending PLUS. The current version of PLUS supports PVM, MPI and PARIX, where the latter is available on Parsytec systems only.

The PLUS library consists of two parts: a generic, protocol-independent module, and a translation module. The translation module contains a small set of abstract communication routines for the specific protocol. These routines are quite small, typically comprising less than one hundred lines of code.

New programming environments can be integrated into PLUS by implementing a new translation module. This allows applications to communicate with any other programming model for which translation modules are available in PLUS.
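As an illustration of what such a translation module interface might look like, the sketch below shows one table of abstract communication routines per environment. The struct and function names are our assumptions; the paper only states that a small set of abstract routines must be implemented.

    #include <stddef.h>

    /* hypothetical translation-module interface, one instance per
       programming environment ("PVM", "MPI", "PARIX", ...) */
    typedef struct {
        const char *env_name;
        int  (*init)(int *argc, char ***argv);
        int  (*send)(int plus_id, const void *buf, size_t len, int tag);
        int  (*recv)(int plus_id, void *buf, size_t len, int tag);
        int  (*spawn)(const char *executable, int n);
        void (*finalize)(void);
    } plus_translation;

    /* hypothetical: registering one such table would make a new
       environment reachable from all others */
    extern int plus_register(const plus_translation *t);

Under this reading, supporting a new environment amounts to filling in one small table, which matches the paper's claim that a translation module comprises less than one hundred lines of code.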

6 Resource Management Layer

Resource management is a central task in metacomputing environments, permitting oversubscribed resources to be fairly and efficiently shared. Historically, supercomputer centers have been using batch queuing systems such as NQS [28] for managing their machines and for scheduling the available computing time. Distributed memory systems employ more complex resource scheduling mechanisms, because a large number of constraints must be considered in the execution of parallel applications. As an example, interactive applications may compete with batch jobs, and requests for the use of non-timeshared components or for special I/O facilities may delay other jobs.

On a metacomputer, the scheduler is responsible for assigning appropriate resources to parallel applications. Scheduling is done by the resource management system, which allocates a particular CPU at a particular time instance for a given application, and by the operating system running on a certain compute node. In parallel environments, we distinguish two levels of scheduling:

- At the task level, the scheduling of tasks and threads (possibly of different jobs) is performed by the operating system on the single MPP nodes. The scheduling strategy may be uncoordinated, as in traditional time-sharing systems, or it can be synchronized, as in gang-scheduling. The latter, however, is only available on a few parallel systems.

- At the job level, the scheduling is usually architecture independent. A request may specify the CPU type, memory requirements, I/O facilities, software requirements, special interconnection structures, required occupation time and some attributes for distinguishing batch from interactive jobs. Here, the request scheduler is responsible for re-ordering and assigning the submitted requests to the most appropriate machines; a simplified matching step is sketched below.
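As a minimal illustration of job-level request matching, the following C sketch picks the first machine satisfying a request's CPU type, node count and memory. This is our simplification (first-fit over a static pool); a real request scheduler also reorders requests over time and weighs alternatives.

    #include <string.h>

    typedef struct {
        const char *cpu;      /* e.g. "MPC601" */
        int nodes;            /* available resp. required node count */
        int mem_mb;           /* memory per node in MB */
        int interactive;      /* machine supports interactive use? */
    } resources;

    /* returns the index of the first machine matching the request, or -1 */
    int match(const resources *req, const resources pool[], int n) {
        for (int i = 0; i < n; i++) {
            if (strcmp(pool[i].cpu, req->cpu) == 0 &&
                pool[i].nodes  >= req->nodes  &&
                pool[i].mem_mb >= req->mem_mb &&
                (!req->interactive || pool[i].interactive))
                return i;
        }
        return -1;   /* no suitable machine; request must wait or be rejected */
    }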

With few exceptions [9], the existing schedulers were originally designed for scheduling serial jobs [27, 3]. As a consequence, many of them do not obey the 'concept' of parallel programs [37]. Typically, a starter (master) process is launched that is responsible for starting the rest of the parallel program. Thus, the management has limited knowledge about the distributed structure of the application. When the master process dies unexpectedly (or has been terminated because the granted time expired), the rest of the parallel program is still existent. Such orphaned processes can cause serious server problems, and in the case of MPP systems (like the SP2) they can even lock parts of the machine.

With its open architecture, MOL is not restricted to the use of specific management systems. Rather, we distinguish three types of management systems, each of them specialized to a certain administration strategy and usage profile:

The first group serves sequential and client-server programs. The management software packages of this group have emerged from the traditional vector-computing domain. Examples are Condor, Codine, DQS and LSF.

The second group contains resource management systems tailored to the needs of parallel applications. Some of them have their roots in the NQS development (like PBS), while others were developed to overcome problems with existing vendor solutions (like EASY) or focus on transparent access to partitionable MPP systems (like CCS, described below).

The last group consists of resource management systems for multi-site applications, exploiting the whole power of a metacomputing environment. Currently, only few multi-site applications exist, and the development of corresponding resource management systems is still in its infancy. But with increasing network performance, such applications and their need for a uniform and optimizing management layer are gaining importance.

6.1 Computing Center Software CCS

CCS [15] provides transparent access to a pool of massively parallel computers with different architectures [35]. Today, parallel machines ranging from 4 to 1024 nodes are managed by CCS. CCS provides user authorization and accounting, and it schedules arbitrary mixtures of interactive and batch jobs [23]. Furthermore, it features an automatic reservation system and allows re-connecting to a parallel application after a breakdown of the WAN connection, an important feature for remote access to a metacomputer.

Specification Language for Resource Requests. Current management systems use a variety of command-line (or batch script) options at the user interface, and hundreds of environment variables (or configuration files) at the operator interface. Extending such concepts to a metacomputing environment can result in a nightmare for users and operators. Clearly, we need a general resource description language equipped with a powerful generation and animation tool.

    INCLUDE <default_defs.rdl>   -- default definitions and constant declarations
    DECLARATION
    BEGIN UNIT MC_appl;
      DECLARATION
      BEGIN SECTION main;        -- main computation on grid
        EXCLUSIVE;               -- use separate processors for each item
        DECLARATION
          FOR i=0 TO (main_x * main_y - 1) DO
            { PROC i; compute = pbl_size; CPU = MPC601; MEMORY = 16; }; OD
        CONNECTION
          FOR i=0 TO (main_x - 1) DO
            FOR j = i * main_y TO (i * main_y + main_y - 2) DO
              PROC j LINK 0 <=> PROC j+1 LINK 2; OD OD
          FOR i=0 TO (main_y - 1) DO
            FOR j=0 TO (main_x - 2) DO
              PROC (j * main_y + i) LINK 1 <=> PROC (main_y * (j + 1) + i) LINK 3; OD OD
          ASSIGN LINK 1 <=> PROC 0 LINK 3;                      -- from user terminal
          ASSIGN LINK 2 <== PROC (main_x * main_y - 1) LINK 1;  -- to postprocessing
      END SECTION
      BEGIN SECTION post;        -- pipelined post-processing
        SHARED;                  -- the items of this section may be run on a single node
        DECLARATION
          FOR i=1 TO post_len DO
            { PROC i; filter = post; CPU = MPC601; }; OD
          { PORT Ausgabe; DISK; };
        CONNECTION
          FOR i=1 TO (post_len - 1) DO
            i LINK 0 ==> i + 1 LINK 1; OD
          PROC post_len LINK 0 ==> PORT Ausgabe LINK 0;
          ASSIGN PROC 1 LINK 1 <== LINK 0;   -- link to higher level
      END SECTION
      BEGIN SECTION user;        -- user control section
        DECLARATION
          { PORT IO; TERMINAL; };
        CONNECTION
          ASSIGN IO LINK 0 <=> LINK 0;
      END SECTION
      CONNECTION                 -- connecting the modules
        user LINK 0 <=> main LINK 1;
        main LINK 2 ==> post LINK 0;
    END UNIT                     -- MC_appl

Figure 6: Specification of the system shown in Figure 3 using the Resource Description Language RDL

Metacomputer users and metacomputer administrators should not have to deal directly with this language but only with a high-level interface. Analogously to the popular PostScript language used for the device-independent representation of documents, a metacomputer resource description language is a means to specify the resources needed for a metacomputer application. Administrators should be able to use the same language for specifying the available resources in the virtual machine room.

In a broad sense, resource representations must be specified at three different abstraction levels: the vendor level, for describing the internal structure of a machine; the operator level, which is the metacomputer itself, for describing the interconnections of the machines within the network and their general properties; and the user level, for specifying a logical topology to be configured for a given application.

Within MOL, we use the Resource Description Language RDL [4], which has been developed as part of the CCS resource management software [35]. RDL is used

- at the administrator's level for describing type and topology of the participating metacomputer components, and
- at the user's level for specifying the required system configuration for a given application.

Figure 6 shows an RDL specification for the example discussed in Figure 3. It is the task of the resource scheduler to determine an optimal mapping between the two specifications: the specification of the application structure on the one hand, and the specification of the system components on the other hand. Better mapping results are obtained when the user requests are less specific, i.e., when classes of resources are specified instead of specific computers.

[Figure 7: Snapshot of the WAMM user interface]

7 Supporting Tools

The MOL toolset contains a number of useful tools for launching and executing distributed applications on the metacomputer. We first describe the "wide area metacomputer manager" WAMM, which provides resource management functionality as well as automatic source code compilation and cleanup after task execution.

Another class of tools allows for dynamic task migration at execution time (MARS), which makes use of the performance prediction tool WARP.

Data libraries for virtual shared memory and load balancing (DAISY) provide a more abstract programming level, and program templates (FRAMES) facilitate the design of efficient parallel applications, even for non-experts.

7.1 Graphical Interface WAMM

The Wide Area Metacomputer Manager WAMM [43, 5, 6] supports the user in the management of the computing nodes that take part in a parallel computation. It controls the virtual machine configuration, issues remote commands and remote source code compilations, and it provides task management. While earlier releases of WAMM [5, 6] were limited to systems running PVM only, with the PLUS library it is now possible to use WAMM on arbitrary systems [7].

[Figure 8: WAMM architecture. The WAMM graphical interface runs on the user's PVM local host; PVMtasker and PVMmaker tasks on the remote PVM hosts cooperate with it through the PVM daemons over LAN or WAN communication links.]
WAMM can be seen as a mediator between the top user access level and the local resource management. It simplifies the use of a metacomputer by adopting a "geographical" view, where the hosts are grouped in tree structured sub-networks (LANs, MANs or WANs). Each item in the tree is shown in an OSF/Motif window, using geographical maps for networks and icons for hosts, as shown in Figure 7. The user can move through the tree and explore the resources by selecting push buttons on the maps. It is possible to zoom from a wide-area map to a single host of a particular site. A traditional list with Internet host addresses is also available.

Metacomputer Configuration. The metacomputer can be configured by writing a configuration file which contains the description of the nodes that make up the metacomputer, i.e. all the machines that users can access. This file is read by WAMM at startup time.

Application Development. The development and execution of an application requires some preparatory operations such as source code editing, remote compilation and execution. In WAMM, programmers develop their code on a local machine. For remote compilation, they only have to select the hosts where they want to do the compilation and to issue a single Make command from the popup menu. In a dialog box, the local directories containing the source files and the corresponding Makefiles can be specified along with the necessary parameters.

WAMM supports remote compilation by grouping all source files into a single, compressed tar file.

[Figure 9: MARS system architecture. User processes with checkpoints send information to an Application Monitor and a Network Monitor; their data feed a Program Information Manager and a Migration Manager.]
A PVMMaker task, which deals with the remote compilation, is spawned on each participating node, and the compressed tar file is sent to all these tasks. The rest of the work is carried out in parallel by the PVMMakers. Each PVMMaker receives the compressed file, extracts the sources in a temporary working directory, and executes the make command. WAMM is notified about the operations that have been executed, and the result is displayed in a control window to show the user the status of the compilation.

Task Execution and Control. Applications are started by selecting the Spawn pushbutton from the Apps popup menu. A dialog box is opened for the user to insert parameters, such as number of copies, command line arguments, etc. The output of the processes is displayed in separate windows and/or saved in files. When the output windows are open, new messages from the tasks are shown immediately (Fig. 7). A Tasks control window can be opened to inspect status information on all the PVM tasks being executed in the virtual machine.

7.2 MARS Task Migrator

The Metacomputer Adaptive Runtime System MARS [24] is a software module for the transparent migration of tasks during runtime. Task migrations may become necessary when single compute nodes or sub-networks exhibit changing load due to concurrent use by other applications. MARS uses previously acquired knowledge about a program's runtime behavior to improve its task migration strategy. The knowledge is collected during execution time without increasing the overall execution time significantly. The core idea is to keep all information gathered in a previous execution of the same program and to use it as a basis for the next run. By combining information from several runs, it is possible to find regularities in the characteristic program behavior as well as in the network profile. Future migration decisions are likely to benefit from the acquired information.

The MARS runtime system comprises two types of instances (Fig. 9): Monitors for gathering statistical data on the CPU work-load, the network performance and the applications' communication behavior, and Managers for exploiting the data to compute an improved task-to-processor mapping and task migration strategy.

As MARS is designed for heterogeneous metacomputers, we cannot simply migrate machine-dependent core images. Instead, the application code must be modified by a preprocessor to include calls to the runtime system at certain points where task migration is allowed, as sketched below. The user's object code is linked to the MARS runtime library, which notifies an Application Monitor and a Network Monitor each time a send or receive operation is executed.

The Network Monitor collects long-term statistics on the network load. The Application Monitor observes the communication patterns of the single applications and builds a task dependency graph for each execution. Dependency graphs from successive execution runs are consolidated by a Program Information Manager. The resulting information is used by the Migration Manager to decide about task migrations whenever a checkpoint is reached in the application.

In summary, two kinds of data are maintained: application specific information (in dependency graphs) and system specific information (in system tables) for predicting the long-term performance of the network and CPU work-load. Further information on MARS can be found in [24].
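The paper does not show the inserted calls. As a rough illustration of the mechanism, assuming hypothetical mars_* entry points, preprocessor-instrumented code could look like this:

    #include <stddef.h>

    typedef struct { int max_iter; /* ... application state ... */ } state_t;

    /* hypothetical MARS runtime entry points, assumed for this sketch */
    extern void mars_notify_send(int dest, size_t len);
    extern int  mars_migration_requested(void);
    extern void mars_checkpoint_and_migrate(void *state, size_t len);

    extern void do_work(state_t *s, int iter);                 /* application */
    extern void real_send(int dest, const void *buf, size_t len, int tag);

    /* every send is routed through a wrapper that feeds the monitors */
    void app_send(int dest, const void *buf, size_t len, int tag) {
        mars_notify_send(dest, len);    /* statistics for Application/Network
                                           Monitor, as described above */
        real_send(dest, buf, len, tag);
    }

    void compute_phase(state_t *s) {
        for (int iter = 0; iter < s->max_iter; iter++) {
            do_work(s, iter);
            /* preprocessor-inserted migration point: at a checkpoint the
               Migration Manager may pack the state and restart elsewhere */
            if (mars_migration_requested())
                mars_checkpoint_and_migrate(s, sizeof *s);
        }
    }

Because migration happens only at such checkpoints, the state can be packed in a machine-independent way, which is what makes migration across heterogeneous nodes possible at all.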

7.3 Runtime Predictor WARP

WARP is a Workload Analyzer and Runtime Predictor [39, 40]. Performance prediction tools provide performance data to compilers, programmers and system architects to assist in the design of more efficient implementations. While this is an important aspect of high-performance computing, only few projects address performance prediction for clusters of workstations, and no project targets the metacomputing scenery.

WARP is used within MOL to advise the resource management systems for better hardware exploitation by maximizing the throughput or minimizing the response time. More specifically, the performance prediction figures of WARP provide valuable input for the initial mapping and the dynamic task migration performed by MARS. The WARP system [39, 40] includes modules for

- compiling resource descriptions into a suitable internal representation,
- analyzing resource requests of parallel programs, and
- predicting the execution time of parallel programs.

[Figure 10: WARP architecture. A program is run through an execution analysis stage, yielding its execution times on architectures A1 ... An and a relaxed task graph; the task graph evaluation stage combines this with a resource description.]
Architectural Model. In WARP, a parallel machine is described by a hierarchical graph. The nodes in the graph denote processors or machines with local memory. The edges denote interconnections between the machines. The nodes are weighted with the relative execution speed of the corresponding machines, which has been measured by small benchmark programs or routines for testing the functional units of the CPU (floating-point pipelines, load & store operations in the memory hierarchy, etc.). The edges are weighted with the latency and bandwidth of the corresponding communication performance between the two nodes, which can be either a local mechanism (e.g., shared memory) or a network communication. A monitoring tool in WARP detects the base load of time-sharing systems, which is used to model the statistical load reducing the potential performance of the machine or interconnection.
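As a concrete reading of this model, the following C sketch shows weighted nodes and edges and the two basic cost estimates they support. The struct layout and function names are our own illustration under the stated assumptions, not WARP's actual representation.

    /* hierarchical machine graph: nodes carry relative speed and base load,
       edges carry latency and bandwidth */
    typedef struct machine_node {
        const char *name;
        double rel_speed;               /* measured by small benchmark routines */
        double base_load;               /* statistical background load, 0..1 */
        struct machine_node **children; /* hierarchy: site -> machine -> CPU */
        int n_children;
    } machine_node;

    typedef struct {
        machine_node *a, *b;
        double latency;                 /* seconds */
        double bandwidth;               /* bytes per second */
    } machine_edge;

    /* expected time of a point-to-point transfer along one edge */
    double comm_time(const machine_edge *e, double bytes) {
        return e->latency + bytes / e->bandwidth;
    }

    /* expected time of a sequential block, discounted by speed and load */
    double exec_time(const machine_node *n, double ref_time) {
        return ref_time / (n->rel_speed * (1.0 - n->base_load));
    }

These two estimates are exactly the quantities a task graph evaluation needs for each node and edge it visits.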

Task Graph Construction. WARP models the execution of a parallel program by a task graph consisting of a set of sequential blocks with corresponding interdependencies. The nodes and edges of a task graph are weighted with their computation and communication loads. Task graphs are constructed either by static program analysis or by executing the code on a reference system. In the latter case, the load factors are reconstructed from the execution times of the sequential blocks and communications.

Clearly, the task graphs, representing execution traces, need not be unique. In cases where the control flow depends on the execution sequence of the sequential blocks or on the communication pattern, stochastic graphs are used. Moreover, detailed event tracing can result in extremely large task graphs, depending on the size of the system. Fortunately, most parallel applications are written in SPMD or data parallel mode, executing the same code on all processors with local, data-dependent branches. Only a few equivalence classes of processor behavior need to be modeled by WARP, with statistical clustering used as a standard technique for identifying data equivalence classes. Also, regular structures derived from loops and subroutines are used for task graph relaxation.

Task Graph Evaluation. Task graphs are evaluated for predicting the execution time under modified task allocation schemes and/or changed resources. The task graph evaluation is done by a discrete event simulator which reconstructs the behavior of the parallel program running on hardware with a given resource description. Resource contention is modeled by queuing models. The execution times and communication times are then adjusted to the relative speed of the participating resources and the expected contention.

7.4 Data Management Library DAISY

The data management library DAISY (Fig. 11) comprises tools for the simulation of shared memory (DIVA) and for load balancing (VDS) in a single comprehensive library. A beta release of DAISY is available for Parsytec's PARIX and PowerMPI programming models. With the improved thread support of MPI-2, DAISY will also become available on workstation clusters.

[Figure 11: The two DAISY tools: distributed shared memory (DIVA) and the load-balancing layer VDS.]

Load Balancing with VDS. The virtual data space tool VDS simulates a global data space for structured objects stored in distributed heaps, stacks, and other abstract data types. The work packets are spread over the distributed processors as evenly as possible with respect to the incurred balancing overhead [19]. Objects are differentiated by their corresponding class, depicted in Figure 11 by their different shapes. Depending on the object type, one of three distribution strategies is used:

- Normal objects are weighted by a load measure given by the expense of processing the object. VDS attempts to distribute the objects such that all processors have about the same load. A processor's load is defined as the sum of the load-weights of all placed objects.

- In some applications, such as best-first branch-and-bound, the execution time is not only affected by the number of objects, but also by the order in which the objects are processed. In addition to the quantitative load balancing described above, some form of qualitative load balancing is performed to ensure that all processors are working on promising objects. VDS provides qualitative load balancing by means of weighted objects. Besides their load-weight, these objects possess a quality tag used to select the next suitable object from several alternatives.

- The third kind of object, thread objects, provides an easy and intuitive way to model multi-threaded computations as done in CILK [10]. In this computational model, threads may send messages (results) to their parents. Later versions will make it possible to send data across more than one generation (e.g. to grandparents) [21].

In the current VDS release, we implemented a work-stealing method [11] for thread objects as well as diffusive load balancing for normal and weighted objects.
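The distinctions drawn above can be summarized in one type, purely as an illustration; the names and layout are our assumptions, not the actual VDS interface.

    /* the three VDS object kinds and the attributes each strategy uses */
    typedef enum { VDS_NORMAL, VDS_WEIGHTED, VDS_THREAD } vds_kind;

    typedef struct vds_object {
        vds_kind kind;
        double   load;      /* processing expense: quantitative balancing */
        double   quality;   /* weighted objects only: pick promising work */
        int      parent;    /* thread objects only: task receiving results */
        void   (*process)(struct vds_object *self);
    } vds_object;

    /* hypothetical entry points: VDS applies diffusive balancing to normal
       and weighted objects, and work stealing [11] to thread objects */
    extern void        vds_put(vds_object *obj);
    extern vds_object *vds_get(void);   /* next object, best quality first */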

Shared Memory Simulation with DIVA. The distributed variables library DIVA provides functions for simulating shared memory on distributed systems. The core idea is to provide an access mechanism to distributed variables rather than to memory pages or single memory cells. The variables can be created and released at runtime. Once a global variable is created, each participating processor in the system has access to it.

For latency hiding, reads and writes can be performed in two separate function calls. The first call initiates the variable access, and the second call waits for its completion. The time between initiation and completion of a variable access can be hidden by other local instructions or variable accesses.
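A minimal sketch of such split-phase access follows; the function names are hypothetical, since the paper specifies only the initiate/wait split.

    #include <stddef.h>

    typedef struct diva_var diva_var;   /* opaque handle to a global variable */
    typedef struct diva_req diva_req;   /* handle to an outstanding access */

    /* hypothetical DIVA-style API, assumed for illustration */
    extern diva_var *diva_create(size_t nbytes);
    extern diva_req *diva_read_init(diva_var *v, void *local_buf);
    extern diva_req *diva_write_init(diva_var *v, const void *local_buf);
    extern void      diva_wait(diva_req *r);

    extern void do_local_work(void);    /* application code */

    void update(diva_var *x, double *buf) {
        diva_req *r = diva_read_init(x, buf);   /* start the remote access */
        do_local_work();                        /* hide the access latency */
        diva_wait(r);                           /* block until data arrived */
        buf[0] += 1.0;
        diva_wait(diva_write_init(x, buf));     /* write back, then wait */
    }

Overlapping the access with do_local_work() is precisely the latency hiding the text describes.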

7.5 Programming Frames

Programming frames facilitate the development of efficient parallel code for distributed memory systems. Programming frames are intended to be used by non-experts, who are either unfamiliar with parallel systems or unwilling to cope with new machines, environments and languages. Several projects [16, 8, 17, 18] have been initiated to develop new and more sophisticated ways of supporting the programming of distributed memory systems via libraries of basic algorithms, data structures and programming frameworks (templates). Like LEDA for the sequential case [30], each of these approaches provides non-experts with tools to program and exploit parallel machines efficiently.

Our frames are like black boxes with problem dependent holes. The basic idea is to encapsulate expert knowledge about the problem and its parallelization in the black box and to let the users specify the holes, i.e. the parts that are different in each instantiation. The black boxes are either constructed from efficient basic primitives, standard parallel data types, communication schemes, load balancing and mapping facilities, or they are derived from well optimized, complete applications. In any case, frames contain efficient state-of-the-art techniques focusing on re-usability and portability, both important aspects in metacomputing. This will save software development costs and improve the reliability of the target code.

Figure 12 depicts our frame model.

[Figure 12: The frame model. An expert provides the abstract specification (.abs); the user fills in an instance specification (.ins) via the GUI; checkers (abscc, inscc, impcc) validate the specifications, and the build tool generates the executable (.exe) with make, drawing generic sources from an Internet/frames repository for the parallel machine.]

Each generated target executable is built from three different specifications. The abstract specification defines the parameters of the problem (the holes in the black box). It remains the same for all instances of the frame and is generally given by an expert. Furthermore, the abstract specification is used to generate a graphical user interface (GUI) and for the type consistency check at the instance level.

At the instance level, values are assigned to the parameters specified at the abstract level. New problem instances are generated by the user by giving new instance specifications via the GUI.

The implementation level contains a number of implemented sources, e.g. for MPI or PVM, with respect to a given abstract frame. An implementation specification consists of a list of source files and rules for modifying those sources to get new ones that comply with the values given at the instance level. The rules are best described as replacements and generations. New frames can also be composed by modifying and/or composing existing basic frames.

Three major tools are used to process the frame specifications. The first checks the abstract specification. The second checks the instance specification against the abstract specification. The build tool, finally, generates the target executables, taking the specifications of both an instance and an implementation.

Our preliminary versions of these tools and the GUI are portable across systems with POSIX compliant C compilers. As GUIs are considered vital for the acceptance of the programming frames, our implementation is based on the graphical TCL/TK toolkit. The interface reads the abstract specification and prompts the user for the frame parameters. The output is an instance specification according to our model. A Java interface to the frames will be available in the near future.

8 Summary

We have presented an open metacomputer environment that has been designed and implemented in a collaborative effort within the Metacomputer Online (MOL) initiative.

Due to the diversity in the participating hard- and software on the one hand, and due to the heterogeneous spectrum of potential users on the other hand, we believe that a metacomputer cannot be regarded as a closed entity. It should rather be designed in an open, extensible manner that allows for continuous adjustment to meet the users' demands in a dynamically changing HW/SW environment.

With the MOL framework, we have linked existing software packages by generic, extensible interface layers, allowing future updates and the inclusion of new soft- and hardware. There are three general classes of metacomputer modules:

- programming environments (e.g., PVM, MPI, PARIX, and the PLUS linkage module),
- resource management & access systems (Codine, NQS, PBS, CCS),
- supporting tools (GUIs, task migrator MARS, programming frames, data library DAISY, WAMM, WARP performance predictor, ...).

All of these modules exist. Linked by appropriate generic interfaces, the modules became an integral part of the MOL environment. From the hardware perspective, MOL currently supports geographically distributed high-performance systems like Parsytec GC, Intel Paragon, and UNIX workstation clusters that are run in a dedicated compute cluster mode.

As a positive side-effect, the collaboration within the MOL group has resulted in a considerable amount of (originally unexpected) synergy. Separate projects that were started as disjoint research work now fit together in a new framework. As an example, the task migration manager MARS derives better migration decisions when combined with the performance predictor WARP. An even more striking example is the linkage of WAMM with PLUS [7]. Previously, the WAMM metacomputer manager was limited to PVM only. The PLUS library now extends the application of WAMM to a much larger base of platforms.

Acknowledgements

MOL is the collaborative effort of many individuals. We are indebted to the members of the Metacomputer Online Group, the members of the Northrhine-Westphalian Initiative Metacomputing, and to the members of ALCOM-IT, especially Jordi Petit i Silvestre (UPC).

References

[1] D. Abramson, R. Sosic, J. Giddy, B. Hall. Nimrod: A tool for performing parameterized simulations using distributed workstations. 4th IEEE Symp. on High-Performance Distributed Computing, August 1995.

[2] J.N.C. Árabe, A.B.B. Lowekamp, E. Seligman, M. Starkey, P. Stephan. Dome: Parallel programming environment in a heterogeneous multi-user environment. Supercomputing 1995.

[3] M.A. Baker, G.C. Fox, H.W. Yau. Cluster computing review. Technical Report, Syracuse University, November 1995.

[4] B. Bauer, F. Ramme. A general purpose Resource Description Language. In: R. Grebe, M. Baumann (eds.), Parallele Datenverarbeitung mit dem Transputer, Reihe Informatik aktuell, Springer-Verlag, Berlin, 1991, 68-75.

[5] R. Baraglia, G. Faieta, M. Formica, D. Laforenza. WAMM: A Visual Interface for Managing Metacomputers. EuroPVM'95, Ecole Normale Supérieure de Lyon, Lyon, France, September 14-15, 1995, 137-142.

[6] R. Baraglia, G. Faieta, M. Formica, D. Laforenza. Experiences with a Wide Area Network Metacomputing Management Tool using IBM SP-2 Parallel Systems. Concurrency: Practice and Experience, John Wiley & Sons, Ltd., Vol. 8, 1996, in press.

[7] R. Baraglia, D. Laforenza. WAMM integration into the MOL Project. CNUCE, Institute of the Italian National Research Council, Pisa, Internal Report, 1996.

[8] R. Barret, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, H. van der Vorst. TEMPLATES for the Solution of Linear Systems: Building Blocks for Iterative Methods. Technical Report, CS Dept., University of Tennessee, 1993.

[9] F. Berman, R. Wolski, S. Figueira, J. Schopf, G. Shao. Application-level scheduling on distributed heterogeneous networks. Technical Report, University of California, San Diego. http://www-cse.ucsd.edu/groups/hpcl/apples.

[10] R.D. Blumofe, C.F. Joerg, B.C. Kuszmaul, C.E. Leiserson, K.H. Randall, Y. Zhou. Cilk: An Efficient Multithreaded Runtime System. Proc. 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, July 19-21, 1995, Santa Barbara, California, 207-216.

[11] R.D. Blumofe, C.E. Leiserson. Scheduling Multithreaded Computations by Work Stealing. Proc. 36th Annual Symposium on Foundations of Computer Science (FOCS '95), 1995, 356-368.

[12] M. Brune, J. Gehring, A. Reinefeld. A Lightweight Communication Interface Between Parallel Programming Environments. HPCN'97, Springer LNCS.

[13] G.D. Burns, R.B. Daoud, J.R. Vaigl. LAM: An Open Cluster Environment for MPI. Supercomputing Symposium '94, Toronto, Canada, June 1994.

[14] N. Carriero, E. Freeman, D. Gelernter, D. Kaminsky. Adaptive Parallelism and Piranha. IEEE Computer 28(1), 1995, 40-49.

[15] Computing Center Software CCS. Paderborn Center for Parallel Computing. http://www.uni-paderborn.de/pc2/projects/ccs.

[16] M. Cole. Algorithmic Skeletons: Structured Management of Parallel Computation. PhD thesis, Research Monographs in Parallel and Distributed Computing, MIT Press.

[17] M. Danelutto, R. Di Meglio, S. Orlando, S. Pelagatti, M. Vanneschi. A Methodology for the Development and the Support of Massively Parallel Programs. Journal on Future Generation Computer Systems (FGCS), Vol. 8, 1992.

[18] J. Darlington, A.J. Field, P.G. Harrison, P.H.J. Kelly, D.W.N. Sharp, Q. Wu. Parallel Programming Using Skeleton Functions. Proc. Parallel Architectures and Languages Europe (PARLE '93), LNCS 694, Springer-Verlag, 1993.

[19] T. Decker, R. Diekmann, R. Lüling, B. Monien. Towards Developing Universal Dynamic Mapping Algorithms. Proc. 7th IEEE Symposium on Parallel and Distributed Processing (SPDP'95), 1995, 456-459.

[20] J.J. Dongarra, S.W. Otto, M. Snir, D. Walker. A message passing standard for MPP and Workstations. CACM 39(7), July 1996, 84-90.

[21] P. Fatourou, P. Spirakis. Scheduling Algorithms for Strict Multithreaded Computations. Proc. 7th Annual International Symposium on Algorithms and Computation (ISAAC '96), 1996, to appear.

[22] I. Foster. High-performance distributed computing: The I-WAY experiment and beyond. Proc. EURO-PAR'96, Lyon, 1996, Springer LNCS 1124, 3-10.

[23] J. Gehring, F. Ramme. Architecture-Independent Request-Scheduling with Tight Waiting-Time Estimations. IPPS Workshop on Job Scheduling Strategies for Parallel Processing, LNCS 1162, 1996.

[24] J. Gehring, A. Reinefeld. MARS - A Framework for Minimizing the Job Execution Time in a Metacomputing Environment. Future Generation Computer Systems (FGCS), Elsevier Science B.V., Spring 1996.

[25] A. Grimshaw, J.B. Weissman, E.A. West, E.C. Loyot. Metasystems: An approach combining parallel processing and heterogeneous distributed computing systems. Journal of Parallel and Distributed Computing 21 (1994), 257-270.

[26] L.V. Kale, S. Krishnan. CHARM++: A Portable Concurrent Object Oriented System Based On C++. Conference on Object Oriented Programming, Systems, Languages and Applications (OOPSLA), September 1993.

[27] J.A. Kaplan, M.L. Nelson. A Comparison of Queuing, Cluster and Distributed Computing Systems. NASA Technical Memo, June 1994.

[28] B.A. Kinsbury. The Network Queuing System. Cosmic Software, NASA Ames Research Center, 1986.

[29] M.J. Litzkow, M. Livny. Condor - A hunter of idle workstations. Proc. 8th IEEE Int. Conf. on Distributed Computing Systems, June 1988, 104-111.

[30] K. Mehlhorn, S. Näher. LEDA, a Library of Efficient Data Types and Algorithms. MFCS '89, LNCS 379, 88-106.

[31] Metacomputer Online (MOL). http://www.uni-paderborn.de/pc2/projects/mol/.

[32] B. Monien, F. Ramme, H. Salmen. A Parallel Simulated Annealing Algorithm for Generating 3D Layouts of Undirected Graphs. Proc. Graph Drawing '95, Springer LNCS 1027, 396-408.

[33] D. Patterson et al. A Case for Networks of Workstations: NOW. IEEE Micro. http://now.CS.Berkeley.EDU/Case/case.html.

[34] Polder. http://www.wins.uva.nl/projects/polder/.

[35] F. Ramme, T. Römke, K. Kremer. A Distributed Computing Center Software for the Efficient Use of Parallel Computer Systems. HPCN Europe, Springer LNCS 797, Vol. II, 1994, 129-136.

[36] F. Ramme, K. Kremer. Scheduling a Metacomputer by an Implicit Voting System. 3rd IEEE Int. Symposium on High-Performance Distributed Computing, San Francisco, 1994, 106-113.

[37] W. Saphir, L.A. Tanner, B. Traversat. Job Management Requirements for NAS Parallel Systems and Clusters. IPPS Workshop on Job Scheduling Strategies for Parallel Processing, LNCS 949, 1995, 319-336.

[38] L. Schäfers, C. Scheidler, O. Krämer-Fuhrmann. Software Engineering for Parallel Systems: The TRAPPER Approach. 28th Hawaiian International Conference on System Sciences, January 1995, Hawaii, USA.

[39] J. Simon, J.-M. Wierum. Performance Prediction of Benchmark Programs for Massively Parallel Architectures. 10th Annual Intl. Conf. on High-Performance Computers (HPCS'96), June 1996.

[40] J. Simon, J.-M. Wierum. Accurate Performance Prediction for Massively Parallel Systems and its Applications. Euro-Par'96 Parallel Processing, August 1996, LNCS 1124, 675-688.

[41] L. Smarr, C.E. Catlett. Metacomputing. Communications of the ACM 35(6), 1992, 45-52.

[42] F. Tandiary, S.C. Kothari, A. Dixit, E.W. Anderson. Batrun: Utilizing idle workstations for large-scale computing. IEEE Parallel and Distributed Technology, Summer 1996, 41-48.

[43] WAMM - Wide Area Metacomputer Manager. Available at http://miles.cnuce.cnr.it or via ftp at ftp.cnr.it in the directory /pub/wamm.