Symphony: an infrastructure for managing virtual servers

13
Cluster Computing 4, 221–233, 2001 2001 Kluwer Academic Publishers. Manufactured in The Netherlands. Symphony: An Infrastructure for Managing Virtual Servers ROY FRIEDMAN, ELI BIHAM, AYAL ITZKOVITZand ASSAF SCHUSTER Department of Computer Science, The Technion, Haifa 32000, Israel Abstract. A virtual server is a server whose location in an internet is virtual; it may move from one physical site to another, and it may span a dynamically changing number of physical sites. In particular, during periods of high load, it may grow to new machines, while in other times it may shrink into a single host, and may even allow other virtual servers to run on the same host. This paper describes the design and architecture of Symphony, a management infrastructure for executing virtual servers in internet settings. This design is based on combining CORBA technology with group communication capabilities, for added reliability and fault tolerance. Keywords: CORBA, internet, virtual servers, group communication 1. Introduction The emergence of internets, and in particular the Internet, created a potential for (literally) global resource sharing. If we examine the total load on all computing servers in the world at any given moment, we may see the following phe- nomena: While the load on a given computer may increase and decrease throughout the day, the overall load on all com- puters remains more or less constant. Moreover, at times when some servers are overloaded to the point of crashing, other machines are sitting idle. To solve this problem, we propose the notion of virtual servers. A virtual server is a server whose identity is not bound to a fixed physical computer, but rather is changing and evolving with time and in reaction to the load on this server. Thus, a virtual server may move from one location in the network to another, and the number of physical comput- ers on which it resides may change dynamically. Consider, for example, an e-commerce application that shifts its location from the US to Japan when night falls on Los Angeles, and then from Japan to Israel when day breaks in Tel-Aviv. In this example, being able to shift the location of the server guarantees that at any given moment, the server is located in proximity to its current clients. Another ex- ample is a large ray tracing application running on a cluster of PCs in the Technion at night, using a distributed shared memory system like Millipede [9,10]. This application sud- denly notices that there are idle workstations at the Hebrew University in Jerusalem, and then decides to shift some of its computing tasks there, in order to enjoy higher parallelism. Similarly, Web based news sites become overloaded when an important event, such as elections or an important chess game, take place. These Web sites need to be able to grow, perhaps even to distant hosts, in order to handle the load. In this work we describe the internal architecture of Sym- phony, an infrastructure for managing and executing vir- Supported by the Israeli Ministry of Science, Basic Infrastructure Fund, Project #9762. A shorter, preliminary version of this paper appeared in Euro-Par ’99. tual servers in internet settings. Symphony’s architecture is based on the fundamental principles underlying CORBA [2], although, it is not a “pure CORBA” design, in the sense that services and applications (virtual servers) are replicated and can communicate internally in a non-CORBA compli- ant fashion. Also, we have taken liberty to extend the basic services of CORBA to fit our needs, as discussed later in this paper. This distinction is important since it enables us to inte- grate high-performance servers, such as distributed shared memory, for which CORBA’s performance is insufficient at best. Also, our main management services, as described later in this paper, are built using the Ensemble [7] group commu- nication technology [3]. A service based design has the benefits of scalability, interoperability, and extendibility. The design is scalable, since the location of a service in the system is virtual. A ser- vice can run anywhere from a single host to the entire set of machines, yet be accessible everywhere. Also, services present an abstraction of objects, where only their interface is known, so that different implementations of the same ser- vice can be replaced without having to recompile the system. Finally, it is always possible to add services, or modify exist- ing ones, in order to extend the functionality of the system. In summary, our design tries to enjoy the benefits of both worlds. We use group communication as the main build- ing block for our management infrastructure, use a CORBA- oriented architecture, and allow replicated/distributed servers to communicate internally with their own communication stacks for high performance. (Of course, servers that whish to communicate internally using CORBA can do so, but we do not impose this.) Finally, Symphony is an infrastructure technology whose purpose is to aid in the design and implementation of virtual servers, and tries to present a general framework for this. Clearly, any distributed application can be implemented di- rectly on top of the raw networking support provided by any modern operating system. However, the same argument can be said in favor of using bare hardware rather than advanced

Transcript of Symphony: an infrastructure for managing virtual servers

Cluster Computing 4, 221–233, 2001 2001 Kluwer Academic Publishers. Manufactured in The Netherlands.

Symphony: An Infrastructure for Managing Virtual Servers ∗

ROY FRIEDMAN, ELI BIHAM, AYAL ITZKOVITZ and ASSAF SCHUSTERDepartment of Computer Science, The Technion, Haifa 32000, Israel

Abstract. A virtual server is a server whose location in an internet is virtual; it may move from one physical site to another, and it mayspan a dynamically changing number of physical sites. In particular, during periods of high load, it may grow to new machines, while inother times it may shrink into a single host, and may even allow other virtual servers to run on the same host. This paper describes thedesign and architecture of Symphony, a management infrastructure for executing virtual servers in internet settings. This design is basedon combining CORBA technology with group communication capabilities, for added reliability and fault tolerance.

Keywords: CORBA, internet, virtual servers, group communication

1. Introduction

The emergence of internets, and in particular the Internet,created a potential for (literally) global resource sharing. Ifwe examine the total load on all computing servers in theworld at any given moment, we may see the following phe-nomena: While the load on a given computer may increaseand decrease throughout the day, the overall load on all com-puters remains more or less constant. Moreover, at timeswhen some servers are overloaded to the point of crashing,other machines are sitting idle.

To solve this problem, we propose the notion of virtualservers. A virtual server is a server whose identity is notbound to a fixed physical computer, but rather is changingand evolving with time and in reaction to the load on thisserver. Thus, a virtual server may move from one location inthe network to another, and the number of physical comput-ers on which it resides may change dynamically.

Consider, for example, an e-commerce application thatshifts its location from the US to Japan when night falls onLos Angeles, and then from Japan to Israel when day breaksin Tel-Aviv. In this example, being able to shift the locationof the server guarantees that at any given moment, the serveris located in proximity to its current clients. Another ex-ample is a large ray tracing application running on a clusterof PCs in the Technion at night, using a distributed sharedmemory system like Millipede [9,10]. This application sud-denly notices that there are idle workstations at the HebrewUniversity in Jerusalem, and then decides to shift some of itscomputing tasks there, in order to enjoy higher parallelism.Similarly, Web based news sites become overloaded whenan important event, such as elections or an important chessgame, take place. These Web sites need to be able to grow,perhaps even to distant hosts, in order to handle the load.

In this work we describe the internal architecture of Sym-phony, an infrastructure for managing and executing vir-

∗ Supported by the Israeli Ministry of Science, Basic Infrastructure Fund,Project #9762. A shorter, preliminary version of this paper appeared inEuro-Par ’99.

tual servers in internet settings. Symphony’s architecture isbased on the fundamental principles underlying CORBA [2],although, it is not a “pure CORBA” design, in the sensethat services and applications (virtual servers) are replicatedand can communicate internally in a non-CORBA compli-ant fashion. Also, we have taken liberty to extend the basicservices of CORBA to fit our needs, as discussed later in thispaper.

This distinction is important since it enables us to inte-grate high-performance servers, such as distributed sharedmemory, for which CORBA’s performance is insufficient atbest. Also, our main management services, as described laterin this paper, are built using the Ensemble [7] group commu-nication technology [3].

A service based design has the benefits of scalability,interoperability, and extendibility. The design is scalable,since the location of a service in the system is virtual. A ser-vice can run anywhere from a single host to the entire setof machines, yet be accessible everywhere. Also, servicespresent an abstraction of objects, where only their interfaceis known, so that different implementations of the same ser-vice can be replaced without having to recompile the system.Finally, it is always possible to add services, or modify exist-ing ones, in order to extend the functionality of the system.

In summary, our design tries to enjoy the benefits of bothworlds. We use group communication as the main build-ing block for our management infrastructure, use a CORBA-oriented architecture, and allow replicated/distributed serversto communicate internally with their own communicationstacks for high performance. (Of course, servers that whishto communicate internally using CORBA can do so, but wedo not impose this.)

Finally, Symphony is an infrastructure technology whosepurpose is to aid in the design and implementation of virtualservers, and tries to present a general framework for this.Clearly, any distributed application can be implemented di-rectly on top of the raw networking support provided by anymodern operating system. However, the same argument canbe said in favor of using bare hardware rather than advanced

222 FRIEDMAN ET AL.

operating system’s services. Yet, most people prefer to usehigh level abstractions to develop programs. We believeSymphony should be viewed in a similar way with respectto developing virtual servers.

The rest of this paper is organized as follows: section 2describes the general architecture, and describes the mainservices we support and the interaction between them. Sec-tion 3 discusses security aspects of Symphony, while sec-tion 4 elaborate on Symphony’s services. We provide a fewexample applications in section 5, compare our work withrelated projects in section 6, and conclude with a discussionin section 7.

2. Architectural overview

Our architecture follows the CORBA approach of providingservices to perform common tasks needed by applicationsand other services in the system. We arrange our services inseveral domains, as depicted in figure 1. Services are logi-cally organized in a hierarchical structure such that servicesin one domain tend to draw upon the functionality providedby services in lower or equal domains, but not in higherdomains, as discussed below. Also, Symphony’s run-timesystem is located below all services alongside with the theoperating system. Symphony’s run-time system consists ofSymphony’s daemons. A Symphony daemon is responsiblefor managing and monitoring the services on its machine.

Management services lie at the heart of our design. Theseservices provide support for automatically replicating andmigrating virtual servers based on the characteristics of theseservers, the topology of the network, computers availabilityand so forth. The application can register with these ser-vices a description of its requirements and cost functions forlocating “ideal” machines for it. The application is free tochoose between two modes of operations: (a) the applicationcan either allow the management services to automaticallymigrate and replicate itself based on the above descriptions(and restrictions), or (b) request non binding recommenda-tions from the management services, but maintain the finalword as to where and when its replicas should be (re)moved.This latter option is given for virtual servers that wish to re-tain control regarding the location of their replicas.

Management services draw upon the functionality pro-vided by both informational services and code-handler ser-vices. The code-handler services do the actual task of start-ing an image of a code on a given machine upon demand,or evacuating machines under certain conditions. These ser-vices may receive instructions from either management ser-vices or applications, receive operational information frominformational services, and utilize the functionality of thebasic services as part of their normal operation.

Informational services, on the other hand, collect andreport information about the system. This information in-cludes the load on participating machines, the topology ofthe system, the given environment on each of the machines,and the location of various applications. The information

Figure 1. Symphony’s service-based architecture.

gathered by informational services can either be reported tosubscribers using registered up-calls, or given as a responseto an explicit request.

Basic services are located at the bottom of the hierarchy.These services provide the basic functionality that we viewas necessary for most common applications, as well as toother services. These include a naming services, for regis-tering and locating other services, a security service, an eventnotification service, a virtual socket service, a distributed fileservice, a transaction service, and a replication service.

Figure 2 presents a deeper look into the bowels of thesystem, by examining an ideal picture of the interactionsbetween some of the main services in the system. In thisexample we see that the application (virtual server) regis-ters itself with the management services, mainly the code-replicator and code-migrator services. (We discuss theseservices and others later.) The code-replicator and code-migrator, in return, register with the informational servicesto be notified about the status of the system and to be notifiedabout changes as they occur. When one of the managementservices decides to start a replica on a certain machine, orto evacuate the machine, it contacts the code-dispatcher, orcode-evacuator respectively, to perform the job.

Applications as well as Symphony’s services may use thename service and the security service to locate other servicesand to communicate securely with them. Also, the eventnotification service plays a central role in our design, as itis being used to disseminate information about the status ofapplications, services, and hosts. Symphony’s services arediscussed in more detail in section 4.

2.1. Communicating with replicated services

Most of the services we propose need to be replicated for ef-ficiency and high-availability. Also, different services mayneed different levels of replication. Some services may re-quire a replica on every hosts, e.g., the code-dispatcher,

SYMPHONY 223

Figure 2. Flow of information in Symphony.

Figure 3. Communicating with replicated services.

while for other services it may be sufficient to instantiatea single replica in each cluster or subnet.

This goal is realized by splitting each service into a lo-cal stub that has to be started on each host, and a globalpart that is replicated, as illustrated in figure 3. The globalpart consists of several replicas that form a process group;these replicas communicate through a group communicationsystem that also manages membership changes, i.e., discov-

ery of both failures of existing replicas and joins by newreplicas.1 The local stub communicates directly with the ap-plication or service entity on the local host. When contacted,the stub issues a multicast RPC request to the global part ofthe service, and is then responsible for collecting the repliesand forwarding them to the local entity that initiated the re-quest. In particular, the local stub can be asked to deliver thefirst reply that arrives from any replica, deliver all replies,

224 FRIEDMAN ET AL.

or deliver the most common reply. Also, invocations on thelocal stub may be either synchronous, deferred synchronous,or asynchronous [13].

More precisely, at each given moment, one replica of theglobal part is registered with the name service as the repre-sentative of the service. When the local stub tries to bind to acertain service, it will receive the reference to the represen-tative. The local stub then contacts the representative, andthe representative communicates with other replicas when itis deemed necessary. If the representative fails, then one ofthe other global part replicas registers itself as a new rep-resentative for the service. In this case, the local stub re-connects to the new representative, and reissues all pend-ing requests. The above interaction is mostly transparent forprogrammers, as we designed class hierarchies that in mostcases automatically deal with these issues.

2.2. State transfer

Note that when moving a virtual server from one host to an-other, it is important to preserve the state of that server on thenew host. However, the level at which the state of an appli-cation needs to be preserved depends on the application. Inmany cases only a certain part of the application level stateneeds to be recovered, and this part of the state can be eas-ily encapsulated. In these cases, the application can use thepersistence service to dump the important part of the stateto disk, or to another computer on the network, so it can beretrieved later.

Similarly, if due to a failure, one replica takes over thefunctionality of another replica, the latter replica should con-tinue with the same state as was last held by the failingreplica. In Symphony the application can either maintainconsistency explicitly, possibly by using a group commu-nication toolkit or the transaction service, or register repli-cated objects with the replication service, which automati-cally takes care of maintaining a consistent state among allreplicas, as discussed in section 4.4.

On the other hand, if no assumptions are made about theapplication, the entire process state needs to be recovered,including all registers, open files, and memory image (bothstack and heap). This is a complicated operation, especiallyif the architectures of the host from which the state is takendiffers from the architecture of the new host in terms of byteordering or registers. At the initial phase, Symphony onlysupports complete state migration between similar architec-tures.

3. Security in Symphony

A major concern when allowing servers to communicateover internets and allowing code to migrate from one placeto another is security. Our approach to security is based ona trust model that is enforced by the configuration and secu-rity services. In this section we first describe the trust model,and later elaborate on how messages and code are secured.

3.1. Trust model

A trust model, as suggested by its name, defines for each en-tity the kind of trust it has in other entities, and how this trustis verified. In Symphony, an entity can be a host, a message,or an image of a code. Each entity e holds a data structurethat contains the following fields:

(1) list of entities to whom e is willing to send messages;

(2) list of entities from whom e is willing to accept mes-sages;

(3) list of entities to whom e is willing to send code;

(4) list of entities from whom e is willing to accept code;

(5) list of entities to whom e is willing to migrate/replicate;

(6) list of entities to whom e is willing to provide services;

(7) list of entities with whom e is willing to share resources.

Each of these categories is defined by specifying an accesscontrol list, and can be refined using the following subcrite-ria:

(a) type (of communication/service/code);

(b) authentication scheme;

(c) limitations on the way the communication was carried,e.g., has to go through a firewall;

(d) other conditions for the interaction;

(e) recommendations in case multiple options occur.

Thus, it is possible to specify that host A is willing to ac-cept any code written by Assaf, if it is authenticated usingthe El Gamal scheme, and it passed through a given firewall.A may also be willing to accept any Java applet that wasgenerated at the Technion, and is signed with an MD5 hashfunction and a known secret key. Similarly, it is possible tospecify that a replica of “Your-Fastest-Web-Search-Engine”never runs on the same machine as a replica of “The-Most-Complete-Web-Search-Engine”, while some server A mayrequire encryption of queries, but is willing to receive au-thenticated, but not encrypted, replies.

The trust model we have just described is not only used byvirtual servers, but also by Symphony’s internal services. Inparticular, some of the management and informational ser-vice may not accept certain types of data from untrusted en-tities. This is important in order to prevent malicious partiesfrom creating havoc in the system. For example, if the hostrecommender recommends bad hosts, it may cause the sys-tem to migrate code to the wrong nodes, which will degradethe performance of the system, and make it unusable.

3.2. Message security

There are two aspects of securing messages: authenticationand encryption. The former incurs relatively low overhead,but the latter is quite expensive. Similarly, most forms of

SYMPHONY 225

authentication are publicly available, and can be distributedfreely. On the other hand, encryption is usually protectedby patents, and its distribution is usually limited by exportregulations.

Symphony allows virtual servers to choose which of thetwo they wish to employ, in order to obey the trust model.As with other concerns, authentication and encryption aresupported by appropriate services. Moreover, as some appli-cations may prefer to do authentication and encryption them-selves, Symphony provides key management services; theseservices distribute keys among replicas of subscribed virtualservers, revoke expired keys, and initiate key replacementaccording to given specifications.

3.3. Securing imported code execution

In recent years there has been a large body of research onhow to securely execute imported code. In Symphony, an ex-ported code is accompanied with a descriptive object, whichdefines the restrictions the code is willing to run under, aswell as the runtime environments it requires. Also, part ofthe information gathered by the informational services is thepolicies employed at each of the sites on the network to-wards such code. In particular, a code can be authenticatedas coming from Assaf at the Technion, and some hosts inJerusalem may be configured to allow full execution rightsto Assaf’s code, but restrict all other code to a Java-like sand-box model.

When sending code for remote execution, the code-dispatcher service consults with the appropriate informationservices regarding the possibility of placing the given codeon the requested machine. The code is then executed on theremote site only if the site’s policy allow the code to runthere.

Moreover, the descriptive object accompanying the codeincludes the maximum amount of resources the processneeds. Before starting a code image, the code-dispatcherverifies that the hosting machine is willing to provide thatmuch resources to the imported code. These resources in-clude memory consumption, disk space, cpu time, commu-nication, and processes. If the answer is no, it will not bestarted on the given machine. If the answer is yes, it willbe started, but the amount of resources it consumes will bemonitored periodically; if they are exceeded, the process(es)will be stopped.

4. Detailed description of services

4.1. Management services

Most management services deal with replicating, migrating,and updating virtual servers. This is done by registeringa given code with the appropriate service, and indicatinga policy for treating that code. Also, as discussed in sec-tion 3, when a code is registered, it is accompanied by a de-scriptive object, which describes the minimal requirements

of the code, as well as identification information about it.This includes the minimal set of permissions needed by it(read-files, write-files, execute-files), minimal and maximalsystem resources (CPU, memory, disk, communication), theowner of the code with authentication information (digitalsignature).

It is also permissible to register a list of possible codeimages, each with its own list of descriptive objects. In thiscase, an indication of how to choose among these imagesis also supplied. These policies include first-fit, i.e., run thefirst image that can be executed on the given host, best-fit,i.e., run the best image among those that can run there usinga given evaluation function, all-fit, i.e., run all images only ifall of them can run, or any-fit, i.e., run every image that canrun on the host.

4.1.1. Code-replicatorThe code-replicator can automatically replicate a virtualserver on idle machines. When the application is first started,it registers with the code-replicator a replication policy forthe application. For example, if an application incurs verylittle internal communication, any idle machine might be agood candidate. On the other hand, if there is extensive in-ternal communication, it makes sense to start new replicasonly on hosts that have sufficient bandwidth. This servicechooses machines based on recommendations provided bythe host-recommender, and utilizes the code-dispatcher tostart the code on a chosen machine. Both services are dis-cussed below.

4.1.2. Code-migratorThe code-migrator is similar to the code replicator, exceptthat it migrates the application rather than simply replicateit. In particular, this means that a code may need to be evac-uated from one machine and be started on another. Typicalreasons for migrating a replica of a server to another machineinclude, e.g., keeping the virtual server close to its clients, orthe discovery of a more powerful idle machine.

4.1.3. Host-recommenderThe host-recommender tries to find ideal machines for regis-tered applications. This is done by applying a cost functionthat is provided alongside each registered application to thecurrent state of the system, as reported by the informationalservices. Hosts that come up are then reported by invokingan appropriate upcall.

4.1.4. Configuration managerThis service allows hosts to register their policy regardingexecution of imported code. This includes which code canrun with which access rights, what authentication methodsare accepted, the operating system and hardware architec-ture used, etc. The information is then registered with theappropriate information services, and can be used by otherservices and applications, e.g., by the code-dispatcher to de-cide if a submitted code can be started on a given host.

226 FRIEDMAN ET AL.

4.1.5. MonitorThe monitor service monitors the execution on any givenhosts, in order to enforce execution policies, prevent anom-alies, and react to them if they occur. In particular, the mon-itor service monitors the resource utilization of importedcode images, and if a process running such code violatesits resource limitations, then this process is stopped, and insome circumstances, the corresponding code image is evac-uated. Also, if the load on a given machine grows too much,the monitor may decide to stop and perhaps even evacu-ate processes running imported code, even if none of theseprocesses consumes more resource than it is allowed. Thisensures that processes that are native to the local host arenot endangered by imported ones. Also, this serves as asafety guard against erroneous decisions by other services,that caused one host to become overloaded.

4.2. Informational services

Informational servers are somewhat of a special case in Sym-phony. Like all replicated services, these services have alocal stub and a global part. However, in all informationalservices the local stub can also initiate communication withthe global part on its own, rather than simply handle remotemethod invocations and replies. That is, the local stub pe-riodically, and in some occasions as a result of significantchanges, report the local state of the system, be it load, envi-ronment, etc., to the global part without an explicit requestfrom any other local entity.

Moreover, the local stub can be polled locally as to thestate of the local machine, in which case it does not commu-nicate with the global part, and can periodically update localentities (applications and other services) about changes inthe local host. Due to the vast amount of the collected infor-mation, each informational service has methods for request-ing digested information for specific hosts and/or specific IPdomains, as well as for the entire systems.

We have defined the following set of informational ser-vices to be supported in Symphony.

4.2.1. LoadMonitors the load of the system. This includes all standardcomputing resources such as CPU utilization, memory con-sumption, disk accesses, networking, etc.

4.2.2. TopologyThis service tries to figure out the topology of the system;which machines are alive, how they are connected, and thequality of communication between various nodes.

4.2.3. EnvironmentThis service keeps track of the architecture of the local host,operating system, supported run-time environments, accesscontrol list restrictions, etc.

4.2.4. Application locatorThe application locator monitors the applications located onany given host in the system. In particular, it has methods forretrieving the list of applications that are running on a givenhost (or set of hosts), and which hosts run a given application(or set of applications).

4.3. Code handling services

4.3.1. Code-dispatcherThe code-dispatcher is responsible for actually starting animage of a code on a given host. This is done by invok-ing the dispatch method on the code-dispatcher service.Aside from the executable image, or applet, this method ac-cepts a descriptive object describing the requirements of thecode. These include the run-time environment it requires,the operating system, and the hardware architecture. Otherparameters include memory consumption, disk space, andCPU units. Of course, for each of these parameters it is pos-sible to indicate do-not-care if these are unimportant. Inaddition to the above parameters, the descriptive object in-cludes a list of restrictions under which the code can execute,as discussed in section 3.

After receiving the code and its descriptive object, thecode-dispatcher checks with the appropriate information ser-vices whether the requirements of the descriptive object holdon the requested machine. If they do, the code-dispatcherwrites the image of the executable/applet on the remote ma-chine, and starts executing it with the requested run-time.

4.3.2. Code-evacuatorThe code evacuator, as suggested by its name, is responsi-ble for evacuating code from a given machine. It is invokedwith an evacuate method. The evacuated code image,including registers, stacks, and heap, and any output it hasproduced, both to files and standard output can either bediscarded, or returned to a given host, as discussed in sec-tion 2.2.

4.3.3. Code-updaterThe code-updater is provided as a mean for updating replicasof a given service on-the-fly. Typically, the code-updatershuts down a replica, updates the code image, and then startsthe new code image on the same host. By repeating thisprocess to all replicas, one by one, the entire set is beingupgraded, while keeping (at least one replica of) the serviceon-line at all times.

4.3.4. Code-moverThe code-mover supports the move method for moving agiven code from one host to another. This method employsboth the code-evacuator and the code-dispatcher to actuallymove the code. The code-mover is different from the code-migrator in that it does not take decisions by itself, but rathersimply executes explicit instructions regarding which codeshould be moved from which machine and to where.

SYMPHONY 227

4.4. Basic services

In this section we discuss the basic services provided bySymphony. These services largely follow the CORBA stan-dard, although we have extended their functionality beyondthe requirements of CORBA, and added a few additional ser-vices, to fit our needs.

4.4.1. NamingAs in any CORBA system, objects, including services andapplications, have names that identify them in the system.However, these names serve as symbolic links to objects,since they do not define where the object is located andhow to access it. The naming service provides this capa-bility. That is, in order to access an object for the first time,the equivalent of a UNIX fopen command must be issued.This command invokes the name-resolution method on thenaming service, which returns a handle to the object. Thehandle to the object defines how the object should be ac-cessed, how it can be located, and its current location.

Following this, whenever object A invokes a method onobject B, A uses the handle to identify the requested objects.This invocation tries to invoke the requested method on B

by using the currently available information about B. If thisattempt fails, the client’s server stub checks with the namingservice for the current location of the B. If at least one copyof B is still active and connected to the system, the namingservice returns a new binding for B that matches its currentlocation.

This protocol also defines an interaction between thenaming service and the application-locator service, since thenaming service may need to consult with the application-locator as to the current location of an object.

4.4.2. SecurityThe security service supports signing, encrypting, authen-ticating and decrypting of messages, and methods for keymanagement among a group of applications or services.These are used to enforce the trust model, as discussed insection 3.

4.4.3. Event notificationThe event notification service allows objects to register andmulticast messages and events to interest groups. Messagesare then delivered to all interested parties according to theirtopic. This mode of communication decouples senders fromreceivers, since a sender never knows who are the listeners.In particular, the event notification service is used by manyother symphony services to post changes in hosts and appli-cations.

4.4.4. Virtual socket serviceThe virtual socket service provides location independentsocket semantics. That is, it allows two processes to com-municate using a socket-like interface, with the followingaddition. If one of the processes migrate from one host toanother, the connection remains active, and new messages

as well as in-transit messages are automatically routed to thenew location.

4.4.5. Distributed file serviceThe distributed file service allows Symphony applicationsand services access to files located anywhere in the networkas if they were located locally. Also, when a process moves,its references to files opened through the distributed file ser-vice remain valid.

4.4.6. TransactionsAccording to the CORBA standard, the transactional servicesupports distributed transactions using the 2-phase commitprotocol. We intend to extended this definition to supportthe more sophisticated 3-phase commit protocol.

4.4.7. PersistenceObjects can be registered with the persistence service if theyneed to be persistent. For example, according to the CORBAstandard, the persistence service can flush (create a checkpoint) an object to disk either periodically, or when certainevents occur. We intend to extend this definition and allowthe persistence service to flush an object to another host onthe network as well as to the local disk. Recall that this canbe used to migrate a virtual server as discussed in section 2.2.

4.4.8. ReplicationReplicated servers can register replicated objects with thereplication service. In this case, the replication service takescare of maintaining consistency for these objects acrossall replicas. Symphony’s replication service supports threetypes of consistency: strong consistency, also known as se-quential consistency, weak consistency, and dirty consis-tency. This service does not appear in the CORBA standard,although we view it as extremely useful in developing repli-cated applications.

5. Example applications

In this section we present two example applications of Sym-phony, namely, a MicroMint and exhaustive search. Otherapplications under development using Symphony include adistributed Web server, a video-on-demand server, and a dis-tributed video conferencing server.

5.1. MicroMint

MicroMint is a scheme for producing digital money for mi-cropayments, known as microcash, devised by Rivest andShamir [14]. In a MicroMint setting, the mint produces dig-ital coins that expire within a relatively short period of time,e.g., within a month. At the beginning of each period, newcurrency becomes available for general usage, and is dis-tributed to the public through intermediate branches, e.g.,ATMs. Each digital coin can then be used once, and onlyonce, in order to pay to a store. The store returns the money

228 FRIEDMAN ET AL.

Figure 4. Implementation of a MicroMint using Symphony.

to the mint and is credited accordingly. At the end of eachperiod, the unused digital coins need to be returned to themint for appropriate credit.

The mint verifies that each returned coin is legal, i.e., wasnot forged or used, before crediting the entity that broughtthe coin. Also, authentication schemes are used to detectviolators that attempt to use a given coin more than once.

One of the advantages of the MicroMint scheme for mi-cro payments is that each entity can easily verify that a coinis legal, although producing coins is very time/resource con-suming, which makes forgery non-economical. The key tothis is the structure of a coin; each coin is composed of twolarge numbers that are mapped by a well known hash func-tion to the same hashed value.

Generating coins involves computing the hash value ofmany numbers, and trying to find pairs of numbers whosehash collide. By using n bit hash values, the first colli-sion is discovered after roughly 2n/2 values have been com-puted, and from that point on, the number of collisions growsquadratically with the number of computed values.

Note that while calculating each hash is fairly trivial,computing 2n/2 hash functions is a time/resource consum-ing activity for large n’s (e.g., n > 80). Also, the exact hashfunction is replaced with each period, and is kept secret untilthe beginning of the period. Thus, a good choice of a periodcan prevent even malicious resourceful parties, e.g., a hos-tile government, from generating coins before the end of theschedule.

There are certain aspects of the MicroMint scheme thatmakes it a good candidate for distribution, and in particu-lar for being implemented using Symphony. Since generat-ing a coin and verifying a returned coins are simple tasks,and these tasks are also independent of one another and ofsimilar activities for other coins, they can be easily distrib-

uted. Also, the task of computing hash values, sorting thecomputed hash values in order to find duplicates, and ver-ifying returned coins reach their activity peaks in differentparts of the period. For example, at the end of a period,many coins are returned, while collisions can be found af-ter roughly half the period. This behavior is mapped verywell into symphony’s support for growing and shrinkingapplications. Finally, as we explain below, MicroMint’ssecurity aspects map very well into Symphony’s securitymodel.

5.1.1. Implementing a MicroMint with SymphonyIn the discussion below, we refer to seven functional enti-ties: mint controller, value-generators, matchers, verifiers,distributors, clients, and stores, as depicted in figure 4. Notethat this illustration also indicate the security aspects thatare discussed in section 5.1.2. Figure 5 sketches a qualita-tive time-line for the proposed mode of operation in the Mi-croMint, while figure 6 includes code extracts that show howdifferent components of the mint interact with each other.

The mint itself is composed of a mint controller, value-generators, matchers, and verifiers in the following fashion:The mint controller informs other components of the mintat the beginning of each period about the new hash functionfor the next period, and informs the rest of the world aboutthe hash function for the period which has just started. Thegenerators compute hash values and send them in batches tothe matchers, who try to find duplicates. Also, distributorscontact matchers in order to obtain new coins for distribu-tion to clients. Clients then pass coins to stores in order topay for goods and/or services. Finally, stores and clientsreturn their coins to verifiers, who inform the mint con-troller on any attempted forgery or dual usage of the samecoin.

SYMPHONY 229

Figure 5. Starting and replicating MicroMint’s components. Note that step (e) repeats itself whenever new hosts are found.

All components of the mint are distributed for fault-tolerance and efficiency. The mint controller is replicated,but has a small, fixed set of hosts it is constantly running on.On the other hand, value-generators, matchers, and verifierscan be located on any host with the appropriate security at-tributes, and in particular, replicate and vacate themselves asthe load changes for maximum efficiency. All hosts belong-ing to the mint are protected by a firewall from the rest of thenetwork.

Distributors can also replicate themselves and migrate incase there is a sudden high demand for their services in somepart of the system. Each distributor must also be protectedby a firewall from the rest of the world.

Finally, stores and clients are outside the scope of sym-phony, but they communicate with certain parts of themint as described above. All communication inside themint, between the mint and the distributors, and betweenclients/stores and the verifiers must be authenticated and en-

230 FRIEDMAN ET AL.

Mint controller. . .

LastMonthKey = NewKey;NewKey = GenerateNewKey();refGenerators = Name.bind(“value-generators”);refMatchers = Name.bind(“matchers”);refVerifiers = Name.bind(“verifiers”);refDistributors = Name.bind(“distributors”);refGenerators → NewMonth(NewKey);refMatchers → NewMonth(NewKey);refVerifiers → NewMonth(NewKey);refDistributors → NewMonth(LastMonthKey);. . .

Value-generator. . .

for Current = BeginRange to EndRange doNewVal = ApplyHash(Key,Current);refMatchers → NewValue(NewVal);

od. . .

Distributor. . .

AddCurrency(refGenerators→GetCurrency);. . .

Figure 6. Sample interaction between MicroMint’s entities.

crypted. It is advisable to do the same for communicationbetween clients and stores, however, as they are outside thescope of the system, it is up to them to decide.

5.1.2. The trust modelWe define four security levels with regards to the MicroMint:The mint controllers are assigned the highest level of secu-rity, followed by the value-generators, matchers and veri-fiers. The third level is given to distributors, while clientsand stores share the lowest level of security.

The task of the mint controller can be held only by a lim-ited number of known hosts. Thus, object descriptor for amint control would specify that it can run only on one ofthese machines, and that it is willing to communicate onlywith other mint controllers, generators, matchers, and veri-fiers. This communication must be encrypted and authenti-cated.

Communication with other mint controllers is permittedonly if they are in the same LAN and belong to the prede-fined list of mint controllers. All other communication insidethe mint is permitted either if it takes place within the sameLAN, or if it passes through the firewall. In any case, allmessages must be encrypted and authenticated.

Value-generators, matchers, and verifiers can be repli-cated to any host that can be trusted by the mint controller,and are willing to communicate with the mint controller. Inaddition, each replica of a value-generator is willing to sendmessages only to matchers. A matcher is willing to com-municate microcash on demand also to distributors, and ver-ifiers are willing to receive microcash for verification onlyfrom clients and stores. Here again, all communication must

be authenticated and encrypted, and has to go through a fire-wall if it is not within the same LAN. In particular, com-munication with distributors, clients and stores need to gothrough a firewall.

Distributors, on the other hand, are only willing to requestmicrocash from matchers, and to distribute it to clients ondemand. Communication between distributors and clientshas to go through a firewall as well.

Finally, each MicroMint entity imposes the restrictionthat it is not willing to reside on the same host as entitiesthat have lower security level. Thus, if a host is occupied bya distributor, it may no longer be used by a value-generator,etc. However, generators and matchers are allowed to run onthe same host.

5.2. Exhaustive search

Exhaustive search is an application that can be distrib-uted/parallelized in a trivial way. The efforts to break theRSA data security DES/RC5 challenges, e.g., the RC5–64 and DES-II-2 projects [1], by splitting the search spaceamong thousands of hosts is a good example of using distri-bution for conducting exhaustive search.

Implementing such a scheme in Symphony is trivial. Oneonly needs to create a couple of simple applications. Thefirst application, a searcher, receives a search space and anelement to find, and then exhaustively searches it until eitherit finds the missing element, or until the whole subspace iscompletely searched. This code can be replicated to any idlemachine in the network.

The second part of the code consists of the coordina-tor, which assigns search spaces to searchers. Also, eachsearcher that either found the missing element or finished itssubspace, contacts the coordinator. If the missing elementis found, the coordinator can either send an abort mes-sage to all searchers, or stop assigning new search spaces tothem. Otherwise, each searcher that finished its subspace isassigned a new one. The coordinator can be replicated forreliability among two or three designated machines.

A single image of the searcher’s code can be registeredwith the code-replicator, which takes care of replicating ituntil it receives a command from the coordinator to stop.All replicated coordinators can be implemented using an ob-ject group, so they are kept synchronized regarding the stateof the search. Also, the application locator and naming ser-vice can be used for transparent communication between thesearchers and the coordinator. Hence, the system needs tobe started once, and from that point on, no human interfer-ence/coordination is required.

6. Related work

Amoeba is one of the first distributed operating systems [12].Amoeba is also based on a group communication system, al-though the group communication system used by Amoebawas developed as part of that project. An important differ-ence between Amoeba and Symphony is that Amoeba is a

SYMPHONY 231

full operating system, while Symphony is not. In particular,Amoeba deals with process scheduling and Symphony doesnot. This is because the two systems have different state-ment of mission. Amoeba was built right from the start to bea distributed operating system, while Symphony is intendedto allow users to run virtual servers in a connected world,which is composed of computers running existing operatingsystems. Another significant difference is the service-basedarchitecture of Symphony, which is different than the mono-lithic architecture of Amoeba.

Code images of virtual servers might be compared to mo-bile agents, as implemented in the TACOMA project [11].However, agents in TACOMA are intended to move aroundfreely in the Internet, without trusting anyone, and with-out being trusted by any hosting computer. Thus, a largepart of the effort in TACOMA deals with protecting agentsagainst malicious attacks, and protecting hosts against ma-licious agents. In Symphony, we use a trust model, whichcircumvents most of these problems, but is not suitable foragents. On the other hand, TACOMA does not handle issueslike global load balancing and migrating code based on per-formance considerations. Of course, TACOMA could serveas a platform for developing a system like Symphony, but atits current state it does not provide this functionality.

Legion [6] is another object-oriented distributed systemthat provides an illusion of a single virtual machine com-posed of wide area collection of workstations, parallel com-puters, and fast massive multiprocessor machines. Class in-terfaces in Legion can be written in either CORBA’s IDL orMDL [6]. Unlike Symphony, in which an application canchoose the set of service it wises to use, Legion requires allobjects to conform to its model. Each object should inheritfrom a predefines LegionObject and each class shouldbe also instantiated and inherited from a base class namedLegionClass. This programming model (rules) enablesthe Legion runtime system to efficiently support the mo-bility of application’s objects, security in object invocationand heterogeneity. Legion deals with issues of scalability,security, fault-tolerance, and management of a large scale(sometimes called nation-wide) meta-computer. Note that,Legion address scalability by cloning objects, but does nottake Symphony’s approach of replicated services based ongroup communication.

The Globe project [15] as well as the JiniTM architectureaim at being a wide area distributed operating system. Assuch, these systems need to present a comprehensive solu-tion to all associated problems of operating systems, includ-ing global resource management and access to remote de-vices. Our goals are somewhat more limited, as we focuson virtual servers rather than all types of applications. Also,Jini is targeting Java only applications, while we support ap-plications written in other programming languages.

Hadas [8] is a programming environment and runtime in-frastructure for composing distributed applications from in-dependently developed and widely dispersed components.Working in Hadas, component developers create (or encap-sulate pre-existing) components and publish them in the

distributed component management system. In particular,Hadas components are dynamically adaptable, allowing au-tonomous maintenance and evolution without disrupting theapplication or other components. Hadas also introduced thenotion of Ambassadors; an Ambassador is a representativeof a component that gets deployed and is attached to a re-mote component upon successful negotiation between twosites. Implemented itself as a dynamically adaptable object,it can be tailored specifically to the foreign application andto the remote environment, and evolve on-the-fly to reflectchanges that occurred after it has been deployed, both “athome” and at the hosting site.

User-level software-only distributed shared memory(DSM) systems, such as Millipede [9,10], are another ex-ample of systems that need to migrate code in a cluster ofworkstations environment. In particular, in Millipede bothcode and data can migrate between machines based on theirload and on access patterns to shared memory pages. Sym-phony complements Millipedes functionality, as Symphonyprovides services for replicating code, and for finding can-didate machines for migrating code based on their load andcommunication capabilities. We are currently planning toport Millipede to use Symphony’s services. After the portis complete, Millipede will still manage its own memorycoherency and intra process communication, but will relyon Symphony for detecting candidate machines and startingMillipede daemons there, and will use Symphony’s informa-tional services, such as load and topology, and other basicservices such as naming.

Condor [4] and Utopia/LSF [16] are systems for doingbatch scheduling in distributed environments. They focus onqueuing many independent computation tasks and distribut-ing them among a large network, whenever idle machinesare detected. Since both these systems only handle indepen-dent computation tasks, they do not need to interact with theapplications they run. Symphony could aid in the develop-ment of batch queuing systems like Condor and Utopia/LSFsince it provides the basic services required by systems. Forexample, finding appropriate machines and running a codethere is provided by Symphony’s management services.

7. Discussion

In recent years many organizations shifted from stand-alonepersonal computers that, perhaps, share a printer or a fileserver, to a networked environment, in which computers canshare and exchange information, and can cooperate in com-puting resource intensive tasks. Moreover, with the adventof the Internet, most computers worldwide are nowadays in-terconnected. While the main motivation behind this trendis to be able to share and access remote information, we pro-pose a way of generating added value from being intercon-nected.

Our system allows organizations to share their comput-ing resources at a global level, in order to maximize theirutilization. By using Symphony, an organization can reduce

232 FRIEDMAN ET AL.

its computing budget and improve its customer service at thesame time; users of Symphony need only purchase enoughcomputing power to handle their normal computing needs,and rely on being able to use remote computers when facedwith a sudden hike in demand for their services.

On the other hand, Symphony is not a distributed operat-ing system, in the sense that it does not attempt to replacelocal operating systems. Instead, Symphony complementsthe local operating system by adding remote execution capa-bilities and global load balancing. In particular, Symphonydoes not interfere with local scheduling policies, files sys-tems management, etc., and has no jurisdiction over localprocesses.

Finally, the CORBA style service-based architecture ofSymphony is important in achieving our goals of scalability,interoperability, and extensibility. Also, the combination ofCORBA and group communication technology is extremelyinstrumental in implementing a reliable scalable system.

The holy grail of distributed computing is achieving aprogramming model that is general enough for writing anykind of distributed application, hides distribution and repli-cation from programmers, yet is efficient and maintainsconsistency among object replicas. Symphony falls shortof achieving this goal; while we offer CORBA interfacesand group communication, in these programming paradigmsreplication is not always transparent, application develop-ers may choose to keep replicas inconsistent, and especiallywith CORBA, the performance in not always as good as onecould hope for. Another part of this project deals with devis-ing advanced programming models; the results of which arereported in separate papers. Also, as discussed in section 6,Symphony was developed with the notion of virtual serversin mind, and is designed to handle issues related to virtualservers. Other types of distributed applications, e.g., mo-bile agents, may have different needs that are not properlyaddressed by Symphony.

Acknowledgements

We wish to thank all other participants in this project, includ-ing Sergey Kleyman and Avraham Kenigsberg, who partic-ipated in early stages of the implementation of Symphony,Vladislav Kalinovsky, Yaniv Marks, Aviv Simionovici, andShlomo Yona, who are actively working on the system, andto Roman Vitenberg and Erez Hadad for helpful commentsand discussions.

Note

1. The Event Notification Service of CORBA can be viewed as a plausiblealternative for implementing replicated services. Felber and Guerraouiexplain in [5] why ENS is in fact unsuitable for this task.

References

[1] Distributed.net home page, http://www.distributed.net.

[2] The OMG home page, http://www.omg.org.[3] K. Birman, The process group approach to reliable distributed com-

puting, Communications of the ACM 36(12) (1993) 37–53.[4] D.H.J. Epema, M. Livny, R. van Dantzig, X. Evers and J. Pruyne,

A worldwide flock of condors: Load sharing among workstation clus-ters, Journal on Future Generations of Computer Systems 12 (1996).

[5] P. Felber, R. Guerraoui and A. Schiper, Replicating objects using thecorba event service, in: Proc. of the 6th IEEE Workshop on FutureTrends of Distributed Computing Systems (October 1997).

[6] A.S. Grimshaw and W.A. Wulf, The legion vision of a worldwidevirtual computer, Communications of the ACM 40(1) (1997).

[7] M. Hayden, The Ensemble Syste, Technical report TR98-1662, De-partment of Computer Science, Cornell University (January 1998).

[8] O. Holder and I. Ben-Shaul, A reflective model for mobile softwareobjects, in: Proc. of the 17th Int. Conf. on Distributed Computing andSystems (May 1997) pp. 339–346.

[9] A. Itzkovitz, A. Schuster and L. Wolfovich, Supporting multiple pro-gramming paradigm on top of a single virtual parallel machine, in:Proc. of 2nd Int. Workshop on High-Level Parallel ProgrammingModels and Supportive Environments (HIPS’97) (April 1997) pp. 25–34. Earlier version appeared as Technion CS Technical report LPCR#9607.

[10] A. Itzkovitz, A. Schuster and L. Wolfovich, Thread migration and itsapplications in distributed shared memory systems, The Journal ofSystems and Software (1998), to appear. Also available as TechnionCS Technical report LPCR #9603.

[11] D. Johansen, R. van Renesse and F.B. Schneider, Operating systemsupport for mobile agents, in: Proc. of the 5th Workshop on Hot Topicsin Operating Systems (May 1995).

[12] F. Kaashoek, A. Tanenbaum, S. Hummel and H. Bal, An efficientreliable broadcast protocol, Operating Systems Review 23(4) (1989)5–19.

[13] S. Landis and S. Maffeis, Building reliable distributed systems withCORBA, Theory and Practice of Object Systems (April 1997).

[14] R.L. Rivest and A. Shamir, PayWord and MicroMint: Two simplemicropayment schemes, Technical report, Computer Science Labora-tory, MIT (November 1995).

[15] M. van Steen, P. Homburg and A.S. Tanenbaum, A wide area distrib-uted system, IEEE Concurrency (January–March 1999) 70–78.

[16] S. Zhou, J. Wang, X. Zheng and P. Delisle. Utopia: A load sharingfacility for large, heterogeneous distributed computing systems func-tionality, Technical report CSRI-257, Computer Systems Research In-stitute, University of Toronto (1992).

Roy Friedman is a Senior Lecturer with the Com-puter Science Department at the Technion – Is-rael Institute of Technology. He received his B.Sc.and D.Sc. in computer science from the Technionin 1990 and 1994, respectively. His current re-search areas include reliable distributed systems,high availability and fault-tolerance, group commu-nication, soft real-time, and middlewares.E-mail: [email protected]

Eli Biham is an Associate Professor with the Com-puter Science Department at the Technion – Is-rael Institute of Technology. He received his B.A.and Ph.D. degrees in mathematics and computerscience from Tel-Aviv University and the Weiz-mann Institute of Science, respectively. His re-search interests and fields of publication includecryptography, cryptanalysis, quantum computingand quantum cryptography, and computer security.He serves as a director of the International Associ-ations for Cryptologic Research (IACR).E-mail: [email protected]

SYMPHONY 233

Ayal Itzkovitz received his B.A. and Ph.D. in com-puter science from the Technion – Israel Instituteof Technology, the latter in 1999. Recently he hasbeen a visiting faculty at New York University. Hisresearch interests include operating systems, mid-dleware and distributed systems, adaptable and se-cure computations, and distributed shared memory.Lately he has been working on novel scalable ar-chitectures for intelligent telephony networks.E-mail: [email protected]

Assaf Schuster is an Associate Professor with theComputer Science Department at the Technion –Israel Institute of Technology. He received hisB.A., M.A., and Ph.D. degrees in mathematics andcomputer science from the Hebrew University ofJerusalem, the latter one in 1991. His researchinterests and fields of publication include mem-ory hierarchies and consistency models, Java mem-ory model, distributed shared memory, parallel anddistributed computing, scalable symbolic modelchecking, global memory systems, database repli-cation and scalability. He is the head of the Dis-tributed Systems Laboratory at the Technion (dsl.cs.technion.ac.il), serves as an associate editor ofthe Journal of Parallel and Distributed Computing,and is leading several large projects in his area.E-mail: [email protected]