Decentralized approach to resource availability prediction using group availability in a P2P desktop...

Future Generation Computer Systems 28 (2012) 854–860

Contents lists available at SciVerse ScienceDirect

Future Generation Computer Systems

journal homepage: www.elsevier.com/locate/fgcs

Decentralized approach to resource availability prediction using groupavailability in a P2P desktop gridKarthick Ramachandran ∗, Hanan Lutfiyya, Mark PerryDepartment of Computer Science, University of Western Ontario, London, Ontario, Canada

a r t i c l e i n f o

Article history:Received 15 June 2010Accepted 12 October 2010Available online 18 November 2010

Keywords:Peer-to-peer desktop gridsResource availabilityCloud computingGroup availability

a b s t r a c t

In a desktop grid model, the job (computational task) is submitted for execution in the resource onlywhen the resource is idle. There is no guarantee that the job which has started to execute in a resourcewill complete its executionwithout any disruption from user activity (such as a keyboard stroke ormousemove) if the desktop machines are used for other purposes. This problem becomes more challenging in aPeer-to-Peer (P2P)model for a desktop gridwhere there is no central server that decides to allocate a job toa particular resource. This paper describes a P2P desktop grid framework that utilizes resource availabilityprediction, using group availability data. We improve the functionality of the system by submitting thejobs on machines that have a higher probability of being available at a given time. We benchmark ourframework and provide an analysis of our results.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Volunteer computing is a type of distributed computing inwhich computer owners donate computing resources (e.g., CPUcycles and storage) for one or more Several initiatives [1–5] involunteer computing have shown the potential computing powerachieved in exploiting the CPU cycles of thousands of machinesthat are connected to the internet.

Volunteer computing systems often use a master–worker styleof computing where tasks are distributed from a master machine(server) to worker machines (volunteers). The tasks are storedin servers and are sent to the clients for execution. The task isreferred to as a job. Worker machines register with the serverpreferences indicating when work can be done on the machine.This information is periodically updated. The server (or in somesystems a cluster of servers) maintains all information aboutprojects and volunteers. As systems grow larger this becomes abottleneck. That has led to investigating the use of peer-to-peer(P2P) architectures for volunteer computing systems (e.g., [6,7]).

Studies (e.g., [8]) suggest that since volunteer computingsystems are often less available than controlled clusters thatthe latter are preferable despite the former being more cost-effective. There are several contributing factors that make itdifficult to guarantee the availability of computing resources:(i) resources are donated but not necessarily dedicated, meaningthat resources that are available for use change dynamically over

∗ Corresponding author.E-mail address: [email protected] (K. Ramachandran).

0167-739X/$ – see front matter© 2010 Elsevier B.V. All rights reserved.doi:10.1016/j.future.2010.10.006

time; (ii) high arrival/departure rates mean that it is difficultto provide guarantees about resource availability; (iii) since thedonated resources are not dedicated it is possible for the resource’sowner to initiate activity that may disrupt the job using theresource.

A P2P environment provides an additional challenge since noresource specific data is stored in a centralized server whichmakes it difficult to store monitored data needed for resourceavailability prediction. The primary contribution of this work is toprovide a P2P framework for resource selection that incorporatesresource availability prediction. In investigating the incorporationof resource availability, we realized that there are environmentswhere resources are not always used by one particular person(e.g., a university computer laboratory). In this caseweneed to takeinto account the usage pattern of a group of resources.

This paper is organized as follows. Section 2 presents anarchitecture and the metrics used to predict future availability.Section 3 describes the implementation details. Section 4 outlinesthe validation methods. Section 5 describes related work andSection 6 concludes with our observations and future work.

2. Architecture

This section presents the overall architecture of our system(Fig. 1). It has these components: Servers, Client software, WorkUnits and a Prediction Engine. Volunteers, i.e. desktop machines,are referred to as peers.

2.1. Servers

Although resource discovery is P2P, there are several types ofservers needed. The job server maintains a repository of the jobs

http://dx.doi.org/10.1016/j.future.2010.10.006

http://www.elsevier.com/locate/fgcs

http://www.elsevier.com/locate/fgcs

mailto:[email protected]

http://dx.doi.org/10.1016/j.future.2010.10.006

https://www.researchgate.net/publication/224561492_Cost-Benefit_Analysis_of_Cloud_Computing_versus_Desktop_Grids?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

https://www.researchgate.net/publication/4117891_BOINC_A_System_for_Public-Resource_Computing_and_Storage?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

https://www.researchgate.net/publication/4318989_A_Pure_Peer-To-Peer_Desktop_Grid_framework_with_efficient_fault_tolerance?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

https://www.researchgate.net/publication/3498042_Condor_-_A_Hunter_of_Idle_Workstations?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

https://www.researchgate.net/publication/220948632_Creating_a_Robust_Desktop_Grid_using_Peer-to-Peer_Services?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

https://www.researchgate.net/publication/3897302_XtremWeb_A_generic_global_computing_system?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

https://www.researchgate.net/publication/220380571_Entropia_Architecture_and_performance_of_an_Enterprise_desktop_Grid_system?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

K. Ramachandran et al. / Future Generation Computer Systems 28 (2012) 854–860 855

Fig. 1. Architecture of the system.

Fig. 2. Client-software components.

to be executed. A client that needs resources to run its tasks makesthe request through a job server. A job server executes the resourceselection algorithm to find one or more peers and submits the jobto selected peers. Information about each job is stored in the jobserver. A job server can be associated with an organization or witha division of the organization or some other logical units. Thusthere can be multiple job servers. The job servers do not maintaininformation about resources.

Amonitoring server is used tomonitor the jobs submitted by thejob servers. There may be multiple monitoring servers associatedwith a job server. During a job’s execution, the mobile agentassociated with that job sends the status of execution (percentagethat is completed) to the monitoring server at a frequency that ispredefined when the job is assigned to a resource. The results ofa job are sent to a result server. The inputs for the mobile agentassociated with the job includes the monitoring server and theresult server addresses.

A single centralmanagement server can be used to authenticatethe job servers in the system, so as to avoid non-authorized serverssending jobs to the resources.

2.2. Client software

Each peer has software installed on it. Instances of the softwareform a self-organizing network in which the resource discovery isfully distributed across the network.

The client software (Fig. 2) consists of the following compo-nents:

1. Communication layer. The communication layer provides theAPI for the other layers (Resource Discovery, Code Executor,Monitoring Engine) to send/receive messages with otherinstances of the client software (peers) and the servers. It is alsoresponsible for the bootstrapping of a client. Bootstrapping is

themechanism throughwhich a peer joins the P2P network [9].This involves finding a member in a network, before joining thenetwork.

2. Resource discovery. Resource discovery is done by queryingthe P2P network for a resource (machine) with particular<attribute_name, attribute_value> pairs associated with it.The query is broadcasted within the P2P network. Softwareon the matching resources respond to the peer from whichthe query originated. From the replies received, a specifiednumber of resources are selected. This number is indicated bysystem parameter: resourceLimit. Each resource is sent aAreYouIdleFor ‘n’ minutesmessage. The list of selectedpeers is input to the resource selection algorithm which isexecuted to select the appropriate peer for job execution.

3. Monitoring engine. The monitoring engine module records sys-tem parameters such as memory usage, CPU usage percentageand user activity to a datastore (lightweight flat-file database)within the peer, at a predefined frequency.

The Monitoring Engine is also used when the node isexecuting a job. When the machine is interrupted by useractivity, based on the migration policy of the job, it triggers anevent in the Code Executor (explained below), which starts themigration process.

4. Prediction engine. The data from the local datastore, populatedby the monitoring engine, are used as the dataset for learningabout the system’s behavior.

5. Code executor. The code executor module is responsible for theexecution of the job in the local machine. The code executorprovides a sandbox for the code to execute and protects thesystem frommalicious code. The code executor is implementedby executing the job in a virtual machine [10].

2.3. Work unit

A mobile agent [11] is a software module which moves fromnode to node autonomously and is executed at each node to whichit moves. A job is associated with a mobile agent. This is referredto as a work unit. The mobile agent is responsible for migration,communicationwith amonitoring server and reporting the resultsto a result server. When a machine is interrupted by user activitywhile executing the job, instead of terminating the execution, thestate is to be transferred to a newmachine at which the executionshould continue.

2.4. Prediction engine—dedicated and non-dedicated desktops

One of the goals of the volunteer computing architecture isto optimize the usage of desktop resources. These resources canbe either individually owned (desktop machines in an office) orused by a group of people (university laboratory machines). Inthe case of machines that are individually owned, each machine’susage pattern serves as the dataset to predict its availability in thefuture. However, in a setup like a university laboratory, where themachines are not assigned to a single person, the total laboratory’susage pattern could also be an important criteria for predictionalong with the individual machine’s usage pattern.

The prediction engine is designed to consider group behaviorfor predicting availability. An organization can consist of multiplegroups (e.g., the university can have multiple laboratories). Eachgroup can have a pattern of resource availability, as can eachresource in a group.

The P2P system is used to classify the network into groups(g1, g2, . . . , gn) where each group represents a collection ofresources with a common usage pattern (Fig. 3). When a clientrequests a resource it searches for a job server. A resource requestquery is sent from the job server which we will denote by Rmaster .

https://www.researchgate.net/publication/2956300_The_architecture_of_virtual_machine?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

https://www.researchgate.net/publication/4225332_Bootstrapping_Chord_in_ad_hoc_networks_Not_going_anywhere_for_a_while?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

https://www.researchgate.net/publication/228557297_J_Software_Agents?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

856 K. Ramachandran et al. / Future Generation Computer Systems 28 (2012) 854–860

Fig. 3. Dedicated versus non-dedicated desktops.

A resource request query is also sent when a machine executing ajob is interrupted and is searching for a free resource to which tomigrate the job.

All the matching resources respond to the query. The matchingresources can be fromdifferent groups. The identity of thematchedresources as well as the following values are sent to Rmaster :Resource Availability Factor (raf ), Group Availability Factor (ga)and the Total Number of Resources in a group (totgroup).Resource Availability Factor. The resource availability factor (raf )denotes the probability of an individual machine’s availability at agiven time of the day in a week. It is calculated from the historicaldata on the resource availability at that time of the day in a week.The resource availability factor is calculated every week based onthe previous week’s data. Resource Availability Factor (raf ) at agiven time of the day in a week (t) is given by the following:

raf (t) =Navailable(t)Ntotal(t)

(1)

whereNavailable is the number of samples at which the resourcewasavailable (not busy due to user activity) in the particular time of theday in a week and Ntotal is the total number of samples consideredfor that time of the day in a week.Group availability: The group availability quantifies the total groupavailability at a given time of the day in a week (t). This isrepresented by the following:

ga(t) = currentga(t) − avgb(t) (2)

where currentga(t) (Eq. (3)) denotes the number of resources thatare currently idle and avgb(t) (Eq. (4)) represents the averagenumber of resources that are busy at the given time of the dayin a week, t . Like the resource availability factor raf, the avgb(t) iscalculated weekly. The avgb(t) of a group is collected by updatingthe group’s availability data every 15 min and stored this in a localdatastore. The table has a total of 672 rows (4 quarter hours * 24 h *7 days) and has the data about the groups’ availability at a specifictime of the day in a week.

A resource canbe busywhen theuser isworking on themachine(user activity). The resource can also be busy because it is currentlyexecuting a job. When the state of a resource changes (from idle tobusy or busy to idle) the state change is broadcasted within thegroup. Thus each resource in the group at a given point of timewillhave the data on the number ofmachines which are currently busyand the ones which are currently idle. This data is used to calculatethe current group availability (currentga), by every resource, duringresource selection.

currentga = tot − (Nua + Nje) (3)

where tot denotes the total number of resources in the group, Nuarepresents the number of resources that are busy because of useractivity and Nje denotes the number of resources that are busy dueto job execution.

Every 15 min, the current Nua (number of resources that arebusy due to user activity) is determined and is used to calculateavgb. Thus avgb is based on the average calculated in the previoustime period and is the component that gives the value for theprevious week’s availability.

avgb =Nua + (old_avgb ∗ num_of _samples)

num_of _samples + 1(4)

where old_avgb represents the average value of the number of busynodes until the previous week. The value (num_of_samples) isthe number of samples considered for that time of the day.Probability of resource availability. The three values explained above(raf , ga and tot_machines) are sent to Rmaster . Rmaster applies thefollowing formula to calculate the probability of the resourceavailability.

Presource_availability = raf (t) ∗ ga(t) (5)

where raf (t) represents the resource availability factor (Eq. (1))and ga(t) gives the group availability value (Eq. (2)).

In the case of dedicated desktops, ga(t) is considered to be 1.There, the formula in the case of dedicated desktops becomes,

Presource_availability = raf (t). (6)

3. Implementation

The servers of this system (Job Server,Monitoring Server, ResultServer and Management Server) are hosted on an Apache Tomcatwebserver. Functionality is exposed as web services, implementedusing Java Axis2 web services.

The system is built over the peer-to-peer (P2P) framework ofJXTA [12]. It uses the decentralized resource discoverymechanismsof JXTA to find the peers. Mobile agents facilitate the migrationof jobs in this system. The Aglet framework [13] is used in thesystem to provide the mobile agent functionality. Two versions ofthe client are implemented, one for Windows and one for Linux.The client has been developed using Java, therefore, there is nocode change in most of the modules, except for the monitoringengine. Each client software includes the JXTA and Aglet library.This enables each client to be able to discover resources andmigrate jobs. For discovery, the prediction parameters introducedin the previous section are calculated by each client, in responseto a resource request query (The resource request query can ariseeither from the job server or from the node that wants to migrateits current running job). In the case of non-dedicated resourcesits group availability is calculated using the broadcast messagelistener and broadcaster modules.

This section describes the broadcast message listener andthe broadcaster module in detail. We conclude this section bypresenting the lifetime of the job from its creation to its completionof execution.

3.1. Broadcast message listener and broadcaster

The broadcaster and broadcast message listener are enabled fornon-dedicated desktop systems to share the availability status ofa peer within the sub-peer group. Sub-peer group is the groupingmade to exchange usage information within peers of similar usagepattern (Section 2.4). This grouping is specified by the user whileinstalling the client software. If the peer is a dedicated desktop,both the broadcast listener and broadcaster are disabled.

https://www.researchgate.net/publication/220301994_Mobile_agents_with_Java_The_Aglet_API?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==


Fig. 4. Job life cycle.

The broadcast message listener and broadcaster modulesenable the calculation of Group Availability presented in Eq. (2) asdescribed in Section 2.4. The broadcaster sends the statusmessageshaving the availability data from a peer to every other peer inthe same sub-group at regular intervals (broadcast_intervalminutes). The broadcast receiver aggregates these statusmessagesand populates group_task_avg table. The group_task_avgtable is queried to get the group availability at a given point of timein a day of the week.

3.2. Lifetime of a job

The lifetime of a job from its creation to sending the results backto the result server is graphically depicted in Fig. 4.

1. The work unit is implemented as a mobile agent and the Javaclass file of the work unit is saved in a common codebaselocation.

2. The job server web service, InsertJob, is called to insert thejob details in the job repository. The job details include the jobname and constraints (e.g., OS needed).

3. The web service, ScheduleJob, of the job server is called toschedule the job. When called, the instance of the job is createdand the search for a candidate resource is started using theresource discovery component. When a suitable candidate peeris found for job execution, the job schedule details, containingthe job name, class name, codebase location, private key, ipaddressees of job server, monitoring server, result server is sentto this peer.

4. The peer before starting the execution, calls the web service,Authenticate, to authenticate the job server.

5. Themobile agent then starts executing in the host machine andthe machine is monitored for any user activity.

6. Based on the monitoring policy defined inside the job, themobile agent sends its updates to the monitoring server.

7. When a user activity is detected, the mobile agent subsystemfinds another peer for job execution and then dispatches themobile agent to the new peer.

8. When the monitoring server does not get the expected updatesfrom the agent, it assumes that the agent is destroyed andinforms the job server that it should restart the execution of thejob.

9. Once the job is executed in the peer, the result is sent back tothe result server.

4. Validation

The behavior of the system with a large number of jobs andclients are analyzed using real-time experiments and the perfor-mance of the prediction algorithms is studied using simulations.The goal of our real-time experiments is to study the system’s be-havior in a non-trivial test environment. This environment had aninstance of management server, two instances of job server, mon-itoring sever and result server. The client software is installed on23 workstations. The real-time experimental results show a func-tioning system that is able to take a high number of requests. Spacedoes not allowus to present all of our results. The rest of the sectionfocusses on our investigation into the effectiveness of the resourceavailability prediction model which is based on simulation.

Failure Trace Archive [14] presents a centralized publicrepository of availability traces of distributed systems for theanalysis to facilitate the validation of fault-tolerant and predictionmodels. None of the traces available in the failure trace archivehas information about the grouping of resources (of non-dedicated resources), which is essential for our prediction basedon group availability. Hence we have validated our modelusing the trace we collected by monitoring the system usagefrom three different labs over a period of three months withthe grouping information. Our traces can be downloaded fromhttp://www.csd.uwo.ca/~kramach/vcc/traces.tar.bz2 and we areworking on making them available in the failure trace archive(http://fta.inria.fr).

4.1. Data collection

The approach followed for collecting dataset is similar to thatof the UCB trace [15]. The traces are collected using a windowsservice that logged whether a resource is used by a student, alongwith its CPU usage and memory usage over a 90-day period inthree different laboratories. The traces were then post-processedto represent the availability data of the hosts in a desktop gridsetting [16].

The resource availability data collected in the three laboratories(MC 342, MC 230 and MC 235) reveal the patterns graphed inFigs. 5–7 respectively. Each of these labs has 25machineswith eachmachine running Windows XP. Each graph shows the number ofmachines that are in use at a given time.

http://www.csd.uwo.ca/~kramach/vcc/traces.tar.bz2

http://fta.inria.fr

https://www.researchgate.net/publication/220285687_Resource_Availability_in_Enterprise_Desktop_Grids?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

https://www.researchgate.net/publication/234784854_The_Interaction_of_Parallel_and_Sequential_Workloads_on_a_Network_of_Workstations?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

https://www.researchgate.net/publication/38321171_The_Failure_Trace_Archive_Enabling_Comparative_Analysis_of_Failures_in_Diverse_Distributed_Systems?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==


Fig. 5. Lab 342 user availability pattern.


The following can be inferred from the collected datasets:

1. A job schedule in all these labs has a better probability ofrunning without interruption during nights rather than in days.

2. The weekend is the ideal time to schedule jobs in the systemrather than the weekdays.

3. At any point in scheduling time, if lab 342 is given preferenceover the other two laboratories, the jobs are least likely to beinterrupted. At its peak usage, only 7 machines are busy in lab342 when compared to lab 230 (20 machines) and lab 235 (15machines).

4.1.1. SimulationOver the collected dataset the following three different

scheduling algorithms are executed: random scheduling, grouppredictive scheduling and single machine predictive scheduling.

In random scheduling, a resource is selected from the set ofavailable resources. The simulation script takes num_of_jobs,duration and start_time as the command line arguments. Theparameter num_of_jobs denotes the total number of jobs thathas to be run during the simulation, duration gives the time torun a single job, start_time specifies the time of the day theexecution of the jobs should start.

In group predictive scheduling, the predictive equation(Eq. (5)) is used to predict whether the system is going to beavailable for a given point of time for execution. The simula-tion script assumes the same input parameters as that of randomscheduling. It also takes as input an array of machine identifiers,machine_id_array, that are to be considered for simulation.Before assigning jobs to a machine, the machine_id_array issorted by its probability of availability in descending order.

In single machine predictive scheduling, the resource availabil-ity factor equation (Eq. (1)) is used to predict whether the systemwill be available for a given point of time for execution. The re-source availability factor is considered as a single factor to analyzethe performance of group predictive scheduling.

From the results of the simulation shown in Table 1, thefollowing observations were made:

1. The group predictive scheduling decreases the number ofmigrations significantly if the num_of_jobs is less than thetotal number of machines in the system.

2. During peak hours, group predictive scheduling is useful inselecting resources that are less prone to be interrupted by useractivity and hence reduce the number of migrations.



3. Group predictive scheduling also makes a difference with long-duration jobs (greater than 45 min).

4. During off-peak hours and when num_of_jobs is higher thanthe number of machines, the group predictive scheduling doesnot decrease the migration by a huge factor.

5. Related work

Examples of volunteer computing systems include [1–7]. Mostof these systems are built on a client–server model except[6,7]. Kim et al. [6] proposed a decentralized scalable desktopgrid infrastructure built over Distributed Hash Tables [17]. El-Desoky et al. [7] focused on building a fault-tolerant P2P desktopgrid. Drost et al. [18] proposed a P2P supercomputing middlewaresystem that was locality aware. These decentralized approachesdid not incorporate resource availability prediction in a P2Pframework.

Selection of the appropriate resources among the availableresources is significant as it determines whether a job is going tobe executed in a client machine without interruption. Kondo et al.and Budati et al. [19,20] associated reliability ratings to clients,based on the number of jobs successfully executed in it in thepast. The selection of resources for job execution is dependenton the reliability rankings. Resource availability prediction usingmathematical models were proposed by Brevik et al. [21]. Shanget al. [22] focused on predicting the availability of the client withcertain confidence based on its past availability.

Resource Availability patterns in various environments havebeen exhaustively studied by Kondo et al., [23,16,24], Nurmiet al. [25] and Rood and Lewis [26]. Kondo et al. [24] identifygroups of internet resources across 110,000 hosts with correlatedavailability that exhibit similar time effects. We feel our workis complementary where we implement resource availability bytaking into account group availability patterns.

Cloud@Home [27] and Nebulas [28] propose the usage ofdistributed voluntary resources in cloud computing [29]. Oursystem architecture addresses many of the challenges laid outin [27] or can be extended to address the challenges. Our approachto monitor and distribute information related to availability canbe used within Cloud@Home. The work in [28] discusses the needto form Nebulas which refer to a lesser managed cloud systemconstructed using a set of distributed voluntary resources. Severalchallenges were identified including the need to handle failures,

Table 1Simulation results (R-Random, GP-Group predictive, SP-Singlemachine predictive).

Input (num_of_jobs, duration, time) Sched. type Migrations Elapsedtime

10, 30, 2008-12-01 12:00:00R 4 35 minGP 0 30 minSP 1 35 min

10, 100, 2008-12-01 12:00:00R 10 1:50 hGP 2 1:45 hSP 6 1:45 h

10, 200, 2008-12-11 10:00:00R 16 3:30 hGP 1 3:25 hSP 0 3:20 h

10, 300, 2008-12-11 10:00:00R 19 5:20 hGP 14 5:15 hSP 18 5:15 h

100, 30, 2008-12-01 12:00:00R 55 2:00 hGP 49 1:55 hSP 50 2:00 h

100, 100, 2008-12-01 12:00:00R 427 8:25 hGP 296 7:05 hSP 284 7:05 h

100, 100, 2008-12-11 10:00:00R 2 3:25 hGP 3 3:30 hSP 3 3:30 h

1000, 100, 2008-12-11 10:00:00R 42 1 day 3:55 hGP 34 1 day 3:55 hSP 33 1 day 3:55 h

one of which may include the revocation of donated resources.Our work deals with the issues of revocation with showing howavailability prediction can be incorporated into a P2P environmentthat supports distributed voluntary resources.

6. Conclusions

Previous work in resource availability prediction was done inthe context of a client–server environment. Our study focusedon the effectiveness of prediction in a P2P environment, whichto the best of our knowledge has not been done. We studiedthe process of predicting the availability in both dedicated andnon-dedicated desktops with detailed experiments run on non-dedicated desktop environment. None of the implementations wesurveyed took the non-dedicated desktop machines into account

https://www.researchgate.net/publication/234818365_Computing_in_the_clouds_NetWorker?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

https://www.researchgate.net/publication/221188449_Resource_Availability_Prediction_for_Improved_Grid_Scheduling?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==


https://www.researchgate.net/publication/4124109_Resource_Management_for_Rapid_Application_Turnaround_on_Enterprise_Desktop_Grids?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

https://www.researchgate.net/publication/228354610_Nebulas_Using_distributed_voluntary_resources_to_build_clouds?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==






https://www.researchgate.net/publication/232625075_Characterizing_and_Evaluating_Desktop_Grids_An_Empirical_Study?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

https://www.researchgate.net/publication/224570780_Volunteer_Computing_and_Desktop_Cloud_The_CloudHome_Paradigm?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==


https://www.researchgate.net/publication/2888765_A_Scalable_Content-Addressable_Network?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==


https://www.researchgate.net/publication/2917211_Modeling_Machine_Availability_in_Enterprise_and_Wide-Area_Distributed_Computing_Environments?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

https://www.researchgate.net/publication/220717500_RIDGE_Combining_Reliability_and_Performance_in_Open_Grid_Platforms?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==




https://www.researchgate.net/publication/4092924_Automatic_methods_for_predicting_machine_availability_in_desktop_Grid_and_peer-to-peer_systems?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==


https://www.researchgate.net/publication/234820202_TM-DG_a_trust_model_based_on_computer_users'_daily_behavior_for_desktop_grid_platform?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

https://www.researchgate.net/publication/29635725_On_Correlated_Availability_in_Internet-Distributed_Systems?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==




when using prediction. Our experiments show that in a setupwhere the number of jobs is fewer than the total number ofresources, the resource availability prediction reduces the numberof migrations (job interruptions) the job experiences during itsexecution. This change is more prominent when user activity ishigh. This provides evidence, which should be further studied, ofthe scalability of the proposed framework for resource availabilityprediction for both dedicated and non-dedicated desktops over aP2P network.

Future work includes improving prediction by using moreadvanced prediction and learning models based on work foundin [21] and the trust model in [22]. Prediction would also beimproved by improving the grouping of non-dedicated desktops.Currently this is decided during the setup of the infrastructureby human intervention. Future work would look into automaticclustering techniques based on similar usage patterns.

Acknowledgements

The authors would like to thank Natural Sciences and Engineer-ing Research Council of Canada and IBM Center for Advanced stud-ies for their support.

References

[1] M. Litzkow, M. Livny, M. Mutka, Condor-a hunter of idle workstations,in: 8th International Conference on Distributed Computing Systems, 1988,pp. 104–111.

[2] G. Woltman, S. Kurowski, The Great Internet Mersenne Prime Search, 2000.[3] G. Fedak, C. Germain, V. Neri, F. Cappello, Xtremweb: a generic global

computing system, in: Proceedings of the IEEE International Symposium onCluster Computing and the Grid, 2001, pp. 582–587.

[4] A. Chien, B. Calder, S. Elbert, K. Bhatia, Entropia: architecture and performanceof an enterprise desktop grid system, Journal of Parallel and DistributedComputing 63 (5) (2003) 597–610.

[5] D. Anderson, Boinc: a system for public-resource computing and storage, in:5th IEEE/ACM InternationalWorkshop on Grid Computing, 2004, pp. 365–372.

[6] J. Kim, B. Nam, M. Marsh, P. Keleher, B. Bhattacharjee, D. Richardson,D. Wellnitz, A. Sussman, Creating a robust desktop grid using peer-to-peer services, in: IEEE International on Parallel and Distributed ProcessingSymposium. IPDPS 2007, 2007, pp. 1–7.

[7] A. El-Desoky, H. Ali, A. Azab, A pure peer-to-peer desktop grid frameworkwith efficient fault tolerance, in: International Conference on ComputerEngineering & Systems. ICCES’07. 2007, pp. 346–352.

[8] D. Kondo, B. Javadi, F. Cappello, A.D. Anderson, Cost-benefit analysis of cloudcomputing versus desktop grids, in: 18th International Heterogeneity inComputing Workshop.

[9] C. Cramer, T. Fuhrmann, Bootstrapping chord in ad hoc networks: not goinganywhere for a while, in: Fourth Annual IEEE International Conference onPervasive Computing and Communications Workshops, PERCOMW’06, pp.168–172.

[10] J. Smith, R. Nair, The architecture of virtualbel machines, Computer (2005)32–38.

[11] J. White, Mobile agents, Software Agents (1997) 437–472.[12] S. Microsystems, Jxta v2. 5. x: java programmer’s guide, in:White Paper, 2007.[13] D. Lange, M. Oshima, Mobile agents with java: the Aglet API, WorldWideWeb

1 (3) (1998) 111–121.[14] D. Kondo, B. Javadi, A. Iosup, D. Epema, The failure trace archive: enabling

comparative analysis of failures in diverse distributed systems, 2009.[15] R. Arpaci, A. Dusseau, A. Vahdat, L. Liu, T. Anderson, D. Patterson, The

interaction of parallel and sequential workloads on a network of workstations,in: Proc. of the 1995 ACM SIGMETRICS Joint International Conference onMeasurement and Modeling of Computer Systems, ACM, 1995, pp. 267–278.

[16] D. Kondo, G. Fedak, F. Cappello, A. Chien, H. Casanova, Characterizing resourceavailability in enterprise desktop grids, Future Generation Computer Systems23 (7) (2007) 888–903.

[17] S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Schenker, A scalable content-addressable network, in: SIGCOMM ’01: Proceedings of the 2001 Conferenceon Applications, Technologies, Architectures, and Protocols for ComputerCommunications, ACM, New York, NY, USA, 2001, pp. 161–172.

[18] N. Drost, R. van Nieuwpoort, H. Bal, Simple locality-aware co-allocationin peer-to-peer supercomputing, in: Proceedings of the 6th IEEE/ACMInternational Symposium on Cluster Computing and the Grid, CCGrid’06,Citeseer, 2006.

[19] D. Kondo, A.A. Chien, H. Casanova, Resourcemanagement for rapid applicationturnaround on enterprise desktop grids, in: SC ’04: Proceedings of the2004 ACM/IEEE Conference on Supercomputing, IEEE Computer Society,Washington, DC, USA, 2004, p. 17.

[20] K. Budati, J. Sonnek, A. Chandra, J. Weissman, Ridge: combining reliabilityand performance in open grid platforms, in: Proc of the 16th InternationalSymposium on High Performance Distributed Computing, 2007, pp. 55–64.

[21] J. Brevik, D. Nurmi, R. Wolski, Automatic methods for predicting machineavailability in desktop grid and peer-to-peer systems, IEEE InternationalSymposium Cluster Computing and the Grid. CCGrid 2004, 2004, pp. 190–199.

[22] L. Shang, Z. Wang, X. Zhou, X. Huang, Y. Cheng, Tm-dg: a trust model basedon computer users’ daily behavior for desktop grid platform, in: Proceedingsof the 2007 Symposium on Component and Framework Technology in High-Performance and Scientific Computing, 2007 pp. 59–66.

[23] D. Kondo, M. Taufer, C. Brooks, H. Casanova, A. Chien, Characterizing andevaluating desktop grids: an empirical study, in: Proceedings of the 18thInternational Parallel and Distributed Processing Symposium. 2004.

[24] D. Kondo, A. Andrzejak, D. Anderson, On correlated availability in internet-distributed systems, in: Proceedings of the 2008 9th IEEE/ACM InternationalConference on Grid Computing-Volume 00, IEEE Computer Society, 2008,pp. 276–283.

[25] D. Nurmi, J. Brevik, R. Wolski, Modeling machine availability in enterprise andwide-area distributed computing environments, Technical report, UC SantaBarbara Computer Science Department, October 2003.

[26] B. Rood, M. Lewis, Resource availability prediction for improved gridscheduling, in: Fourth IEEE International Conference on eScienc, 2008.

[27] V.D. Cunsolo, S. Distefano, A. Puliafito, M. Scarpa, Volunteer computing anddesktop cloud: the cloud@home paradigm, in: NCA ’09: Proceeding of the2009 Eighth IEEE International Symposium on Network Computing andApplications, IEEEComputer Society,Washington, DC,USA, 2009, pp. 134–139.

[28] A. Chandra, J. Weissman, Nebulas: using distributed voluntary resources tobuild clouds.

[29] A. Weiss, Computing in the clouds, NetWorker 11 (4) (2007) 16–25.

Karthick Ramachandran is a Ph.D. Student in the De-partment of Computer Science at the University of West-ern Ontario, London, Canada. He received his Bachelorsin Engineering in Computer Science at Madras University(2004), Chennai and hisM.Sc. from the University ofWest-ern Ontario, London, Canada (2009). His research interestsinclude resource availability prediction in desktop gridsand also privacy issues in cloud computing.

Hanan Lutfiyya is a Professor in the Department ofComputer Science at the University of Western Ontario,London, Canada. Her research interests include distributedsystems management, policy-based management anddynamic resource management for the cloud.

Mark Perry is a Professor in the Department of ComputerScience and a Professor at Faculty of Law, at the Universityof Western Ontario, London, Canada. He is also AssociateDean Research, Graduate Studies and Operation at theFaculty of Law. He is interested in pursuing the nexus oflaw and Computer Science in areas such as security andprivacy.

https://www.researchgate.net/publication/234818365_Computing_in_the_clouds_NetWorker?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==













































































https://www.researchgate.net/publication/228557297_J_Software_Agents?el=1_x_8&enrichId=rgreq-84a9cad0-3229-4b6e-be89-98bece67df42&enrichSource=Y292ZXJQYWdlOzIzNjE1MzgyODtBUzoxNjcyMDQ3NjI5NTU3NzZAMTQxNjg3NjEzMDAyOA==

Decentralized approach to resource availability prediction using group availability in a P2P desktop...

Documents

Transcript of Decentralized approach to resource availability prediction using group availability in a P2P desktop...