Decreasing Communication Latency through Dynamic Measurement, Analysis, and Partitioning for...

12
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 60, NO. 1, JANUARY 2011 81 Decreasing Communication Latency through Dynamic Measurement, Analysis, and Partitioning for Distributed Virtual Simulations Robson Eduardo De Grande, Azzedine Boukerche, and Hussam Mohamed Soliman Ramadan Abstract—Communication in distributed simulations is highly relevant due to latencies cumulatively introduced in the execution time of such simulations; these latencies are caused by the transfer of data throughout networks. High-Level Architecture (HLA) is a well-known standard used to organize and keep the consistency of distributed virtual simulations. Nevertheless, although the HLA framework presents solutions to decrease communication over- head for simulations, it does not solve the communication issues that are pertinent to the network distances among federates and the Run-Time Infrastructure (RTI) in large-scale environments. Due to the relevance of load balancing for distributed simula- tions, many solutions have been proposed, but such approaches just consider communication load partially in their schemes or are limited. To minimize the communication overhead of HLA distributed simulations, a hierarchical dynamic balancing scheme composed of three phases was designed; however, even though the balancing scheme reacts properly to run-time load changes, its load analysis does detect all the imbalances after measuring simulations’ communication load. Thus, an extension is proposed to improve the detection of communication imbalances, as well as the redistribution of federates, so the balancing scheme can better react to communication load measurements gathered from each federate. To observe the benefits of the proposed communication balancing scheme and its extension, extensive large-scale exper- iments were realized. The results show that the scheme reduces considerably the amount of communication overhead, and the extension enables the detection of communication imbalances by modifying the decision-making technique in the balancing system. Index Terms—Distributed simulations, High-Level Architecture (HLA), load Analysis, performance. Manuscript received December 15, 2009; revised March 4, 2010; accepted June 1, 2010. Date of publication September 27, 2010; date of current version December 8, 2010. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada, the Canada Research Chair program, MRI/ORF research funds, the Ontario Distinguished Research Award, and the EAR Research Award. The Associate Editor coordinating the review process for this paper was Dr. Augusto Sarti. R. E. De Grande is with the School of Information and Technology En- gineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada (e-mail: [email protected]). A. Boukerche is with the School of Information and Technology Engineer- ing, University of Ottawa, Ottawa, ON K1N 6N5, Canada, and a visiting Professor with King Saud University, Riyadh 11451, Saudi Arabia (e-mail: [email protected]). H. M. S. Ramadan is with the College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIM.2010.2065730 I. I NTRODUCTION D ISTRIBUTED virtual simulations are directly dependent on an environment’s shared resources, which require some kind of coordination to partition their load; however, the performance of such simulations are highly impacted by the management of dependencies between virtual simulations’ distributed elements. These dependencies are represented by interactions that can cause cumulative delays and overheads, which indirectly reflect on overall simulation execution time. Such overheads are even more evident when distributed virtual simulations are deployed on large-scale environments, using the HLA-based standard. In this case, the interaction latencies are increased by communication distances. These distances are produced by the deployment of simulation elements and by sim- ulation intra-interaction behavior, which can change dynami- cally and unpredictably. Based on deterministic factors, such virtual simulations can be statically partitioned and deployed on available resources; however, this partitioning is unable to reach a reasonable deployment of simulation elements when unpredictable load changes exist. Therefore, to keep or improve the execution performance of these distributed simulations, a dynamic scheme is devised to constantly measure resources’ computation load and simulations’ communication rate, ana- lyze the execution conditions, and perform the needed load modifications. The High-Level Architecture (HLA) framework [1] pro- vides a standard for designing and coordinating parallel and distributed virtual simulations. According to the HLA stan- dard, distributed simulations are composed of federates, which are interactive, independent elements delimited by the HLA specification and organized by the Run-Time Infrastructure (RTI) management services. Such services are responsible for maintaining the simulation consistency by controlling time, interaction, and data constraints. These management services also provide mechanisms to minimize communication overhead in HLA simulations, such as Data Distribution Management (DDM). DDM avoids the consumption of network resources by restricting the transmission of data that is relevant among the simulation federates. However, even with only existing relevant interactions, the DDM service cannot decrease communica- tion delays that originate from the network distances between federates and the communication latencies inherent to these distances. Moreover, since the distribution of load and the dependencies considerably affect the performance of virtual simulations, 0018-9456/$26.00 © 2010 IEEE

Transcript of Decreasing Communication Latency through Dynamic Measurement, Analysis, and Partitioning for...

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 60, NO. 1, JANUARY 2011 81

Decreasing Communication Latency throughDynamic Measurement, Analysis, and Partitioning

for Distributed Virtual SimulationsRobson Eduardo De Grande, Azzedine Boukerche, and Hussam Mohamed Soliman Ramadan

Abstract—Communication in distributed simulations is highlyrelevant due to latencies cumulatively introduced in the executiontime of such simulations; these latencies are caused by the transferof data throughout networks. High-Level Architecture (HLA) is awell-known standard used to organize and keep the consistency ofdistributed virtual simulations. Nevertheless, although the HLAframework presents solutions to decrease communication over-head for simulations, it does not solve the communication issuesthat are pertinent to the network distances among federates andthe Run-Time Infrastructure (RTI) in large-scale environments.Due to the relevance of load balancing for distributed simula-tions, many solutions have been proposed, but such approachesjust consider communication load partially in their schemes orare limited. To minimize the communication overhead of HLAdistributed simulations, a hierarchical dynamic balancing schemecomposed of three phases was designed; however, even thoughthe balancing scheme reacts properly to run-time load changes,its load analysis does detect all the imbalances after measuringsimulations’ communication load. Thus, an extension is proposedto improve the detection of communication imbalances, as well asthe redistribution of federates, so the balancing scheme can betterreact to communication load measurements gathered from eachfederate. To observe the benefits of the proposed communicationbalancing scheme and its extension, extensive large-scale exper-iments were realized. The results show that the scheme reducesconsiderably the amount of communication overhead, and theextension enables the detection of communication imbalances bymodifying the decision-making technique in the balancing system.

Index Terms—Distributed simulations, High-Level Architecture(HLA), load Analysis, performance.

Manuscript received December 15, 2009; revised March 4, 2010; acceptedJune 1, 2010. Date of publication September 27, 2010; date of current versionDecember 8, 2010. This work was supported in part by the Natural Sciences andEngineering Research Council of Canada, the Canada Research Chair program,MRI/ORF research funds, the Ontario Distinguished Research Award, and theEAR Research Award. The Associate Editor coordinating the review processfor this paper was Dr. Augusto Sarti.

R. E. De Grande is with the School of Information and Technology En-gineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada (e-mail:[email protected]).

A. Boukerche is with the School of Information and Technology Engineer-ing, University of Ottawa, Ottawa, ON K1N 6N5, Canada, and a visitingProfessor with King Saud University, Riyadh 11451, Saudi Arabia (e-mail:[email protected]).

H. M. S. Ramadan is with the College of Computer and InformationSciences, King Saud University, Riyadh 11451, Saudi Arabia (e-mail:[email protected]).

Color versions of one or more of the figures in this paper are available onlineat http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIM.2010.2065730

I. INTRODUCTION

D ISTRIBUTED virtual simulations are directly dependenton an environment’s shared resources, which require

some kind of coordination to partition their load; however,the performance of such simulations are highly impacted bythe management of dependencies between virtual simulations’distributed elements. These dependencies are represented byinteractions that can cause cumulative delays and overheads,which indirectly reflect on overall simulation execution time.Such overheads are even more evident when distributed virtualsimulations are deployed on large-scale environments, usingthe HLA-based standard. In this case, the interaction latenciesare increased by communication distances. These distances areproduced by the deployment of simulation elements and by sim-ulation intra-interaction behavior, which can change dynami-cally and unpredictably. Based on deterministic factors, suchvirtual simulations can be statically partitioned and deployedon available resources; however, this partitioning is unable toreach a reasonable deployment of simulation elements whenunpredictable load changes exist. Therefore, to keep or improvethe execution performance of these distributed simulations, adynamic scheme is devised to constantly measure resources’computation load and simulations’ communication rate, ana-lyze the execution conditions, and perform the needed loadmodifications.

The High-Level Architecture (HLA) framework [1] pro-vides a standard for designing and coordinating parallel anddistributed virtual simulations. According to the HLA stan-dard, distributed simulations are composed of federates, whichare interactive, independent elements delimited by the HLAspecification and organized by the Run-Time Infrastructure(RTI) management services. Such services are responsible formaintaining the simulation consistency by controlling time,interaction, and data constraints. These management servicesalso provide mechanisms to minimize communication overheadin HLA simulations, such as Data Distribution Management(DDM). DDM avoids the consumption of network resources byrestricting the transmission of data that is relevant among thesimulation federates. However, even with only existing relevantinteractions, the DDM service cannot decrease communica-tion delays that originate from the network distances betweenfederates and the communication latencies inherent to thesedistances.

Moreover, since the distribution of load and the dependenciesconsiderably affect the performance of virtual simulations,

0018-9456/$26.00 © 2010 IEEE

82 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 60, NO. 1, JANUARY 2011

many schemes have been proposed to solve the performanceissues regarding computation load and communicationimbalances. Generally, these approaches mostly observe theload imbalances caused by computation factors; nevertheless,some approaches consider communication through aspects thatinvolve simulation intradependencies. These communicationaspects are usually either employed as an additional parameterto aid the decision making of computation load redistribution insome schemes or are restricted to some repartitioning policiesor applications. The presence of such restrictions impedesthe utilization of these communication balancing schemes forHLA-RTI-based simulations, in which the RTI acts as thecentral, global simulation coordinator.

To react to dynamic, irregular distribution of load, any bal-ancing system needs to be able to detect load discrepancies.This detection is enabled by constantly measuring and ana-lyzing a distributed simulation and the resources on whichsimulation elements are running. Even though this approachreorganizes HLA simulations specific only to communicationaspects, computational load is considered in the redistributionscheme. This metric is employed to avoid load imbalances andto provide awareness of resources’ processing status, which iscollected through resource management systems. A resourcemanagement system provides tools for administrating sharedresources, which include the needed monitoring of resources.The Grid system, a resource management system, is amplyused to manage distributed resources and applications, aimingto enable a coordinated, flexible, and secure resource sharingsystem [2]. Therefore, due to its characteristics that embracereliability, security, and flexibility, Globus Toolkit [3], a de factoGrid system middleware, is accessed by the proposed balancingscheme to gather data regarding monitoring measurements ofdistributed virtual simulations.

Due to the need for a balancing mechanism that considers theaspects of HLA-based virtual simulations, a scheme has beenproposed in [4]. This scheme constantly measures the load ofshared resources and a distributed simulation’s communicationrates, analyzes them, and performs the proper modifications.The proposal of this scheme aims at a more comprehensivecommunication balancing system that can minimize the net-work latencies by employing a more general redistributionanalysis. Such a balancing system considers the proximity ofresources to federates’ communication destinations. The sys-tem is designed under a hierarchical structure that decreases theoverhead caused by measuring and analyzing simulation enti-ties, facilitating the management of simulations. The balancingscheme divides the balancing process in three interdependentsequential phases: 1) monitoring of resources and federates;2) reallocation of resources for federates; and 3) execution ofmigration moves. However, even though the proposed schemeis effective in decreasing communication delays in simulations,it shows a limited detection of communicative federates. Thedetection technique employed in this scheme does not adapt toabrupt and severe communication changes. Due to the magni-tude of the balanced simulations, the collected load data samplemight hide communication rates that represent an overload offederates. Since such overload is not detected, the balancingsystem cannot modify the arrangement of federates to decrease

communication delays, and different analysis techniques arerequired to better detect communicative federates. Therefore,extensions for the detection phase of the balancing schemeare proposed in this paper, so the balancing scheme can reactproperly and achieve a more precise deployment of federates.

The remainder of this paper is organized as follows. InSection II, the related work is introduced, and the challengingissues are presented. In Section III, the proposed scheme, aswell as its structure, architecture, and balancing phases aredescribed. In Section IV, two extensions for the balancingscheme are delineated. In Section V, experiments are explainedand their results are discussed. Finally, the conclusion brieflyoutlines and describes the directions for future work.

II. RELATED WORK

Due to the complexity of providing a load balancing so-lution for optimistic and conservative distributed simulations,many balancing approaches have been designed to considerablyimprove performance. All the proposed schemes encompasscomputation load and simulation intradependencies, which re-flect on communication load. Measuring the computation loadand taking the proper actions benefit simulations consider-ably. However, such a practice is ineffective when simulationspresent communication imbalances. Consequently, measuringand analyzing simulations’ internal interactions is highly rel-evant for execution performance. Following this proposition,several approaches consider the simulation look-ahead andthe communication dependencies between simulation federates.The analysis of these metrics is performed statically, before thesimulation is run, or dynamically, during run time. However,these schemes are restricted to specific simulation applicationsand cannot be applied to any HLA-RTI-based simulations.

Inevitably, many balancing systems mainly consider eithercomputation load of shared resources or simulation speed ofa distributed simulation to redeploy distributed simulations.In a resource-centered paradigm, schemes basically focus onmeasuring the direct result from imbalances, which correspondsto the CPU utilization of virtual simulation federates or theirexecution throughput [5]–[11]. In this approach, the main crite-ria is to equalize the consumption of resources, so a simulationcan obtain the maximum benefit from them. In simulation-centered paradigms, schemes measure the simulation speed ofeach federate and compare it to the average simulation speed ofa whole simulation [12]–[15]. Such an approach focuses on theprocess that is slowing down the simulation; consequently, thisrelieves the resource where the process is running and can ben-efit the entire simulation. However, even though the measuringof computation load can evidence the need for changes, it mightnot detect imbalances caused by simulation dependencies or bydelays originated from communication latencies.

Some other balancing mechanisms assign certain relevanceto interactions between simulation entities and consequentlyattempt to eliminate or decrease the delays caused by inter-nal dependencies in distributed simulations. Such simulationinternal dependencies are evidenced by the differences in sim-ulation look-aheads and communication load. The look-aheadis a monitoring metric that can efficiently show the pace of

DE GRANDE et al.: DECREASING LATENCY THROUGH MEASUREMENT FOR DISTRIBUTED SIMULATIONS 83

Fig. 1. Dynamic Communication Balancing’s General Architecture.

a simulation entity and hence enable the discovery of a slowsimulation entity that is causing delays in other simulationelements [16], [17]. The dependencies, which are reflectedon the communication delays, are employed to determine asimulation distribution that minimizes such latencies. In thisscope, some techniques analyze such dependencies statically[18], [19] while others measure, detect, and modify simulationsdynamically [17], [20]–[26]. Among these dynamic balancingsystems, some just add the communication information to thecomputation balancing; others mix the computation load andcommunication dependencies or measure the communicationload to redistribute the simulation. However, these techniquesare simulation-dependent, limit the simulation parallelism, ordisregard the network topology of shared resources.

Therefore, all the existent approaches for repartitioning thecommunication load of distributed simulations cannot be totallyapplied to HLA-RTI-based simulations. The limitations of de-tecting and redeploying simulation entities restrict the balanc-ing schemes to specific simulation applications or disregard thenetwork topology of the resources that are used to run the simu-lations. The previous approaches are successful in reducing thecommunication overhead and the simulation delays, but theyare limited to peer-to-peer redeployment of simulation load,ignoring the proximity of shared resources for the migration ofload. A proximity analysis allows the system, not only to benefitfrom the creation of more possibilities of minimizing commu-nication latencies, but also to enable communication balancingfor simulations based on centralized RTIs. A dynamic balanc-ing scheme [4] that observes the proximity of resources hasbeen designed, and it effectively decreases the communicationlatencies in HLA-based simulations This balancing schemeuses proximity analyses to determine the rearrangement offederates according to their distance in the communicationtopology. However, such a scheme presents some limitationsin detecting discrepancies of communication load. Through thescheme’s detection mechanism, certain highly communicativefederates are not considered for redistribution due to partic-ularities of data samples collected from a large-scale virtual

simulation. Moreover, the detection of such communicativefederates is vital for enabling a highly efficient redeployment ofsimulation federates. Thus, extensions are proposed to improvethe detection mechanism of this balancing scheme, more effec-tively determine communication imbalances, and redistributeload accordingly.

III. PROPOSED BALANCING SCHEME

The main goal of the proposed balancing scheme in [4] isto employ the distribution of resources and to take advantageof their proximity to provide an application-independent solu-tion that decreases or minimizes the communication latenciesfor large-scale HLA-RTI-based distributed simulations. Due todynamic load and communication changes that might occurduring a simulation execution, the balancing scheme needsto constantly monitor a simulation and its environment. Thisperiodic measuring provides means to responsively redistributethe simulation federates according to balancing requirementsto minimize the communication latency. The scheme conse-quently is divided in three sequential balancing phases thatperiodically measure the needed parameters and react to imbal-ances. Moreover, the proposed balancing system is structuredhierarchically and employs a low-latency migration procedure.The hierarchical structure decreases the overhead and latenciescaused by the balancing system when it measures and redistrib-utes simulations’ load.

As shown in Fig. 1, the architecture of the balancing schemebasically consists of a Group Manager (GM) and several LocalManagement Agents (LMA). All these elements are placed onshared resources by following a hierarchical order, and theyare all implemented in Java, aiming at a cross-platform designand integration with the RTI. Even though the scheme performsredistribution of simulations considering only communicationaspects between federates, it is vital that the balancing systemaccesses third-party mechanisms to obtain computation load ofshared resources. This information is required in the decision-making part of the redistribution algorithm to establish a

84 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 60, NO. 1, JANUARY 2011

reconfiguration of simulation elements, so that simulations donot undergo overload caused by computation load imbalances.

The GM corresponds to the main element in the balancingscheme; it administrates a set of LMAs that are assigned to itand organizes the balancing phases: monitoring, redistribution,and migration. GMs are organized in a hierarchical structure inwhich each GM is responsible for managing sub-GMs, whichare GMs in a lower hierarchical level, used for reporting to anupper GM about its collected data sample. For the monitoring,a GM requests information from its set of LMAs and sub-GMsand accesses a third-party tool. Each LMA, as well as each sub-GM, forwards information regarding the communication be-havior of simulation federates. The third-party tool is accessedto collect information regarding the load of the shared resourcesthat a GM is responsible for managing. The GM filters andanalyzes the collected information to identify communicationdiscrepancies. After the analysis is performed, a redistributionis generated according to the load of the shared resources. Atthe end of the process, a GM emits migration calls, which areforwarded to the respective LMA.

An LMA is responsible mainly for acting as an interface be-tween a GM and some simulation federates. An LMA is placedin each shared resource that is eligible for receiving a federate.Each LMA contains a set of Communication Load Monitorsand Migration Managers. A Communication Load Monitoracts together with a federate and constantly logs the federate’scommunication. The Communication Load Monitor answersperiodical requests from its respective LMA with logs’ data. AMigration Manager, likewise a Communication Load Monitor,is responsible for managing the migration procedure of eachfederate. Triggered by a migration call forwarded by its LMA,the Migration Manager is required to support a migration. Thissupport essentially prevents simulation inconsistencies, whichare caused by lost messages or the arrival of nonordered sim-ulation events. In a migration procedure, a Migration Managerlaunches a Migration Manager remotely, suspends a federate’sexecution, restores its state, and coordinates the required dataexchanges. The Migration Manager also activates a MigrationProxy to assist the federate migration when the destinationresource of a migration call is unreachable by the Manager, soa peer-to-peer data exchange cannot be realized. The MigrationProxy acts as an intermediate migration element that has therole of forwarding data to a migrating federate.

To be suited for large-scale environments, a hierarchicalarchitecture is introduced to organize all the balancing system,decrease the overhead generated by the balancing scheme,and facilitate the management of HLA virtual simulations.The hierarchy is organized in several layers, and each GM isresponsible for coordinating a set of LMAs and a set of sub-GMs, as stated previously. Each LMA is an endpoint or a leafin the structure and basically corresponds to a resource in thedistributed system. GMs that control just LMAs are in thebottom of the hierarchy and are managed by other upper layerGMs. The role of this bottom GM is to forward monitoringinformation to an upper GM or migration calls to an LMA. Inthe intermediary layers of the structure, the GMs also forwardinformation and migration calls, but they additionally filter andmerge information. The GM located at the top of the hierarchy

is responsible for indirectly or directly managing all other GMs;this GM performs global data gathering and balancing analyses,redistributes federates accordingly, and emits migration calls.

A. Monitoring Phase

The distributed simulation is balanced periodically, and themonitoring phase is activated at the beginning of each balancingcycle. This phase is essential for enabling responsiveness tothe dynamic balancing system, and based on it, the othersubsequent phases are triggered. The monitoring is responsi-ble for measuring the communication rate of simulations andcomputation load of the environment; it also employs detectiontechniques to identify communication dependencies that delaysimulations. The computation load information is collectedthrough the access to a third-party tool, the Monitoring andDiscovery Service from Grid Services [27]. This informationis used in the decision-making part of the redistribution phase.The communication rate is gathered from each CommunicationLoad Monitor, which keeps the interactions of each federate in acommunication table. This table is located in the same resourceas the federates, and it logs every federate’s interaction. Afederate’s interaction is comprised of a destination addressand the number of messages sent and received (as well astheir size). More particularly, inasmuch as the simulations areimplemented based on an HLA-RTI-centralized architecture,every federate communicates strictly and directly with the RTI;consequently, the destination address in the log tables are allthe same. All this communication information is sent to a GMevery monitoring interval to perform filtering and selection.

In every collected data sample, selection is applied to iden-tify conclusive aspects that are used as decision factors. Suchaspects correspond to discrepancies in communication ratesof federates, exhibiting that the rearrangement of federates onresources might decrease communication latency. The selec-tion simply consists of performing comparisons between eachfederate’s communication rate and their overall communica-tion rate average. Through this selection technique, the mostcommunicative federates of a simulation in a given momentare differentiated. The communication rate average is retrievedthrough the calculation of an arithmetic mean; also, a thresholdis utilized to determine the communicative federates moreprecisely. Initially, the value obtained from the calculation ofthe standard deviation of the communication rate average isemployed as the selection threshold, which corresponds to asuperior boundary when added to the average. The communi-cation rate is an application-dependent metric, which shows abehavior that changes according to each simulation implemen-tation, so standard deviation is used to produce a parameterthat is totally based on the collected system’s communicationrates. After the selection is realized, the redistribution phase isinvoked to rearrange the federates as required.

B. Redistribution Phase

After an ordered list of communicative federates is selected,a repartitioning is performed to search for the most appropriatedestination resources for such federates. These federates are

DE GRANDE et al.: DECREASING LATENCY THROUGH MEASUREMENT FOR DISTRIBUTED SIMULATIONS 85

reallocated by evaluating them according to the redistributionprocedure described in Algorithm 1. This repartitioning of fed-erates is performed with the objective of precisely determiningmigration moves to destination resources that can benefit simu-lation performance by decreasing the communication latencies;consequently, a classification of resources is realized accordingto their network topology to match with the communication rateof each federate candidate. This classification mainly employsthe distances between resources and a specific destination,which is the RTI in this particular application case. As a result,highly communicative federates are elected to be moved to re-sources that are located closer to the RTI, so the communicationlatencies are reduced, decreasing delays in simulations.

Following the ordered list of federates and according to thelocation of each federate, the redistribution algorithm searchesfor the proper destination resource for each communicativefederate. A data structure is used by the algorithm to store allthe positions of the shared resources. This structure facilitatesthe search for a resource that is close to the target destination ofa determined federate. By observing this structure and lookingjust to the communication scope, the most appropriate resourceto transfer a federate to is the one that the federate communi-cates with the most. This resource is the optimal destinationfor the transfer of this federate because the federate’s com-munication latencies are eliminated or reduced considerably.This approach is extensively employed by the previous com-munication balancing solutions, which group together the sameresource simulation entities that have intense communicationamong them, avoiding networking delays. However, this re-duces the parallelism of distributed simulations. This approachis also considerably limited and disregards the possibility ofoverloading resources, evidencing the impossibility of applyingit to simulations based on centralized HLA RTIs. Therefore, adifferent technique is used to enable the migration of federatesto resources close to an RTI and not to the same resource wherethe RTI is running. In this case, the path distances structure isessential for determining resources nearby the RTI.

Algorithm 1 Communication Redistribution AlgorithmRequire: current_loads, spec_loads, federates_loads,

path_distancesorder(federates_loads)cmean ⇐ calc_mean(federates_loads)cstd ⇐ calc_STD(federates_loads, cmean)comm_federates ⇐find_comm(federates_loads,mean, std)for all federate IN comm_federates do

while !resource_found and path_distances(next)do

ring ⇐ get_closest_ring(path_distances,RTI)resource ⇐ least_load_resource(ring)if !overloaded(resource.load) then

migration_move.add(federate, resource)resource_found ⇐ TRUE

end ifend while

end forreturn migration_moves

The Path Distances structure organizes all the shared re-sources in distance rings according to the analysis of thecommunication topology. Each ring contains a group of re-sources that have the same distance to the RTI. The distance isdefined as the sum of the number of network hops between tworesources, which are weighed by their network capacities. Ineach distance ring in the structure, the resources are organizedaccording to their computation load, so the balancing scheme isable to identify a resource that does not harm the distribution ofcomputation load, avoiding the generation of load imbalances.Since the network topology is characteristically static and thedestination is the same for any simulation federate, the ringsof the path distances structure are generated only once-in theinitialization of the balancing scheme. However, the orderingof resources in each ring is determined by the computation loadof resources in a given moment.

Regarding the communication aspects, only the nominalnetwork capacity in each hop and number of hops are usedin the Path Distances structure to define distances, consideringthat the hop with the smallest nominal transmission capacity isused to qualify a distance. Resources are classified accordingto this static network configuration to define the destination forcommunicative simulation entities. These metrics represent anear-static state of the networking resources, but the addition ofdynamic factors would increase the complexity of the balancingsystem. Although the use of delay variations to fit federates’communication needs to network capacities presents a morerealistic solution, its feasibility can lead to considerable balanc-ing delays. By observing the network capacities and attemptingto fit federates according to the devices’ available bandwidthpromotes the problem to a higher level of complexity; anexample of this is the solution of the bin packing problem, aNP-hard problem. For instance, many factors, such as commu-nication distance, individual and global communication rate,communication flow, and others, need to be measured becausethey directly or indirectly influence delays. A deep search andanalysis caused by this complexity increase results in higherbalancing latencies that can lower the responsiveness for theredistribution mechanism. Therefore, in a matter of simplifica-tion that minimizes the balancing overhead, the delays are notconsidered in the detection and redistribution phases.

The resource with the least load and which is located at thedistance ring closest to the RTI is selected as the destinationresource candidate to receive the migrating federate. After theselection, the resource candidate needs to be analyzed notto become overloaded because of the federate migration. Inthis case, an analysis evaluates the load of the candidate byconsidering the average load of the rest of the shared resources.The resource candidate’s computation load is compared withthe overall computation load of the resources. The differencebetween the load values needs to obey the policies delimited forcomputation load distribution [28] not to harm the computationload balance, as delimited in Algorithm 2. In the case that a re-source candidate is not able to receive a federate for migration,the next candidate is searched in the distance rings; thus, thenext ring in the structure is selected, consequently identifyingthe resource with the least load. The same comparisons areconsecutively applied to determine a migration move for the

86 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 60, NO. 1, JANUARY 2011

communicative federate. The procedure of searching for aresource candidate in the distance rings continues while thedestination resource is not found or the communication latencyof a ring is not greater than the latency of the resource wherethe federate is running. The stop condition of the algorithmestablishes that any communication improvement cannot bereached since the situation of the federate cannot be improvedin a given load configuration of the distributed system.

Algorithm 2 Evaluation AlgorithmRequire: src_rsc, dst_rsc

if dst_rsc < min thenif number_fed(src_rsc) > 1 then

create_migration_move(src_rsc, dst_rsc)else if number_fed(src_rsc) = 1 & src_rsc >(min ∗ φ) then

create_migration_move(src_rsc, dst_rsc)end if

else if (dst_rsc − src_rsc) < (min ∗ δ)thenif number_fed(src_rsc) > 1 then

create_migration_move(src_rsc, dst_rsc)else if number_fed(src_rsc) = 1 & (dst_rsc −src_rsc) > (min ∗ φ) then

create_migration_move(src_rsc, dst_rsc)end if

end ifreturn migration_move

C. Migration Phase

Migration is the final step of the balancing cycle in thescheme, and after the redistribution defines the federate movesthat are necessary to be realized, the migration calls are emittedto their respective Migration Managers. The migration proce-dure can introduce substantial overhead to simulation executiontime; consequently, the responsiveness of the balancing systemis directly related to the migration latency. The reduction ofthe migration latency to a minimal time enables the systemto produce a better performance improvement. Basically, themigration procedure consists of suspending the execution of afederate in a resource and restoring it at the chosen destination;however, more complexities exist in this procedure becausesimulation inconsistencies might be caused by the migrationdue to the time and data coordination relevance for an HLAsimulation. The federate migration procedure is required totransfer the initialization files of a federate to the destinationresource, suspend the federate’s execution, retrieve its execu-tion status and the incoming messages, transfer its state andmessages to the destination, and restore the federate executionat the destination. The data transfers represent the largest la-tencies in all the migration procedure, mainly considering itfor large-scale environments. Thus, to minimize the migrationlatency, a two-phase migration mechanism is adopted in thebalancing system. This technique is similar to the techniquesproposed by Boukerche and De Grande [29] and Zengxianget al.[30]; also, it requires minimal external tools, avoids un-necessary communication and computation overhead, and doesnot introduce global synchronization in the simulation.

The first phase of the migration procedure is responsible fortransmitting the static initialization files, which are requiredfor starting up the migrating federate at the destination re-source. After these files are transmitted, the Migration Managercan be configured properly to perform the next steps in themigration process. These files are transferred using a third-party tool, GridFTP [3]. The migrating federate and the re-spective Migration Manager are launched remotely through theWeb Service Grid Resource Allocation and Management [3].Access to third-party tools introduces substantial overhead inthe migration process, but overhead is not incorporated intothe migration latency because the federate is not suspendedwhile the Migration Manager is not completely initialized atthe remote destination.

The second phase in the migration procedure correspondsbasically to suspending a federate and restoring its executionat the destination resource. Initially, the Migration Managertriggers the exchange of the communication channels in theRTI, so the migrating federate starts receiving messages insteadof the federate at the source resource. After the exchange issuccessfully accomplished, the Migration Manager sends a callfor the suspension of the federate’s execution. Then, the feder-ate saves its execution state and the messages that it receivedand did not process. The Migration Manager sends both statedata and messages to the remote resource. The execution stateis comprised of the dynamic information (variables) that rep-resents the current execution status of a federate and its LocalRuntime Controller’s state. When the state and the messages arereceived, the Migration Manager at the remote location passesthem to the migrating federate. Finally, the federate restores itsstate and takes over its execution.

IV. EXTENSION OF THE SCHEME

The proposed dynamic balancing scheme employs a non-complex analysis technique, acting like a greedy algorithm.Even though it is a simple approach, the used analysis providesa fast solution that is close to an optimal simulation redistrib-ution. With this approach, the scheme obtains a rearrangementof load in a short time. The quick balancing answer minimizesthe consumption of resources, without introducing balancingoverhead to the system that is managed; this technique alsoincreases the responsiveness of the balancing system, enablinga more frequent detection and redistribution of simulation load.Furthermore, regarding only the observed communication as-pects, the redistribution algorithm achieves an optimal solution,in which the network distances are minimized according tointeraction rates of communicative federates. Nonetheless, con-sidering computation factors in the redistribution algorithm, thebalancing problem is reduced to a suboptimal solution. Becausethese factors are considered to avoid computation overload ofthe shared resources, they interfere with the communication ob-jectives and can impede a communicative federate’s movementto a closer location.

Moreover, the communication rate is a factor that is attachedto each specific simulation implementation and consequentlyis obtained through calculations regarding only the gathereddata sample. Consequently, the analysis of this metric evidences

DE GRANDE et al.: DECREASING LATENCY THROUGH MEASUREMENT FOR DISTRIBUTED SIMULATIONS 87

Fig. 2. Experimental Results from Previous Detection Approach, Evidencing a Reactivity Limitation. (a) Increasing Number of Objects. (b) Increasing Numberof Federates. (c) Dynamic Communication Load.

certain communication particularities, which represent the con-siderable communication overhead. However, the analysis maynot identify some particularities. In such an analysis, thecalculation of a common overall high average of federatecommunication rates can hide the overhead caused by somecommunicative federates. As a result, the balancing schememight determine that the system is balanced, but the distributedsystem does not reach reasonable performance improvementeven if migrations are realized to reduce delays.

Therefore, the detection of imbalances and the redistributionof simulation load in the balancing scheme are based on a mech-anism that does not fully identify communicative federates.The mechanism employs standard deviation for calculating thethreshold used to detect federates with high communicationrates. The standard deviation tends to follow the load behaviorof an analyzed data sample, so the use of standard deviationto identify communicative federates impedes the detection ofsome communication imbalances or hides them. Such behavioris observed when the collected data sample is composed of ex-treme values. The extreme communication rates lead to a steepincrease or decrease in the calculation of the mean, augmentingthe standard deviation value. The increase of such a value,caused by the high variation of communication rate, makes thedetection and redistribution algorithms focus on the extremecases and not identify other nonextreme high communicationrates. Consequently, the use of just standard deviation to selectand determine redistribution can hamper the dynamic balancingfor such simulations.

Even though extreme values of communication rates arerare for some data samples, they can be produced frequentlyby distributed elements in certain applications, evidencing acommon data pattern. For instance, in the experiments of theprevious work [4], the simulation scenario was composed ofseveral federates that intensively published objects’ updateswhile a few federates were subscribed to these updates andreceived an overwhelming number of messages. These recipientfederates represented stations that constantly monitored thepositioning of objects and their respective firefighting situationin a given moment. In this scenario case, message recipientspresented the largest communication rates and became thebottleneck in the simulation. These communication rates led thestandard deviation to assume a large interval, which excludedother communicative federates from the redistribution analysis,as shown in Fig. 2. The figure describes the previous results

obtained using the previous redistribution approach [4] in sucha simulation scenario. In such results, the number of migrationsis constantly limited to three in every simulation case. Thesemigrations explicitly depict how the detection of imbalanceswas misled by a few communicative federates, which receivedall the object updates during the simulations and kept an ex-tremely high communication rate. Thus, the standard deviationcannot be applied to every case; other approaches are requiredto determine a general, better suited reallocation of distrib-uted elements to improve the performance of a distributedsimulation.

As mentioned previously, two parts of the balancing al-gorithm are influenced by these extreme communication ratevalues and need to be modified/extended: 1) detection of im-balances; and 2) redistribution of federates. The detection ofimbalances shows which parts of an application/simulation arecausing imbalances and the indication of load changes that areessential to achieve a better execution performance. As a conse-quence, the redistribution determines the required modificationswith the information that is provided by the detection phase.If the detection is not performed properly, the rearrangementof distributed elements is limited. A better approach to detectthe imbalances is needed, so extreme cases are considered butdo not influence the analysis of the rest of the data sample.For the extension, two approaches are proposed: one appliesiterative filtering in the detection/redistribution process and theother employs limiting conditions different from the standarddeviation.

A. Federate Filtering

In this extension, a filtering of federates is performed to spec-ify the analysis of gathered data samples. The communicativefederates that were moved to a more favorable location continueto be monitored by the balancing system. Since such federatesmight present a dynamic communication behavior, they needto be observed constantly; thus, the balancing scheme needs toanalyze all the federates and all the resources to improve thedistribution of simulation parts. However, the consideration ofthis global view in the analysis can mislead the identification ofcommunication imbalances in certain particular cases, as statedpreviously. To avoid this situation, recursive filtering is appliedon the data sample. This filtering increases the threshold’sprecision that determines the communicative federates.

88 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 60, NO. 1, JANUARY 2011

The detection threshold’s precision is influenced by the dis-tances of nodes that are considered in the calculations. A furtheranalysis that performs filtering and/or selection is employedto allow the detection of imbalances belonging to nonextremesamples. Currently in the balancing scheme, the distances ofresources to certain specific locations are employed to deter-mine redistribution of load. In the filtering, the distances areused to incrementally search the federates that cannot benefitfrom the balancing algorithm. Therefore, the recursive filteringdetermines which federates are relevant for the detection ofimbalances and the reallocation of resources. This cumulativefiltering is added to Algorithm 1 and is performed before thesearch for closer resource destinations starts.

In the detection algorithm (together with the load redistribu-tion), the distance of each federate’s resource is used to removea federate from the data sample. The federates that are consid-ered overloaded through an initial communication load analysisare selected for a distance analysis. This distance analysisobserves the proximity of such federates to their communi-cation destination. Because these federates are already in theclosest available topological position, no migration move canbe performed to decrease their communication overhead. Thepresence of such federates can hamper the detection of othercommunicative federates, which can have their communicationlatencies decreased through federate migrations. Consequently,as an extension, if a communicative federate is running on aresource that is located in the closest position to the desti-nation, the federate is discarded from the analysis. With thisprocedure, other federates which have a moderate but relevantcommunication rate are considered in this extended detectionprocess. Thus, more interactive federates can be identified inthe detection phase.

The process of identifying the communicative federates thatare closer to their communication destination is performedrepetitively until the system reaches a defined condition. Thisprocess influences the analysis of the communication balanceof the system, improving the reduction of communication over-head produced by the distributed elements. Generally, wheneverthe search algorithm identifies a communicative federate to beexcluded from the list of federates, a recalculation of mean andits standard deviation is performed on the data sample. With thisrecalculation, new overloaded federates are identified and ana-lyzed to refine the search or to produce migration moves to im-prove the simulation performance. However, this process needsto have a well-defined condition to reach a final deployment offederates. Such a condition is determined by a threshold thatis obtained by summing the average and the standard deviationcalculated after the first federate filtering. Therefore, throughthis approach, the issues caused by the extreme communicationloads are avoided; moreover, this extension of the algorithmproduces a more extensive analysis than when employing justthe standard deviation to classify federates as overloaded.

B. Extensive Federate Analysis

A second extension for the detection/redistribution approachis designed to improve the redistribution of federates by in-creasing the balancing convergence speed. In this extension, the

algorithm checks all the federates by maintaining a conditionwhich enforces certain limitations and which is thereby morecomprehensive than the one stated previously. This conditionis composed of a federate analysis and a resource analysis.The first one provides the number of federates that requiremigration to produce a better simulation performance. The sec-ond analysis determines the availability of resources to receivesuch federates. In this analysis, computation load analysis isemployed as a determinant factor to conduct this extension.As a result, the balancing system redistributes federates whileimbalances exist or migrations are still possible.

Observing the inability to detect communicative federatescaused by the use of standard deviation in the analysis, thesecond extension excludes the utilization of such a value.The limiting condition is determined by the relation betweenthe initial average and the median of the data sample. Theincorporation of the median into the analysis provides a gen-eral description of the communication rate distribution in thecollected data sample. This new analysis shows a generalview of the amount of communicative federates present in asimulation. The analysis disregards the federate’s communi-cation rate, preventing miscalculations. Therefore, the limitingcondition for the communication balancing analysis is denotedby the smallest amount between the two statistical values,min(average,median). Through this approach, the issuescaused by the extreme communication loads are totally avoided.

With this technique, communication balancing evaluates amore ample set of federates, which also includes those witha relatively moderate communication rate. Consequently, thisextended algorithm produces a more extensive analysis thanwhen employing just the standard deviation to classify feder-ates as overloaded. The use of a low-value threshold inducesa selection of a larger number of federate candidates to bechecked as communicative. Generally, the threshold determinesthe analysis of at least half of the data sample to determinemigration moves properly. Through this extensive evaluation,the detection step certifies that communicative federates in cer-tain cases are not excluded from the redistribution procedure.As a consequence of a more extensive analysis, the approachintroduces additional overhead to the balance processing. How-ever, this additional overhead, which is larger than the previousextension approach and the original detection algorithm, is stillnoticeably less than the overhead caused by the evaluation ofan entire distributed simulation. In this extension, the numberof evaluations is limited to half of the simulation federatesif resources are available. In a large-scale overloaded system,this processing reduction becomes crucial to achieve reasonablemeasuring and analysis efficiency.

Allying both federate and resources analysis parameters inthe balancing scheme, this extension generalizes the procedureof detection and redistribution by reallocating federates to avail-able nonoverloaded resources. Basically, the extension of theapproach attempts to assign all the available resources to simu-lation federates to decrease the distance between them and theircommunication destination. These two parameters, the numberof communicative federates and the number of available re-sources, condition the balancing. The detection/redistributionalgorithm finishes when one of these parameters reaches zero.

DE GRANDE et al.: DECREASING LATENCY THROUGH MEASUREMENT FOR DISTRIBUTED SIMULATIONS 89

In this case, either all the resources in a better position are usedfor redistributing the load, or all active federates are reallocatedto a closer position. Therefore, this extension makes the pro-cedure of detection and redistribution more general through acyclic reallocation of shared resources to simulation federates.

Like the previous communication redistribution algorithms,the list of federate candidates is analyzed sequentially, gener-ating migration moves for the most communicative federatesfirst. Communicative federates are assigned to a closer resourceand eliminated from the list. This process is realized whilecloser resources are available. The algorithm accesses the PathDistance structure to retrieve information from the availableresources and updates this structure when a migration assign-ment is determined. For every federate, the algorithm needsto perform a search in a sequential pattern in the structure’srings, always starting the search from the beginning of the PathDistance that contains the list of resources. In this particularcase, the beginning of this structure is the ring closest tothe RTI.

The resource search in this extension is formed in this man-ner because federates are organized by their communicationrate and are definitely assigned to a destination resource afteranalyses of computation load are performed on resources in therings. Since the computation load of resources is used in thedecision-making part of the redistribution and is not employedto organize the list of federates, resources can be assigned toa lager number of communicative federates. Thus, with thisapproach, even if there is no considerable communicative feder-ates in the data sample, federates are still assigned to resources.The technique attempts to move any federate that presentsa higher communication rate to decrease the communicationdelays and to balance distributed simulation.

V. EXPERIMENTS

To identify the efficiency improvement of the proposed ex-tension to the dynamic balancing scheme, three experimentshave been realized. These experiments are similar to those ac-complished in [4]. The experiments showed that the extensioncould decrease the communication latencies, improving thesimulation’s performance even more. Such experiments weresupported by a large-scale distributed environment composedof two computing clusters connected by a fast-Ethernet networklink. One cluster, a Dell cluster, consists of 24 nodes, while theother one, an IBM cluster, is composed of 32 nodes. In the Dellcluster, each node comprised a Quadicore 2.40GHz Intel(R)Xeon(R) CPU and 8 Gb of DIMM DDR RAM memory. Allthe Dell nodes were interconnected through a Myrinet opticalnetwork that allowed data transmission up to 2 Gb/s. In theIBM cluster, each node consisted of a Core 2 Duo 3.4 GHzIntel(R) Xeon(R) CPU and 2 Gb of DIMM DDR RAM. Agigabit Ethernet network connected the IBM cluster’s nodes.The fast-Ethernet link connecting both clusters cannot presentall the aspects originated from the Internet; nevertheless, thelink is introduced to produce a communication bottleneck forsimulations, avoiding unexpected external influences and ev-idencing the need for balancing. Moreover, Linux operatingsystems have been installed in both clusters. To support all

the experimental environments, the HLA platform with an RTIversion 1.3 and the Globus Toolkit 4.2.1 have been used in thesimulations.

In the experimental virtual simulations, the simulation fed-erates were evenly placed on the 55 shared nodes, and theHLA RTI executive was deployed on a dedicated node. For thebalancing system, the LMAs were placed on all of the clusternodes except the ones that were designated to the RTI and theGMs. There were two GMs, and each one was placed in onenode of each cluster, considering that one GM was the root.Furthermore, in the simulation scenario, federates simulated anemergency preparedness scenario of a firefighting situation. Inthe simulation, federates coordinated the actions to update theinformation of objects that represented firefighters, fire focuses,and buildings in a 2-D routing space in time-stepped virtualsimulations. Such federates updated the information of theirobjects that were published and subscribed to interest spaces.The simulations were composed of 500 federates that managed1 to 1000 objects in 100 time steps. To produce controlled,differentiated communication latencies in the simulations, cer-tain federates performed the publication of special objects thatgenerated a large communication overhead.

The first experiment measured the performance gain that thebalancing system provided to an HLA virtual simulation. Inthis experiment, the dynamic balancing scheme that was pro-posed previously was compared with the two aforementionedextensions. As depicted in the graph in Fig. 3, an increasingcommunication overhead was applied in a small set of simula-tion federates-a federate that publishes special object updates,and three other federates that subscribe to them. The increaseof communication load reflected in a proportional delay thesimulations’ execution time. A noticeable performance gain canbe observed in the graph when comparing the baseline, whichis not balanced, with the other balanced cases. When observingjust the curves of the three balancing schemes, the normalbalancing system shows a slightly smaller performance gain inFig. 3(a), but when comparing the number of migrations amongthe schemes, the extension that performs extensive federateanalysis presents a high number of migrations. By crossingthe information between the graphs in the figure, the resultsevidence that the extensions are not more efficient than thenormal balancing scheme for simulations with a few commu-nicative federates, though performing more modifications in thesimulation distribution.

In the second experiment, the responsiveness of all the bal-ancing schemes is observed and compared when an increasingcommunication overhead is introduced to the overall distributedsimulation. In the experiment, 100 special objects were as-signed among 40 federates. The curves in the graph of Fig. 4(a)show that all the schemes reduced the communication overheadconsiderably by detecting latencies and reacting properly tothem. Nevertheless, the curve that corresponds to normal bal-ancing presented a performance decrease when compared withthe curves of the extensions. When observing the number ofmigrations in Fig. 4(b), the performance difference originatesfrom the decrease in the number of migrations that occurredsimultaneously with the increase of execution time in Fig. 4(a).Such behavior occurred due to the inability of the normal

90 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 60, NO. 1, JANUARY 2011

Fig. 3. Communication Balancing Gain for an Increasing Communication Load. (a) Performance Comparison. (b) Comparison of Number of Migrations.

Fig. 4. Communication Balancing with an Increasing Number of Communicative Federates. (a) Performance Comparison. (b) Comparison of Number ofMigrations.

balancing to react properly when extreme communication ratesare present in the collected data sample. This limitation does notexist in both extensions because of their more detailed analysisand extra filtering.

Finally, the third experiment measured the capacity of thebalancing systems to detect and react to dynamic changes ofcommunication load. This experiment is similar to the previousone, but its communication load changed dynamically instead.The federates that were selected to present communicationoverhead changed their load randomly during the executiontime of a simulation, producing a unpredictable behavior. Inthis experiment, 1 to 60 federates in the simulations produceda random amount of object updates that ranged from 1 to 100.As shown in Fig. 5(a), the balancing schemes presented a largeperformance gain when compared to the baseline, and all theircurves showed similar performance improvement; however,by observing the curves in more detail, the normal balancingscheme shows a performance decrease between 30 and 50 fed-erates when compared with only the extensions. Such behavioris caused by the overall communication overhead in the systemthat was not detected by the normal balancing scheme, whichemploys standard deviation to detect imbalances. The samebehavior is observed in Fig. 5(b) when the curve of the normal

balancing scheme is compared with the curve of the balancingwith extra resource filtering. This particular behavior is causedby the communication load produced in the experiments. Withload less than 30 federates, the communication rates were notas high as needed to mislead the normal balancing. Between30 and 50 federates, the communication load was enough toproduce a few extreme communication rates, which influencedthe normal balancing but not the extensions. With a load ofmore than 50 federates, the larger number of communicativefederates (less load variations in the data sample) caused thenormal balancing to have a narrower standard deviation, in-ducing a better detection of communication imbalances. Thus,through this experiment, it was noticed that the extensionsproved to be more effective than the old balancing scheme insome cases.

VI. CONCLUSION

In this paper, a dynamic balancing scheme for decreas-ing communication latencies from HLA-RTI-based virtualsimulations was analyzed and compared with two proposedextensions. Based on a multilayered hierarchical structure, thescheme is divided in three interdependent sequential phases;

DE GRANDE et al.: DECREASING LATENCY THROUGH MEASUREMENT FOR DISTRIBUTED SIMULATIONS 91

Fig. 5. Communication Balancing with Dynamic Communication Load. (a) Performance Comparison. (b) Comparison of Number of Migrations.

these are to detect communication imbalances, redistributesimulation federates according to communication latencies, andtransfer federates between shared resources. These three phasesbasically consist of constantly measuring the communicationrate of distributed virtual simulations and analyzing load distri-bution to determine corrections for imbalances. Moreover, Gridservices are accessed by the balancing system to add reliabilityand security to the system when measuring the computationload of shared resources for redistribution decision making andperforming data transfers for migrations.

The experiments proved the efficiency of the balanc-ing scheme in detecting and reacting properly to dynamiccommunication rate changes. Both extensions improved thebalancing scheme, enabling it to more effectively detect com-munication load changes and to produce a rearrangement offederates that was able to decrease the simulation executiontime. Even though the gains that the extensions providedwere not substantially large when compared to the normalbalancing scheme, the extensions were more efficient in bal-ancing the simulations in specific situations. These specificsituations caused the detection step of the normal balanc-ing to fail. Moreover, a particular case was observed in theexperiments when comparing results of both extensions; theextensive analysis showed the same/similar performance gainin all experiments, but the number of migrations that it pro-duced to balance simulations was considerably higher thanother schemes. This particular situation happened due to thetradeoff between migration latency and responsiveness. Theextension performed many migrations, but the migration laten-cies consumed the performance gains. Thus, as future work,aiming at exploring the tradeoff between migration latenciesand performance gain, further experimental analysis will berealized to produce a balancing scheme that can produce abetter balance with less migrations. Another aspect to be ex-plored is the addition of a computation load balancing systemto the scheme, so the system will be able to react to dynamicsimulation load changes. To provide a more realistic balancingscheme for large-scale simulations, communication delays willbe analyzed by considering balancing overhead and perform-ance gain.

REFERENCES

[1] S.I.S.C. (SISC), IEEE standard for modeling and simulation (M&S) highlevel architecture (HLA) framework and rules, IEEE Computer Society,Sep. 2000.

[2] I. Foster, C. Kesselman, and S. Tuecke, “The anatomy of the grid:Enabling scalable virtual organizations,” Int. J. High Perform. Comput.Appl., vol. 15, no. 3, pp. 200–222, 2001.

[3] Globus, Feb. 7, 2008. [Online]. Available: http://www.globus.org/[4] R. E. De Grande and A. Boukerche, “Dynamic partitioning of distributed

virtual simulations for reducing communication load,” in Proc. IEEE Int.Workshop HAVE, 2009, pp. 176–181.

[5] L. F. Wilson and W. Shen, “Experiments in load migration and dy-namic load balancing in SPEEDES,” in Proc. Winter Simul. Conf., 1998,pp. 483–490.

[6] E. Deelman and B. K. Szymanski, “Dynamic load balancing in paralleldiscrete event simulation for spatially explicit problems,” in Proc. Work-shop Parallel Distrib. Simul., 1998, pp. 46–53.

[7] A. Boukerche and S. K. Das, “Dynamic load balancing strategies forconservative parallel simulations,” in Proc. Workshop Parallel Distrib.Simul., 1997, pp. 32–37.

[8] J. Luthi and S. Grossmann, “The resource sharing system: Dynamic fed-erate mapping for HLA-based distributed simulation,” in Proc. WorkshopParallel Distrib. Simul., 2001, pp. 91–98.

[9] W. Cai, S. J. Turner, and H. Zhao, “A load management system for runningHLA-based distributed simulations over the grid,” in Proc. Int. WorkshopDistrib. Simul. Real-Time Appl., 2002, pp. 7–14.

[10] K. Zajac, M. Bubak, M. Malawski, and P. Sloot, “Towards a grid manage-ment system for HLA-based interactive simulations,” in Proc. Int. Symp.Distrib. Simul. Real-Time Appl., 2003, pp. 4–11.

[11] H. Avril and C. Tropper, “The dynamic load balancing of clustered timewarp for logic simulation,” in Proc. Workshop Parallel Distrib. Simul.,1996, pp. 20–27.

[12] D. W. Glazer and C. Tropper, “On process migration and load balancing intime warp,” IEEE Trans. Parallel Distrib. Syst., vol. 4, no. 3, pp. 318–327,Mar. 1993.

[13] C. Burdorf and J. Marti, “Load balancing strategies for time warpon multi-user workstations,” Comput. J., vol. 36, no. 2, pp. 168–176,1993.

[14] C. D. Carothers and R. M. Fujimoto, “Background execution oftime warp programs,” in Proc. Workshop Parallel Distrib. Simul., 1996,pp. 12–19.

[15] G. Tan and K. C. Lim, “Load distribution services in HLA,” in Proc. IEEEDistrib. Simul. Real-Time Appl., 2004, pp. 133–141.

[16] B. P. Gan, Y. H. Low, S. Jain, S. J. Turner, W. Cai, W. J. Hsu, andS. Y. Huang, “Load balancing for conservative simulation on shared mem-ory multiprocessor systems,” in Proc. Workshop Parallel Distrib. Simul.,2000, pp. 139–146.

[17] M. Y. H. Low, “Dynamic load-balancing for BSP time warp,” in Proc.Annu. Simul. Symp., 2002, pp. 267–274.

[18] R. Schlagenhaft, M. Ruhwandl, C. Sporrer, and H. Bauer, “Dynamic loadbalancing of a multi-cluster simulator on a network of workstations,” inProc. Workshop Parallel Distrib. Simul., 1995, pp. 175–180.

92 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 60, NO. 1, JANUARY 2011

[19] M. Choe and C. Tropper, “On learning algorithms and balancingloads in time warp,” in Proc. Workshop Parallel Distrib. Simul., 1999,pp. 101–108.

[20] J. Jiang, R. Anane, and G. Theodoropoulos, “Load balancing in distributedsimulations on the grid,” in Proc. Int. Conf. Syst., Man, Cybern., 2004,pp. 3232–3238.

[21] P. Peschlow, H. Honecker, and P. Martini, “A flexible dynamic partition-ing algorithm for optimistic distributed simulation,” in Proc. WorkshopParallel Distrib. Simul., 2007, pp. 219–228.

[22] Z. Xiao, B. Unger, R. Simmonds, and J. Cleary, “Scheduling critical chan-nels in conservative parallel discrete event simulation,” in Proc. WorkshopParallel Distrib. Simul., 1999, pp. 20–28.

[23] A. Boukerche and C. Tropper, “A static partitioning and mapping algo-rithm for conservative parallel simulations,” in Proc. Workshop ParallelDistrib. Simul., 1994, pp. 164–172.

[24] A. Boukerche, “An adaptive partitioning algorithm for conservative par-allel simulation,” in Proc. Int. Parallel Distrib. Process. Symp., 2001,pp. 133–138.

[25] E. E. Ajaltouni, A. Boukerche, and M. Zhang, “An efficient dynamic loadbalancing scheme for distributed simulations on a grid infrastructure,” inProc. Int. Symp. Distrib. Simul. Real-Time Appl., 2008, pp. 61–68.

[26] L. Bononi, M. Bracuto, G. D’Angelo, and L. Donatiello, “An adaptiveload balancing middleware for distributed simulation,” in Proc. WOMP,2006, pp. 864–872.

[27] Gt Information Services: Monitoring & Discovery System (MDS), Feb. 7,2008. [Online]. Available: http://www.globus.org/toolkit/mds

[28] A. Boukerche and R. E. De Grande, “Dynamic load balancing using gridservices for hla-based simulations on large-scale distributed systems,” inProc. Int. Symp. Distrib. Simul. Real-Time Appl., 2009, pp. 175–183.

[29] A. Boukerche and R. E. De Grande, “Optimized federate migration forlarge-scale hla-based simulations,” in Proc. Int. Symp. Distrib. Simul.Real-Time Appl., 2008, pp. 227–235.

[30] Z. Li, W. Cai, S. J. Turner, and K. Pan, “Federate migration in a serviceoriented hla rti,” in Proc. Int. Symp. Distrib. Simul., Real-Time Appl.,2007, pp. 113–121.

Robson Eduardo De Grande received the B.Sc. andM.Sc. degrees in computer science from the FederalUniversity of Sao Carlos, São Carlos, SP, Brazil, in2004 and 2006, respectively. He is currently workingtoward the Ph.D. degree in the PARADISE ResearchLaboratory, University of Ottawa (uOttawa), Ottawa,ON, Canada.

His research interests include load balancing, dis-tributed systems, large-scale distributed simulations,and performance evaluation.

Azzedine Boukerche is a full Professor and holdsa Canada Research Chair position at the Univer-sity of Ottawa (uOttawa), Ottawa, ON, Canada, andhe is a visiting Professor at King Saud University,Riyadh, Saudi Arabia. He is a Fellow of the CanadianAcademy of Engineering and the Founding Directorof the PARADISE Research Laboratory, School ofInformation Technology and Engineering (SITE),Ottawa. Prior to this, he held a faculty position at theUniversity of North Texas, Denton, TX, and he was aSenior Scientist at the Simulation Sciences Division,

Metron Corporation, San Diego, CA. He was also employed as a FacultyMember in the School of Computer Science, McGill University, Montreal, QC,Canada, and taught at the Polytechnic of Montreal, Montreal, QC, Canada. Hespent a year at the JPL/NASA, California Institute of Technology, Pasadena,CA, where he contributed to a project centered about the specification andverification of the software used to control interplanetary spacecraft operated byJPL/NASA Laboratory. He has published several research papers in these areas.His current research interests include wireless ad hoc and sensor networks,wireless networks, mobile and pervasive computing, wireless multimedia,QoS service provisioning, performance evaluation and modeling of large-scaledistributed systems, distributed computing, large-scale distributed interactivesimulation, and parallel discrete-event simulation.

He served as a Guest Editor for the Journal of Parallel and Distributed Com-puting (special issue for routing for mobile ad hoc, special issue for wirelesscommunication and mobile computing, and special issue for mobile ad hocnetworking and computing), ACM/Kluwer Wireless Networks, ACM/KluwerMobile Networks Applications, and the Journal of Wireless CommunicationandMobile Computing. He served as an Associate Editor of IEEE TRANSACTIONS

ON PARALLEL AND DISTRIBUTED SYSTEMS, IEEE TRANSACTIONS ON

VEHICULAR TECHNOLOGY, Elsevier Ad Hoc Networks, Wiley InternationalJournal of Wireless Communication and Mobile Computing, Wiley’s Securityand Communication Network Journal, Elsevier Pervasive and Mobile Com-puting Journal, IEEE WIRELESS COMMUNICATION MAGAZINE, Elsevier’sJournal of Parallel and Distributed Computing, and SCS Transactions onSimulation. He is the Cofounder of the QShine International Conferenceon Quality of Service for Wireless/Wired Heterogeneous Networks (QShine2004). He served as the General Chair for the Eighth ACM/IEEE Symposiumon Modeling, Analysis, and Simulation of Wireless and Mobile Systems, andthe Ninth ACM/IEEE Symposium on Distributed Simulation and Real-TimeApplication (DS-RT), as General Chair for the IEEE WOWMOM 2010, theProgram Chair for the ACM Symposium on QoS and Security for Wireless andMobile Networks, ACM/IFIPS Europar 2002 Conference, IEEE/SCS AnnualSimulation Symposium (ANNS 2002), ACM WWW 2002, IEEE MWCN2002, IEEE/ACM MASCOTS 2002, IEEE Wireless Local Networks WLN03-04; IEEE WMAN 04-05, and ACM MSWiM 98-99, and a TPC memberof numerous IEEE- and ACM-sponsored conferences. He served as the ViceGeneral Chair for the Third IEEE Distributed Computing for Sensor Networks(DCOSS) Conference in 2007, as the Program Cochair for GLOBECOM2007–2008 Symposium on Wireless Ad Hoc and Sensor Networks, and forthe 14th IEEE ISCC 2009 Symposium on Computer and CommmunicationSymposium, and as the Finance Chair for ACM Multimedia 2008. He alsoserves as a Steering Committee Chair for the ACM Modeling, Analysis, andSimulation for Wireless and Mobile Systems Conference, the ACM Symposiumon Performance Evaluation of Wireless Ad Hoc, Sensor, and UbiquitousNetworks, and IEEE/ACM DS-RT. He was the Recipient of the Best ResearchPaper Award at IEEE/ACM PADS 1997, ACM MobiWac 2006, ICC 2008, ICC2009, and IWCMC 2009, and the Recipient of the Third National Award forTelecommunication Software in 1999 for his work on a distributed securitysystems on mobile phone operations. He has been nominated for the BestPaper Award at the IEEE/ACM PADS 1999 and ACM MSWiM 2001. He isa Recipient of an Ontario Early Research Excellence Award (previously knownas Premier of Ontario Research Excellence Award), Ontario DistinguishedResearcher Award, and Glinski Research Excellence Award.

Hussam Mohamed Soliman Ramadan received thePh.D. degree in computer science and engineeringfrom the University of Louisville, Luisville, KY, in1995, and the B.S. degree in electrical engineeringfrom George Washington University, Washington,DC, in 1988.

He is a Professor in the Department of Informa-tion Systems, College of Computer and InformationSciences at King Saud University, Riyadh, SaudiArabia. He is currently the Dean of the Collegeof Computer and Information Sciences, King Saud

University. His research interest is in the areas of parallel and distributedsimulation, distributed computing, modeling and simulation methodologies,and wireless networks.