Design and Performance Analysis of a Message Scheduling Scheme for WLAN-Based Cluster Computing

9
Design and Performance Analysis of a Message Scheduling Scheme for WLAN-Based Cluster Computing Junghoon Lee 1 , Mikyung Kang 1 , Euiyoung Kang 2 , Gyungleen Park 1 , Hanil Kim 2 , Cheolmin Kim 2 , Seongbaeg Kim 2 , and Jiman Hong 3, 1 Dept. of Computer Science and Statistics, Cheju National Univ 2 Dept. of Computer Education, Cheju National Univ 3 School of Computer Science and Engineering, Kwangwoon Univ., 690-756, Jeju Do, Republic of Korea {jhlee, mkkang, glpark, hikim, cmkim, webkey, sbkim}@cheju.ac.kr, [email protected] Abstract. This paper proposes and analyzes the performance of a task scheduling scheme that couples network schedule for master-slave style parallel computing cluster built on top of wireless local area networks. When a transmission fails, the proposed scheme selects another data and destination that can replace the subtask scheduled on the unreachable node with minimal cost, rather than hopelessly retransmits on the bad channel. Simulation results performed via ns-2 event scheduler, show that the proposed scheme minimizes the task migration, improves the compu- tation time by maximally 35.4 %, increases the probability of successful retransmission, and finally survives the network failure as long as at least one node is reachable at each instance of time. 1 Introduction Networks of standard workstations provide an attractive scalability in terms of computation power and low cost. It is becoming strongly competitive compared with expensive parallel machines[1]. Cluster as well as grid computing is a new computation idea that takes advantage of independent and different resources to build an unified resource for massive computation, distributed computation, data storing, and so on. While the fixed networks of computers constitute the lowest cost as well as the most available parallel computer, the proliferation of wireless devices such as PDA, telematics, and so on, allows to expand the parallel virtual machine. Recent advances in wireless communication technology are making WLAN (Wireless Local Area Network) an appealing transmission media for parallel and distributed computing on networked computers[2]. This research was supported by the MIC, Korea, under the ITRC support program supervised by the IITA. Corresponding author. M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3983, pp. 558–566, 2006. c Springer-Verlag Berlin Heidelberg 2006

Transcript of Design and Performance Analysis of a Message Scheduling Scheme for WLAN-Based Cluster Computing

Design and Performance Analysis of a MessageScheduling Scheme for WLAN-Based

Cluster Computing�

Junghoon Lee1, Mikyung Kang1, Euiyoung Kang2, Gyungleen Park1,Hanil Kim2, Cheolmin Kim2, Seongbaeg Kim2, and Jiman Hong3,��

1 Dept. of Computer Science and Statistics, Cheju National Univ2 Dept. of Computer Education, Cheju National Univ

3 School of Computer Science and Engineering, Kwangwoon Univ.,690-756, Jeju Do, Republic of Korea

{jhlee, mkkang, glpark, hikim, cmkim, webkey, sbkim}@cheju.ac.kr,[email protected]

Abstract. This paper proposes and analyzes the performance of a taskscheduling scheme that couples network schedule for master-slave styleparallel computing cluster built on top of wireless local area networks.When a transmission fails, the proposed scheme selects another data anddestination that can replace the subtask scheduled on the unreachablenode with minimal cost, rather than hopelessly retransmits on the badchannel. Simulation results performed via ns-2 event scheduler, show thatthe proposed scheme minimizes the task migration, improves the compu-tation time by maximally 35.4 %, increases the probability of successfulretransmission, and finally survives the network failure as long as at leastone node is reachable at each instance of time.

1 Introduction

Networks of standard workstations provide an attractive scalability in terms ofcomputation power and low cost. It is becoming strongly competitive comparedwith expensive parallel machines[1]. Cluster as well as grid computing is a newcomputation idea that takes advantage of independent and different resourcesto build an unified resource for massive computation, distributed computation,data storing, and so on. While the fixed networks of computers constitute thelowest cost as well as the most available parallel computer, the proliferationof wireless devices such as PDA, telematics, and so on, allows to expand theparallel virtual machine. Recent advances in wireless communication technologyare making WLAN (Wireless Local Area Network) an appealing transmissionmedia for parallel and distributed computing on networked computers[2].

� This research was supported by the MIC, Korea, under the ITRC support programsupervised by the IITA.

�� Corresponding author.

M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3983, pp. 558–566, 2006.c© Springer-Verlag Berlin Heidelberg 2006

Design and Performance Analysis of a Message Scheduling Scheme 559

The use of MPI (Message Passing Interface), PVM (Parallel Virtual Machine),or other variants can work in wireless environment because they are built on topof TCP/IP and therefore the physical medium does not impose any restriction[3].However, it is not clear how well this mechanism fits for wireless networks, sincewireless channels are subject to unpredictable location-dependent and bursty er-rors. A parallel application may fail altogether if the wireless connection staysdown too long. This problem fatally affects MPI parallel programs because thedefault behavior in case of network failure is the immediate termination of appli-cation. Though such a problem can be somehow relieved using a dynamic processmigration functionality of MPI or other similar mechanisms, corresponding over-head and waste of bandwidth are significant, as an intermediate processing isdiscarded and a new task is created at some other node.

To optimize the computing speed, the number of such migrations should bekept as small as possible. However, considering bursty and unpredictable natureof wireless channel errors, immediate retransmission does not seem to be ap-propriate, as the subsequent retransmissions to the original destination may failrepeatedly. To solve this problem, this paper obviates hopeless retransmissionsbased on the main idea that the undelivered message is not necessarily retrans-mitted to the current destination. Namely, when a node becomes unreachable,the master should decide a new node that can take the task originally assignedto the unreachable node. Consider a process that needs two parameters, A andB, and node 0 has A while node 1 has B, respectively. If master fails to transmitB to node 0, due to bad channel condition, it is better to try to send A to node1 rather than retransmit B to node 0.

The rest of this paper is organized as follows: Section 2 describes the back-ground of this paper, including related works on cluster computing on WLAN aswell as IEEE 802.11 WLAN standard itself. Then Section 3 proposes a messagescheduling scheme on WLAN. After demonstrating the performance measure-ment results obtained via simulation using ns-2 in Section 4, Section 5 finallyconcludes this paper with a brief summarization and the description of futureworks.

2 Background

2.1 Related Works

MPI provides a fault tolerant mechanism that can cope with dynamic changesresulted from network errors. This scheme consists of spawning slave processesone by one as new portable or fixed nodes become available and creating anindependent intercommunicator for each master and slave process. Additionally,FT-MPI is a fault-tolerant MPI implementation that includes some dynamicprocess management functionalities[4]. It lets the communicators to be in anintermediate state and they can be rebuilt so the application can recover froma fail. FT-MPI survives the crash of (n − 1) processes in a n-process job, and, ifrequired, can respawn/restart them. However, it is still the responsibility of theapplication to recover the data-structures and the data on the crashed processes.

560 J. Lee et al.

LAMGAC is at first a library that makes it easy to program a MPI parallelapplication in a LAN-WLAN cluster where the parallel virtual machine canchange during process execution[5]. LAMGAC is extended to detect temporaryor permanent disconnections of the wireless channel by implementing a newfunction named as LAMGAC Fault detection[3]. This function is invoked bythe master node whenever it needs to check if there is a physical connectionto one slave process and therefore guarantee a successful message interchangebetween them. Standard ping application enables the library to determine whichslave node is reachable and then return this information back to the server. Themaster executes the attachment and detachment protocol and master can spawnslave processes creating a new intercommunicator for each spawning.

Legion considers wireless and mobile devices as significant computation re-sources for the Grid[1]. Legion’s object architecture makes the system capableof dealing with the high degree of intermittent connectivity associated with mo-bile and wireless devices. It implemented a dynamic Resource Adaptation Layerthat will query and modify the characteristics of the system according to chang-ing user and device needs. This function supports the ability to select a minimalsubset, which is crucial for such small-memory mobile and wireless devices. How-ever, the above mechanisms are based on the end-to-end error recovery as wellas process migration.

2.2 IEEE 802.11 WLAN

The IEEE 802.11 was developed as a MAC standard for WLAN[6], and weexploited a computing architecture of LAMGAC as shown in Fig. 1. It considersmaster-slave parallel application in which for every iteration the processes mustsynchronize to interchange data. The master is in the access node (AN) andthe slaves are in the member node (MN). MN can be mobile or fixed nodeconnected via WLAN. Based on the infrastructure WLAN where each MN onlycommunicates with AN, the master distributes the task. However, even in thead hoc mode, we can designate a node to play a role of AN to schedule thetransmission. For downlink channel (AN to MN), AN schedules those packetsthat are generated at AN or arrived from another cluster, while for uplink channel(MN to AN), AN sequentially polls each node according to a specific schedule[7].

The 802.11 radio channel is modeled as a Gilbert channel[8]. In this model, pdenotes the transition probability from state good to state bad while q the prob-ability from state bad to state good[8]. The pair of p and q represents a range of

MN

MN

MN

Wired link

MN

802.11b WLAN

ANWired network

Fig. 1. WLAN-based computing architecture

Design and Performance Analysis of a Message Scheduling Scheme 561

channel conditions, and it has been typically obtained by using the trace-basedchannel estimation. The average error probability, denoted by ε, and the averagelength of a burst of errors are derived as p

p+q and 1q , respectively. The packet

is received correctly if the channel is in state good for the whole duration ofpacket transmission, otherwise, it is received in error. In addition, according toWLAN standard, poll, transmission, and acknowledgment are atomic, namely,these steps must complete in their entirety to be successful. Senders expect ac-knowledgment for each transmitted frame and are responsible for retrying thetransmission. After all, error detection and recovery is up to the sender station,as positive acknowledgments are the only indication of success. If an acknowl-edgment is expected but does not arrive, the sender considers the transmissionfailed.

3 Network Schedule for Cluster Computing

3.1 Channel Estimation

We take the estimation method from Bottiglieno’s work[9]. To trace the channelstatus, AN maintains a state machine associated with each member node. Thestate can be either good or bad according to the channel condition. The channelcondition is estimated as follows: The ACK/NAK is sent from the receiver toAN as soon as it receives a packet. If the AN does not receive an ACK/NAKwithin predefined time-out interval, the packet will be assumed to be lost. Thenthe state triggers to bad. AN sets the state to good whenever it receives from thecorresponding node, namely, a MAC-layer acknowledgment in response to a dataframe, a CTS frame in response to a RTS frame, or any other error-free frame.The AN sets the state to bad after a transmission failure. Each bad channel hasits own counter, and when a counter expires, the AN attempts to send a singledata frame to check the channel status. The duration of timer is reset to itsinitial value upon a transmission from bad to good, and the value is doubledwhenever the probing fails in bad state. The value of timer should be set smallso as to quickly recover from short channel error period. This probing procedureis carried out via currently idle link, either uplink or downlink.

3.2 Description of Basic Idea

The ultimate goal is to guarantee the successful completion of the parallel pro-gram even in the presence of wireless link failures. We use a WLAN as a un-derlying network, where master process runs on AN, while slaves on membernodes. A task is typically divided into several subtasks, each of them is assignedto other slave process, and then the partial results are integrated to produce afinal result. The master process is in charge of the dynamic data distributionto the remainder processes of the parallel program. In this procedure, transmis-sion of argument and result impacts the efficiency of parallel application. Thenode set may change dynamically in runtime as a new node can be added or anexisting node can be detached from the working group. The network schedule

562 J. Lee et al.

DoComp

Put

Put&Comp

Command Arg Subtask

Fig. 2. Command set

totally depends on AN which polls for upstream schedule and decides the trans-mission order for downstream[10]. Each node only communicates with AN andits channel is in one of two states, namely, good state or bad state.

All slaves have the program code for the given task as in the MPI program-ming model, and master node can initiate their executions by downloading rele-vant parameters via downlink channel. Slaves iteratively wait for the commandfrom the master on the downlink channel and perform the command. After thecompletion of subtask, the slave returns the result to master via uplink chan-nel. AN schedules the transmission of uplink channel by a polling mechanism.Fig. 2 shows the command set defined for the proposed computing scheme. Thecommand set consists of DoComp, Put&Comp, and Put, and each commandincludes Arg and Subtask fields. Arg can include the necessary parameter toperform a specified mission in Subtask field. DoComp is used when a node al-ready has enough parameters to compute a specific subtask, thus it only specifiessubtask without any argument. Put&Comp makes to perform a subtask and withthe parameter enclosed in the command as well as the one it already holds. Putis the command that is used when every available node has insufficient data tocomplete a subtask. Even with the parameter enclosed in this command, thereceiver does not have enough data to fulfill any subtask. Thus Subtask fieldcontains nothing. Generally, Put&Comp follows the Put command.

As the master works on AN, the network schedule is decided from the compu-tation schedule, while the computation schedule takes into account the informa-tion including whether a node is currently busy or not and whether its channelis in good state or bad state. With these data, the task scheduler on AN, decidesthe command sequence and delivery schedule as follows:

For a node in good channel status and not computing,Rule 1. If Ni has enough data to perform a subtask Sj , send to Ni {DoComp,–, Sj }. If transmission succeeds, mark Sj as InProgress and Ni as ComputingSj . If there is no candidate, go to step 2.Rule 2. If Ni can perform a subtask, Sj , which is not started, with a dataAk, send to Ni {Put&Comp, Ak, Sj}. On successful transmission, mark Sj asInProgress and Ni as computing Sj and also mark Ni has Ak. If there is nocandidate, go to step 3.Rule 3. Select the parameter that is required to proceed the subtask in Not-Started. If transmission succeeds, mark Ni has Ak.

Design and Performance Analysis of a Message Scheduling Scheme 563

In addition to such rules, task scheduler performs following management func-tions. Namely, It checks if all subtasks are completed. If so, the scheduler final-izes the job and merges the results. In addition, if the status of a node markedas executing Sj triggers to bad, save the current status and set the status asNotStarted. When the channel gets back to good status, restore the saved sta-tus, that is, the subtask it was executing. Notice that even if a node temporarilydisconnected, its computation goes on.

3.3 Example

We will show a transmission scenario generated by the proposed task schedulingscheme for the typical matrix multiplication with 3 nodes. As shown in theFig. 3(a), the problem is divided into 4 multiplications of submatrices, namely,A1·B1, A1·B2, A2·B1, and A2·B2. For simplicity, we assume that each submatrixis as large as a slot time that corresponds to the transmission of a single packet.In addition, the computation time is an integer multiple of slot time, say 4 slots.Task scheduler decides the submatrix to send at each start of slot. Notice thatthe DoComp command does not need argument, so the transmission time issmall enough to be ignored.

B1

B2

C11

C21

C12

C22A1 A2 * =

(a) Matrix partition

C11 C11 C11 C11 C12 C12 C12 C12

C22 C22 C22 C22

C21 C21 C21 C21

C21 C21

1 (0) {Put, A1, −−}

3 (1) {Put, A2, −−}4 (1) {Put&Comp, B1,C21}5 (2) {Put, B2, −−}6 (2) {Put&Comp, A1, C12}7 (0) {Put&Comp, B2, C12}8 (1) {Put&Comp, B2, C22}9 (2) {Put&Comp, A2, C22}

13 (0) {Put&Comp, A2, C21}

10 (1) {Put&Comp, B1, C21}

2 (0) {Put&Comp, B1, C11}

(b) Command sequence

Node 0

Node 1

Node 2

A1 B1

A2 B1

B2 A1

B2

B2 B1

A2

A2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

(c) Time axis of computation (downlink) Time

Fig. 3. Example of network scheduling

In Fig. 3(b), at time 1, all nodes do not have any argument, but also are notcomputing. The task scheduler chooses to send A1 to node 0 according to Rule3. At time 2, by Rule 2, the scheduler selects {Put&Comp, B1, C11} as node 0can compute C11 if it gets B1. From time 4, node 2 triggers to bad channel, sothe transmission of {Put&Comp, B1, C21} fails. The transmission to node 2 attime 6 also fails due to the same reason. At time 8, node 1 is selected to compute

564 J. Lee et al.

Node 1

Node 2

Bad

Good

Computing

Task Status

C11 Completed

C12

C21

C22

InProgress

Node 0 (C12)

Node Status

A1 B1 B2

A2

B2 NotStarted

NotStarted

Fig. 4. Snapshot of data structure at time 9.0

C22 after receiving B2, however, the transmission fails. C22 can be calculatedat node 2 with A2, so the corresponding command is sent to node 2, withouttrying to retransmit B2 to node 1 in contrast to MPI approach.

Fig. 4 shows the snapshot of data structure at time 9, where the only candidateis Node 2, as Node 0 and 1 are computing and in bad status, respectively. ByRule 2, as C22 is not yet started and Node 2 has B2, the task scheduler selectsaction as {Put&Comp, A2, C22}. Finally, at time 13, the scheduler redundantlysend A2 to node 0 to compute C21 that is also being computed in node 1. This isbecause node 1 goes to bad channel status at time 13, and thus the task status ofC21 changes from InProgress to NotStarted. If the channel recovers, the resultis safely sent to the master process, finalizing the whole computation procedure.Otherwise, newly started calculation replaces the original one.

4 Performance Analysis

The simulation is performed using ns-2 event scheduler that enables to con-struct a flexible event-driven simulation environment[11]. In this experiment,each member node is assumed to have equal computing power, as we are mainlyconcerned on the efficiency of network schedule for parallel computing in wirelesschannel. The simulation revisits the matrix multiplication for the target paral-lel application. The performance measurement compares the execution time ofour scheme with that of the traditional MPI according to the number of nodesand channel error rate, respectively. Both schemes divide the given tasks intoA1·B1, A1·B2, A2·B1, and A2·B2, each of them is performed at each node. Foreach experiment, the simulation runs 20 times and then the execution times areaveraged to achieve the final result.

Fig. 5 plots the execution time according to the number of nodes. In thisexperiment, the message error rate is set to 0.25 and the computation time is3 slots. When the number of nodes is equal to or greater than 6, the perfor-mance of parallel computing remains constant, since the computing resourcesare sufficiently available. This situation is same on the traditional scheme. Onsmall number of nodes, the effect of channel error increases, as the task schedulercannot find the appropriate action when all channels stay down or are alreadycomputing assigned task. The figure also plots the execution time of ideal casewhen all channels stay good during the entire computation process. As the com-putation time is 3 slots, the completion times of error-free case are all same when

Design and Performance Analysis of a Message Scheduling Scheme 565

there are 3 or more nodes. Finally, the gap between the proposed scheme andthe error-free case results from the overhead due to transmission failure.

Fig. 6 exhibits the execution time according to the message error rate. Theerror rate is denoted with parameter p and q, and if the ratio is 3.0, the messageerror rate is 1

1+3 = 0.25. The number of nodes is 4 while the computationtime is 3 slots. As expected, the performance gap between the proposed schemeand the traditional scheme is maximized by 35.4 % when the channel is highlyunstable. We observed that if the message error rate is less than 10%, that is, pto q ratio is more than 9.0, almost all packet transmissions succeed, so there isonly a little performance enhancement compared with the traditional computingscheme. The message error rate is the function of bit error rate and the messagelength, and the rate is chosen to be rather high to magnify the robustness ofproposed scheme. For both schemes, when the computing time is high, the effectof message loss due to disconnection is more critical, as another node should takethe subtask on the disconnected node. However, as the frequency of disconnectionis not so high, the effect of disconnection is flattened by averaging the mass ofexperiment results.

8

9

10

11

12

13

14

15

16

2 3 4 5 6 7 8

Exe

cutio

n tim

e

Number of nodes

"Proposed""NoError"

10

12

14

16

18

20

22

24

2 4 6 8 10 12 14 16 18 20

Exe

cutio

n tim

e

p/q ratio

"Proposed""Traditional"

Fig. 5. # of nodes vs. execution time Fig. 6. Error rate vs. execution time

5 Conclusion

In this paper, we have proposed and analyzed the performance of a task schedul-ing scheme that couples network schedule for master-slave style parallel comput-ing cluster built on top of wireless local area networks. When a transmission failsdue to temporary network disturbance or long time disconnection, the proposedscheme does not try to retransmit the erroneously transmitted packet to the un-reachable destination, but it selects another data and destination to perform thesubtask scheduled on the currently unreachable destination node. With this idea,we developed a heuristic method for parallel applications on the unreliable wirelessnetwork and applied to the traditional matrix multiplication problem. As the pro-posed scheme can minimize or eliminate the task migration that imposes signifi-cant overhead and also improve the probability of successful retransmission, it out-performs the traditional MPI style framework. The simulation result performedby ns-2 event scheduler shows that the proposed scheme can improve the compu-tation speed for the whole range of channel error rate and computation time. Most

566 J. Lee et al.

importantly, the experiment also demonstrates that the proposed scheme reliablyperforms the parallel application in the sense that it can survive the network fail-ure as long as at least one node is reachable at each instance of time.

For an application that needs more complex topology, the cluster-based com-munication model can be exploited[12]. As a future work, we are to apply theproposed heuristic to various applications on such wireless clusters. In addi-tion, we believe that load balancing issues should be reinforced to our messagescheduling scheme to make the WLAN cluster more practical, as the generalcluster consists of heterogeneous mobile or fixed node, that is, processors havedifferent speeds, memory resources, variable external load, and even the differentnetwork interface speed.

References

1. Clarke, B., Humphrey, M.: Beyond the Device as Portal: Meeting the Require-ments of Wireless and Mobile Devices in the Legion Grid Computing System. 2ndInternational Workshop on Parallel and Distributed Computing Issues in WirelessNetworks and Mobile Computing (2002)

2. McKnight, L., Howison, J., Bradner, S.: Wireless grids: Distributed resource shar-ing by mobile, nomadic, and fixed devices. IEEE Internet Computing (2004) 24-31

3. Macıas, E., Suarez, A.: Solving Engineering Applications with LAMGAC over MPI-2. 9th EuroPVMMPI International Conference (2002)

4. Fagg, G., Bukovsky, A., Dongarra, J.: Fault Tolerant MPI for the HARNESSMeta-Computing System. Lecture Notes in Computer Science, Vol. 2073. Springer-Verlag, Berlin Heidelberg New York (2001) 355-366

5. Macıas, E., Suarez, A.: A Mechanism to Detect Wireless Network Failures for MPIPrograms. 4th DAPSYS International Conference (2002)

6. IEEE 802.11-1999: Part 11: Wireless LAN Medium Access Control (MAC) andPhysical Layer (PHY) Specifications. also available at http://standards.ieee.org/-getieee802 (1999)

7. Choi, S., Shin, K.: A unified wireless LAN architecture for real-time and non-real-time communication services. IEEE/ACM Trans. on Networking (2000) 44-59

8. Bai, H., Atiquzzaman, M.: Error modeling schemes for fading channels in wirelesscommunications: A survey. IEEE Communications Surveys, Vol. 5, No. 2 (2003)2-9

9. Bottigliengo, M., Casetti, C., Chiaserini, C., Meo, M.: Short term fairness for TCPflows in 802.11b WLANs. Proc. IEEE INFOCOM (2004)

10. Lee, J., Kang, M., Jin, Y., Kim, H., Kim, J.: An efficient bandwidth managementscheme for a hard real-time fuzzy control system based on the wireless LAN. Lec-ture Notes in Artificial Intelligence, Vol. 3642. Springer-Verlag, Berlin HeidelbergNew York (2005) 644-659

11. Fall, K., Varadhan, K.: Ns notes and documentation. Technical Report, VINTproject, UC-Berkeley and LBNL (1997)

12. Liu, K., Li, J.: Mobile Cluster Protocol in Wireless Ad Hoc Networks. Proceedingsof International Conference on Communication Technology (2000)