An Approach to Asynchronous Object-Oriented Parallel and Distributed Computing on Wide-Area Systems

Lecture Notes in Computer Science 1800 Edited by G. Goos, J. Hartmanis and J. van Leeuwen


Berlin · Heidelberg · New York · Barcelona · Hong Kong · London · Milan · Paris · Singapore · Tokyo

Jose Rolim et al. (Eds.)

Parallel and Distributed Processing

15 IPDPS 2000 Workshops
Cancun, Mexico, May 1-5, 2000
Proceedings


Series Editors

Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands

Managing Volume Editor

Jose Rolim
Universite de Geneve, Centre Universitaire d'Informatique
24, rue General Dufour, CH-1211 Geneve 4, Switzerland
E-mail: [email protected]

Cataloging-in-Publication Data applied for

Die Deutsche Bibliothek - CIP-Einheitsaufnahme

Parallel and distributed processing : 15 IPDPS 2000 workshops, Cancun, Mexico, May 1 - 5, 2000, proceedings / Jose Rolim et al. (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ; Paris ; Singapore ; Tokyo : Springer, 2000

(Lecture notes in computer science ; Vol. 1800)
ISBN 3-540-67442-X

CR Subject Classification (1998): C.1-4, B.1-7, D.1-4, F.1-2, G.1-2, E.1, H.2

ISSN 0302-9743
ISBN 3-540-67442-X Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

Springer-Verlag is a company in the BertelsmannSpringer publishing group.
© Springer-Verlag Berlin Heidelberg 2000
Printed in Germany

Typesetting: Camera-ready by author, data conversion by Boller Mediendesign
Printed on acid-free paper SPIN: 10720149 06/3142 5 4 3 2 1 0

Foreword

This volume contains the proceedings from the workshops held in conjunction with the IEEE International Parallel and Distributed Processing Symposium, IPDPS 2000, on 1-5 May 2000 in Cancun, Mexico.

The workshops provide a forum for bringing together researchers, practitioners, and designers from various backgrounds to discuss the state of the art in parallelism. They focus on different aspects of parallelism, from run-time systems to formal methods, from optics to irregular problems, from biology to networks of personal computers, from embedded systems to programming environments; the following workshops are represented in this volume:

– Workshop on Personal Computer Based Networks of Workstations
– Workshop on Advances in Parallel and Distributed Computational Models
– Workshop on Par. and Dist. Comp. in Image, Video, and Multimedia
– Workshop on High-Level Parallel Prog. Models and Supportive Env.
– Workshop on High Performance Data Mining
– Workshop on Solving Irregularly Structured Problems in Parallel
– Workshop on Java for Parallel and Distributed Computing
– Workshop on Biologically Inspired Solutions to Parallel Processing Problems
– Workshop on Parallel and Distributed Real-Time Systems
– Workshop on Embedded HPC Systems and Applications
– Reconfigurable Architectures Workshop
– Workshop on Formal Methods for Parallel Programming
– Workshop on Optics and Computer Science
– Workshop on Run-Time Systems for Parallel Programming
– Workshop on Fault-Tolerant Parallel and Distributed Systems

All papers published in the workshops proceedings were selected by the program committee on the basis of referee reports. Each paper was reviewed by independent referees who judged the papers for originality, quality, and consistency with the themes of the workshops.

We would like to thank the general co-chairs Joseph JaJa and Charles Weems for their support and encouragement, the steering committee chairs, George Westrom and Victor Prasanna, for their guidance and vision, and the finance chair, Bill Pitts, for making this publication possible. Special thanks are due to Sally Jelinek, for her assistance with meeting publicity, to Susamma Barua for making local arrangements, and to Danuta Sosnowska for her tireless efforts in interfacing with the organizers.

We gratefully acknowledge sponsorship from the IEEE Computer Society and its Technical Committee on Parallel Processing, and the cooperation of the ACM SIGARCH. Finally, we would like to thank Danuta Sosnowska and Germaine Gusthiot for their help in the preparation of this volume.

February 2000 Jose D. P. Rolim

Volume Editors

Jose D.P. Rolim
G. Chiola
G. Conte
L.V. Mancini
Oscar H. Ibarra
Koji Nakano
Stephan Olariu
Sethuraman Panchanathan
Andreas Uhl
Martin Schulz
Mohammed J. Zaki
Vipin Kumar
David B. Skillicorn
Sartaj Sahni
Timothy Davis
Sanguthevar Rajasekaran
Sanjay Ranka
Denis Caromel
Serge Chaumette
Geoffrey Fox
Peter Graham
Albert Y. Zomaya
Fikret Ercal

Kenji Toda
Sang Hyuk Son
Maarten Boasson
Yoshiaki Kakuda
Devesh Bhatt
Lonnie R. Welch
Hossam ElGindy
Viktor K. Prasanna
Hartmut Schmeck
Oliver Diessel
Beverly Sanders
Dominique Mery
Fouad Kiamilev
Jeremy Ekman
Afonso Ferreira
Sadik Esener
Yi Pan
Keqin Li
Ron Olsson
Laxmikant V. Kale
Pete Beckman
Matthew Haines
Dimiter R. Avresky

Contents

Workshop on Personal Computer Based Networks of Workstations 1
G. Chiola, G. Conte, L.V. Mancini

Memory Management in a Combined VIA/SCI Hardware 4
M. Trams, W. Rehm, D. Balkanski, S. Simeonov

ATOLL, a New Switched, High Speed Interconnect in Comparison to Myrinet and SCI 16
M. Fischer, U. Bruning, J. Kluge, L. Rzymianowicz, P. Schulz, M. Waack

ClusterNet: An Object-Oriented Cluster Network 28
R.R. Hoare

GigaBit Performance under NT 39
M. Baker, S. Scott, A. Geist, L. Browne

MPI Collective Operations over IP Multicast 51
H.A. Chen, Y.O. Carrasco, A.W. Apon

An Open Market-Based Architecture for Distributed Computing 61
S. Lalis, A. Karipidis

The MultiCluster Model to the Integrated Use of Multiple Workstation Clusters 71
M. Baretto, R. Avila, P. Navaux

Parallel Information Retrieval on an SCI-Based PC-NOW 81
S.-H. Chung, H.-C. Kwon, K.R. Ryu, H.-K. Jang, J.-H. Kim, C.-A. Choi

A PC-NOW Based Parallel Extension for a Sequential DBMS 91
M. Exbrayat, L. Brunie

Workshop on Advances in Parallel and Distributed Computational Models 101
O.H. Ibarra, K. Nakano, S. Olariu

The Heterogeneous Bulk Synchronous Parallel Model 102
T.L. Williams, R.J. Parsons

On Stalling in LogP 109
G. Bilardi, K.T. Herley, A. Pietracaprina, G. Pucci


Parallelizability of Some P-Complete Problems 116
A. Fujiwara, M. Inoue, T. Masuzawa

A New Computation of Shape Moments via Quadtree Decomposition 123
C.-H. Wu, S.-J. Horng, P.-Z. Lee, S.-S. Lee, S.-Y. Lin

The Fuzzy Philosophers 130
S.-T. Huang

A Java Applet to Visualize Algorithms on Reconfigurable Mesh 137
K. Miyashita, R. Hashimoto

A Hardware Implementation of PRAM and Its Performance Evaluation 143
M. Imai, Y. Hayakawa, H. Kawanaka, W. Chen, K. Wada, C.D. Castanho, Y. Okajima, H. Okamoto

A Non-binary Parallel Arithmetic Architecture 149
R. Lin, J.L. Schwing

Multithreaded Parallel Computer Model with Performance Evaluation 155
J. Cui, J.L. Bordim, K. Nakano, T. Hayashi, N. Ishii

Workshop on Parallel and Distributed Computing in Image Processing, Video Processing, and Multimedia (PDIVM 2000) 161
S. Panchanathan, A. Uhl

MAJC-5200: A High Performance Microprocessor for Multimedia Computing 163
S. Sudharsanan

A Novel Superscalar Architecture for Fast DCT Implementation 171
Z. Yong, M. Zhang

Computing Distance Maps Efficiently Using an Optical Bus 178
Y. Pan, Y. Li, J. Li, K. Li, S.-Q. Zheng

Advanced Data Layout Optimization for Multimedia Applications 186
C. Kulkarni, F. Catthoor, H. De Man

Parallel Parsing of MPEG Video in a Multi-threaded Multiprocessor Environment 194
S.M. Bhandarkar, S.R. Chandrasekaran


Parallelization Techniques for Spatial-Temporal Occupancy Maps from Multiple Video Streams 202
N. DeBardeleben, A. Hoover, W. Jones, W. Ligon

Heuristic Solutions for a Mapping Problem in a TV-Anytime Server Network 210
X. Zhou, R. Luling, L. Xie

RPV: A Programming Environment for Real-Time Parallel Vision - Specification and Programming Methodology - 218
D. Arita, Y. Hamada, S. Yonemoto, R.-i. Taniguchi

Parallel Low-Level Image Processing on a Distributed Memory System 226
C. Nicolescu, P. Jonker

Congestion-Free Routing of Streaming Multimedia Content in BMIN-Based Parallel Systems 234
H. Sethu

Performance of On-Chip Multiprocessors for Vision Tasks 242
Y. Chung, K. Park, W. Hahn, N. Park, V.K. Prasanna

Parallel Hardware-Software Architecture for Computation of Discrete Wavelet Transform Using the Recursive Merge Filtering Algorithm 250
P. Jamkhandi, A. Mukherjee, K. Mukherjee, R. Franceschini

Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2000) 257
M. Schulz

Pipelining Wavefront Computations: Experiences and Performance 261
E.C. Lewis, L. Snyder

Specification Techniques for Automatic Performance Analysis Tools 269
M. Gerndt, H.-G. Eßer

PDRS: A Performance Data Representation System 277
X.-H. Sun, X. Wu

Clix - A Hybrid Programming Environment for Distributed Objects and Distributed Shared Memory 285
F. Mueller, J. Nolte, A. Schlaefer

Controlling Distributed Shared Memory Consistency from High Level Programming Languages 293
Y. Jegou


Online Computation of Critical Paths for Multithreaded Languages 301
Y. Oyama, K. Taura, A. Yonezawa

Problem Solving Environment Infrastructure for High Performance Computer Systems 314
D.C. Stanzione, Jr., W.B. Ligon III

Combining Fusion Optimizations and Piecewise Execution of Nested Data-Parallel Programs 324
W. Pfannenstiel

Declarative Concurrency in Java 332
R. Ramirez, A.E. Santosa

Scalable Monitoring Technique for Detecting Races in Parallel Programs 340
Y.-K. Jun, C.E. McDowell

Workshop on High Performance Data Mining 348
M.J. Zaki, V. Kumar, D.B. Skillicorn

Implementation Issues in the Design of I/O Intensive Data Mining Applications on Clusters of Workstations 350
R. Baraglia, D. Laforenza, S. Orlando, P. Palmerini, R. Perego

A Requirements Analysis for Parallel KDD Systems 358
W.A. Maniatty, M.J. Zaki

Parallel Data Mining on ATM-Connected PC Cluster and Optimization of Its Execution Environment 366
M. Oguchi, M. Kitsuregawa

The Parallelization of a Knowledge Discovery System with Hypergraph Representation 374
J. Seitzer, J.P. Buckley, Y. Pan, L.A. Adams

Parallelisation of C4.5 as a Particular Divide and Conquer Computation 382
P. Becuzzi, M. Coppola, S. Ruggieri, M. Vanneschi

Scalable Parallel Clustering for Data Mining on Multicomputers 390
D. Foti, D. Lipari, C. Pizzuti, D. Talia

Exploiting Dataset Similarity for Distributed Mining 399
S. Parthasarathy, M. Ogihara


Scalable Model for Extensional and Intensional Descriptions of Unclassified Data 407
H.A. Prado, S.C. Hirtle, P.M. Engel

Parallel Data Mining of Bayesian Networks from Telecommunications Network Data 415
R. Sterritt, K. Adamson, C.M. Shapcott, E.P. Curran

Irregular 2000 - Workshop on Solving Irregularly Structured Problems in Parallel 423
S. Sahni, T. Davis, S. Rajasekaran, S. Ranka

Load Balancing and Continuous Quadratic Programming 427
W.W. Hager

Parallel Management of Large Dynamic Shared Memory Space: A Hierarchical FEM Application 428
X. Cavin, L. Alonso

Efficient Parallelization of Unstructured Reductions on Shared Memory Parallel Architectures 435
S. Benkner, T. Brandes

Parallel FEM Simulation of Crack Propagation - Challenges, Status, and Perspectives 443
B. Carter, C.-S. Chen, L.P. Chew, N. Chrisochoides, G.R. Gao, G. Heber, A.R. Ingraffea, R. Krause, C. Myers, D. Nave, K. Pingali, P. Stodghill, S. Vavasis, P.A. Wawrzynek

Support for Irregular Computations in Massively Parallel PIM Arrays, Using an Object-Based Execution Model 450
H.P. Zima, T.L. Sterling

Executing Communication-Intensive Irregular Programs Efficiently 457
V. Ramakrishnan, I.D. Scherson

Non-Memory-Based and Real-Time Zerotree Building for Wavelet Zerotree Coding Systems 469
D. Peng, M. Lu

Graph Partitioning for Dynamic, Adaptive, and Multi-phase Computations 476
V. Kumar, K. Schloegel, G. Karypis


A Multilevel Algorithm for Spectral Partitioning with Extended Eigen-Models 477
S. Oliveira, T. Soma

An Integrated Decomposition and Partitioning Approach for Irregular Block-Structured Applications 485
J. Rantakokko

Ordering Unstructured Meshes for Sparse Matrix Computations on Leading Parallel Systems 497
L. Oliker, X. Li, G. Heber, R. Biswas

A GRASP for Computing Approximate Solutions for the Three-Index Assignment Problem 504
R.M. Aiex, P.M. Pardalos, L.S. Pitsoulis, M.G.C. Resende

On Identifying Strongly Connected Components in Parallel 505
L.K. Fleischer, B. Hendrickson, A. Pınar

A Parallel, Adaptive Refinement Scheme for Tetrahedral and Triangular Grids 512
A. Stagg, J. Hallberg, J. Schmidt

PaStiX: A Parallel Sparse Direct Solver Based on a Static Scheduling for Mixed 1D/2D Block Distributions 519
P. Henon, P. Ramet, J. Roman

Workshop on Java for Parallel and Distributed Computing 526
D. Caromel, S. Chaumette, G. Fox, P. Graham

An IP Next Generation Compliant Java™ Virtual Machine 528
G. Chelius, E. Fleury

An Approach to Asynchronous Object-Oriented Parallel and Distributed Computing on Wide-Area Systems 536
M. Di Santo, F. Frattolillo, W. Russo, E. Zimeo

Performance Issues for Multi-language Java Applications 544
P. Murray, T. Smith, S. Srinivas, M. Jacob

MPJ: A Proposed Java Message Passing API and Environment for High Performance Computing 552
M. Baker, B. Carpenter


Implementing Java Consistency Using a Generic, Multithreaded DSM Runtime System 560
G. Antoniu, L. Bouge, P. Hatcher, M. MacBeth, K. McGuigan, R. Namyst

Workshop on Bio-Inspired Solutions to Parallel Processing Problems (BioSP3) 568
A.Y. Zomaya, F. Ercal, S. Olariu

Take Advantage of the Computing Power of DNA Computers 570
Z.F. Qiu, M. Lu

Agent Surgery: The Case for Mutable Agents 578
L. Boloni, D.C. Marinescu

Was Collective Intelligence before Life on Earth? 586
T. Szuba, M. Almulla

Solving Problems on Parallel Computers by Cellular Programming 595
D. Talia

Multiprocessor Scheduling with Support by Genetic Algorithms-Based Learning Classifier System 604
J.P. Nowacki, G. Pycka, F. Seredynski

Viewing Scheduling Problems through Genetic and Evolutionary Algorithms 612
M. Rocha, C. Vilela, P. Cortez, J. Neves

Dynamic Load Balancing Model: Preliminary Assessment of a Biological Model for a Pseudo-search Engine 620
R.L. Walker

A Parallel Co-evolutionary Metaheuristic 628
V. Bachelet, E.-G. Talbi

Neural Fraud Detection in Mobile Phone Operations 636
A. Boukerche, M.S.M.A. Notare

Information Exchange in Multi Colony Ant Algorithms 645
M. Middendorf, F. Reischle, H. Schmeck

A Surface-Based DNA Algorithm for the Expansion of Symbolic Determinants 653
Z.F. Qiu, M. Lu


Hardware Support for Simulated Annealing and Tabu Search 660R. Schneider, R. Weiss

Workshop on Parallel and Distributed Real-Time Systems 668
K. Toda, S.H. Son, M. Boasson, Y. Kakuda

A Distributed Real Time Coordination Protocol 671
L. Sha, D. Seto

A Segmented Backup Scheme for Dependable Real Time Communication in Multihop Networks 678
P.K. Gummadi, J.P. Madhavarapu, S.R. Murthy

Real-Time Coordination in Distributed Multimedia Systems 685
T.A. Limniotes, G.A. Papadopoulos

Supporting Fault-Tolerant Real-Time Applications Using the RED-Linux General Scheduling Framework 692
K.-J. Lin, Y.-C. Wang

Are COTS Suitable for Building Distributed Fault-Tolerant Hard Real-Time Systems? 699
P. Chevochot, A. Colin, D. Decotigny, I. Puaut

Autonomous Consistency Technique in Distributed Database with Heterogeneous Requirements 706
H. Hanamura, I. Kaji, K. Mori

Real-Time Transaction Processing Using Two-Stage Validation in Broadcast Disks 713
K.-w. Lam, V.C.S. Lee, S.H. Son

Using Logs to Increase Availability in Real-Time Main-Memory Database 720
T. Niklander, K. Raatikainen

Components Are from Mars 727
M.R.V. Chaudron, E. de Jong

2+10 � 1+50 ! 734
H. Hansson, C. Norstrom, S. Punnekkat

A Framework for Embedded Real-Time System Design 738
J.-Y. Choi, H.-H. Kwak, I. Lee


Best-Effort Scheduling of (m,k)-Firm Real-Time Streams in Multihop Networks 743
A. Striegel, G. Manimaran

Predictability and Resource Management in Distributed Multimedia Presentations 750
C. Mourlas

Quality of Service Negotiation for Distributed, Dynamic Real-Time Systems 757
C.D. Cavanaugh, L.R. Welch, B.A. Shirazi, E.-n. Huh, S. Anwar

An Open Framework for Real-Time Scheduling Simulation 766
T. Kramp, M. Adrian, R. Koster

Workshop on Embedded/Distributed HPC Systems and Applications (EHPC 2000) 773
D. Bhatt, L.R. Welch

A Probabilistic Power Prediction Tool for the Xilinx 4000-Series FPGA 776
T. Osmulski, J.T. Muehring, B. Veale, J.M. West, H. Li, S. Vanichayobon, S.-H. Ko, J.K. Antonio, S.K. Dhall

Application Challenges: System Health Management for Complex Systems 784
G.D. Hadden, P. Bergstrom, T. Samad, B.H. Bennett, G.J. Vachtsevanos, J. Van Dyke

Accommodating QoS Prediction in an Adaptive Resource Management Framework 792
E.-n. Huh, L.R. Welch, B.A. Shirazi, B.C. Tjaden, C.D. Cavanaugh

Network Load Monitoring in Distributed Systems 800
K.M. Jahirul Islam, B.A. Shirazi, L.R. Welch, B.C. Tjaden, C.D. Cavanaugh, S. Anwar

A Novel Specification and Design Methodology of Embedded Multiprocessor Signal Processing Systems Using High-Performance Middleware 808
R.S. Janka, L.M. Wills

Auto Source Code Generation and Run-Time Infrastructure and Environment for High Performance, Distributed Computing Systems 816
M.I. Patel, K. Jordan, M. Clark, D. Bhatt


Developing an Open Architecture for Performance Data Mining 823
D.B. Pierce, D.T. Rover

A 90k Gate "CLB" for Parallel Distributed Computing 831
B. Schulman, G. Pechanek

Power-Aware Replication of Data Structures in Distributed Embedded Real-Time Systems 839
O.S. Unsal, I. Koren, C.M. Krishna

Comparison of MPI Implementations on a Shared Memory Machine 847
B. Van Voorst, S. Seidel

A Genetic Algorithm Approach to Scheduling Communications for a Class of Parallel Space-Time Adaptive Processing Algorithms 855
J.M. West, J.K. Antonio

Reconfigurable Parallel Sorting and Load Balancing on a Beowulf Cluster: HeteroSort 862
P. Yang, T.M. Kunau, B.H. Bennett, E. Davis, B. Wren

Reconfigurable Architectures Workshop (RAW 2000) 870
H. ElGindy, V.K. Prasanna, H. Schmeck, O. Diessel

Run-Time Reconfiguration at Xilinx 873
S.A. Guccione

JRoute: A Run-Time Routing API for FPGA Hardware 874
E. Keller

A Reconfigurable Content Addressable Memory 882
S.A. Guccione, D. Levi, D. Downs

ATLANTIS - A Hybrid FPGA/RISC Based Re-configurable System 890
O. Brosch, J. Hesser, C. Hinkelbein, K. Kornmesser, T. Kuberka, A. Kugel, R. Manner, H. Singpiel, B. Vettermann

The Cellular Processor Architecture CEPRA-1X and Its Configuration by CDL 898
C. Hochberger, R. Hoffmann, K.-P. Volkmann, S. Waldschmidt


Loop Pipelining and Optimization for Run Time Reconfiguration 906
K. Bondalapati, V.K. Prasanna

Compiling Process Algebraic Descriptions into Reconfigurable Logic 916
O. Diessel, G. Milne

Behavioral Partitioning with Synthesis for Multi-FPGA Architectures under Interconnect, Area, and Latency Constraints 924
P. Lakshmikanthan, S. Govindarajan, V. Srinivasan, R. Vemuri

Module Allocation for Dynamically Reconfigurable Systems 932
X.-j. Zhang, K.-w. Ng

Augmenting Modern Superscalar Architectures with Configurable Extended Instructions 941
X. Zhou, M. Martonosi

Complexity Bounds for Lookup Table Implementation of Factored Forms in FPGA Technology Mapping 951
W. Feng, F.J. Meyer, F. Lombardi

Optimization of Motion Estimator for Run-Time-Reconfiguration Implementation 959
C. Tanougast, Y. Berviller, S. Weber

Constant-Time Hough Transform on a 3D Reconfigurable Mesh Using Fewer Processors 966
Y. Pan

Workshop on Formal Methods for Parallel Programming (FMPPTA 2000) 974
B. Sanders, D. Mery

A Method for Automatic Cryptographic Protocol Verification 977
J. Goubault-Larrecq

Verification Methods for Weaker Shared Memory Consistency Models 985
R.P. Ghughal, G.C. Gopalakrishnan

Models Supporting Nondeterminism and Probabilistic Choice 993
M. Mislove

Concurrent Specification and Timing Analysis of Digital Hardware Using SDL 1001
K.J. Turner, F.J. Argul-Marin, S.D. Laing


Incorporating Non-functional Requirements into Software Architectures 1009
N.S. Rosa, G.R.R. Justo, P.R.F. Cunha

Automatic Implementation of Distributed Systems Formal Specifications 1019
L.H. Castelo Branco, A.F. do Prado, W. Lopes de Souza, M. Sant'Anna

Refinement Based Validation of an Algorithm for Detecting Distributed Termination 1027
M. Filali, P. Mauran, G. Padiou, P. Queinnec, X. Thirioux

Tutorial 1: Abstraction and Refinement of Concurrent Programs and Formal Specification 1037
D. Cansell, D. Mery, C. Tabacznyj

Tutorial 2: A Foundation for Composing Concurrent Objects 1039
J.-P. Bahsoun

Workshop on Optics and Computer Science (WOCS 2000) 1042
F. Kiamilev, J. Ekman, A. Ferreira, S. Esener, Y. Pan, K. Li

Fault Tolerant Algorithms for a Linear Array with a Reconfigurable Pipelined Bus System 1044
A.G. Bourgeois, J.L. Trahan

Fast and Scalable Parallel Matrix Computations with Optical Buses 1053
K. Li

Pulse-Modulated Vision Chips with Versatile-Interconnected Pixels 1063
J. Ohta, A. Uehara, T. Tokuda, M. Nunoshita

Connectivity Models for Optoelectronic Computing Systems 1072
H.M. Ozaktas

Optoelectronic-VLSI Technology: Terabit/s I/O to a VLSI Chip 1089
A.V. Krishnamoorthy

Three Dimensional VLSI-Scale Interconnects 1092
D.W. Prather

Present and Future Needs of Free-Space Optical Interconnects 1104
S. Esener, P. Marchand


Fast Sorting on a Linear Array with a Reconfigurable Pipelined Bus System 1110
A. Datta, R. Owens, S. Soundaralakshmi

Architecture Description and Prototype Demonstration of Optoelectronic Parallel-Matching Architecture 1118
K. Kagawa, K. Nitta, Y. Ogura, J. Tanida, Y. Ichioka

A Distributed Computing Demonstration System Using FSOI Inter-Processor Communication 1126
J. Ekman, C. Berger, F. Kiamilev, X. Wang, H. Spaanenburg, P. Marchand, S. Esener

Optoelectronic Multi-chip Modules Based on Imaging Fiber Bundle Structures 1132
D.M. Chiarulli, S.P. Levitan

VCSEL Based Smart Pixel Array Technology Enables Chip-to-Chip Optical Interconnect 1133
Y. Liu

Workshop on Run-Time Systems for Parallel Programming (RTSPP) 1134
R. Olsson, L.V. Kale, P. Beckman, M. Haines

A Portable and Adaptative Multi-protocol Communication Library for Multithreaded Runtime Systems 1136
O. Aumage, L. Bouge, R. Namyst

CORBA Based Runtime Support for Load Distribution and Fault Tolerance 1144
T. Barth, G. Flender, B. Freisleben, M. Grauer, F. Thilo

Run-Time Support for Adaptive Load Balancing 1152
M.A. Bhandarkar, R.K. Brunner, L.V. Kale

Integrating Kernel Activations in a Multithreaded Runtime System on Top of Linux 1160
V. Danjean, R. Namyst, R.D. Russell

DyRecT: Software Support for Adaptive Parallelism on NOWs 1168
E. Godard, S. Setia, E. White

Fast Measurement of LogP Parameters for Message Passing Platforms 1176
T. Kielmann, H.E. Bal, K. Verstoep


Supporting Flexible Safety and Sharing in Multi-threaded Environments 1184
S.H. Samorodin, R. Pandey

A Runtime System for Dynamic DAG Programming 1192
M.-Y. Wu, W. Shu, Y. Chen

Workshop on Fault-Tolerant Parallel and Distributed Systems (FTPDS 2000) 1200
D.R. Avresky

Certification of System Architecture Dependability 1202
I. Levendel

Computing in the RAIN: A Reliable Array of Independent Nodes 1204
V. Bohossian, C.C. Fan, P.S. LeMahieu, M.D. Riedel, L. Xu, J. Bruck

Fault-Tolerant Wide-Area Parallel Computing 1214
J.B. Weissman

Transient Analysis of Dependability/Performability Models by Regenerative Randomization with Laplace Transform Inversion 1226
J.A. Carrasco

FANTOMAS: Fault Tolerance for Mobile Agents in Clusters 1236
H. Pals, S. Petri, C. Grewe

Metrics, Methodologies, and Tools for Analyzing Network Fault Recovery Performance in Real-Time Distributed Systems 1248
P.M. Irey IV, B.L. Chappell, R.W. Hott, D.T. Marlow, K.F. O'Donoghue, T.R. Plunkett

Consensus Based on Strong Failure Detectors: A Time and Message-Efficient Protocol 1258
F. Greve, M. Hurfin, R. Macedo, M. Raynal

Implementation of Finite Lattices in VLSI for Fault-State Encoding in High-Speed Networks 1266
A.C. Doring, G. Lustig

Building a Reliable Message Delivery System Using the CORBA Event Service 1276
S. Ramani, B. Dasarathy, K.S. Trivedi


Network Survivability Simulation of a Commercially Deployed Dynamic Routing System Protocol 1281
A. Chowdhury, O. Frieder, P. Luse, P.-J. Wan

Fault-Tolerant Distributed-Shared-Memory on a Broadcast-Based Interconnection Network 1286
D. Hecht, C. Katsinis

An Efficient Backup-Overloading for Fault-Tolerant Scheduling of Real-Time Tasks 1291
R. Al-Omari, G. Manimaran, A.K. Somani

Mobile Agents to Automate Fault Management in Wireless and Mobile Networks 1296
N. Pissinou, Bhagyavati, K. Makki

Heterogeneous Computing Workshop (HCW 2000) 1301
V.K. Prasanna, C.S. Raghavendra

Author Index 1307

Author Index

Lee A. Adams 374
Kenny Adamson 415
Matthias Adrian 766
Renata M. Aiex 504
R. Al-Omari 1291
Mohammed Almulla 586
Laurent Alonso 428
John K. Antonio 776, 855
Gabriel Antoniu 560
Shafqat Anwar 757, 800
Amy W. Apon 51
F. Javier Argul-Marin 1001
Daisaku Arita 218
Olivier Aumage 1136
Rafael Avila 71

Vincent Bachelet 628
Jean-Paul Bahsoun 1039
Mark Baker 39, 552
Henri E. Bal 1176
Daniel Balkanski 4
R. Baraglia 350
Marcos Baretto 71
Thomas Barth 1144
Primo Becuzzi 382
Siegfried Benkner 435
Bonnie Holte Bennett 784, 862
C. Berger 1126
Peter Bergstrom 784
Yves Berviller 959
Bhagyavati 1296
Milind A. Bhandarkar 1152
Suchendra M. Bhandarkar 194
Devesh Bhatt 816
Gianfranco Bilardi 109
Rupak Biswas 497
Vasken Bohossian 1204
Ladislau Boloni 578
Kiran Bondalapati 906
J. L. Bordim 155
Luc Bouge 560, 1136
Azzedine Boukerche 636
Anu G. Bourgeois 1044
Thomas Brandes 435
O. Brosch 890

Logan Browne 39
Jehoshua Bruck 1204
Lionel Brunie 91
Ulrich Bruning 16
Robert K. Brunner 1152
James P. Buckley 374

Dominique Cansell 1037
Bryan Carpenter 552
Juan A. Carrasco 1226
Yvette O. Carrasco 51
Bruce Carter 443
C. D. Castanho 143
Luiz Henrique Castelo Branco 1019
Francky Catthoor 186
Charles D. Cavanaugh 757, 792, 800
Xavier Cavin 428
Shankar R. Chandrasekaran 194
B. L. Chappell 1248
M.R.V. Chaudron 727
Guillaume Chelius 528
Chuin-Shan Chen 443
Hsiang Ann Chen 51
W. Chen 143
Yong Chen 1192
Pascal Chevochot 699
L. Paul Chew 443
Donald M. Chiarulli 1132
Cham-Ah Choi 81
Jin-Young Choi 738
Abdur Chowdhury 1281
Nikos Chrisochoides 443
Sang-Hwa Chung 81
Y. Chung 242
Matthew Clark 816
Antoine Colin 699
Massimo Coppola 382
Paulo Cortez 612
J. Cui 155
Paulo R. F. Cunha 1009
Edwin P. Curran 415

Vincent Danjean 1160
Balakrishnan Dasarathy 1276
Amitava Datta 1110
Emmett Davis 862


Nathan DeBardeleben 202
David Decotigny 699
Hugo De Man 186
Sudarshan K. Dhall 776
Oliver Diessel 916
M. Di Santo 536
Andreas C. Doring 1266
Daniel Downs 882
Joe Van Dyke 784

J. Ekman 1126
Paulo M. Engel 407
Sadik Esener 1104, 1126
Hans-Georg Eßer 269
Matthieu Exbrayat 91

Charles C. Fan 1204
Wenyi Feng 951
Mamoun Filali 1027
Markus Fischer 16
Lisa K. Fleischer 505
Gerd Flender 1144
Eric Fleury 528
D. Foti 390
Robert Franceschini 250
F. Frattolillo 536
Bernd Freisleben 1144
Ophir Frieder 1281
Akihiro Fujiwara 116

Guang R. Gao 443
Al Geist 39
Michael Gerndt 269
Rajnish P. Ghughal 985
Etienne Godard 1168
Ganesh C. Gopalakrishnan 985
Jean Goubault-Larrecq 977
Sriram Govindarajan 924
Manfred Grauer 1144
Fabíola Greve 1258
Claus Grewe 1236
Steven A. Guccione 873, 882
P. Krishna Gummadi 678

George D. Hadden 784
William W. Hager 427
W. Hahn 242
Jackie Hallberg 512
Yoshio Hamada 218
Hideo Hanamura 706

Hans Hansson 734
Reiji Hashimoto 137
Philip Hatcher 560
Y. Hayakawa 143
T. Hayashi 155
Gerd Heber 443, 497
Diana Hecht 1286
Bruce Hendrickson 505
Pascal Henon 519
Kieran T. Herley 109
J. Hesser 890
C. Hinkelbein 890
Stephen C. Hirtle 407
Raymond R. Hoare 28
Christian Hochberger 898
Rolf Hoffmann 898
Adam Hoover 202
Shi-Jinn Horng 123
R. W. Hott 1248
Shing-Tsaan Huang 130
Eui-nam Huh 757, 792
Michel Hurfin 1258

Yoshiki Ichioka 1118
M. Imai 143
Antony R. Ingraffea 443
Michiko Inoue 116
P. M. Irey IV 1248
N. Ishii 155

Matthias Jacob 544
Kazi M. Jahirul Islam 800
Piyush Jamkhandi 250
Han-Kook Jang 81
Randall S. Janka 808
Yvon Jegou 293
William Jones 202
E. de Jong 727
Pieter Jonker 226
Karl Jordan 816
Yong-Kee Jun 340
George R. R. Justo 1009

Keiichiro Kagawa 1118
Isao Kaji 706
Laxmikant V. Kale 1152
Alexandros Karipidis 61
George Karypis 476
Constantine Katsinis 1286
H. Kawanaka 143


Eric Keller 874
F. Kiamilev 1126
Thilo Kielmann 1176
Jin-Hyuk Kim 81
Masaru Kitsuregawa 366
Jorg Kluge 16
Seok-Hyun Ko 776
Israel Koren 839
K. Kornmesser 890
Rainer Koster 766
Thorsten Kramp 766
Roland Krause 443
C. Mani Krishna 839
Ashok V. Krishnamoorthy 1089
T. Kuberka 890
A. Kugel 890
Chidamber Kulkarni 186
Vipin Kumar 476
Timothy M. Kunau 862
Hee-Hwan Kwak 738
Hyuk-Chul Kwon 81

D. Laforenza 350
Stephen D. Laing 1001
Preetham Lakshmikanthan 924
Spyros Lalis 61
Kwok-wa Lam 713
Paul S. LeMahieu 1204
Insup Lee 738
Pei-Zong Lee 123
Shung-Shing Lee 123
Victor C. S. Lee 713
I. Levendel 1202
Delon Levi 882
Steven P. Levitan 1132
E. Christopher Lewis 261
Hongping Li 776
Jie Li 178
Keqin Li 178, 1053
Xiaoye Li 497
Yamin Li 178
Walter B. Ligon 202, 314
Theophilos A. Limniotes 685
Kwei-Jay Lin 692
Rong Lin 149
Shih-Ying Lin 123
D. Lipari 390
Yue Liu 1133
Fabrizio Lombardi 951
Mi Lu 469, 570, 653

Reinhard Luling 210
Paul Luse 1281
Gunther Lustig 1266

Mark MacBeth 560
Raimundo Macedo 1258
Jnana Pradeep Madhavarapu 678
R. Manner 890
Kia Makki 1296
William A. Maniatty 358
G. Manimaran 743, 1291
Philippe Marchand 1104, 1126
Dan C. Marinescu 578
D. T. Marlow 1248
Margaret Martonosi 941
Toshimitsu Masuzawa 116
Philippe Mauran 1027
Charles E. McDowell 340
Keith McGuigan 560
Dominique Mery 1037
Fred J. Meyer 951
Martin Middendorf 645
George Milne 916
Michael Mislove 993
Kensuke Miyashita 137
Kinji Mori 706
Costas Mourlas 750
Jeffrey T. Muehring 776
Frank Mueller 285
Amar Mukherjee 250
Kunal Mukherjee 250
Paul Murray 544
Siva Ram Murthy 678
Chris Myers 443

K. Nakano 155
Raymond Namyst 560, 1136, 1160
Philippe Navaux 71
Demian Nave 443
Jose Neves 612
Kam-wing Ng 932
Cristina Nicolescu 226
Tiina Niklander 720
Kouchi Nitta 1118
Jorg Nolte 285
Christer Norstrom 734
Mirela Sechi Moretti Annoni Notare 636
Jerzy P. Nowacki 604
Masahiro Nunoshita 1063


K. F. O'Donoghue 1248
Mitsunori Ogihara 399
Masato Oguchi 366
Yusuke Ogura 1118
Jun Ohta 1063
Y. Okajima 143
H. Okamoto 143
Leonid Oliker 497
Suely Oliveira 477
Salvatore Orlando 350
Timothy Osmulski 776
Robyn Owens 1110
Yoshihiro Oyama 301
Haldun M. Ozaktas 1072

Gerard Padiou 1027
P. Palmerini 350
Holger Pals 1236
Yi Pan 178, 374, 966
Raju Pandey 1184
George A. Papadopoulos 685
Panos M. Pardalos 504
K. Park 242
N. Park 242
Rebecca J. Parsons 102
Srinivasan Parthasarathy 399
Minesh I. Patel 816
Gerald Pechanek 831
Dongming Peng 469
Raffaele Perego 350
Stefan Petri 1236
Wolf Pfannenstiel 324
David B. Pierce 823
Andrea Pietracaprina 109
Ali Pınar 505
Keshav Pingali 443
Niki Pissinou 1296
Leonidas S. Pitsoulis 504
C. Pizzuti 390
T. R. Plunkett 1248
Antonio Francisco do Prado 1019
Hercules A. Prado 407
Viktor K. Prasanna 242, 906
Dennis W. Prather 1092
Isabelle Puaut 699
Geppino Pucci 109
Sasikumar Punnekkat 734
Grzegorz Pycka 604

Z. Frank Qiu 570, 653
Philippe Queinnec 1027

Kimmo Raatikainen 720
Vara Ramakrishnan 457
Srinivasan Ramani 1276
Pierre Ramet 519
Rafael Ramirez 332
Jarmo Rantakokko 485
Michel Raynal 1258
Wolfgang Rehm 4
Frank Reischle 645
Mauricio G.C. Resende 504
Marc D. Riedel 1204
Miguel Rocha 612
Jean Roman 519
Nelson S. Rosa 1009
Diane T. Rover 823
Salvatore Ruggieri 382
Robert D. Russell 1160
W. Russo 536
Kwang Ryel Ryu 81
Lars Rzymianowicz 16

Tariq Samad 784
Steven H. Samorodin 1184
Marcelo Sant'Anna 1019
Andrew E. Santosa 332
Isaac D. Scherson 457
Alexander Schlaefer 285
Kirk Schloegel 476
Hartmut Schmeck 645
Joseph Schmidt 512
Reinhard Schneider 660
Bruce Schulman 831
Patrick Schulz 16
James L. Schwing 149
Stephen Scott 39
Steven Seidel 847
Jennifer Seitzer 374
Franciszek Seredynski 604
Harish Sethu 234
Sanjeev Setia 1168
Danbing Seto 671
Lui Sha 671
C. Mary Shapcott 415
Behrooz A. Shirazi 757, 792, 800
Wei Shu 1192
Stanislav Simeonov 4
H. Singpiel 890

Author Index 1311

Todd Smith 544La wrence Snyder 261Ta ka ko Soma 477Arun K. Soma ni 1291Sa ng H. Son 713Subbia h Sounda ra la kshmi 1110Wa nderley Lopes de Souza 1019H. Spa a nenburg 1126Suresh Sriniva s 544Vinoo Sriniva sa n 924Ala n Sta gg 512Da niel C. Sta nzione 314Thoma s L. Sterling 450Roy Sterritt 415Pa ul Stodghill 443A. Striegel 743Subra ma nia Sudha rsa na n 163Xia n-He Sun 277Ta deusz Szuba 586

Christophe Ta ba cznyj 1037El-Gha za li Ta lbi 628Domenico Ta lia 390, 595Jun Ta nida 1118Rin-ichiro Ta niguchi 218Ca mel Ta nouga st 959Kenjiro Ta ura 301Fra nk Thilo 1144Xa vier Thirioux 1027Brett C. Tja den 792, 800Ta ka shi Tokuda 1063Jerry L. Tra ha n 1044Ma rio Tra ms 4Kishor S. Trivedi 1276Kenneth J. Turner 1001

Akihiro Ueha ra 1063Osma n S. Unsa l 839

George J. Va chtseva nos 784Sirirut Va nicha yobon 776Ma rco Va nneschi 382Bria n Va n Voorst 847Stephen Va va sis 443

Bria n Vea le 776Ra nga Vemuri 924Kees Verstoep 1176B. Vetterma nn 890Ca rla Vilela 612Kla us-Peter Volkma nn 898

Ma thia s Wa a ck 16K. Wa da 143Stefa n Wa ldschmidt 898Regina ld L. Wa lker 620Peng-Jun Wa n 1281X. Wa ng 1126Yu-Chung Wa ng 692Pa ul A. Wa wrzynek 443Serge Weber 959Reinhold Weiss 660Jon B. Weissma n 1214Lonnie R. Welch 757, 792, 800Ja ck M. West 776, 855Eliza beth White 1168Tiffa ny L. Willia ms 102Linda M. Wills 808Bill Wren 862Chin-Hsiung Wu 123Min-You Wu 1192Xingfu Wu 277

Li Xie 210Liha o Xu 1204

Pa mela Ya ng 862Sa toshi Yonemoto 218Akinori Yoneza wa 301Zha ng Yong 171

Moha mmed J. Za ki 358Min Zha ng 171Xue-jie Zha ng 932Si-Qing Zheng 178Xia nfeng Zhou 941Xia obo Zhou 210Ha ns P. Zima 450E. Zimeo 536

An Approach to Asynchronous Object-Oriented Parallel and Distributed Computing on Wide-Area Systems*

M. Di Santo1, F. Frattolillo1, W. Russo2 and E. Zimeo1

1 University of Sannio, School of Engineering, Benevento, Italy
2 University of Calabria, DEIS, Rende (CS), Italy

Abstract. This paper presents a flexible and effective model for object-oriented parallel programming in both local- and wide-area contexts and its implementation as a Java package. Blending remote evaluation and active messages, our model permits programmers to express asynchronous, complex interactions, so overcoming some of the limitations of the models based on message passing and RPC and reducing communication costs.

1 Introduction

Exploiting geographically distributed systems as high-performance platforms for large-scale problem solving is becoming more and more attractive, owing to the high number of workstations and clusters of computers accessible via the Internet and to the spreading in the scientific community of platform-independent languages, such as Java [14]. Unfortunately, the development of efficient, flexible and transparently usable distributed environments is difficult, due to the necessity of satisfying new requirements and constraints. (1) Host heterogeneity: wide-area systems solve large-scale problems by using large and variable pools of computational resources, where hosts often run different operating systems on different hardware. (2) Network heterogeneity: wide-area systems are characterized by the presence of heterogeneous networks that often use a unifying protocol layer, such as TCP/IP. While this allows hosts in different networks to interoperate, it may limit the performance of specialized high-speed networking hardware, such as Myrinet [1]. (3) Distributed code management: a great number of hosts complicates the distributed management of both source and binary application code. (4) Use of non-dedicated resources: traditional message-passing models are too static to support the intrinsic variability of the pool of hosts used in wide-area systems. In this context, interaction schemes based on one-sided communications are more apt, even if some of them, by adopting synchronous client/server models, do not ensure an efficient parallelization of programs.

Starting from these considerations, we propose a flexible and effective model for object-oriented parallel programming in both local- and wide-area contexts.*

* Work carried out under the financial support of the M.U.R.S.T. in the framework of the project "Design Methodologies and Tools of High Performance Systems for Distributed Applications" (MOSAICO).

J. Rolim et al. (Eds.): IPDPS 2000 Workshops, LNCS 1800, pp. 536-543, 2000.
© Springer-Verlag Berlin Heidelberg 2000

It is based on the remote evaluation [13] and active messages [3] models and overcomes some of the limitations of the models based on message passing and RPC, thanks to its completely asynchronous communications and to the capability of expressing complex interactions, which permit applications to reduce communication costs. Moreover, its ability to migrate application code on demand avoids the static distribution and management of application software.

The model has been integrated into a minimal, portable, efficient and flexible middleware infrastructure, called Moka, implemented as a Java library. Moka allows us both to directly write object-oriented, parallel and distributed applications and to implement higher-level programming systems. Moka applications are executed by a parallel abstract machine (PAM) built on top of a variable collection of heterogeneous computers communicating by means of a transparent, multi-protocol transport layer able to exploit high-speed, local-area networks. The PAM appears as a logically fully-interconnected set of abstract nodes (ANs), each one wrapping a Java Virtual Machine. Each physical computer may host more than one AN.

2 Related work

De facto standard environments for parallel programming on clusters of workstations are doubtless PVM [7] and MPI [12]. Both use an execution model based on processes that communicate by way of message passing, which offers good performance but supports only static communication patterns. Moreover, both PVM and MPI are rather complex for non-specialists to use and, on the wide-area scale, present the distributed code management problem. Java implementations of PVM [4] and MPI [8] somewhat simplify the use of message passing, especially for object-oriented parallel programming.

A different and more attractive approach to distributed and parallel computing is the one proposed by Nexus [5] and NexusJava [6]. These systems support fully asynchronous communications, multithreading and dynamic management of distributed resources in heterogeneous environments. The communication scheme is based on the use of global pointers (one-sided communication), which allow software to refer to memory areas (Nexus) or objects (NexusJava) allocated in different address spaces. NexusJava is rather similar to Moka both in the programming model and in the architecture, but its programming interface is too low-level and verbose, and it limits distributed interactions to the invocation of methods explicitly registered as handlers.

Commonly used middleware based on Java RMI [14] generally does not directly provide asynchronous mechanisms on the client side. Some recent systems, such as ARMI [11], transform synchronous RMI interactions into asynchronous ones by using either an explicit or an implicit multithreading approach. However, in both cases, inefficiencies due to the scheduling of fine-grain threads are introduced, especially on commodity hardware. Instead, Moka provides more efficient, fully asynchronous communications at system level.
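To make the thread-based approach concrete, the following sketch shows how a synchronous invocation can be turned into an asynchronous one by delegating it to a thread and handing the caller a future. This is an illustrative reconstruction only, not ARMI's or Moka's actual code; the names are hypothetical and the java.util.concurrent API used here postdates the paper.

```java
import java.util.concurrent.*;

public class AsyncWrapper {
    // Hypothetical synchronous "remote" call, simulated locally.
    static int remoteInvoke(int x) { return x * x; }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        // ARMI-style: the invocation is handed to a separate thread;
        // the caller continues immediately and holds a Future.
        Callable<Integer> task = () -> remoteInvoke(7);
        Future<Integer> result = pool.submit(task);
        // ... the caller proceeds with other work here ...
        // The caller blocks only at the point where the result is needed.
        System.out.println(result.get()); // prints 49
        pool.shutdown();
    }
}
```

Each such invocation costs a thread dispatch, which is the fine-grain scheduling overhead the text refers to; Moka avoids it by making the send primitive itself asynchronous.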


3 The Moka programming model

The model is based on the following three main concepts. (1) Threads, which represent the active entities of a computation. (2) Global objects, which, through a global naming mechanism, allow applications to build a global space of objects used by the threads to communicate and synchronize. (3) Active objects (auto-executable objects), which allow threads to interact with global objects by using a one-sided asynchronous communication mechanism.

A Moka computation is fully asynchronous and evolves through subsequent changes of global objects' states and through their influence on the control of application threads. Threads have to be explicitly created and managed by the program, which must take care to protect objects against simultaneous accesses from the threads running on the same AN.

An active object (AO) can be asynchronously and reliably communicated, as a message, to an AN, where it may arrive with an unlimited delay and without preserving the sending order. When an AO reaches its destination node, the execution of its handler (a function connected to the AO) is automatically carried out. The handler code, when not already present on the node, is dynamically retrieved and loaded by Moka. So it is possible to program according to a pure MIMD model, where applications are organized as collections of components, each one implementing a class used to build active or global objects, loaded on the nodes only when necessary (code on demand promoted by servers).

The automatic execution of handlers implies that a message is automatically and asynchronously received without using any explicit primitive. This semantics requires the existence of a specific activity (network consumer) devoted to the tasks of extracting active objects from the network and of executing their handlers. A solution is based on the single-threaded upcall model, in which a single network consumer serially runs, in its execution environment, the handlers of the received objects. On the other hand, when a handler may suspend on a condition to be satisfied only by the execution of another AO, in order to avoid the deadlock of the system, the program must use the popup-threads model and so explicitly ask for the handler to be executed by a separate, dedicated thread (network consumer assistant) [9].
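The two handler-execution models can be sketched as follows; the Handler interface and the queue are hypothetical stand-ins for Moka's network machinery, shown only to contrast serial upcalls with a dedicated popup thread.

```java
import java.util.concurrent.*;

// Illustrative stand-in for an active object's handler (not Moka's API).
interface Handler { void handle(); }

public class Consumer {
    // Stand-in for the stream of active objects arriving from the network.
    static final BlockingQueue<Handler> net = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws Exception {
        net.put(() -> System.out.println("AO 1"));
        net.put(() -> System.out.println("AO 2"));
        // Single-threaded upcall: the network consumer itself runs the
        // handlers serially, one after the other.
        for (int i = 0; i < 2; i++) net.take().handle();
        // Popup threads: a handler that may suspend is delegated to a
        // dedicated thread (a "network consumer assistant"), so the
        // consumer stays free to receive further active objects.
        Thread assistant = new Thread(() -> System.out.println("AO 3 (popup)"));
        assistant.start();
        assistant.join();
    }
}
```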

Application threads and handlers interact by using a space of global objects (GOS). A global object (GO) is abstractly defined as a pair (GN, GO−IMPL). GN is the global name of the GO and uniquely identifies it in the Moka system. GO−IMPL is the concrete GO representation, an instance of a user-defined class physically allocated on an AN. A thread can access a GO−IMPL by making a query to the GOS, which is organized as a distributed collection of name spaces, each one (NSi) allocated on a different AN. Therefore, in order to access a GO−IMPL, it is necessary to know its location (node i). Moka offers two ways to do this: (1) by means of a static, immutable association (GN, i), established at GN creation time; (2) by using a dynamic approach, where the GO−IMPL location is explicitly specified at access time. In this latter case, one different implementation per node may be bound to a given GN; so, it is possible to realize a replicated implementation of a GO, even if the program must explicitly ensure the consistency of replicas.

Fig. 1. Creation of a new GO. (a) An AO, containing a GO−IMPL and a GN, arrives on a node. (b) The network consumer runs the AO handler, which creates the association (GN, GO−IMPL) in the NSi

At the start of the computation, only AO handlers can refer to the local name space (NSi). Subsequently, the NSi reference may be passed to other activities running on the same node. When GO−IMPLs become shared resources, they must be explicitly protected from simultaneous accesses. The creation of a GO on a node i requires three operations, executed on the node where the request starts: (1) generate a GN; (2) create a specific AO containing both the GN and the GO−IMPL; (3) send the AO to node i (see fig. 1), where its handler execution binds the GN to the GO−IMPL in the local name space NSi. An existing GO can be remotely accessed through the following operations: (1) obtain from the GN the identifier of the node where the GO resides; (2) send to this node an AO which looks up NSi with GN as key, in order to get the GO−IMPL; (3) execute on the GO−IMPL the operations abstractly requested on the GO.
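The bind/lookup role of a per-node name space can be sketched with a trivial map; class and method names here are illustrative (a String stands in for a GlobalUid) and only mirror the behavior described above, not Moka's implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a per-node name space NSi mapping global names (GNs) to
// local implementations (GO_IMPLs). A concurrent map stands in for the
// explicit protection against simultaneous accesses required by Moka.
public class NameSpace {
    private final Map<String, Object> bindings = new ConcurrentHashMap<>();

    public void bind(String gn, Object goImpl) { bindings.put(gn, goImpl); }
    public Object lookUp(String gn) { return bindings.get(gn); }

    public static void main(String[] args) {
        NameSpace ns = new NameSpace();
        // Performed by an AO handler on the destination node (step 3).
        ns.bind("matrix-B", new float[] {1f, 2f});
        // Performed by a later AO that needs the GO_IMPL.
        float[] impl = (float[]) ns.lookUp("matrix-B");
        System.out.println(impl.length); // prints 2
    }
}
```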

Through a special form of active objects (AOc), Moka provides a deferred synchronous send primitive that calls for a result produced by the execution of operations tied to a dispatched AOc. Sending an AOc does not suspend the caller, which immediately receives a promise, an object that will hold the result in the future. A suspension will occur only if the result is reclaimed before it is really available. An AOc can be modified and forwarded many times before returning the result (agent-like model); so the caller may receive the result of a complex distributed interaction.
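A promise with these semantics can be sketched in a few lines using Java's built-in monitor primitives; this is an illustrative reconstruction of the described behavior, not Moka's actual Promise class.

```java
// Minimal promise sketch: the sender gets the (empty) promise immediately;
// getValue() suspends the caller only if the result has not yet arrived.
public class Promise {
    private Object value;
    private boolean ready = false;

    public synchronized void setValue(Object v) {
        value = v;
        ready = true;
        notifyAll(); // wake any caller blocked in getValue()
    }

    public synchronized Object getValue() throws InterruptedException {
        while (!ready) wait(); // block only while the result is pending
        return value;
    }

    public static void main(String[] args) throws Exception {
        Promise p = new Promise();
        // Stand-in for the network delivering the result asynchronously.
        new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException e) {}
            p.setValue(42);
        }).start();
        System.out.println(p.getValue()); // prints 42 once delivered
    }
}
```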

4 The Java API of Moka

A Java package implements the proposed model, by offering an API that allows programs to create and dynamically configure the PAM, to create and send active objects, and to create global objects. Thread management and synchronization on global objects are instead committed to the Java language default mechanisms. In fact, differently from the proposal in [2], we do not provide Moka-level synchronization mechanisms, because we want Moka to be a minimal and extensible middleware.

The Moka package contains the following classes and interfaces: Moka, ActiveObject, ActiveObjectCall, Promise, GlobalUid and LocalNameSpace. The Moka class allows us to dynamically configure the PAM and provides programmers with primitives for either asynchronously or deferred synchronously sending active objects to nodes, by using either point-to-point or point-to-multi-point communication mechanisms, with the possibility to specify the single-threaded upcall model (Moka.SINGLE) or the popup-threads one (Moka.POPUP). It is worth noting that, when a deferred synchronous send primitive is used, the result is to be caught by an instance of Promise. An active object is an instance of a user-defined class implementing one of the Java interfaces ActiveObject and ActiveObjectCall, to be respectively used for asynchronous and deferred synchronous interactions. Instances of the GlobalUid and LocalNameSpace classes respectively implement the global name (GN) of a GO and the name space of a node i (NSi). In particular, Moka creates one LocalNameSpace instance per node, which is automatically passed to all the handlers. Moreover, the Moka API provides three classes of active objects: Create, InvokeVoidMethod and InvokeMethod, which, using the Java reflection package, allow programs respectively to create a global object, to invoke a void method on a GO, or to invoke a method returning a value.

For the sake of clarity, in the following we present a simple program that multiplies two square matrices, A and B, in parallel.

public class Main implements ActiveObject {
  public void handler(LocalNameSpace ns) {
    float[] a, b; int dim;
    <read matrices a and b as mono-dimensional arrays of dim*dim>;
    GlobalUid gn = new GlobalUid();
    ns.bind(gn, new Matrix(b));
    Moka.broadcast(new Create(gn, Matrix.class, b), Moka.SINGLE);
    int numNodes = Moka.size(); int rfn = dim/numNodes;
    float[] subM = new float[dim*rfn];
    Promise[] result = new Promise[numNodes];
    for (int node = 0; node < numNodes; node++) {
      System.arraycopy(a, node*dim*rfn, subM, 0, rfn*dim);
      result[node] = Moka.call(new SubMatrix(gn, subM), node, Moka.SINGLE);
    }
    for (int node = 0; node < numNodes; node++) {
      float[] r = (float[]) result[node].getValue(); <print r>;
    }
  }
}

public class Matrix implements Serializable {
  private float[] mat;
  public Matrix(float[] m) { mat = m; }
  public float[] multiply(float[] a) { return <a x mat>; }
}

public class SubMatrix implements ActiveObjectCall {
  private float[] part; private GlobalUid gn;
  public SubMatrix(GlobalUid g, float[] a) { gn = g; part = a; }
  public Object handler(LocalNameSpace ns) {
    return ((Matrix) ns.lookUp(gn)).multiply(part);
  }
}

The algorithm is organized as follows: B is replicated on each node of the PAM, whereas A is split into submatrices, each one formed by an equal number of rows and assigned to a different node of the PAM. The computation evolves through the parallel multiplication of each submatrix with B. In more detail, the program creates the PAM and sends a first (Main) AO to one of its nodes, where the handler replicates B as a global object by generating a global uid (gn) and wrapping B in an instance of the class Matrix (the GO−IMPL); locally, this is obtained by directly binding, in the local name space, gn to a local instance of Matrix; remotely, by broadcasting an AO of the class Create, which takes charge of creating a remote GO−IMPL and binding it to gn in the name space of the remote node. Moreover, the handler divides A into dim/numNodes submatrices, where dim is the dimension of A and numNodes is the size of the PAM, obtained by invoking the primitive Moka.size. Each submatrix is wrapped into an instance of the class SubMatrix, which implements the ActiveObjectCall interface, and sent to each node by using the deferred synchronous send primitive Moka.call; all these invocations immediately return a promise. Remotely, the handler of SubMatrix gets the local instance of Matrix and invokes its method multiply. The resulting values are caught by Moka and implicitly sent to the node that executed the main AO; here, they are extracted from the promise by using the blocking Promise.getValue primitive.

5 Transparent vs. non-transparent distributed interactions

The Moka model is based on non-transparent distributed interactions among threads and global objects, whereas the middleware systems based on the RMI model make possible the transparent invocation of methods on remote objects. Unfortunately, in order to realize transparency, these systems use IDLs (Interface Description Languages) and stub generators, which complicate the development of distributed applications, especially when many remote objects are used. In addition, using RMI, the interactions between different address spaces are limited to the invocation of remote object methods explicitly exported by an interface. So, it is not possible to remotely invoke class methods, to directly access the public data members of an object, or to dynamically create remote objects. Moka instead, thanks to the possibility of remotely executing all the locally executable operations, permits us to efficiently realize any kind of interaction among address spaces, without using stub generators and preserving a non-transparent polymorphism between local and remote (global) objects. Moreover, our model does not require global objects to be instances of particular classes, as Java RMI does; instead, any object can become a global one after its binding in a NSi. In addition, the use of distributed interactions as first-class objects allows Moka to minimize communication costs when a distributed interaction implies the execution of methods whose results are used as arguments of other methods on the same AN. Therefore, we argue that while transparent distributed interactions allow programs to easily express remote invocations of methods, a non-transparent, lower-level approach seems more efficient when object-oriented programming is used for high-performance parallel computing or when the interest is in the development of higher-level systems.

6 Performance evaluation

At present, Moka implements three transport modules: two based respectively on TCP and on reliable UDP with multicast support, to be used on the Internet and on local-area Ethernet networks, and a third, based on Illinois Fast Messages [10], to be used on Myrinet networks. All these protocols can be used concurrently by the PAM, because it is possible to specify a different protocol for each pair of nodes.

In the following we present two graphs. The first graph shows the performance of the presented matrix multiplication example on a cluster composed of bi-processor PCs (Pentium II 350 MHz) interconnected by a multi-port Fast Ethernet repeater and using TCP as the communication transport. We can observe that, without taking into account the time necessary to transfer the right-hand matrix, a good speedup is achieved for matrices over 400×400.

[Left graph: "Multiplication of matrices on the cluster" — time in milliseconds (1000-10000) vs. number of PCs (2, 4, 8), for matrix sizes from 100×100 to 600×600. Right graph: "Matrix - Vector multiplication" — time in milliseconds (0-500) vs. number of remote rows (100-900), for 2 nodes and for the sequential execution.]

The second graph shows the time needed to compute the product between a 1000×1000 matrix and a vector on two Sun UltraSparcs interconnected by Ethernet, without taking into account the time to transfer the matrix. We can observe that the best performance is reached for a fifty-fifty splitting of the matrix rows between the two machines. Nevertheless, the graph shows an absolute speedup even for the other, less favorable splittings of the matrix. The anomaly in the case of 100 remote rows is due to the need for initial remote code loading.

A third experiment was realized on a small wide-area system composed of a Pentium II 400 MHz PC interconnected to two Sun UltraSparcs by means of a frame relay network (512 Kbps CIR, 2 Mbps CBIR). On this system, the product of two matrices shows a small speedup (about 1.6×) only for the dimension of 500×500 floats (21 sec. parallel, 33 sec. sequential). In fact, for smaller dimensions the grain is too small, while for larger ones the available bandwidth is saturated.

7 Conclusions

We have shown how the integration of the remote evaluation model and the active messages one permits Java programs to set up global object spaces and, with the use of promises, to realize distributed asynchronous interactions that overcome some limitations of message passing and RPC. The proposed model has been integrated into the middleware Moka which, thanks to the use of a multi-protocol transport, ensures acceptable performance in both local- and wide-area networks. As future work, we will provide a better integration of the default Java serialization with the FM library in order to improve performance on Myrinet clusters, which currently is only slightly better than that provided by TCP clusters.

References

1. N. J. Boden et al.: Myrinet: A Gigabit-per-Second Local Area Network. IEEE Micro, 15(1):29-36, 1995.
2. D. Caromel, W. Klauser, and J. Vayssiere: Towards Seamless Computing and Metacomputing in Java. Concurrency: Pract.&Exp., 10(11-13):1043-1061, 1998.
3. T. von Eicken et al.: Active Messages: A Mechanism for Integrated Communication and Computation. 19th Ann. Int'l Symp. Computer Architecture, ACM Press, NY, 256-266, 1992.
4. A. Ferrari: JPVM: Network Parallel Computing in Java. ACM Workshop on Java for High-Performance Network Computing, Palo Alto, 1998.
5. I. Foster, C. Kesselman, and S. Tuecke: The Nexus Approach to Integrating Multithreading and Communication. J. of Par. and Distr. Computing, 37:70-82, 1996.
6. I. Foster, G. K. Thiruvathukal, and S. Tuecke: Technologies for Ubiquitous Supercomputing: A Java Interface to the Nexus Communication System. Concurrency: Pract.&Exp., June 1997.
7. A. Geist et al.: PVM: Parallel Virtual Machine. The MIT Press, 1994.
8. V. Getov and S. Mintchev: Towards a Portable Message Passing in Java. http://perm.scsise.vmin.ac.uk/Publications/javaMPI.abstract.
9. K. Langendoen, R. Bhoedjang, and H. Bal: Models for Asynchronous Message Handling. IEEE Concurrency, 28-38, April-June 1997.
10. S. Pakin, V. Karamcheti, and A. A. Chien: Fast Messages: Efficient, Portable Communication for Workstation Clusters and MPPs. IEEE Concurrency, 60-73, April-June 1997.
11. R. R. Raje, J. I. William, and M. Boyles: An Asynchronous Remote Method Invocation (ARMI) Mechanism for Java. Concurrency: Pract.&Exp., 9(11):1207-1211, 1997.
12. M. Snir et al.: MPI: The Complete Reference. The MIT Press, 1996.
13. J. W. Stamos and D. K. Gifford: Remote Evaluation. ACM Transactions on Computer Systems, 12(4):537-565, October 1990.
14. http://www.javasoft.com.
