A case study of a distributed high-performance computing system for neurocomputing

D. Anguita *, A. Boni, G. Parodi

Department of Biophysical and Electronic Engineering, University of Genova, Via Opera Pia 11a, 16145 Genova, Italy

Received 30 August 1998; received in revised form 16 February 1999; accepted 1 April 1999

Abstract

We model here a distributed implementation of cross-stopping, a combination of cross-validation and early-stopping techniques, for the selection of the optimal architecture of feed-forward networks. Due to the very large computational demand of the method, we use the RAIN system (Redundant Array of Inexpensive workstations for Neurocomputing) as a target platform for the experiments and show that this kind of system can be effectively used for computationally intensive neurocomputing tasks. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Neurocomputing; Workstation cluster; Statistical validation

1. Introduction

Artificial Neural Networks (ANNs) are an effective tool for Pattern Recognition (PR) tasks [4]. The classification rules underlying a typical PR problem are usually unknown, but ANNs are able to learn them from examples (i.e. a set of already classified samples called the training set). After learning, the most important performance measure of a neural network is its generalisation capacity, that is, the ability to correctly classify a new pattern according to the rules learnt on the training set.

Asymptotic results have demonstrated the optimal behaviour of neural networks in classification tasks [15]; unfortunately, in practice, one is confronted with a limited data set, where the generalisation is far from optimal. A large amount of theoretical work based on statistical learning theory has been developed and outstanding results have been derived (e.g. [16]). However, practitioners have found on many occasions that these bounds are overly pessimistic and that the behaviour of neural networks is often better than predicted by theory.

A practical way to compute the generalisation capacity of ANNs, and of many other PR systems, is to split the available examples into a training and a test set: the generalisation error of the system, trained on the former, is estimated on the latter. Unfortunately, in many cases the available samples are too few, so a further subdivision inevitably involves a loss of efficiency in the system design. Furthermore, it has been demonstrated [10] that such an approach is very sensitive to the specific splitting of the data. Several other techniques have been designed to overcome this problem [11]; nevertheless, the cost of using such methods is a higher computational requirement.


* Corresponding author. Tel.: +39 10 3532800; fax: +39 10 3532175; e-mail: [email protected]


We focus on two practical methods for designing effective neural networks: k-fold cross-validation and early-stopping. The first is a method for estimating the generalisation capabilities of the network [8]: given n patterns, several networks are designed using sets of (n - k) patterns, the average of the error rate on the k held-out test patterns is computed, and the network with the lowest average error is then selected and trained on the entire set. The case k = 1 is the well-known leave-one-out method [13]. The second technique is used to avoid overfitting: the learning is stopped at the minimum of the error on the test set, usually well before reaching the minimum on the training set. It has been argued [18] that this is equivalent to designing a network of smaller complexity and therefore with better generalisation ability. The computational load of cross-validation can obviously be very high, since several networks must be retrained on different sets of (n - k) patterns.
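To make the combination of the two ingredients concrete, the sketch below estimates the best stopping point by k-fold cross-validation. It is only an illustration: `new_network`, `train_one_epoch` and `test_error` are hypothetical callables standing in for the network constructor, one pass of a (slow, first-order) learning algorithm and the error on a held-out fold.

```python
import numpy as np

def kfold_early_stopping(X, Y, k, max_epochs, new_network, train_one_epoch, test_error):
    """For each fold, train on n-k patterns and record the error on the k held-out
    patterns after every epoch; the averaged error curve gives a cross-validated
    estimate of the best stopping point."""
    n = len(X)
    folds = np.array_split(np.random.permutation(n), n // k)   # N = n/k disjoint test sets
    errors = np.zeros((len(folds), max_epochs))
    for i, test_idx in enumerate(folds):
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        net = new_network()
        for t in range(max_epochs):
            train_one_epoch(net, X[train_idx], Y[train_idx])
            errors[i, t] = test_error(net, X[test_idx], Y[test_idx])
    avg = errors.mean(axis=0)        # cross-validation estimate for each stopping time
    t_opt = int(avg.argmin())        # early-stopping point: minimum of the averaged test error
    return t_opt, float(avg[t_opt])
```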

We experiment with the combination of the two methods described above: cross-validation is used to estimate the performance of different network architectures stopped at different times during learning. We use the term cross-stopping for this procedure, in the same way as the term boot-stopping is used for the combination of bootstrap and early-stopping [17].

The purpose of cross-stopping is to find a network with an optimal architecture (i.e. number of hidden neurons) and to train it for an optimal number of steps, with respect to its estimated generalisation ability.

In this paper, we describe a distributed implementation of cross-stopping using the RAIN system (Redundant Array of Inexpensive workstations for Neurocomputing), consisting of several workstations connected by a LAN and acting as a single parallel computer.

In the following section, we briefly describe the rationale behind the RAIN system. Section 3 details the cross-stopping algorithm as implemented on RAIN, and in Section 4 we provide a computing model for evaluating the performance of such systems. Section 5 shows some experimental results obtained on real-world problems.

2. The RAIN system

The need for high-performance computing when approaching statistical validation methods is quite obvious; in our case, there are at least two main reasons for resorting to this kind of distributed architecture as a computing platform:
· the cross-validation phase is trivially parallel: it consists of several independent learning phases in which communication is usually negligible with respect to the computation time, and some data transfer occurs only at the end of each learning phase;
· the early-stopping method must use a slow learning algorithm: fast (e.g. second-order) methods are of little use in this case, because they would easily miss the optimal stopping point, where the error on the test set is at its minimum; a high-performance computing system is therefore needed to achieve a reasonable computing time.
There are many reasons for using a workstation cluster instead of a traditional parallel computer or a dedicated neurocomputer. A trivial observation is that only a few dedicated machines for neurocomputing come from a non-academic environment and are available on the market; furthermore, the performance gap between dedicated neurocomputers and conventional high-performance microprocessors is narrowing at a very fast rate. It is also worth mentioning that software for dedicated hardware is much less portable than code developed for general-purpose computers, and that the latter can make use of standard BLAS routines [1,5,6], which provide virtually a hardware abstraction layer. Lastly, costs are greatly reduced both at start-up and in the medium and long term, because general-purpose workstations allow easy upgrades towards more powerful systems. One factor that fuelled much scepticism about the feasibility of network-based high-performance computing is the limitation imposed by traditional LANs: for communication-intensive, high-performance applications, traditional networks simply cannot provide adequate performance. On the other hand, statistical validation methods, like cross-stopping, are intrinsically


parallel, and only a small amount of communication is needed between processors, which largely avoids the LAN bottleneck.

The basic architecture of the RAIN system is very simple: one of the workstations acts as the controller (master) of the cluster and is usually the console to which the user is connected. Each of the independent learning tasks is spawned by the master on one of the available workstations (slaves). The slaves of the cluster are provided with a highly efficient implementation of backpropagation (MBP, Matrix Back Propagation [2]) to speed up computation. As soon as a learning phase is completed, the slave sends back the computed information and gets ready to accept another task. The master keeps an updated table containing the status of each learning step, and dynamically spawns new learning tasks, each time starting from the last saved status.

3. Cross-stopping on RAIN

3.1. The MBP algorithm

In Table 1 the MBP algorithm is reported [2]. The weights and biases of the network are stored, respectively, in matrices W_H, W_O and vectors b_H, b_O. Input patterns are stored in matrix S_I in row order and target patterns in matrix T. Matrices S_H and S_O contain the outputs of the corresponding layers when S_I is applied to the input of the network, assuming f(·) as the activation function of the neurons. The back-propagated error is stored in matrices D_O and D_H, and the variations of weights and biases, computed at each iteration, are stored, respectively, in matrices ΔW_H, ΔW_O and vectors Δb_H, Δb_O.

We refer the reader to previous work for performance considerations on MBP [2,3].

Our target is a distributed system; it is therefore necessary to identify a status for each MBP instance that allows the algorithm to be resumed, at a given step of the learning, on any computing node. Table 2 summarises the variables that constitute this status.

3.2. Cross-validation and early-stopping: The cross-stopping algorithm

Let us suppose that our data set D contains n patterns. We build N learning sets L_i, each composed of n - k patterns obtained by randomly splitting D, and N corresponding test sets T_i, built with the remaining k patterns. We indicate with E_i(h, t) the error made on T_i by a network with h hidden neurons after t learning iterations on the set L_i. The cross-validation estimate is computed as the average of E_i over all i.

Table 1
The MBP algorithm

Step                Description
Feed-forward        S_H = f(S_I W_H + 1 b_H^t)
                    S_O = f(S_H W_O + 1 b_O^t)
Error back-prop     D_O = (T - S_O) ⊙ S_O ⊙ (1 1^t - S_O)
                    D_H = (D_O W_O^t) ⊙ S_H ⊙ (1 1^t - S_H)
Weight variation    ΔW_O = η S_H^t D_O + α ΔW_O
                    Δb_O = η D_O^t 1 + α Δb_O
                    ΔW_H = η S_I^t D_H + α ΔW_H
                    Δb_H = η D_H^t 1 + α Δb_H
Weight update       W_{O,H} = W_{O,H} + ΔW_{O,H}
                    b_{O,H} = b_{O,H} + Δb_{O,H}

(⊙ denotes the element-wise product, 1 a column vector of ones, and ^t transposition.)
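For concreteness, the following NumPy sketch performs one batch iteration of Table 1, assuming a sigmoid activation and a sum-of-squares error; it mirrors the table's notation but is only an illustration, not the optimised MBP code of [2].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mbp_iteration(SI, T, WH, bH, WO, bO, dWH, dbH, dWO, dbO, eta, alpha):
    """One iteration of Matrix Back Propagation (Table 1).
    SI: Np x NI input patterns (row order), T: Np x NO targets."""
    # Feed-forward (biases broadcast over the Np rows)
    SH = sigmoid(SI @ WH + bH)
    SO = sigmoid(SH @ WO + bO)
    # Error back-propagation (sigmoid derivative: s * (1 - s))
    DO = (T - SO) * SO * (1.0 - SO)
    DH = (DO @ WO.T) * SH * (1.0 - SH)
    # Weight and bias variations, with momentum on the previous variation
    dWO = eta * SH.T @ DO + alpha * dWO
    dbO = eta * DO.sum(axis=0) + alpha * dbO      # D_O^t . 1
    dWH = eta * SI.T @ DH + alpha * dWH
    dbH = eta * DH.sum(axis=0) + alpha * dbH      # D_H^t . 1
    # Weight update
    WO, WH = WO + dWO, WH + dWH
    bO, bH = bO + dbO, bH + dbH
    E = 0.5 * np.sum((T - SO) ** 2)               # network error
    return WH, bH, WO, bO, dWH, dbH, dWO, dbO, E
```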

Table 2
The status of the MBP algorithm

Variable    Description
W_{O,H}     Weights of the output and hidden layer
b_{O,H}     Biases of the output and hidden layer
ΔW_{O,H}    Delta weights at step n-1
Δb_{O,H}    Delta biases at step n-1
UW_O        Weight updating step of the output layer (UW_O = S_H^t D_O)
Ub_O        Bias updating step of the output layer (Ub_O = D_O^t 1)
UW_H        Weight updating step of the hidden layer (UW_H = S_I^t D_H)
Ub_H        Bias updating step of the hidden layer (Ub_H = D_H^t 1)
η           Learning step
α           Momentum
E           Network error
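As an illustration of what travels between master and slaves, the snippet below groups the variables of Table 2 into a single record; the field names are ours, and the actual RAIN data structure may differ.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MBPStatus:
    """Status Z_i of one MBP instance (Table 2), as it could be shipped between
    master and slaves to resume learning on another node (illustrative only)."""
    WO: np.ndarray; WH: np.ndarray    # weights of the output and hidden layer
    bO: np.ndarray; bH: np.ndarray    # biases of the output and hidden layer
    dWO: np.ndarray; dWH: np.ndarray  # delta weights at step n-1 (momentum terms)
    dbO: np.ndarray; dbH: np.ndarray  # delta biases at step n-1
    UWO: np.ndarray; UWH: np.ndarray  # last weight updating steps (S_H^t D_O, S_I^t D_H)
    UbO: np.ndarray; UbH: np.ndarray  # last bias updating steps (D_O^t 1, D_H^t 1)
    eta: float                        # learning step
    alpha: float                      # momentum
    E: float                          # network error
```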


Note that we have deliberately not specified k so far: k = 1 gives the well-known leave-one-out algorithm, but for computational reasons other choices can be made; k = 5 or k = 10 are common values, and the reader can refer to [9] for more information on this issue.

Let N_t^max and N_h^max be the upper limits for, respectively, the number of learning iterations and the number of hidden neurons; the cross-stopping algorithm can then be described as in Table 3.

As can easily be noted, step 4 of the algorithm can be performed using N parallel tasks, one for each value of i in [1, N], each executing an MBP instance.
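A serial sketch of Table 3 may help to locate where the parallelism enters: the call `evaluate(i, h, t)` below stands for the work that RAIN dispatches to a slave (training on L_i up to iteration t and measuring the error on T_i); it is a placeholder, not part of the original code.

```python
def cross_stopping(N, h_max, t_max, dt, evaluate):
    """Serial version of Table 3. `evaluate(i, h, t)` returns E_i(h, t):
    the test error on fold T_i of a network with h hidden neurons trained
    for t iterations on L_i."""
    E_best, h_opt, t_opt = float("inf"), None, None
    for h in range(1, h_max + 1):
        for t in range(1, t_max + 1, dt):
            # Step 4: the N evaluations below are independent and can run on N slaves
            E = sum(evaluate(i, h, t) for i in range(N)) / N   # cross-validation estimate E(h, t)
            if E < E_best:
                E_best, h_opt, t_opt = E, h, t
    return h_opt, t_opt, E_best
```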

Following a typical master-slave model, the master workstation is responsible for process spawning, initialisation, keeping track of all learning tasks and collecting the results; it is also allowed to run slave processes, because it is usually the fastest computer in the cluster. Furthermore, the master keeps an updated table Z, with N entries, where the status of each MBP instance is stored: each entry Z_i contains the variables listed in Table 2. Every time the counter t is increased, a new slave is chosen to compute E_i(h, t), starting from the last saved status Z_i.

The master maintains a set S of N tasks to be executed and a set R of running tasks. Each task in R is associated with one of the slave computers. While S ≠ ∅, there are four possible events:
1. at least one of the slaves is available: the master selects a task from S to be executed on the available slave and puts it in the set R;
2. a slave completes its task: the master receives the computed error and the final status of the associated MBP instance, deletes the corresponding task from R and updates the corresponding entry in the table Z;
3. a slave is deleted from the cluster, due to some fault or to user intervention: the corresponding task is moved back from R to S;
4. a slave is added to the cluster: see event 1.
Fig. 1 sketches the described system.

The task distribution is implemented through the well-known message-passing library Parallel Virtual Machine (PVM) [7]; it allows processes and data to be distributed across a large variety of different architectures and operating systems, hiding all data conversions from the user.

As can easily be deduced from the scheduling described above, the distribution of tasks is totally asynchronous and fault-tolerant: if one of the slaves fails to respond, its task is rescheduled on the first available slave.
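The scheduling logic just described can be summarised as follows. In RAIN the spawning is done through PVM, so the thread pool below is only a stand-in used to illustrate the handling of the task sets S and R and of the status table Z; `run_on_slave` is a hypothetical callable that returns (error, new_status) and raises on slave failure.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def schedule_tasks(tasks, run_on_slave, n_slaves):
    """Master scheduling sketch: S holds tasks still to be executed,
    R the running ones, Z the collected results/statuses."""
    S = set(tasks)     # tasks to be executed
    R = {}             # future -> task (running tasks)
    Z = {}             # status table, one entry per task
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        while S or R:
            # events 1/4: a slave is (or becomes) available -> dispatch a task
            while S and len(R) < n_slaves:
                task = S.pop()
                R[pool.submit(run_on_slave, task)] = task
            done, _ = wait(R, return_when=FIRST_COMPLETED)
            for fut in done:
                task = R.pop(fut)
                try:
                    error, status = fut.result()
                    Z[task] = (error, status)   # event 2: task completed, update Z
                except Exception:
                    S.add(task)                 # event 3: slave failed, reschedule the task
    return Z
```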

4. A time-computing model for the distributed cross-stopping algorithm

The number of operations n_op^MBP(h) needed by the MBP algorithm has been computed in Ref. [3]. Using this result, we can write, for our particular case:

n_op^MBP(h) = [2 N_p (3 N_O + 2 N_I) + (3 + k_1 + k_2) N_p + 4 (N_O + N_I) + 4] h + [(3 + k_1 + k_2) N_p N_O - N_p + 4 N_O],    (1)

where the network is composed of N_I input neurons, 1 ≤ h ≤ N_h^max hidden neurons and N_O output neurons; the training set consists of N_p = n - k patterns, and k_1 and k_2 are, respectively, the number of operations needed for the computation of the activation function of the neurons and of its derivative. If the activation function is a sigmoid or hyperbolic tangent, then k_1 = 6 and k_2 = 2 [5].

Let t be the number of training steps and V_MBP the average speed of the algorithm (in MFLOPS, millions of floating-point operations per second). The computing time of one instance of MBP is then given by Eq. (2).

Table 3
The cross-stopping algorithm

1.  for h := 1 to N_h^max
2.    for t := 1 to N_t^max step Δt
3.      for i := 1 to N
4.        choose an available slave to compute E_i(h, t) using L_i and T_i
5.      endfor
6.      compute E(h, t) = (1/N) Σ_i E_i(h, t)
7.      if E(h, t) < E_best then
8.        E_best := E(h, t)
9.        h_opt := h
10.       t_opt := t
11.     endif
12.   endfor
13. endfor
14. train a network with h_opt hidden neurons for t_opt steps on D


T_e(h, t) = n_op^MBP(h) t / V_MBP
          = {[2 N_p (3 N_O + 2 N_I) + (3 + k_1 + k_2) N_p + 4 (N_O + N_I) + 4] h + [(3 + k_1 + k_2) N_p N_O - N_p + 4 N_O]} t / V_MBP.    (2)

Every time the master starts a new learning task, it must send the corresponding entry Z_i to the selected slave. Let n_float be the number of bits used to represent a floating-point number; then

n_bit = 3 n_float [(N_O + N_I + 1) h + (N_O + 1)]

bits are sent for each learning task. Obviously, when a learning task finishes, the slave sends back the computed error and a new status, so this term must be taken into account twice.

The total transmission time of each Z_i is:

T_t(h) = n_bit / BW = (3 n_float / BW) [(N_O + N_I + 1) h + (N_O + 1)],    (3)

where BW is the bandwidth of the network (in Mbit/s).

The total time of this implementation of the cross-stopping algorithm is therefore:

T_CS = Σ_{h=1}^{N_h^max} Σ_{t=1, step Δt}^{N_t^max} (n/k) [2 T_t(h) + T_e(h, Δt)/n_c],    (4)

where n_c is the number of computing nodes of the cluster. Note that n/k learning tasks, that is, instances of MBP, are executed, and that the computing time of every learning task depends only on h and Δt.

Finally, expanding all the terms in the previous equation, we have:

T_CS = (N_t^max n)/(Δt k) Σ_{h=1}^{N_h^max} { (Δt/(n_c V_MBP)) [(2 N_p (3 N_O + 2 N_I) + (3 + k_1 + k_2) N_p + 4 (N_O + N_I) + 4) h + ((3 + k_1 + k_2) N_p N_O - N_p + 4 N_O)] + 2 (3 n_float / BW) [(N_O + N_I + 1) h + (N_O + 1)] }.    (5)

This is not the only way to implement the cross-stopping algorithm on a distributed architecture. In fact, we can discard the status table Z, avoiding the transmission of a training status each time a new task is spawned to a slave. Obviously, the new MBP instance is then forced to start the training from scratch. This means that the learning restarts from

Fig. 1. Cross-stopping on RAIN: the master-slave model.


the original weight values. The only information transmitted between the nodes is the ANN error, which each slave must send back to the master. Note that the corresponding transmission time is negligible in comparison to the computing time.

As one would expect, this implementation should be advantageous if the connecting network is relatively slow; in this case the time spent transmitting each Z_i could be significant. The total time for this implementation is

T_CS^(2) = Σ_{h=1}^{N_h^max} Σ_{t=1, step Δt}^{N_t^max} (n/k) T_e(h, t)/n_c,    (6)

where T_e now depends on both h and t; therefore

T_CS^(2) = Σ_{h=1}^{N_h^max} Σ_{t=1, step Δt}^{N_t^max} (n/k) (t/(n_c V_MBP)) [(2 N_p (3 N_O + 2 N_I) + (3 + k_1 + k_2) N_p + 4 (N_O + N_I) + 4) h + ((3 + k_1 + k_2) N_p N_O - N_p + 4 N_O)].    (7)

Eqs. (5) and (7) thus provide two easy-to-use expressions for deciding which implementation to adopt; they are functions of the number of learning steps, the structure of the neural network and the bandwidth of the communication network. The user can easily compute both and decide which model to use. As we will see in the next section, there are cases where the latter implementation performs better than the former.
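The two expressions translate directly into code. The functions below implement Eqs. (1)-(4) and (7) under the stated definitions (V_MBP given in MFLOPS, BW in Mbit/s; the default n_float = 32 bits is our assumption); they are a sketch for reproducing the kind of comparison shown later in Fig. 4, not the authors' original tool.

```python
def n_op_mbp(h, NI, NO, Np, k1=6, k2=2):
    """Eq. (1): floating-point operations of one MBP iteration for h hidden neurons
    (k1, k2 = cost of the activation function and of its derivative)."""
    return ((2 * Np * (3 * NO + 2 * NI) + (3 + k1 + k2) * Np + 4 * (NO + NI) + 4) * h
            + (3 + k1 + k2) * Np * NO - Np + 4 * NO)

def T_e(h, t, NI, NO, Np, V_mbp_mflops):
    """Eq. (2): computing time (s) of one MBP instance running for t iterations."""
    return n_op_mbp(h, NI, NO, Np) * t / (V_mbp_mflops * 1e6)

def T_t(h, NI, NO, BW_mbit, n_float=32):
    """Eq. (3): time (s) to transmit one status entry Z_i over a BW_mbit Mbit/s link."""
    n_bit = 3 * n_float * ((NO + NI + 1) * h + (NO + 1))
    return n_bit / (BW_mbit * 1e6)

def T_cs(n, k, NI, NO, h_max, t_max, dt, n_c, V_mbp_mflops, BW_mbit):
    """Eqs. (4)-(5): total time with status transmission (first implementation)."""
    Np = n - k
    return sum((n / k) * (2 * T_t(h, NI, NO, BW_mbit)
                          + T_e(h, dt, NI, NO, Np, V_mbp_mflops) / n_c)
               for h in range(1, h_max + 1)
               for t in range(1, t_max + 1, dt))

def T_cs2(n, k, NI, NO, h_max, t_max, dt, n_c, V_mbp_mflops):
    """Eqs. (6)-(7): total time when each task restarts from scratch (second implementation)."""
    Np = n - k
    return sum((n / k) * T_e(h, t, NI, NO, Np, V_mbp_mflops) / n_c
               for h in range(1, h_max + 1)
               for t in range(1, t_max + 1, dt))
```

Comparing T_cs(...) with T_cs2(...) for a given network, cluster and bandwidth indicates which of the two implementations should be preferred.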

5. Experimental results

The cluster used for our experiments is composed of seven heterogeneous machines: the master is an HP 9000/735; the slaves are two IBM 6000 workstations (a 550 and a 250) and four personal computers (HP Pentium Pro). The three UNIX workstations are interconnected by an Ethernet 10Base2 LAN (10 Mb/s), whereas the four PCs use an HP 100VG-AnyLAN (100 Mb/s); the two LANs are interconnected through a bridge. Note that the raw computational power and cost of the seven computers are very different: this is a desirable experimental setting, because one of the main aims of the RAIN system is the use, where possible, of existing commodity hardware. Furthermore, the four PCs run the Windows NT operating system, while the other machines run different flavours of UNIX.

In order to assess the effectiveness of the cross-stopping algorithm, we tested the system on several real-world data sets from the UCI repository [12] and on the Lyme disease data set [14]. The performance of the system is reported in Millions of Connection Updates Per Second (MCUPS), the standard measure for neural networks; MCUPS is used in place of raw time because it normalises the measurement with respect to the problem size.
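MCUPS counts how many connection (weight) updates per second the training performs, in millions. A common way to compute it for an N_I-h-N_O network trained in batch mode is sketched below; whether biases are counted varies between authors, so this is an assumption rather than the exact normalisation used in the paper.

```python
def mcups(NI, h, NO, Np, iterations, seconds):
    """Millions of Connection Updates Per Second for an NI-h-NO network:
    every training iteration updates all connections once for each of the
    Np patterns in the batch (assumed convention)."""
    connections = (NI + 1) * h + (h + 1) * NO   # weights plus biases (assumption)
    return connections * Np * iterations / (seconds * 1e6)
```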

Table 4 shows the optimal network sizes found by cross-stopping, the estimated error E and the actual error E_v on the validation set, for the real-world problems. Note that there is substantial agreement between the error estimated by the cross-stopping method and the error measured on the validation set. Perfect agreement is obviously impossible, due to statistical fluctuations and to the fact that E_v is itself an estimate of the generalisation error, depending on the particular choice of the validation set.

The computing performance of the cluster was measured with the following experiments. Fig. 2 reports the effect of tuning the core operation of MBP, that is, the matrix multiplication (the reader can refer to [2] for an exhaustive explanation of the tuning of MBP): as shown in Table 5, this tuning yields speed-ups ranging from 1.06× to 2.3× on the Lyme data set. In this experiment, we used a network with 54 input neurons and 5 hidden neurons.

In Fig. 3, a comparison between the model derived in the previous section and the actual execution time is shown. In this experiment we used the Sonar data set and a leave-one-out implementation.

Table 4
Generalisation estimate on various problems

Problem           Optimal network size   E      E_v
Heart disease     13-2-1                 16.5   15.7
Liver disorders   6-9-1                  26.5   29.0
Lyme disease      54-5-1                 9.2    6.6


For simplicity, the number of hidden neurons was fixed at four. The hardware platform is the homogeneous cluster of four Pentium Pro PCs, in order to also guarantee homogeneous network performance.

Finally, Fig. 4 shows a comparison between the two models described in the previous section (T_CS and T_CS^(2)), using the Sonar data set and assuming a slow connection between the nodes

Fig. 3. Comparison of execution times: Model vs. real.

Fig. 2. Effect of tuning of the matrix multiplication (in MFLOPS).

Table 5
Effect of tuning on the BP algorithm (in MCUPS)

Code       HP735   IBM550   IBM250   PPro
Standard   13.9    1.9      4.4      7.9
Tuned      14.7    4.4      4.4      9.0


(e.g. BW = 33.6 kbit/s). Note that for a small number of learning steps it is worthwhile to use the second implementation.

In Table 6 we show the performance (in MCUPS) of the cross-stopping algorithm obtained using the entire cluster, compared to the performance of each machine. The last column shows the ideal performance of the whole cluster, obtained by adding up the performance of the individual machines. This value shows that, even though the master is the fastest computer of the RAIN cluster, the user always benefits from the additional computing power provided by adding more machines: the reader can verify this by observing that adding the performance of all the machines except the two IBM workstations (the least powerful ones) yields an ideal performance lower than the one actually achieved by the RAIN system; furthermore, the efficiency of the system is always above 80%. In this experiment, we used all the patterns of the corresponding data set with k = 1, the most demanding scenario from a computational point of view.

6. Conclusions

We have sketched an implementation of the cross-stopping method on a distributed high-performance computing system. We believe that computer-intensive statistical methods will gain more and more popularity thanks to the greater availability of low-cost high-performance computing. The RAIN system has been developed to demonstrate the feasibility of an effective and low-cost approach to high-performance neurocomputing.

Fig. 4. Comparison of execution time models.

Table 6
Performance of single machines (A = HP735, B = IBM550, C = IBM250, D = PPro) and of the complete cluster (E = real performance, F = ideal performance); all measures in MCUPS

Problem           # of patterns   A      B     C     D     E      F
Heart disease     270             8.0    1.7   1.9   4.0   25.0   27.6
Liver disorders   345             4.4    0.9   1.0   2.3   13.0   15.5
Lyme disease      684             14.7   4.4   4.4   9.0   55.0   63.1


More information on RAIN and instructions on how to obtain the code are available at the following WWW address: http://www.esng.dibe.unige.it/RAIN.

Acknowledgements

The RAIN project acts within the framework of the European Commission Esprit Programme "Demonstration and assessment of HPCN in neural network applications for industry and medicine". The Pentium Pro PCs have been kindly donated by Hewlett-Packard, Italy. The Lyme data set is courtesy of "Centro Reumatologico Istituto Bruzzone", Genova, Italy. We thank the anonymous reviewers for their suggestions on how to improve the original manuscript.

References

[1] E.C. Anderson, J. Dongarra, Performance of LAPACK: A portable library of numerical linear algebra routines, Proc. IEEE 81 (1993) 1094-1101.
[2] D. Anguita, G. Parodi, R. Zunino, An efficient implementation of BP on RISC-based workstations, Neurocomputing 6 (1994) 57-65.
[3] D. Anguita, B.A. Gomes, Mixing floating- and fixed-point formats for neural network learning on neuroprocessors, Microprocessing and Microprogramming 41 (1996) 757-769.
[4] C. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford, 1995.
[5] A. Corana, C. Rolando, S. Ridella, A highly efficient implementation of back-propagation algorithm on SIMD computers, in: J.-L. Delhaye, E. Gelenbe (Eds.), High Performance Computing (1989) 181-190.
[6] A. Corana, C. Rolando, S. Ridella, Use of Level 3 BLAS kernels in neural networks: The back-propagation algorithm, Parallel Computing 89 (1990) 269-274.
[7] A. Geist et al., PVM: Parallel Virtual Machine, A User's Guide and Tutorial for Networked Parallel Computing, The MIT Press, 1994.
[8] J.S. Hjorth, Computer Intensive Statistical Methods: Validation, Model Selection and Bootstrap, Chapman & Hall, London, 1994.
[9] R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, in: Proc. IJCAI 1995, Montreal, Canada, 1995.
[10] B. LeBaron, A.S. Weigend, Bootstrap evaluation of the effect of data splitting on financial time series, IEEE Transactions on Neural Networks 9 (1) (1998) 213-220.
[11] F. Leisch, L.C. Jain, K. Hornik, Cross-validation with active pattern selection for neural network classifiers, IEEE Transactions on Neural Networks 9 (1) (1998) 35-41.
[12] P.M. Murphy, D.W. Aha, UCI Repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, University of California, Department of Information and Computer Science, Irvine, CA, 1994.
[13] S.J. Raudys, A.K. Jain, Small sample size effects in statistical pattern recognition: Recommendations for practitioners, IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (1991) 252-263.
[14] S. Rovetta, R. Zunino, L. Buffrini, G. Rovetta, Prototyping neural networks learn Lyme borreliosis, in: Proceedings of the 8th IEEE Symposium on Computer-Based Medical Systems, Lubbock, Texas, USA, 9-11 June 1995.
[15] D.W. Ruck, S.K. Rogers, M. Kabrisky, M.E. Oxley, B.W. Suter, The multilayer perceptron as an approximation to a Bayes optimal discriminant function, IEEE Transactions on Neural Networks 1 (1990).
[16] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, 1995.
[17] T. Plate, P. Band, J. Bert, J. Grace, A comparison between neural networks and other statistical techniques for modelling the relationship between tobacco and alcohol and cancer, in: M.C. Mozer, M.I. Jordan, T. Petsche (Eds.), Advances in Neural Information Processing Systems, vol. 9, The MIT Press, Cambridge, 1997, pp. 967-973.
[18] C. Wang, S.S. Venkatesh, J.S. Judd, Optimal stopping and effective machine complexity in learning, in: J.D. Cowan, G. Tesauro, J. Alspector (Eds.), Advances in Neural Information Processing Systems, vol. 6, Morgan Kaufmann, 1994, pp. 303-310.

Davide Anguita was born in Genoa in 1963. He obtained the "Laurea" degree in Electronic Engineering in 1989 and the Ph.D. in Electronic Engineering and Computer Science in 1994 from the University of Genoa. Since 1993 he has been working in the field of modelling, simulation and VLSI implementation of artificial neural networks. He is currently an assistant professor at the Department of Biophysical and Electronic Engineering of the University of Genoa.


Andrea Boni was born in Genoa, Italy, in 1969 and received the Laurea degree in Electronic Engineering from the University of Genoa, Italy, in 1996. He is pursuing the Ph.D. degree in Electronic and Computer Science Engineering in the Electronic Systems Group of the Department of Biophysical and Electronic Engineering (DIBE) at the University of Genoa. In 1997 he worked as a research consultant with DIBE. His main scientific interests focus on the engineering of high-performance systems and neural networks.

Giancarlo Parodi was born in Genova in 1948. He received the "Laurea" degree in Electronic Engineering in 1973 from the University of Genova. He was Associate Professor of Applied Electronics at DIBE until 1994; he is currently Full Professor of Applied Electronics at the same Department. He is a member of AEI, AICA and IEEE. He currently teaches the courses Industrial Electronics and Applied Electronics.
