Parallelizing the fuzzy ARTMAP algorithm on a Beowulf cluster


Transcript of Parallelizing the fuzzy ARTMAP algorithm on a Beowulf cluster

Proceedings of International Joint Conference on Neural Networks, Montreal, Canada, July 31 - August 4, 2005

Parallelizing the Fuzzy ARTMAP Algorithm on a Beowulf Cluster

Jimmy Secretan(*), Jose Castro(**), Michael Georgiopoulos(*), Joe Tapia(*), Amit Chadha(*), Brian Huber(*), Georgios Anagnostopoulos(***), Samuel Richie(*)

(*) Dept. of Electrical and Computer Engineering, University of Central Florida, Orlando, FL 32816
(**) Comp. Eng., Instituto Technologico de Costa Rica, Cartago, Costa Rica
(***) Dept. of Electrical and Computer Engineering, Florida Institute of Technology, Melbourne, FL 32901

Abstract- Fuzzy ARTMAP neural networks have been proven to be good classifiers on a variety of classification problems. However, the time that it takes Fuzzy ARTMAP to converge to a solution increases rapidly as the number of patterns used for training increases. In this paper we propose a coarse-grain parallelization technique, based on a pipeline approach, to speed up Fuzzy ARTMAP's training process. In particular, we first parallelized Fuzzy ARTMAP without the match-tracking mechanism, and then we parallelized Fuzzy ARTMAP with the match-tracking mechanism. Results obtained on a Beowulf cluster with a well-known large database (the Forest CoverType database from the UCI repository) show linear speedup with respect to the number of processors used in the pipeline.

I. INTRODUCTION

Neural networks have been used extensively and successfully to tackle a wide variety of problems. As computing capacity and electronic databases grow, there is an increasing need to process considerably larger databases. Neural network algorithms can have a prohibitively slow convergence to a solution, especially when they are trained on large databases. Even one of the fastest (in terms of training speed) neural network algorithms, the Fuzzy ARTMAP algorithm [3], and its faster variations ([6], [9]) tend to converge slowly to a solution as the size of the network increases.

One obvious way to address the problem of slow convergence to a solution is the use of parallelization. This paper focuses on parallelization strategies for FAM on a Beowulf cluster. A Beowulf cluster is a collection of standard PC workstations formed into a single, cohesive supercomputer by a fast network and open source software. For many applications, especially those of interest to the data mining community, the Beowulf architecture offers unparalleled performance for the price.

Regarding the parallelization of ART neural networks, it is worth mentioning the work by Manolakos [8], who implemented the ART1 neural network [4] on a ring of processors, and the work of Malkani and Vassiliadis [7], who implemented Fuzzy ARTMAP on a hypercube architecture. In the latter paper, a hypercube topology is utilized for transferring data to all of the nodes involved in the computations. Each of the processors maintains a subset of the architecture's templates and finds the template with the maximum match in its local collection. Finally, in its d-dimensional hypercube, it finds the global maximum through d different synchronization operations. This can limit the scalability of this approach.

The Fuzzy ARTMAP neural network has many desirable characteristics, such as the ability to solve any classification problem, the capability to learn from data in an on-line mode, the advantage of providing interpretations for the answers that it produces, the capacity to expand its size as the problem requires it, and the ability to recognize novel inputs, among others. Due to all of the above properties, it is worth investigating Fuzzy ARTMAP's parallelization in an effort to improve its convergence speed to a solution when it is trained with large datasets. In this paper, our focus is to improve the convergence speed of ART-like neural networks through a parallelization strategy applicable to a pipeline structure (Beowulf cluster). We chose to demonstrate the effectiveness of our proposed parallelization strategy on Fuzzy ARTMAP because, if we demonstrate its effectiveness for Fuzzy ARTMAP, its extension to other ART structures can be accomplished without much additional effort. This is due to the fact that the other ART structures share many similarities with Fuzzy ARTMAP, and as a result, the proposed parallelization approach can be readily extended to other ART variants.

II. FUZZY ARTMAP ALGORITHM

The Fuzzy ARTMAP neural network and its associated architecture were introduced by Carpenter and Grossberg in their seminal paper [3]. Kasuba [6] and Taghi, Baghmisheh, and Pavesic [9] presented a simplified version of the original Fuzzy ARTMAP architecture that is equivalent to Fuzzy ARTMAP for classification problems. In our paper, we have implemented the simplified Fuzzy ARTMAP version from Taghi, called SFAM2.0, that we refer to as FS-FAM or, alternately, as Fuzzy ARTMAP. We assume that the reader is familiar with the simplified Fuzzy ARTMAP architecture, consisting of an input layer, a category representation layer (where compressed representations of the inputs are formed) and an output layer.

FS-FAM can operate in two distinct phases: the training phase and the performance phase. In the training phase of FS-FAM, a collection of input/output pairs, such as {(I^1, ab(I^1)), ..., (I^r, ab(I^r)), ..., (I^{PT}, ab(I^{PT}))}, is repeatedly presented to FS-FAM until FS-FAM learns the desired mappings from inputs to outputs (referred to as labels of inputs). The training process in FS-FAM is succinctly described in Taghi et al.'s paper [9]. We repeat it here to give the reader a good, well-explained overview of the operations involved in its training phase.

1) Find the nearest category in the category representation layer of FS-FAM that "resonates" with the input pattern.

2) If the labels of the chosen category and the input pattern match, update the chosen category to be closer to the input pattern.

3) Otherwise, reset the winner, temporarily increase the resonance threshold (called the vigilance parameter), and try the next winner. This process is referred to as match-tracking.

4) If the winner is uncommitted, create a new category (assign the representative of the category to be equal to the input pattern, and designate the label of the new category to be equal to the label of the input pattern).

The nearest category to an input pattern I^r presented to FS-FAM is determined by finding the category that maximizes the function:

T_j(I^r, w_j, α) = |I^r ∧ w_j| / (α + |w_j|)    (1)

The resonance of a category is determined by examining whether the function, called the vigilance ratio and defined below,

ρ(I^r, w_j) = |I^r ∧ w_j| / |I^r|    (2)

satisfies the following condition:

ρ(I^r, w_j) ≥ ρ̄    (3)

If the label of the input pattern I^r is the same as the label of the resonating category, then the category's template w_j is updated as follows:

w_j = w_j ∧ I^r    (4)

If the category j is chosen as the winner and it resonates, but the label of this category is different from the label of the input pattern I^r, then this category is reset and the vigilance parameter is increased to the level:

ρ̄ = |I^r ∧ w_j| / |I^r| + ε    (5)

In the above equations the quantities α, ρ̄, and ε are FS-FAM network parameters; α usually takes small positive values, ρ̄ is chosen as a value in the interval [0, 1], and ε is a very small positive parameter.

In all of the above equations (equations (1)-(5)) there is a specific operator involved, called the fuzzy min operator and designated by the symbol ∧. The fuzzy min of two vectors x and y, designated as x ∧ y, is a vector whose components are equal to the minima of the corresponding components of x and y. Another operator involved in these equations is designated by the symbol |·|. In particular, |x| is the size of a vector x and is defined to be the sum of its components.
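To make these operations concrete, the following is a minimal, self-contained C++ sketch (C++ being the implementation language described in Section IV). The function names and the toy vectors are ours, chosen only for illustration; they are not taken from the authors' code.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// Fuzzy min (the "∧" operator): component-wise minimum of two vectors.
std::vector<double> fuzzyMin(const std::vector<double>& x, const std::vector<double>& y) {
    std::vector<double> m(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) m[i] = std::min(x[i], y[i]);
    return m;
}

// Size |x| of a vector: the sum of its components.
double vecSize(const std::vector<double>& x) {
    double s = 0.0;
    for (double v : x) s += v;
    return s;
}

// Bottom-up input (activation) of template w for input I, equation (1).
double activation(const std::vector<double>& I, const std::vector<double>& w, double alpha) {
    return vecSize(fuzzyMin(I, w)) / (alpha + vecSize(w));
}

// Vigilance ratio of template w for input I, equation (2); resonance,
// equation (3), requires this value to be at least the vigilance rho.
double vigilanceRatio(const std::vector<double>& I, const std::vector<double>& w) {
    return vecSize(fuzzyMin(I, w)) / vecSize(I);
}

int main() {
    std::vector<double> I = {0.2, 0.7, 0.8, 0.3};   // an illustrative input pattern
    std::vector<double> w = {0.1, 0.9, 0.6, 0.4};   // an illustrative category template
    const double alpha = 0.001, epsilon = 0.0001;

    std::cout << "T(I,w)   = " << activation(I, w, alpha) << '\n';
    std::cout << "rho(I,w) = " << vigilanceRatio(I, w) << '\n';

    // Equation (4): when the resonating category has the correct label, it learns.
    std::vector<double> learned = fuzzyMin(w, I);
    // Equation (5): on a label mismatch, match tracking raises the vigilance
    // just above the current ratio, so this category loses in the next search.
    double raisedRho = vigilanceRatio(I, w) + epsilon;
    std::cout << "|learned w| = " << vecSize(learned) << ", raised rho = " << raisedRho << '\n';
}
```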

In the performance phase of FS-FAM, a test input is presented to FS-FAM and the category node in FS-FAM that has the maximum bottom-up input is chosen. The label of the chosen category is the label that FS-FAM predicts for this test input. By knowing the correct labels of the test inputs belonging to a test set, we can calculate the classification error of FS-FAM for this test set.

A simplification that can be applied to the FS-FAM algorithm is the elimination of the match-tracking process. This modification was originally proposed by Anagnostopoulos [1] and turned out to yield improved classification performance on some databases. Our interest in using this FS-FAM variant lies in the fact that it simplifies the FS-FAM algorithm and allows one to concentrate on the parallelization of the competition loop in Fuzzy ARTMAP.

III. PARALLEL FS-FAM ARCHITECTURES

The design of the parallel FS-FAM implementation used in this paper was somewhat inspired by the architecture in [8]. In this paper, we present two variations of the parallel FS-FAM: the parallel no match-tracking FS-FAM, and the parallel FS-FAM (which includes match-tracking). While these parallel implementations could be adapted to a physical pipeline/ring network topology, the run environment was emulated by nodes of the Beowulf cluster connected by a standard switch (star topology). The different nodes of the Beowulf were logically arranged in a pipeline, with the capability of bidirectional communication with their neighbors. The input patterns to be learned are read into the first node of the pipeline. Each node in the pipeline has its own collection of templates, against which the input patterns are compared. The input patterns are sent in batches, whose size is an adjustable parameter of the program. As a batch of inputs enters a node, it is processed sequentially. That is, the processing always starts with the first input pattern in the batch. The incoming patterns search through the node's collection of available templates. The template with the maximum activation that meets the vigilance is then coupled with the incoming input pattern. This coupling removes the template from the available pool of templates for the node. In all subsequent comparisons, if a better template is found, it is switched with the one currently coupled to the input. The template comparisons proceed until the input pattern reaches the end of the pipeline. At this point, if the chosen template is mapped to the correct output, then learning of the pattern by the template ensues. If, on the other hand, the chosen template is mapped to the wrong output, two different strategies are implemented. In the no match-tracking FS-FAM case, a new template is added to the last node's template list, which learns the designated input pattern. In the FS-FAM case, the match-tracking mechanism is enforced and the input pattern is re-presented to the first node in the pipeline with an increased vigilance value, and the search for the right template starts all over again. In addition to processing inputs, the system also performs load balancing. New templates are created only at the end stage of the pipeline. To achieve load balancing, the last node's pool of excess templates must be redistributed to other nodes further upstream in the pipeline. At the time that each node sends its input patterns and winning templates forward through the pipeline, it also sends its excess templates backward through the pipeline. An illustration of the flow of input patterns and templates through the system can be seen in figure 1. In this figure, the focus is on processor k and the exchange of information (patterns and templates) between processor k and its neighboring processors (i.e., processors k-1 and k+1).

Fig. 1. Diagram to illustrate the exchange of input patterns and templates in the Beowulf processor pipeline.
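The load-balancing rule just described can be illustrated with a minimal C++ sketch. It is our own illustration, not the authors' code, and it assumes (consistent with the pseudocode of figure 2) that each processor keeps at most its share, the ceiling of nodes/n, of the templates it knows about and queues any excess for shipment upstream; the type and function names are ours.

```cpp
#include <cmath>
#include <iostream>
#include <vector>

// Hypothetical template type; in FS-FAM a template is a weight vector plus a label.
struct Template { std::vector<double> w; int label; };

// Move excess templates out of the local pool so they can be sent to the
// previous (upstream) node in the pipeline, as in lines 3-5 of figure 2.
std::vector<Template> extractExcess(std::vector<Template>& myTemplates,
                                    int nodes,   // total templates this node knows of
                                    int n) {     // number of processors in the pipeline
    const std::size_t myShare =
        static_cast<std::size_t>(std::ceil(static_cast<double>(nodes) / n));
    std::vector<Template> upstream;
    while (myTemplates.size() > myShare) {
        upstream.push_back(myTemplates.back());   // any selection policy works for the sketch
        myTemplates.pop_back();
    }
    return upstream;
}

int main() {
    std::vector<Template> pool(10, Template{{0.5, 0.5}, 0});
    auto excess = extractExcess(pool, /*nodes=*/16, /*n=*/4);  // share = ceil(16/4) = 4
    std::cout << "kept " << pool.size() << ", sending " << excess.size() << " upstream\n";
}
```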

A. No Match-Tracking Parallel FS-FAM

As part of our previous research, we presented a parallel pipelined version of the no match-tracking FS-FAM [5]. When inputs reach the end of the pipeline in the no match-tracking FS-FAM, they either update their associated templates or create new templates. This is the final point for all inputs. As templates accumulate in the last processor of the pipeline, they are load balanced in the way explained earlier. Because of the lack of space in this paper, we refer the reader to our previous work for a more thorough explanation of the parallelization of the no match-tracking FS-FAM [5]. Note that in [5], we presented proofs of the parallel no match-tracking FS-FAM's equivalence to its serial counterpart, as well as a description of some of its favorable load-balancing properties. Throughout the rest of this paper, we concentrate on the generalization of the parallel no match-tracking FS-FAM design to the case of the match-tracking FS-FAM. However, we present scaling results for the no match-tracking FS-FAM for comparison purposes.

B. Parallel FS-FAM with Match-Tracking

The parallel implementation of FS-FAM is shown in figure 2. The variables involved are as follows:

* n: number of processors in the pipeline.
* k: index of the current processor in the pipeline, k ∈ {0, 1, ..., n-1}.
* p: packet size, the number of patterns sent downstream; 2p is the number of templates sent upstream.
* (I, w, T, ρ): 4-tuple corresponding to the format of the elements that are packed downstream in the pipeline. I is the input pattern, w is the current best candidate template for input pattern I, T is the activation of the pattern with the given template, and ρ is the current value of the vigilance parameter for this input pattern.
* myTemplates: set of templates that belong to the current processor.
* nodes: variable local to the current processor that holds the total number of templates the process is aware of (its own plus the templates of the other processors).
* myShare: the number of templates that the current processor should not exceed.
* W_{k-1}: ordered set of 4-tuples coming from the previous processor in the pipeline.
* W_{k+1}: set of templates (not 4-tuples) coming from the next processor in the pipeline.
* W: ordered set of 4-tuples going to the next processor in the pipeline.
* W': set of templates (not 4-tuples) going to the previous processor in the pipeline.
* class(I): class label associated with a given input pattern.
* class(w): class label associated with a given template.
* index(w): sequential index assigned to a template.
* newNodes_{k+1}: number of new nodes (templates) that were created and that processor k+1 communicates upstream in the pipeline.
* newNodes: number of new nodes (templates) that were created and that processor k communicates upstream in the pipeline.

For the sake of brevity, we omit the pseudocode for the utility functions used here, but provide their high-level descriptions. For a more thorough explanation of these functions, the reader is referred to [5]. To begin with, there are the network communication functions. They behave exactly as their names indicate: the SENDNEXT function sends data to the next processor in the pipeline, and so forth. The INIT function serves to initialize the variables and data structures for the algorithm. The function FINDWINNER is vital to the algorithm. This function searches through a set of templates S to find whether there exists a template w_i that is a better choice for representing I than the current best representative w. If it finds one, it swaps it with w, leaving w in S and extracting w_i from it.
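The following is a sketch of how FINDWINNER could look in C++ based on the description above. It is our illustration rather than the authors' code, and the activation and vigilance helpers simply restate equations (1) and (2); the initial activation used for the uncommitted node is a simplification.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

struct Template { std::vector<double> w; int label; };

// |x ∧ y|: size of the component-wise (fuzzy) minimum of two vectors.
static double minSize(const std::vector<double>& x, const std::vector<double>& y) {
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += std::min(x[i], y[i]);
    return s;
}

// |x|: sum of the components of x.
static double norm1(const std::vector<double>& x) {
    double s = 0.0;
    for (double v : x) s += v;
    return s;
}

// FindWinner: scan the pool S for a template that passes the pattern's current
// vigilance rho and has a higher activation than the current candidate (w, T).
// When one is found it is swapped with w, so the loser stays in S and the
// better template travels on with the pattern, as described in the text.
void findWinner(const std::vector<double>& I, Template& w, double& T,
                double rho, double alpha, std::vector<Template>& S) {
    for (Template& cand : S) {
        const double match = minSize(I, cand.w) / norm1(I);                 // eq. (2)
        const double act   = minSize(I, cand.w) / (alpha + norm1(cand.w));  // eq. (1)
        if (match >= rho && act > T) {
            std::swap(w, cand);   // the displaced candidate remains available in S
            T = act;
        }
    }
}

int main() {
    std::vector<double> I = {0.3, 0.6, 0.7, 0.4};
    Template w{{1.0, 1.0, 1.0, 1.0}, -1};        // stands in for the uncommitted node
    double T = 0.0;
    std::vector<Template> S = {{{0.2, 0.5, 0.9, 0.3}, 1}, {{0.8, 0.1, 0.4, 0.6}, 2}};
    findWinner(I, w, T, /*rho=*/0.4, /*alpha=*/0.001, S);
    std::cout << "winning label " << w.label << " with activation " << T << '\n';
}
```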

Each processor in the pipeline executes the algorithm of figure 2 for as long as there are input patterns to be processed. The input parameter k tells the process which stage of the pipeline it is, where the value k varies from 0 to n-1. After initializing most of the values as empty, we enter the loop of lines 2 through 43. This loop continues execution until there are no more input patterns to process. As before, the first activity of each process is to create a packet of excess templates to send back (lines 3 to 5). Lines 6 to 9 correspond to the information exchange between contiguous nodes in the pipeline. The function RECVNEXT on line 7 does not do anything if the process is the last in the pipeline (k = n-1). The same is true for the function SENDPREV when the process is the first in the pipeline (k = 0). On the other hand, the function SENDNEXT sends packets to the first processor in the pipeline when a given input pattern is forced to engage the match-tracking mechanism. In this case, the input pattern will increment the vigilance in its 4-tuple and start afresh at the beginning of the pipeline (process 0). The function RECVPREV follows the normal procedure if it is not the first process in the pipeline. If it is the first process, though, it will receive the pattern 4-tuples that come from the last process in the pipeline (the ones that are engaged in match-tracking), and read input patterns from the input stream if the number of match-tracking packets is less than p. The fresh patterns from the input stream are paired with a dummy template, called the uncommitted node, with index ∞ as their best representative so far. In all other cases these functions perform the obvious information exchange between contiguous processes in the pipeline.

By sending the input patterns downstream in the pipeline coupled with their current best representative template, we guarantee that the templates are not duplicated among different processors and that we do not have multiple-instance consistency issues. Also, when exchanging templates between nodes in the pipeline, we have to be careful that patterns that are sent downstream do not miss the comparison with templates that are being sent upstream. This is the purpose of lines 11 to 13 (communication with the next process in the pipeline) and lines 18 to 20 of PROCESS (communication with the previous process in the pipeline). We loop through each 4-tuple (lines 11-13) to see if one of the templates sent upstream has a higher activation (bottom-up input) than the ones that were sent downstream; if this is true, then the template will be extracted from W_{k+1}. The net result is that W_{k+1} ends up containing the templates that lost the competition, and therefore the ones that process k should keep (line 13). The converse is done on lines 18 to 20. Here we compare the pattern/template 4-tuples that process k-1 sent downstream in the pipeline with the templates in W' that process k is sending upstream. On line 20 we set our current 4-tuples to the winners of this competition. The set W' is discarded, since it contains the losing templates and therefore the templates that process k-1 keeps.

The primary competition loop is in lines 21 to 39. On line 22, we start by comparing the pertinent 4-tuples to the processor's main collection of templates with the FINDWINNER function, as described earlier. If this is the last processor (n-1), there are additional steps that must be taken. If the final associated template is uncommitted, we add a new template to the system (lines 25-27). We also remove the 4-tuple from W (line 28), as we are done processing it. If a regular (committed) template has been associated with the input in question, and it maps to the correct class, then we update the template as per the Fuzzy ARTMAP algorithm (lines 31 to 34). Again, as before, the 4-tuple in question is removed from W (line 34). If the labels of the input and template do not match, we must use the match-tracking mechanism (lines 35-38). We first increase the pattern's associated vigilance (line 36). We then return the template that was associated with the input to the processor's general pool and reset the template in the 4-tuple to be the uncommitted node. Since this 4-tuple is not removed from W as in the other cases, it remains to be sent back to processor 0, where its competition loop starts all over again.

A processor continues until it receives nothing in W_{k-1}. Finally, lines 45 and 46 of PROCESS make sure that the templates that are sent upstream in the pipeline are not lost after the pool of training input patterns being processed is exhausted.
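As a rough illustration of how the SENDNEXT/RECVNEXT and SENDPREV/RECVPREV exchanges map onto MPI (the library used in Section IV), the sketch below has each pipeline stage trade a small dummy buffer with its neighbours. In the real implementation the buffers would carry serialized 4-tuples and templates; the sizes, tags and payloads here are purely our own placeholders, not the authors' protocol.

```cpp
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int k = 0, n = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &k);   // this stage's index in the logical pipeline
    MPI_Comm_size(MPI_COMM_WORLD, &n);   // number of pipeline stages

    const int count = 4;                                    // placeholder payload size
    std::vector<double> down(count, k), up(count, k);       // outgoing packets
    std::vector<double> fromPrev(count), fromNext(count);   // incoming packets

    // Exchange with the next stage: pattern packets go downstream,
    // excess templates come back upstream (SENDNEXT / RECVNEXT).
    if (k + 1 < n)
        MPI_Sendrecv(down.data(), count, MPI_DOUBLE, k + 1, 0,
                     fromNext.data(), count, MPI_DOUBLE, k + 1, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // Exchange with the previous stage: excess templates go upstream,
    // pattern packets come downstream (SENDPREV / RECVPREV).
    if (k > 0)
        MPI_Sendrecv(up.data(), count, MPI_DOUBLE, k - 1, 1,
                     fromPrev.data(), count, MPI_DOUBLE, k - 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    std::printf("stage %d of %d completed one neighbour exchange\n", k, n);
    MPI_Finalize();
    return 0;
}
```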

IV. EXPERIMENTS

The database used for testing the performance of both the parallel no match-tracking FS-FAM and the parallel FS-FAM was the Forest CoverType database provided by Blackard and donated to the UCI Machine Learning Repository. The database consists of a total of 581,012 patterns, each one associated with 1 of 7 different forest tree cover types. The number of attributes of each pattern is 54, but this number is misleading since attributes 11 to 14 are actually a binary tabulation of the attribute Wilderness-Area, and attributes 15 to 54 (40 of them) are a binary tabulation of the attribute Soil-Type. The original database values are not normalized to fit in the unit hypercube, so we transformed the data to achieve this. There are no omitted values in the data. Patterns 1 through 512,000 were used for training. The test set consisted of patterns 561,001 to 581,000. Although lack of space does not allow a comprehensive comparison of how different classification algorithms performed on this database, Blackard cites performance of 70% for backpropagation neural networks and 58% for Linear Discriminant Analysis [2]. Training set sizes of 1000 · 2^i, where i ∈ {5, 6, 7, 8, 9}, were used; that is, 32,000 to 512,000 patterns were used for the training of the parallel no match-tracking FS-FAM and FS-FAM. The test set size, as mentioned above, was fixed at 20,000 patterns. The number of processors in the pipeline varied from p = 1 to p = 32, in powers of 2 (obviously the case of p = 1 corresponds to the sequential no match-tracking FS-FAM and FS-FAM). To avoid additional computational complexities in the experiments, the values of the ART network parameters ρ̄ and α were fixed (i.e., the values chosen were ones that gave reasonable results). For every combination of (p, PT) = (pipeline size, training set size) values, we conducted 12 independent experiments (training and performance phases), corresponding to different orders of pattern presentation within the training set. All results reported are averages over the 12 runs. All the tests were conducted on the OPCODE Beowulf cluster at the Institute for Simulation and Training, an institute affiliated with the University of Central Florida. This cluster consists of 96 nodes, with dual Athlon 1500+ processors and 512MB of RAM. The implementation of the algorithm was done in C++ with the MPI (Message Passing Interface) libraries. MPI provides a simple interface for communication among processes running on either one node or several different nodes. The runs were done in such a way as to utilize half as many nodes as p. Thus, there were two MPI processes per node, one per processor.
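As a sketch of the preprocessing just mentioned (the paper does not spell out its exact transformation, so the details below are our assumption), each attribute can be min-max rescaled into [0, 1]. The complement-coding step that Fuzzy ARTMAP normally applies is also shown, since it is consistent with the 108-component templates (2 × 54) quoted in Section V.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// Min-max normalize each attribute (column) of the data set into [0, 1].
void normalize(std::vector<std::vector<double>>& data) {
    if (data.empty()) return;
    const std::size_t dim = data[0].size();
    for (std::size_t j = 0; j < dim; ++j) {
        double lo = data[0][j], hi = data[0][j];
        for (const auto& row : data) { lo = std::min(lo, row[j]); hi = std::max(hi, row[j]); }
        const double span = (hi > lo) ? (hi - lo) : 1.0;   // guard against constant columns
        for (auto& row : data) row[j] = (row[j] - lo) / span;
    }
}

// Complement-code a normalized pattern a into (a, 1 - a), doubling its dimension
// (54 attributes -> 108 components, matching the template size quoted in Section V).
std::vector<double> complementCode(const std::vector<double>& a) {
    std::vector<double> I;
    I.reserve(2 * a.size());
    for (double v : a) I.push_back(v);
    for (double v : a) I.push_back(1.0 - v);
    return I;
}

int main() {
    std::vector<std::vector<double>> data = {{10, 2000}, {20, 3000}, {30, 2500}};  // toy rows
    normalize(data);
    const std::vector<double> I = complementCode(data[0]);
    std::cout << "coded dimension: " << I.size() << '\n';   // 4 for this 2-attribute example
}
```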

The metrics used to measure the performance of the pipelined approach were:


Fig. 2: Pseudocode for the parallel FS-FAM.

Procedure: Process(k, n, ρ̄, ε, α, p)

 1  Init(p);
 2  while continue do
 3      W' = {};
 4      while |myTemplates| > myShare do
 5          ExtractTemplate(myTemplates, W');
 6      SendNext(k, n, W);
 7      RecvNext(k, n, W_{k+1}, newNodes_{k+1});
 8      SendPrev(k, W', newNodes);
 9      RecvPrev(k, n, p, ρ̄, α, W_{k-1});
10      newNodes = newNodes_{k+1};
11      foreach (I, w, T, ρ) ∈ W do
12          FindWinner(I, w, T, ρ, α, W_{k+1});
13      myTemplates = myTemplates ∪ W_{k+1};
14
15      if |W_{k-1}| == 0 then
16          continue = FALSE;
17      else
18          foreach (I, w, T, ρ) ∈ W_{k-1} do
19              FindWinner(I, w, T, ρ, α, W');
20              W = W ∪ {(I, w, T, ρ)};
21      foreach (I, w, T, ρ) ∈ W do
22          FindWinner(I, w, T, ρ, α, myTemplates);
23          if k == n-1 then
24              if w == uncommitted then
25                  newTemplate = I;
26                  index(newTemplate) = newNodes + nodes;
27                  myTemplates = myTemplates ∪ {newTemplate};
28                  W = W - {(I, w, T, ρ)};
29                  newNodes = newNodes + 1;
30              else
31                  if class(I) == class(w) then
32                      w = I ∧ w;
33                      myTemplates = myTemplates ∪ {w};
34                      W = W - {(I, w, T, ρ)};
35                  else
36                      ρ = ρ(I, w) + ε;
37                      myTemplates = myTemplates ∪ {w};
38                      w = uncommitted;
39
40      if newNodes > 0 then
41          nodes = nodes + newNodes;
42          myShare = ⌈nodes / n⌉;
43
44  SendNext(k, n, W);
45  RecvNext(k, n, W_{k+1}, newNodes_{k+1});
46  myTemplates = myTemplates ∪ W_{k+1};

1) Classification performance of the pipelined no match-tracking FS-FAM and the pipelined FS-FAM.

2) Size of the trained, pipelined, no match-tracking FS-FAM and pipelined FS-FAM.

3) Speedup of the pipelined no match-tracking FS-FAM and FS-FAM compared to their sequential counterparts.

To calculate the speedup, we simply measured the CPU time for each run.
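Stated explicitly (this is simply a restatement of how the reported figures are obtained, not an additional measurement), the speedup for a pipeline of p processors is the ratio

speedup(p) = T_1 / T_p

where T_1 is the measured CPU time of the sequential run (p = 1) and T_p is the measured CPU time with p processors in the pipeline.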

V. RESULTS

The Forest CoverType results are depicted in Tables I and II (parallel no match-tracking FS-FAM and parallel FS-FAM average classification performance and average number of templates created), and in Figures 3 and 4 (speedup of the parallel FS-FAM versions compared to their sequential counterparts). As seen in [1], the no match-tracking version of the algorithm can yield increased classification performance at the expense of creating more templates in the system. The speedup curves for the parallel no match-tracking FS-FAM and, to a lesser degree, the ones for the FS-FAM exhibit a linear behavior with respect to the number of processors in the pipeline. The speedup curves level off, though, after we reach a certain number of processors in the pipeline for lower numbers of training patterns in our training collection; this is due to the fact that for smaller numbers of training patterns the number of templates created is not large enough to justify the usage of processors beyond a certain number. It is also worth emphasizing that the speedups achieved by the no match-tracking parallel FS-FAM are more impressive than the corresponding speedups attained by the parallel FS-FAM. This result is also expected, since the number of templates that the no match-tracking parallel FS-FAM creates is significantly larger than the corresponding number of templates that the parallel FS-FAM creates (at times 10 times as many). In essence, the benefits of the proposed parallelization strategy are more pronounced for larger datasets and/or larger ART architectures.

The speedup curve for the FS-FAM has a spike toward the beginning of the curve. The spike occurs because the scaling goes up relatively well for p = 2 and then drops down to a more moderate scaling trend for p > 4. This is most likely related to the cluster design itself. As mentioned earlier, the cluster consists of dual-processor nodes, with each processor running a copy of the program. In the case with p = 2, this means both copies of the program were communicating on the same motherboard, which is much faster than the fast Ethernet network. This suggests that the parallel FS-FAM architecture's performance will improve by utilizing faster networking technologies. For 32,000 inputs, the graph is superlinear for the 2-processor case. It is very likely that this is because of caching. Given the dimensionality of the templates (108 4-byte floats) and the average number of templates generated in this case (1263.09), the templates split between two processors occupy about 256 KB per processor. This means that in this case, the templates fit much better into the cache than they did in the single-processor case.
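For the record, the cache estimate quoted above works out as follows (our arithmetic, using the figures given in the text):

108 components × 4 bytes ≈ 432 bytes per template, and (1263.09 templates × 432 bytes) / 2 processors ≈ 2.7 × 10^5 bytes,

i.e. roughly the 256 KB per processor cited above, compared with about twice that amount when a single processor must hold all of the templates.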


TABLE I
PARALLEL NO MATCH-TRACKING FS-FAM AVERAGE CLASSIFICATION PERFORMANCE AND AVERAGE NUMBER OF TEMPLATES CREATED WITH THE FOREST COVERTYPE DATA

Examples (1000s) | Avg. Classification (%) | Avg. Templates
32               | 70.29                   | 5,148.83
64               | 74.62                   | 11,096.66
128              | 75.05                   | 22,831
256              | 77.28                   | 49,359.33
512              | 79.28                   | 100,720.75

TABLE II
PARALLEL FS-FAM AVERAGE CLASSIFICATION PERFORMANCE AND AVERAGE NUMBER OF TEMPLATES CREATED WITH THE FOREST COVERTYPE DATA

Examples (1000s) | Avg. Classification (%) | Avg. Templates
32               | 69.84                   | 1,263.09
64               | 73.26                   | 2,147.36
128              | 73.30                   | 3,346.64
256              | 73.93                   | 5,178.27
512              | 75.13                   | 8,013.82

Fig. 3. Speedup of the parallel NMT-FS-FAM algorithm for differing numbers of processors and input patterns.

VI. SUMMARY/CONCLUSIONS

We have produced a pipelined implementation of the no match-tracking FS-FAM and FS-FAM algorithms. This implementation can be extended to other ART architectures that have a competitive structure similar to FS-FAM's. It can also be extended to other neural network architectures that are designated as "competitive" neural networks, such as PNNs and RBFs, as well as other "competitive" classifiers. We have introduced and proven a number of theorems (see [5]) that show that the pipelined implementations of FS-FAM are efficient and correct. These theorems were omitted due to lack of space, but the interested reader can consult [5] for more details. We believe that our objective of appropriately implementing FS-FAM on the Beowulf cluster has been accomplished, as evidenced by Figures 3 and 4.

ACKNOWLEDGMENT

Jose Castro would like to thank the Computer Research Center of the Technological Institute of Costa Rica, the Institute for Simulation and Training (IST) and the Link Foundation Fellowship program for partially funding this project. This work was also supported in part by the National Science Foundation under grants CRCD 0203446 and CCLI 0341601.

REFERENCES

[1] G. C. Anagnostopoulos, "Putting the utility of match tracking in fuzzy ARTMAP to the test," in Proceedings of the Seventh International Conference on Knowledge-Based Intelligent Information Engineering Systems (KES'03), vol. 2, University of Oxford, UK, 2003, pp. 1-6.

[2] J. A. Blackard, "Comparison of neural networks and discriminant analysis in predicting forest cover types," Ph.D. dissertation, Department of Forest Sciences, Colorado State University, 1999.

[3] G. A. Carpenter, S. Grossberg, N. Markuzon, J. H. Reynolds, and D. B. Rosen, "Fuzzy ARTMAP: A neural network architecture for incremental learning of analog multidimensional maps," IEEE Transactions on Neural Networks, vol. 3, no. 5, pp. 698-713, September 1992.

[4] G. A. Carpenter, S. Grossberg, and J. H. Reynolds, "Fuzzy ART: An adaptive resonance algorithm for rapid, stable classification of analog patterns," in International Joint Conference on Neural Networks (IJCNN'91), vol. II, Seattle, Washington, 1991, pp. 411-416.

[5] J. Castro, J. Secretan, M. Georgiopoulos, R. DeMara, G. Anagnostopoulos, and A. Gonzalez, "Pipelining of fuzzy ARTMAP without matchtracking: Correctness, performance bound, and Beowulf evaluation," Neural Networks (under review).

[6] T. Kasuba, "Simplified Fuzzy ARTMAP," AI Expert, pp. 18-25, November 1993.

[7] A. Malkani and C. A. Vassiliadis, "Parallel implementation of the Fuzzy ARTMAP neural network paradigm on a hypercube," Expert Systems, vol. 12, no. 1, pp. 39-53, 1995.

[8] E. S. Manolakos, "Parallel implementation of ART1 neural networks on processor ring architectures," in Parallel Architectures for Neural Networks: Paradigms and Implementations, IEEE Computer Society Press and John Wiley & Sons, 1998.

[9] M. Taghi, V. Baghmisheh, and N. Pavesic, "A fast simplified Fuzzy ARTMAP network," Neural Processing Letters, vol. 17, pp. 273-316, 2003.

Fig. 4. Speedup of the parallel version of the FS-FAM algorithm for differingnumbers of processors and input patterns.
