Bringing Algorithms to Life: Cooperative Computing Activities Using Students as Processors

Gregory F. Bachelis
Department of Mathematics
Wayne State University

Bruce R. Maxim
Dept. of Computer & Information Science
The University of Michigan-Dearborn

David A. James
Dept. of Mathematics & Statistics
The University of Michigan-Dearborn

Quentin F. Stout
Electrical Engineering & Computer Science
The University of Michigan

The authors describe a method for "bringing algorithms to life" in secondary school mathematics and computer science classes. Cooperative computing activities are presented in which each student plays the role of a switch or processor, and the algorithms are "acted out." Parallel algorithms, in which many steps can occur at the same time, are emphasized, as they are especially suited for cooperative activities. The activities are grouped together in modules, according to the problems they address. Those presented include binary counting, finding the smallest card in a deck, sorting by selection and merging, adding and multiplying large numbers, sieving for primes, testing whether a number is prime, and sorting using a linear configuration of processors. Detailed descriptions of how to implement the algorithms are given.

The design and analysis of algorithms is important to the understanding of basic mathematical concepts and is also at the heart of computer methods for problem solving. Standard 12 of the NCTM Standards (1989) says in part that the "grades 9-12 curriculum should include topics from discrete mathematics so that all students can ... develop and analyze algorithms ... and college bound students can ... investigate problem situations arising in connection with computer validation and the application of algorithms."

This article describes some activities which enable a secondary school mathematics or computer science class to "bring to life" algorithms by having the students themselves act out the various parts. The activities are grouped together in modules, according to the problem they are intended to address.

The modules are utilized as follows: At the beginning of a class, a goal is discussed, such as sorting a list of numbers. A serial algorithm, in which things occur one step at a time, is put forward. Then the class is challenged to come up with ideas to modify the algorithm so as to speed up the achievement of the goal. Pipelining, in which things are done in assembly-line fashion, or, more generally, parallelism, in which many steps can occur at the same time, is introduced as a means of accomplishing this.

By the end of the class, the students have been led to one or more efficient parallel algorithms. At each stage, the students "act out" the algorithm at hand, with each student playing the part of an individual switch or processor.

The authors believe that students can best understand an algorithm if they can see the process tangibly at work, and these activities have been devised as a means of accomplishing this. Parallel algorithms are emphasized, since more of the students are thereby involved in the activities. Thus, the students cooperatively plan (when this is feasible) and act out the algorithms.

The word "cooperative" is being used in two senses here: In the planning stage, the students are cooperating as individuals, each with his or her own ideas and suggestions. When it comes time to act out the algorithm, however, they usually play the role of homogeneous processors, acting in concert as they step through the algorithm guided by a global time clock (by a metronome, if you will). Students learn that successful execution of their plans requires correct and timely action by all of them, and they see that many individual units, each performing simple tasks, can together achieve a complex goal.

For an algorithm, the measure of efficiency is the count of the number of time steps which are needed to conclude it for a given-sized input. During a time step, each student/processor performs, at most, one task, such as comparing two numbers. As a general rule, only those time steps during which processors are performing some operation are counted, while the time spent for the input or output of data and for processors to communicate with one another is ignored. When the algorithms are acted out, the time steps should be enunciated clearly, in order to maintain synchronization. Students not employed as "processors" can be used as monitors, to see that the student/processors follow their instructions properly.

School Science and Mathematics

Most of the activities in these modules have been tested in various high school math and computer science classes. Much of the material is suitable for middle schools, and, in fact, the material in Module One has been tested at the fifth-grade level. The modules are given in increasing order of sophistication; however, they can be used in any order, together or separately, depending on what the individual instructor thinks would be suitable for his or her class.

Some of the activities discussed here have been treated earlier (Bachelis, James, Maxim and Stout, 1992). The algorithms presented here can be introduced at appropriate places in college computer science courses (Bachelis et al., 1992); some are also suitable for college discrete math courses. The authors make no claim of originality for the algorithms presented and apologize for not always providing references as to who thought of them first; many of them are part of the parallel computing folklore.

Module One: The Base 2 Counting Machine

This module introduces students to the binary representation of numbers in a computer and serves as a good "warm-up" for classroom activities. Five or six students should be seated adjacent to one another in one row. Each person’s left arm should be resting on the right shoulder of the person to the left. The students are each to play the part of an on-off switch: pointing their left arm straight up indicates "on" and pointing it straight to the left is "off." When they are tapped on the right shoulder, they are to "change their state," that is, to move their left arm down if it is up, and conversely. Everyone should practice changing states a few times at the instructor’s commands of "on" and "off."

If one begins with all switches in the off position and the instructor starts tapping the right shoulder of the person seated at the right end of the row, the students will "step through" the binary representation of the numbers 0, 1, 2, ..., n, where n is the number of taps, and where "off" means zero and "on" means one. To have each number represented, one must wait after each tap until all the student/processors have changed their state (if called for). See Figure 1 for a pictorial representation.

Figure 1. Pictorial representation of the base 2 counting machine.

If, however, one is only interested in the representation of the final value n, then the tapping can be done systolically, in "pipeline" fashion; that is, the instructor can tap once each time step until n taps have been made. (The term "systolic" is used to describe processes such as these because they are analogous to blood pumping through the heart.) This method, while faster, requires much more concentration on the part of the students, since they can be moving their left arm and be tapped on the right shoulder during the same time step. The students should be reminded that they are to change state only when somebody presses down on their shoulder, not when somebody lifts off.

The point should be made that by using only very simple on-off switches and connecting them in the right way, a "machine" has been constructed that can do something no single on-off switch can do: it can count, that is, it can keep track of how many taps the teacher has made.
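The behavior of the row of switches can be mimicked in a few lines of Python. This is only an illustrative sketch: the names `tap`, `count` and `value` are hypothetical, index 0 is assumed to be the student at the right end of the row, and a switch that turns off is assumed to press the shoulder of the neighbor on its left.

```python
def tap(switches, i=0):
    """Tap switch i; switches[i] is True for "on" (arm up)."""
    while i < len(switches):
        switches[i] = not switches[i]   # the tapped student changes state
        if switches[i]:
            return                      # arm went up: nobody else is tapped
        i += 1                          # arm came down on the left neighbor


def count(n_taps, n_switches=6):
    """Start with all switches off and tap the right end n_taps times."""
    switches = [False] * n_switches
    for _ in range(n_taps):
        tap(switches)
    return switches


def value(switches):
    """Read the row as a binary number (index 0 is the 1s place)."""
    return sum(1 << i for i, on in enumerate(switches) if on)


if __name__ == "__main__":
    for n in range(6):
        row = count(n, n_switches=3)
        # Print leftmost student first, as the class would see the row.
        print(n, "".join("1" if on else "0" for on in reversed(row)))
```

With only three switches the machine wraps around after seven taps, which is one way to motivate the overflow processor used in the adder extension.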

Extensions: Binary Adders

The original front row "machine" is left in place, and directly behind it an identical second row is placed. Actually, there should be an extra processor at the left end of the front row for overflow.

Volume 94(4), April, 1994

The two numbers to be added are "tapped in," one into each machine as above. The numbers can now be added as follows: each person in the second row whose left hand is up puts this hand on the right shoulder of the person sitting directly in front. The action now proceeds as in the systolic case above. When all activity stops in the first row, it will contain the sum of the two numbers. Thus, it is not only easy to count using on-off switches, but also to add two numbers.
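The adder extension can be sketched in the same spirit. In the Python below, the function name `add_with_switches` is hypothetical, both inputs are assumed to fit in `n_switches` bits, and the taps from the second row are applied one after another rather than simultaneously; the final state of the front row is the same either way.

```python
def add_with_switches(a, b, n_switches=6):
    """Add b to a by tapping the front row once for every 1 bit of b."""
    # The front row holds a; one extra switch at the left end catches overflow.
    front = [bool((a >> i) & 1) for i in range(n_switches + 1)]

    def tap(i):
        # Toggle switch i; a switch that turns off taps its left neighbor.
        while i < len(front):
            front[i] = not front[i]
            if front[i]:
                return
            i += 1

    for i in range(n_switches):
        if (b >> i) & 1:        # second-row student i has a hand up
            tap(i)
    return sum(1 << i for i, on in enumerate(front) if on)


if __name__ == "__main__":
    print(add_with_switches(13, 11))
```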

Module Two: Finding The Smallest Card In A Deck

The objective is to find the smallest card in a deck of, say, sixteen cards with distinct numbers. The processors involved can be called "two-card comparators." They input two cards and output the smaller.

First, the serial case using one processor is discussed. The students should be able to come up with the fact that it will take 15 successive comparisons to find the smallest card. Then the students should be asked to consider how to speed up the process using several processors and "divide and conquer." Things are made successively faster by using two, four and then eight processors.

Two student/processors are employed as follows: Each student is given eight of the cards. Each finds the minimum among the eight; then the two minima are compared and the global minimum found. This procedure takes 7 + 1 = 8 time steps.

To use four processors, the student/processors are numbered from 1 to 4 and each is given four of the cards. Each finds the minimum card among the four and retains only that card. Then student #1 gives the minimum card to #2 and student #3 gives the minimum card to #4. Students #2 and #4 then each find the smaller of their two cards, and then #2 gives the minimum card to #4, who finds the global minimum. This procedure takes 3 + 1 + 1 = 5 time steps. It should be clear how to extend this to eight processors, where each one starts with two cards. With this many processors, it takes 1 + 1 + 1 + 1 = 4 time steps.

The tree-like procedure outlined above can be modified to handle other semigroup operations, such as addition of n numbers. In the case of addition, each student/processor will need to have a piece of paper on which to write the result of the addition(s).
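The step counts quoted for one, two, four and eight processors can be checked with a small simulation. The sketch below is an illustration, not the authors' procedure verbatim: `parallel_min` is a hypothetical name, the number of processors is assumed to be a power of two that divides the deck size, and one time step is charged for each round in which comparisons happen.

```python
def parallel_min(cards, n_processors):
    """Return (smallest card, time steps) for the divide-and-conquer scheme."""
    size = len(cards) // n_processors
    hands = [cards[k * size:(k + 1) * size] for k in range(n_processors)]

    # Each processor finds its own minimum serially: size - 1 comparisons,
    # all processors working at the same time.
    steps = size - 1
    winners = [min(hand) for hand in hands]

    # Winners are paired off; each round of pairwise comparisons is one step.
    while len(winners) > 1:
        winners = [min(winners[k:k + 2]) for k in range(0, len(winners), 2)]
        steps += 1
    return winners[0], steps


if __name__ == "__main__":
    deck = [12, 3, 15, 7, 1, 10, 14, 5, 9, 16, 2, 8, 13, 4, 11, 6]
    for p in (1, 2, 4, 8):
        print(p, parallel_min(deck, p))
```

Replacing `min` with a sum (and counting additions) gives the corresponding tree-like addition of n numbers.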

Finding The Minimum In Constant Time

It is surprising but true that one can find the minimum of n numbers in constant time, although one needs to employ n² processors. The instructions, a diagram and a sample run for n = 4 are given in Figure 2. One proceeds as follows: arrange n² students in n rows of n each. The students on the main diagonal are each given a number, which they are to display. These people are called "showers" (show-ers); the off-diagonal students are the "tellers." On the first step each shower holds the number so that it can be seen by everyone in the shower’s row, and each teller in this row notes the number. On the second step the same procedure is repeated in each column. Those tellers for which the first number (the number in their row) is greater than the second (the number in their column), raise their hand. The showers then look up and down their row.

Figure 2. Instructions for finding the minimum in constant time. Diagram and sample run for N = 4 numbers.

Instructions for showers (show-ers):
Step 1. Hold the two copies of your number so that one faces right and one faces left.
Step 2. Hold the two copies so that one faces front and one faces rear.
Step 3. Now look along your row. If no hands are raised, then raise your hand with your number held in it.

Instructions for tellers:
Step 1. Write down the number held by the shower in your row.
Step 2. Write down the number held by the shower in your column; if it is smaller than the number in your row then raise your hand.

Configuration of students for N = 4 (Show = shower, Tell = teller):

Show  Tell  Tell  Tell
Tell  Show  Tell  Tell
Tell  Tell  Show  Tell
Tell  Tell  Tell  Show

Sample run for N = 4, numbers = 5, 1, 9, 8 (U = hand up, D = hand down):

5  U  D  D
D  1  D  D
U  U  9  U
U  U  D  8

The one whose row has no hands raised is holding the smallest number. (Recall the assumption that the numbers are distinct.) Thus, after just one comparison step the minimum has been found, independent of the value of n.

This algorithm is due to Valiant (1975). It dramatically illustrates an important feature of parallel algorithms. Even though each processor may know very little about the global picture, acting in concert they can solve the problem at hand.
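The shower/teller procedure can be written down directly as a grid of comparisons. The following Python is a sketch under stated assumptions: `constant_time_min` is a hypothetical name, the numbers are assumed distinct, and the n² comparisons that the students make simultaneously are simply evaluated one after another here.

```python
def constant_time_min(numbers):
    """Simulate n*n students: diagonal showers, off-diagonal tellers."""
    n = len(numbers)
    # hand_up[i][j] is True when the teller in row i, column j sees that the
    # number shown in the row is greater than the number shown in the column.
    hand_up = [
        [i != j and numbers[i] > numbers[j] for j in range(n)]
        for i in range(n)
    ]
    # The shower whose row has no raised hands holds the minimum.
    quiet_rows = [i for i in range(n) if not any(hand_up[i])]
    return numbers[quiet_rows[0]], hand_up


if __name__ == "__main__":
    smallest, grid = constant_time_min([5, 1, 9, 8])
    print(smallest)
    for row in grid:
        print(" ".join("U" if up else "D" for up in row))
```

In the printed grid the diagonal entries appear as D, since the showers themselves make no comparison.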

It is possible to iterate this procedure to handle as many numbers as there are processors. For example, to find the minimum of sixteen numbers using sixteen processors, one first has (serially) four contests involving four numbers each, and then a run-off among the four winners to find the global minimum.

Module Three: Sorting By Selection And Merging

As above, there are sixteen distinct cards. The object is to sort them into increasing order. Again, the processor needed is a two-card comparator (with suitably-sized memory). First, selection sort using one processor should be discussed, in which the smallest card is found first, then the smallest among the remaining cards, etc. The students should be encouraged to come up with the fact that it takes 15 steps to find the smallest card, 14 to find the next smallest, etc., for a grand total of 15 × 16/2 = 120 steps to sort the deck.

As with finding the minimum, one can employ two processors and "divide and conquer." Each student/processor gets eight cards to sort, face up, into increasing order. The two decks are then merged as follows: Each student holds up his or her smallest card, and the smaller of the two is placed face down to start the sorted deck. This process is repeated, with each successive "winner" placed face down on the pile, until one of the students runs out of cards. The other student then places the remaining cards, face down, on top of the pile. The deck, when turned over, has now been put into increasing order. The sorting takes 8 × 7/2 = 28 time steps, and the merge takes, at most, 15 comparison steps for a grand total of, at most, 43 steps.

As before, one can speed things up more using four and then eight processors. For four processors, there are 6 steps for the initial sorting, at most, 7 steps for the first round of merging, and, at most, 15 steps for the final merge, for a total of at most 28 steps. Finally, when eight processors are used, there are, at most, 1 + 3 + 7 + 15 = 26 time steps required. At this point, the "selection" part has disappeared from the procedure and what remains is all "merge."
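The bounds of 120, 43, 28 and 26 steps can be explored with a short simulation. This Python sketch makes several assumptions that are not in the article: the helper names are hypothetical, the number of processors is a power of two dividing the deck size, and each parallel round is charged the largest comparison count among the processors active in it.

```python
def selection_sort(cards):
    """Serial selection sort; returns (sorted cards, comparisons used)."""
    cards, out, steps = list(cards), [], 0
    while cards:
        smallest = cards[0]
        for card in cards[1:]:
            steps += 1                      # one two-card comparison
            if card < smallest:
                smallest = card
        cards.remove(smallest)
        out.append(smallest)
    return out, steps


def merge(a, b):
    """Merge two sorted piles; returns (merged pile, comparisons used)."""
    out, steps, i, j = [], 0, 0, 0
    while i < len(a) and j < len(b):
        steps += 1
        if a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:], steps       # leftovers need no comparison


def parallel_sort(cards, n_processors):
    """Sort by local selection sort followed by rounds of pairwise merges."""
    size = len(cards) // n_processors
    sorted_hands = [selection_sort(cards[k * size:(k + 1) * size])
                    for k in range(n_processors)]
    piles = [pile for pile, _ in sorted_hands]
    total = max(steps for _, steps in sorted_hands)
    while len(piles) > 1:
        merged = [merge(piles[k], piles[k + 1])
                  for k in range(0, len(piles), 2)]
        piles = [pile for pile, _ in merged]
        total += max(steps for _, steps in merged)
    return piles[0], total


if __name__ == "__main__":
    deck = [12, 3, 15, 7, 1, 10, 14, 5, 9, 16, 2, 8, 13, 4, 11, 6]
    for p in (1, 2, 4, 8):
        print(p, parallel_sort(deck, p)[1])
```

For a particular deck the merge rounds may finish early, so the simulated totals for two or more processors are at most the bounds quoted above rather than always equal to them.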

The above algorithms provide a good illustration of the power of "divide and conquer," although they are hampered somewhat by the serial bottleneck at the final merge step, in which only two processors are employed. The sorting algorithms in Module Six are of a different type and are better in terms of "load-balancing," that is, making efficient use of all the processors and minimizing their idle time.

Selection Sort On A Tree

Selection sort can also be done on a tree, and this can easily be parallelized. In the case of sixteen cards, one needs a complete binary tree with fifteen student/nodes. Each level of the tree sits in a different row: The root in row one, his or her children in row two, their children in row three, and their children (the leaves) in row four. There are eight leaves, and each one initially has two cards. A diagram for this and the instructions for each row can be found in Figure 3.

If a bigger tree is desired, then the instructions for rows two and three can be modified appropriately and used for the additional rows. The situation here is analogous to a tournament in which each comparison of two cards corresponds to a single match in the tournament. Assuming transitivity of winning (if A beats B and B beats C, then A would beat C if they played) this process causes the competitors to finish in the "proper" order. It is easy to see that the smallest card starts the output pile in round four, and that thereafter the next highest card is added to the pile every other round. Thus, it takes 4 + 2 × 15 = 34 rounds to sort the sixteen cards. This algorithm becomes rather serial after a while, which is why real tournaments don’t proceed in this manner. See Knuth (1973, p. 209) for a discussion of Lewis Carroll’s connection with this.

Module Four: Adding And Multiplying Large Numbers

In this module, algorithms are presented for cooperatively adding and multiplying two large (whole) numbers together. Suppose one wants to add, say, two 9-digit numbers together. If one analyzes the usual serial algorithm, one sees that it takes between 9 and 17 additions of pairs of digits (9 additions and, at most, 8 carry additions) to complete the addition. If one takes the operation of adding a pair of digits as the primitive computation, the question is how this can be speeded up by performing some of these computations in parallel.

The students sit in a row facing the blackboard, on which the numbers to be added have been written. If 3 student/processors are employed, then each is assigned three columns of digits to add: The first student (from right to left as they face front) is assigned the 1s, 10s and 100s columns, the second student the 1,000s, 10,000s and 100,000s columns, etc. They first add their two 3-digit numbers together, to find out what their answer will be if they don’t receive a carry, and then (except for the first student) add 1 to this result to find out their answer if they do receive a carry. They write these numbers on a piece of paper. To get these two answers takes, at most, 8 computation steps per student.

For example, suppose one wants to find

534,789,213 + 495,378,388.

Figure 3. Diagram and instructions for selection sort on a tree, using 15 processors.

Diagram:

Row 1 (the root):                 X
                              /       \
Row 2:                    X               X
                        /   \           /   \
Row 3:                X       X       X       X
                     / \     / \     / \     / \
Row 4 (the leaves): X   X   X   X   X   X   X   X

Instructions:

Row One:
Rounds One and Two. Bye.
Rounds Three, Five, etc. Face your two children. Receive a card from whomever has one. If both send you a card, return the larger to the one who sent it. If neither sends you a card, then the sort is finished!
Rounds Four, Six, etc. Face the front and place your card on top of the pile in front of you. (The pile starts building in Round Four.)

Row Two:
Round One. Bye.
Rounds Two, Four, etc. If you have no card, face your two children. Receive a card from whomever has one. If both send you a card, return the larger to the one who sent it. If neither sends you a card, do nothing after this round.
Rounds Three, Five, etc. Give your card to your parent. You may get it back.

Row Three:
Rounds One, Three, etc. If you have no card, face your two children. Receive a card from whomever has one. If both send you a card, return the larger to the one who sent it. If neither sends you a card, do nothing after this round.
Rounds Two, Four, etc. Face your parent. If your parent faces you, send him or her your card. You may get it back.

Row Four:
Rounds One, Three, etc. Face your parent. If your parent faces you, give him or her your smaller (or your only) card. You may get it back. If your card isn’t returned and you have none left, then do nothing after this round.
Rounds Two, Four, etc. Bye.

In this example, the first student adds 213 to 388 and writes down 601, the second student adds 789 to 378 and writes down 1167 and 1168, and the third student adds 534 to 495 and writes down 1029 and 1030. Each student, except the first, is told to face to the right when the above computations are finished. The first student now says (in this example) "no carry" to the second student who, in turn, says "carry" to the third student. Each student now writes the appropriate number (leaving off the carry digit, if there is one) on a fresh piece of paper, which is now held up, the number facing back. A "great scorer" standing behind the students will see the sequence of numbers 1030, 167, 601, and can then announce that the answer is 1,030,167,601.
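The chunked addition for 534,789,213 + 495,378,388 can be traced in code. The Python below is a minimal sketch: `add_in_chunks` is a hypothetical name, the inputs are assumed to fit in `n_students` chunks of `chunk_digits` digits, and, for simplicity, every student (including the first) prepares both candidate answers.

```python
def add_in_chunks(a, b, chunk_digits=3, n_students=3):
    """Return the numbers held up for the scorer, leftmost student first."""
    base = 10 ** chunk_digits

    # Parallel phase: student k (k = 0 is the rightmost) adds the two chunks
    # and also prepares the answer to use if a carry arrives.
    prepared = []
    for k in range(n_students):
        x = a // base ** k % base
        y = b // base ** k % base
        prepared.append((x + y, x + y + 1))

    # Serial phase: the carry news travels from right to left.
    held_up, carry = [], False
    for k, (without_carry, with_carry) in enumerate(prepared):
        chosen = with_carry if carry else without_carry
        carry = chosen >= base
        is_leftmost = k == n_students - 1
        # Everyone but the leftmost student leaves off the carry digit.
        held_up.append(chosen if is_leftmost else chosen % base)
    return held_up[::-1]


if __name__ == "__main__":
    print(add_in_chunks(534_789_213, 495_378_388))
```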

It is straightforward to modify the above instructions for larger numbers and more students, or where the students are assigned bigger or smaller "chunks." A sample instruction sheet for each student is given in Figure 4. Slightly different instructions will have to be given to the person at each end.

One can have several rows of students computing the answer at the same time, as a contest, and one can also have a serial "control" adding the two numbers, so that it can be seen to what extent the parallelization is speeding things up.

Figure 4. Instructions for adding two numbers, with sequential propagation of carries.

1. Add the two __ digit numbers from your assigned columns.
2. Now add 1 to your number and write it below the first one. (This is your answer in the event you receive a carry.)
3. Now face to the right.
4. When told by the person on your right whether or not you are receiving a carry, choose your answer and then tell the person on your left whether or not he or she is receiving a carry.
5. Write your answer in big numbers on a fresh piece of paper. (Remember to leave off the leading 1 if you sent a carry.)
6. Now hold the paper up, with the number facing back.

In the parallel case, it is possible to decide more quickly the question of who is getting a carry and who isn’t by exploiting the fact that, in most instances, whether a person is going to send a carry is independent of whether one is received, that is, the two numbers computed are not of the form 999, 1000. By dividing the processors into two classes, it is possible to arrange things so that a processor sends a carry within two steps of knowing that this is necessary. One proceeds as follows: The student/processors are labeled alternately 0 and 1, going from right to left. Each student/processor is assigned the same number of columns (or as close to that as possible). The instructions given in Figure 5 are handed out (with those for the processors at each end appropriately modified), and then the instructor starts counting time steps: 1, 2, 3, etc.

If, say, everyone concludes steps one and two of the instructions on the same even time step, and if everyone knows right away whether or not they are sending a carry, then it will take exactly two additional time steps to complete the addition. In the worst case, the news about the carry will proceed sequentially from right to left, as with the earlier instruction set.

It is possible to speed things up even more (in the case where one doesn’t know right away whether to send a carry) using a binary tree structure as described in earlier modules; however, a simple and practical implementation of this appears to be quite difficult when people are playing the role of processors.

Multiplication Of Large Numbers

A brief indication is now given as to how multiplication can be done in parallel. If one wants to multiply a 4-digit by a 3-digit number, then six student/processors are required. As above, the students sit in a row. The first student is assigned the 1s column of the product, the second student the 10s column, etc. Each student then adds all the products which contribute to his or her column. When done with this, each student (except the first) faces to the right, as above. Then, going from right to left, each student gives his or her carry to the person on the left, who adds it to his or her number and then gives the carry to the person on the left, etc. It is not possible to precompute things, as in the case of addition, since there are many possible carries; hence, there is a serial bottleneck at the end. As an example, suppose one wants to compute:

4352 × 239.

Figure 5. Instructions for addition of two numbers, with communication of carries within two steps of when they are known.

Even Numbered Processors:
1. Add the two __ digit numbers from your assigned columns.
2. Now add 1 to your number and write it below the first one. (This is your answer in the event you receive a carry.)
3. On each successive odd time step:
   i) If you know whether or not you are going to send a carry, look to your left.
   ii) If the person on your left looks toward you, then say "carry" or "no carry" as the case may be.
4. On each successive even time step:
   Look to your right. If the person on your right looks toward you, then choose your answer, according to whether he or she says "carry" or "no carry."
After you have both communicated with your left and heard from your right, then:
5. Write your answer in big numbers on a fresh piece of paper. (Remember to leave off the leading 1 if you sent a carry.)
6. Now hold the paper up, with the number facing back.

Odd Numbered Processors:
1. Add the two __ digit numbers from your assigned columns.
2. Now add 1 to your number and write it below the first one. (This is your answer in the event you receive a carry.)
3. On each successive odd time step:
   Look to your right. If the person on your right looks toward you, then choose your answer, according to whether he or she says "carry" or "no carry."
4. On each successive even time step:
   i) If you know whether or not you are going to send a carry, look to your left.
   ii) If the person on your left looks toward you, then say "carry" or "no carry" as the case may be.
After you have both communicated with your left and heard from your right, then:
5. Write your answer in big numbers on a fresh piece of paper. (Remember to leave off the leading 1 if you sent a carry.)
6. Now hold the paper up, with the number facing back.

The first student writes down 18, the second 45 + 6 = 51, the third 27 + 15 + 4 = 46, the fourth 36 + 9 + 10 = 55, the fifth 12 + 6 = 18, and the sixth 8. Then the first gives the second a carry of 1, who adds it to 51 to get 52, and who then gives the third student a carry of 5, etc. When this process is finished, each student holds up his or her digit (or number in the case of the left-most student) facing back, for the "great scorer" to see. In this case, the scorer sees the numbers 10, 4, 0, 1, 2, 8, that is, the product 1,040,128.
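The column sums and the right-to-left passing of carries for 4352 × 239 can be reproduced as follows. This Python is a sketch only; `column_multiply` is a hypothetical name, and the column sums that the students compute in parallel are evaluated in a loop here.

```python
def column_multiply(a, b):
    """Return (column sums, numbers held up leftmost first) for a * b."""
    da = [int(d) for d in str(a)][::-1]     # digits of a, 1s place first
    db = [int(d) for d in str(b)][::-1]
    n_columns = len(da) + len(db) - 1

    # Parallel phase: student k adds every digit product landing in column k.
    columns = [
        sum(da[i] * db[k - i] for i in range(len(da)) if 0 <= k - i < len(db))
        for k in range(n_columns)
    ]

    # Serial phase: each student adds the incoming carry, keeps one digit and
    # passes the rest to the left; the leftmost student keeps everything.
    held_up, carry = [], 0
    for k, total in enumerate(columns):
        total += carry
        if k < n_columns - 1:
            held_up.append(total % 10)
            carry = total // 10
        else:
            held_up.append(total)
    return columns, held_up[::-1]


if __name__ == "__main__":
    print(column_multiply(4352, 239))
```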

Module Five: Sieving For Primes And Testing Whether A Number Is Prime

The Sieve of Eratosthenes is a method for finding all primes less than a given number. If one wants to find all primes less than, say, 100, then one first writes down all the numbers from 2 to 99. Then all multiples of 2 (except 2) are crossed out. The first number not crossed out is 3, so it must be prime. Then all subsequent multiples of 3 not previously crossed out are crossed out. This procedure is repeated for 5 and 7. Since 7 is the largest prime not exceeding 10, the square root of 100, the numbers that have not been crossed out are the primes less than 100.

The question arises as to how best to parallelize this procedure. Say one wants to find all the odd primes less than 1,000. First one does a preliminary computation to find all odd primes not exceeding √1000: 3, 5, 7, 11, 13, 17, 19, 23, 29, 31. These are referred to as the "known" primes.

One method is for the instructor to write the odd numbers from 33 to 999 on the board, assign each of the known primes to a different student/processor, and have them cross out all multiples of their prime. After this has been done the numbers remaining will be the desired primes. This method presents logistical problems, and more importantly, it is not optimal with respect to "load-balancing." The students assigned smaller primes have a lot more work to do than the ones assigned larger ones.

A better way is to assign each processor a zone of numbers, and to have them cross out all multiples of each known prime in their zone. Thus, if there are ten students, one can give each of them the instructions shown in Figure 6 and then assign each of them a century, giving the first student the number 0, the second the number 1, etc. The design of Figure 6 is based on one by Holtfreter (1988).

Figure 6. Instruction set for each century, using zonal method to find all primes < 1,000.

Odd Primes Less Than √1000: 3, 5, 7, 11, 13, 17, 19, 23, 29, 31

Century: (_)00 to (_)99

Using the number you were given (one of {0 ... 9}), fill in your century and all odd numbers in that century.

_01  _11  _21  _31  _41  _51  _61  _71  _81  _91
_03  _13  _23  _33  _43  _53  _63  _73  _83  _93
_05  _15  _25  _35  _45  _55  _65  _75  _85  _95
_07  _17  _27  _37  _47  _57  _67  _77  _87  _97
_09  _19  _29  _39  _49  _59  _69  _79  _89  _99

For each of the primes p on the list at the top of the page:

1. Find the first multiple of p in your century by:
   a) dividing p into the first odd number in your century (e.g., divide 3 into 101).
   b) rounding up the quotient to the next whole number and multiplying this number by p. Call the result q.
2. If q is even, then replace q by q + p (q := q + p). Now q is the first odd multiple of p in your century.
3. Cross out q (unless it is p itself).
4. Continue by adding 2p to q and crossing out the resulting number until you have crossed out all odd multiples of p in your century.

Once you have completed this for all primes p at the top of the page, raise your hand. The numbers remaining are all the odd primes in your century.

It may be desirable to write out slightly different instructions for century 0, although the ones given will work in that case. One can write the instructions for each century in the form of a computer program, using an actual language such as Pascal. This algorithm is a good example of the SCMD (Single Code Multiple Data) paradigm in parallel computing.
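As one possible rendering of the Figure 6 instructions in program form, here is a minimal sketch. It uses Python rather than Pascal, and the function name `odd_primes_in_century` is hypothetical. The explicit removal of 1 in century 0 is an assumption of this sketch about what the "slightly different instructions" might cover; the figure does not spell it out.

```python
KNOWN_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]


def odd_primes_in_century(century):
    """Follow the Figure 6 steps for one century (0 through 9)."""
    first_odd = 100 * century + 1
    last = 100 * century + 99
    remaining = set(range(first_odd, last + 1, 2))
    for p in KNOWN_PRIMES:
        q = -(-first_odd // p) * p       # step 1: round the quotient up, multiply by p
        if q % 2 == 0:                   # step 2: move on to the first odd multiple
            q += p
        while q <= last:
            if q != p:                   # step 3: never cross out p itself
                remaining.discard(q)
            q += 2 * p                   # step 4: next odd multiple of p
    remaining.discard(1)                 # assumption: in century 0, 1 is not a prime
    return sorted(remaining)


# Every "student" runs the same code on a different century.
for century in range(10):
    print(century, odd_primes_in_century(century))
```

The same function is called once per century, which is the single-code, multiple-data pattern the text describes.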

The students should use calculators. After they have found all odd primes in their century, they should compute how many there are and compare this number with theoretical estimates.

Note that a student assigned a given century only needs to consider the primes not exceeding the square root of the largest element in his or her century. Even so, this method is much better "load-balanced" than the procedure described first.

Once the students have found all the primes less than 1,000, it is an easy matter for them collectively to test any (odd) number less than one million to see if it is prime, since if such a number isn’t prime, it must have a prime factor less than 1,000. To accomplish this, each student successively divides the primes in his or her century into the number being tested. The number is composite if one such division "succeeds." With 20 students, each being assigned a century, one can find all primes less than 2,000 and then test for primality any number less than 4 million, and so forth.
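The collective test can be sketched in a few lines. This is a hedged Python illustration, not the authors' procedure verbatim: the helper names are hypothetical, the primes are regenerated with a simple serial sieve instead of being collected from the students, and the prime 2 is included in century 0 for completeness.

```python
def primes_by_century(limit=1000):
    """Group the primes below limit by century, using a simple serial sieve."""
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for n in range(2, limit):
        if is_prime[n]:
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = False
    groups = {}
    for n in range(2, limit):
        if is_prime[n]:
            groups.setdefault(n // 100, []).append(n)
    return groups


def successful_divisions(number, groups):
    """Return {century: [primes dividing number]} for each student whose division succeeds."""
    hits = {}
    for century, primes in groups.items():
        found = [p for p in primes if p < number and number % p == 0]
        if found:
            hits[century] = found
    return hits


groups = primes_by_century()
print(successful_divisions(988027, groups))   # 988027 = 991 * 997, so century 9 succeeds
print(successful_divisions(999983, groups))   # no division succeeds, so 999983 is prime
```

An empty result means that no student found a divisor, so the number tested (below one million) is prime.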

Module Six: Two Sorting Algorithms Using A Linear Configuration

As in Module Three, the problem considered is that of sorting a deck of sixteen distinctly numbered cards into increasing order. The first algorithm described below is a compare-exchange sort. It could be called "Parallel Bubblesort," since it uses the same compare-exchange primitive as Bubblesort. However, unlike Bubblesort, which is rather slow, this one is efficient, since comparisons are being carried out concurrently.

Sixteen students are seated in one row. Starting at the left end, they are alternately assigned the numbers 0 and 1. Each of the students is given one of the cards. Each round consists of two steps. In step 1, each "0" compares his or her card with the card of the "1" on the right; the "0" takes the smaller one and the "1" the larger one. In step 2, this process is repeated between each "0" and the "1" on his or her left, except that this time the "0" takes the

Volume 94(4), April, 1994

larger and the "1" the smaller. The instructions are given in Figure 7.

Now the instructor starts calling out "Step 1, Step 2, Step 1, Step 2," etc., and the sort proceeds. There are times when a student makes no exchanges during a round. When no exchanges at all are made during a particular round (or, in fact, during a particular step, except for the first one), then the deck is sorted. The students from left to right are holding the cards in increasing order.

It is easy to see that the smallest card will be at the left end and the largest card at the right end after, at most, 8 rounds. In fact, it can be proved that the entire deck will be sorted in, at most, 8 rounds or 16 time steps. This compares favorably with the numbers achieved in Module Three.
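A small simulation can help a class check the 8-round bound before acting the sort out. The sketch below is an assumption-laden Python illustration: the function name is hypothetical, and the pairs within a step are processed one after another, which gives the same result as the concurrent version because the pairs in a step do not overlap.

```python
def parallel_bubblesort(cards):
    """Alternate step 1 and step 2 until a round passes with no exchanges.

    Returns the sorted row and the number of rounds in which at least one
    exchange took place.
    """
    row = list(cards)
    n = len(row)
    rounds_with_exchanges = 0
    while True:
        exchanged = False
        # Step 1 pairs each "0" with the "1" on its right: positions (0,1), (2,3), ...
        # Step 2 pairs each "0" with the "1" on its left:  positions (1,2), (3,4), ...
        for start in (0, 1):
            for left in range(start, n - 1, 2):
                if row[left] > row[left + 1]:
                    row[left], row[left + 1] = row[left + 1], row[left]
                    exchanged = True
        if not exchanged:
            return row, rounds_with_exchanges
        rounds_with_exchanges += 1


deck = list(range(16, 0, -1))            # sixteen cards in decreasing order
print(parallel_bubblesort(deck))
```

Stopping after a round with no exchanges corresponds to the termination rule the instructor uses.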

There are several other ways of implementing this algorithm. One variation is to have one of the students doing the comparison send his or her card to the other and receive the appropriate one back. Another variation is to have the students standing, with the "0"s facing the "1"s in a cross-stitch pattern. Instructions for these variations are also given in Figure 7. For each version of this algorithm, a method will have to be devised for determining when the sorting is finished. In the first and third versions given in Figure 7, this can be achieved by having the teacher or other students watch for two consecutive time steps during which no exchanges take place. For all implementations, the processors at each end will need slight modifications to their instructions.

Figure 7. Several patterns and instruction sets for a linear compare-exchange sort (parallel bubblesort) using 16 processors.

Linear Pattern: 0-1-0-1-0-1-0-1-0-1-0-1-0-1-0-1

Instructions:

Processors 0:
Step 1. Compare your card with the student’s on your right. Take the smaller one.
Step 2. Compare your card with the student’s on your left. Take the larger one.

Processors 1:
Step 1. Compare your card with the student’s on your left. Take the larger one.
Step 2. Compare your card with the student’s on your right. Take the smaller one.

Alternate Instructions:

Processors 0:
Step 1. Receive a card from the student on your right. Compare with your own and send the larger one back.
Step 2. Send your card to the student on your left. Receive one back.

Processors 1:
Step 1. Send your card to the student on your left. Receive one back.
Step 2. Receive a card from the student on your right. Compare with your own and send the larger one back.

Cross-Stitch Pattern:

  1   1   1   1   1   1   1   1
 / \ / \ / \ / \ / \ / \ / \ /
0   0   0   0   0   0   0   0

Instructions:

Processors 0:
Step 1. Compare your card with the student’s across from you and to your right. Take the smaller one.
Step 2. Compare your card with the student’s across from you and to your left. Take the larger one.

Processors 1:
Step 1. Compare your card with the student’s across from you and to your right. Take the larger one.
Step 2. Compare your card with the student’s across from you and to your left. Take the smaller one.

A Minimum Delay Systolic Sort

Now a description is given of a systolic sort with minimum delay. The "delay" is defined to be the number of steps after the data has been input before output begins. The instructions for each student and a sample run can be found in Figure 8. As above, the sixteen students are lined up in a row. The teacher inputs the sixteen cards one by one to the person at the right end of the line, while calling out the time steps. Each student, except the one on the left end, does nothing until receiving two cards from the right. Then, for a certain number of steps they pass their larger card to the left while at the same time receiving one from the right. When they stop receiving a card from the right they will have one card, and, on alternate time steps, they will pass a card to the right and receive one from the left. After they stop receiving a card from the left, they do nothing for the remainder of the time. When the last card has been input, then, after a one-step delay, the cards will be output every other step in increasing order.
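The timing of this sort is easier to follow in a simulation. The sketch below is one possible reading of the two instructions in Figure 8, written in Python; the function name and the bookkeeping flags (`started`, `draining`) are assumptions of this sketch. It assumes one processor per card, so the left-most student never holds two cards.

```python
def systolic_sort(cards):
    """Simulate the row of processors; returns [(time_step, card), ...] in output order."""
    n = len(cards)                       # one processor per card; index n - 1 is the right end
    held = [[] for _ in range(n)]
    started = [False] * n                # has received at least one card from the right
    draining = [False] * n               # has stopped receiving cards from the right
    waiting = list(cards)                # cards the teacher has not yet input
    output = []
    step = 0
    while len(output) < n:
        step += 1
        send_left = [None] * n
        send_right = [None] * n
        # Each processor decides what to pass, based on what it held before this step.
        for i in range(n):
            if len(held[i]) == 2 and i > 0:
                send_left[i] = max(held[i])          # rule 1: pass the larger card left
                held[i].remove(send_left[i])
            elif draining[i] and held[i]:
                send_right[i] = held[i].pop()        # rule 2: pass a card to the right
        # All passes happen at the same time.
        for i in range(n):
            if i == n - 1:
                arriving = waiting.pop(0) if waiting else None
            else:
                arriving = send_left[i + 1]
            if arriving is not None:
                held[i].append(arriving)
                started[i] = True
            elif started[i]:
                draining[i] = True
            if i > 0 and send_right[i - 1] is not None:
                held[i].append(send_right[i - 1])
        if send_right[n - 1] is not None:
            output.append((step, send_right[n - 1]))
    return output


sample = [8, 11, 13, 1, 12, 10, 5, 6, 18, 3, 14, 16, 22, 2, 7, 9]
for step, card in systolic_sort(sample):
    print(f"step {step}: output {card}")
```

With these sixteen cards the sketch outputs the first card at step 18 and the last at step 48, one card every other step, which agrees with the one-step delay described above.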

An important concept in parallel algorithms is the notion of "speedup." It is defined to be the number of steps (as a function of the input) required by the best serial algorithm, divided by the number of steps required by a given parallel algorithm that accomplishes the same task. Perfect speedup is achieved when this ratio is on the order of the number of processors. The sorting algorithms described in this module and in Module Three do not come close to achieving this, mainly because there are not enough communication links between processors. Algorithms for sorting which achieve perfect speedup do exist for a few architectures; however, they are very complicated to describe or implement and are of mostly theoretical interest at this time. There is a practical algorithm, Bitonic Sort, which comes close to perfect speedup. (See, e.g., Bachelis et al., 1992; Bitton, DeWitt, Hsiao and Menon, 1984; or Knuth, 1973.) It uses the same compare-exchange primitive as "Parallel Bubblesort," but the routing is more complicated. The instructions for acting this out are given in Bachelis et al. (1992) and are suitable for high school as well as college classes.

Figure 8. Instructions for linear systolic sort. Sample run for 16 cards and 16 processors.

Instructions:
1. If you ever have two cards, pass the larger to your left in the next time step.
2. If, after you have started receiving cards from the right, you fail to receive one from the right, then during each succeeding time step, pass one to the right whenever you have one to pass.

Sample Run for 16 Cards and 16 Processors (each "_" represents a processor):

Start:     _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _   8,11,13,1,12,10,5,6,18,3,14,16,22,2,7,9

After:
 8 steps:  _ _ _ _ _ _ _ _ _ _ _ _ 13.12 11.10 8.5 1.6   18,3,14,16,22,2,7,9
16 steps:  _ _ _ _ _ _ _ _ 13.18 12.11 10.14 8.16 6.22 5.3 2.7 1.9
18 steps:  _ _ _ _ _ _ _ 18.13 12.14 11.16 10.22 8.6 5.7 3.9 2 _   1
20 steps:  _ _ _ _ _ _ 18.14 13.16 12.22 11.10 8.7 6.9 5 _ 3 _   2,1
31 steps:  22 18 _ 16 _ 14 _ 13 _ 12 _ 11 _ 10 _ 9   8,7,6,5,3,2,1
32 steps:  22 _ 18 _ 16 _ 14 _ 13 _ 12 _ 11 _ 10 _   9,8,7,6,5,3,2,1
48 steps:  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _   22,18,16,14,13,12,11,10,9,8,7,6,5,3,2,1

Conclusion

The activities and algorithms that have been presented here can be modified or embellished in various ways, some of which have already been indicated. In addition, there are other algorithms that can be acted out to introduce students to cooperative computing activities. For example, using a binary tree structure, one can do database operations systolically (Bachelis et al., 1992). On a two-dimensional array, one can consider routing problems and operations on matrices. It is worth noting that, in general, routing is a special case of sorting in which the ordering is determined by the destination of each item; thus, routing (of a complete set of items) can always be done at least as fast as sorting, for a given configuration.

If one has access to a local area network in a computer lab or media center, and if it is possible for the different micros to communicate with one another and if each one has a "pop-up" calculator, then many of the above activities can be upgraded in terms of problem size and speed of computation. Calculations are done on the micros and communication is achieved via the network. This upgrading applies in particular to finding the minimum in constant time and sieving for primes. Also, for example, one can consider running parallel bubblesort where each processor has many data elements and not just one; the "compare-exchange" primitive would then consist of exchanging and then merging sorted blocks of data, rather than just single elements.
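The block version of the compare-exchange primitive mentioned here can be sketched as follows. This is only an illustration in Python under stated assumptions: every processor holds a block of the same size, each processor sorts its own block first, and the names `merge_split` and `block_parallel_bubblesort` are hypothetical.

```python
import heapq


def merge_split(left_block, right_block):
    """Block compare-exchange: merge two sorted blocks; the left side keeps the smaller half."""
    merged = list(heapq.merge(left_block, right_block))
    return merged[:len(left_block)], merged[len(left_block):]


def block_parallel_bubblesort(blocks):
    """Run one phase per processor, alternating step 1 and step 2 pairings."""
    blocks = [sorted(block) for block in blocks]   # each processor sorts its own data first
    p = len(blocks)
    for phase in range(p):
        # Even phases pair (0,1), (2,3), ...; odd phases pair (1,2), (3,4), ...
        for left in range(phase % 2, p - 1, 2):
            blocks[left], blocks[left + 1] = merge_split(blocks[left], blocks[left + 1])
    return blocks


print(block_parallel_bubblesort([[7, 8, 9], [4, 5, 6], [1, 2, 3]]))
```

With single-element blocks this reduces to the parallel bubblesort of Module Six.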

Although, no doubt, many individual teachers at times have their students "act out" processes to help illustrate what is going on, the authors know of no existing educational materials which systematically approach the teaching of students about algorithms, especially parallel ones, in the manner described here. Some computer offline activities are discussed in Erickson (1986).

Two good college-level parallel algorithm texts are Akl (1989) and Leighton (1992). A number of college algorithm texts, such as Baase (1988), Cormen, Leiserson and Rivest (1990), Manber (1989), and Sedgewick (1988), contain material on parallel algorithms.

References

Akl, S. G. (1989). The design and analysis of parallel algorithms. Englewood Cliffs, New Jersey: Prentice-Hall.

Baase, S. (1988). Computer Algorithms (2nd ed.). Reading, MA: Addison-Wesley.

Bachelis, G., James, D., Maxim, B. & Stout, Q. (1988, December). Making parallel sorting algorithms come alive. The MACUL Newsletter, 20-21.

Bachelis, G., James, D., Maxim, B. & Stout, Q. (1989, Winter). Cooperative computing activities for the mathematics classroom. Mathematics in Michigan, (MCTM), 3-8.

Bachelis, G., James, D., Maxim, B. & Stout, Q. (1990, Spring). Bringing computing algorithms to life. Factorial, (DACTM), S-19.

Bachelis, G., James, D., Maxim, B. & Stout, Q. (1992). A novel approach to introducing parallel algorithms in undergraduate computer science courses. Computer Science Education, 3, 17-32.

Bitton, D., DeWitt, D. J., Hsiao, D. K. & Menon, J. (1984). A taxonomy of parallel sorting. Computing Surveys, 16(3), 287-318.

Cormen, T. H., Leiserson, C. E. & Rivest, R. L. (1990). Introduction to algorithms. New York: MIT Press/McGraw-Hill.

Erickson, T. (1986). Off and running: The computer offline activities book. Berkeley, CA: University of California.

Holtfreter, T. (1988). [West Bloomfield High School] Unpublished classroom notes.

Knuth, D. E. (1973). The art of computer programming: Vol. 3. Sorting and searching. Reading, MA: Addison-Wesley.

Leighton, F. T. (1992). Introduction to parallel algorithms and architectures. San Mateo, CA: Morgan Kaufmann.

Manber, U. (1989). Introduction to algorithms. Reading, MA: Addison-Wesley.

National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

Sedgewick, R. (1988). Algorithms (2nd ed.). Reading, MA: Addison-Wesley.

Valiant, L. G. (1975). Parallelism in Comparison Problems. SIAM J. Computing, 4, 348-355.

Note: The authors’ addresses are Gregory F. Bachelis, Dept. of Mathematics, Wayne State University, Detroit, MI 48202; David A. James, Dept. of Mathematics and Statistics, The University of Michigan-Dearborn, Dearborn, MI 48128; Bruce R. Maxim, Dept. of Computer and Information Science, The University of Michigan-Dearborn, Dearborn, MI 48128; Quentin F. Stout, Electrical Engineering and Computer Science Department, The University of Michigan, Ann Arbor, MI 48109.
