A Parallel Euclidean Distance Transformation Algorithm

COMPUTER VISION AND IMAGE UNDERSTANDING

Vol. 63, No. 1, January, pp. 15–26, 1996. ARTICLE NO. 0002

HUGO EMBRECHTS AND DIRK ROOSE

Katholieke Universiteit Leuven, Dept. Computerwetenschappen, Celestijnenlaan 200A, B-3001 Heverlee, Belgium

Received May 10, 1993; accepted November 18, 1994

We present a parallel algorithm for the Euclidean distance transformation (EDT). It is a "divide-and-conquer" algorithm based on a fast sequential algorithm for the signed EDT (SEDT). The combining step that follows the local partial calculation of the SEDT can be done efficiently after reformulating the SEDT problem as the partial calculation of a Voronoi diagram. This leads to an algorithm with two local calculation steps, whose computational complexity is proportional to the number of pixels of the subregions, and a global calculation step, whose complexity is proportional to the image perimeter. This article contains a description of the algorithm, a complexity analysis, a discussion on load imbalance, and timings obtained on an iPSC/2. © 1996 Academic Press, Inc.

1077-3142/96 $18.00
Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

1. INTRODUCTION

A distance transformation (DT) converts a binary image, consisting of foreground and background pixels, into an image where all background pixels have a value equal to the distance to the nearest foreground pixel.

To allow faster computation of the DT, the normal Euclidean distance is often approximated by computationally less expensive distance measures such as the City Block and the Chamfer 3-4 distance. Recently, however, Leymarie and Levine [6] modified the sequential algorithm of Danielsson [1], which approximates the Euclidean DT (EDT) with a negligible error, to make it run nearly as fast as an implementation with the Chamfer 3-4 distance measure. Because of its properties, the Euclidean distance measure is considered superior to other measures; one desirable property, for example, is that the EDT is rotation invariant. The quality of the approximating measures is illustrated in Fig. 1.

FIG. 1. The DT of an image with one foreground pixel centered in the middle of the image for (a) the City Block, (b) the Chamfer 3-4, and (c) the Euclidean distance. Growing distance is represented by a gray tone repeatedly varying from black to white (to accentuate the contours of the DT).

In the past we have developed parallel algorithms for the DT with the City Block and the Chamfer 3-4 distance measures [4, 2, 3]. Here we present a parallel algorithm for the Euclidean DT that is based on the new sequential implementation mentioned above.

The DT is a basic operation in image analysis, where it has numerous applications. Recent publications [7, 8, 9, 12] show ongoing research on its properties, applications, and algorithms to calculate it. The reported machine vision applications are nearest-neighbor interpolation, pattern matching, and morphological processing. Other applications are robot collision avoidance, path finding, and aerial image registration. From the DT one easily derives the medial axis transform (MAT), which serves as a basis for skeletonization. The latter is an important means for extracting structural features, for image compression, and for describing shapes.

2. THE SEQUENTIAL ALGORITHM

The sequential algorithm that we use as the basis of our parallel algorithm is the 4SSEDT (sequential signed EDT) algorithm [6], derived from the algorithm of Danielsson [1]. Rather than the EDT, it calculates the signed EDT (SEDT), which is a related transformation. Instead of the distance to the closest foreground pixel, this transformation provides for any pixel a vector pointing to the closest foreground pixel. Its name stems from the fact that the vector components are signed values. The 4 in 4SSEDT refers to the fact that the algorithm is based on a propagation of distances over 4-connected neighbors.

The algorithm consists of two passes during which the image is traversed, first from top to bottom, and second from bottom to top. In both passes every row is traversed consecutively from left to right and from right to left. Each pixel with coordinates $(i, j)$, i.e., the pixel on row $i$ and column $j$, is assigned a vector $\vec{L}(i, j)$ that, at the end, points from $(i, j)$ to the closest foreground pixel in the image. Foreground pixels are assigned $\vec{0}$ at once. The bulk of the operations consists in comparing the distance from a pixel $(i, j)$ to $(i, j) + \vec{L}(i, j)$ with the distance from $(i, j)$ to $(i', j') + \vec{L}(i', j')$, where $(i', j')$ is a 4-connected neighbor of $(i, j)$, i.e., $(i \pm 1, j)$ or $(i, j \pm 1)$. If some of the latter distances are smaller, then $\vec{L}(i, j)$ is replaced by $(i', j') + \vec{L}(i', j') - (i, j)$, with $(i', j')$ chosen to yield the smallest distance. This is illustrated in Fig. 2 for $(i', j')$ equal to $(i - 1, j)$ and $(i, j - 1)$. In this section we will write

$$\vec{a} \leftarrow \min(\vec{b}, \vec{c})$$
as a short notation for

$$\vec{a} \leftarrow \operatorname*{arg\,min}_{\vec{x} \in \{\vec{b}, \vec{c}\}} |\vec{x}|.$$

The actual operations during the first pass are

$$\vec{L}(i, j) \leftarrow \min\bigl(\vec{L}(i-1, j) + (-1, 0),\; \vec{L}(i, j-1) + (0, -1)\bigr), \tag{1}$$

$$\vec{L}(i, j) \leftarrow \min\bigl(\vec{L}(i, j),\; \vec{L}(i, j+1) + (0, 1)\bigr), \tag{2}$$

while traversing row $i$ from left to right and from right to left, respectively. During the second pass we have

$$\vec{L}(i, j) \leftarrow \min\bigl(\vec{L}(i, j),\; \vec{L}(i+1, j) + (1, 0),\; \vec{L}(i, j-1) + (0, -1)\bigr) \tag{3}$$

and again (2), respectively. Vectors associated with pixels outside the image are excluded from the minimization. For the upper left corner pixel $(0, 0)$, both pixel positions on the right-hand side of (1) are outside the image. If this corner pixel is a background pixel, it must be assigned a vector with infinite components; in practice, a vector with components twice the largest image size.

Unlike some of the algorithms in [9, 11], this algorithm is not error-free. However, the error is known to be smaller than half the pixel distance, which is negligible for most applications [1]. Careful programming allows an implementation that runs about as efficiently as any implementation of the distance transformation with the Chamfer 3-4 distance measure [6].

If we want to obtain the SEDT values of only the pixels of a particular subregion, we can use the 4SSEDT algorithm as soon as we know the SEDT values of the border pixels of this subregion. Application of (1), (2), and (3) to the internal pixels of the subregion propagates distances from the subregion borders to the internal pixels where necessary, i.e., for those background pixels for which the closest foreground pixel lies outside the subregion. This will be used in the parallel algorithm that we present.

FIG. 2. The calculation of $\vec{L}(i, j)$ during a rightward line scan of the first pass.

3. TOWARD PARALLELIZATION

3.1. Divide and Conquer

In a classical "divide-and-conquer" algorithm a partition $P$ is chosen that divides the image into as many subregions as there are processors available. The operation to be parallelized, in our case the SEDT, is initially executed on each subregion separately, and the results, i.e., the local SEDTs, are subsequently used in one way or another to compute the global SEDT of the image. The combining step is necessary because the closest foreground pixel for a particular background pixel may lie in another subregion; the local SEDT then assigns this pixel a value that is too large.

This scheme is often optimized in the sense that the initial local operation is executed only partially; the partial local results are transformed by a global operation that provides each subregion with sufficient information for the local calculation of the global operation. For the SEDT, for example, knowing the global SEDT values of the subregion border pixels is sufficient for calculating the global SEDT locally, as explained in Section 2. For the calculation of the global SEDT border values we use the notion of a Voronoi diagram, which allows a less redundant representation of the same information.
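Equations (1) to (3) can be made concrete with a minimal Python sketch of the two passes. This is an illustration and not the authors' implementation. It assumes the image is a list of rows of booleans (True = foreground) and that vectors are (row, column) offsets. The "infinite" vector is taken as twice the largest image size, as suggested in Section 2. The function name `ssedt4` is hypothetical. Unlike eq. (1) as printed, the current value $\vec{L}(i, j)$ also takes part in every minimisation. This has no effect for background pixels, which start with the large vector, and it keeps foreground pixels at $\vec{0}$.

```python
from typing import List, Tuple

Vec = Tuple[int, int]  # (row offset, column offset)


def ssedt4(image: List[List[bool]]) -> List[List[Vec]]:
    """Two-pass 4-neighbour signed EDT following eqs. (1)-(3) (sketch).

    image[i][j] is True for foreground pixels. For every pixel (i, j) the
    result holds a vector L[i][j] such that (i, j) + L[i][j] is the
    foreground pixel that the propagation found to be closest.
    """
    rows, cols = len(image), len(image[0])
    big = 2 * max(rows, cols)  # stands in for a vector with infinite components
    L = [[(0, 0) if image[i][j] else (big, big) for j in range(cols)]
         for i in range(rows)]

    def length2(v: Vec) -> int:
        return v[0] * v[0] + v[1] * v[1]

    def relax(i: int, j: int, ni: int, nj: int) -> None:
        # L(i,j) <- min(L(i,j), L(ni,nj) + (ni - i, nj - j)); neighbours
        # outside the image are excluded from the minimisation.
        if 0 <= ni < rows and 0 <= nj < cols:
            di, dj = L[ni][nj]
            cand = (di + ni - i, dj + nj - j)
            if length2(cand) < length2(L[i][j]):
                L[i][j] = cand

    # First pass: top to bottom.
    for i in range(rows):
        for j in range(cols):              # eq. (1), left to right
            relax(i, j, i - 1, j)
            relax(i, j, i, j - 1)
        for j in range(cols - 2, -1, -1):  # eq. (2), right to left
            relax(i, j, i, j + 1)
    # Second pass: bottom to top.
    for i in range(rows - 1, -1, -1):
        for j in range(cols):              # eq. (3), left to right
            relax(i, j, i + 1, j)
            relax(i, j, i, j - 1)
        for j in range(cols - 2, -1, -1):  # eq. (2) again, right to left
            relax(i, j, i, j + 1)
    return L


if __name__ == "__main__":
    img = [[False] * 3 for _ in range(3)]
    img[1][1] = True  # single foreground pixel in the centre
    for row in ssedt4(img):
        print(row)
```

For a 3 × 3 image whose only foreground pixel is the centre, the corner pixel (0, 0) receives the vector (1, 1) and the centre keeps (0, 0).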


3.2. Voronoi Diagrams and the SEDT

Pixels are often seen as little squares that touch one another by sides and corners. In this paper we associate pixels with the centers of these square boxes. For instance, when we say that a pixel is contained in a subset of the plane, we mean that its center is contained in this subset, not the whole box. By subregion border we mean the rectangle that passes through the top and bottom pixel rows and the leftmost and rightmost pixel columns of the subregion. However, for convenience, the graphical representation of pixels as boxes will be used in some of the figures.

A Voronoi diagram (VD) [10] of a discrete set of points in the plane, the Voronoi points (VPs), is defined as the partitioning of the two-dimensional plane into a number of cells, referred to as Voronoi cells (VCs). Each VC contains one VP and all points of the plane for which this VP is the closest VP. See Fig. 3b for an example.

Consider the VD of the foreground pixels of an image $i$ and let $\mathcal{V}_i$ denote the set of the cells of this VD. There is a simple relationship between $\mathcal{V}_i$ and the SEDT. From $\mathcal{V}_i$ one trivially finds the SEDT: any pixel $p$ is assigned a vector pointing to the VP of the cell to which it belongs, as this VP is the foreground pixel closest to $p$.

As mentioned, we use VDs for the calculation of the global SEDT values at the subregion borders. We are therefore interested in finding the VCs that intersect the subregion borders. The sequence of the VCs that intersect the border of a particular subregion $s$, ordered either clockwise or counterclockwise, is called $\mathcal{V}^{\mathrm{marg}}_{s|i}$. Note that some of the foreground pixels associated with these cells lie inside $s$, some outside $s$. An intermediate step for finding $\mathcal{V}^{\mathrm{marg}}_{s|i}$ is to find, for each subregion $s$ separately, the cells of $\mathcal{V}_s$ (not $\mathcal{V}_i$) that cross the subregion border; see Fig. 3c. ($\mathcal{V}_s$ denotes the VD cells of the foreground pixels lying in subregion $s$.) These cells are further referred to as $\mathcal{V}^{\mathrm{marg}}_s$. Their calculation is described in Section 5. The transition from $\mathcal{V}^{\mathrm{marg}}_s$ to $\mathcal{V}^{\mathrm{marg}}_{s|i}$ is described in Section 6. Sometimes we replace the superscript in these symbols by one or more border names to select only those VCs that cross the named border(s).

In order to make the algorithm description less cumbersome, we use the symbols $\mathcal{V}^{\mathrm{marg}}_s$ and $\mathcal{V}^{\mathrm{marg}}_{s|i}$ in two senses. Each of these sets has a one-to-one correspondence between its cells and the foreground pixels that those cells contain. We therefore use the symbols sometimes for the sequence of the VCs and sometimes for the sequence of the associated foreground pixels (VPs).

3.3. Algorithms for Some Kernel Problems

The presented algorithm consists partly of solving a number of subproblems similar to the problems described below.

Problem 1. Assume a line $l$ and a sequence $r$ of points $\{p_i\}_{i=1,2,\ldots,n}$ with orthogonal projections on $l$ that are ordered along $l$. Consider the VD of the points of $r$ and find the subsequence of $r$ formed by the points with VCs that cross $l$. (If the common border between two neighboring VCs coincides with $l$, one of the two is arbitrarily selected for the subsequence.)

Solution. From any two successive points of $r$ with coinciding orthogonal projections, we can eliminate the point that is further away from $l$, or, if they are at equal distance from $l$, either one of them. In what follows we can therefore assume the orthogonal projections of $r$ to be strictly ordered. A sufficient condition for deciding that the VC around a point $p_i$ does not contain a part of $l$ is that $p_i$ is hidden from $l$ by two points $p_j$ and $p_k$. That is, the orthogonal bisector of $p_i$ and $p_j$ and the orthogonal bisector of $p_i$ and $p_k$ delimit a space around $p_i$ that does not contain a part of $l$, as illustrated by Fig. 4. This follows from the property that the VC around a point $p_i$ can be found by intersecting the half-planes delimited by the orthogonal bisectors of $p_i$ and any $p_j$, selecting those half-planes that contain $p_i$.

FIG. 3. (a) An image i with foreground pixels represented as black squares. (b) The VD (restricted to i) of the foreground pixels of i. (c) The VD of the four quarters of i. The cells that cross a border of one of the quarters are shaded.
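The solution to Problem 1 has two parts: the hidden-point test above and the n-step removal procedure that the solution continues with. Both can be sketched as follows. This is a hedged illustration, not the authors' code. It assumes that $l$ is the x-axis, that points are integer pairs $(x, y)$ sorted by $x$, and that exact rational arithmetic (Python's `fractions.Fraction`) is used. All function names are hypothetical. The test compares where the two orthogonal bisectors cross $l$. The part of $l$ that is closer to $p_i$ than to both $p_j$ and $p_k$ is the open interval between those two crossings, so $p_i$ is hidden when that interval is empty. A single touching point is treated as hidden, which is one of the arbitrary choices the problem statement allows.

```python
from fractions import Fraction
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y): x = projection on l, y = signed distance to l


def bisector_x(p: Point, q: Point) -> Fraction:
    """Where the orthogonal bisector of p and q crosses l (the x-axis); needs p[0] != q[0]."""
    (x1, y1), (x2, y2) = p, q
    return Fraction(x2 * x2 + y2 * y2 - x1 * x1 - y1 * y1, 2 * (x2 - x1))


def hidden(pj: Point, pi: Point, pk: Point) -> bool:
    """True if pi (projected strictly between pj and pk) is hidden from l by pj and pk.

    On l, pi is closer than pj to the right of bisector_x(pj, pi) and closer
    than pk to the left of bisector_x(pi, pk); ties count as hidden.
    """
    return bisector_x(pj, pi) >= bisector_x(pi, pk)


def cells_crossing_line(points: List[Point]) -> List[Point]:
    """Problem 1 sketch: points sorted by x; keep those whose Voronoi cells cross l."""
    # Of successive points with the same projection, keep the one closest to l.
    strict: List[Point] = []
    for p in points:
        if strict and strict[-1][0] == p[0]:
            if abs(p[1]) < abs(strict[-1][1]):
                strict[-1] = p
        else:
            strict.append(p)
    result: List[Point] = []
    for p in strict:
        # Step k: pop the last element while its predecessor and p hide it from l.
        while len(result) >= 2 and hidden(result[-2], result[-1], p):
            result.pop()
        result.append(p)
    return result


if __name__ == "__main__":
    print(cells_crossing_line([(0, 1), (1, 5), (2, 1)]))
```

With the points (0, 1), (1, 5), and (2, 1), for example, the middle point is hidden and the routine returns [(0, 1), (2, 1)]. Each point is appended once and popped at most once, which matches the linear execution time stated for Problem 1.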


FIG. 4. $p_i$ is hidden from $l$ behind $p_j$ and $p_k$, because the shaded area (and necessarily also the VC around $p_i$) does not contain a point of $l$.

The solution is found in $n$ steps. In step $k$ ($k = 1, 2, \ldots, n$) we solve the subproblem in which only the first $k$ points of $r$ are considered. Steps 1 and 2 are trivial. For step $k$ ($k > 2$) we start from the sequence obtained in the previous step. Before adding $p_k$, we repeatedly remove the last element of this sequence whenever this element is hidden from $l$ by its predecessor in this sequence and $p_k$. We stop when the test fails or the last element has become the only element in the sequence. This is illustrated in Fig. 5, where $\{q_1, \ldots, q_5\}$ is the sequence obtained at the end of step $k - 1$. It is clear that (a) the VCs of the removed points do not contain a part of $l$ and that (b) the orthogonal bisectors of successive points in the remaining subsequence intersect $l$ at points that are ordered along $l$. These observations prove the correctness of the algorithm. The execution time of this algorithm is linear in the number of points of $r$, as each point is inserted once and removed at most once. ∎

In the next problem we assume that we have found a solution to Problem 1 and we now want to insert a new point into the solution sequence.

Problem 2. Assume a line $l$ and a sequence of points $\{q_i\}_{i=1,2,\ldots,m}$ with orthogonal projections on $l$ that are ordered along $l$ and with VCs that all cross $l$. Insert a new given point $p$ into the sequence and find again the subsequence formed by the points with VCs that cross $l$.

FIG. 5. Removal of points before adding $p_k$ to $\{q_1, \ldots, q_5\}$.

Solution. A. We first determine whether or not the VC of $p$ crosses $l$. This is done by a simple test that, however, depends on the position of $p$ with respect to the $q_i$. We first describe the treatment of some special cases.

Assume first the trivial case where the projection of $p$ onto $l$ coincides with the projection of one of the $q_i$, say $q_k$. If $p$ is further away from $l$ than $q_k$, its cell cannot contain any point of $l$, and $p$ is not a point of the solution sequence. If it is closer to $l$, however, it is a part of the solution sequence and $q_k$ is not.

In a second trivial case, the projection of $p$ onto $l$ does not lie between the projections of $q_1$ and $q_m$. Then the VC of $p$ contains at least a half-infinite part of $l$. Consequently, in this case, $p$ belongs to the solution sequence.

In the regular case, the projection of $p$ lies between those of $q_k$ and $q_{k+1}$ for some $k$. Then whether $p$ is in the solution sequence depends on whether $p$ is hidden from $l$ by $q_k$ and $q_{k+1}$. If it is hidden, it can clearly be excluded from the solution sequence. If it is not hidden by $q_k$ and $q_{k+1}$, $p$ belongs to the solution sequence, as we now prove.

Let $f$ be the point where the orthogonal bisector of $q_k$ and $q_{k+1}$ crosses $l$, as shown in Fig. 6. Points $q_k$ and $q_{k+1}$ are therefore at equal distance from $f$. Because the VCs of $q_k$ and $q_{k+1}$ are neighbors along $l$, $f$ lies on their common border, so all other $q_i$ ($i < k$ and $k + 1 < i$) are further away. It is now easily shown that the circle through $q_k$ and $q_{k+1}$ centered at $f$ discriminates between points hidden from $l$ by $q_k$ and $q_{k+1}$ (darkly shaded) and those that are not (lightly shaded). It is therefore clear that if $p$ is not hidden from $l$ by $q_k$ and $q_{k+1}$, then $p$ is closer to $f$ than $q_k$ and $q_{k+1}$, and consequently also closer than all other $q_i$. Therefore the VC of $p$ crosses $l$, as it contains at least $f$.

B. We further assume that $p$ is part of the solution sequence and describe the further treatment of the problem for the regular case. The special cases are treated similarly.

The introduction of $p$ to the sequence may exclude some of the $\{q_1, \ldots, q_k\}$ and some of the $\{q_{k+1}, \ldots, q_m\}$ from the solution sequence. As in the solution to Problem 1, the exclusion of points is started from the points next to
$p$. From $\{q_1, \ldots, q_k\}$ we repeatedly remove the last element whenever it is hidden from $l$ by its predecessor in this sequence and $p$. We stop when the test fails or the last element has become the only element in the sequence. A similar procedure applies to $\{q_{k+1}, \ldots, q_m\}$, starting at $q_{k+1}$. The correctness of this algorithm follows from the same observations made above for Problem 1. The complexity of this algorithm is $O(1) + O(\text{number of points removed from } \{q_1, \ldots, q_m\})$. ∎

FIG. 6. The darkly shaded areas contain the points hidden from $l$ by $q_k$ and $q_{k+1}$. This area is delimited by the circle through $q_k$ and $q_{k+1}$ centered at $f$. In the left figure $q_k$ and $q_{k+1}$ lie on one side of $l$. In the right figure they are on different sides.

4. OVERVIEW OF THE PARALLEL ALGORITHM

The algorithm we propose consists of three steps:

I. The calculation of $\mathcal{V}^{\mathrm{marg}}_s$ for each subregion $s \in P$ separately.
II. The combination of the $\mathcal{V}^{\mathrm{marg}}_s$ to obtain the $\mathcal{V}^{\mathrm{marg}}_{s|i}$.
III. The calculation of the global SEDT values for each subregion $s \in P$ from the corresponding $\mathcal{V}^{\mathrm{marg}}_{s|i}$ and the input values of the pixels in $s$.

The first step could be done by executing the 4SSEDT algorithm on each subregion, retaining the border values, and deriving $\mathcal{V}^{\mathrm{marg}}_s$ from these border values. However, in Section 5 we present a shorter one-pass algorithm which traverses each pixel at most once.

For step II we present a hierarchical algorithm in Section 6. We calculate $\mathcal{V}^{\mathrm{marg}}_{s'}$ recursively for $s'$ being the union of a number of subregions $\{s_j\}$ for which the $\mathcal{V}^{\mathrm{marg}}_{s_j}$ are already determined. This is done until $s'$ is the whole image. Then we have $\mathcal{V}^{\mathrm{marg}}_i = \mathcal{V}^{\mathrm{marg}}_{i|i}$. In a downward pass, $\mathcal{V}^{\mathrm{marg}}_{s'|i}$ is calculated for the same unions of subregions $s'$, visited in reverse order.

Step III of the parallel algorithm is done by executing the 4SSEDT algorithm on each subregion, starting from the original image data and the global SEDT values of the subregion border pixels, which are calculated from $\mathcal{V}^{\mathrm{marg}}_{s|i}$.

5. CALCULATION OF $\mathcal{V}^{\mathrm{marg}}_s$

5.1. Introduction

In order to find a suitable algorithm for obtaining $\mathcal{V}^{\mathrm{marg}}_s$ and to prove its correctness, we make use of a Voronoi property. As stated above, $\mathcal{V}^{\mathrm{marg}}_s$ consists of those subregion foreground pixels with a VC that contains points of the subregion border. The VC enclosing a particular subregion border point $p$ can be found by investigating the subregion pixels along a circle arc that is centered at $p$ and that expands, starting from zero radius, until it passes over a foreground pixel. This foreground pixel is then the VP whose cell contains $p$. If the circle passes over several foreground pixels (VPs) at the same time, $p$ lies on the boundary of the corresponding VCs.

Instead of scanning pixels under an expanding circle arc, one can imagine letting circles expand from all boundary points at the same time. The result is a wave starting at the subregion border and passing over entire pixel rows and columns.

This idea inspired the algorithm presented in this section. The first part of the algorithm consists in finding the minimal bounding box enclosing all foreground pixels. The problem of further investigating the pixels inside the bounding box can then be decoupled into scanning smaller areas inside this bounding box, which are delimited by the foreground pixels found on the bounding box.

The points that will be found to belong to $\mathcal{V}^{\mathrm{marg}}_s$ are maintained in a circular list $\mathcal{L}$. The order of the points in $\mathcal{L}$ will be the order in which the corresponding cells cross the subregion border (either clockwise or counterclockwise).

5.2. Bounding Box and Windows

In order to find the bounding box we scan successive subregion pixel rows, starting from the top pixel row, until a row with one or more foreground pixels is found. This is repeated starting from the other borders, as shown in Fig. 7a.

Clearly all foreground pixels on the bounding box belong to $\mathcal{V}^{\mathrm{marg}}_s$, so they are the first points to be included into $\mathcal{L}$. Furthermore, the knowledge of the position of the bounding box and the foreground pixels on it allows us to decouple the problem of finding the remaining points of $\mathcal{V}^{\mathrm{marg}}_s$. Consider for this purpose the projections of the foreground pixels lying on the bounding box onto the subregion border, as shown in Fig. 7b. The projection points
divide the subregion border into pieces that we call intervals ($i_1, i_2, \ldots$ in Fig. 7b). The problem of finding the remaining points of $\mathcal{V}^{\mathrm{marg}}_s$, i.e., the foreground pixels with VCs that cross the subregion border, is split into finding the foreground pixels with VCs that cross the respective intervals. For each interval we can delimit an area in which these foreground pixels lie, as we now explain. These areas are shaded in Fig. 7b.

FIG. 7. (a) Retrieval of the bounding box (indicated by a dashed line). (b) The areas that need to be scanned further.

Let $[c, d]$ be one of the intervals. To this interval corresponds a piece of the bounding box which is enclosed between the foreground pixels of which $c$ and $d$ are the projections. We further call such a piece of the bounding box a window. These windows are indicated by $w_1, w_2, \ldots$ in Fig. 7b. It is now easily proved that the foreground pixels with VCs that cross $[c, d]$ all lie inside the area behind the window delimited by (a) the bounding box and (b) a circle arc through the window end points ($a$ and $b$). The center of this circle is the point of the subregion border through which the orthogonal bisector of $a$ and $b$ passes.

The way in which the area behind a window is scanned depends on whether the window contains a corner of the bounding box.

5.3. Windows not Containing a Corner

In this section we assume a horizontal window $[a, b]$ without a corner. Vertical windows are treated similarly, except that "column" must be read where "row" is written.

The pixels behind $[a, b]$ are scanned along adjacent parallel rows, starting from the row next to $[a, b]$, until a row is found that contains one or more foreground pixels. We will refer to these foreground pixels as candidates (for $\mathcal{V}^{\mathrm{marg}}_s$). They do not all necessarily belong to $\mathcal{V}^{\mathrm{marg}}_s$, as one of the candidates may be hidden from the subregion border by another candidate and one of the window end points ($a$ or $b$). This is the case in Fig. 8a for the rightmost candidate, which is hidden by the middle candidate and $b$. Which candidates belong to $\mathcal{V}^{\mathrm{marg}}_s$ can be determined by inserting each candidate into $\mathcal{L}$ one by one with the procedure described in Section 3.3 for Problem 2.

FIG. 8. Investigation of the area behind a window $[a, b]$ not containing a corner. The foreground pixels found on the fourth row break up the window into three other windows.

The remaining candidates, together with $a$ and $b$, give rise to new windows behind which subregion pixels need to be scanned; see Fig. 8b. The new search areas are again delimited by (a) circle arcs through the new window end points centered on the subregion border and (b) the pixel
row (column) parallel to this border and through the new window end point furthest away from this border.

Notice that some of the points added to $\mathcal{L}$ may have to be removed again later, because they are found not to belong to $\mathcal{V}^{\mathrm{marg}}_s$. This is the case when a point of $\mathcal{L}$ becomes hidden by a newly inserted pixel and an old one; see Fig. 9b. The algorithm described so far handles this case correctly: the procedure that inserts a new point ($q$) takes care of the removal of all points that may become hidden by its insertion.

FIG. 9. The insertion of $q$ into $\mathcal{L}$ causes $p$ to be removed.

5.4. Windows Containing a Corner

Consider now a window $[a, b]$ containing a corner. Assume that $a$ is the end point that is closer to the corner, as in Fig. 10. In the area behind this window, successive rows (columns) are scanned, starting from the row next to $b$, until a row is found that contains one or more foreground pixels, which are candidates for $\mathcal{V}^{\mathrm{marg}}_s$. If this row contains $a$ or lies above $a$, we proceed exactly as if $[a, b]$ were a window without a corner, as described above. The case where this row lies below $a$ is treated the same way, except that the leftmost candidate cannot be hidden by any pair of points. Together with $a$, this candidate forms a window that has to be treated again as a window with a corner, as described in this subsection.

FIG. 10. A window $[a, b]$ containing a corner. The first line containing a foreground pixel runs either (a) through or above $a$ or (b) below $a$.

5.5. Computational Complexity

The algorithm described here consists of (a) scanning subregion pixels, (b) delimiting areas to be scanned, and (c) operations on $\mathcal{L}$. Part a takes time proportional to the number of subregion pixels or less. The execution time of part b is at most proportional to the subregion border length. The number of operations of part c is proportional to the number of encountered foreground pixels. Each of these pixels is inserted into $\mathcal{L}$ at most twice, because some of the window areas can overlap and contain common foreground pixels, and is removed at most twice. The number of encountered foreground pixels can be proved to be at most the number of subregion border pixels. As a consequence, the computational complexity of the presented algorithm for finding $\mathcal{V}^{\mathrm{marg}}_s$ is

$$t_{\mathrm{pp}} = \alpha n_s^2 + \beta n_s, \tag{4}$$

where $n_s$ is the subregion size and $\alpha$ and $\beta$ are data-dependent constants (pp is an abbreviation of "preprocessing step"). The asymptotic complexity is therefore $O(n_s^2)$. However, as $\beta$ is much larger than $\alpha$, the linear term is fairly important, especially for small subregion sizes.

6. TRANSITION FROM $\mathcal{V}^{\mathrm{marg}}_s$ TO $\mathcal{V}^{\mathrm{marg}}_{s|i}$

The algorithm that we propose is based on a hierarchical subdivision of the image. Consider therefore a sequence of gradually coarser rectangular partitions $\{P_l\}_{l=1,2,\ldots,L+1}$ of the image, as in Fig. 11. The finest partition $P_1$ is the partition introduced above as $P$, containing as many subregions as there are processors available. Each of the other partitions $P_l$ ($l > 1$) consists of subregions that are each
the union of two subregions of $P_{l-1}$. The coarsest partition $P_{L+1}$ contains the image itself as the only subregion. For this way of partitioning, the number of subregions must be a power of two ($2^L$). Other partitionings are also possible, but we restrict the algorithm description to binary partitionings.

FIG. 11. A recursive image partitioning with three levels ($L = 2$).

We derive the $\mathcal{V}^{\mathrm{marg}}_{s|i}$ for $s \in P_1$ from the $\mathcal{V}^{\mathrm{marg}}_s$ by a hierarchical algorithm with an upward and a downward pass. During the upward pass the sets $\mathcal{V}^{\mathrm{marg}}_s$ are calculated recursively for $s$ being the subregions of the successive partitions $P_2, P_3, \ldots, P_{L+1}$. Because $P_{L+1}$ contains the whole image as the only subregion, we have then obtained $\mathcal{V}^{\mathrm{marg}}_i = \mathcal{V}^{\mathrm{marg}}_{i|i}$. From this set a downward pass starts in which the $\mathcal{V}^{\mathrm{marg}}_{s|i}$ are calculated for $s \in P_L, P_{L-1}, \ldots, P_1$.

For the description of the two passes we consider a subregion of $P_l$ for $1 < l \le L + 1$, which is the union of two subregions $a$ and $b$ of $P_{l-1}$. Assume, without loss of generality, that $a$ and $b$ have a vertical common border. During the upward pass $\mathcal{V}^{\mathrm{marg}}_{a \cup b}$ is derived from $\mathcal{V}^{\mathrm{marg}}_a$ and $\mathcal{V}^{\mathrm{marg}}_b$. During the downward pass we derive $\mathcal{V}^{\mathrm{marg}}_{a|i}$ and $\mathcal{V}^{\mathrm{marg}}_{b|i}$ from $\mathcal{V}^{\mathrm{marg}}_{a \cup b|i}$, $\mathcal{V}^{\mathrm{marg}}_a$, and $\mathcal{V}^{\mathrm{marg}}_b$. The latter two were found during the upward pass.

6.1. The Upward Pass

We describe how $\mathcal{V}^{\mathrm{north}}_{a \cup b}$ and $\mathcal{V}^{\mathrm{west}}_{a \cup b}$ are found. Similarly one finds $\mathcal{V}^{\mathrm{south}}_{a \cup b}$ and $\mathcal{V}^{\mathrm{east}}_{a \cup b}$.

6.1.1. Calculation of $\mathcal{V}^{\mathrm{north}}_{a \cup b}$. We describe how the left part of $\mathcal{V}^{\mathrm{north}}_{a \cup b}$ can be found, i.e., those cells that cross the top border of $a$. The right part, i.e., those cells that cross the top border of $b$, is found similarly.

The points of the left part of $\mathcal{V}^{\mathrm{north}}_{a \cup b}$ form a subset of $\mathcal{V}^{\mathrm{north}}_a \cup \mathcal{V}^{\mathrm{west}}_b$. From $\mathcal{V}^{\mathrm{west}}_b$ we only need to consider those points $p$ for which no other point $q$ in $\mathcal{V}^{\mathrm{west}}_b$ lies above and to the left of $p$. From Fig. 12 it is readily seen why. If such a point $q$ does exist, the orthogonal bisector of $p$ and $q$ separates the VC of $p$ from the northern border of $a$. The VC of $p$ then does not cross that border.

FIG. 12. If a foreground pixel $q$ of $b$ lies above and to the left of $p$, the VC around $p$, which must lie in the shaded area, cannot cross the northern border of $a$.

The remaining points of $\mathcal{V}^{\mathrm{west}}_b$ are found by traversing $\mathcal{V}^{\mathrm{west}}_b$ from top to bottom and accepting a point each time it lies more to the left than the previously accepted point. In this way we obtain a sequence of points ordered horizontally (from right to left). This sequence is concatenated with $\mathcal{V}^{\mathrm{north}}_a$, and the algorithm of Section 3.3 is executed on the resulting sequence with the line through the northern border of $a \cup b$ as line $l$.

6.1.2. Calculation of $\mathcal{V}^{\mathrm{west}}_{a \cup b}$. The points of $\mathcal{V}^{\mathrm{west}}_{a \cup b}$ are a subset of $\mathcal{V}^{\mathrm{west}}_a \cup \mathcal{V}^{\mathrm{west}}_b$. They can be found by merging $\mathcal{V}^{\mathrm{west}}_a$ and $\mathcal{V}^{\mathrm{west}}_b$ into a vertically ordered sequence and applying the algorithm of Section 3.3 with the line through the west border of $a \cup b$ as line $l$. The cells of the resulting sequence that cross $l$ above or below $a$, and therefore do not cross the west border of $a \cup b$, are eliminated.

6.2. The Downward Pass

The points of $\mathcal{V}^{\mathrm{north,west,south}}_{a|i}$ and $\mathcal{V}^{\mathrm{north,east,south}}_{b|i}$ can be copied from $\mathcal{V}^{\mathrm{marg}}_{a \cup b|i}$. Finding $\mathcal{V}^{\mathrm{east}}_{a|i}$ and $\mathcal{V}^{\mathrm{west}}_{b|i}$ is less trivial. We describe the calculation of $\mathcal{V}^{\mathrm{east}}_{a|i}$; $\mathcal{V}^{\mathrm{west}}_{b|i}$ is found similarly.

The points of $\mathcal{V}^{\mathrm{east}}_{a|i}$ are a subset of $\mathcal{V}^{\mathrm{east}}_a \cup \mathcal{V}^{\mathrm{north,west,south}}_{a|i} \cup \mathcal{V}^{\mathrm{west}}_b \cup \mathcal{V}^{\mathrm{north,east,south}}_{b|i}$. From $\mathcal{V}^{\mathrm{north}}_{b|i}$, only those points $p$ need to be considered for which no other point $q$ in $\mathcal{V}^{\mathrm{north}}_{b|i}$ lies below and to the left of $p$. This selection is done as in Section 6.1.1. A similar treatment is applied to $\mathcal{V}^{\mathrm{south}}_{b|i}$. The points selected in this way from $\mathcal{V}^{\mathrm{north,south}}_{b|i}$ are concatenated with $\mathcal{V}^{\mathrm{east}}_{b|i}$. Points of the latter sequence that lie above a point of $\mathcal{V}^{\mathrm{north}}_{b|i}$ or below a point of $\mathcal{V}^{\mathrm{south}}_{b|i}$ are excluded. In a similar (symmetrical) manner, points are selected from $\mathcal{V}^{\mathrm{north,west,south}}_{a|i}$. These sequences, selected from $\mathcal{V}^{\mathrm{north,east,south}}_{b|i}$ and $\mathcal{V}^{\mathrm{north,west,south}}_{a|i}$, are merged with $\mathcal{V}^{\mathrm{east}}_a$ and $\mathcal{V}^{\mathrm{west}}_b$ into a vertically ordered sequence. The algorithm of Section 3.3 is then executed on the result with the east border of $a$ as line $l$.

We remark that the points from $\mathcal{V}^{\mathrm{north,west,south}}_{a|i}$ may equally well be neglected. The difference is that the sets $\mathcal{V}^{\mathrm{marg}}_{s|i}$ found in that case are incorrect in the following sense. A VP outside $s$ whose cell crosses the border of $s$ more than once may appear in $\mathcal{V}^{\mathrm{marg}}_{s|i}$ only once, at the position corresponding to the place closest to the VP. This has no effect on the final SEDT values, however, because during step III the 4SSEDT algorithm propagates distance information through the subregion from one border to the other.
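The candidate selection of Section 6.1.1 combines the staircase traversal of $\mathcal{V}^{\mathrm{west}}_b$ with the Problem 1 procedure of Section 3.3. The following hedged sketch illustrates that combination and is not the authors' code. It assumes the following:

- Pixels are (row, column) pairs, with rows growing downward.
- $a$ lies to the left of $b$.
- $\mathcal{V}^{\mathrm{north}}_a$ is given left to right and $\mathcal{V}^{\mathrm{west}}_b$ top to bottom.
- `top_row` is the row of the northern border of $a \cup b$.

All function names are hypothetical. As in the description, the Problem 1 step is run on the whole line $l$.

```python
from fractions import Fraction
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, col); rows grow downward


def staircase(west_b: List[Pixel]) -> List[Pixel]:
    """Traverse V^west_b top to bottom, accepting a point if it lies left of the last accepted one."""
    kept: List[Pixel] = []
    for p in west_b:
        if not kept or p[1] < kept[-1][1]:
            kept.append(p)
    return kept  # ordered horizontally, right to left


def _bisector_x(p: Tuple[int, int], q: Tuple[int, int]) -> Fraction:
    # p, q are (x, y) with x measured along l and y the distance to l.
    (x1, y1), (x2, y2) = p, q
    return Fraction(x2 * x2 + y2 * y2 - x1 * x1 - y1 * y1, 2 * (x2 - x1))


def _cells_crossing_line(pts: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    # Problem 1 for strictly increasing x: drop points hidden by their neighbours.
    out: List[Tuple[int, int]] = []
    for p in pts:
        while len(out) >= 2 and _bisector_x(out[-2], out[-1]) >= _bisector_x(out[-1], p):
            out.pop()
        out.append(p)
    return out


def north_left_part(north_a: List[Pixel], west_b: List[Pixel], top_row: int) -> List[Pixel]:
    """Left part of V^north_{a U b}, assuming a lies left of b (sketch)."""
    seq = north_a + staircase(west_b)[::-1]       # now ordered by increasing column
    rel = [(c, r - top_row) for (r, c) in seq]    # x along l, y = distance below l
    return [(y + top_row, x) for (x, y) in _cells_crossing_line(rel)]


if __name__ == "__main__":
    print(north_left_part([(1, 1)], [(0, 5), (2, 4), (3, 6)], 0))
```

In this small example, the staircase keeps (0, 5) and (2, 4) and drops (3, 6), which has a point above and to its left. The Problem 1 step then removes (2, 4) as hidden, leaving [(1, 1), (0, 5)].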


7. PROCESSOR TASK DISTRIBUTION

Each processor is assigned one subregion s of $P_1$, receives the image data of s, and computes $\mathcal{V}^{\mathrm{marg}}_s$. In order to compute the $\mathcal{V}^{\mathrm{marg}}_{s,ri}$ recursively as described above (step II), processors have to cooperate. The operations of step II on $P_l$ ($1 \le l \le L+1$) are done with one processor per subregion, which means that $2^{L-(l-1)}$ out of the $2^L$ available processors are active while processing on $P_l$. Processors assigned to a pair of subregions a and b of $P_l$, the union of which is a member of $P_{l+1}$, exchange the sets $\mathcal{V}^{\mathrm{marg}}_a$ and $\mathcal{V}^{\mathrm{marg}}_b$ so that each can compute a part of $\mathcal{V}^{\mathrm{marg}}_{a\cup b}$ during the bottom-up pass. One processor of each pair sends its computed values to the other and becomes idle, while the other processor continues execution on recursion level $l+1$. During the top-down pass the latter processor activates the former by sending $\mathcal{V}^{\mathrm{marg}}_{a\cup b,ri}$.

On a hypercube multiprocessor this scheme can be optimized to eliminate communication during the top-down pass. With this optimization all processors remain active, and a processor executes the same code as the processor to which it sends results in the scheme described above.

8. ASYMPTOTICAL COMPLEXITY

We study the asymptotical complexity of the presented algorithm for an image with $n \times n$ pixels on a machine with N processors. More precisely, we examine how the execution time for a particular number of processors N evolves when the image size becomes large.

The algorithm consists of (a) local computations, i.e., step I (calculation of $\mathcal{V}^{\mathrm{marg}}_s$) and step III (calculation of the global SEDT from the $\mathcal{V}^{\mathrm{marg}}_{s,ri}$), and (b) global computations, i.e., step II (combination of the $\mathcal{V}^{\mathrm{marg}}_s$ to obtain the $\mathcal{V}^{\mathrm{marg}}_{s,ri}$).

8.1. Local Computations

From the discussions in Sections 2 and 5.5 it follows that the calculation of $\mathcal{V}^{\mathrm{marg}}_s$ (step I) as well as the calculation of the SEDT (step III) can be performed in an amount of time proportional to $n_s^2$, the number of pixels in a subregion. However, step I may require less time; see Section 9.2.

8.2. Global Computations

The transition from $\mathcal{V}^{\mathrm{marg}}_s$ to $\mathcal{V}^{\mathrm{marg}}_{s,ri}$ consists of computation and communication. The latter can be divided into the initiation and the actual transfer of messages.

Since the number of messages sent on each recursion level is constant and since initiating a message takes constant time, the total start-up time is proportional to the number of recursion levels $L = \log_2 N$.

The transfer time is proportional to the amount of data sent. The amount of data sent on recursion level l is at most proportional to the size of a subregion of $P_l$ (such a subregion contains $n^2/2^{L-l+1}$ pixels, so its side length is of order $n/2^{(L-l)/2}$), being

$$S_l = O\left(n / 2^{(L-l)/2}\right). \qquad (5)$$

Therefore

$$t_{\mathrm{transfer}} = O\left(\sum_{l=1}^{L} n / 2^{(L-l)/2}\right) = O(n), \qquad (6)$$

since the sum is a geometric series bounded by $n/(1 - 2^{-1/2})$. The computational complexity is also O(n), as the data are processed in linear time.

In summary, the global computations take time proportional to the image perimeter (O(n)), whereas the local computations take time proportional to the image area ($O(n^2)$). Therefore the global computations become negligible with respect to the local computations for large image sizes.

9. TIMING AND EFFICIENCY RESULTS

In this section we present execution time measurements. For these measurements we used the images shown in Fig. 13 and two artificial images, one entirely filled with foreground pixels (FULL) and one with a single foreground pixel in the center of the image (CENTRE). The DT of the latter is shown in Fig. 1. Measurements were done for several image sizes in order to verify scalability. In this section we pay special attention to the balance between the different parts of the algorithm.

FIG. 13. Test images referred to as BOLTS and CHROMOSOMES.

9.1. The 4SSEDT Algorithm

The overall speed of the parallel algorithm is mainly determined by the speed of its core, the 4SSEDT algorithm (step III). Its execution time $t_{\mathrm{4SSEDT}}$ is proportional to the number of subregion pixels but also depends on the image content. This dependence is mainly determined by the ratio r of the number of background pixels to the total number of pixels,

$$t_{\mathrm{4SSEDT}} = a' n_s^2 (1 + b' r), \qquad (7)$$


with $a'$ and $b'$ as data-independent constants and $n_s$ as the subregion size. The constants can be derived from Table I, which gives the execution time of the 4SSEDT algorithm on one processor of the Intel iPSC/2 for the four test images. (There is only one subregion, containing the entire image.) For our implementation $b'$ is approximately 0.35: FULL has r = 0 and CENTRE has r close to 1, and 1742/1289 ≈ 1.35.

TABLE I
The Execution Time in Milliseconds of the 4SSEDT Algorithm on One Processor of the iPSC/2 for Images with 256 × 256 Pixels

Image          Execution time 4SSEDT (ms)
FULL           1289
CENTRE         1742
BOLTS          1684
CHROMOSOMES    1711

9.2. The Calculation of $\mathcal{V}^{\mathrm{marg}}_s$

Figure 14 shows the execution time $t_{pp}$ for the calculation of $\mathcal{V}^{\mathrm{marg}}_s$ as a function of the subregion size ($n_s$). As discussed in Section 5.5, $t_{pp}$ consists of a linear and a quadratic term in $n_s$. As can be seen from the measurements, for FULL $t_{pp}$ contains only the linear term, for CENTRE only the quadratic term, and in general $t_{pp}$ contains both terms.

FIG. 14. The execution time in milliseconds of calculating $\mathcal{V}^{\mathrm{marg}}_s$ for a subregion containing the respective images FULL, CENTRE, BOLTS, and CHROMOSOMES as a function of the subregion size, on one processor of the iPSC/2.

9.3. Efficiency of the Parallel Algorithm

The parallel efficiency

$$\varepsilon = \frac{t_{\mathrm{sequ}}}{N\, t_N}, \qquad (8)$$

with $t_{\mathrm{sequ}}$ as the execution time of the best-known sequential algorithm and $t_N$ as the execution time of the parallel algorithm for N processors, is shown in Fig. 15 for four images, as a function of the size of the image (n).

FIG. 15. The parallel efficiency as a function of the image size for the images CENTRE, FULL, BOLTS, and CHROMOSOMES. For the 1024 × 1024 images no value is provided for the 2 × 1 processor configuration, because of memory limitations.

From the asymptotical complexity formulas of Section 8 we learn that for large image sizes the execution time of the global computations is negligible with respect to the execution time of the local computations (steps I and III). Therefore the ratio of the execution times of the latter two determines the asymptotical efficiency. For FULL the complexity of step I reduces to a linear term, which gives rise to an asymptotical parallel efficiency that tends to 100%. For the other images the asymptotical parallel efficiency apparently lies between 80% and 90%.

For smaller images the calculation of $\mathcal{V}^{\mathrm{marg}}_s$ (step I) becomes more important compared to the 4SSEDT part (step III) because of the linear term in (4). The time taken by the global computations (step II) also becomes relatively substantial. This execution time is, however, data dependent. If, for instance, the $\mathcal{V}^{\mathrm{marg}}_s$ generally contain few or no points, the global computations reduce to the communication latency. This is the case for CENTRE, where all $\mathcal{V}^{\mathrm{marg}}_s$ are empty except for the one subregion containing the center pixel. The other extreme is FULL, for which the $\mathcal{V}^{\mathrm{marg}}_s$ contain the maximal number of points, resulting in a maximal execution time for step II. This is one reason why the parallel efficiency for small images is generally higher for CENTRE than for FULL. A second reason is that the 4SSEDT algorithm is faster for FULL than for CENTRE, and a third reason is that for moderate image sizes step I is slowest for FULL, as Fig. 14 shows. It is therefore clear that, for small image sizes, the parallel efficiencies for FULL and CENTRE form extremes between which the parallel efficiency of other images such as BOLTS and CHROMOSOMES lies.

The parallel efficiency for the CHROMOSOMES image on 32 (8 × 4) and 64 (8 × 8) processors was not measured, but it is expected to be similar to the parallel efficiency obtained for the BOLTS image.

The parallel efficiency of the BOLTS and CHROMOSOMES images is also influenced by the load balance of the algorithm, which is analyzed in the next section.
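To make the timing model (7) and the efficiency (8) concrete, the Python sketch below recovers $a' n_s^2$ and $b'$ from the FULL and CENTRE rows of Table I and evaluates (8). It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and CENTRE is treated as having r = 1 exactly (it actually has one foreground pixel).

```python
from typing import Tuple

# Table I: 4SSEDT time in ms on one iPSC/2 processor, 256 x 256 pixels.
T_FULL_MS = 1289.0    # FULL: no background pixels, r = 0
T_CENTRE_MS = 1742.0  # CENTRE: one foreground pixel, r taken as 1 (assumption)


def fit_4ssedt_model(t_r0: float, t_r1: float) -> Tuple[float, float]:
    """Fit model (7), t = a' * n_s**2 * (1 + b' * r), from two runs.

    Returns (a' * n_s**2, b') for the image size at which both times were
    measured; r is assumed to be 0 for t_r0 and 1 for t_r1.
    """
    a_ns2 = t_r0            # r = 0 gives t = a' * n_s**2
    b = t_r1 / t_r0 - 1.0   # r = 1 gives t = a' * n_s**2 * (1 + b')
    return a_ns2, b


def predicted_time(a_ns2: float, b: float, r: float) -> float:
    """Evaluate model (7) for a background ratio r in [0, 1]."""
    return a_ns2 * (1.0 + b * r)


def parallel_efficiency(t_sequ: float, t_n: float, n_procs: int) -> float:
    """Eq. (8): epsilon = t_sequ / (N * t_N)."""
    return t_sequ / (n_procs * t_n)


if __name__ == "__main__":
    a_ns2, b = fit_4ssedt_model(T_FULL_MS, T_CENTRE_MS)
    print(f"a' n_s^2 = {a_ns2:.0f} ms, b' = {b:.3f}")  # a' n_s^2 = 1289 ms, b' = 0.351
    # A perfect speedup (t_N = t_sequ / N) gives an efficiency of exactly 1.
    print(parallel_efficiency(T_FULL_MS, T_FULL_MS / 16, 16))  # 1.0
```

With these numbers $b'$ comes out at about 0.351, in line with the value of approximately 0.35 quoted in Section 9.1; no measured parallel times are used in the sketch.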

9.4. Load Balance

Load imbalance occurs when a part of the algorithm takes more time on one processor than on the others and the processors have to wait for one another. A measure for the load imbalance of a part of the algorithm is

$$I = \frac{t_{\max} - t_{\mathrm{av}}}{t_{\mathrm{av}}}, \qquad (9)$$

with $t_{\max}$ and $t_{\mathrm{av}}$ as the maximal and average execution times of the part of the algorithm under investigation. We can distinguish two sources of load imbalance.

A first source of load imbalance is the data dependence of the 4SSEDT algorithm. From a statistical point of view it is to be expected that the load imbalance I grows with the number of subregions. However, we can find a hard upper limit for the possible load imbalance caused by this part of the algorithm. As we stated earlier, the execution time of the 4SSEDT algorithm is determined by the ratio of the number of background pixels to the total number of pixels. Assuming a subdivision of the image into equal-sized subregions, we obtain the worst-case load imbalance when one subregion consists entirely of background pixels and all other subregions consist of foreground pixels. From (7) we derive that in this case the load imbalance is at most b (≈ 0.35). The situation improves when the image contains more background pixels than one subregion can contain. If we know, e.g., that half of the pixels of the image are foreground, then the worst-case load imbalance is only 0.15. In general we derive

$$
I_{\mathrm{worst}} =
\begin{cases}
\dfrac{(N-1)\, r_{\mathrm{global}}}{(1/b) + r_{\mathrm{global}}}, & 0 \le r_{\mathrm{global}} < \dfrac{1}{N},\\[1ex]
\dfrac{1 - r_{\mathrm{global}}}{(1/b) + r_{\mathrm{global}}}, & \dfrac{1}{N} \le r_{\mathrm{global}} < 1,
\end{cases}
\qquad (10)
$$

with $r_{\mathrm{global}}$ as the ratio of the number of background pixels of the image to the total number of pixels, N as the number of processors, and b as the parameter $b'$ appearing in (7).

This function is shown in Fig. 16 for a number of processor configurations. Note that we have derived a similar upper limit for load imbalance for a parallel component-labeling algorithm [5].

FIG. 16. $I_{\mathrm{worst}}$ is shown for b = 0.35 for a number of processor configurations. The second parts of the curves, resulting from the second formula in (10), all coincide with the solid line.

A second source of load imbalance is the data dependence of calculating $\mathcal{V}^{\mathrm{marg}}_s$, as discussed in Section 9.2. This is practically unavoidable, because for most images at least one subregion contains a considerable number of background pixels, thus determining the execution time of this part of the algorithm.

10. CONCLUSION

The emergence of fast sequential implementations for the EDT brought us to its parallelization for MIMD multiprocessors. After partitioning the image into subregions, which are assigned to the processors, we followed the divide-and-conquer approach. In general this approach requires that the bulk of the operations be local to the subregions of the image and that global operations be restricted to a minimum. For the EDT global operations are needed, as the influence region of a foreground pixel may reach outside the subregion to which that pixel belongs. The resulting data exchange between subregions can best be done in its least redundant form using Voronoi diagram concepts. This data exchange can be performed in a recursive manner.

We have given a complete description of the algorithm, followed by a complexity analysis, which shows that the algorithm is scalable. Finally, timings on an iPSC/2 and a model for the load imbalance are given.

ACKNOWLEDGMENTS

This article presents research results of the Belgian Incentive Program “Information Technology,” Computer Science of the Future, initiated


by the Belgian State, Prime Minister’s Service, DWTC. The scientific responsibility is assumed by its authors.

We wish to thank the Mathematical Sciences Section, Division of Engineering Physics and Mathematics, Oak Ridge National Laboratory, for letting us use their iPSC/2 machine. Further, we thank the referee for his valuable comments.

REFERENCES

1. P.-E. Danielsson, Euclidean distance mapping, CVGIP: Graphic Models Image Process. 14, 1980, 227–248.
2. H. Embrechts and D. Roose, Parallel distance transformation algorithms, Part I: City block distance, Parallel Comput. 21, 1995, 1051–1076.
3. H. Embrechts and D. Roose, Parallel distance transformation algorithms, Part II: Chamfer 3-4 distance, Parallel Comput. 21, 1995, 1077–1096.
4. H. Embrechts and D. Roose, Parallel algorithms for the distance transformation, in Parallel Processing: CONPAR 92–VAPP V (L. Bougé, M. Cosnard, Y. Robert, and D. Trystram, Eds.), pp. 571–582, Springer-Verlag, Berlin/New York, 1992.
5. H. Embrechts, D. Roose, and P. Wambacq, Component labelling on an MIMD multiprocessor, CVGIP: Image Understanding 57(2), 1993, 155–165.
6. F. Leymarie and M. D. Levine, Fast raster scan distance propagation on the discrete rectangular lattice, CVGIP: Image Understanding 55(1), 1992, 84–94.
7. J. Mullikin, The vector distance transform in two and three dimensions, CVGIP 54(6), 1992, 526–535.
8. C. W. Niblack, P. B. Gibbons, and D. W. Capson, Generating skeletons and centerlines from the distance transform, CVGIP 54(5), 1992, 420–437.
9. D. W. Paglieroni, Distance transforms: Properties and machine vision applications, CVGIP 54(1), 1992, 56–74.
10. F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Texts and Monographs in Computer Science, Springer-Verlag, Berlin/New York, 1985.
11. I. Ragnemalm, The Euclidean Distance Transform, Vol. 304, Linköping Studies in Science and Technology, Dissertations, Department of Electrical Engineering, Linköping University, Sweden, 1993.
12. X. Wang and G. Bertrand, Some sequential algorithms for a generalized distance transformation based on Minkowski operations, IEEE Trans. Pattern Anal. Machine Intell. 14(11), 1992, 1114–1121.
