Specialized Hardware for Real-Time Navigation

Real-Time Imaging 7, 97–108 (2001), doi:10.1006/rtim.1999.0220, available online at http://www.idealibrary.com


In this paper we propose a specialized hardware architecture for the real-time visual navigation of a mobile robot. The adopted navigation method is based on a two-step approach. Features are extracted and matched over an image sequence captured by a video camera (mounted on a mobile robot) during its motion. As a result, a 2D motion field is recovered and used to extract ego-motion parameters. Our hardware implements the first step of the method, which consists of feature extraction and raw match computation by means of radiometric similarity. Real-time performance is possible since a 40 MHz processing rate is achieved.

© 2001 Academic Press

F. Marino¹, E. Stella²,*, A. Branca², N. Veneziani² and A. Distante²

¹Dipartimento di Ingegneria Elettrotecnica ed Elettronica, Politecnico di Bari, Via Re David 200, 70125 Bari, Italy

²Istituto Elaborazione Segnali ed Immagini - C.N.R., Via Amendola 166/5, 70126 Bari, Italy

Introduction

Navigation is the capability of a mobile robot to move safely in the environment, avoiding obstacles. In such a context, an important role is played by artificial vision. In fact, a significant amount of useful information on three-dimensional motion can be obtained from image sequences which result from the movement both of a video camera and of 3D objects. We refer to the ability of an autonomous agent to determine its motion with respect to the environment as passive navigation. Ego-motion parameters for passive navigation are efficiently recovered by means of a displacement vector field analysis. Such a vector field represents correspondences of two-dimensional features (extracted in successive images of a sequence) with 3D features in space. A small number of such displacement vectors on the image plane is sufficient to obtain useful information on ego-motion parameters. In the literature, two frameworks approach the 'finding correspondences' (matching) problem: direct and optimization methods. Both frameworks include a low-level step consisting of feature extraction from images. Then, direct methods use local constraints in order to find correspondences [1, 2, 3]. Conversely, optimization methods use global constraints to formulate an energy or cost function and detect correspondences by minimizing this function. Such minimization is generally achieved using iterative techniques. While the direct methods are fast but more sensitive to noise, the optimization-based techniques are more reliable but require burdensome processing.

*Corresponding author: E. Stella, e-mail [email protected].

In [4] we described a two-step algorithm for passive navigation. The effectiveness of this algorithm was extensively verified by means of software-based autonomous motion tests. This method computes the heading direction from a displacement vector field determined through a feature-based approach. It extracts features in an image and performs a two-step matching in successive images of the acquired sequence: the first step detects raw correspondences using a standard correlation-based technique; the second step refines the raw matches by minimizing an energy function. A performance analysis indicated the first step as the phase requiring the most burdensome processing. In this paper, we propose a specialized hardware which speeds up the first step of matching, making real-time passive navigation possible. This hardware has been designed keeping in mind three specific constraints: modularity of the hardware structures; ease of algorithm implementation; and a data-flow approach that avoids the need for complex software programs to exploit the computational power of the hardware. In fact, devices designed for high nominal performance often have many degrees of internal parallelism. On the other hand, a full and efficient exploitation of such devices requires expert software developers and presents difficulties that increase with the number of stages, of arithmetic units and of control lines.

As an example of this, before approaching the design of the hardware solution proposed in this paper, we verified the possibility of using commercially available hardware. We tested the performance of the MAXPCI architecture provided by Datacube Inc. This architecture consists of a set of hardware modules (convolver, histogrammer, warper, etc.), organized as a pipeline processor on the PCI bus, designed to reach a pixel processing speed of 50 MHz. Despite its high cost and the lack of a friendly programming environment, the MAXPCI efficiently performs image-based operations. Nevertheless, it is not designed for pixel-based operations (e.g., searching for a local maximum). As a result, our approach, implemented on the MAXPCI architecture, allowed real-time performance only when the vehicle moved at low speed (10 cm/s), while practical applications, to be attractive, often require higher speeds (e.g., 1 m/s). Therefore, setting aside sophisticated solutions, we tried to map the algorithm described in [4] onto a simple hardware architecture, based on the use of several Look Up Tables (LUTs).

Note that in our application, independently of the arithmetic unit (LUT-based or Full-Adder based), memory access is the throughput-limiting factor of the whole architecture, assuming it is pipelined. In fact, the data to be processed at any clock cycle are those stored in a frame memory by the video camera. Therefore, even if FA-based arithmetic units can in general be faster than the employed LUT-based units, they cannot speed up the pipelined computation in the examined case, which is bounded by the latency of these memories. In other words, replacing the proposed LUT-based units with faster units does not produce any improvement in terms of computing power, since the throughput of the pipe remains bounded by the latency of the grabbing frame memory (which is assumed not to be shorter than the latency of the LUTs).

As a consequence, we chose the LUT-based approach for the arithmetic units, since it allowed both a faster development of the whole architecture and an easier pipelining for high-speed processing [8]. In our architecture, operands are used to address the specific arithmetic LUT storing the related results, which can be obtained directly in a single access. The use of LUTs has already been exploited in DSP applications (e.g., [7]). We adopted the Residue Number System (RNS): LUTs implementing RNS-based arithmetic are smaller than LUTs based on conventional arithmetic [6, 7, 8]. A more compact FA-based device could be the object of further research.

Moreover, the goal of low complexity for the resulting hardware was also reached by means of a short dynamic range for the input data. By a software simulation carried out on a general-purpose computer, we verified that the algorithm described in [4] can also operate successfully on images with a quantization of 5 bits/pixel. Dropping a few bits from the usual image quantization of one byte/pixel brought a great benefit in terms of the amount of memory needed by the LUTs. After the minimum quantization level needed by the algorithm was derived by means of software simulation, the hardware design and its CAD implementation were performed with the aim of answering the following questions. Is it really possible to develop a complex algorithm in its entirety by using the above-mentioned approaches? What is the processing rate achievable by LUT-based structures, using current off-the-shelf technology? How large and expensive could this specialized hardware be when it is developed under the described constraints?

The paper is organized as follows. In the following section the whole heading estimation technique is described. Next, we present a high-level description of the hardware architecture. Implementation details and performance analysis are provided in the next section. Some experimental results are then presented.


Image Motion Estimation

The heading algorithm is based on the estimation of correspondences among features which are extracted from successive images. These correspondences are detected by imposing the cross-ratio invariance and are estimated only for features of "high" interest such as corners or edges. Such "high interest" features are selected using the "interest operator" introduced by Moravec, which isolates points having minimal auto-correlation values [2]. The method works in two stages:

1. It computes the SAD (Sum of Absolute Differences) between neighbouring pixels in four directions (vertical, horizontal and two diagonals). Such SADs are computed over a $(q \times q)$ window using the following equations:

horizontal (0°)

$$H(x, y) = \sum_{i=-(q-1)/2}^{-1+(q-1)/2} \lVert I(x+i, y) - I(x+i+1, y) \rVert \tag{1}$$

vertical (90°)

$$V(x, y) = \sum_{i=-(q-1)/2}^{-1+(q-1)/2} \lVert I(x, y+i) - I(x, y+i+1) \rVert \tag{2}$$

diagonal 1 (+45°)

$$D_1(x, y) = \sum_{i=-(q-1)/2}^{-1+(q-1)/2} \lVert I(x+i, y-i) - I(x+i+1, y-i-1) \rVert \tag{3}$$

diagonal 2 (−45°)

$$D_2(x, y) = \sum_{i=-(q-1)/2}^{-1+(q-1)/2} \lVert I(x+i, y+i) - I(x+i+1, y+i+1) \rVert \tag{4}$$

where $I(x, y)$ is the brightness function computed at the pixel $(x, y)$. The smallest value of $V(\cdot)$, $H(\cdot)$, $D_1(\cdot)$ and $D_2(\cdot)$ is called the interest operator value and is considered the variance for pixel $(x, y)$.

2. It chooses as high interest features the pixels wherethe interest operator values are local maxima.
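The two stages above can be sketched in a few lines of plain Python. This is only an illustrative software model under stated assumptions, not the hardware described later: the image is assumed to be a list of rows indexed as `img[y][x]`, `q` is assumed odd, and the helper names (`directional_sads`, `interest_value`, `strongest_feature`) are hypothetical. For simplicity the second stage returns the single largest interest value over the scanned area rather than every local maximum.

```python
def directional_sads(img, x, y, q):
    """Return (H, V, D1, D2) of Eqs. (1)-(4) for the q x q window centred on (x, y).

    img is a list of rows, indexed img[y][x]; q is assumed odd.
    """
    half = (q - 1) // 2
    h = v = d1 = d2 = 0
    for i in range(-half, half):  # i = -(q-1)/2, ..., -1 + (q-1)/2
        h += abs(img[y][x + i] - img[y][x + i + 1])
        v += abs(img[y + i][x] - img[y + i + 1][x])
        d1 += abs(img[y - i][x + i] - img[y - i - 1][x + i + 1])
        d2 += abs(img[y + i][x + i] - img[y + i + 1][x + i + 1])
    return h, v, d1, d2


def interest_value(img, x, y, q):
    """Interest operator value: the smallest of the four directional SADs."""
    return min(directional_sads(img, x, y, q))


def strongest_feature(img, q):
    """Return ((x, y), value) for the pixel with the largest interest value.

    Only pixels whose q x q window lies fully inside the image are scanned.
    """
    half = (q - 1) // 2
    height, width = len(img), len(img[0])
    best_xy, best_val = None, -1
    for y in range(half, height - half):
        for x in range(half, width - half):
            val = interest_value(img, x, y, q)
            if val > best_val:
                best_xy, best_val = (x, y), val
    return best_xy, best_val
```

Taking the minimum over the four directions means that a straight edge, which has no variation along its own direction, scores low, while a corner-like point, which varies in every direction, scores high.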

Once $N$ high variance features $p_i = (x_i, y_i)$ are extracted in the first image, the best possible candidate "matching features" (matches) $q_i = (x'_i, y'_i)$ are selected in the second image using a correlation-based measure (radiometric similarity): given $p_i = (x_i, y_i)$ in the first image, we select as $q_i$ the point $(x_j, y_j)$ in the second image which minimizes the SAD:

$$SAD(x_j, y_j) = \sum_{k=-w}^{w} \sum_{l=-w}^{w} \lVert I(x_i + k, y_i + l) - I'(x_j + k, y_j + l) \rVert \tag{5}$$

In Eqn (5), $w$ determines the size of the square window on which the SAD is evaluated, and $I$ and $I'$ represent the image brightness functions associated with the first and second image, respectively. The points having the smallest SAD are considered "raw matches".
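As a hedged illustration of Eqn (5), the following plain-Python sketch evaluates the SAD between a window around a feature in the first image and a window around each candidate in the second image, then keeps the candidate with the smallest SAD. The function names and the explicit `candidates` list are assumptions made for the example; images are lists of rows indexed as `img[y][x]`.

```python
def sad(img1, img2, p, c, w):
    """SAD of Eq. (5) between the window around p in img1 and around c in img2."""
    xi, yi = p
    xj, yj = c
    return sum(
        abs(img1[yi + l][xi + k] - img2[yj + l][xj + k])
        for k in range(-w, w + 1)
        for l in range(-w, w + 1)
    )


def raw_match(img1, img2, p, w, candidates):
    """Return the candidate (x_j, y_j) with the smallest SAD for feature p.

    Every candidate is assumed to lie at least w pixels from the image border.
    """
    return min(candidates, key=lambda c: sad(img1, img2, p, c, w))
```

When several candidates share the same SAD, `min` keeps the first one in the list, which is one arbitrary way of resolving the ties mentioned in the text.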

In practice, since we compute correspondences among regions, false matches are unavoidable: the correct match point may not coincide with the center of the most highly correlated window. Also, several candidate matches can give the same SAD value. Therefore, we use the raw matches computed by means of correlation only as an initial guess to be refined by an optimization approach based on the cross-ratio invariance.

The cross-ratio is the most popular geometric invariant of four collinear points or five coplanar points, since it is the simplest numerical property of an object that is unchanged under perspective projection.

Five coplanar points $Q = (p_1, p_2, p_3, p_4, p_5)$ have the familiar cross ratio $CR(Q)$ as their projective invariant:

$$CR(Q) = \frac{\sin(\alpha_{13}) \cdot \sin(\alpha_{24})}{\sin(\alpha_{23}) \cdot \sin(\alpha_{14})} \tag{6}$$

where $\sin(\alpha_{ij})$ is the sine of the angle $p_i p_5 p_j$.
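The invariance of Eqn (6) can be checked numerically with a short sketch. It assumes signed sines, computed from 2D determinants of the vectors joining $p_5$ to the other points; the vector lengths cancel between numerator and denominator, so only the determinants are needed. The function names and the sample homography are illustrative assumptions.

```python
def det2(p, q, o):
    """Determinant of the vectors o->p and o->q (proportional to the signed sine)."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def cross_ratio(p1, p2, p3, p4, p5):
    """Cross ratio of Eq. (6), with p5 as the common vertex of all angles.

    sin(a_ij) = det2(p_i, p_j, p5) / (|p_i - p5| * |p_j - p5|); the norms cancel.
    """
    num = det2(p1, p3, p5) * det2(p2, p4, p5)
    den = det2(p2, p3, p5) * det2(p1, p4, p5)
    return num / den


def project(h, p):
    """Apply a 3x3 projective transformation h (list of rows) to a 2D point p."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


points = [(0, 0), (4, 1), (1, 3), (5, 5), (2, -2)]
homography = [[1.1, 0.2, 3.0], [-0.1, 0.9, -2.0], [0.001, 0.002, 1.0]]
projected = [project(homography, p) for p in points]
# The two values agree up to floating-point rounding.
print(cross_ratio(*points), cross_ratio(*projected))
```

The sketch requires that no three of the five points are collinear with $p_5$, otherwise a determinant in the denominator vanishes.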

We use the geometric invariance of the cross-ratio of coplanar points both to verify the goodness of the matches estimated by correlation similarity and to correct all mismatches. The idea is based on the assumption that planar surfaces are frequent in indoor environments (e.g., walls, tables). Moreover, this assumption is not a limitation. Since this constraint can be satisfied only if the considered features are coplanar, we use a global optimization process that both tries to satisfy the cross-ratio similarity and takes into account the radiometric similarity of the features. For each subset of five coplanar points $P_{ijklm} = \{p_i, p_j, p_k, p_l, p_m\}$ in the first image, the corresponding points in the second image $Q_{ijklm} = \{q_i, q_j, q_k, q_l, q_m\}$ should have the same cross ratio. The method benefits from considering many intersecting subsets of five points obtained as combinations of the available sparse features. If the features are on different planes, the minimization approach constrains only coplanar features to influence one another, producing subsets of non-coplanar features. Information is cooperatively propagated among subsets and, due to the imposed radiometric similarity, matches among non-coplanar features are avoided.

In addition, we propose to solve the correspondence problem by minimizing the sum of all differences between the cross ratio computed for each subset of five features in the first image and the cross ratio computed for the corresponding points in the second image. The derived energy function which has to be minimized can be formalized as:

$$E = \sum_{i=1}^{N_{Feat}} \sum_{j=i+1}^{N_{Feat}} \sum_{k=j+1}^{N_{Feat}} \sum_{l=k+1}^{N_{Feat}} \sum_{m=l+1}^{N_{Feat}} \left\lVert CR(P_{ijklm}) - CR(Q_{ijklm}) \right\rVert^2 \, \frac{1}{D(P_{ijklm})} \, (1 - R_i), \tag{7}$$

where

• $CR(P_{ijklm})$ and $CR(Q_{ijklm})$ denote the cross-ratio functions estimated respectively in the subset of five coplanar points $P_{ijklm}$ in the first image, and in the subset of corresponding points $Q_{ijklm}$ in the second image.

• $D(P_{ijklm})$ denotes the Euclidean distance among the points $\{p_i, p_j, p_k, p_l, p_m\}$ in the first image. This factor is introduced because nearby features have a higher probability of being coplanar.

• The term $R_i$ imposes that corresponding features in the first and in the second image must have a radiometric similarity.

The proposed approach converges to the desired correspondence points of the subset $Q_{ijklm}$ by implementing gradient descent along the $E(Q_{ijklm})$ surface, which expresses the dependency of the quadratic cost function on all of the points of $Q_{ijklm}$. The input data are correspondences estimated for sparse features (characterized by a high directional variance) extracted by Moravec's interest operator [2] in the image acquired at time t. Optimal correspondences are estimated by iteratively updating the raw corresponding features in the second image. To reduce the processing time, the expensive steps, consisting of high variance feature extraction and raw match computation, are performed directly by LUT-based hardware.

Figure 1. Functional scheme of the proposed architecture (w = 6, q = 7).
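A rough software sketch of the energy of Eqn (7) is given below, only to make the structure of the five nested sums concrete. Several points are assumptions, since the text does not define them precisely: $D(P_{ijklm})$ is taken as the sum of pairwise Euclidean distances among the five points, $R_i$ is taken as a precomputed radiometric similarity in [0, 1] supplied by the caller, and the cross ratio uses signed determinants with the last point of each subset as vertex. The function names are hypothetical.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def det2(p, q, o):
    """Determinant of the vectors o->p and o->q."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def cross_ratio(p1, p2, p3, p4, p5):
    """Cross ratio of five coplanar points with p5 as the common vertex."""
    return (det2(p1, p3, p5) * det2(p2, p4, p5)) / (
        det2(p2, p3, p5) * det2(p1, p4, p5))


def spread(points):
    """Assumed form of D(P): sum of pairwise Euclidean distances."""
    return sum(dist(a, b) for a, b in combinations(points, 2))


def energy(P, Q, R):
    """Energy of Eq. (7) over all index subsets i < j < k < l < m.

    P, Q: lists of (x, y) features in the first and second image.
    R:    assumed radiometric similarity of each match, in [0, 1].
    """
    total = 0.0
    for idx in combinations(range(len(P)), 5):
        p = [P[t] for t in idx]
        q = [Q[t] for t in idx]
        diff = cross_ratio(*p) - cross_ratio(*q)
        total += diff ** 2 * (1.0 / spread(p)) * (1.0 - R[idx[0]])
    return total
```

An iterative refinement would then move the points of `Q` in the direction that lowers `energy`; that step is not sketched here because the text gives no details of the update rule beyond gradient descent.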


Hardware Architecture High-Level Description

A logical scheme of the architecture [5] is shown in Figure 1.

At the ith step of a sequence, an image of (DIM × DIM) pixels is acquired, and:

• N features are extracted in order to be matched over the image which has to be acquired at the (i+1)th step;

• N corresponding points (one for each feature which was extracted over the image acquired at the (i−1)th step) are detected.

The Computing Blocks performing the above operations are the "Interest Block" and the "Correspondence Block" respectively. These blocks can operate in parallel, since feature extraction and match computation are independent tasks.

The pixels of the image, provided by an external frame grabber, flow into a pair of interleaved frame memories through a pipe of S shift registers (S = max(q, w), where q and w are the sizes of the square windows used for computing, respectively, the directional variances and the SAD). S−1 of these shift registers have size DIM, whilst the last one has size S.

When frame memory 0 is full (filled by the ith image), frame memory 1 is enabled for writing in order to store the (i+1)th image, and so on. Both images need to be stored: while the extraction of features in the current image can be performed as the data flow through the shift registers, the match computation also needs the previously acquired image.

The Interest Block

The Interest Block computes the directional variances among neighboring pixels in four directions in a q × q window (Eqns (1)–(4)), selects the smallest value as the "interest value" and associates it with the central pixel.

Figure 2. Interest Block. It is mainly composed of: (a) a window (on the left side) coming from the rows of the shift registers; (b) four Blocks (V, H, D1 and D2) computing respectively the vertical, horizontal and the two diagonal variances; (c) a tree comparing the variances in order to select the smallest value (the "<Block" is shown in detail in Figure 3); (d) a terminal block (shown in detail in Figure 4) which compares the Interest value with the current maximum in order to detect the biggest one, and to store it in the register MAX (at the same time it updates the register CKMAX with the new address).
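The pipe of shift registers described above can be modelled in software as a single buffer of (S−1)·DIM + S pixels: once it is full, an S × S window is available at every clock cycle without any further memory access. The sketch below is a behavioural model only; the generator name and the use of `collections.deque` are choices made for this example.

```python
from collections import deque


def windows_from_stream(pixels, dim, s):
    """Yield (v, window) for a raster-order pixel stream of row length dim.

    The buffer holds (s - 1) rows of dim pixels plus s pixels, like the pipe of
    s shift registers in the text. v is the index of the newest pixel, which is
    the bottom-right corner of the s x s window.
    """
    length = (s - 1) * dim + s
    pipe = deque(maxlen=length)  # the oldest pixel drops out as a new one enters
    for v, pix in enumerate(pixels):
        pipe.append(pix)
        if len(pipe) < length:
            continue  # pipe not yet filled
        if v % dim < s - 1:
            continue  # window would wrap around the end of an image row
        yield v, [[pipe[r * dim + c] for c in range(s)] for r in range(s)]


# A 4 x 4 image whose pixel values equal their raster index, 2 x 2 windows.
for v, window in windows_from_stream(range(16), dim=4, s=2):
    print(v, window)
```

Each new pixel shifts the whole buffer by one position, which is why the architecture can produce one window per clock cycle.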

Table 1. Quantity, type, function and location of the LUTs required by the Interest Block

| # | Type | Function, location |
|---|---|---|
| 6×4 | 1K-word, 5-bits/word | absolute difference (AD) in Eqns (1)–(4), first level |
| 6×4 | 32-words, 4-bits/word | binary to RNS conversion (m2), second level |
| 5×4 | 1K-word, 5-bits/word | sum in Eqns (1)–(4), other levels (m1) |
| 5×4 | 256-words, 4-bits/word | sum in Eqns (1)–(4), other levels (m2) |
| 3+1 | 1K-word, 5-bits/word | subtraction (m1), tree of comparison |
| 3+1 | 256-words, 4-bits/word | subtraction (m2), tree of comparison |
| 3+1 | 512-words, 1-bit/word | RNS to sign conversion (comparison) |

Figure 3. The "<Block" used to compare the variances in order to detect the minimum one (the Interest value).


Because the main purpose of the Interest Block is to estimate a set of N features to be matched over the next image, the acquired image is partitioned into N regions (each one having DIM×DIM/N pixels). For each region, a local maximum among all the interest values is evaluated, and the address of the related pixel is stored in the register $V_i$ (where i = 0, 1 depends on the current frame memory) in order to recover the feature and to match it over the next image (Figure 2).

When the kth (k = 0, …, N−1) region has to be processed, each estimated interest value is compared with the current maximum (which is stored in the MAX register). The interest value updates the current maximum if it is higher. At the same time, the value of a ShiftCounter (i.e., v) is written into the register CKMAX. The value v is directly related to the coordinates (x, y) of the new maximum since:

$$x = (v)\,\mathrm{div}\,(DIM) \tag{8}$$

$$y = (v)\,\mathrm{mod}\,(DIM) \tag{9}$$

where (a)div(b) and (a)mod(b) denote respectively the integer quotient and the remainder of a/b.

In practical cases, DIM = 2^i, and equations (8) and (9) can be trivially solved by considering the binary word encoding v as the concatenation of x (the i most significant bits) and y (the i least significant bits). Moreover, in these cases, v represents the address of the current maximum in the frame memory.
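The equivalence between Eqns (8)–(9) and a plain bit split when DIM is a power of two can be illustrated as follows. The function names are hypothetical, and DIM = 512 (i = 9) is used only as an example value.

```python
def decode_div_mod(v, dim):
    """Coordinates from the counter value v, as in Eqs. (8) and (9)."""
    return v // dim, v % dim


def decode_bits(v, i):
    """Same result for dim = 2**i: x is the high i bits of v, y the low i bits."""
    return v >> i, v & ((1 << i) - 1)


v = 3 * 512 + 17
print(decode_div_mod(v, 512), decode_bits(v, 9))  # both give (3, 17)
```

In hardware the bit split costs nothing: the two coordinate fields are simply two groups of wires of the same counter.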

Finally, when the kth region has been completely processed, the value stored in the register CKMAX is fed into the kth cell of the register $V_i$ (i = 0, 1 depending on the current frame memory), and the register MAX is reset in order to correctly begin the analysis of the (k+1)th region.
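The register-level behaviour described above (MAX, CKMAX and the per-region cells of $V_i$) can be mimicked by a short streaming loop. This is a behavioural sketch under assumptions: regions are taken as consecutive runs of DIM×DIM/N pixels in raster order, interest values are non-negative, and MAX is reset to −1 so that every region yields an address. The function name is hypothetical.

```python
def extract_feature_addresses(interest_stream, dim, n_regions):
    """Return the list V of counter values v, one per region.

    interest_stream yields one interest value per clock cycle, in raster order.
    """
    region_size = dim * dim // n_regions
    v_register = []
    max_reg, ckmax = -1, None  # MAX and CKMAX registers
    for v, value in enumerate(interest_stream):
        if value > max_reg:  # update only when strictly higher
            max_reg, ckmax = value, v
        if (v + 1) % region_size == 0:  # region completely processed
            v_register.append(ckmax)
            max_reg, ckmax = -1, None  # reset before the next region
    return v_register


stream = [0, 1, 5, 2, 0, 0, 0, 0,   # region 0: maximum 5 at v = 2
          3, 3, 9, 1, 0, 0, 0, 7]   # region 1: maximum 9 at v = 10
print(extract_feature_addresses(stream, dim=4, n_regions=2))
```

Because the comparison is strict, the first occurrence of the maximum within a region is the one retained.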

The interaction described above between the computing blocks and the frame memories may seem complex, but it is actually simple: when frame memory 1 is storing the (i+1)th image, the Interest Block is evaluating the N features on the (i+1)th image and storing their addresses in the register V1. Simultaneously, the Correspondence Block is evaluating, over the (i+1)th image, the candidate matches of the features which were extracted in the ith image and whose addresses are stored in the register V0 (for this reason, the ith image must be held in frame memory 0).

The Correspondence Block

The Correspondence Block (Figure 5) computes the SAD (according to Eqn (5)) between the features $(x_i, y_i)$ extracted from the previous image and the data $(x_j, y_j)$ flowing in the pipe of shift registers. At each clock cycle, the pixels of the currently acquired image are shifted through the pipe, and a new SAD is computed. The current SAD is compared with the current minimum (stored in a MIN register) and, if it is smaller, it is written into the MIN register. At the same time, the value v of the ShiftCounter is saved in the CKMIN register. In this way, when the image has been completely processed, the CKMIN register holds the value v which directly gives the coordinates (see Eqns (8) and (9)) of the point having the best match (the minimum error). This value represents the output of each Correspondence Block.

Figure 4. The Block comparing the estimated SAD with the current minimum SAD.

Though the architecture could have N Correspondence Blocks, all of them working in parallel to match the features at the same time, only one Correspondence Block is really necessary. In fact, we can reasonably assume that two consecutive images are grabbed after a small motion of the camera. Therefore, if we consider images subdivided into regions of size ($N_K$ × DIM), with $N_K \ll$ DIM, the probability that the "high interest" feature of the kth region of the first image has its match in the kth region of the second image is high. As a result, a single Correspondence Block can be reused in turn to find the match for each of the N features. Note that, if the feature is located close to the boundary of the region, the above assumption might not hold; nevertheless, the loss of some features is not a problem for the heading estimation.

Figure 5. The Correspondence Block. It is mainly composed of: (a) two windows (w = 6) taken respectively over the shift registers and over the frame memory storing the previously acquired image; (b) a tree computing the SAD between these windows; (c) a terminal block comparing the current SAD with the current minimum in order to detect the smallest one, and to store it in the register MIN (at the same time it updates the register CKMIN with the new address).

Table 2. Quantity, type, function and location of the LUTs required by a Correspondence Block

| # | Type | Function, location |
|---|---|---|
| 32 | 1K-word, 5-bits/word | pixel-to-pixel AD, Eqn (5), first level |
| 32 | 32-words, 7-bits/word | binary to RNS conversion (m2 and m3), second level |
| 62 | 1K-word, 5-bits/word | sum, other levels (m1 and m2), Eqn (5) |
| 31 | 16-words, 2-bits/word | sum, other levels (m3), Eqn (5) |
| 2 | 1K-word, 5-bits/word | subtraction (m1 and m2), comparison |
| 1 | 16-words, 2-bits/word | subtraction (m3), comparison |
| 1 | 4K-words, 1-bit/word | RNS to sign conversion (comparison) |

Implementation Details and Performance Analysis

In order to reduce the dimensions of the LUTs mapping the arithmetic tables, we looked for the lowest dynamic range for the pixels of the acquired images which can be adopted without a decrease in accuracy. The algorithm in [4] was tested using different quantizations for the input images. These experiments showed that a 5-bit quantization of the image data does not limit the performance in terms of accuracy, while also producing great benefits in hardware design as well as computing time. In addition, the Residue Number System (RNS, Appendix A) has been chosen to reduce the memory requirement and to achieve a suitable speed-up. Table 1 shows the amount of memory required to implement the Interest Block.

This Block first computes (q−1) absolute differences (ADs) among adjacent pixels according to Eqns (1)–(4), along each of the four directions. In order to perform these operations by means of LUTs, we employ memories as specified in row one of Table 1. Because of the absolute value, the input as well as the output dynamic range is [0, 31]. Therefore, the two 5-bit inputs play the role of a 10-bit address in a LUT storing 5-bit words.

The resulting ADs are then added according to the outer sum in Eqns (1)–(4) in order to obtain V(.), H(.), D1(.) and D2(.). Therefore, the dynamic range [0, 31] expands to [0, 186]. Finally, these values have to be subtracted from each other to detect the minimum value in the tree of comparison. In this step, the dynamic range becomes [−186, 186]. This range would require 8.53 bits. Therefore, we introduce a conversion from binary to a (5+4)-bit RNS (see Appendix A) S′ = {m1 = 32 (5-bit modulus), m2 = 15 (4-bit modulus)} of the ADs computed in the first level.
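The sizing argument for the Interest Block can be checked with a few lines of arithmetic. The sketch assumes q = 7 (so q−1 = 6 ADs per direction) and 5-bit pixels, as stated in the text; the function names are hypothetical. A set of moduli is adequate when the moduli are pairwise coprime and their product covers every value of the signed range.

```python
from math import gcd, log2, prod  # prod requires Python 3.8+


def signed_range_bits(n_terms, max_ad=31):
    """Largest sum of n_terms ADs and the bits needed for the range [-s, s]."""
    s_max = n_terms * max_ad
    return s_max, log2(2 * s_max + 1)


def rns_covers(moduli, s_max):
    """True if the moduli are pairwise coprime and represent all of [-s_max, s_max]."""
    coprime = all(gcd(a, b) == 1
                  for i, a in enumerate(moduli) for b in moduli[i + 1:])
    return coprime and prod(moduli) >= 2 * s_max + 1


s_max, bits = signed_range_bits(6)   # q - 1 = 6 absolute differences
print(s_max, round(bits, 2))         # 186 and roughly 8.5 bits
print(rns_covers((32, 15), s_max))   # 32 * 15 = 480 values for 373 needed
```

With binary arithmetic the same range would need 9-bit operands, that is an 18-bit address for a two-operand LUT, whereas the two residue channels need only 10-bit and 8-bit addresses.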

The conversion of a binary 5-bit datum x to S′ is implemented by a LUT (row two of Table 1), which maps $x \to |x|_{m_2}$ only; the mapping $x \to |x|_{m_1}$ is not necessary because $|x|_{32} = x$.

The outer sums required by Eqns (1)–(4) are performed in S′ by the LUTs described in rows three and four of Table 1, concerning m1 and m2 respectively.

The interest value is detected by three comparators selecting the minimum value among V(.), H(.), D1(.) and D2(.). This interest value is then compared with the current maximum as described previously. Therefore, 3+1 comparators are needed. They compute the RNS difference between the pair of data to be compared (rows five and six of Table 1). The sign of the result is then derived by a LUT having (5+4)-bit addresses (row seven of Table 1) and used to select the output of the comparator, as shown in Figure 3. Note that the full RNS/binary conversion is not needed.

Table 3. Data sheets of the LUTs required by the Interest Block. They have been compiled as ROM in 0.7 µm CMOS Standard Cells-based technology (ES2 Library)

| # | Type | tacc [ns] | tcyc [ns] | size [mm]×[mm] | area [mm²] | #×area [mm²] |
|---|---|---|---|---|---|---|
| 48 | 1K-word, 5-bit/word | 12.40 | 24.75 | 0.77×0.58 | 0.443 | 21.264 |
| 24 | 32-word, 4-bit/word | 12.09 | 22.07 | 0.47×0.37 | 0.174 | 4.176 |
| 24 | 256-word, 4-bit/word | 12.86 | 23.01 | 0.54×0.46 | 0.250 | 6.000 |
| 4 | 512-word, 1-bit/word | 12.42 | 22.52 | 0.54×0.40 | 0.218 | 0.872 |

TOTAL AREA [mm²]: 32.312. Working freq. [MHz]: 40.4

Similarly, Table 2 shows the amount of memory required to implement the Correspondence Block. This Block computes 32 ADs among corresponding pixels belonging to the windows shown in Figure 5, according to Eqn (5) (first level). The required memories are specified in row one of Table 2. Because of the absolute value, the input as well as the output dynamic range is [0, 31]. Therefore, the two 5-bit inputs play the role of a 10-bit address in a LUT storing 5-bit words.

The computed ADs are then added according to the outer sum in Eqn (5) in order to obtain the SAD. Therefore, the dynamic range [0, 31] expands to [0, 992]. The SAD has to be subtracted from the current minimum SAD in order to detect the "raw match". In this step, the dynamic range becomes [−992, 992], which would require 10.95 bits. Therefore, we introduce a conversion from binary to a (5+5+2)-bit RNS S″ = {m1 = 32 (5-bit modulus), m2 = 31 (5-bit modulus), m3 = 3 (2-bit modulus)} of the ADs computed in the first level.

The conversion of a binary 5-bit datum x to S″ is implemented by a LUT (row two of Table 2), which maps $x \to (|x|_{m_2}, |x|_{m_3})$, since $|x|_{32} = x$.

Table 4. Data sheets of the LUTs required by the Correspondence Block. They have been compiled as ROM in 0.7 µm CMOS Standard Cells-based technology (ES2 Library)

| # | Type | tacc [ns] | tcyc [ns] | size [mm]×[mm] | area [mm²] | #×area [mm²] |
|---|---|---|---|---|---|---|
| 96 | 1K-word, 5-bit/word | 12.40 | 24.75 | 0.77×0.58 | 0.443 | 42.528 |
| 32 | 32-word, 7-bit/word | 12.17 | 22.32 | 0.52×0.37 | 0.196 | 6.272 |
| 32 | 16-word, 2-bit/word | 11.93 | 21.79 | 0.43×0.36 | 0.154 | 4.928 |
| 1 | 4K-word, 1-bit/word | 13.95 | 24.40 | 0.69×0.58 | 0.399 | 0.399 |

TOTAL AREA [mm²]: 54.127. Working freq. [MHz]: 40.4

Figure 6. The images were acquired while the vehicle was translating on a rectilinear path. The distance between the two frames is 500 mm. Estimated FOE: x = −10, y = 15.

Figure 7. The images were acquired while the vehicle was moving on a curvilinear path. The distance between the two frames is 500 mm; the rotation angle is 1 degree. Estimated FOE: x = 126, y = −1.

The outer sums required by Eqn (5) are performed in S″ by the LUTs described in rows three (for m1 and m2) and four (for m3) of Table 2. The smallest SAD is detected by comparing the SAD with the current minimum as described in a previous section. The required comparator computes the RNS difference (rows five and six of Table 2). The sign of the result is then derived by a LUT having (5+5+2)-bit addresses (row seven of Table 2) and used to select the output of the comparator. Note that in this Block too, the full RNS/binary conversion is not needed.
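The comparison scheme used in both Blocks (a residue-wise subtraction followed by a sign look-up, with no full RNS-to-binary conversion) can be modelled as below. This is a software sketch under assumptions: the sign table is a Python dictionary keyed by the tuple of residues, standing in for a ROM addressed by the concatenated residue bits, and the function names are hypothetical.

```python
from math import prod  # Python 3.8+


def to_rns(x, moduli):
    """Residues of the integer x with respect to each modulus."""
    return tuple(x % m for m in moduli)


def rns_sub(a, b, moduli):
    """Residue-wise subtraction; each channel is independent of the others."""
    return tuple((x - y) % m for x, y, m in zip(a, b, moduli))


def build_sign_lut(moduli):
    """Map every residue tuple to 1 if it encodes a negative value, else 0.

    Values are interpreted in the signed range [-M/2, M/2), with M the
    product of the moduli.
    """
    m_total = prod(moduli)
    return {to_rns(x, moduli): int(x < 0)
            for x in range(-(m_total // 2), m_total - m_total // 2)}


def rns_min(a, b, moduli, sign_lut):
    """Select the smaller of two RNS operands using only the sign of a - b."""
    return a if sign_lut[rns_sub(a, b, moduli)] else b


moduli = (32, 15)                 # the system S' used by the Interest Block
lut = build_sign_lut(moduli)      # 480 entries, fits a 512-word ROM
smallest = rns_min(to_rns(120, moduli), to_rns(87, moduli), moduli, lut)
print(smallest == to_rns(87, moduli))
```

The same functions work for the system S″ of the Correspondence Block by passing `(32, 31, 3)`, in which case the table has 2976 entries and fits the 4K-word ROM of Table 2.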

The LUTs summarized in Table 1 and Table 2 have been implemented by means of Read Only Memories (ROMs) in 0.7 µm CMOS Standard Cells-based technology (ES2 Library). For convenience, their data sheets are provided in Table 3 and in Table 4.

Figure 8. (a) Starting image of the sequence (extracted features are denoted by a white "+"); (b) second image of the sequence: matching features computed by means of correlation; (c) second image of the sequence: white "+" denote features which have been corrected after the optimization; (d) optical flow estimated after matching with correlation; (e) matches selected from the flow in (d) by means of the global minimization correction.

Since the whole architecture is pipelined, it can work at the working frequency of the slowest LUT. Therefore, it is able to process sequences of images at a rate of 40 Mpixels/s. Such performance is fully satisfactory, since a 512×512 TV frame at 50 Hz has a rate of 13 Mpixels/s. The described hardware has not yet been fabricated, but the whole architecture has been simulated in software, and all the assumptions have been verified. The hardware design has been performed using the CADENCE environment and all performances have been tested using CADENCE internal tools. The chip realization is at the masterization phase.
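The two figures quoted above follow from simple arithmetic, reproduced here as a check. The cycle times are the tcyc values of Tables 3 and 4; the function names are hypothetical.

```python
def max_clock_mhz(cycle_times_ns):
    """Clock rate allowed by the slowest stage of a pipeline, in MHz."""
    return 1e3 / max(cycle_times_ns)


def pixel_rate_mpixels(width, height, frames_per_second):
    """Pixel rate of a video stream in Mpixels/s."""
    return width * height * frames_per_second / 1e6


t_cyc = [24.75, 22.07, 23.01, 22.52, 22.32, 21.79, 24.40]  # ns, Tables 3 and 4
print(round(max_clock_mhz(t_cyc), 1))      # limited by the 24.75 ns ROM
print(pixel_rate_mpixels(512, 512, 50))    # required rate for 512 x 512 at 50 Hz
```

The pipeline therefore has roughly a threefold margin over the pixel rate of the video source.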

Experimental Results

Tests have been performed on image sequences acquired in our laboratory by a TV camera mounted on a pan-tilt head which is installed on our vehicle SAURO. The focal length of the TV camera was 6 mm.

The performance of the heading estimation algorithm is shown in Figures 6 and 7. Features extracted in the first image were successfully matched in the second one. Match results are shown in terms of the displacement vector field. The small black square in the figures represents the Focus of Expansion (FOE) position. The images in Figure 6 were acquired while the vehicle was performing a forward translational motion. The images in Figure 7 were generated by a curvilinear motion (forward translation combined with a rotation). Figures 8(a), 8(b) and 8(c) show three images of an outdoor sequence. In this case too, all the features were successfully matched. The displacement fields shown in Figures 8(d) and 8(e) are, respectively, the result of the raw matching step (performed by the proposed algorithm) and the result of the global minimization correction. The above examples have been shown in order to give an idea of the type of images that can be considered in autonomous navigation.

Conclusions

A VLSI architecture able both to select a set of features from an image and to match them over an image sequence has been described. Both the extraction and matching steps are performed independently on each acquired frame.

The proposed architecture is able to process sequences of images at a rate of about 40 Mpixels/s. This computing power has been reached essentially because of the use of Look Up Tables, whose sizes have been kept small by adopting the Residue Number System. In fact, all the Look Up Tables needed by both the Interest Block (performing the extraction of the features) and the Correspondence Block (detecting the matches for the extracted features) can be integrated in a medium-size chip implemented in 0.7 µm CMOS Standard Cells-based technology (ES2 Library). The study and design of the described hardware is motivated by the need for real-time image processing in the passive navigation tasks of our mobile robot SAURO. As soon as the hardware becomes available, it will be tested on the SAURO architecture.

References

1. Ayache, N. (1991) Artificial Vision for Mobile Robots. MIT Press, Boston, U.S.A.

2. Moravec, H.P. (1983) The Stanford Cart and the CMU Rover. Proc. IEEE 71(7): 872–882.

3. Marr, D. & Poggio, T. (1979) A computational theory of human stereo vision. Proc. of Royal Society of London B 204: 301–328.

4. Branca, A., Stella, E. & Distante, A. (1996) Passive Navigation using Focus of Expansion. Proceedings of Workshop on Applications in Computer Vision, Florida, U.S.A., pp. 64–69.

5. Marino, F., Stella, E., Veneziani, N. & Distante, A. (1997) Real Time Hardware Architecture for Visual Robot Navigation. Lecture Notes in Computer Science, Proceedings of ICIAP'97 1311: 93–100. Springer, Berlin.

6. Shenoy, P. & Kumaresan, R. (1988) Residue to Binary Conversion for RNS Arithmetic Using Only Modular Look-up Tables. IEEE Trans. Circuits and Systems CAS-35(9): 1158–1162.

7. Bayoumi, M.A., Jullien, G.A. & Miller, W.C. (1987) A Look-Up Table VLSI Design Methodology for RNS Structures Used in DSP Applications. IEEE Trans. Circuits and Systems CAS-34(6): 604–615.

8. Jullien, G.A. (1978) Residue Number Scaling and Other Operations Using ROM Arrays. IEEE Trans. on Computers C-27(4): 325–337.

Appendix: The Residue Number System (RNS)

An RNS is defined by n pairwise relatively prime moduli $m_1, m_2, \ldots, m_{n-1}, m_n$.

In this system, any integer X in the range $[-M/2, M/2]$, where

$$M = \prod_{i=1}^{n} m_i \tag{10}$$

is expressed as:

$$X \equiv (x_1, x_2, \ldots, x_i, \ldots, x_{n-1}, x_n), \tag{11}$$

where

$$x_i = |X|_{m_i} \tag{12}$$

and the weighted representation corresponding to X can be obtained by means of the Chinese Remainder Theorem as:

$$X = \left| \sum_{i=1}^{n} M_i \left| \frac{x_i}{M_i} \right|_{m_i} \right|_M \tag{13}$$

where $M_i = M/m_i$, and $\left| \frac{1}{M_i} \right|_{m_i}$ indicates the multiplicative inverse of $M_i$ modulo $m_i$. In RNS, multiplication, addition and subtraction are performed independently on each modulus. Therefore, if X and Y are two numbers represented in RNS, their sum, difference or product Z is given by:

$$Z = (X)\,OP\,(Y) \in [-M/2, M/2], \qquad Z \equiv (z_1, z_2, \ldots, z_{n-1}, z_n) \tag{14}$$

where $(a)\,OP\,(b)$ denotes $a \cdot b$, $a - b$ or $a + b$. In RNS each $z_i$ may be computed as

$$z_i = |(x_i)\,OP\,(y_i)|_{m_i} \tag{15}$$

Since operations on different moduli are independent in RNS, carry propagation is strongly reduced. As a consequence, small LUTs can implement complete arithmetic tables. For instance, in the system:

$$S^{*} \equiv \{m_1 = 32, \; m_2 = 31, \; m_3 = 29\}, \tag{16}$$

we have $M^{*} = 28768 = 2^{14.81}$, which means that three independent and parallel operations, each one needing 5 bits, can replace one operation needing almost 15 bits. Moreover, for small values of $m_i$, RNS-based processors can be implemented by means of simple and modular Look Up Tables [6].
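A compact software illustration of Eqns (10)–(15) for the example moduli 32, 31 and 29 is given below. It is a sketch rather than a model of the LUT hardware: the function names are hypothetical, and the modular inverse is obtained with Python's three-argument `pow`, which accepts a negative exponent from Python 3.8 onwards.

```python
from math import prod  # Python 3.8+
from operator import add, mul, sub

MODULI = (32, 31, 29)  # pairwise coprime, product M = 28768


def to_rns(x, moduli=MODULI):
    """Eq. (12): residues of x with respect to each modulus."""
    return tuple(x % m for m in moduli)


def rns_op(a, b, op, moduli=MODULI):
    """Eq. (15): apply op channel by channel, with no carries between channels."""
    return tuple(op(x, y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Eq. (13): Chinese Remainder Theorem, mapped to the signed range [-M/2, M/2)."""
    m_total = prod(moduli)
    total = 0
    for x, m in zip(residues, moduli):
        m_i = m_total // m
        inverse = pow(m_i, -1, m)  # multiplicative inverse of M_i modulo m_i
        total += m_i * ((x * inverse) % m)
    value = total % m_total
    return value - m_total if value >= m_total // 2 else value


a, b = to_rns(123), to_rns(-45)
print(from_rns(rns_op(a, b, add)))  # 78
print(from_rns(rns_op(a, b, sub)))  # 168
print(from_rns(rns_op(a, b, mul)))  # -5535
```

The results are correct only while they stay inside the signed range of the system; a product such as 200 × 200 would exceed M/2 and wrap around.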
