Experiences on developing computer vision hardware algorithms using Xilinx system generator
Microprocessors and Microsystems 29 (2005) 411-419
www.elsevier.com/locate/micpro
Ana Toledo Moreo, Pedro Navarro Lorente, F. Soto Valles, Juan Suardíaz Muro*, Carlos Fernández Andrés
División de Sistemas e Ingeniería Electrónica (DSIE), Universidad Politécnica de Cartagena, Campus Muralla del Mar, s/n, Cartagena E-30202, Spain
Received 20 October 2004; revised 8 November 2004; accepted 22 November 2004
Available online 21 December 2005
Abstract
Computer vision is one of the areas where hardware-implemented algorithms perform clearly better than those implemented via software. Digital designers have so far optimized their designs by means of application specific integrated circuits (ASICs) or digital signal processors (DSPs). However, nowadays they are increasingly using field programmable gate arrays (FPGAs), powerful hardware devices combining the main advantages of ASICs and DSPs with the possibility of re-programming, which makes them very attractive devices for rapid prototyping. This paper shows how the Xilinx system generator (XSG) environment can be used to develop hardware-based computer vision algorithms from a system level approach, which makes it suitable for developing co-design environments.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Computer vision; Co-design environments; FPGA-based hardware implementation; Matlab; Xilinx system generator
1. Introduction
In current computer vision systems, video cameras or similar devices are used to transform incident light into data that can be processed. A sampling process performed over a rectangular grid provides a picture matrix, the elements of which are related to that incident light [9,31]. Each element of this image matrix is called a 'picture element' or 'pixel', and its associated numeric value comes from a quantization process where incident energy is coded by using a given number of bits.
Grey-level images are very common in image processing. These types of images use eight bits to code each pixel value, which results in 256 different possible shades of grey, ranging from 0 (black value) to 255 (white value). Fig. 1 shows an example of this kind of representation.
* Corresponding author. Tel.: +34 68 32 53 80; fax: +34 68 32 53 45.
E-mail addresses: [email protected] (A.T. Moreo), pedroj.navarro@upct.es (P.N. Lorente), [email protected] (F.S. Valles), [email protected] (J.S. Muro), [email protected] (C.F. Andrés).
0141-9331/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.micpro.2004.11.002
An image processing operation is commonly divided into four phases [3,9,22,31]: pre-processing, segmentation, feature extraction and recognition. The initial pre-processing phase seeks to enhance the quality of the original image by removing interfering elements introduced by the acquisition process, such as noise, unbalanced brightness, etc. This is followed by segmentation, a process during which the image is separated from what could be considered the background into various elements with similar properties. Then feature extraction is performed on every detected object to reduce its information to a list of parameters or attributes (feature vectors) that are usually stored in memory. Finally, during the recognition stage this list is used to generate a set of signals that constitute the upper level of processing, where a given meaning is assigned to every detected object. For example, a recognized string of digits could be associated with a telephone number, zip code, etc. depending on the type of processed information.
From the previous lines it can be inferred that it is usually during the initial processing stages that a huge amount of data has to be processed using simple and repetitive calculations, while subsequent steps demand less data but require much more complex algorithms to deal with all the available data.
Fig. 1. Grey-scale image representation.
Designers of computer vision systems have historically tended to implement all these initial processing stages by using hardware devices, mainly ASICs, like the NOA processor used by Matrox in the Genesis Processing Card [21]. However, more advanced stages, especially those of recognition, present complex computational requirements and they are therefore better implemented via software routines.
For that reason, our current work focuses on developing processing algorithms by means of programmable logic devices. Field programmable gate arrays (FPGAs) have become increasingly popular because recent trends indicate a faster growth of their transistor density than even that of general-purpose processors [6]. These electrically programmable elements can be used as basic electronic devices to implement low-level operations, where calculations are simple but a huge number of them are required, but they also have potential enough to implement much more complex systems dedicated to performing extremely time-consuming computation processes. Therefore, high logic density plus 'in system programmability' (ISP) offer designers a low-cost and powerful hardware tool for VLSI implementation of circuits with short lead times, which turns FPGAs into a very useful design platform for rapid prototyping [7,16,23,30].
2. Main contributions of Xilinx system generator
When a system for quick complex computations is required, designers usually opt for a hardware rather than a software based implementation, so an application specific integrated circuit (ASIC), an application specific standard product (ASSP) or a digital signal processor (DSP) is eventually chosen as the final implementation platform [10,19,26].
FPGA accelerators are very popular among the signal and image processing community. Today's programmable logic devices, especially FPGAs, present logical capacity and potential performance enough to implement efficient algorithms that had hitherto been confined to ASICs, ASSPs and DSPs. The latest generations of FPGAs compute more than 160 billion multiplication and accumulation (MAC) operations per second. These types of calculations appear mainly as the repetitive spatial convolution computation that is typical of a large number of computer vision algorithms. Such a calculation usually requires exploring the image with a convolution mask in order to generate, for each image pixel, an output consisting of a linear combination (multiplications and additions) of the grey-scale values of its surrounding pixels (Fig. 2).
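As a software reference for the mask response described above, the following Python sketch computes, for every interior pixel, the linear combination of its 3×3 neighbourhood weighted by the mask coefficients. It is only an illustration: the function name `convolve3x3`, the use of nested lists and the decision to leave border pixels at zero are assumptions of this sketch and are not taken from the hardware design. The mask is applied without flipping, as in a direct mask-response computation.

```python
def convolve3x3(image, mask):
    """Return the 3x3 mask response for every interior pixel of `image`.

    image: list of rows of grey-level values; mask: 3x3 list of weights.
    Border pixels are left at 0 (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            acc = 0  # accumulator of the multiply-accumulate (MAC) loop
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += mask[dr + 1][dc + 1] * image[r + dr][c + dc]
            out[r][c] = acc
    return out


if __name__ == "__main__":
    img = [[10, 10, 10, 10],
           [10, 50, 10, 10],
           [10, 10, 10, 10]]
    box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # unnormalised smoothing mask
    print(convolve3x3(img, box))
```

Each output pixel costs nine multiply-accumulate operations, which is why MAC throughput is the figure of merit quoted for this kind of workload.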
In short, FPGA-based configurable computing machines (CCMs) have emerged as an attractive alternative for high-performance embedded computing. CCMs retain a general-purpose nature, yet can be configured to offer a degree of performance which rivals application-specific hardware [6]. A single CCM platform can be quickly reconfigured to implement such a vast spectrum of different operations as convolutions [37], a two-dimensional Fourier transform [36], or even control-driven processing. CCMs are typically characterized by high-capacity data paths and programmable interconnections among processing elements (PEs). Historical CCM examples are Splash 2 [2], SONIC [11], PARTS [38] and Spyder [14]. Other interesting applications in computer vision and image processing include accelerating real-time point tracking [29], real-time image processing [15], medical image processing [41], 3D processing [24], video surveillance [17], pattern recognition [20] and neural networks [8]. The future seems promising for this type of device: new FPGA families appear frequently and they can also take advantage of intellectual property (IP) modules, consisting of a set of optimized hardware libraries which implement complex procedures like Fourier transforms, FIR filters or even microcontrollers and PCI bus controllers.
Fig. 2. Response of convolution masks: $R = w_1 z_1 + w_2 z_2 + \dots + w_9 z_9$, where $w_1, \dots, w_9$ are the coefficients of the 3×3 mask and $z_1, \dots, z_9$ are the grey-level values of the pixels under it.
Although development tools for FPGA-based designs are now very powerful, the task of describing a complex system which should be hardware implemented is still a very hard one. As Nayak [23], from AccelChip Inc., affirms: '...since FPGAs are normally used for applications with tremendous time to market constraints, there is a need for more mature design tools. Although design tools exist which take the VHDL description of the hardware and generate bitstream, a current challenge in this field is to design a high-level synthesis (HLS) tool, which would allow designers to enter designs at a much higher level of abstraction. These HLS tools would take the algorithmic description of the required hardware together with certain area and performance requirements of the designer, and perform an exploration of the design space to output the hardware which meets the designer's specifications'.
To sum up, it would be very helpful, at least for designing hardware processing computer vision systems, to have specification languages similar to those used in software engineering, because these types of language may offer very interesting features, such as:
(a) Ease of description of entangled concepts such as concurrency, synchronization, temporal models, structural descriptions, etc., all very common in hardware systems.
(b) Analytical power, related to the formal definition of the language (mainly at a semantic level) and hence to its ability, when it is applied in some development environments, to analyse, transform and verify.
(c) Utilization cost, calculated by weighting several selection criteria such as simplicity, availability of development tools or standardization level.
As a result, many languages for hardware specification have emerged in the last few years. C/C++ is the most popular target language for implementing extended language variations capable of describing hardware elements [1,27,28,35].
The most important of these are listed below.
• SystemC [25,34] consists of a standardized proposal for hardware/software specification which permits system specification at different levels of abstraction. The result is a very refined specification process where both RTL and functional descriptions can be defined and used, depending on the degree of accuracy sought.
• Unlike the previous language, Handel-C [4,18] regards the main process as a software component and hardware as a co-processor of this main process.
• Ocapi-XL [33] was specially designed for unifying heterogeneous hardware/software system models. Co-simulation using C/C++ code can be achieved by joining simulation and implementation objects into the same subset so that the same code can be used for both simulation and implementation steps.
Although the C language is prevalent when describing complex systems, there are other interesting options [5,12,13,40]. One of them is the Matlab-Simulink [32] environment with the addition of the Xilinx system generator (XSG) [39] blocksets, particularly appropriate for designing computer vision systems.
Matlab is both a general-purpose programming language and an interpreter for scientific and engineering applications, with emphasis on matrix computations (the name MATLAB is derived from MATrix LABoratory). Simulink extends the MATLAB interpreter to provide a general environment for modelling dynamical systems. It provides domain-specific objects called blocks from which models can be built. Simulink's graphical editor is used to assemble blocks into models and to access the underlying simulation engine. Simulink supports discrete time simulation, which makes it well suited for modelling hardware, including the concurrent behaviour of sampled data systems. Matlab-Simulink is widely applied in signal and image processing; it is easier to learn than the C/C++ language and it can be programmed and simulated faster too, so designers can easily check the trade-offs of any variation in the code.
Xilinx system generator is a tool which extends Simulink with software and blocks for bit and cycle-accurate modelling of digital logic and DSP functions. It also has a translator that converts a Simulink model into hardware for Xilinx FPGAs. Models constructed from Xilinx blocks behave in exactly the same way in Simulink as they do in hardware. As part of the Simulink environment, these models can readily be combined with non-Xilinx blocks to model parts of a system not bound to the FPGA. Like most Simulink blocks, Xilinx blocks can be customized by setting parameters from the MATLAB workspace. Only the portion of a model built from Xilinx blocks is translated into hardware by System Generator. The behaviour of the non-Xilinx portion of a System Generator model can be captured in an HDL test bench using simulation vectors computed in Simulink. In this way, the Simulink behaviour of the model can be compared with that of the generated hardware.
The Xilinx blockset contains high-level blocks that map into intellectual property (IP) cores which have been handcrafted for efficient implementation in the target FPGA. By using such high-level blocks, a complex system can be quickly assembled. In addition, the Xilinx blockset contains many lower level blocks that can be used to assemble custom functions. These blocks provide controls to access key aspects of the underlying FPGA resources. The combination of high-level and low-level blocks supports a wide range of implementation options to explore the available architectures that can be implemented in an FPGA.
Consequently, Matlab-Simulink and Xilinx system generator offer an ideal virtual prototyping environment in which a system test can be run concurrently with hardware and software models, using all the capabilities and toolboxes furnished by Matlab-Simulink in conjunction with powerful hardware description languages like VHDL. Moreover, they offer designers a set of optimized hardware blocks that can be combined to create much more complicated systems, thus making system [...] for the kind of easily described modular reusable [...] needed for co-design.
Fig. 3. Modular-blockset-based hardware binarization block.
3. Application examples
3.1. Binarization algorithm
When processing an image, thresholds constitute a very important tool for detecting objects [3,9,31]. This technique consists of analysing each pixel via image exploration in order to determine whether its grey-level value lies between the specified threshold values (in which case a pre-established output, associated with black or white colour, is produced) or outside these thresholds (in which case the value is replaced by a new one which is the opposite of the first).
Fig. 4. VHDL-based hardware binarization block.
(a)
this_block.setTopLevelLanguage('VHDL');
this_block.setEntityName('thres');
(b)
this_block.addSimulinkInport('video_in');
this_block.addSimulinkOutport('video_out');
block.outport(i).setRate(theInputRate);
this_block.addClkCEPair('sys_clk','sys_ce', block.inputRates);
(c)
VALUE = eval(get_param(this_block.blockName,'THRESHOLD_VALUE'));
if (this_block.inputTypesKnown)
    this_block.addGeneric('DATA_WIDTH', this_block.port('video_in').width);
    this_block.addGeneric('THRESHOLD_VALUE', VALUE);
end
Fig. 5. M-Function file associated with the VHDL binarization black block.
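The binarization rule of Section 3.1 can be sketched in a few lines of Python as a software reference against which a hardware block could be checked. The function name `binarize`, the inclusive comparison at both thresholds and the default output values 255 and 0 are assumptions chosen for illustration; the paper does not fix them.

```python
def binarize(image, low, high, inside=255, outside=0):
    """Map each pixel to `inside` if low <= pixel <= high, else to `outside`.

    image: list of rows of 8-bit grey levels (0..255).
    Inclusive bounds and the 255/0 output pair are assumptions of this sketch.
    """
    return [[inside if low <= p <= high else outside for p in row]
            for row in image]


if __name__ == "__main__":
    row = [[0, 124, 125, 200, 255]]
    # One-threshold case: everything from 125 upwards becomes white.
    print(binarize(row, 125, 255))
```

Setting `high` to 255 reduces the two-threshold rule to a single-threshold binarization.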
3.1.1. Xilinx blockset based implementation
Fig. 3 shows the one-threshold hardware binarization block developed using Xilinx system generator, which is applied to a 256×256×8 grey-scale image, that is, an image 256 pixels wide and 256 pixels high stored with 8 bits per pixel. The binarization block comprises the following modular elements.
• From Workspace. The From Workspace block reads data from the MATLAB workspace. The block's Data parameter (Array_in) specifies the workspace data via a Matlab expression evaluating to a matrix (2D array) or a structure containing an array of signal values and time steps. The array is created by a callback function (Load_Image.m) that Simulink invokes before the simulation starts. This function reads a grey-scale intensity image from a graphics file and converts it into an array of pixel values (Array_in).
• To Workspace. This block writes its input to the workspace, as an array or structure that has the name specified by the block's variable name parameter (Array_out). A callback function (Show_Result.m) is used after simulation stops.
• System Generator Block. This special Xilinx block calls upon the tool's code generation software. By placing the 'System Generator token' on the Simulink project, it can generate HDL and Xilinx LogiCore instances for all the Xilinx blocks on the sheet. Currently supported families are: Virtex, Virtex2, Virtex2P, Spartan2, Spartan2, and VirtexE. The device selected for all the works described here is a Xilinx Virtex XCV800 HQ240, speed grade -6.
• Gateway In/Gateway Out. These blocks are, respectively, the input and output blocks of the Xilinx FPGA part of the Simulink design. The first block (DIn) converts a Simulink double precision input into a System Generator fixed point type and specifies an input port for the top level of the HDL design. The second block (Dout1) converts System Generator fixed point data to a Simulink double precision value. It can either define an output port for the top level of the HDL design generated by System Generator or be simply used as a test point.
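To make the double-to-fixed-point conversion performed at the input gateway more concrete, here is a minimal Python sketch of one possible quantization. It assumes an unsigned format, rounding to nearest and saturation on overflow; the real block's quantization and overflow behaviour is configurable and the paper does not state which settings were used, so `to_unsigned_fixed` and its parameters are hypothetical.

```python
import math


def to_unsigned_fixed(value, n_bits=8, bin_pt=0):
    """Quantize a float to an unsigned fixed-point number.

    n_bits: total word length; bin_pt: number of fractional bits.
    Returns (stored_integer, represented_value).
    Rounding and saturation are assumptions of this sketch.
    """
    scale = 1 << bin_pt
    raw = math.floor(value * scale + 0.5)        # round half up
    raw = max(0, min(raw, (1 << n_bits) - 1))    # saturate to the word length
    return raw, raw / scale


if __name__ == "__main__":
    for sample in (125.4, 300.0, -3.2):
        print(sample, to_unsigned_fixed(sample))
```

With the default 8-bit, zero-fraction format the representable range matches the 0 to 255 grey levels of the input image.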
The blocks of the Threshold subsystem are the following.
• Constant. This block generates the threshold value (in this example, 125).
• Register. The Xilinx Register block models a D flip-flop based register for registering design inputs, with a latency of one sample period.
• Relational. The Xilinx Relational block implements a comparator. The following comparisons are supported: equal-to; not-equal-to; less-than; greater-than; less-than-or-equal-to; greater-than-or-equal-to. The output of the block is the logical value '1' if the comparison is true and '0' if it is false.
Fig. 6. ModelSim co-simulation of the VHDL binarization black block.
Fig. 7. Outputs from different hardware-implemented processing blocks.
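The Threshold subsystem built from the Constant, Register and Relational blocks can be mimicked in software with a sample-by-sample model. The sketch below is an assumption-laden illustration: it places the register on the pixel input, uses a greater-than comparison and starts with the register cleared to zero, none of which is specified in the text. The name `threshold_stream` is hypothetical.

```python
def threshold_stream(pixels, threshold=125):
    """Model Constant + Register + Relational on a stream of pixels.

    In each sample period the comparator sees the value held in the
    register, i.e. the pixel that arrived one period earlier.
    """
    register = 0      # assumed initial register content
    outputs = []
    for pixel in pixels:
        # Relational block: register > constant gives 1, otherwise 0.
        outputs.append(1 if register > threshold else 0)
        # The D flip-flops capture the current input for the next period.
        register = pixel
    return outputs


if __name__ == "__main__":
    print(threshold_stream([200, 100, 125, 126]))
```

The output sequence is the comparison result delayed by one sample period, which is the latency attributed to the Register block.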
3.1.2. VHDL-based implementation
As with Xilinx hardware blocks, System Generator allows implementation using the VHDL hardware description language. Fig. 4 represents the same binarization process previously described, but now implemented using a VHDL description. A System Generator Black Box block must be used to specify both the simulation behaviour in Simulink and the implementation files to be used during code generation with System Generator. The Black Box element adopts the interface (ports and parameters) of its implemented function.
The interface description must be defined by a MATLAB M-function called the block's configuration M-function (thres_config.m), which specifies the top-level entity name of the HDL component that should be associated with the black box. It is used to select the language (Fig. 5a), to describe ports and sample rates (Fig. 5b), to define the generics required by the black box HDL (Fig. 5c; in this example the input port is a dynamic port) and to specify the black box HDL and other files (e.g. EDIF) that are associated with the block.
Clock and clock-enable ports on a black box are treated in a different way. Every System Generator block requiring a clock has at least one clock and clock-enable port on its HDL counterpart. When a black box is translated into hardware, System Generator drives the clock and clock-enable ports with signals whose rates can be specified according to the block's configuration and the sample rates which drive it in the Simulink environment.
For black box HDL code to be co-simulated with Simulink, a ModelSim HDL co-simulation block must be added to the design. This block runs a ModelSim session when a Simulink simulation starts, handles communication between Simulink and ModelSim and reports whatever errors are detected when the black box HDL is compiled. When the model (Fig. 4) is run, waveforms are displayed by the ModelSim HDL Simulator (Fig. 6). Finally, Matlab displays the processing results in the way shown in Fig. 7b.
Fig. 8. Hardware convolution block.
3.2. Generalized convolution blocks
As noted earlier, convolution is one of the most predominant image processing operations, so several configurable hardware cores capable of performing 5×5 and 3×3 sized convolutions have been implemented. Fig. 7d and e show the result of performing a 3×3 smoothing and a 5×5 sharpening operation on the 256 grey levels input image of Fig. 7a. In order to achieve maximum flexibility, each coefficient of the convolution mask must be entered as a parameter of the filter. This means that in some cases synthesising the proposed block description may not produce the most optimized implementation.
In Fig. 8, two of these 3×3 generic convolution blocks are used to perform edge detection by applying a Prewitt filter. This type of filter computes the gradient using two convolution blocks, one for X-gradient estimation and the other for Y-gradient estimation, which are then combined in the resulting image (Fig. 7c). A memory buffer is needed to store two image lines, as detailed in Fig. 8; this is implemented in the FPGA as a RAM block.
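The role of the two-line buffer in the Prewitt arrangement can be illustrated with a row-streaming Python sketch. Rows arrive one at a time; the two most recent rows are kept in a buffer so that a full 3×3 window is available as soon as the third row arrives. The way the two gradients are combined (sum of absolute values, saturated at 255) is an assumption of this sketch, as are the names `prewitt_edges` and `mask_response`; the paper does not say how its hardware combines them.

```python
from collections import deque

PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]


def mask_response(rows, c, mask):
    """Response of a 3x3 mask centred on column c of the middle of three rows."""
    return sum(mask[i][j] * rows[i][c - 1 + j]
               for i in range(3) for j in range(3))


def prewitt_edges(image):
    """Yield one row of edge magnitudes for each complete 3-row window."""
    line_buffer = deque(maxlen=2)   # holds the two most recent image lines
    for row in image:
        if len(line_buffer) == 2:
            window = [line_buffer[0], line_buffer[1], row]
            out = []
            for c in range(1, len(row) - 1):
                gx = mask_response(window, c, PREWITT_X)
                gy = mask_response(window, c, PREWITT_Y)
                out.append(min(255, abs(gx) + abs(gy)))  # assumed combination
            yield out
        line_buffer.append(row)     # the oldest line drops out automatically


if __name__ == "__main__":
    step = [[0, 0, 255, 255]] * 4   # image with a vertical edge
    print(list(prewitt_edges(step)))
```

Because only two previous lines are ever stored, the memory needed grows with the image width rather than with the whole image, which is what makes a small on-chip RAM block sufficient.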
4. Results and conclusions
Xilinx system generator is a very useful tool for developing computer vision algorithms. It could be described as a timely, advantageous option for developing in a much more comfortable way than that permitted by VHDL or Verilog hardware description languages (HDLs). Its main benefits and drawbacks are:
• When it is combined with the powerful Matlab-Simulink IDE and its image processing toolbox, it becomes a very suitable environment for applying co-design techniques. Communication between the Matlab environment and user hardware models is easy, so very realistic and complex simulation environments for very different types of computing systems (image processing, fuzzy logic, neural networks, etc.) are possible.
• Algorithms can be implemented and simulated at high abstraction levels by using Matlab script files, in the event of a software implementation, or they can be hardware-modelled by using the Xilinx system generator blocks. These elements provide a ready-made benchmark for hardware implementation of image processing algorithms. But if higher performance is required, a hardware description language can also be used for modelling at a very low level of abstraction, as the binarization example shows (Table 1).
• Once a system is tested, hardware blocks can be synthesized and downloaded into a programmable logic device like an FPGA. Table 2 shows the implementation results for each developed convolution mask, whereas Table 1 shows the results for the binarization examples; in both cases the target device is a Xilinx Virtex XCV800. As most computer vision applications are based on MAC operations, an FPGA with more internal resources, such as Virtex II or Spartan III, will be tested in the next stage of the development process. An improvement of the figures in Tables 1 and 2 is expected, due to the use of the embedded multipliers available in such devices.
• The main drawback of the Xilinx system generator (XSG) development tool is the time consumed in low-level hardware simulation, which is usually greater than, for instance, in a Handel-C simulation. Despite this, the huge number of available toolboxes and dedicated libraries for image processing makes it a highly suitable environment for co-designing computer vision algorithms.
As modern FPGAs are capable of implementing high-performance operations, more complex modules like histogram computation, labelling algorithms, Hough Transform, etc. are currently being developed by using this methodology in order to achieve an improved comparison between XSG and VHDL and to obtain a final co-design environment for the development of automated inspection systems, which is now the main goal of our future work.
Table 1
Hardware binarization block implementation results
                                Xilinx blocksets   VHDL       Virtex XCV800
Slices                          9                  7          9408
Flip-flops                      8                  9          18,816
LUTs                            8                  3          18,816
IOBs                            9                  9          170
Gates                           112                90         800 K
Maximum operation rate (MHz)    167.588            190.006
Table 2
Generalized hardware convolution implementation results
                                Prewitt X+Y   Conv. 3×3   Conv. 5×5
Slices (%)                      4             3           6
Flip-flops (%)                  3             2           4
Four-input LUTs (%)             2             2           4
IOBs (%)                        9             9           9
BRAMs (%)                       28            17          32
Gates (%)                       17            10          19
Maximum operation rate (MHz)    68.34         76.587      69.195
References
[1] I. Alston, B. Madahar, From C to netlists: hardware engineering for software engineers?, Electronics and Communication Engineering Journal 14 (4) (2002) 165-173.
[2] P.M. Athanas, A.L. Abbott, Real-time image processing on a custom computing platform, IEEE Computer 28 (2) (1995) 16-25.
[3] J. Bernd, Computer Vision and Applications: A Guide for Students and Practitioners, Academic Press, New York, 2000, ISBN: 0123797772, pp. 5-30.
[4] Celoxica Inc., http://www.celoxica.com
[5] K. Compton, S. Hauck, Reconfigurable computing: a survey of systems and software, ACM Computing Surveys June (2002) 171-210.
[6] A.S. Dawood, S.J. Visser, J.A. Williams, Reconfigurable FPGAs for real time image processing in space, 14th International Conference on Digital Signal Processing, DSP 2002, vol. 2, 2002, pp. 845-848.
[7] M. Edwards, B. Fozard, Rapid prototyping of mixed hardware and software systems, Euromicro Symposium on Digital System Design, 2002, pp. 118-125.
[8] M.A. Figueiredo, C. Gloster, Implementation of a probabilistic neural network for multi-spectral image classification on an FPGA based custom computing machine, Fifth Brazilian Symposium on Neural Networks, 1998, pp. 174-179.
[9] D.A. Forsyth, Computer Vision: A Modern Approach, Pearson Education, 2002, ISBN: 0130851981.
[10] A. Garimella, M.V.V. Satyanarayana, R. Satish Kumar, P.S. Murugesh, U.C. Niranjan, VLSI implementation of online digital watermarking technique with difference encoding for 8-bit gray scale images, Proceedings of the 16th International Conference on VLSI Design, 2003, pp. 283-288.
[11] S.D. Haynes, P.Y.K. Cheung, W. Luk, J. Stone, SONIC - a plug-in architecture for video processing, Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM'99, 1999, pp. 280-281.
[12] A. Hoffmann, O. Schliebusch, A. Nohl, G. Braun, O. Wahlen, H. Meyr, A methodology for the design of application specific instruction set processors (ASIP) using the machine description language LISA, IEEE/ACM International Conference on Computer Aided Design, ICCAD 2001, 2001, pp. 625-630.
[13] J. Hwang, B. Milne, N. Shirazi, J. Stroomer, System level tools for DSP in FPGAs, FPL 2001, Lecture Notes in Computer Science, 2001, pp. 534-543.
[14] C. Iseli, E. Sanchez, Spyder, a SURE, SUperscalar and REconfigurable, Processor, Journal of Supercomputing 9 (1995) 231-252.
[15] L. Kessal, D. Demigny, N. Boudouani, R. Bourguiba, Reconfigurable hardware for real time image processing, Proceedings of the 2000 International Conference on Image Processing, vol. 3, pp. 110-113.
[16] H. Krupnova, G. Saucier, FPGA technology snapshot: current devices and design tools, 11th International Workshop on Rapid System Prototyping, 2000, pp. 200-205.
[17] L. Liang, S. Yu, Real-time duplex digital video surveillance system and its implementation with FPGA, Proceedings of the Fourth International Conference on ASIC, 2001, pp. 471-473.
[18] S.M. Loo, B.E. Wells, N. Freije, J. Kulick, Handel-C for rapid prototyping of VLSI coprocessors for real time systems, Proceedings of the 34th Southeastern Symposium on System Theory, 2002, pp. 6-10.
[19] A.K. Majumdar, Design of an ASIC for straight line detection in an image, Proceedings of the 13th International Conference on VLSI Design, 2000, pp. 128-133.
[20] R. Manner, M. Sessler, H. Simmler, Pattern recognition and reconstruction on an FPGA coprocessor board, IEEE Symposium on Field-Programmable Custom Computing Machines, 2003, pp. 325-326.
[21] Matrox Inc., http://www.matrox.com
[22] S. Mitra, Digital Signal Processing: A Computer Based Approach, second ed., McGraw-Hill, New York, 2001, pp. 862-870.
[23] A. Nayak, M. Haldar, Accurate area and delay estimators for FPGAs, Proceedings of the 2002 Design, Automation, and Test in Europe Conference and Exhibition (DATE'02), 2002, pp. 862-870.
[24] L. O'Donnell, G.L. Williams, A.J. Lacey, N.L. Seed, P.R. Thorne, A.C. Zawada, P.A. Ivey, A 3-D re-configurable image processing element, IEE Colloquium on High Performance Architectures for Real-Time Image Processing (Ref. no. 1998/197), 1998, pp. 1-7.
[25] P.R. Panda, SystemC - a modeling platform supporting multiple design abstractions, Proceedings of the 14th International Symposium on System Synthesis, 2001, pp. 75-80.
[26] M.E. Rizkalla, K. Palaniswamy, A.S.C. Sinha, M. El-sharkawy, P. Salama, S. Lyshevski, H. Gundrum, ASIC memory design of 2-D median filters, Proceedings of the 43rd IEEE Midwest Symposium on Circuits and Systems, vol. 1, 2000, pp. 354-357.
[27] L. Semeria, A. Ghosh, Methodology for hardware/software co-verification in C/C++, Proceedings of the ASP-DAC 2000 Design Automation Conference, 2002, pp. 123-128.
[28] L. Semeria, A. Seawright, R. Mehra, D. Ng, A. Ekanayake, B. Pangrle, RTL C-based methodology for designing and verifying a multi-threaded processor, Design Automation Conference, 2002, pp. 123-128.
[29] L. Shijian, G. Li, Z. Junzhu, L. Feng, A CPLD design of real time system for point targets detection in infrared image sequences, Proceedings of the Fifth International Conference on ASIC, 21-24 October, vol. 2, 2003, pp. 906-909.
[30] G. Spivey, S.S. Bhattacharyya, K. Nakajima, Logic foundry: rapid prototyping of FPGA-based DSP systems, Proceedings of the ASP-DAC 2003 Design Automation Conference, 2003, pp. 374-381.
[31] G. Stockman, Computer Vision, Pearson Education, 2001, ISBN: 0130307963, pp. 374-381.
[32] The MathWorks Inc., http://www.mathworks.com
[33] The Ocapi XL initiative, http://www.imec.be/design/ocapi
[34] The SystemC Initiative, http://www.systemc.org
[35] T. Todman, W. Luk, Real-time extensions to a C-like hardware description language, 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2003, 9-11 April, 2003, pp. 302-304.
[36] I.S. Uzun, A.A.A. Bouridane, FPGA implementations of fast Fourier transforms for real-time signal and image processing, IEEE International Conference on Field-Programmable Technology (FPT), 2003, pp. 102-109.
[37] K. Wiatr, E. Jamro, Implementation image data convolutions operations in FPGA reconfigurable structures for real-time vision systems, International Conference on Information Technology: Coding and Computing, 2000, pp. 152-157.
[38] J. Woodfill, B. Von Herzen, Real-time stereo vision on the PARTS reconfigurable computer, Proceedings of the Fifth Annual IEEE Symposium on FPGAs for Custom Computing Machines, 1997, pp. 201-210.
[39] Xilinx Inc., http://www.xilinx.com
[40] X. Zhang, K.W. Ng, A review of high-level synthesis for dynamically reconfigurable FPGAs, Microprocessors and Microsystems 24 (4) (2000) 199-211.
[41] T. Yokota, M. Nagafuchi, Y. Mekada, T. Yoshinaga, K. Ootsu, T. Baba, A scalable FPGA-based custom computing machine for a medical image processing, Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2002, pp. 307-308.
Ana Toledo Moreo, BE, MSc, is a PhD student and Assistant Lecturer in the Electronics Technology Department of the Technical University of Cartagena, Spain. Her research has been mainly focussed on developing low-level image-processing algorithms for FPGAs. She is also interested in artificial neural networks, principally in the improvement of learning algorithms. She has a number of publications related to the two areas in international journals and conferences. Currently, she is a Visiting Researcher at the University of Reading, UK.
Pedro J. Navarro Lorente received the MSc degree in Electrical and Electronics in 2000 from the Technical University of Cartagena, Murcia, Spain; he is currently pursuing a PhD degree in Mechatronics and Engineering at the Technical University of Cartagena, Murcia, Spain. Since 2000 he has participated in different projects focused on computer vision and robotics applications for industry. In 2001, he joined the Systems and Electronic Engineering Division (DSIE). He is currently working as associated professor at the Technical University of Cartagena, Murcia, Spain, in the field of electronics. His current research interests include hardware-based computer vision algorithms, mechatronics and robotics, and he is the author of various papers on these topics.
Fulgencio Soto Valles, BE, MSc, is a PhD student and Assistant Lecturer in the Electronics Technology Department of the Technical University of Cartagena, Spain. His research has been mainly focussed on developing digital controls for power electronic systems. He is also interested in artificial vision, principally in the improvement of quality inspection systems. He has a number of publications related to the two areas in international journals and conferences. Currently, he is a Visiting Researcher at the University of Reading, UK.