18. Retina-Like Sensors: Motivations, Technology and Applications




Giulio Sandini and Giorgio Metta

Abstract
I. Introduction
II. Space-Variant Imaging
III. Technology of Solid State Log-Polar Sensors
   A. Parameters of Log-Polar Sensors
   B. CMOS Implementations
   C. The Fovea of Log-Polar Sensors
IV. Applications
   A. Image Transmission
   B. Robotics
V. Conclusions
References

Abstract

Retina-like visual sensors are characterized by space-variant resolution mimicking the distribution of photoreceptors of the human retina. These sensors, like our eyes, have a central part at the highest possible resolution (called the fovea) and a gradually decreasing resolution in the periphery. We will present a solid-state implementation of this concept. One attractive property of space-variant imaging is that it allows processing the whole image at frame rate while maintaining the same field of view as traditional rectangular sensors. The resolution is always maximal if the cameras are allowed to move and the fovea is placed over the regions of interest. This is the case in robots with moving cameras. As an example of possible applications, we shall describe a robotic visual system exploiting two retina-like cameras and using vision to learn sensorimotor behaviors.

I. Introduction

This chapter describes the physical implementation in silicon of a biologically inspired (retina-like) visual sensor. It is also important to understand the context and the motivations for undertaking such research. Over the past few years, we have studied how sensorimotor patterns are acquired in a complex system such as the human body. This has been carried out from the unique perspective of implementing the behaviors we wanted to study in artificial systems. The approach we followed is biologically motivated from at least three perspectives: i) morphology (sensors are shaped as closely as possible to their biological counterparts); ii) physiology (control structures and processing are modeled after what is known about human perception and motor control); iii) development (the acquisition of those sensorimotor patterns follows the process of biological development in the first few years of life).

The goal has been that of understanding human sensorimotor coordination and cognition rather than building more efficient robots. In fact, some of the design choices might even be questionable on purely engineering grounds, but they are pursued nonetheless because they improve the similarity with biological systems. In the course of this effort our group implemented various artifacts both in software (e.g. image processing, machine learning) and hardware (e.g. silicon implementation, robot heads).

Most of the work has been carried out on a humanoid robot we call the Babybot (Metta 2000, Metta et al. 1999). It resembles the human body from the waist up, although in a simplified form. It has twelve degrees of freedom overall, distributed between the head, arm and torso. Its sensory system consists of cameras (eyes), gyroscopes (vestibular system), microphones (ears) and position sensors at the joints (proprioception). Babybot cannot grasp objects but it can touch and poke them around. Investigation touched aspects such as the integration of visual and inertial information (vestibulo-ocular reflex) (Panerai et al. 2000), and the interaction between vision and spatial hearing (Natale et al. 2002).

In this paper we will focus on the eyes of Babybot, which mimic the distribution of photoreceptors of the human retina – we call these retina-like cameras. Apart from the purely computational aspects, they are best understood within the scientific framework of the study of biological systems. In our view, the retina-like camera truly represents a thought-provoking mix of technological and biologically inspired work. Retina-like cameras have a non-uniform resolution, with a high-resolution central part called the fovea and a coarser-resolution periphery.

As we hope to clarify in this chapter, the uniform resolution layout, common to commercial cameras, did not survive the evolutionary pressure. Evolution seems to be answering the question of how to optimally place a given number of photoreceptors over a finite, small surface. Many different eyes evolved with the disposition of the photoreceptors adapted to the particular ecological niche. Examples of this diversity can be found in the eyes of insects (see for example (Srinivasan and Venkatesh 1997) for a review) and in those of some birds, which have two foveal regions to allow simultaneous flying and hunting (Blough 1979, Galifret 1968). There is clearly something to be gained by optimizing the placement of photosensitive elements. In a constrained problem with limited computational resources (i.e. the size of the brain), limited bandwidth (i.e. the diameter of the nerves delivering the visual information to the brain), and a limited number of photoreceptors, nature managed to obtain a much higher acuity than what can be achieved with uniform resolution.

From the visual processing point of view, we asked on the one hand whether the morphology of the visual sensor facilitates particular sensorimotor coordination strategies, and on the other, how vision determines and shapes the acquisition of behaviors which are not necessarily purely visual in nature. Also in this case we must note that eyes and motor behaviors coevolved: it does not make sense to have a fovea if the eyes cannot be swiftly moved over possible regions of interest. Humans developed a sophisticated oculomotor apparatus which includes saccadic movements, smooth tracking, vergence, and various combinations of retinal and extra-retinal signals to keep vision efficient in a wide variety of situations (see Carpenter 1988 for a review).

This addresses the question of why it might be worth copying from biology and what the motivations are for pursuing the realization of biologically inspired artifacts. How this has been done is presented in the following sections, where we shall talk about the development of the retina-like camera. Examples of applications are also discussed in the fields of image transmission and robotics. The image transmission problem is akin to the limitation of bandwidth/size of the optic nerve mentioned above. The limitations in the case of autonomous robots are in terms of computational resources and power consumption.

II. Space-Variant Imaging

Babybot, shown in Fig. 1, relies on a pair of retina-like sensors for its visual processing. These sensors are characterized by a space-variant resolution mimicking the distribution of photoreceptors of the human retina. The density of photoreceptors is highest in the center (limited by the particular technology used) and decreases monotonically as the eccentricity – the distance of the photosite from the center of the sensory surface – increases. The resulting image is, consequently, a compromise between resolution, amplitude of the field of view (FOV), and number of pixels. This space-variant imaging is unique because it enables high-resolution tasks using the central region while the lower-resolution periphery provides relevant information about the background. The arrangement is advantageous, for example, for target tracking: the wide peripheral part is useful for detection, while the central part takes over during tracking and performs with the highest accuracy.

Of all possible implementations of space-variant sensors, what is described here is the so-called log-polar structure (Sandini and Tagliasco 1980, Schwartz 1980, Weiman and Chaikin 1979). The log-polar geometry accurately models the wiring of the photoreceptors from the retina to the geniculate body and the primary visual cortex (area V1). In this schema a constant number of photosites is arranged over concentric rings (the polar part of the representation), giving rise to a linear increase of the receptors' spacing with respect to the distance from the central point of the structure (the radius of the concentric rings). A possible implementation of this arrangement is shown in Fig. 2. Because of the polar structure

Fig. 1. The Babybot. Left: the complete setup. Middle: detail of the head; the tennis-like balls cover the eyes for esthetic purposes, the microphones and ear lobes are mounted on top of the head. Right: back view of the head showing the inertial sensors in its center


of the photosensitive array and the increasing size of pixels in the periphery, retina-like sensors do not provide images with a standard topology. In formulas, the mapping from the retinal plane (ρ, θ) into the cortical plane (ξ, η) amounts to:

  η = q · θ
  ξ = log_a(ρ / ρ₀)        (1)

where ρ₀ is the radius of the innermost circle, and 1/q is the minimum angular resolution. Equation (1) is easily understood by observing that the angular variable η is linearly mapped from its polar representation θ, and the eccentricity ρ is scaled logarithmically (with base a). Equation (1) is related to the traditional rectangular coordinate system by:

  x = ρ cos θ
  y = ρ sin θ        (2)

It is worth noting that the mapping is obtained at no computational cost, as it is a direct consequence of the arrangement of the photosites and the read-out sequence.
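As an illustrative sketch (not the sensor's read-out logic, which requires no computation at all), the mappings of equations (1) and (2) can be written directly in code; the function and parameter names simply follow the symbols above:

```python
import math

def retina_to_cortex(rho, theta, a, rho0, q):
    """Equation (1): retinal polar coordinates (rho, theta) to the
    cortical plane (xi, eta)."""
    eta = q * theta
    xi = math.log(rho / rho0, a)
    return xi, eta

def cortex_to_retina(xi, eta, a, rho0, q):
    """Inverse of equation (1): cortical (xi, eta) back to (rho, theta)."""
    rho = rho0 * a ** xi
    theta = eta / q
    return rho, theta

def retina_to_cartesian(rho, theta):
    """Equation (2): polar retinal coordinates to rectangular ones."""
    return rho * math.cos(theta), rho * math.sin(theta)
```

A quick round trip (retina to cortex and back) recovers the original (ρ, θ), which is the sense in which the mapping is invertible outside the fovea.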

The topology of a log-polar image is shown in Fig. 3. Note that radial structures (the petals of the flower) correspond to horizontal structures in the log-polar image. In spite of this seemingly distorted image, the mapping is conformal and, consequently, any local operator used for standard images can be applied without changes (Weiman and Chaikin 1979).

In the Babybot, for example, images coming from the left and right channels were processed to recover the location, in retinal coordinates, of possibly interesting objects. Color is a good candidate for this purpose and it is extracted after mapping the RGB components into the Hue, Saturation and Value (HSV) space. This is not normalized or scaled as, for example, in (Darrell et al. 2000), but proved to be enough for our experiments. The processing extracts and labels the most salient image regions. Regions are successively combined through a voting mechanism and finally the coordinates of the most "voted" one are selected as the position

Fig. 2. Layout of receptor placement for a log-polar structure composed of 12 rings with 32 pixels each. The pixels marked in black follow a logarithmic spiral

Fig. 3. Left: space-variant image obtained by remapping an image acquired by a log-polar sensor. Note the increase in pixel size with eccentricity. Right: log-polar image acquired by a retina-like sensor. Horizontal lines in the log-polar image are mapped into rings to obtain the remapped image shown on the left


of the object to look at. This fusion procedure provides the attentional information needed to generate tracking behavior. The error – i.e. the distance of the projection of the target on the image plane from the center of the image – is used to trigger saccade-like movements or, in the case of smoothly moving targets, as a feedback signal for closed-loop control.

The way the different visual cues are combined does not provide information about binocular disparity. In fact the procedure described above does not perform any stereo matching or correspondence search – regions are treated as 2D entities. For the purpose of measuring depth we used a different algorithm whose details can be found in (Manzotti et al. 2001). This, too, uses log-polar images and provides a measure of the global disparity, which is used to control vergence. Grossly simplifying, by employing a correlation measure, the algorithm evaluates the similarity of the left and right images for different horizontal shifts. It finally picks the shift corresponding to the maximum correlation as a measure of the binocular disparity. The log-polar geometry, in this case, weighs the pixels in the fovea differently from those in the periphery. More importance is thus accorded to the object being tracked.

Positional information is important, but for a few tasks optic flow is a better choice. One example of the use of optic flow is the dynamic control of vergence as in (Capurro et al. 1997). We implemented a log-polar version of a quite standard algorithm (Koenderink and Van Doorn 1991). According to the choice of the algorithm, we defined the affine model as:

  (ẋ)   (u₀)   (D + S₁   S₂ − R) (x)
  (ẏ) = (v₀) + (R + S₂   D − S₁) (y)        (3)

where ẋ and ẏ are the optic flow, and x and y the image plane coordinates (with origin in the image center). Equation (3) depends on four quantities: translation, rotation, divergence and shear. The first two components u₀ and v₀ represent a rigid 2D translation, and D, R, S₁ and S₂ are the first-order vector field differential invariants: divergence, curl, and the two components of shear, respectively. The details of the implementation can be found in (Tunley and Young 1994). The estimation of the optic flow requires taking the log-polar geometry into account because it involves non-local operations.
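A minimal numerical sketch of the affine model of equation (3) (our own illustrative code, not the Babybot implementation): the flow is linear in (x, y), so the invariants D, R, S₁, S₂ can be recovered from the four spatial derivatives of the flow field:

```python
def affine_flow(x, y, u0, v0, D, R, S1, S2):
    """Evaluate the affine optic-flow model of equation (3) at (x, y)."""
    xdot = u0 + (D + S1) * x + (S2 - R) * y
    ydot = v0 + (R + S2) * x + (D - S1) * y
    return xdot, ydot

def invariants_from_gradients(dxdx, dxdy, dydx, dydy):
    """Recover (D, R, S1, S2) from the spatial derivatives of the flow
    (dxdx = d(xdot)/dx, dxdy = d(xdot)/dy, and so on)."""
    D = 0.5 * (dxdx + dydy)   # divergence
    R = 0.5 * (dydx - dxdy)   # curl
    S1 = 0.5 * (dxdx - dydy)  # first shear component
    S2 = 0.5 * (dxdy + dydx)  # second shear component
    return D, R, S1, S2
```

Since the model is linear, differencing the flow at unit steps along x and y gives the derivatives exactly, and the invariants are recovered without error; with real flow estimates the same relations would be applied to noisy finite differences.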

III. Technology of Solid State Log-Polar Sensors

Traditionally, the log-polar mapping has been obtained in two different ways: by means of electronic boards transforming, in real time, standard images into log-polar ones, or by building sensors with the photosites arranged according to the log-polar mapping. Electronic boards were employed first; they were used to generate log-polar images for real-time control and image compression (Engel et al. 1994, Rojer and Schwartz 1990, Wallace et al. 1994, Weiman and Juday 1990). The advantage is the use of standard off-the-shelf electronic components. The main disadvantage is the constraint introduced by the size of the original image, which limits the potential advantages of the log-polar structure. This point will be clarified in the following sections. Whatever the approach used, the design of the sensor has to start from the technological limitations, the most important of which are the minimum pixel size and the maximum sensor size. Besides our realizations, a few other attempts at implementing solid-state retina-like sensors have been reported in the literature (Baron et al. 1995, Baron et al. 1994). So far we are not aware of any commercial device, besides those described here, that has been realized based on log-polar retina-like sensors.

A. Parameters of Log-Polar Sensors

Starting from the technological constraints (minimum pixel size and size of the sensor), the most important sensor parameter is the total number of pixels, which is directly related to the amount of information acquired. For constant-resolution devices this parameter is fixed by the technological constraints in a simple way. In the case of log-polar sensors the relationship between these two parameters is not a simple one (see (Sandini and Tagliasco 1980, Wallace et al. 1994, Weiman 1988) for more details). The second important parameter, unique to log-polar sensors, is the ratio between the largest and the smallest pixels – we shall call it R.

For example, our first solid-state implementation was realized at the beginning of the 90s using CCD technology (Van der Spiegel et al. 1989). At that time, with the technology available, the size of the smallest possible pixel was about 30 μm, and for practical limitations the overall sensor diameter was limited to 9.4 mm. A picture of the layout is shown in Fig. 4. This sensor is composed of 30 rings and each ring is covered by 64 pixels. Altogether the sensor had 2022 pixels, 1920 of which in the log-polar part of the sensor. In the CCD implementation R was about 13.7 (the largest pixel was 412 μm). This parameter describes the amount of space variance of the sensor and is, of course, equal to 1 in standard constant-resolution sensors.

The third important parameter is the ratio between the size of the sensor and the size of the smallest pixel. We shall call it Q. The importance of Q can be understood by observing that its value is equal to the side, in pixels, of a constant-resolution image with the same field of view and the same maximum resolution as the corresponding log-polar sensor. For example, for the CCD sensor shown in Fig. 4, Q is equal to about 300, meaning that if we want to electronically remap a constant-resolution image and obtain the same amount of information as from our log-polar sensor, the original image must be at least 300 × 300 pixels.
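As a back-of-the-envelope check, R and Q for the CCD sensor follow directly from the pixel sizes and diameter reported above (the variable names are ours):

```python
smallest_pixel_um = 30.0      # smallest (innermost) pixel, in micrometers
largest_pixel_um = 412.0      # largest (outermost) pixel
sensor_diameter_um = 9400.0   # overall sensor diameter (9.4 mm)

R = largest_pixel_um / smallest_pixel_um    # space-variance ratio
Q = sensor_diameter_um / smallest_pixel_um  # side of the equivalent uniform image

print(round(R, 1))  # 13.7
print(round(Q))     # 313, i.e. "about 300" as quoted in the text
```

The same two ratios can be computed for any space-variant layout; R = 1 recovers the constant-resolution case.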

B. CMOS Implementations

The CCD implementation described earlier, even if it was the first solid-state device of this kind in the world, had some drawbacks, mostly related to the use of CCD technology itself. In our more recent implementations, CMOS technology was used. A first version of the sensor was realized using a 0.7 μm technology allowing a minimum pixel size of 14 μm. Later, improvement of technology enabled a further reduction of the minimum pixel size to 7 μm. In fact, our most recent CMOS implementation uses a 0.35 μm technology. This sensor – realized at Tower in Israel – has been developed within a European Union-funded research project (SVAVISCA). The goal of the project was to build, besides the sensor, a microcamera with a special-purpose lens with a 140 degree field of view. The miniaturization of the camera is now possible because some of the electronics required to drive the sensor, as well as the analog-to-digital converter, are included on the chip itself. A picture of part of the layout of the sensor is shown in Fig. 5. Table 1 summarizes the main parameters of the sensor.

To compare, the first implementation had a Q equal to 300. Even if the total number of pixels of the 300 × 300 image was 40 times larger than that of the 2000-pixel retina-like image, its size was still well within the limits of standard computer hardware (e.g. bus bandwidth, memory, etc.). The latest sensor, if simulated using a software or hardware remapper, would require the storage and processing of an image with a number of pixels exceeding the current standard dimensions and, consequently, the design of special-purpose hardware. The silicon solution not only requires a much smaller number of pixels but, more importantly, it also has a lower power consumption and much faster read-out times – 33,000 pixels can be read about 36 times faster than 1,200,000 pixels. This advantage is bound to increase even more in the future when higher integration becomes available.

Fig. 4. Structure of the first log-polar sensor realized with CCD technology
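The read-out figure quoted above can be tallied from the Table 1 parameters (a rough sketch; read-out time is assumed to scale with pixel count):

```python
log_polar_pixels = 33193  # total pixels of the latest CMOS sensor (Table 1)
Q = 1100                  # side of the equivalent uniform image (Table 1)

uniform_pixels = Q * Q    # pixels of a uniform image with same FOV and peak resolution
ratio = uniform_pixels / log_polar_pixels

print(uniform_pixels)     # 1210000
print(round(ratio))       # 36: the read-out speed-up quoted in the text
```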

C. The Fovea of Log-Polar Sensors

The equations of the log-polar mapping have a singularity at the origin, where the size of the individual photoreceptors would theoretically go to zero. In practical terms this is not possible because – for silicon as well as biological photoreceptors – the technology limits the size of the smallest realizable photoreceptor¹. For the sensors described here, therefore, the radius of the innermost ring of the log-polar representation is constrained by the size of the smallest pixel, while the region inside this circle – the fovea – cannot follow this representation. With the CCD sensor this inner circular region was covered by a square array of pixels, all of the same size. The solution had two drawbacks. First, the square array inside the circular fovea left a portion of the visual scene unsampled; second, not only the logarithmic part of the representation was broken but also its polar component.

In the successive implementation we tried to improve on the design of the first sensor: the resulting geometry is shown in Fig. 6. The solution was to reduce the number of pixels in the foveal rings by halving their number when necessary. For example, as the periphery has 128 pixels per ring and the pixel size decreases toward the center, when the size of the pixels can no longer be reduced the successive ring accommodates only 64 pixels, until the technological limit is reached again and the number of pixels per ring is halved one more time. In summary, as can be observed in Fig. 6, the fovea contains 10 rings with 64 pixels, 5 rings with 32 pixels, 2 rings with 16 pixels and 1 ring with 8, 4, and 1 pixels respectively. It is worth noting that by employing this arrangement the polar geometry is preserved and there is no empty space between the periphery and the fovea. However, continuity in terms of spatial resolution is not preserved because, whenever the number of pixels per ring is halved, the size of the pixels almost doubles.

In our latest implementation we adopted a different solution which is optimal in the sense that it preserves the polar structure and covers the fovea with pixels of the same size. This solution is graphically shown in Fig. 7.

Fig. 5. Layout of the latest CMOS implementation

¹ In humans the smallest diameter of a photoreceptor in the fovea is of the order of 1.5 μm, while in the sensors realized so far the size of the smallest realizable photoreceptor was 30 μm, 14 μm and 7 μm for the 2000, 8000 and 33,000 pixel sensors respectively.
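The ring-halving foveal layout described above can be tallied directly, using the ring counts listed in the text (this is a bookkeeping sketch, not the sensor's addressing logic):

```python
# (number of rings, pixels per ring), from the outer fovea inward,
# as described for the IBIDEM sensor: the pixel count per ring is
# halved each time the minimum pixel size would otherwise be violated.
fovea_rings = [(10, 64), (5, 32), (2, 16), (1, 8), (1, 4), (1, 1)]

foveal_pixels = sum(n_rings * per_ring for n_rings, per_ring in fovea_rings)
print(foveal_pixels)  # 845 pixels in this foveal design
```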

IV. Applications

A. Image Transmission

Extensive experiments on wireless image transmission were conducted with a set-up built around a retina-like camera. The set-up is composed of a remote PC running a web server embedded in an application that acquires images from a retina-like camera (Giotto – see Fig. 8) and compresses them following one of the recommendations for video coding over low bit-rate communication lines (H.263 in our case).

Table 1. Descriptive Parameters of the Latest CMOS Sensor

  Peripheral pixels:  27720 (252 × 110)
  Foveal pixels:      5473
  Total pixels:       33193
  R:                  17
  Q:                  1100
  Size (diameter):    7.1 mm

Fig. 6. Layout of the fovea of the IBIDEM CMOS sensor – 0.7 μm technology

Fig. 7. Solution adopted for the arrangement of the photoreceptors in the fovea of our latest realization. With this topology the centermost region is covered uniformly with pixels of the same size


The remote receiving station was a palmtop PC acting as a client connected to the remote server through a dial-up GSM connection (9600 baud). Using a standard browser interface the client could connect to the web server, receive the compressed stream, decompress it and display the resulting images on the screen. Due to the low amount of data to be processed and sent on the line, frame rates of up to four images per second could be obtained. The only special-purpose hardware required is the Giotto camera; coding/decoding and image remapping are done in software on a 200-MHz PC (on the server side) and on the palmtop PC (on the client side). The aspects we wanted to stress in these experiments are the use of off-the-shelf components and the overall physical size of the receiver. This performance in terms of frame rate, image quality and cost clearly cannot be accomplished by using conventional cameras.

More recently (within a project called AMOVITE) we started realizing a portable camera that can be connected to the palmtop PC, allowing bi-directional image transmission through GSM or GPRS communication lines. The sensor itself is not much different from the one previously described, apart from the adoption of a companion chip allowing a much smaller camera.

B. Robotics

As far as robotics is concerned, the main advantage of log-polar sensors is related to the small number of pixels and the comparatively large FOV. For some important visual tasks images can be used as if they were high resolution. Typically, for example, in target tracking the fovea allows precise positioning and the periphery allows the detection of moving targets. This property is exploited to control the robot head of the Babybot (see (Metta 2000, Sandini 1997)). In this implementation a log-polar sensor with fewer than 3000 pixels is used to control in real time the direction of gaze of a 5 degree-of-freedom head. Aspects of the image processing required were outlined in section II.

Particularly relevant is the possibility of using vision to learn a series of diverse behaviors ranging from multisensory integration to reaching for visually identified targets. Fundamental to any other visual task is the possibility of moving the cameras so that the fovea moves and explores interesting regions of the visual field. We implemented a series of eye-head visuomotor behaviors including saccades, vergence, smooth pursuit and the vestibulo-ocular reflex (VOR). For example, vergence is controlled by relying on a measure of the binocular disparity (Manzotti et al. 2001), and dynamically by using the D component of the optic flow (see equation (3)); see (Capurro et al. 1997) for more details.

In the case of the VOR the appropriate command, using a combination of visual and inertial information, is synthesized

Fig. 8. Two different prototypes of the latest version of the Giotto camera. Both cameras are shown with a wide-field-of-view lens (140°). The rightmost model has a standard C-mount for the lens


by measuring the stabilization performance. The latter measure is an example of the kind of parameter which is best estimated by means of the optic flow. In this case the translational component provides information to the robot about the performance of the controller; a neural network can be trained on the basis of this information (Panerai et al. 2000, Panerai et al. 2002). Saccades and other visual control parameters are also learnt, as shown in (Metta et al. 2000). Visual information about suitable targets is acquired in terms of eccentricity (the angular position with respect to the current gaze direction), binocular disparity, and optic flow, as described in section II.

We also investigated the relationship between vision and the acquisition of an acoustic map of the environment (in neural network terms). In this case the robot, equipped with the appropriate learning rules, learnt autonomously the relationship between vision, sound, and saccade/pursuit behaviors (Natale et al. 2002). After a certain amount of training, the Babybot is able to orient towards a visual, acoustic, or visuo-acoustic target (a colored object emitting sound), thus bringing two non-homogeneous quantities (vision and sound) into the same reference frame.

Finally, eye-head-arm coordination was investigated. Under certain biologically compatible hypotheses, we were able to show that the robot could learn how to transform visual information (coding the position of a target) into the sequence of motor commands needed to reach the target (Metta et al. 1999) and touch it. Also in this case, vision was the organizing factor to build autonomously and on-line an internal motor representation apt to control reaching.

It is worth stressing that the whole system runs on a network of a small number of PCs (four Pentium-class processors) at frame rate and carries out on-line learning – neural network weights are changed on the fly during the operation of the robot. While the reason this is possible is related to the overall number of pixels (a limiting factor in most implementations), it was not clear beforehand whether this same visual information was enough to sustain a broad range of behaviors. We believe that this implementation provides a proof by existence that the amount of information required can indeed be obtained visually, with a moderate computational burden, if the visual scene is sampled appropriately. Log-polar sampling, in this sense, proved to be a good enough strategy.

V. Conclusions

The main advantage of retina-like sensors is represented by the compromise between resolution and field of view, allowing high-resolution as well as contextual information to be acquired and processed with limited computational power. The advantages of a silicon realization with respect to hardware and software remappers have been discussed, showing that, as technology progresses, the retina-like approach will not only maintain its current advantages but is bound to become even more interesting in the application areas described – and possibly in others. Different realizations of log-polar sensors have been illustrated, as well as two key applications. In particular, the peculiarities of retina-like sensors for the real-time control of gaze and, in general, for robotics have been presented. This is certainly the most obvious use of a sensor topology that has been shaped by evolution to support the control of behavior while maintaining the balance with energy consumption and computational requirements. In spite of the great variety of eyes found in nature, the conventional camera solution, based either on simultaneously increasing the resolution and field of view or on the use of interchangeable or variable focal length lenses, has not survived. There is no doubt, in our view, that in the future adaptable robots will have space-variant eyes or, conversely, the design will trade autonomy for batteries and computational power. The image transmission application demonstrates empirically that, in order to fully exploit a communication channel, it is better to eliminate useless information at the sensor level than to compress information which is not used.

Acknowledgements

The work described in this paper in relation to Babybot has been supported by the EU Projects COGVIS (IST-2000-29375) and MIRROR (IST-2000-28159). The research described on the retina-like CMOS technology has been supported by the EU Projects SVAVISCA (ESPRIT 21951) and AMOVITE (IST-1999-11156). The research on the CMOS sensors has been carried out in collaboration with IMEC and FillFactory (Lou Hermans and Danny Scheffer), AITEK S.p.A. (Fabrizio Ferrari, Paolo Questa) and VDS S.r.l. (Andrea Mannucci and Fabrizio Ciciani). Features of the retina-like sensor topologies described are covered by patents. Giorgio Metta is a Postdoctoral Associate at the MIT AI Lab, with a grant from DARPA as part of the "Natural Tasking of Robots Based on Human Interaction Cues" project under contract number DABT 63-00-C-10102.

References

Baron T, Levine MD, Hayward V, Bolduc M, Grant DA (1995) A biologically-motivated robot eye system. Paper presented at the 8th Canadian Aeronautics and Space Institute (CASI) Conference on Astronautics, Ottawa, Canada

Baron T, Levine MD, Yeshurun Y (1994) Exploring with a foveated robot eye system. Paper presented at the 12th International Conference on Pattern Recognition, Jerusalem, Israel

Blough PM (1979) Functional implications of the pigeon's peculiar retinal structure. In: Granda AM, Maxwell JM (eds) Neural Mechanisms of Behavior in the Pigeon (pp. 71–88). New York, NY: Plenum Press

Capurro C, Panerai F, Sandini G (1997) Dynamic vergence using log-polar images. Int J Comput Vision 24: 79–94

Carpenter RHS (1988) Movements of the Eyes (Second ed.). London: Pion Limited

Darrell T, Gordon G, Harville M, Woodfill J (2000) Integrated person tracking using stereo, color, and pattern detection. Int J Comput Vision 37: 175–185

Engel G, Greve DN, Lubin JM, Schwartz EL (1994) Space-variant active vision and visually guided robotics: design and construction of a high-performance miniature vehicle. Paper presented at the International Conference on Pattern Recognition, Jerusalem

Galifret Y (1968) Les diverses aires fonctionnelles de la rétine du pigeon. Z Zellforsch 86: 535–545

Koenderink J, Van Doorn J (1991) Affine structure from motion. J Optical Soc Am 8: 377–385

Manzotti R, Gasteratos A, Metta G, Sandini G (2001) Disparity estimation in log polar images and vergence control. Comput Vis Image Und 83: 97–117

Metta G (2000) Babybot: a study on sensori-motor development. Unpublished Ph.D. Thesis, University of Genova, Genova

Metta G, Carlevarino A, Martinotti R, Sandini G (2000) An incremental growing neural network and its application to robot control. Paper presented at the International Joint Conference on Neural Networks, Como, Italy

Metta G, Sandini G, Konczak J (1999) A developmental approach to visually-guided reaching in artificial systems. Neural Networks 12: 1413–1427

Natale L, Metta G, Sandini G (2002) Development of auditory-evoked reflexes: visuo-acoustic cues integration in a binocular head. Robot Auton Syst 39: 87–106

Panerai F, Metta G, Sandini G (2000) Visuo-inertial stabilization in space-variant binocular systems. Robot Auton Syst 30: 195–214

Panerai F, Metta G, Sandini G (2002) Learning stabilization reflexes in robots with moving eyes. Neurocomputing, in press


Rojer A, Schwartz EL (1990) Design considerations for a space-variant visual sensor with complex-logarithmic geometry. Paper presented at the 10th International Conference on Pattern Recognition, Atlantic City, USA

Sandini G (1997) Artificial systems and neuroscience. Paper presented at the Otto and Martha Fischbeck Seminar on Active Vision, Berlin, Germany

Sandini G, Tagliasco V (1980) An anthropomorphic retina-like structure for scene analysis. Comp Vision Graph 14: 365–372

Schwartz EL (1980) A quantitative model of the functional architecture of human striate cortex with application to visual illusion and cortical texture analysis. Biol Cybern 37: 63–76

Srinivasan MV, Venkatesh S (1997) From Living Eyes to Seeing Machines. London: Oxford University Press

Tunley H, Young D (1994) First order optical flow from log-polar sampled images. Paper presented at the Third European Conference on Computer Vision, Stockholm

Van der Spiegel J, Kreider G, Claeys C, Debusschere I, Sandini G, Dario P, Fantini F, Bellutti P, Soncini G (1989) A foveated retina-like sensor using CCD technology. In: Mead C, Ismail M (eds) Analog VLSI Implementation of Neural Systems (pp. 189–212). Boston: Kluwer Academic Publishers

Wallace RS, Ong PW, Bederson BB, Schwartz EL (1994) Space variant image processing. Int J Comput Vision 13: 71–91

Weiman CFR (1988) 3-D sensing with polar exponential sensor arrays. Paper presented at the SPIE – Digital and Optical Shape Representation and Pattern Recognition

Weiman CFR, Chaikin G (1979) Logarithmic spiral grids for image processing and display. Computer Graphics and Image Processing 11: 197–226

Weiman CFR, Juday RD (1990) Tracking algorithms using log-polar mapped image coordinates. Paper presented at the SPIE International Conference on Intelligent Robots and Computer Vision VIII: Algorithms and Techniques, Philadelphia (PA)
