IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS, VOL. 4, NO. 4, AUGUST 2010

Prototype of Video Endoscopic Capsule With 3-D Imaging Capabilities

Anthony Kolar, Olivier Romain, Jad Ayoub, Sylvain Viateur, and Bertrand Granado

Abstract—Wireless video capsules can now carry out gastroenterological examinations. The images make it possible to analyze some diseases during postexamination, but the gastroenterologist could make a direct diagnosis if the video capsule integrated vision algorithms. The first step toward in situ diagnosis is the implementation of 3-D imaging techniques in the video capsule. By transmitting only the diagnosis instead of the images, the video capsule autonomy is increased. This paper focuses on the Cyclope project, an embedded active vision system that is able to provide 3-D and texture data in real time. The challenge is to realize this integrated sensor with constraints on size, consumption, and processing, which are inherent limitations of the video capsule. We present the hardware and software development of a wireless multispectral vision sensor which enables the transmission of the 3-D reconstruction of a scene in real time. An FPGA-based prototype has been designed to show the proof of concept. Experiments in the laboratory, in vitro, and in vivo on a pig have been performed to determine the performance of the 3-D vision system. A roadmap toward the integrated system is set out.

Index Terms—3-D reconstruction, 3-D sensor, integrated sensor, medical applications, wireless sensor.

I. INTRODUCTION

EXAMINATION of the whole gastrointestinal tract represents a challenge for endoscopists due to its length and inaccessibility through natural orifices. Since 1994, video capsules (VCE) [1], [2] have been developed to allow direct examination of this inaccessible part of the gastrointestinal tract and help doctors to find the cause of symptoms, such as stomach pain, Crohn's disease, diarrhea, weight loss, rectal bleeding, anemia, etc.

The most popular video capsule is the Pillcam of Given Imaging. This autonomous embedded system acquires about 50 000 images of the gastrointestinal tract during 8 h of analysis. The offline processing of these images and their interpretation by the practitioner permit determining the origin of the disease. However, recently published benchmarks show some limitations of this video capsule regarding the quality of the images and the inaccuracy of the estimated size of polyps. This is a real problem for the practitioner because, in some cases,

Manuscript received July 19, 2009; revised February 17, 2010. Date of current version July 28, 2010. This paper was recommended by Associate Editor Z. Wang.

A. Kolar, O. Romain, J. Ayoub, and S. Viateur are with Universite Pierre et Marie CURIE-Paris VI Equipe SYEL, Paris 75005, France (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

B. Granado is with the ETIS CNRS-ENSEA-Univ Cergy Pontoise, Cergy F95000, France (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TBCAS.2010.2049265

an operation is necessary only if polyps exceed a minimum size, and, at the moment, the estimation mainly depends on the practitioner's experience. One of the solutions could be to use 3-D imagery techniques, either directly in the video capsule or on a remote computer. The latter solution is actually used with the Pillcam capsule: the 2–4 images taken per second are transmitted by wireless communication to a recorder worn around the waist, and 3-D processing is performed offline from an estimation of the displacement of the capsule. However, the speed of the video capsule is not constant; for example, in the oesophagus, it is 8 cm/s; in the stomach, it is almost null; and in the intestine, it is 0.08 cm/s. Consequently, by taking images at a constant frequency, some zones of the transit will not be rebuilt. Moreover, the regular transmission of the images through the body consumes much energy and limits the autonomy of video capsules to 8 h. The ideal would be to reduce the quantity of information to be transmitted, for example to a diagnosis alone. The first development necessary for the delivery of such a diagnosis is the application of pattern-recognition algorithms to 3-D information.

Introducing accurate 3-D reconstruction techniques inside a video capsule means that we must realize a very small embedded system that performs the 3-D processing under hard constraints, such as size and low power consumption. To obtain this kind of system, the first step is to find the best 3-D technique from which we can extract an algorithm that computes an accurate 3-D reconstruction in real time, i.e., within 40 ms per image. The most common 3-D reconstruction techniques are those based on passive or active stereoscopic vision methods, where image sensors provide the information necessary to retrieve depth. The passive method consists of taking two images of a scene from two different points of view. Unfortunately, with this method, only characteristic points, with a high gradient or strong texture, can be detected [3]. Active stereovision methods offer an alternative to the use of two cameras where processing time is critical. They consist in replacing one of the two cameras by a projection system which projects a set of structured rays. In this case, only one imager is necessary. Many implementations of active stereovision methods have been realized in the past [4], [5] and provided significant results by using a traditional computer to apply these methods.

In this paper, we focus on the description of Cyclope, an integrated 3-D active vision sensor [3] that could be used to realize a video capsule with 3-D vision capacity. We present Cyclope from the architecture to the reconstruction results obtained with the demonstrator. Section II briefly describes Cyclope, and Section III deals with the principles of the active stereovision system and the 3-D reconstruction method, explaining the problem statement.



Fig. 1. Cyclope diagram.

In Section IV, we discuss the multispectral time acquisition. In Section V, we present the implementation of the optical correction developed to correct the lens distortion. Section VI deals with the implementation of new thresholding and labeling methods. In Sections VII and VIII, we present the matching processing that yields a 3-D representation of the scene. Finally, in Section IX, before the conclusion and perspectives of this paper, we present the lab prototype and its performance, which attests to the feasibility of this original approach.

II. CYCLOPE

A. Overview of the Architecture

Cyclope is an integrated wireless 3-D vision system based on an active stereovision technique. It uses several different algorithms to increase accuracy and reduce processing time. For this purpose, the sensor is composed of three blocks (see Fig. 1):

1) an instrumentation block, composed of a complementary metal–oxide semiconductor (CMOS) camera and a structured-light projector in an IR band;

2) a processing block, which integrates a microprocessor core and a reconfigurable array; the microprocessor is used for sequential processing, and the reconfigurable array is used to implement parallel algorithms;

3) an RF block, dedicated to over-the-air (OTA) communications.

The feasibility of Cyclope was studied through an implementation on a system-on-programmable-chip (SOPC) target. These three parts will be realized in different technologies: 1) CMOS for the image sensor and the processing units; 2) gallium arsenide (GaAs) for the pattern projector; and 3) RF-CMOS for the communication unit. The development of this system in package (SIP) is actually the best solution to overcome the technological constraints and realize a chip-scale package. This solution is used in several embedded sensors, such as the Human++ platform [6] or Smart Dust [7].

B. Multispectral Acquisition

The spectral response of silicon cuts off near 1100 nm and covers the ultraviolet (UV) to near-infrared domains. This important characteristic allows defining a multispectral acquisition by grabbing the color texture image in the visible band and the depth information in the near-infrared band. Cyclope uses this original acquisition method, which permits directly accessing the depth information independently of the texture-image processing.

Fig. 2. Active stereo-vision system.

This characteristic reduces the processing time of the 3-D reconstruction by minimizing the computation needed to retrieve the points of interest.

The projected pattern is obtained through an IR laser diode and a diffractive optical head. The projector illuminates the studied scene with an array of laser beams, each ray separated from its neighbors by a fixed and equal angle. Thus, on the IR band, we can acquire the pattern projected on the scene.

The combination of the acquisition of the projected pattern in the infrared band, the acquisition of the texture in the visible band, and the mathematical model of the active 3-D sensor makes it possible to restore a 3-D textured representation of the scene.

C. Principle of the 3-D Reconstruction

The basic principle of 3-D reconstruction is triangulation. Knowing the distance between the camera and the laser projector, and defining the lines of view, one passing through the center of the camera and the other through the object, we can find the object distance.

Active 3-D reconstruction is a method that increases the accuracy of the 3-D reconstruction by projecting a structured pattern on the scene. The matching is largely simplified because the points of interest needed for reconstruction are obtained by extracting the pattern from the image. This also has the effect of increasing the processing speed.

The setup of the active stereovision system is represented in Fig. 2. The distance between the camera and the laser projector is fixed. The projection of the laser beams on a plane gives a matrix of IR spots.

The 3-D reconstruction is achieved through triangulation between the laser and the camera. Each point of the projected pattern on the scene represents the intersection of two lines:

• the line of sight, passing through the pattern point on the scene and its projection in the image plane;

• the laser ray, starting from the projection center and passing through the chosen pattern point.

For Cyclope, the pattern is a regular mesh of points. For each point $k$ of the pattern, we can find the corresponding epipolar line

$$y = a_k x + b_k \qquad (1)$$


Fig. 3. Acquisition and 3-D reconstruction flowchart.

where $(x, y)$ are the image coordinates, and the parameters $a_k$ and $b_k$ are estimated through an offline calibration process.

In addition to the epipolar lines, we can establish the relation between the position of a laser spot in the image and its distance to the stereoscopic system.

Thus, for each pattern point $k$, we can express the depth as a hyperbolic function

$$z_k = \frac{A_k}{x - B_k} \qquad (2)$$

where the $A_k$ and $B_k$ parameters are also estimated during the offline calibration of the system [8].
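To make the two calibrated models concrete, the sketch below pairs a spot with its epipolar line (1) and then evaluates the hyperbolic depth model (2). It is a minimal illustration, not the paper's VHDL pipeline: the parameter names (a_k, b_k, A_k, B_k) and all numeric values are assumptions standing in for the results of the offline calibration of [8].

```python
# Minimal sketch of the calibrated 3-D recovery of Section II-C.
# (a_k, b_k): epipolar line parameters of (1); (A_k, B_k): depth
# model parameters of (2). All values here are made up.

def match_epipolar(x, y, lines, tol):
    """Return the index of the epipolar line y = a*x + b closest to
    the spot (x, y), or None if no line is within `tol` pixels."""
    best = None
    for k, (a, b) in enumerate(lines):
        err = abs(y - (a * x + b))
        if err < tol and (best is None or err < best[1]):
            best = (k, err)
    return None if best is None else best[0]

def depth(x, A, B):
    """Hyperbolic depth model of (2): z = A / (x - B)."""
    return A / (x - B)

# Usage with placeholder calibration values:
lines = [(0.010, 120.0), (0.012, 140.0)]         # (a_k, b_k) per ray
depth_params = [(1500.0, 40.0), (1520.0, 42.0)]  # (A_k, B_k) per ray
k = match_epipolar(200.0, 122.3, lines, tol=5.0)
if k is not None:
    z = depth(200.0, *depth_params[k])
```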

III. 3-D PROCESSING AND PROBLEM STATEMENT

A. 3-D Processing Flowchart

Image processing is composed of several stages, from the acquisition up to the matching algorithm that retrieves the 3-D information, as shown in Fig. 3. The multispectral acquisition is realized by using a high-energy method, which allows a CMOS camera to capture the scene illuminated by the structured light source; the image is then stored into memory. The captured image then undergoes a preprocessing phase to extract the useful data concerning the light spots.

The first stage of the processing phase is to correct the pixel coordinates for optical lens distortion. Lens distortion prevents the accurate perception of range [9], because the true coordinates of the laser spots are deviated, which makes measurement and distance judgment difficult. A distortion-correction process is therefore necessary to accurately reconstruct the 3-D coordinates of an object. The next stage is to apply a thresholding process in order to separate the bright areas from the dark areas; the laser spots belong to the bright areas. The thresholding process often produces an image that is less than perfect; a common problem is noise produced by incoherent lighting. It is often desirable to process the binary image before analysis to remove these abnormalities. An adaptive algorithm has been developed and implemented in hardware to perform this gray-level thresholding.

The classification of the laser spots is then accomplished by applying morphological operations (erosion) on the binary picture, followed by segmentation and labeling algorithms on the bright areas.

After that, the spot centers have to be matched to their corresponding epipolar lines. The distance between each spot and the stereoscopic system is computed from the depth model. The depth model and the epipolar lines are obtained offline through a calibration process [4]. At the end, the 3-D coordinates of the laser spots are sent by wireless communication to a distant PC to visualize the textured color reconstruction of the scene.

B. Problem Statement

The general problems of 3-D reconstruction in embedded systems with limited resources are the accurate determination of interest points with minimum error and excessive processing time. To solve these problems, it is necessary to develop parallel algorithms and to reduce the errors throughout the computation, from the acquisition to the determination of the 3-D coordinates.

The 3-D reconstruction of an object depends on the center coordinates of the light spots in the image plane. Thus, any inaccurate representation of these points will strongly affect the accuracy of the 3-D estimation stage. Indeed, many problems were encountered when performing this procedure; in this paper, we focus on four important aspects:

1) the multispectral acquisition of the laser spots must be performed without additional optical devices, such as optical filters on the lens, which can affect the image coordinates;

2) lens distortion prevents the accurate perception of range [9], because the true coordinates of the laser spots are deviated;

3) the image, the labeling, and the coordinates of the spot centers must be computed without consuming a considerable amount of resources, while meeting the high accuracy requirements of our embedded system;

4) the in vivo proof of concept of a 3-D video capsule.

IV. HIGH-ENERGY APPROACH FOR MULTISPECTRAL ACQUISITION

One of the originalities of Cyclope lies in an image acquisition that simplifies the 3-D reconstruction process by using only the information in the IR spectral band. For this purpose, the imager realizes a multispectral acquisition based on spectral separation. Generally, filters are used to cut the spectral response. Here, we use a high-energy method, which has the advantage of being generic for imagers.

To allow real-time acquisition of pattern and texture, we developed a 64 × 64-pixel CMOS imager in a 0.8-µm process, for a total surface of 20 mm². This sensor has programmable light-integration and shutter times to allow dynamic changes. It was designed to have a large response in the visible and near IR, and we characterized it to validate our high-energy approach. But its size, 64 × 64 pixels, is too small to be used in endoscopy. For this reason, we also used a commercial CIF imager (352 × 288) to validate our work on bigger images.

The projector periodically pulses a high-energy IR pattern onto the scene. An image acquisition with a short integration time enables grabbing the image of the pattern without the background texture. A second acquisition with a longer integration time enables grabbing the texture while the projector is off. Fig. 4 shows the sequential scheduling of the image acquisitions.

Typically, the short integration time is much shorter than the long one, which is 10 ms.
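The scheduling of Fig. 4 can be summarized as the following sketch. The camera/projector driver API (set_integration_time, grab, pulse) is hypothetical, as is the short integration time (its exact value was lost in transcription); only the ordering of operations and the 10-ms long exposure come from the text.

```python
# Sketch of the dual-exposure scheduling of Fig. 4, against a
# hypothetical driver API; only the ordering and the 10-ms long
# integration time are from the paper.

SHORT_T = 0.0005   # short integration time (assumed value, seconds)
LONG_T = 0.010     # long integration time: 10 ms (from the text)

def acquire_pair(camera, projector):
    # 1) Pulse the IR pattern and grab with a short integration time:
    #    only the high-energy laser spots rise above the background.
    projector.pulse()
    camera.set_integration_time(SHORT_T)
    pattern_image = camera.grab()
    # 2) Projector off, long integration time: texture image.
    camera.set_integration_time(LONG_T)
    texture_image = camera.grab()
    return pattern_image, texture_image
```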

V. DISTORTION CORRECTION

A. Mathematical Background

In vision, a projective model of the camera is commonly used to establish the relationship [9] between world coordinates and


Fig. 4. Acquisition sequence.

image coordinates. In real systems, for example endoscopes and video capsules, where high imaging accuracy is required, the projective model is not appropriate due to optical aberrations [10]. In this case, a more comprehensive camera model must be used, taking into account corrections for the systematically distorted image coordinates. As a result of several types of imperfections in the design and assembly of the lenses composing the camera optical system [11], the ideal projection of a point P in the image plane will be replaced by expressions that take into account the error between the real observed image coordinates and the corresponding ideal (nonobservable) image coordinates

$$x' = x + \delta_x(x, y), \qquad y' = y + \delta_y(x, y) \qquad (3)$$

where $(x, y)$ are the ideal, nonobservable, distortion-free image coordinates; $(x', y')$ are the corresponding real coordinates; and $\delta_x$ and $\delta_y$ are, respectively, the distortion along the $x$ and $y$ axes. Usually, the lens distortion consists of radial symmetric distortion, decentering distortion, affinity distortion, and nonorthogonality deformations.

With reference to [12]–[15], the total amount of distortion along the $x$ and $y$ axes can be modeled by polynomial expressions.

Assuming that the low-order terms are enough to compensate for the distortion and that higher-order terms are negligible, we obtain a camera model in the form of fifth-order polynomials

$$x_d = \sum_{i=0}^{5} \sum_{j=0}^{5-i} a_{ij}\, x^i y^j, \qquad y_d = \sum_{i=0}^{5} \sum_{j=0}^{5-i} b_{ij}\, x^i y^j \qquad (4)$$

where $(x_d, y_d)$ are the distorted image coordinates in pixels, and $(x, y)$ are the true (undistorted) coordinates.

B. Hardware Implementation

Once the parameter values of (4) are obtained offline through a calibration process, we use them to correct the distortion of each frame. With the input frame captured by the camera denoted as the source image and the corrected output as the target image, the task of correcting the distorted source image can be defined as follows: for every pixel location in the target image, compute its corresponding pixel location in the source image. Distortion correction has been implemented as a lookup table (LUT): we evaluate the correction polynomials in advance and store the resulting image coordinates in a lookup table that is referenced at runtime. All of the parameters needed for LUT generation are known beforehand; therefore, for our system, the LUT is computed only once, offline.

However, since the source pixel location can be a real number, computing the actual pixel values of the target image requires some form of pixel interpolation. For this purpose, we used the nearest-neighbor approach, which assigns to the target coordinate the pixel value closest to the predicted coordinate. This choice is reasonable because it is simple and fast to compute, and visible image artifacts are not an issue for our system.
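The LUT scheme can be illustrated as below. This is a software sketch of the idea, not the VHDL IP: poly_x and poly_y are placeholders for the calibrated fifth-order polynomials of (4), and the identity "polynomial" in the usage example is purely illustrative.

```python
import numpy as np

# Sketch of the LUT-based correction of Section V-B: for each target
# pixel, precompute the distorted source location from the polynomial
# model (4), round it (nearest-neighbor interpolation), and store it.

def build_lut(h, w, poly_x, poly_y):
    lut = np.empty((h, w, 2), dtype=np.int32)
    for y in range(h):
        for x in range(w):
            # The polynomial model maps undistorted -> distorted coords.
            xs, ys = poly_x(x, y), poly_y(x, y)
            # Nearest neighbor: round and clamp into the source image.
            lut[y, x, 0] = min(max(int(round(ys)), 0), h - 1)
            lut[y, x, 1] = min(max(int(round(xs)), 0), w - 1)
    return lut                                 # computed once, offline

def correct(frame, lut):
    return frame[lut[..., 0], lut[..., 1]]     # one lookup per pixel

# Example with an identity "polynomial" standing in for the
# calibrated model, on a CIF-sized frame:
lut = build_lut(288, 352, lambda x, y: x, lambda x, y: y)
```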

The proposed architecture has been implemented in VHDL. The pixel values of the input distorted and output corrected images use an 8-b integer word length; the coordinates use an 18-b word length.

VI. THRESHOLDING AND LABELING

The projected laser spots must be extracted from the image in order to deliver a 3-D representation of the scene. The laser spots appear in the image as spots of variable size. At this level, a preprocessing block has been developed and implemented in hardware to perform adaptive thresholding, which provides a binary image, and labeling, which classifies each laser spot so that its center can later be computed.

A. Thresholding Algorithm

The threshold value is the key parameter in thresholding. Several methods exist, from a static value defined by the user up to dynamic algorithms such as the Otsu method [16].

We have chosen to develop a new approach that is less complex than Otsu or other well-known dynamic methods in order to reduce processing time [17]. This simple method consists of the following steps (Fig. 5):

• build the histogram of the gray-level image;

• find the first maximum of the Gaussian corresponding to the background, and compute its mean $\mu$ and standard deviation $\sigma$;

• calculate the threshold value with

$$T = \mu + k\,\sigma \qquad (5)$$

where $k$ is an arbitrary constant. A parallel processing architecture has been designed to compute the threshold and produce a binary image. The implementation of this parallel architecture is summarized in Fig. 6.
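A software rendering of this thresholding method follows. It is a sketch under stated assumptions: the background mode is taken as the histogram's dominant peak, and the statistics are computed over a window of assumed width around it; k and peak_width are tuning parameters of this sketch, not values from the paper.

```python
import numpy as np

# Sketch of the histogram-based threshold of Fig. 5 and (5): locate
# the background peak, estimate its mean and standard deviation, and
# threshold at mean + k * std.

def adaptive_threshold(img, k=3.0, peak_width=32):
    """img: 2-D uint8 gray-level image. Returns a boolean binary image."""
    hist = np.bincount(img.ravel(), minlength=256)
    peak = int(np.argmax(hist))                 # background mode
    lo, hi = max(0, peak - peak_width), min(256, peak + peak_width)
    levels = np.arange(lo, hi)
    weights = hist[lo:hi].astype(float)
    mean = np.average(levels, weights=weights)
    std = np.sqrt(np.average((levels - mean) ** 2, weights=weights))
    return img > (mean + k * std)               # bright areas = spots
```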


Fig. 5. Method developed in Cyclope.

Fig. 6. Architecture of the IP of thresholding.

B. Labeling

After this first stage of extracting the laser spots from the background, it is necessary to classify each laser spot in order to compute its center separately. Several methods have been developed in the past. We chose a classical two-pass connected-component labeling algorithm [18] with 8-connectivity, and designed a specific optimized IP block in VHDL.
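For reference, a straightforward software version of the classical two-pass 8-connectivity algorithm of [18] is sketched below; the union-find equivalence table is one common way to implement the second pass, not necessarily the structure used in the VHDL IP.

```python
import numpy as np

# Classical two-pass connected-component labeling, 8-connectivity.

def label(binary):
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    parent = [0]                       # union-find equivalence table

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    next_label = 1
    for y in range(h):                 # first pass: provisional labels
        for x in range(w):
            if not binary[y, x]:
                continue
            # already-visited 8-neighbors: W, NW, N, NE
            neigh = [labels[y, x - 1] if x > 0 else 0,
                     labels[y - 1, x - 1] if x > 0 and y > 0 else 0,
                     labels[y - 1, x] if y > 0 else 0,
                     labels[y - 1, x + 1] if y > 0 and x < w - 1 else 0]
            neigh = [n for n in neigh if n]
            if not neigh:
                parent.append(next_label)
                labels[y, x] = next_label
                next_label += 1
            else:
                m = min(find(n) for n in neigh)
                labels[y, x] = m
                for n in neigh:        # record label equivalences
                    parent[find(n)] = m

    for y in range(h):                 # second pass: resolve equivalences
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels
```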

VII. COMPUTATION OF SPOTS CENTERS

The thresholding and labeling processes applied to the captured image allow us to determine the area of each spot (its number of pixels). The coordinates of the center of each spot can then be calculated as

$$x_k^c = \frac{1}{N_k} \sum_{i=1}^{N_k} x_i \qquad (6)$$

$$y_k^c = \frac{1}{N_k} \sum_{i=1}^{N_k} y_i \qquad (7)$$

where $x_k^c$ and $y_k^c$ are the abscissa and ordinate of the $k$th spot center; $x_i$ and $y_i$ are the coordinates of the pixels constituting the spot; and $N_k$ is the number of pixels of the $k$th spot (its area in pixels).

The goal of our work in this section is to compute the spot centers while taking into account the precision demands of our implementation. The hardest step in the center detection is the division operation A/B in (6) and (7). Several methods can be used to solve this problem [19].

TABLE I
FILTER COEFFICIENTS INDEXED BY SPOT SIZE

Here, we describe the implemented method. The area of each spot (its number of pixels) is always a positive integer whose value lies in a predetermined interval [Nmin, Nmax], where Nmin and Nmax are, respectively, the minimum and maximum areas of a laser spot in the image. The spot areas depend on the object illumination, on the distance between the object and the camera, and on the angle of view of the scene. Our method consists of realizing a finite-impulse-response (FIR) filter that replaces the division by a multiplication to calculate the center of each spot

$$x_k^c = \sum_{i=1}^{N_k} h_i\, x_i, \qquad y_k^c = \sum_{i=1}^{N_k} h_i\, y_i \qquad (8)$$

with coefficients indexed by spot size (Table I). In our case, the filter coefficients are constant and equal: for the spot $k$, with an area equal to $N_k$, the filter coefficients are $h_1 = h_2 = \dots = h_{N_k} = 1/N_k$. In other words, it is sufficient to perform a simple inner product between the inputs, which are the pixel coordinates of each spot, and a short predetermined sequence of constant filter coefficients stored in a lookup table (Table I); the coefficients are inversely proportional to the area of each spot, and the contents of the LUT are indexed by spot size.

For a source of luminous spots, the number of operations needed to compute the center coordinates at 25 frames/s grows with the number of spots and with $N_p$, the average area of the spots.
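The division-free center computation of (6)-(8) can be sketched as follows: the reciprocal 1/N is read from a small LUT indexed by spot area, so each center is an inner product (a multiply-accumulate) instead of a division. The N_MIN/N_MAX bounds are assumed values, and floating point stands in for the fixed-point arithmetic of the hardware.

```python
# Sketch of the LUT/FIR center computation of Section VII.

N_MIN, N_MAX = 4, 256                       # assumed spot-area bounds
RECIP_LUT = {n: 1.0 / n for n in range(N_MIN, N_MAX + 1)}

def spot_center(pixels):
    """pixels: list of (x, y) coordinates belonging to one spot."""
    n = len(pixels)
    h = RECIP_LUT[n]                        # filter coefficient 1/N
    xc = sum(x for x, _ in pixels) * h      # inner product, no division
    yc = sum(y for _, y in pixels) * h
    return xc, yc

# Example: a 2x2 spot whose center is (10.5, 20.5)
print(spot_center([(10, 20), (11, 20), (10, 21), (11, 21)]))
```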

VIII. MATCHING ALGORITHM

The sets of parameters for the epipolar and depth models are used at runtime to perform the point matching (identifying the original position of a pattern point from its image) and to calculate the depth with the appropriate parameters.

The architecture is divided into two principal parts:

1) the preprocessing unit, for low-level image processing and feature extraction from the IR image;

2) the 3-D unit, for point matching and depth calculation.

Fig. 7 shows the global processing architecture. A dual-port memory is used for image storage, allowing asynchronous image acquisitions. The processing implemented in the preprocessing unit comprises thresholding, segmentation, and spot-center calculation.

A FIFO enables the communication between the two units, and a final storage FIFO allows communication toward an external UART.

A. 3-D Unit

In this part, we present the 3-D extraction method for onlineprocessing. To achieve this purpose, we have designed a paralleldigital processing unit (Fig. 8).


Fig. 7. Global-processing architecture.

Fig. 8. 3-D unit.

Fig. 9. Estimation block.

After a preprocessing step, in which the laser-spot centers are estimated from the IR image, the coordinates of each point are directed to the processing unit, where the depth coordinate is calculated.

Starting from the point abscissa $x$, we calculate the estimated ordinate $\hat{y}_k$ it would have if it belonged to epipolar line $k$, and we compare this estimation with the true ordinate $y$.

Fig. 10. Comparison bloc.

These operations are performed for all of the epipolar lines simultaneously. After thresholding, the encoder returns the index of the corresponding epipolar line.

The next step is to calculate the $z$ coordinate from the $x$ coordinate and the appropriate depth-model parameters $(A_k, B_k)$.

These computation blocks are synchronous and pipelined, thus allowing high processing rates.

B. Estimation Block

In this block, the estimated ordinate $\hat{y}_k = a_k x + b_k$ is calculated. The parameters $(a_k, b_k)$ are loaded from memory.

C. Comparison Block

In this block, the absolute value of the difference between the ordinate $y$ and its estimation $\hat{y}_k$ is calculated. This difference is then thresholded.

Thresholding avoids a resource-consuming sorting stage. The threshold was chosen a priori as half the minimum distance between two consecutive epipolar lines, and it can be adjusted for each comparison block.

This block returns a "1" if the distance is below the threshold.

D. Encoding Block

If the comparison blocks return a unique "1" result, the encoder returns the corresponding epipolar-line index.

If no comparison block returns a "true" result, the point is irrelevant and considered as picture noise.

If more than one comparison block returns "1," we consider that there is a correspondence error, and a flag is set.

The selected index is then carried to the next stage, where the $z$ coordinate is calculated; it selects the proper parameters of the depth model. We have chosen to calculate $1/z$ rather than $z$ in order to have a simpler computation unit: since $z$ is a hyperbolic function of $x$ in (2), $1/z$ is an affine function of $x$, and this computation block is therefore identical to the estimation block.
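The estimation, comparison, and encoding blocks of Figs. 8-10 can be rendered in software as below, with NumPy evaluating all epipolar lines at once where the hardware does so in parallel. The calibration arrays and the threshold value are assumed placeholders; the 1/z affine evaluation follows the design choice described above.

```python
import numpy as np

# Software rendering of the 3-D unit's matching pipeline.
a = np.array([0.010, 0.012, 0.014])     # epipolar slopes (assumed)
b = np.array([120.0, 140.0, 160.0])     # epipolar intercepts (assumed)
A = np.array([1500.0, 1520.0, 1540.0])  # depth parameters (assumed)
B = np.array([40.0, 42.0, 44.0])

def match_and_depth(x, y, threshold=2.5):
    y_hat = a * x + b                   # estimation blocks (parallel)
    hits = np.abs(y - y_hat) < threshold   # comparison blocks
    n = int(hits.sum())
    if n == 0:
        return None                     # irrelevant point: picture noise
    if n > 1:
        raise ValueError("correspondence error flag")
    k = int(np.argmax(hits))            # encoding block: line index
    inv_z = (x - B[k]) / A[k]           # 1/z is affine in x, per (2)
    return k, 1.0 / inv_z
```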

E. Wireless Communication

Finally, after computation, the 3-D coordinates of the laser dots, accompanied by the texture image, are sent to an external reader. Cyclope is therefore equipped with a wireless communication block, which allows us to transmit the texture image and


Fig. 11. Setup of the optical system.

the 3-D coordinates of the centers of the laser spots. While awaiting the IEEE 802.15 Body Area Network standard, the frequency assigned for implanted-device RF communication is around 403 MHz, referred to as the medical implant communication system (MICS) band, for essentially three reasons:

1) a small antenna;

2) a minimum-loss environment, which allows designing a low-power transmitter;

3) a free band that does not cause interference to other users of the electromagnetic radio spectrum [20].

IX. EXPERIMENTAL DEMONSTRATOR

To demonstrate the feasibility of our system, a large-scale demonstrator has been realized. It uses a field-programmable gate-array (FPGA) prototyping board based on a Xilinx Virtex2Pro, a pulsed IR laser projector, and an original CMOS imager.

Fig. 11 shows the dimensions of the mechanical parts of the optical system. The 15-cm distance between the camera and the laser is fixed by construction. All of the results presented depend on this configuration.

In order to quickly equip our prototype with wireless communication, we chose to use a commercially available ZigBee module at 2.45 GHz instead of a MICS module, although we are aware that this frequency is not usable for communication between an implant and an external reader, due to the electromagnetic losses of the human body. Two Xbee-pro modules from Digi Corp. have been used: one for the demonstrator and a second plugged into a PC host, where a human–machine interface has been designed to visualize the 3-D textured reconstruction of the scene in real time. Communication between the wireless module and the FPGA circuit is performed by a standard Universal Asynchronous Receiver Transmitter (UART) protocol. To implement this communication, we integrated a MicroBlaze softcore processor with UART functionality. The softcore recovers all of the data stored in memory (texture and 3-D coordinates) and sends them to the wireless module.

Fig. 13 represents the experimental setup. It is composed of a standard 3-mm lens, the developed CMOS camera with an external 8-b analog-to-digital converter (ADC), an IR pattern projector, and a Virtex2Pro prototyping board.

Fig. 12. Implementation architecture.

Fig. 13. Demonstrator.

Fig. 14. Wireless communication.

The FPGA is used to control the image acquisition, the laser synchronization, the analog-to-digital conversion, and the image storage, and it displays the result through a video-graphics-array (VGA) interface.

Fig. 12 shows the principal parts of the control and storage architecture as set in the FPGA. Five parts have been designed:

1) a global sequencer to control the entire process;

2) a reset and integration-time configuration unit;

3) a VGA synchronization interface;

4) a dual-port memory to store the images and to allow asynchronous acquisition and display operations;

5) a wireless communication module based on the ZigBee protocol (Fig. 14).

A separate pulsed IR projector has been added to the system to demonstrate system functionality.


TABLE II
AREA AND CLOCK CHARACTERISTICS OF THE WHOLE IMPLEMENTATION

Fig. 15. (a) Checkerboard image before distortion correction. (b) Checkerboard image after correction.

X. EXPERIMENTAL TEST BENCH AND RESULTS

The computation unit was described in VHDL and implemented on an FPGA. The FPGA device is a Xilinx Virtex-II Pro (XC2VP30) with 30,816 logic cells and 136 hardware multipliers [21].

A. Architecture Performance

Performance results are presented in Table II. For these results, the correction model is applied only to the active light spots, not to the entire image. The estimation error is less than 0.01 pixels.

Regarding latency, the processing can be executed within the real-time constraint of the video cadence (25 frames/s), i.e., 40 ms per frame.

B. Depth Estimation Error of the 3-D Reconstruction Made by the Demonstrator

In 3-D imaging, the most important parameter is the accuracy of the reconstruction. To evaluate this parameter, we made several measurements, first estimating the benefit of the distortion correction.

Fig. 15 presents an example image before and after correction of the lens distortion. Regarding size and latency, the results are clearly suitable for our application.

In stereoscopic vision systems, the error is a quadratic function of the distance. We estimated the error of the 3-D reconstruction made by the demonstrator on a bench: as shown in Fig. 16, a plane is moved in front of the system to different known positions.

Fig. 16. Setup to determine the error estimation of the 3-D reconstruction.

Fig. 17. Error comparison before and after applying distortion correction and center recomputation.

At each position of the plane, a 3-D reconstruction is made by the system. We determine the median value of the estimated position of the plane, which we compare with the known position.

We then compare the error before and after the correction. We also estimate this error with a scale factor so as to be representative of an endoscopic examination (Fig. 17).

Comparing many measurements of the depth estimation before and after implementing the correction, the results indicate that the precision of the system increased: the residual error is reduced by about 33%. At a distance of 10 cm, we obtained on average a 1.5% error.

C. Error Estimation of the 3-D Reconstruction Made by the Integrated Version of the Demonstrator

In the near future, we want to integrate our demonstrator in order to validate this new video capsule in vivo. We can already give an error estimation for the 3-D reconstruction made by this integrated version through a mathematical extrapolation: if we consider that the video capsule will have a stereoscopic distance of 5 mm, an error of 3.54% is expected at a distance of 10 cm (Table III). This precision is sufficient within the


TABLE III
PRECISION WITH A 5-mm STEREOSCOPIC BASE

TABLE IV
PROCESSING BLOCK POWER CONSUMPTION ESTIMATION

framework of the exploration of the human body, where the organs of interest have lengths of about 5 cm.

D. Power Estimation of the Integrated Version

To validate the power-consumption estimation in an embedded context and in the worst case, i.e., real-time acquisition and reconstruction, we consider a 3-V CR1220 battery with a maximum capacity of 180 mAh, which means an ideal energy of 540 mWh with three batteries. This battery is fully compatible with a VCE such as the Pillcam from Given Imaging. As we can see, the integration of SRAM FPGA technology in a VCE is impossible because the SRAM memory consumes too much energy. If we consider an EEPROM technology, such as the Actel solution, we observe that its power consumption is compatible with a VCE: this technology permits four hours of autonomy with only one battery, and 12 hours of autonomy if we use three 3-V CR1220 batteries in the VCE. If we consider a full-custom ASIC solution, the power consumption can be reduced still further. The Actel results are encouraging because, first, the actual mean duration of an examination is 10 h and, second, they represent the best tradeoff between design time and cost (Table IV).
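As a worked check of this battery arithmetic, the sketch below takes the 540-mWh three-cell energy figure from the text and derives autonomy as energy divided by average power draw. The 45-mW draw is an assumed value chosen so that the result reproduces the quoted 4-h and 12-h autonomies; it is not a figure from Table IV.

```python
# Battery arithmetic of Section X-D (assumed 45-mW average draw).

energy_per_cell_mwh = 540.0 / 3     # 540 mWh for three cells (text)

def autonomy_hours(avg_power_mw, cells):
    return cells * energy_per_cell_mwh / avg_power_mw

print(autonomy_hours(45.0, cells=1))   # 4.0 h with one battery
print(autonomy_hours(45.0, cells=3))   # 12.0 h with three batteries
```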

E. In-Vitro Experiments

To demonstrate the performance of the proposed technique, we applied it to colonoscopy, for the detection of polyps on the colon wall. There are two different types of colon polyps, namely hyperplasias and adenomas. Hyperplasias are benign polyps that do not evolve into cancer and, therefore, do not need to be removed. By contrast, adenomas have a strong tendency to become malignant; therefore, they have to be removed immediately via polypectomy. We created a dataset consisting of 111 polyp models (40 adenomas and 81 hyperplasias, see Fig. 18). These polyps are built in silicone with a scale factor of 2 and are fixed on the internal wall of the intestine, which is simulated by a silicone tube, also with a scale factor of 2 compared to human size.

Fig. 18. Polyp models. (a) Adenoma. (b) Hyperplasia.

Fig. 19. Example of 3-D reconstruction.

The 3-D reconstructions of the two aforementioned polyps have been made (Fig. 19). These reconstructions demonstrate the feasibility of the system in in-vitro experiments.

F. In-Vivo Experiments

We made an in-vivo experiment with our system on a pig in order to check that factors such as blood and tissue do not modify the reflection of the laser spots, which would prevent 3-D reconstruction. Since the size of our demonstrator is not compatible with the gastrointestinal tract, we tested the 3-D imaging system only on the external tissues of a pig. After a laparoscopy of the pig (Fig. 20), the IR structured light was projected on the intestine, the stomach, the kidney, and the liver. No speckle effects or deep absorption were noted. The 3-D reconstruction of the gastrointestinal tract is therefore realizable with a video capsule that integrates an IR structured-light projector and our processing architecture.

XI. CONCLUSION

In this paper, we have presented the design of a sensor for a 3-D video capsule, called Cyclope.

We explained a method to acquire images at a 25-frames/s video rate with discrimination between the texture and the projected pattern. This method uses a high-energy approach, a pulsed projector, and a standard CMOS image sensor with


Fig. 20. Laparoscopy of the pig.

programmable integration time. Multiple images are taken with different integration times to obtain an image of the pattern that has more energy than the background texture.

We also presented a 3-D reconstruction processing chain that allows precise, real-time reconstruction and is designed to be easily integrated.

The method was tested on a large-scale demonstrator using an FPGA prototyping board with a CMOS sensor. The results show that it is possible to integrate a stereoscopic base designed for an integrated sensor while keeping good precision for human-body exploration. We tested our system on a realistic silicone model and demonstrated that an accurate estimation of the size of a polyp is possible. We also conducted an experiment on a pig to verify the compatibility of this kind of environment with our sensor, and showed that 3-D reconstruction inside the body is possible.

The next step of this work is to realize a video capsule with 3-D vision capability, with chip-level integration of the image sensor and the pattern projector.

Moreover, the real-time acquisition and reconstruction enablethe implementation of very high-level processes, such as patternrecognition, in order to provide help to the practitioner.

REFERENCES

[1] G. Iddan, G. Meron, A. Glukhovsky, and P. Swain, "Wireless capsule endoscopy," Nature, vol. 405, p. 417, 2000.

[2] J. Rey, K. Kuznetsov, and E. Vazquez-Ballesteros, "Olympus capsule endoscope for small and large bowel exploration," Gastrointest. Endoscopy, vol. 63, p. AB176.

[3] T. Graba, B. Granado, O. Romain, T. Ea, A. Pinna, and P. Garda, "Cyclope: An integrated real-time 3d image sensor," presented at the XIX Conf. Design of Circuits and Integrated Systems, Bordeaux, France, 2004.

[4] F. Marzani, Y. Voisin, L. L. Y. Voon, and A. Diou, "Active stereovision system: A fast and easy calibration method," presented at the 6th Int. Conf. Control, Automation, Robotics and Vision, Singapore, Dec. 2000.

[5] W. Li, F. Boochs, F. Marzani, and Y. Voisin, "Iterative 3d surface reconstruction with adaptive pattern projection," in Proc. 6th IASTED Int. Conf. Visualization, Imaging and Image Processing, Aug. 2006, pp. 336–341.

[6] B. Gyselinckx, C. V. Hoof, J. Ryckaert, R. Yazicioglu, P. Fiorini, and V. Leonov, "Human++: Autonomous wireless sensors for body area networks," in Proc. IEEE Custom Integrated Circuits Conf., 2005, pp. 13–19.

[7] B. Warneke, M. Last, B. Liebowitz, and K. Pister, "Smart dust: Communicating with a cubic millimeter computer," Computer, vol. 34, pp. 44–51, Jan. 2001.

[8] S. Woo, A. Dipanda, F. Marzani, and Y. Voisin, "Determination of an optimal configuration for a direct correspondence in an active stereovision system," presented at the IASTED, 2002.

[9] J. P. Helferty, C. Zhang, G. Mclennan, and W. E. Higgins, "Videoendoscopic distortion correction and its application to virtual guidance of endoscopy," IEEE Trans. Med. Imaging, vol. 20, no. 7, pp. 605–617, Jul. 2001.

[10] J. Heikkilä, "Accurate camera calibration and feature based 3-D reconstruction from monocular image sequences," Ph.D. dissertation, Univ. Oulu, Oulu, Finland, 1997.

[11] O.-Y. Mang, S.-W. Huang, Y.-L. Chen, H.-H. Lee, and P.-K. Weng, "Design of wide-angle lenses for wireless capsule endoscopes," Opt. Eng., vol. 46, 2007.

[12] D. G. Bailey, "A new approach to lens distortion correction," Proc. Image Vis. Comput., 2003.

[13] A. Basu and S. Licardie, "Alternative models for fisheye lenses," Pattern Recogn. Lett., vol. 16, pp. 433–441, 1995.

[14] S. S. Beauchemin and R. Bajcsy, "Modelling and removing radial and tangential distortions in spherical lenses," Theor. Found. Comput. Vis., pp. 1–21, 2000.

[15] H. Farid and A. C. Popescu, "Blind removal of lens distortion," J. Opt. Soc. Amer., vol. 18, pp. 2072–2078, 2001.

[16] N. Otsu, “A threshold selection method from gray level histogram,”IEEE Trans. Syst. Man Cybern., vol. SMC-9, no. 1, pp. 62–66, Jan.1979.

[17] J. N. Kapur, P. Sahoo, and A. K. C. Wong, "A new method for gray-level picture thresholding using the entropy of the histogram," Comput. Vis. Graphics, Image Process., vol. 29, pp. 273–285, 1985.

[18] L. G. Shapiro and G. Stockman, Computer Vision. Upper Saddle River, NJ: Prentice-Hall, 2002.

[19] A. Kolar, O. Romain, J. Ayoub et al., "A system for an accurate 3D reconstruction in video endoscopy capsule," EURASIP J. Embedded Syst., 2009.

[20] M. R. Yuce et al., "Wireless body sensor network using medical implant band," J. Med. Syst., pp. 467–474, 2007.

[21] Xilinx, "Virtex-II Pro and Virtex-II Pro platform FPGA: Complete data sheet," Oct. 2005.

Anthony Kolar received the M.Sc. degree in conception and architecture of digital circuits and the Ph.D. degree in electronics from Pierre et Marie Curie University-Paris VI, Paris, France, in 2009.

Currently, he is a Postdoctoral Researcher at CEA, working on artificial retinas, biomedical instrumentation, and reconfigurable architectures.

Olivier Romain received the Ph.D. degree in electronics from Pierre and Marie Curie University, Paris, France, in 2001.

Currently, he is an Assistant Professor of Electronics at the Polytech'Paris UPMC engineering school, where he teaches digital and radio-frequency systems. His main research interests include 3-D vision, digital circuit design, software radio, and biotechnologies.

Jad Ayoub received the B.S. and M.S. degrees in electrical and electronics engineering from Lebanese University in 2000 and 2001, respectively, and is currently pursuing the Ph.D. degree at Pierre et Marie Curie University, Paris, France.

He was an Assistant Teacher in the Control Systems Department of the ITI institute, Lebanon. His current research takes place in the ETIS Lab and involves 3-D vision systems and their implementation on reconfigurable system-on-chip platforms.

Mr. Ayoub was a member of the Syndicate of Engineers, Beirut, Lebanon, from 2001 to 2006.


Sylvain Viateur is an Electronics Engineer at Pierre and Marie Curie University, Paris, France.

He designs digital systems. His main research interests include digital circuit design and embedded software development.

Bertrand Granado received the M.S. and Ph.D. degrees in computer science from Paris XI University, Orsay, France, in 1994 and 1998, respectively.

Currently, he is a Professor of Electrical Engineering at ENSEA and the Head of the ASTRE team at the ETIS Laboratory, Cergy, France. His research fields center on algorithm-architecture adequacy. He is working on artificial retinas, biomedical instrumentation, reconfigurable architectures, and adaptable architectures.