Super-resolution image construction from high-speed camera sequences


Super-Resolution Image Construction from High-Speed Camera Sequences

Hong-Thinh Nguyen, Ha Vu Le
Department of Information Processing
University of Engineering and Technology, VNU Hanoi
144 Xuan Thuy, G2-206, Cau Giay, Hanoi, Vietnam

Abstract—Super-resolution is a well-studied topic in image enhancement. However, traditional super-resolution techniques are limited by the global motion assumption and by the accuracy of displacement estimation. In this paper, we are interested in the problem of constructing super-resolution images from high-speed camera sequences in the presence of moving objects. The objective of our research is to find a suitable method for this problem. We have experimented with several super-resolution methods on simulated and real high frame rate sequences in order to compare their performance. Experimental results and discussion are reported.

Index Terms—video processing, super-resolution, image interpolation, high-speed camera, high frame rate video

I. INTRODUCTION

Image enhancement has been a well-studied topic in the literature, with a wide variety of solutions. These solutions can be separated into two groups: a) creating high quality images by increasing the number of image pixels or the amount of information (super-resolution imaging), and b) de-noising and de-blurring to enhance the visual quality (without adding extra information).

In this paper, we focus on the problem of super-resolution image construction from video sequences captured by a high-speed camera. High-speed cameras, which can capture thousands or even millions of frames per second, have been developed and have found many applications, especially in scientific imaging, where there is a need to capture and analyze the motion of fast-moving objects.

Our work is concerned with high-speed surveillance cameras. Surveillance cameras are usually used in places where the environment is full of noise, motion and vibration, which can affect the quality of captured images. Traffic cameras, surveillance cameras at airports and train stations, and monitoring cameras at factories are some examples. The main benefit of using high-speed surveillance cameras is that their images are less affected by motion, low-frequency noise and vibration. However, the spatial resolution of a high-speed camera is usually low compared to that of normal-speed cameras, so it is necessary to construct images with a higher resolution from the low resolution images captured by high-speed cameras in order to obtain better details of the objects in those images.

II. BACKGROUND

The background knowledge necessary for understanding and exploring super-resolution methods can be found in [12] and [6].

In the context of super-resolution imaging, it is generally assumed that several low resolution (LR) images can be combined into a single high resolution (HR) image (decreasing the temporal resolution in order to increase the spatial-frequency content) (see Fig. 1). The LR images should not all be identical, of course. Rather, there must be some variation between them, such as motion of the camera and/or objects, or a change of viewing angle. In theory, given multiple image frames of the same scene and the transformations between these frames, it should be possible to obtain a much better image of the scene.

A. Super-resolution with the presence of moving object(s)

Fig. 1. Multi-frame super-resolution image construction, step by step: 1) Finding the displacement between frames (sub-pixel accuracy is required), 2) Up-sampling all the frames and registering them to a finer grid, and 3) Performing interpolation to estimate missing pixels in the grid (known as irregular interpolation and/or scattered interpolation [14])

However, most super-resolution methods work well only with images of stationary scenes (no object motion). There have been comparatively few investigations into applying super-resolution techniques to video in the presence of moving objects. There are some differences between the two cases:

• In the case of stationary scenes, there is only motion of the camera, thus the relative displacement is global. In video sequences with moving objects, the camera may be stationary but there are individual objects moving within the scene, like walking people or moving cars. In such situations, it may be necessary to identify and determine the motion of each object individually. We may also need to take into account local blur caused by object motion.

• A 2-D image is a projection of a 3-D scene onto the image plane. Depending on the relative position between an object and the camera, as well as the motion of the object, that object can appear drastically different in different frames. For example, a disc standing parallel to the image plane will appear as a circle. If it is rotated about an axis parallel to the image plane, however, it will become an ellipse of shrinking width, until it finally looks like a line. Moreover, parts of an object may become invisible due to occlusion. Many methods assume simple affine transformations to deal with changes of the object's shape from frame to frame [8].

• In many super-resolution methods the construction is possible only with sub-pixel displacements [3]. In some applications it is possible to control camera motion to guarantee sub-pixel displacements between frames. However, in reality it is difficult to control the speed of moving objects in the scene, thus it is impossible to ensure that the displacements are sub-pixel. A better way to deal with this situation is to detect moving objects in the sequence and focus on the sub-pixel parts of the motion. In this way, the registration between frames becomes more complex, since pixels must be classified into moving objects and background.

Super-resolution image construction for the moving object case is addressed in [9] and [15]. In these works, the construction process includes three steps: 1) the moving object is detected and segmented from the low resolution frames, 2) motion estimation and registration are performed only on object pixels, and 3) irregular interpolation is performed to obtain a super-resolution image of the object. The flow diagram of super-resolution image construction for moving objects is presented in Fig. 2.

Fig. 2. Flow diagram of super-resolution image construction for moving objects

B. Super-resolution for high-speed camera sequences

There are important features we need to consider when working with images from high-speed surveillance cameras:

• The camera is often stationary during the recording process.

• Displacements of moving objects between successive frames are very small (usually sub-pixel). The shape of a moving object seems to be constant and its motion is simply a translation from one frame to the next.

• Blur caused by object motion is insignificant.

We have found nothing in the literature regarding super-resolution techniques specifically for high frame rate sequences. For most common super-resolution methods, the above-mentioned features could be seen as advantages. However, the accuracy of these methods may suffer in the presence of noise, since the variation between successive frames is very small.

C. Motion approximation by optical flow

An optical flow method tries to estimate how much each pixel moves from one frame to the next, based on temporal derivatives. The main advantage of using optical flow is that a dense motion field is computed for every pixel in each frame, which is appropriate when the motions are local (motions of objects) and there is no a priori information about object motions. Most optical flow methods can achieve sub-pixel accuracy when the displacements are small [17], [2], [5], [10], so they fit quite well with the features of images from high-speed cameras.

Using optical flow for motion estimation always introduces errors, since the flow equations are built on image gradients, and computing gradients tends to amplify noise. In the case of high frame rate sequences, where displacements between successive frames are very small, noise becomes a significant obstacle. To overcome this obstacle, one can consider optical flow methods that use higher-order derivatives, as well as multi-scale, multi-layer methods [3], [10]. In the latter, the optical flow can be refined by using both temporal and spatial information.
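As a rough illustration of gradient-based motion estimation at small displacements, the following sketch solves the least-squares brightness-constancy equations for a single translation shared by the whole frame (a Lucas-Kanade-style estimate). It is not the optical flow method used in our experiments; the function names, the central-difference gradients and the synthetic test pattern are all assumptions made for the example.

```python
import math


def synthetic_frame(shift_x, shift_y, size=32):
    """Smooth test pattern translated by (shift_x, shift_y) pixels (hypothetical helper)."""
    return [[math.sin(0.3 * (x - shift_x)) + math.cos(0.2 * (y - shift_y))
             for x in range(size)]
            for y in range(size)]


def estimate_translation(frame0, frame1):
    """Estimate one sub-pixel shift (dx, dy) of frame1 relative to frame0.

    Minimises sum((Ix*dx + Iy*dy + It)^2) over interior pixels, where Ix and Iy
    are central-difference spatial gradients and It is the temporal difference.
    """
    height, width = len(frame0), len(frame0[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            ix = (frame0[y][x + 1] - frame0[y][x - 1]) / 2.0
            iy = (frame0[y + 1][x] - frame0[y - 1][x]) / 2.0
            it = frame1[y][x] - frame0[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("gradients are degenerate; motion cannot be resolved")
    # Solve the 2x2 normal equations [sxx sxy; sxy syy] [dx dy]^T = -[sxt syt]^T.
    dx = (-sxt * syy + syt * sxy) / det
    dy = (-syt * sxx + sxt * sxy) / det
    return dx, dy


dx, dy = estimate_translation(synthetic_frame(0.0, 0.0), synthetic_frame(0.2, -0.1))
print(round(dx, 2), round(dy, 2))
```

Because the estimate relies on a first-order expansion of the image, it is only trustworthy for small displacements, which is the regime that high frame rate sequences provide.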

D. Image interpolation

Interpolation is an important step in super-resolution image construction, because LR images need to be upscaled to fit a higher resolution grid. There are many ways to perform interpolation in super-resolution methods. The easiest way may be to upscale each LR frame separately, then combine the upscaled frames after compensating for their motions to obtain an HR image. In this way interpolation is done on a uniform grid, which is simple, but the amount of computation can be huge.

Another way is to fill an HR grid with data from the LR frames after compensating for their motions, then use non-uniform interpolation methods, also called irregular interpolation, to estimate the values of missing pixels in the HR grid. Note that motion vectors, estimated with sub-pixel accuracy from the LR frames, must be rounded to pixel level in the HR grid. One of the state-of-the-art non-uniform interpolation methods is Kernel Regression Interpolation (KRI), proposed in [14]. The strength of KRI when applied to image interpolation is that it can exploit geometric regularities in images to reduce artifacts.
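The registration step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not our implementation: the function name `register_to_hr_grid` is hypothetical, each frame is assumed to have a single translation `(dx, dy)` measured in LR pixels relative to the reference frame, and samples that land on the same HR position are simply averaged.

```python
import math


def register_to_hr_grid(lr_frames, shifts, scale):
    """Place motion-compensated LR samples on an HR grid.

    lr_frames: list of equally sized 2-D lists of pixel values.
    shifts: one (dx, dy) per frame, in LR pixels (assumed sign convention:
            the content of the frame is displaced by +dx, +dy).
    scale: integer magnification factor of the HR grid.
    Returns a 2-D list in which unfilled HR positions are None.
    """
    height = len(lr_frames[0]) * scale
    width = len(lr_frames[0][0]) * scale
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for frame, (dx, dy) in zip(lr_frames, shifts):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                # Compensate the motion, scale up, and round to the nearest HR pixel.
                hx = math.floor((x - dx) * scale + 0.5)
                hy = math.floor((y - dy) * scale + 0.5)
                if 0 <= hx < width and 0 <= hy < height:
                    total[hy][hx] += value
                    count[hy][hx] += 1
    return [[total[r][c] / count[r][c] if count[r][c] else None
             for c in range(width)]
            for r in range(height)]


frames = [[[10, 20], [30, 40]], [[11, 15], [31, 35]]]
hr = register_to_hr_grid(frames, [(0.0, 0.0), (0.5, 0.0)], scale=2)
for row in hr:
    print(row)
```

The positions left as `None` are the missing pixels that the subsequent irregular interpolation has to estimate.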

There is also a class of interpolation methods, called scattered data interpolation [7], [11], [1], which also seems suitable for our purpose. In these methods, the first step is to fit a smooth surface to the available pixels in the HR grid; that surface is then used to interpolate the missing pixels. There are many methods to generate a smooth surface from scattered points, such as triangulation or tetrahedrization.
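To make the idea concrete, the sketch below shows a one-dimensional analogue: along a single HR row, a piecewise-linear curve through the known samples plays the role that a triangulated surface plays in two dimensions. The function name `fill_missing_linear` and the choice to hold the nearest known value at the row ends are assumptions of this example.

```python
def fill_missing_linear(row):
    """Fill None entries of a row by linear interpolation between known neighbours.

    Positions before the first or after the last known sample take the value of
    the nearest known sample (an assumption; other boundary rules are possible).
    """
    known = [(i, v) for i, v in enumerate(row) if v is not None]
    if not known:
        raise ValueError("row contains no known samples")
    filled = list(row)
    for i, value in enumerate(row):
        if value is not None:
            continue
        left = max((k for k in known if k[0] < i), key=lambda k: k[0], default=None)
        right = min((k for k in known if k[0] > i), key=lambda k: k[0], default=None)
        if left is None:
            filled[i] = right[1]
        elif right is None:
            filled[i] = left[1]
        else:
            t = (i - left[0]) / (right[0] - left[0])
            filled[i] = left[1] + t * (right[1] - left[1])
    return filled


print(fill_missing_linear([10.0, None, None, 40.0, None]))
```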

E. Super-resolution without explicit motion estimation

A method for constructing a super-resolution image from multiple LR frames without having to estimate frame-to-frame displacements is based on Nonlocal-Means (NLM). The concept of NLM was first proposed by Buades, Coll and Morel [4] as a de-noising algorithm. The key idea of this method is to update each pixel with a weighted average of the values of its spatial neighbors. The update formula for a pixel p is:

\[
x[p] = \frac{\sum_{q \in N(p)} w[p, q]\, y[q]}{\sum_{q \in N(p)} w[p, q]}
\]

where x[p] is the updated value of pixel p, y[q] is the input value of pixel q, N(p) denotes the set of neighboring pixels of p, and the weight w[p, q] represents a similarity factor between the two pixels p and q, whose value is calculated based on the distance (difference) between the two patches R(p) and R(q) centered at these two pixels:

\[
w[p, q] = \exp\!\left(-\frac{\mathrm{dist}(R(p), R(q))^2}{2\sigma^2}\right) f(\mathrm{dist}(p, q))
\]
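The update rule can be illustrated on a one-dimensional signal. The sketch below writes the neighbouring pixel as q, clamps patches at the signal borders, and takes the spatial factor f to be 1 inside a fixed search window; these choices, together with the function names and default parameters, are assumptions made for the example rather than settings from [4] or [13].

```python
import math


def nlm_weight(y, p, q, half_patch, sigma):
    """Similarity weight between the patches centred at p and q (f taken as 1)."""
    last = len(y) - 1
    dist_sq = 0.0
    for k in range(-half_patch, half_patch + 1):
        a = y[min(max(p + k, 0), last)]  # clamp patch indices at the borders
        b = y[min(max(q + k, 0), last)]
        dist_sq += (a - b) ** 2
    return math.exp(-dist_sq / (2.0 * sigma ** 2))


def nlm_update(y, half_patch=1, search=3, sigma=10.0):
    """Replace every sample by the weighted average of its neighbours N(p)."""
    n = len(y)
    out = []
    for p in range(n):
        num = den = 0.0
        for q in range(max(0, p - search), min(n, p + search + 1)):
            w = nlm_weight(y, p, q, half_patch, sigma)
            num += w * y[q]
            den += w
        out.append(num / den)  # den >= 1 because q == p always has weight 1
    return out


step = [0.0, 0.0, 0.0, 0.0, 100.0, 100.0, 100.0, 100.0]
print([round(v, 3) for v in nlm_update(step)])
```

Since the weights depend on patch similarity rather than on position alone, samples on either side of the step are averaged almost exclusively with samples from their own side, so the edge is preserved.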

Protter et al. were the first to apply NLM to super-resolution imaging. In this case, each output pixel is computed as a weighted average of pixels in its three-dimensional (spatial and temporal) neighborhood in the input sequence. By taking a slightly different perspective, we can regard the weights as reflecting the similarity between an updated pixel in the HR grid and its neighbors in the LR frames. Details of this method can be found in [13]. The NLM-based super-resolution method is a potentially good approach for processing sequences from high-speed surveillance cameras, since the temporal search range is small compared to that needed for normal-speed camera sequences, which reduces the computational load. Another benefit is that irregular interpolation is not needed.

III. EXPERIMENTS

The objective of our experiments is to compare the performance of super-resolution methods when applied to enhancing images from high-speed cameras. We have implemented two categories of methods: one with optical flow-based motion estimation and the other without motion estimation (NLM-based), using three different image interpolation schemes: non-uniform linear interpolation, Kernel Regression interpolation, and triangulation-based scattered interpolation. These methods were tested on simulated and real-world sequences. The real-world sequences were captured by a surveillance camera with a frame rate of up to 500 frames/second.

The super-resolution process using optical flowcan be summarized into four steps:

1) Calculating the motion vector field between successive frames.

2) Identifying the region of interest (ROI) which contains the moving object in each frame, based on the motion field.

3) Registering pixels in the ROI of each frame into the HR grid, with motion compensation.

4) Interpolating missing pixels in the HR grid.
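Step 2 can be sketched as follows. The text above only states that the ROI is derived from the motion field; thresholding the motion magnitude and taking the bounding box of the moving pixels is an assumption of this example, as are the function name `moving_object_roi` and the threshold value.

```python
import math


def moving_object_roi(flow, threshold):
    """Bounding box (x0, y0, x1, y1), inclusive, of pixels whose motion exceeds threshold.

    flow: 2-D list of (u, v) motion vectors, one per pixel.
    Returns None when no pixel moves by more than threshold.
    """
    xs, ys = [], []
    for y, row in enumerate(flow):
        for x, (u, v) in enumerate(row):
            if math.hypot(u, v) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


# A 4x5 motion field in which a 2x2 block moves by 0.3 pixel to the right.
flow = [[(0.0, 0.0)] * 5 for _ in range(4)]
for y in (1, 2):
    for x in (2, 3):
        flow[y][x] = (0.3, 0.0)
print(moving_object_roi(flow, threshold=0.1))
```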

Fig. 3. Identifying the moving object.

Fig. 4. Combination of pixels in the ROI of original frames to obtain denser information about the moving object.

Simulated sequences were created from HR images by generating small displacements, then downsampling all frames by a factor of 5 in each dimension and adding noise at an SNR of 20 dB. HR images were then reconstructed from the simulated LR sequences, with and without added noise. The HR images were reconstructed from 10, 15, and 20 LR frames, equivalent to 40%, 60%, and 80% of the original data, respectively.
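The percentages follow from the downsampling factor: each LR frame holds 1/25 of the HR pixels, so 10 frames carry 10/25 = 40% of the original data. The sketch below reproduces this arithmetic and shows one way to choose the noise level for a target SNR. Defining signal power as the mean squared pixel value is an assumption, since the exact SNR definition is not specified here, and both function names are hypothetical.

```python
import math


def data_fraction(n_frames, factor):
    """Fraction of HR pixels covered by n_frames LR frames downsampled by factor per axis."""
    return n_frames / (factor * factor)


def noise_std_for_snr(pixels, snr_db):
    """Standard deviation of additive noise giving the requested SNR.

    Uses SNR_dB = 10 * log10(P_signal / P_noise), with P_signal taken as the
    mean squared pixel value (an assumption of this sketch).
    """
    signal_power = sum(v * v for v in pixels) / len(pixels)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return math.sqrt(noise_power)


for n in (10, 15, 20):
    print(n, "frames ->", round(100 * data_fraction(n, 5)), "% of the original data")
print(noise_std_for_snr([10.0, 10.0, 10.0, 10.0], snr_db=20.0))
```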

The registration into the HR grid is illustrated in Fig. 5. The solid line represents the continuous orbit of the object within a sequence. Positions of the object in the frames are represented by dots (with 1/10-pixel accuracy), each of which is considered a "piece of information". After motion estimation, we have to warp all LR frames into an HR grid as shown in Fig. 6.

Fig. 5. The orbit of the moving object in a sequence: the solid line illustrates the continuous movement of this object over time; each dot denotes a position at one time.

Fig. 6. Warping data from multiple frames into an HR grid.

Experimental results on simulated sequences are shown in Tables I, II, and III. In each table there are three sets of PSNR data corresponding to three different numbers of LR frames used for HR image reconstruction. For each case, the first line contains results obtained from the simulated sequence without noise, and the second line contains results obtained from the simulated sequence with added noise. For all sequences, using KRI for missing pixel interpolation achieves the best performance in nearly every case. Another observation from these results is that noise can severely degrade the performance of super-resolution methods.
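For reference, the PSNR figures reported in the tables can be computed with the standard definition, PSNR = 10 log10(MAX^2 / MSE). The sketch below assumes 8-bit images (peak value 255); the peak value used in our experiments is not stated here, and the function name `psnr` is illustrative.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    squared_error = 0.0
    n_pixels = 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            n_pixels += 1
    mse = squared_error / n_pixels
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)


original = [[100, 100], [100, 100]]
reconstructed = [[101, 99], [100, 100]]
print(round(psnr(original, reconstructed), 2))
```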

TABLE I
RESULTS ON SIMULATED SEQUENCE #1 (PSNR IN dB).

# of frames   Noise   Linear   KRI     Scattered
10            none    29.2     30.24   28.3
10            added   29.18    30.44   28.98
15            none    31.79    33.24   30.32
15            added   29.18    31.5    30.58
20            none    36.15    36.56   35.68
20            added   30.27    31.58   31.11

TABLE II
RESULTS ON SIMULATED SEQUENCE #2 (PSNR IN dB).

# of frames   Noise   Linear   KRI     Scattered
10            none    27.1     29.08   28.53
10            added   24.78    26.28   24.37
15            none    31.43    35.62   31.59
15            added   27.01    27.03   25.6
20            none    35.23    38.01   33.4
20            added   29.79    31.23   26.94

TABLE III
RESULTS ON SIMULATED SEQUENCE #3 (PSNR IN dB).

# of frames   Noise   Linear   KRI     Scattered
10            none    28.8     29.2    27.87
10            added   28.78    29.92   29.31
15            none    30.87    31.02   29.81
15            added   28.59    28.63   28.84
20            none    34.5     34.78   32.71
20            added   28.78    29.92   29.31

Fig. 7 is an original frame from a real-world high frame rate sequence we used in our experiments, and the super-resolution image constructed from that sequence is shown in Fig. 8.

Fig. 7. An LR frame from a real-world high-speed camera sequence.

Fig. 8. Super-resolution image of a moving car constructed from a real-world sequence.

For the NLM-based super-resolution method, the performance is quite low in terms of PSNR (about 25 dB for simulated sequences without noise and about 20 dB for simulated sequences with added 20 dB noise), but the visual quality of the constructed images is comparable to that of images produced by the optical flow-based methods, as shown in Fig. 9. A possible explanation is that the NLM method tends to smooth irregularities in images.

Fig. 9. Super-resolution image constructed from seven frames of a real-world sequence by using the NLM-based method.

IV. CONCLUSIONS AND FUTURE WORK

Super-resolution construction for video containing locally moving objects is always a difficult task. For high-speed camera sequences, methods employing optical flow-based motion estimation or NLM (no motion estimation needed) are good candidates. However, from our experiments with these methods, we have realized that the key to improving the quality of constructed images lies in the selection of image interpolation techniques. State-of-the-art methods in image interpolation are the ones making use of geometric regularities, like [16]. Our findings are quite consistent with this trend, since the use of KRI yields the best results in nearly all cases. The disadvantage of [16] and some other image interpolation methods is that they were originally developed for uniform-grid interpolation. In the next step we will dig deeper into approaches for irregular image interpolation with a focus on geometric regularities.

ACKNOWLEDGEMENT

We would like to thank Prof. Alain Merigot of the Institut d'Electronique Fondamentale (IEF), CNRS, France, for providing the high-speed camera sequences used in our experiments, and for his valuable comments on our work.

This work is partly supported by the project QC.09.07 of the Vietnam National University, Hanoi, and by an internship for Ms. Hong-Thinh Nguyen provided by IEF.

REFERENCES

[1] I. Amidror. Scattered data interpolation for electronic imaging systems: a survey. Journal of Electronic Imaging, 11(2):157–176, 2002.

[2] S. Baker and T. Kanade. Super-resolution optical flow. Robotics Institute, Carnegie Mellon Univ., Pittsburgh, PA, CMU-RI-TR-99-36, 1999.

[3] J.L. Barron, D.J. Fleet, and S.S. Beauchemin. Performance of optical flow techniques. International Journal of Computer Vision, 12(1):43–77, 1994.

[4] A. Buades, B. Coll, and J.M. Morel. A non-local algorithm for image denoising. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), volume 2, 2005.

[5] C. Crutchfield. Improving Super-Resolution Enhancement of Video by using Optical Flow.

[6] S. Farsiu, M. Elad, P. Milanfar, et al. Multiframe demosaicing and super-resolution of color images. IEEE Transactions on Image Processing, 15(1):141–159, 2006.

[7] R. Franke. Scattered data interpolation: tests of some methods. Mathematics of Computation, 38(157):181–200, 1982.

[8] H.V. Le and G. Seetharaman. A super-resolution imaging method based on dense subpixel-accurate motion fields. The Journal of VLSI Signal Processing, 42(1):79–89, 2006.

[9] A. Letienne, F. Champagnat, G. Le Besnerais, C. Kulcsar, and P.V. De Lesegno. Fast super-resolution on moving objects in video sequences. In EUSIPCO European Signal Processing Conference, 2008.

[10] S.H. Lim and A. El Gamal. Optical flow estimation using high frame rate sequences. In Proceedings of the 2001 International Conference on Image Processing, volume 2, 2001.

[11] C.A. Micchelli. Interpolation of scattered data: distance matrices and conditionally positive definite functions. Constructive Approximation, 2(1):11–22, 1986.

[12] S.C. Park, M.K. Park, and M.G. Kang. Super-resolution image reconstruction: a technical overview. IEEE Signal Processing Magazine, 20(3):21–36, 2003.

[13] M. Protter, M. Elad, H. Takeda, and P. Milanfar. Generalizing the non-local-means to super-resolution reconstruction. IEEE Transactions on Image Processing, 18(1):36–51, 2009.

[14] H. Takeda. Kernel regression for image processing and reconstruction. PhD thesis, Citeseer, 2006.

[15] A. Van Eekeren, K. Schutte, J. Dijk, D.J.J. de Lange, and L.J. van Vliet. Super-resolution on moving objects and background. In 2006 IEEE International Conference on Image Processing, pages 2709–2712, 2006.

[16] X. Zhang and X. Wu. Image interpolation by adaptive 2-D autoregressive modeling and soft-decision estimation. IEEE Transactions on Image Processing, 17(6):887–896, 2008.

[17] W.Y. Zhao and H.S. Sawhney. Is super-resolution with optical flow feasible? Lecture Notes in Computer Science, pages 599–613, 2002.