Three Dimensional Data-Driven Multi Scale Atomic Representation of Optical Coherence Tomography


Raheleh Kafieh1, Student Member, IEEE, Hossein Rabbani*1, Senior Member, IEEE, Ivan Selesnick2, Senior Member, IEEE

1 Department of Biomedical Engineering, Medical Image and Signal Processing Research Center, Isfahan Univ. of Medical Sciences, Isfahan, IRAN, Phone: +98-311-792-2414, Fax: +98-311-792-2362, email: [email protected], [email protected]

2 Electrical and Computer Engineering Dept., Polytechnic Institute of New York University, 2 Metrotech Center, Brooklyn, New York 11201, Phone: (718) 260-3416, Fax: (718) 260-3906, email: [email protected]

Abstract

In this paper, we discuss applications of different atomic representations in Optical Coherence Tomography (OCT). Atomic representation is a method for decomposing a signal over elementary waveforms chosen from a family called a dictionary. If the dictionary is learned from the processed data, the representation is called data-driven or non-parametric. Three fundamental properties of non-parametric atomic representations make them appropriate for OCT image processing: being data-driven, applicability to 3D, and working in multi-scale. We discuss the application of such representations, including complex wavelet based K-SVD and diffusion wavelets, to OCT data. We introduce complex wavelet based K-SVD, which exploits the adaptability of dictionary learning methods to improve the performance of simple dual tree complex wavelets, and demonstrate its ability in speckle reduction of OCT datasets in 2D and 3D. For evaluation, 72 randomly selected slices from six 3D OCTs taken by a Topcon 3D OCT-1000 and 72 randomly selected slices from six 3D OCTs taken by a Cirrus Zeiss Meditec are used; the contrast to noise ratio (CNR) improves from 0.9 to 11.91 and from 3.09 to 88.91, and the equivalent number of looks (ENL) improves from 5.08 to 1322.81 and from 28.6 to 22231.73, in the two studied datasets. The time complexity of this method is higher than that of complex wavelets, but it can be reduced by learning an enhanced dictionary for each dataset once and reusing the saved dictionary. Furthermore, two approaches are proposed for image segmentation using diffusion wavelets, another tool for 3D data-driven multi-scale atomic representation of OCT. The first method designs a competition between extended basis functions at each level, and the second defines a new distance for each level and clusters based on such distances. A combined algorithm based on these two methods is then proposed for segmentation of retinal OCTs, which localizes 12 boundaries with an unsigned border positioning error of 9.22±3.05 μm on a test set of 20 slices selected from thirteen 3D OCTs.

Keywords- Optical coherence tomography, dictionary learning, 3D complex wavelet transform, diffusion wavelet, denoising, segmentation


1. Introduction

Optical Coherence Tomography (OCT) is an imaging technique that provides information about the cross-sectional structures of an object. Its principle is similar to ultrasound imaging, except that OCT uses light beams instead of sound waves [1]. OCT has made its most significant clinical contribution in the field of ophthalmology, where it has become an important technology in the areas of retinal diseases and glaucoma [2-4]. In this paper, we discuss applications of different atomic representations in OCT, introduce a new atomic representation for the denoising task (to avoid misinterpretation of OCT data [5]), and justify the superiority of the proposed method over other prevalent methods. Furthermore, we discuss new theories in image segmentation using diffusion wavelets and show their application to OCT segmentation.

Atomic representation decomposes a signal over elementary waveforms chosen from a family called a dictionary [6]. Correct selection of dictionaries yields a sparse representation with few non-zero coefficients carrying the needed information, but an ideal sparse transform adapted to all signals is impossible to obtain. Therefore, a great number of strategies are utilized in atomic representation to concentrate the signal energy over a set of few vectors. One may categorize such representations into two principal subclasses: parametric and non-parametric methods. Each subclass can also be placed in two categories of single-scale and multi-scale transforms.

In parametric methods, the representing atoms (dictionaries or bases) are fixed and predetermined regardless of the data. Therefore, any particular parametric method (with its predetermined dictionary) may be a powerful and near-to-ideal transform for data with particular specifications, but it may perform poorly on other samples. The single-scale parametric category includes the Fourier transform, and the multi-scale parametric group comprises wavelet transforms and geometrical X-lets.

In non-parametric methods, any dataset can be plugged in and the most meaningful basis is identified to re-express the dataset and reveal the hidden structure behind the data [7]. No parameters need to be adjusted, and no regard is paid to how the data was recorded. The single-scale non-parametric class includes Principal Component Analysis (PCA), Independent Component Analysis (ICA), diffusion maps, and dictionary learning. PCA looks for a new basis, a linear combination of the original basis, that best re-expresses the dataset [7, 8]. Unlike PCA decompositions, ICA [9, 10] is in general over-complete, i.e., the number of basis elements is in general greater than the dimension of the data. Different from PCA, diffusion maps is a non-linear method that focuses on discovering the underlying manifold from which the data has been sampled [11]. In more complicated dictionary learning methods [12, 13], a dictionary of primitive elements is learned from the signal and the signal is decomposed into these primitive elements. Such learning may be based on different solution algorithms like the Method of Optimal Directions (MOD) [14] or K-SVD [15]. Conventional MOD and K-SVD initialize with a start dictionary of Discrete Cosine Transform (DCT) atoms and are accordingly single-scale methods.
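As a minimal illustration of the data-driven idea (a sketch in Python, not part of the paper's own implementation), the following snippet finds a PCA basis for a set of image patches via the singular value decomposition; the patch matrix and its dimensions are hypothetical.

```python
import numpy as np

# Hypothetical data: 1000 vectorized 8x8 image patches, one per column.
patches = np.random.randn(64, 1000)

# Center the data; the left singular vectors of the centered patch matrix
# are the orthogonal directions that best re-express the dataset.
centered = patches - patches.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(centered, full_matrices=False)

# Re-express every patch over the first 16 principal components.
coeffs = U[:, :16].T @ centered
```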


Table 1. Summary of the atomic representations discussed above. Columns: transform name | dimension | parameters | introduced in | reference | official website or online reference.

Parametric methods (non-data-driven dictionary), single-scale:
Fourier Transform | 1D/2D/3D | Frequency | 1948 | G. Campbell and R. Foster [16] | http://www.thefouriertransform.com
Cosine Transform (DCT) | 1D/2D/3D | Duration-Translation | 1974 | N. Ahmed, T. Natarajan and K. R. Rao [17] | http://planetmath.org/?op=getobj&from=objects&id=1469

Parametric methods (non-data-driven dictionary), multi-scale:
Separable Wavelet Transform | 1D/2D/3D | Scale-Translation | 1988 | I. Daubechies [18] | https://www.ceremade.dauphine.fr/~peyre/numerical-tour/tours/wavelet_4_daubechies2d/
Wavelet Packet Transform | 1D/2D/3D | Scale-Translation-Frequency | 1992 | R. R. Coifman [19] | http://www.bearcave.com/misl/misl_tech/wavelets/packet/index.html
Geometrical X-lets (wedgelet) | 2D | Scale-Translation-Rotation | 1997 | D. L. Donoho [20] | http://ssli.ee.washington.edu/people/duh/projects/wedgelets.html
Complex Wavelet Transform | 2D/3D | Scale-Translation-Rotation | 1998 | N. G. Kingsbury [21] | http://eeweb.poly.edu/iselesni/WaveletSoftware
Geometrical X-lets (curvelet) | 2D/3D | Scale-Translation-Rotation | 2000 | E. Candès and D. Donoho [22] | http://www.curvelet.org
Geometrical X-lets (bandlet) | 2D | Scale-Translation-Rotation | 2004 | E. Le Pennec and S. Mallat [23] | http://www.cmap.polytechnique.fr/~peyre/bandelets
Geometrical X-lets (contourlet) | 2D | Scale-Translation-Rotation | 2005 | M. N. Do and M. Vetterli [24] | http://www.ifp.uiuc.edu/~minhdo/software/contourlet_toolbox.zip
Geometrical X-lets (surfacelet) | 3D | Scale-Translation-Rotation | 2005 | Y. Lu and M. N. Do [25] | http://www.math.wustl.edu/IllMoHAS/pdfs/yuelu.pdf

Non-parametric methods (dictionary learning), single-scale:
Principal Component Analysis (PCA) | 1D/2D/3D | Eigenvectors | 1986 | I. T. Jolliffe [26] | http://www.fon.hum.uva.nl/praat/manual/Principal_component_analysis.html
Independent Component Analysis (ICA) | 1D/2D/3D | Eigenvectors | 1994 | P. Comon [9] | http://www.cs.helsinki.fi/u/ahyvarin/whatisica.shtml
Diffusion Maps | 1D/2D/3D | Spectral | 2006 | R. R. Coifman and S. Lafon [11] | http://eprints.maths.ox.ac.uk/740/1/bah.pdf and http://www.math.ucla.edu/~wittman/mani/
Conventional Method of Optimal Directions (MOD) | 1D/2D/3D | Duration-Translation | 1999 | K. Engan and S. O. Aase [14] | http://media.aau.dk/null_space_pursuits/2012/04/dictionary-learning-with-the-method-of-optimal-directions.html
Conventional K-SVD | 1D/2D/3D | Duration-Translation | 2006 | M. Aharon [15] | http://www.cs.technion.ac.il/~elad/software/

Non-parametric methods (dictionary learning), multi-scale:
Diffusion Wavelet | 1D/2D/3D | Scale-Spectral | 2006 | R. R. Coifman and M. Maggioni [27] | http://www.math.duke.edu/~mauro/diffusionwavelets.html
Complex wavelet based K-SVD | 1D/2D/3D | Scale-Translation-Rotation | This work | R. Kafieh, H. Rabbani and I. Selesnick (this paper) | -


The multi-scale non-parametric group is composed of diffusion wavelets and complex wavelet based K-SVD. Diffusion wavelets [27] are an extension of the classical wavelet transform; unlike wavelets, whose basis functions are predetermined, diffusion wavelets are adapted to the geometry of a given operator 𝑇 (a random walk on the data). Complex wavelet based K-SVD is a new approach introduced in this paper, which combines the adaptability of dictionary learning methods with the multi-scale and redundancy benefits of dual tree complex wavelets.

Table 1 shows a summary of the mentioned atomic representations. The dimension, official website, and inventors of each method are summarized in the table. In parametric methods, the two most important specifications are "the utilized atom" and "the domain in which the transformation is applied", indicated in the "transform name" and "parameters" columns of the table, respectively. In a similar manner, the non-parametric approaches are mainly defined by "the learning method" and "the domain in which the transformation is applied", pointed out in the "transform name" and "parameters" columns, respectively.

Parametric atomic representation is widely used in OCT [5, 28-42], but non-parametric representations are new to this application [43-46]. Three fundamental properties can be defined for a subgroup of non-parametric methods which make them appropriate for image processing of OCT: being data-driven (the principal factor in non-parametric dictionary learning methods, allowing a better fit to the data); applicability to 3D data (to fit the intrinsic three dimensional nature of OCT); and working in multi-scale (to provide a fine-to-coarse representation of the information). Among the above representations, complex wavelet based K-SVD and diffusion wavelets possess these properties. Depending on the desired application, either of these representations may be chosen for analysis of OCT.

In this paper, which is an extension of our preliminary works [47, 48], we introduce complex wavelet based K-SVD and demonstrate its ability in speckle reduction of OCT datasets. In particular, we show that designing 3D dictionaries results in a considerable improvement in denoising results on valid datasets from different OCT imaging devices. It has previously been shown that diffusion maps are powerful in segmentation of OCTs in 2D and 3D [45, 46]; in this paper we discuss their multi-scale versions, which are expected to be able to manage OCT segmentation tasks. These methods take advantage of geometrical information of retinal layers in OCT rather than depending only on gray-scale properties. We introduce an application of diffusion wavelets to localize 12 boundaries in OCT data using a new combined method based on defining extended basis functions and new distances for each level of the diffusion wavelet.

The rest of this paper is organized as follows. Section 2 is devoted to the introduction of complex wavelet based K-SVD and its application in OCT denoising. Section 2.1 provides an introduction to OCT denoising; Section 2.2 describes the principles of conventional dictionary learning. A newer version of dictionary learning, named double-sparse, is elaborated in Section 2.3. Section 2.4 depicts the specifications of the dual tree complex wavelet transform in its 2D and 3D versions. Our proposed method, "dictionary learning with wise selection of start dictionary", is described in Section 2.5. Section 3 is devoted to the diffusion wavelet as another tool for 3D data-driven multi-scale atomic representation of OCT. Construction of the affinity and normalized Laplacian matrices is discussed in Section 3.1, and the diffusion wavelets algorithm is elaborated in Section 3.2. Section 3.3 introduces two new theories in the application of diffusion wavelets to image segmentation, and the proposed combined method for segmentation of OCT images is discussed in Section 3.4. The results and performance evaluation of the complex wavelet based K-SVD denoising method and the diffusion wavelet based segmentation technique are available in Sections 4.1 and 4.2, respectively. Finally, Section 5 concludes the paper.


2. Complex wavelet based K-SVD and its application in OCT denoising

2.1. OCT denoising

Similar to ultrasound images, OCT suffers from speckle noise, which causes erroneous interpretation. Multidimensional filtering and noise removal, as the most widely used canonical filtering operation, is fundamental in image and video processing and low-level computer vision [49]. In older, time-domain OCT devices, interferometric fringes were generated in the detector by axial movement of the reference reflector. This translation reduced the image acquisition speed significantly. In the newer generation of OCT, named Fourier domain OCT, the reference reflector remains fixed, and the difference between the optical path lengths of the sample and the reference is transformed to the frequency domain. Consequently, Fourier domain OCT can reach speeds of up to several hundred thousand lines per second [50]. The resolution of this frequency-based method is also high in both axial and lateral directions. The axial resolution in OCT depends on the spectral properties of the light source, and the lateral resolution relies on the spot size of the focused illuminating light beam [51]. However, as a result of using coherent illumination, OCT is also affected by speckle noise in both axial and lateral directions, and the effective resolution of OCT images, as the smallest detectable detail, is limited by this factor [51-56]. It should be considered that speckle is not pure noise, since it also carries crucial information, and this signal-carrying component should be distinguished from the corrupting noise [57].

Noise reduction in OCT may be applied to the OCT signal before the magnitude of the OCT interference signal is produced (complex-domain or hardware-based methods) or to the magnitude of the OCT signal (magnitude-domain or software-based methods). Complex-domain methods can be split into "modification of the optical setup" and "adjustment of the imaged subject itself". A modified optical setup produces a number of tomograms with uncorrelated speckle patterns, which are averaged to make a speckle-reduced image. In order to decorrelate the speckle, parameters like the incident angle of the laser beam [52, 58-60], the recording angle of the back-reflected light [61], or the frequency of the laser beam [62] are altered in the optical setup. Adjustment of the imaged subject corresponds to eye movement for retinal OCTs. Such movements may be due to small motions caused by respiration and heart beat, or to movements of the eye itself, such as saccadic motion. Weighted averaging schemes [63], registration of multiple frames by cross correlation [64, 65], and eye tracking systems [66] have been proposed in this area. An OCT system using an eye tracking method is offered commercially by Heidelberg Engineering in the Spectralis devices [67].

Magnitude-domain methods may be studied under "raw image domain" and "sparse representation". Methods in the raw image domain include traditional strategies like low-pass filtering [68], linear smoothing [1], median filtering [69-76], Wiener filtering, mean filtering [77-79], and two sequential 1D filters [80]. Some advanced methods are also utilized in the raw image domain, like I-divergence regularization approaches [81], non-linear anisotropic filtering [82-85], complex diffusion [86, 87], directional filtering [88, 89], adaptive vector-valued kernel functions [90], an SVM approach [91], and Bayesian estimation [92].

Sparse representation is also popular in denoising of OCT images and includes "non-parametric methods" like sparsity-based denoising [43] and robust PCA [44], and "parametric methods" like wavelet-based methods [28, 35-39], dual tree complex wavelet transformation [5, 29, 36, 40-42], curvelet transform [31, 32], and a circular symmetric Laplacian mixture model in wavelet diffusion [93]. It should also be mentioned that many papers (most of which utilize graph-based methods for segmentation of OCT images) declare that their methods are robust to noise, and no particular denoising is utilized in these methods [94-97]. Table 2 shows a complete list of available denoising methods in OCT images.


Table 2. Available denoising methods in OCT images (method: references).

Complex domain methods
  Modification in optical setup
    Alternation in the incident angle of the laser beam: Bashkansky, M., 2000 [52]; Iftimia, N., 2003 [59]; Ramrath, L., 2008 [60]; Hughes, M., 2010 [58]
    Alternation in the recording angle of the back-reflected light: Desjardins, A., 2006 [61]
    Alternation in the frequency of the laser beam: Pircher, M., 2003 [62]
  Adjustment in imaged subject itself
    Weighted averaging schemes: Sander, B., 2005 [63]
    Registration of multiple frames by cross correlation: Götzinger, E., 2005 [64]; Jørgensen, T.M., 2007 [65]
    Eye tracking systems: Ferguson, R.D., 2004 [66]

Magnitude domain methods
  Raw image domain, traditional methods
    Low-pass filtering: Hee, M.R., 1995 [68]
    2D linear smoothing: Huang, Y., 1991 [1]
    Median filter: George, A., 2000 [70]; Koozekanani, D., 2001 [72]; Rogowska, J., 2002 [74]; Herzog, A., 2004 [71]; Shahidi, M., 2005 [75]; Boyer, K., 2006 [69]; Srinivasan, V.J., 2008 [76]; Lee, K., 2010 [73]
    Adaptive Wiener filter: Ozcan, A., 2007 [79]
    Mean filter: Ishikawa, H., 2005 [77]; Ozcan, A., 2007 [79]; Mayer, M., 2008 [78]
    Two 1D filters: Baroni, M., 2007 [80]
  Raw image domain, advanced methods
    I-divergence regularization approach: Marks, D.L., 2005 [81]
    Non-linear anisotropic filter: Cabrera Fernández, D., 2004 [82]; Gregori, G., 2004 [84]; Garvin, M., 2008 [83]; Puvanathasan, P., 2009 [85]
    Complex diffusion: Bernardes, R., 2010 [86]; Salinas, H.M., 2007 [87]
    Directional filtering: Rogowska, J., 2000 [89]; Bagci, A.M., 2008 [88]
    Adaptive vector-valued kernel function: Mishra, A., 2009 [90]
    SVM approach: Fuller, A.R., 2007 [91]
    Bayesian estimations: Wong, A., 2010 [92]
  Sparse representation, non-parametric methods
    Sparsity-based denoising: Fang, L., 2012 [43]; Fang, L., 2013 [98]
    Robust principal component analysis: Luan, F., 2013 [44]
  Sparse representation, parametric methods
    Wavelet-based methods: Adler, D.C., 2004 [28]; Zlokolica, V., 2007 [39]; Gupta, V., 2008 [35]; Pizurica, A., 2008 [37]; Quellec, G., 2010 [38]; Mayer, M., 2012 [36]
    Dual tree complex wavelet transformation: Forouzanfar, M., 2007 [42]; Chitchian, S., 2009 [29]; Mayer, M., 2012 [36]; Kajic, V., 2010, 2012 [40, 41]; Rabbani, H., 2013 [5]
    Curvelet transform: Jian, Z., 2009 [31]; Jian, Z., 2010 [32]
    Circular symmetric Laplacian mixture model in wavelet diffusion: Kafieh, R., 2013 [93]

None (noise-robust methods without explicit denoising): Abramoff, M.D., 2009 [94]; Yazdanpanah, A., 2009 [97]; Yang, Q., 2010 [96]; Kafieh, R., 2013 [45]; Bogunovic, H., 2014 [99]; Rathke, F., 2014 [100]


In this work, we focus on denoising in the magnitude domain. We choose dictionary learning to improve the performance of available wavelet thresholding by tailoring adapted dictionaries instead of using pre-defined basis functions. Furthermore, in order to take advantage of shift-invariant wavelets, we introduce a new scheme in dictionary learning which starts from a dual tree complex wavelet. Moreover, we define three dimensional versions of the proposed method to be applied to 3D volumes of OCT, to use the intrinsic properties of 3D denoising.

2.2. Conventional dictionary learning

Dictionary learning for denoising of OCT was first proposed in [43]. The authors took advantage of customized scanning patterns with a selected number of B-scans imaged at higher signal-to-noise ratio (SNR). A sparse representation dictionary was then locally learned [101] for each of these high-SNR images, and such dictionaries were utilized to denoise the low-SNR B-scans. The reported results of this work were good, but its most important limitation was its dependence on the availability of high-SNR slices, which are not accessible in most available datasets. Although this work was improved in [98], it again requires a reference dataset provided by a customized scanning pattern.

In this paper we propose the application of the K-SVD algorithm [102] to OCT data. For construction of the sparseland model, each data vector $x$ can be represented over a redundant dictionary matrix $D \in \mathbb{R}^{n \times k}$ (with $k > n$):

$\hat{\alpha} = \arg\min_{\alpha} \|D\alpha - x\|_2^2 \quad \text{subject to} \quad \|\alpha\|_0 < t$  (1)

for a defined value of $t$. Having a noisy version of $x$, named $y$, the maximum a posteriori (MAP) estimator for denoising the data is built by solving:

$\hat{\alpha} = \arg\min_{\alpha} \|D\alpha - y\|_2^2 + \mu\|\alpha\|_0$  (2)

The K-SVD proposes an iterative algorithm to handle the above task efficiently [102]. As discussed in [103], it is not computationally efficient to apply K-SVD to the whole image. Therefore, overlapping patches are extracted from the original data, and a dictionary $D$ is learned on these patches by the K-SVD algorithm [102]. In practice, the exact solution of (2) is computationally demanding, and an approximate solution was introduced in [104] to reduce the complexity of this task. Furthermore, the original K-SVD needs to be trained using patches from noiseless data; however, when working on OCT data with no access to a set of clean examples, it is permissible to take the patches from the noisy image itself, because of the noise rejection capability of the K-SVD dictionary learning process [102]. The procedure can be broken into two stages which iterate until convergence.

In the first stage, we suppose $D$ is unknown and define our problem as K-SVD [102]:

$\{\hat{D}, \hat{\alpha}_{ij}, \hat{X}\} = \arg\min_{D, \alpha_{ij}, X} \ \lambda \|X - Y\|_2^2 + \sum_{ij} \mu_{ij} \|\alpha_{ij}\|_0 + \sum_{ij} \|D\alpha_{ij} - R_{ij}X\|_2^2$  (3)

In this expression, the first term is the log-likelihood global force that guarantees the proximity between the noisy version $Y$ and its denoised (and unknown) version $X$ [103]. The second and third terms represent the data prior and ensure that in the denoised version every patch $x_{ij}$ has a sparse representation; $\alpha_{ij}$ is the representation of each patch $x_{ij}$ over the dictionary $D$ (according to the third term). Every patch of size $\sqrt{n} \times \sqrt{n}$ is written as $x_{ij} = R_{ij}X$, where $R_{ij}$ is an $n \times N$ matrix which extracts the $(ij)$-th block from the $\sqrt{N} \times \sqrt{N}$ image. In the second stage, we regard $D$ and $X$ as fixed and compute the representations in a sparse coding stage by Orthogonal Matching Pursuit (OMP) [105]. Having the representations in hand, the dictionary can be updated using the K-SVD approach [102]. We name this method 2D Conventional Dictionary Learning (2D CDL).
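To make the two-stage loop concrete, the following is a minimal Python sketch of patch-based K-SVD dictionary learning in the spirit of (3); it is an illustrative simplification (plain OMP, without the approximate speed-ups of [104]), and all function names and sizes are our own, not the paper's implementation.

```python
import numpy as np

def omp(D, y, t):
    """Orthogonal Matching Pursuit: code y over dictionary D with at most t atoms."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(t):
        corr = np.abs(D.T @ residual)
        corr[support] = -1.0                    # do not reselect an atom
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

def ksvd(P, n_atoms=128, t=3, n_iter=10, seed=0):
    """Learn a dictionary from (noisy) patches P, one vectorized patch per column."""
    rng = np.random.default_rng(seed)
    D = P[:, rng.choice(P.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0) + 1e-12
    for _ in range(n_iter):
        # Stage 1: sparse coding of every patch over the current dictionary.
        A = np.column_stack([omp(D, p, t) for p in P.T])
        # Stage 2: update each atom (and its coefficients) by a rank-1 SVD fit.
        for k in range(n_atoms):
            users = np.nonzero(A[k])[0]
            if users.size == 0:
                continue
            A[k, users] = 0.0
            E = P[:, users] - D @ A[:, users]   # residual ignoring atom k
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]
            A[k, users] = s[0] * Vt[0]
    return D, A

# Denoised patches are reconstructed as D @ A and averaged back into the image.
```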

2.3. Double-sparse dictionary learning

In order to reduce the size of the saved dictionaries, we may also take advantage of Double-Sparse Dictionary Learning (DSDL) [106]. This procedure models the dictionary as $D = \Phi B$, where $\Phi$ is a fixed (implicit) base dictionary and $B$ is sparse. In this new definition, the low adaptability of implicit dictionaries is compensated while their efficient implementation is retained, unlike fully explicit dictionaries. The needed adaptability is provided by modifying $B$, trained from example patches. Furthermore, since $\Phi$ can be any implicit dictionary, the model is an adaptable extension of existing dictionaries. Similar to the original K-SVD in (1), DSDL aims to solve the problem:

$\hat{\alpha} = \arg\min_{\alpha} \|\Phi B \alpha - x\|_2^2 \quad \text{subject to} \quad \|\alpha\|_0 < t, \ \|\beta\|_0 < p$  (4)

where $\beta$ denotes each column of $B$.

A new, efficient implementation of K-SVD [104] is then used to fit (4); it is called the sparse K-SVD algorithm [106]. In this paper we compare conventional (2D-CDL) and double-sparse (2D-DSDL) [106] dictionary learning methods for denoising. We also use a 3D version of double-sparse dictionary learning (3D-DSDL) and show its superiority over the 2D versions.
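As a small illustration of the double-sparse model (a sketch under our own naming, not the sparse K-SVD of [106]): the effective dictionary never needs to be stored densely, since only the sparse matrix $B$ and an operator for $\Phi$ are kept.

```python
import numpy as np
from scipy.sparse import random as sparse_random

n, k = 64, 256                 # signal size and number of atoms (hypothetical)
Phi = np.random.randn(n, n)    # stand-in for a fast implicit base dictionary (e.g. DCT)
B = sparse_random(n, k, density=0.05, format="csc")  # sparse atom representations

alpha = np.zeros(k)            # a t-sparse code
alpha[[3, 17, 42]] = [1.0, -0.5, 0.25]

# Applying D = Phi @ B without ever forming D explicitly:
x_hat = Phi @ (B @ alpha)
```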

2.4. Dual tree complex wavelet transform

The start dictionary in [102] was the redundant DCT, but since the penalty minimized in (3) is a highly non-convex functional, local minimum solutions are likely. Thus, as elaborated in Section 2.5, we propose using the dual-tree Complex Wavelet Transform (CWT) [107] instead of the redundant DCT as the start dictionary, to improve the results of the conventional algorithms. The CWT is nearly shift-invariant and directionally selective in two and higher dimensions, with a redundancy factor of $2^d$ for $d$-dimensional signals (lower than that of the undecimated Discrete Wavelet Transform (DWT)). The multidimensional (M-D) dual-tree CWT is non-separable but is based on a computationally efficient, separable Filter Bank (FB) [108].

Perfect reconstruction forbids the DWT from being an analytic wavelet transform, but Kingsbury introduced the dual-tree CWT [107] to solve this problem by employing two real DWTs: the first DWT gives the real part of the transform while the second gives the imaginary part. A complex-valued scaling function and a complex-valued wavelet are required in the CWT:

$\psi_c^{(1)}(t) = \psi_h^{(1)}(t) + j\psi_g^{(1)}(t)$  (5)

where $\psi_h^{(1)}(t)$ is one dimensional, real, and even, and $j\psi_g^{(1)}(t)$ is imaginary and odd. In order to have an analytic signal, supported only on one half of the frequency axis, $\psi_h^{(1)}(t)$ and $\psi_g^{(1)}(t)$ should form a Hilbert transform pair.

2.4.1. 2-D dual tree CWT

A 2D wavelet transform that is both oriented and complex (approximately analytic) can be easily developed. The oriented 2D dual-tree CWT is four-times expansive, but it has the benefits of being oriented, approximately analytic, and nearly shift-invariant. Consider $\psi^{(2)}(x,y) = \psi^{(1)}(x)\psi^{(1)}(y)$ for the 2D dual tree CWT, where $\psi^{(1)}(x)$ is a complex (approximately analytic) wavelet given by $\psi^{(1)}(x) = \psi_h^{(1)}(x) + j\psi_g^{(1)}(x)$. Then we have [108]:


$\psi^{(2)}(x,y) = \left[\psi_h^{(1)}(x) + j\psi_g^{(1)}(x)\right]\left[\psi_h^{(1)}(y) + j\psi_g^{(1)}(y)\right]$
$= \psi_h^{(1)}(x)\psi_h^{(1)}(y) - \psi_g^{(1)}(x)\psi_g^{(1)}(y) + j\left[\psi_g^{(1)}(x)\psi_h^{(1)}(y) + \psi_h^{(1)}(x)\psi_g^{(1)}(y)\right]$  (6)

The 2-D dual tree CWT is oriented because the spectrum of the 1-D complex wavelet lies on only one side of the frequency axis, so the 2-D wavelet remains in one quadrant of the 2-D frequency plane. To obtain a wavelet supported in two quadrants, we can take the real part of $\psi^{(2)}(x,y)$, which is the sum of two separable wavelets, placed symmetrically and oriented at $-45°$. Similarly, a real 2-D wavelet oriented at $+45°$ can be obtained by choosing $\psi^{(2)}(x,y) = \psi^{(1)}(x)\overline{\psi^{(1)}(y)}$, and four more oriented real 2-D wavelets can be made by using $\phi^{(1)}(x)\psi^{(1)}(y)$, $\psi^{(1)}(x)\phi^{(1)}(y)$, $\phi^{(1)}(x)\overline{\psi^{(1)}(y)}$, and $\psi^{(1)}(x)\overline{\phi^{(1)}(y)}$, where $\phi^{(1)}(x) = \phi_h^{(1)}(x) + j\phi_g^{(1)}(x)$. Fig. 1 demonstrates a comparison between the subbands of the DWT and the real dual tree CWT. A 2-D wavelet transform that is both oriented and complex (approximately analytic) can also be developed by taking the complex form of $\psi^{(2)}(x,y)$.

Fig. 1. A comparison between subbands of DWT and real dual tree CWT. (a) The wavelets in the space domain (LH, HL, HH). (b) The idealized support of the Fourier spectrum of each wavelet in the 2-D frequency domain. We can see the checkerboard artifact of the third wavelet. (c) The complex wavelets in the space domain. (d) The idealized support of the Fourier spectrum of each wavelet in the 2-D frequency plane. The absence of the checkerboard phenomenon is observed in both the space and frequency domains [5, 108].

2.4.2. 3-D dual tree CWT

Similar to the 2-D case, the separable 3-D DWT suffers from an even more serious checkerboard artifact; it is shown in [109] that the 3-D dual-tree CWT, which avoids this artifact, is a good candidate for processing medical volume data and video sequences. Suppose $\psi^{(3)}(x,y,z) = \psi^{(1)}(x)\psi^{(1)}(y)\psi^{(1)}(z)$. Then we have [109]:

$\psi^{(3)}(x,y,z) = \psi^{(1)}(x)\psi^{(1)}(y)\psi^{(1)}(z) = \left[\psi_h^{(1)}(x) + j\psi_g^{(1)}(x)\right]\left[\psi_h^{(1)}(y) + j\psi_g^{(1)}(y)\right]\left[\psi_h^{(1)}(z) + j\psi_g^{(1)}(z)\right]$  (7)


As in the 2-D case, the real and imaginary parts can be calculated, and different versions of $\psi^{(3)}(x,y,z)$ can be defined to obtain various orientations. Fig. 2 demonstrates a comparison between the subbands of the DWT and the dual tree CWT. Based on the points discussed, before applying the dual tree CWT as the start dictionary of the proposed method, we use the separable 2-D DWT and three different methods based on the dual tree CWT itself to remove the noise from OCT data. These methods are the real part of the 2-D dual-tree CWT, the 2-D dual-tree CWT, and the 3-D dual-tree CWT, all based on soft thresholding [110].
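For reference, soft thresholding shrinks coefficient magnitudes toward zero; a minimal sketch for real or complex wavelet coefficients (our own helper, with the threshold value as an input):

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Shrink coefficient magnitudes by t; works for real and complex arrays."""
    mag = np.maximum(np.abs(coeffs), 1e-12)   # guard against division by zero
    return coeffs * np.maximum(1.0 - t / mag, 0.0)
```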

Fig. 2. A comparison between the idealized support of the Fourier spectrum of each standard and complex wavelet in the 3-D frequency domain. (a) Isosurfaces of the seven 3-D wavelets for a standard 3-D wavelet transform. The blue and red colors have the same amplitude but their phases are complementary. (b) Isosurfaces of seven (out of 28) 3-D wavelets for a 3-D DCWT. Each subband corresponds to motion in a specific direction [5, 109].

2.5. Dictionary learning with wise selection of start dictionary

Since the penalty minimized by the second and third terms in (3) is a highly non-convex functional, the solution may fall into local minima. Therefore, it is important to start from a wisely selected dictionary, instead of the redundant DCT utilized in [103], to reduce the chance of being trapped in local minima and to reach convergence in fewer iterations. Owing to the positive properties of the dual-tree CWT [108], we start with these dictionaries for their shift-invariant and directionally selective characteristics, along with their overcomplete subband decomposition. We name this method 2D/3D Complex Wavelet-based Dictionary Learning (2D-CWDL and 3D-CWDL).

For this purpose, we cannot use the CWT in the algebraic form discussed in Section 2.4: in dictionary learning approaches we need an explicit dictionary that is multiplied by the data to produce the new signal. This leads to a matrix-form implementation of the CWT. Suppose that the usual 2D separable DWT implemented with the filters $\{h_0(n), h_1(n)\}$ can be represented by the square matrix $F_{hh}$. If $x$ is a real image, the real and imaginary parts of the oriented 2D dual-tree CWT can be represented by $W_r^{2D}$ and $W_i^{2D}$, respectively [108], where $I$ denotes the identity matrix:


$W_r^{2D} = \frac{1}{2}\begin{bmatrix} I & -I \\ I & I \end{bmatrix}\begin{bmatrix} F_{hh} \\ F_{gg} \end{bmatrix} x$  (8)

$W_i^{2D} = \frac{1}{2}\begin{bmatrix} I & I \\ I & -I \end{bmatrix}\begin{bmatrix} F_{gh} \\ F_{hg} \end{bmatrix} x$  (9)

Therefore, the complex coefficients can be calculated by:

$F_C^{2D} = \frac{1}{4}\begin{bmatrix} I & -I & I & I \\ I & I & I & -I \\ I & I & -I & I \\ I & -I & -I & -I \end{bmatrix}\begin{bmatrix} F_{hh} \\ F_{gg} \\ jF_{gh} \\ jF_{hg} \end{bmatrix}$  (10)
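A minimal numerical sketch of the block combination in (10), with randomly generated stand-ins for the separable operators $F_{hh}$, $F_{gg}$, $F_{gh}$, $F_{hg}$ (in practice these come from the dual-tree filter banks, e.g. the code of [113, 114]):

```python
import numpy as np

N = 16                                    # toy signal length (hypothetical)
I = np.eye(N)
# Stand-ins for the four separable wavelet operators of the two filter banks.
Fhh, Fgg, Fgh, Fhg = (np.random.randn(N, N) for _ in range(4))

S = 0.25 * np.block([[ I, -I,  I,  I],
                     [ I,  I,  I, -I],
                     [ I,  I, -I,  I],
                     [ I, -I, -I, -I]])
F_stack = np.vstack([Fhh, Fgg, 1j * Fgh, 1j * Fhg])
FC2D = S @ F_stack                        # complex start-dictionary operator, eq. (10)

x = np.random.randn(N)
coeffs = FC2D @ x                         # complex dual-tree coefficients of x
```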

This dictionary can now be used as the start dictionary in the dictionary learning of 2D-CWDL. Similarly, we may use the 3D dual-tree CWT as the start dictionary of 3D-CWDL. In particular, M-D dual-tree wavelets are not only approximately analytic but also oriented, and thus natural for analyzing and processing oriented singularities like edges in images and surfaces in 3-D datasets [108]. According to Section 2.4.2, the real and imaginary parts of $\psi^{(3)}(x,y,z)$ can be extracted to make new orientations. By making new definitions for $\psi^{(3)}(x,y,z)$, similar to what was elaborated in Section 2.4.1 for $\psi^{(2)}(x,y)$, it can be shown that the real and imaginary parts of the oriented 3D dual-tree CWT can be represented by $W_r^{3D}$ and $W_i^{3D}$:

$W_r^{3D} = \frac{1}{4}\begin{bmatrix} I & -I & -I & -I \\ I & -I & I & I \\ I & I & -I & I \\ I & I & I & -I \end{bmatrix}\begin{bmatrix} F_{hhh} \\ F_{ggh} \\ F_{ghg} \\ F_{hgg} \end{bmatrix}$  (11)

$W_i^{3D} = \frac{1}{4}\begin{bmatrix} I & -I & I & I \\ -I & I & I & I \\ I & I & I & -I \\ -I & -I & I & -I \end{bmatrix}\begin{bmatrix} F_{hgh} \\ F_{ggg} \\ F_{ghh} \\ F_{hhg} \end{bmatrix}$  (12)

The complex coefficients in 3D, used as the start dictionary of 3D-CWDL, can be calculated by:

$F_C^{3D} = \frac{1}{16}\begin{bmatrix} I & -I & -I & -I & I & I & -I & I \\ I & -I & I & I & I & I & I & -I \\ I & -I & I & I & -I & I & I & I \\ I & I & I & -I & -I & -I & I & -I \\ I & I & I & -I & I & -I & I & I \\ I & I & -I & I & I & -I & -I & -I \\ I & I & -I & I & -I & -I & -I & I \\ I & -I & -I & -I & -I & I & -I & -I \end{bmatrix}\begin{bmatrix} F_{hhh} \\ F_{ggh} \\ F_{ghg} \\ F_{hgg} \\ jF_{hgh} \\ jF_{ggg} \\ jF_{ghh} \\ jF_{hhg} \end{bmatrix}$  (13)

3. OCT image representation based on diffusion wavelets

In this section, we present another tool for 3D data-driven multi-scale atomic representation of OCT: a new diffusion-model-based approach. Diffusion wavelets can be considered the multi-scale version of diffusion maps. The diffusion map was introduced by Coifman and Lafon [11] to embed a dataset into a low-dimensional Euclidean space whose coordinates can be computed from the eigenvectors and eigenvalues of a diffusion operator on the data. The embedding is constructed such that the "diffusion distance" between probability distributions centered at the original points equals the Euclidean distance between the corresponding points in the embedded space. Our approach is based on the work of Coifman and Maggioni [27] in harmonic analysis and finds multiscale embeddings of pixels in an image. Harmonic analysis commonly refers to Fourier analysis in continuous spaces, but wavelet methods can also be classified as new versions of harmonic analysis, dealing with both temporal and spatial properties of signals. Coifman and Maggioni [27] introduced diffusion wavelets as a recent extension of wavelets that operates on discrete spaces like graphs. In this new approach, the basis functions are constructed at each level and, unlike traditional wavelets, they are not predefined [111].

3.1. Construction of the affinity and normalized Laplacian matrices

We focus on gray-level images (the extension to color images is straightforward). In order to apply diffusion wavelets to an image, graph nodes must be associated with the image pixels. To reduce the complexity, we select 1×15 non-overlapping pixel boxes as graph nodes, and a kernel is defined as:

$w(x,y) = \exp\left[-\left(\frac{d^2(x,y)}{2\sigma_{geo}^2} + \frac{d^2(g(x),g(y))}{2\sigma_{gray}^2}\right)\right]$  (14)

where $x$, $y$ indicate the centroids of the selected 1×15 boxes (we call them pixel nodes), $g(\cdot)$ is a function giving the mean gray level of each box, and $\sigma_{geo}$ and $\sigma_{gray}$ are scale factors (calculated as 0.15 times the range of $d(x,y)$ and $d(g(x),g(y))$, respectively; $d$ is chosen to be the Euclidean distance). The affinity matrix $W$ is an $\frac{N}{15} \times \frac{N}{15}$ matrix, where $N$ is the number of pixels in the image. The normalized pixel-pixel matrix $T$ is built by:

$T = D^{-1/2}\, W\, D^{-1/2}$  (15)

where $D$ is a diagonal matrix whose $i$-th diagonal entry is the sum of the entries in the $i$-th row of $W$. The normalized Laplacian matrix $L$ is then constructed as:

$L = I - T$  (16)

Spectral graph theory usually looks for eigenvectors of $T$, whereas multiscale diffusion analysis works on scaled versions of the data and finds scaling functions of $T$ using diffusion wavelets [27]. The overall process can be viewed as a projection onto lower dimensional spaces using scaling functions, while large-scale information is kept [111]. The most useful property of multiscale diffusion analysis is that it provides a multiscale embedding; the main assumption of the method is that $T$ is local, i.e., it has small support, and that high powers of $T$ have low numerical rank. Fig. 3 (the section titled "diffusion wavelet") describes how the spectral powers of $T$ relate to the multiscale eigen-space decomposition. Here the numerical rank decreases as the power of $T$ increases. From the analyst's perspective, high powers are smooth functions with small gradient; hence they are compressible, leading to data reduction.
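A minimal Python sketch of (14)-(15) for one small image; the box size and the 0.15 scale rule follow the text, while the image itself is a random toy stand-in:

```python
import numpy as np

image = np.random.rand(60, 60)                   # toy stand-in for one B-scan
h, w = image.shape
bw = 15                                          # 1x15 non-overlapping boxes

# Node features: box centroids and mean gray levels (ingredients of eq. 14).
rows, cols = np.meshgrid(np.arange(h), np.arange(0, w - bw + 1, bw), indexing="ij")
cent = np.stack([rows.ravel(), (cols + bw // 2).ravel()], axis=1).astype(float)
gray = np.array([image[r, c:c + bw].mean()
                 for r, c in zip(rows.ravel(), cols.ravel())])

# Pairwise distances; scale factors are 0.15 of each distance range.
dg = np.linalg.norm(cent[:, None, :] - cent[None, :, :], axis=2)
di = np.abs(gray[:, None] - gray[None, :])
s_geo, s_gray = 0.15 * dg.max(), 0.15 * di.max()

W = np.exp(-(dg**2 / (2 * s_geo**2) + di**2 / (2 * s_gray**2)))   # eq. (14)
Dinv = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
T = Dinv @ W @ Dinv                                               # eq. (15)
```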


Fig. 3. Schematic correlation between Fourier and wavelet transforms in continuous spaces and diffusion maps and wavelets in the embedded space.

3.2. Diffusion wavelets algorithm

Diffusion wavelets provide multiscale analysis on graphs, as an extension of classical wavelet theory from harmonic analysis [27]. The outputs of such an analysis are sets of diffusion scaling functions ($\phi_j$) and wavelet functions ($\psi_j$). The orthogonal bases of diffusion scaling functions $\phi_j$ span a chain of subspaces $V_0 \supseteq V_1 \supseteq V_2 \supseteq \dots \supseteq V_j \supseteq \dots$ that form a multiresolution decomposition of the functions on the graph. Given $T$ from the previous section, the subspace $V_j$ can be obtained as the numerical range (to precision $\varepsilon$) of $T^{2^{j+1}-1}$ at scale $2^{j+1}$ [111]. To define the wavelet functions, the orthogonal diffusion wavelets $\psi_j$ span $W_j$, the orthogonal complement of $V_{j+1}$ in $V_j$.
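A heavily simplified sketch of the scaling-function construction: dyadic powers of $T$ compressed by a rank-revealing QR. This follows the spirit of [27] rather than its exact algorithm, and the tolerance is an assumed parameter.

```python
import numpy as np
from scipy.linalg import qr

def diffusion_scaling_functions(T, n_levels, eps=1e-4):
    """Return extended basis functions phi_j (columns) for each level."""
    phi = [np.eye(T.shape[0])]             # level 0: the delta basis
    Tj = T.copy()                          # holds T^(2^j)
    for _ in range(n_levels):
        M = Tj @ phi[-1]                   # act on the current basis
        Q, R, _ = qr(M, mode="economic", pivoting=True)
        rank = int(np.sum(np.abs(np.diag(R)) > eps * np.abs(R[0, 0])))
        phi.append(Q[:, :max(rank, 1)])    # compressed basis spanning V_{j+1}
        Tj = Tj @ Tj                       # dyadic power for the next level
    return phi
```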

3.3. Application of diffusion wavelets in image segmentation

Two approaches are proposed in this paper for image segmentation. The first method is based on extended basis functions at each level and designs a competition between the basis values for partitioning. The second approach defines a new distance for each level and clusters based on such distances. These two methodologies are discussed below.

3.3.1. Image segmentation based on extended basis functions at each level

In the single-scale version of diffusion wavelets (diffusion maps), diffusion distances were used for feature representation. We use the same idea here, with features based on the extended basis functions at level $j$. Let the matrix of extended basis functions $\phi_j$ at level $j$ be


$\phi_j = \begin{bmatrix} \phi_j^1(x_1) & \cdots & \phi_j^m(x_1) \\ \vdots & \ddots & \vdots \\ \phi_j^1(x_n) & \cdots & \phi_j^m(x_n) \end{bmatrix}$  (17)

where $n$ is the number of pixel nodes in the image and $m$ is the number of extended basis functions at level $j$. The number of extended basis functions $m$ differs from level to level, and the final level has one extended basis function ($m = 1$). For instance, in an image with $n = 800$ pixel nodes, we may have $m = 7$ at level $j = 10$; $\phi_j$ then has size $800 \times 7$ at level $j = 10$. We can consider the best number of clusters at level $j = 10$ to be $m = 7$ and design a competition between the basis values for partitioning. For this purpose, $x_i$ (the $i$-th pixel node) belongs to cluster $F$, which is obtained by:

$F = \arg\max_{f=1,\dots,7} \left|\phi_{10}^f(x_i)\right|$  (18)

That is, we find the maximum value in each row of $\phi_j$, and the column in which the maximum is located reveals the cluster to which the pixel node corresponding to that row belongs. To determine the level $j$ at which the extended basis functions $\phi_j$ should be calculated, we determine the best number of clusters for our image and find the level $j$ whose number of extended basis functions $m$ is nearest to the determined cluster number.
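A sketch of this competition rule on top of the scaling functions computed above (the target cluster count is an input; names are ours):

```python
import numpy as np

def competition_clustering(phi_levels, n_clusters):
    """Pick the level whose basis count is closest to n_clusters, then assign
    each node to the basis function with the largest magnitude (eq. 18)."""
    counts = [p.shape[1] for p in phi_levels]
    j = int(np.argmin([abs(c - n_clusters) for c in counts]))
    labels = np.argmax(np.abs(phi_levels[j]), axis=1)   # one node per row
    return j, labels
```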

3.3.2. Image segmentation based on defining a new distance for each level

The second approach is based on defining new distances between each pair of pixel nodes and applying a central grouping method like k-means to these new distances. This approach is similar to diffusion maps; the most important difference is that the distance function pertains to each level, and consequently the partitioning results differ from level to level. The distance between pixel nodes $x_i$ and $x_k$ at level $j$ is defined based on the extended basis functions $\phi_j$ at that level:

$D_{\phi_j}(x_i, x_k) = \left(\sum_{l=1}^{m} \left(\phi_j^l(x_i) - \phi_j^l(x_k)\right)^2\right)^{1/2}$  (19)

We may consider the whole extended-basis distance matrix at level $j$, which is given by:

$D_{\phi_j} = \begin{bmatrix} D_{\phi_j}(x_1, x_1) & \cdots & D_{\phi_j}(x_1, x_n) \\ \vdots & \ddots & \vdots \\ D_{\phi_j}(x_n, x_1) & \cdots & D_{\phi_j}(x_n, x_n) \end{bmatrix}$  (20)

and we can run a method like k-means clustering on the pixel nodes; the only difference from traditional k-means is using the distance calculated in $D_{\phi_j}$ instead of a simple Euclidean or Mahalanobis distance.
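A sketch exploiting the fact that (19) is the Euclidean distance between rows of $\phi_j$, so ordinary k-means on those rows clusters with exactly this metric (a tiny k-means is included to stay self-contained):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means on the rows of X; distances between rows of phi_j
    are exactly the level-j distances D_phi_j of eq. (19)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Example: labels = kmeans(phi_levels[j], k=6)   # 6 clusters, as in Section 3.4
```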

3.4. Proposed algorithm for segmentation of OCT images

This algorithm is designed to localize the 11 layers (12 surfaces) given in [45], using a two-stage algorithm. In the first stage, according to the method introduced in Section 3.3.1, we look for the best level in which the number of extended basis functions is equal to 3 (the number 3 is chosen to represent the two background areas above and below the retina, plus the region of interest, i.e. the retinal layers). Then, image clustering can be performed, and the cluster with the second highest cluster center is chosen as the retinal area to be analyzed in the second stage. Fig. 4 demonstrates a sample clustering in the first stage.


Fig. 4. Sample clustering in the first stage. Scale 14 is the chosen scale, which has 3 extended basis functions. Clustering is done according to (20) to find the 3 corresponding clusters, and the edge information of the clusters is extracted for use in the next stage.

The edge points of the second cluster are then refined by looking for the highest gradient within a 7-pixel-long window around each pixel. The corrected positions can then be fed back into the graph information to add or remove the corresponding pixels to or from the group of nodes classified as the second cluster. Fig. 5 shows two samples of this gradient-based improvement step. Using the pixel information of the 8th boundary, we can find the locations of boundaries 9 to 12 by a gradient search in the vertical direction.
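A sketch of this refinement (the 7-pixel window follows the text; the function and variable names are ours):

```python
import numpy as np

def refine_boundary(image, cols, rows, half=3):
    """Snap each boundary point to the strongest vertical gradient
    within a 7-pixel window (half = 3 pixels on each side)."""
    grad = np.abs(np.gradient(image.astype(float), axis=0))
    refined = []
    for c, r in zip(cols, rows):
        lo, hi = max(r - half, 0), min(r + half + 1, image.shape[0])
        refined.append(lo + int(np.argmax(grad[lo:hi, c])))
    return np.array(refined)
```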

Fig. 5. Two samples of the gradient-based improvement step for adding/eliminating nodes to/from the second cluster. Yellow circles indicate the pixels which should be removed from the second cluster.


Before starting the second stage, the nodes pertaining to the second cluster should be detached from the rest to produce a new graph. Unlike the similar method based on diffusion maps [45], this method does not need the construction of a new graph and affinity matrix for the new portion of the image; instead, we look through the matrix of extended basis functions and eliminate the rows corresponding to the nodes not allocated to the second cluster. The remaining matrix can then be used for further analysis.

In the second stage, at each level, we apply the distance function defined in Section 3.3.2 and perform the clustering on the nodes of the second cluster. Fig. 6 shows samples of this stage with different window sizes (the best selected window size for the reported results is 1×15, and the number of clusters is selected to be 6).

Fig. 6. Samples of the second stage of the proposed segmentation method with different window sizes. The randomly selected box sizes show that thinner boxes result in better clustering. Furthermore, since the algorithm looks for transition points, the best number of clusters is 6, although we look for 7 retinal layers.

As can be seen in Fig. 6, each cluster in the second stage is not definitely allocated to a specific area; instead, the transitions between clusters are indicative of boundary positions. In methods based on diffusion maps (like [45]), however, the clusters were allocated to regions, and we needed to find the first and last points of each cluster. The reason for this difference lies in the multi-scale properties of diffusion wavelets, which mostly depend on higher powers of the adjacency matrix and accordingly allow connections between nodes located relatively far apart. Fig. 7 shows an incorrect segmentation result obtained when allocating each cluster to one region and localizing the first and last points of each cluster as the desired boundaries.


Fig. 7. Incorrect segmentation result in the case of allocating each cluster to one region and localizing the first and last points of each cluster as desired boundaries.

As Fig. 6 shows, the resulting segmentation differs from one level to another. In lower levels with a higher number of extended bases, over-segmentation is more probable and the complexity is higher. Therefore, we concentrate on the 5 highest levels of each image and follow the procedure below. In the last stage, the number of internal layers is supposed to be 7, and for each set of nodes in one column, the transition points in all of the desired levels are considered as possible boundary locations. Since some boundary points may be missed in this procedure, and some unwanted points may be picked up among the transition points, the following algorithm is proposed to localize the correct positions of the boundaries.

In order to find the location of each boundary (the i-th boundary, with i = 1, ..., 6), do:

A) For the first boundary (i = 1):

1) Suppose that in each column the first transition belongs to the first boundary; the points X_1 = {x_1, x_2, ..., x_k} are then determined as the first boundary, where k is the number of columns of the original image on which the center of the selected window is placed.

2) Calculate the mean and std of the first boundary as m_1 and s_1.

3) For j = 1, ..., k, if |x_j - m_1| > (3/2)·s_1 at j = J, add the point x_J to the candidates of incorrect segmentation for the first boundary, c_J^1.

4) Find the interpolated location x_J^interp from x_{J-1} and x_{J+1}, and consider the next (second) transition point of that column (which should have belonged to the next boundary) as another candidate, x_J^next. Replace the c_J^1 values in X_1 once with x_J^interp and once with x_J^next, and calculate s_1^interp and s_1^next for each replacement; the winning replacement for c_J^1 is the one with the smaller std value.

B) For the rest of the boundaries (i = 2, ..., 6):

1) Suppose that in each column the i-th transition belongs to the i-th boundary; the points X_i = {x_1, x_2, ..., x_k} are then determined as the i-th boundary, with k defined as above.

2) If there were any candidates of incorrect segmentation in the previous boundary (c_l^{i-1}, l = 1, ..., L), form X_i^l = {x_1, x_2, ..., c_l^{i-1}, ..., x_k} such that each candidate is located in its corresponding column l.

3) Calculate the mean and std of the i-th boundary as m_i and s_i for X_i.

4) Calculate the mean and std of the i-th boundary with each wrong-segmentation candidate inserted, m_i^l and s_i^l, for X_i^l, l = 1, ..., L.

5) For l = 1, ..., L, if s_i^l < s_i, substitute the point x_l of the i-th boundary with c_l^{i-1}; furthermore, add x_l to the candidates of wrong segmentation, c_l^i, to be possibly used for the next boundaries.

6) Form the final X_i_final with the best replacements.

7) Calculate the mean and std of the i-th boundary as m_i^final and s_i^final for X_i_final.

8) For j = 1, ..., k, if |x_j - m_i^final| > (3/2)·s_i^final at j = J, add the point x_J to the candidates of incorrect segmentation for the i-th boundary, c_J^i.

9) Replace c_J^i in X_i_final with the value interpolated from x_{J-1} and x_{J+1}.

10) Find the interpolated location x_J^interp from x_{J-1} and x_{J+1}, and consider the next (i+1)-th transition point (which should have belonged to the next boundary) as another candidate, x_J^next. Replace the c_J^i values in X_i once with x_J^interp and once with x_J^next, and calculate s_i^interp and s_i^next for each replacement; the winning replacement for c_J^i is the one with the smaller std value.

11) If i < 6, repeat from step B.1 for the next boundary.
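As a sanity check of steps 2)-4) (and the analogous steps 7)-10)), the following MATLAB sketch shows the std-based outlier test and replacement for one boundary; x is the hypothetical vector of boundary positions over the k columns, and nextTransition is a hypothetical helper returning the next transition point of column J:

% Hedged sketch: flag and repair outlying boundary estimates (interior J only).
m = mean(x);  s = std(x);
bad = find(abs(x - m) > 1.5 * s);        % candidates of incorrect segmentation
for J = bad(:)'
    xInterp = (x(J-1) + x(J+1)) / 2;     % interpolated candidate
    xNext   = nextTransition(J);         % hypothetical: (i+1)-th transition point
    xTry = x;  xTry(J) = xInterp;  sInterp = std(xTry);
    xTry(J) = xNext;                     sNext   = std(xTry);
    if sInterp <= sNext, x(J) = xInterp; else x(J) = xNext; end
end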

4. Results

4.1. OCT Despeckling Results

4.1.1. Datasets

We used two datasets from different OCT imaging systems, a Topcon 3D OCT-1000 and a Cirrus Zeiss Meditec. Each set consists of six randomly selected 3D OCTs. Subjects in the Topcon dataset were diagnosed with retinal Pigment Epithelial Detachment (PED), and those in the Zeiss dataset were diagnosed with Symptomatic Exudates-Associated Derangement (SEAD). The OCT data for the Topcon dataset was obtained from Feiz Eye Hospital, Isfahan, Iran, and the Zeiss dataset was provided by The Iowa Institute for Biomedical Imaging [112]. The values of the proposed measurements for each denoising method are reported on 144 randomly selected slices (72 taken with the Zeiss device and 72 acquired with the Topcon device).

4.1.2. Data processing

The processing is implemented in MATLAB version 7.8 (MathWorks, Natick, MA) with Image Processing Toolbox version 5.2. To implement the conventional dictionary learning and double sparse dictionary learning, we used the code of [104, 106]. For the implementation of the dual-tree CWT, we employed the code of [113, 114].

4.1.3. Performance evaluation

In this paper, we investigate the performance of 11 different speckle reduction methods described in Table 3.


Table 3. List of speckle reduction methods evaluated in this paper.

Category            | Name                                                                                                      | Short name  | Described in section
Dictionary learning | 2D Conventional Dictionary Learning                                                                       | 2D CDL      | 2.2
Dictionary learning | 2D/3D Double Sparse Dictionary Learning                                                                   | 2D/3D DSDL  | 2.3
Dictionary learning | Real part of 2D/3D Dictionary Learning with start dictionary of dual-tree Complex Wavelet Transform       | 2D/3D RCWDL | 2.5
Dictionary learning | Imaginary part of 2D/3D Dictionary Learning with start dictionary of dual-tree Complex Wavelet Transform  | 2D/3D ICWDL | 2.5
Wavelet transform   | 2D separable Discrete Wavelet Transform                                                                   | 2D SDWT     | 2.4
Wavelet transform   | Real part of 2D dual-tree Complex Wavelet Transform                                                       | 2D RCWT     | 2.4
Wavelet transform   | (full complex) 2D dual-tree Complex Wavelet Transform                                                     | 2D CCWT     | 2.4
Wavelet transform   | (full complex) 3D dual-tree Complex Wavelet Transform                                                     | 3D CCWT     | 2.4

Furthermore, for each method in the dictionary learning category, we evaluated the difference between results obtained by learning the dictionary from the dataset itself and by learning it from other available datasets. For this purpose, we constructed a base dictionary for each imaging modality using the dictionary learning methods, and the saved dictionary of each modality was used for denoising every new OCT belonging to that class.

In order to construct a base dictionary for each modality, two approaches may be taken. The first is random selection of a sample from each imaging system and learning the dictionary from that prototype. Our results showed that different prototypes produce dissimilar denoising results, which may alter the interpretation of a particular method's performance. Therefore, we propose a second approach in which several datasets are chosen and the dictionary is learned over a number of exemplars, so as to have the least possible dependence on the selected prototype. For this purpose, we start from a start dictionary on the first exemplar to find the first version of the dictionary. The obtained dictionary is then used as the start dictionary on the second exemplar to produce the second (updated) version of the dictionary. The process continues until an enhanced version of the dictionary is obtained, which is saved and used for denoising every new OCT belonging to that class. In order to find the number of needed updates (the number of exemplars to be used in construction of the enhanced dictionary), we propose the algorithm depicted in Diagram 1.
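A minimal MATLAB sketch of this sequential update follows; trainKSVD (a hypothetical wrapper around the K-SVD codes of [104, 106]) and extractPatches are assumed names:

% Refine the dictionary over several exemplar datasets, one after another.
D = D0;                                      % start dictionary (e.g., dual-tree CWT atoms)
for n = 1:numel(exemplars)
    patches = extractPatches(exemplars{n});  % hypothetical patch extraction
    D = trainKSVD(patches, D);               % previous result is the new start dictionary
end
save('enhancedDictionary.mat', 'D');         % reused for every new OCT of this device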

According to the proposed algorithm, we calculate the variance of the contrast to noise ratio (CNR) for different numbers of exemplars. Obviously, when only one exemplar is used, dissimilar results occur because the dictionaries differ. As the number of exemplars increases, the enhanced dictionary becomes more robust and different dictionaries produce more similar results. Finally, we find a point at which adding more exemplars no longer changes the enhanced dictionary considerably; in other words, the variances of the CNRs become similar and increasing the number of exemplars is no longer useful. Selecting ε = 0.1, we found that in our datasets 4 exemplars are enough to produce sufficiently robust dictionaries. These exemplars are 2D images in the proposed 2D methods and 3D datasets in the proposed 3D methods. The values reported for the dictionary learning methods in this paper are obtained by leaving out each dataset, making an enhanced dictionary from 4 randomly chosen datasets, and calculating the mean result.


Diagram 1. Proposed algorithm to find the number of needed updates for an enhanced dictionary.

A. Set the number of exemplars k = 0.
B. Set Var_new = 100.
C. k = k + 1; Var_old = Var_new.
D. For i = 1:10
   a. Choose k datasets randomly.
   b. Calculate the enhanced dictionary for the k datasets, updating the dictionary on each dataset in turn.
   c. Use the learned enhanced dictionary for denoising of dataset number 1 (or any fixed dataset).
   d. Calculate the resulting CNR.
   e. Temp(i) = CNR.
E. Var_new = variance(Temp).
F. If |Var_old - Var_new| > ε, go to C.
G. k is the number of exemplars needed in construction of the enhanced dictionary.

4.1.4. Quantitative measures of performance

CNR, Texture Preservation (TP), Edge Preservation (EP), and Equivalent Number of Looks (ENL) were computed for each denoising method, and for OCTs from each imaging system, to compare the performance of the denoising algorithms. These values are computed following the methods discussed in [37], and the regions used for analysis of each criterion are shown in Fig. 8.

CNR measures the contrast between a feature in a Region of Interest (ROI) and background noise. TP values over different ROIs provide a measure between 0 and 1, where values near 0 are expected for filters that severely flatten image structures. EP correlates the edges in the processed image with those in the original noisy image over a locally selected ROI, and ENL measures smoothness in areas that should appear homogeneous but are corrupted by speckle.
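For reference, one common form of CNR and ENL (following [37]; the exact constants may differ between papers) can be computed in MATLAB as below, with roi and bg hypothetical logical masks of a foreground ROI and the background ROI:

% Hedged sketch of CNR and ENL; img is the (double-valued) OCT slice.
muR = mean(img(roi));  sdR = std(img(roi));   % foreground ROI statistics
muB = mean(img(bg));   sdB = std(img(bg));    % background ROI statistics
cnr = 10 * log10((muR - muB) / sqrt(sdR^2 + sdB^2));  % contrast vs. background
enl = muR^2 / sdR^2;                          % smoothness of a homogeneous region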

Fig. 8. Selected ROIs for calculation of a) CNR, b) EP, c) TP and ENL. The bigger circle outside the retinal region is used as the background ROI and the other circles represent foreground ROIs.

In order to compare the time complexity of the evaluated denoising methods, the mean ± std of the evaluation time needed per slice, on a PC with Microsoft Windows x32 edition, Intel Core 2 Duo CPU at 3.00 GHz, and 4 GB RAM, is reported. The values of the proposed measurements for each denoising method on 144 randomly selected slices are summarized in Tables 4 and 5: Table 4 is dedicated to the 72 randomly selected slices from OCTs taken with the Zeiss device, and Table 5 shows the results for the 72 randomly selected slices acquired with the Topcon device. Samples of the datasets after application of the denoising methods are depicted in Figs. 9-12. As Tables 4 and 5 show, the performance of the proposed 3D R/I CWDL methods is considerably better than that of the other methods in CNR and ENL. The performance of all studied methods is similar in EP, and 2D SDWT has the highest TP, which results from the smallest change in the output image and therefore cannot be considered a positive point. TP is low for 3D CCWT, which indicates unwanted flattening by this method. Visual assessment of Figs. 9-12 also confirms the superiority of 3D RCWDL over the other evaluated denoising methods.

The reported times for the 2D/3D R/I CWDL methods are based on denoising with a previously learned enhanced dictionary (described in Section 2.5). Using a saved dictionary clearly reduces the time complexity compared with methods that must learn a dictionary on each dataset (like 2D CDL, with required times of 32.13±2.42 seconds for Zeiss data and 24.87±1.35 seconds for Topcon data). The required time for the 3D methods is higher than for their 2D counterparts, but their improved CNR and ENL compensate for this time complexity.

Fig. 13 demonstrates samples of the dictionaries used in the evaluated dictionary learning-based methods. The size of each dictionary patch in Fig. 13 is 8×8; 100 (10 rows and 10 columns) dictionary patches are used for the DSDL methods, and 256 (16 rows and 16 columns) dictionary patches are used for the rest.

Table 4. The values of the proposed measurements for each denoising method in datasets from the Zeiss device.

Method   | CNR         | EP        | TP        | ENL             | Evaluation time (sec)
Original | 0.91±0.74   | 1±0       | 1±0       | 5.08±0.54       | --
2D CDL   | 4.36±4.33   | 0.90±0.02 | 0.18±0.03 | 140.33±56.09    | 32.13±2.42
2D DSDL  | 5.14±4.66   | 0.88±0.02 | 0.15±0.03 | 172.82±37.93    | 4.92±0.17
3D DSDL  | 11.86±9.73  | 0.89±0.02 | 0.17±0.04 | 1293.92±810.98  | 16.31±1.63
2D RCWDL | 4.33±3.30   | 0.90±0.02 | 0.18±0.03 | 140.61±56.22    | 7.97±0.23
3D RCWDL | 11.91±8.18  | 0.87±0.02 | 0.10±0.02 | 1322.79±750.09  | 13.64±1.52
2D ICWDL | 4.34±3.32   | 0.89±0.02 | 0.18±0.03 | 140.53±56.24    | 8.04±0.91
3D ICWDL | 11.16±9.41  | 0.88±0.02 | 0.10±0.02 | 1322.81±750.53  | 13.76±1.79
2D SDWT  | 2.15±1.93   | 0.92±0.01 | 0.26±0.02 | 30.09±17.18     | 0.19±0.02
2D RCWT  | 3.12±2.63   | 0.90±0.02 | 0.16±0.02 | 68.24±46.36     | 0.29±0.02
2D CCWT  | 4.12±3.11   | 0.87±0.02 | 0.11±0.02 | 143.39±117.08   | 0.74±0.01
3D CCWT  | 7.23±7.03   | 0.78±0.01 | 0.07±0.01 | 388.38±328.35   | 1.41±0.02

Table 5. The values of the proposed measurements for each denoising method in datasets from the Topcon device.

Method   | CNR          | EP        | TP        | ENL                | Evaluation time (sec)
Original | 3.09±1.35    | 1±0       | 1±0       | 28.69±0.49         | --
2D CDL   | 24.24±10.63  | 0.88±0.01 | 0.18±0.03 | 2045.08±367.62     | 24.87±1.35
2D DSDL  | 28.09±10.19  | 0.86±0.01 | 0.17±0.04 | 2183.56±423.96     | 3.43±0.10
3D DSDL  | 78.83±33.20  | 0.86±0.01 | 0.19±0.05 | 19059.32±9852.14   | 11.3±0.97
2D RCWDL | 24.25±10.61  | 0.88±0.01 | 0.18±0.04 | 2047.14±364.41     | 5.15±0.11
3D RCWDL | 88.91±32.00  | 0.86±0.01 | 0.14±0.03 | 22231.73±354.43    | 9.41±1.07
2D ICWDL | 24.26±10.63  | 0.88±0.01 | 0.18±0.04 | 2047.43±364.63     | 5.14±0.63
3D ICWDL | 86.91±31.49  | 0.87±0.01 | 0.13±0.03 | 22196.73±354.55    | 9.52±1.06
2D SDWT  | 14.66±6.60   | 0.90±0.01 | 0.23±0.03 | 671.47±44.49       | 0.12±0.01
2D RCWT  | 26.51±11.55  | 0.87±0.01 | 0.13±0.02 | 2316.04±430.29     | 0.19±0.02
2D CCWT  | 34.67±15.28  | 0.85±0.01 | 0.09±0.02 | 4434.58±1270.34    | 0.54±0.01
3D CCWT  | 55.06±23.47  | 0.81±0.01 | 0.06±0.02 | 8653.17±3846.45    | 0.96±0.01


Fig. 9. Denoising results in a sample image. a) Original noisy image, b) 2D CDL, c) 2D ICWDL, d) 2D RCWDL, e) 3D ICWDL, f) 3D RCWDL, g) 2D DSDL, h) 3D DSDL, i) 2D SDWT, j) 2D RCWT, k) 2D CCWT, l) 3D CCWT.

Fig. 10. Denoising results in a sample image. a) Original noisy image, b) 2D CDL, c) 2D ICWDL, d) 2D RCWDL, e) 3D ICWDL, f) 3D RCWDL, g) 2D DSDL, h) 3D DSDL, i) 2D SDWT, j) 2D RCWT, k) 2D CCWT, l) 3D CCWT.


Fig. 11. Denoising results in a sample image. a) Original noisy image, b) 2D CDL, c) 2D ICWDL, d) 2D RCWDL, e) 3D ICWDL, f) 3D RCWDL, g) 2D DSDL, h) 3D DSDL, i) 2D SDWT, j) 2D RCWT, k) 2D CCWT, l) 3D CCWT.

Fig. 12. Denoising results in a sample image. a) Original noisy image, b) 2D CDL, c) 2D ICWDL, d) 2D RCWDL, e) 3D ICWDL, f) 3D RCWDL, g) 2D DSDL, h) 3D DSDL, i) 2D SDWT, j) 2D RCWT, k) 2D CCWT, l) 3D CCWT.



Fig. 13. Samples of the dictionaries used in the evaluated dictionary learning-based methods. a) Dictionary in 2D CDL, b) start dictionary in 2D DSDL, c) dictionary in 2D DSDL, d) one slice of the dictionary in 3D DSDL, e) dictionary in 2D ICWDL, f) dictionary in 2D RCWDL, g) one slice of the dictionary in 3D ICWDL, h) one slice of the dictionary in 3D RCWDL. The size of each dictionary patch is 8×8; 100 (10 rows and 10 columns) dictionary patches are used for the DSDL methods (b, c, d), and 256 (16 rows and 16 columns) dictionary patches are used for the rest (a, e, f, g, h).

In order to compare with recently reported state-of-the-art methods, we chose the works of Fang et al. [43], [98] and Luan and Wu [44]. Fang et al. provided online test codes, but since their methods require a reference dataset acquired with a customized scanning pattern, they cannot be used on our datasets. Furthermore, Luan and Wu [44] did not provide any online code, and their results only report the CNR and ENL improvements.


The method proposed by Fang et al. [98] did not provide the original CNR values, so the resulting CNRs are not even roughly comparable with our results. Therefore, we can only compare the mean CNR improvement with the values reported in [43] and [44]. It is worth noting that this comparison is not numerically precise, since the evaluations are not performed on the same datasets. Table 6 demonstrates the good performance of the proposed method in CNR and ENL improvement.

The available data from the works of Fang et al. [43], [98] do not include all B-scans of each patient (17 randomly selected B-scans from 17 subjects in [43], and 5 consecutive B-scans from each of the 39 subjects in [98]), so we cannot apply our 3D method to these datasets. Although Tables 4-5 show an essential improvement in denoising results using the 3D version of our algorithm, a higher CNR is obtained even by applying the 2D version of our algorithm (RCWDL) to the dataset of [43] (5.28±0.56 vs. 4.76±1.54). For the available data of [98], the CNRs are close (5.27±0.67 vs. 5.32±1.50); however, we should emphasize that the best performance of our algorithm (the 3D versions) could not be evaluated on the provided 2D scans of the mentioned papers, and a considerable CNR improvement is anticipated if the 3D method could be applied to such data. Furthermore, compared to [43] and [98], our proposed method does not need any high-SNR scans for dictionary learning or any repeated scans (or averaged versions of scans) for execution.

Table 6. Comparison of the proposed method with Luan and Wu [44] in CNR and ENL improvement.

Method                                        | Mean CNR improvement        | Mean ENL improvement
Proposed method (2D RCWDL), Topcon datasets   | 21.17 (original CNR = 3.09) | 2018.74 (original ENL = 28.69)
Proposed method (3D RCWDL), Topcon datasets   | 85.82 (original CNR = 3.09) | 22203.04 (original ENL = 28.69)
Proposed method (2D RCWDL), Zeiss datasets    | 3.43 (original CNR = 0.91)  | 135.53 (original ENL = 5.08)
Proposed method (3D RCWDL), Zeiss datasets    | 11 (original CNR = 0.91)    | 1317.71 (original ENL = 5.08)
Method of [44], applied to the posterior eye  | 0.9643 (original CNR = 2.48)| 25.1692 (original ENL = 33.26)
Method of [44], applied to the anterior eye   | 0.5227 (original CNR = 0.99)| 4.4226 (original ENL = 3.40)

The main motivation behind OCT denoising is to assist better segmentation and registration of these images. Although many segmentation methods are robust to noise and do not use denoising (Table 2), many others depend fully on edge enhancement through image denoising. This dependence is especially substantial in methods based on analysis of A-scans or on gradient information from B-scans of OCT datasets [45]. The unprocessed OCT image has a grainy appearance because of speckle [44], which makes image features hard to detect. After good speckle reduction, much of the speckle noise is suppressed, making features hidden in the original image more evident and accordingly enabling more reliable segmentation. Fig. 14 shows the ability of the proposed method to suppress speckle noise, producing smooth cross sections with identifiable feature peaks and valleys.

Furthermore, to evaluate the effect of our despeckling algorithm on segmentation of pathological cases, Fig. 15 shows the results of a method for localization of the RPE complex in datasets diagnosed with SEAD pathology [93]. It can be seen that the RPE complex is not localized correctly in the noisy image, while the proposed 2D and 3D methods improve the localization.


Fig. 14. The cross section signal along the white solid line. a) Original sample image from the Zeiss device, b) denoised image using 2D RCWDL, c) denoised image using 3D RCWDL. ILM/NFL, inner limiting membrane/nerve fiber layer; IPL, inner plexiform layer; INL, inner nuclear layer; OPL, outer plexiform layer; ELM, external limiting membrane; IS/OS, junction between the inner and outer segments of the photoreceptors; RPE, retinal pigment epithelium; CH/SC, junction between the choroid and sclera.

Fig. 15. Results of RPE complex localization in datasets diagnosed to have SEAD pathology, using method introduced in [93]. a) Original sample image from Zeiss device, b) denoised image using 2D RCWDL, c) denoised image using 3D RCWDL.


4.2. OCT segmentation results

4.2.1. Datasets

The proposed method was tested on thirteen 3D macular SD-OCT images obtained from eyes without pathologies using a Topcon 3D OCT-1000 imaging system at the Ophthalmology Dept., Feiz Hospital, Isfahan, Iran. The x, y, z size of the obtained volumes was 512 × 650 × 128 voxels (7 × 3.125 × 3.125 mm³), for a voxel size of 13.67 × 4.81 × 24.41 μm³. The algorithm was tested against the manual tracings of two independent observers on 20 slices selected randomly from the datasets.
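These voxel dimensions follow directly from dividing each physical extent by the voxel count along that axis: 7 mm/512 ≈ 13.67 μm, 3.125 mm/650 ≈ 4.81 μm, and 3.125 mm/128 ≈ 24.41 μm.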

4.2.2. Data processing

The processing is implemented in MATLAB version 7.8 (MathWorks, Natick, MA) and Image Processing Toolbox, version 5.2. For calculation of scaling and wavelet bases, MATLAB toolbox provided by Maggioni [27] was utilized and the mex codes by Wang [111] were also added to increase the calculation speed.

4.2.3. Performance evaluation

The proposed two-stage method is applied to the dataset to localize the 11 layers (12 surfaces) as given in [45]. For validation, the mean unsigned border positioning errors were computed for each border and are presented in Table 7. Fig. 16 shows an example of the achieved results.

Table 7. Summary of mean unsigned border positioning errors (mean ± sd) in μm.

Boundary | Avg. Obs. vs. Our Alg. | Obs. 1 vs. Obs. 2
1        | 7.24±4.37              | 6.25±3.12
2        | 11.81±4.43             | 15.94±8.94
3        | 10.57±3.85             | 10.63±6.19
4        | 16.42±4.65             | 16.25±9.27
5        | 10.21±3.27             | 12.82±5.36
6        | 13.02±2.37             | 12.53±4.89
7        | 6.37±2.73              | 7.81±3.76
8        | 3.57±1.67              | 5.63±2.23
9        | 3.46±2.43              | 5.61±2.58
10       | 4.59±2.81              | 8.42±3.95
11       | 7.41±2.62              | 6.55±3.84
12       | 5.79±2.71              | 7.19±2.45
Total    | 9.22±3.05              | 9.65±4.83


Fig. 16. Example results on a sample dataset. (a) Composite image. (b) Composite image with independent standard. (c) Composite image with segmented borders using the proposed approach.

The computation time of the proposed algorithms (implemented in MATLAB version 7.8, MathWorks, Natick, MA, with Image Processing Toolbox version 5.2) on a PC with Microsoft Windows XP x64 edition, Intel Core i7 3930K, and 64 GB RAM was 45 seconds per image. Comparison with the method proposed in [45] shows overall similarity in the final results (total mean unsigned border positioning error (mean ± sd) of 9.22±3.05 μm for the proposed method vs. 8.81±2.94 μm in [45], with p-value > 0.01). However, the proposed method outperforms the method in [45] in localization of the faded boundaries 3 and 7 (mean unsigned border positioning error of the third boundary: 10.57±3.85 μm for the proposed method vs. 11.15±4.18 μm in [45], p-value < 0.01; seventh boundary: 6.37±2.73 μm vs. 7.43±1.56 μm, p-value < 0.01). This demonstrates the ability of the proposed method on faded boundaries, which can be attributed to its intrinsic multi-scale properties.

5. Conclusion

In this paper, we discussed applications of different atomic representations to 3D OCT data, introduced a new atomic representation for the denoising task, and justified the superiority of the proposed method over other prevalent methods. Furthermore, we discussed another tool for 3D data-driven multi-scale atomic representation of OCT, based on new theories in image segmentation using diffusion wavelets.

We defined three fundamental properties for a subgroup of non-parametric atomic representations that make them appropriate for OCT image processing: being data-driven (the principal factor in non-parametric dictionary learning methods, to better fit the data); applicability to 3D data (to fit the intrinsically three-dimensional nature of OCT); and working in multi-scale (to provide a fine-to-coarse representation of the information). Among the representations discussed above, complex wavelet based K-SVD and diffusion wavelets were found to possess these properties. We introduced complex wavelet based K-SVD and demonstrated its ability in speckle reduction of OCT datasets, discussing the application of dictionary learning together with the dual-tree complex wavelet for denoising of OCT data in 2D and 3D. We evaluated different dictionary learning methods and compared their results with the dual-tree complex wavelet. We also took advantage of the adaptability of dictionary learning methods to improve the performance of simple dual-tree complex wavelets, and showed the superiority of this combined method over each of the individual denoising algorithms. As shown in the paper, the combined 3D R/I CWDL has better CNR and ENL than the other methods; its time complexity is higher than that of complex wavelets, but it can be reduced by learning an enhanced dictionary for each imaging device once and then applying the saved dictionary. Visual interpretation of the results also confirms the better performance of the introduced method. It should also be mentioned that, unlike methods based on the dual-tree complex wavelet, which force the data size to be a power of two, the presented combined method does not require any particular data size. Furthermore, since our OCT datasets are acquired as volumes, the proposed 3D methods are directly applicable to saved datasets.

Furthermore, two approaches were proposed in this paper for image segmentation using diffusion wavelets. The first method was based on the extended basis functions at each level, designing a competition between the basis values for partitioning. The second approach defined a new distance for each level and clustered based on such distances. It was previously shown that diffusion maps are powerful in segmentation of OCTs in 2D and 3D [45, 46], and in this paper we discussed their multi-scale versions, which are expected to manage OCT segmentation tasks. Comparison of the results with the method proposed in [45] shows overall similarity in the final results; however, the proposed method outperforms the method in [45] in localization of the faded boundaries 3 and 7, which can be attributed to its intrinsic multi-scale abilities. In this paper we only aimed to introduce the diffusion wavelet as a new tool for multi-scale representation of the geometrical structure of OCT data. We are now working toward developing this 3D data-driven multi-scale atomic representation for fully automatic analysis of pathological OCT data in future studies.

Acknowledgements

The authors would like to thank Prof. Milan Sonka and Prof. Michael D. Abramoff from the University of Iowa for their data contribution support.

References

[1] D. Huang, E. A. Swanson, C. P. Lin, J. S. Schuman, W. G. Stinson, W. Chang, et al., "Optical coherence tomography," Science, vol. 254, pp. 1178-1181, 1991.


[2] M. R. Hee, C. A. Puliafito, C. Wong, J. S. Duker, E. Reichel, B. Rutledge, et al., "Quantitative assessment of macular edema with optical coherence tomography," Archives of ophthalmology, vol. 113, pp. 1019-1029, 1995.

[3] R. Koprowski and Z. Wróbel, Image processing in optical coherence tomography using Matlab. Poland: Katowice 2011.

[4] A. G. Podoleanu, "Optical coherence tomography," The British Journal of Radiology, vol. 78, pp. 976-988, 2005.

[5] H. Rabbani, M. Sonka, and M. D. Abramoff, "Optical Coherence Tomography Noise Reduction Using Anisotropic Local Bivariate Gaussian Mixture Prior in 3-D Complex Wavelet Domain," International Journal of Biomedical Imaging, 2013, 23 pages.

[6] S. Mallat, A wavelet tour of signal processing: Academic Press, 1999.
[7] J. Shlens, A tutorial on principal component analysis. San Diego: University of California, 2005.
[8] I. T. Jolliffe, Principal component analysis, vol. 487: Springer-Verlag New York, 1986.
[9] P. Comon, "Independent component analysis, a new concept?," Signal Processing, vol. 36, pp. 287-314, 1994.
[10] A. Hyvärinen and E. Oja, "Independent component analysis: algorithms and applications," Neural Networks, vol. 13, pp. 411-430, 2000.
[11] R. R. Coifman and S. Lafon, "Diffusion maps," Applied and Computational Harmonic Analysis, vol. 21, pp. 5-30, 2006.
[12] M. Varma and A. Zisserman, "A statistical approach to material classification using image patch exemplars," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, pp. 2032-2047, 2009.

[13] B. A. Olshausen, "Emergence of simple-cell receptive field properties by learning a sparse code for natural images," Nature, vol. 381, pp. 607-609, 1996.

[14] K. Engan, S. O. Aase, and J. Husoy, "Frame based signal compression using method of optimal directions (MOD)," in IEEE International Symposium on Circuits and Systems, 1999, pp. 1-4.

[15] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, pp. 4311-4322, 2006.

[16] G. A. Campbell and R. M. Foster, Fourier integrals for practical applications. New York: Bell telephone laboratories, 1948.

[17] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," Computers, IEEE Transactions on, vol. 100, pp. 90-93, 1974.

[18] I. Daubechies, Ten lectures on wavelets, vol. 61: SIAM, 1992.
[19] R. R. Coifman and M. V. Wickerhauser, "Entropy-based algorithms for best basis selection," IEEE Transactions on Information Theory, vol. 38, pp. 713-718, 1992.
[20] D. L. Donoho, "Wedgelets: Nearly minimax estimation of edges," The Annals of Statistics, vol. 27, pp. 859-897, 1999.
[21] N. G. Kingsbury, "The dual-tree complex wavelet transform: a new technique for shift invariance and directional filters," in IEEE Digital Signal Processing Workshop, 1998, pp. 120-131.
[22] E. Candes, L. Demanet, D. Donoho, and L. Ying, "Fast discrete curvelet transforms," Multiscale Modeling & Simulation, vol. 5, pp. 861-899, 2006.
[23] E. Le Pennec and S. Mallat, "Bandelet image approximation and compression," Multiscale Modeling & Simulation, vol. 4, pp. 992-1039, 2005.
[24] M. N. Do and M. Vetterli, "The contourlet transform: an efficient directional multiresolution image representation," IEEE Transactions on Image Processing, vol. 14, pp. 2091-2106, 2005.
[25] Y. Lu and M. N. Do, "3-D directional filter banks and surfacelets," in SPIE Optics & Photonics, 2005, p. 59141Q.


[26] I. Jolliffe, Principal component analysis: Wiley Online Library, 2005.
[27] R. R. Coifman and M. Maggioni, "Diffusion wavelets," Applied and Computational Harmonic Analysis, vol. 21, pp. 53-94, 2006.
[28] D. C. Adler, T. H. Ko, and J. G. Fujimoto, "Speckle reduction in optical coherence tomography images by use of a spatially adaptive wavelet filter," Optics Letters, vol. 29, pp. 2878-2880, 2004.
[29] S. Chitchian, M. A. Fiddy, and N. M. Fried, "Denoising during optical coherence tomography of the prostate nerves via wavelet shrinkage using dual-tree complex wavelet transform," Journal of Biomedical Optics, vol. 14, p. 014031, 2009.

[30] C. Buranachai, P. Thavarungkul, P. Kanatharanaa, and I. Meglinski, "Application of wavelet analysis in optical coherence tomography for obscured pattern recognition," Laser Physics Letters, vol. 6, p. 892, 2009.

[31] Z. Jian, Z. Yu, L. Yu, B. Rao, Z. Chen, and B. J. Tromberg, "Speckle attenuation in optical coherence tomography by curvelet shrinkage," Optics Letters, vol. 34, pp. 1516-1518, 2009.

[32] Z. Jian, L. Yu, B. Rao, B. J. Tromberg, and Z. Chen, "Three-dimensional speckle suppression in optical coherence tomography based on the curvelet transform," Optics express, vol. 18, pp. 1024–1032, 2010.

[33] Q. Guo, S. Sun, F. Dong, B. Z. Gao, and R. Wang, "Optical Coherence Tomography Heart Tube Image Denoising Based on Contourlet Transform," in International Conference on Machine Learning and Cybernetics (ICMLC), Xian, 2012, pp. 1139-1144.

[34] J. Zhou, A. L. Cunha, and M. N. Do, "Nonsubsampled contourlet transform: construction and application in enhancement," in IEEE International Conference on Image Processing, ICIP, 2005, pp. 469-72.

[35] V. Gupta, C. C. Chan, C.-L. Poh, T. H. Chow, T. C. Meng, and N. B. Koon, "Computerized automation of wavelet based denoising method to reduce speckle noise in OCT images," in IEEE International Conference on Information Technology and Applications in Biomedicine, ITAB., 2008, pp. 120-123.

[36] M. A. Mayer, A. Borsdorf, M. Wagner, J. Hornegger, C. Y. Mardin, and R. P. Tornow, "Wavelet denoising of multiframe optical coherence tomography data," Biomedical optics express, vol. 3, pp. 572-589, 2012.

[37] A. Pizurica, L. Jovanov, B. Huysmans, V. Zlokolica, P. De Keyser, F. Dhaenens, et al., "Multiresolution denoising for optical coherence tomography: a review and evaluation," Current Medical Imaging Reviews, vol. 4, pp. 270-284, 2008.

[38] G. Quellec, K. Lee, M. Dolejsi, M. K. Garvin, M. D. Abràmoff, and M. Sonka, "Three-dimensional analysis of retinal layer texture: identification of fluid-filled regions in SD-OCT of the macula," IEEE Transactions on Medical Imaging, vol. 29, pp. 1321-1330, 2010.

[39] V. Zlokolica, L. Jovanov, A. Pizurica, P. De Keyser, F. Dhaenens, and W. Philips, "Wavelet-based denoising for 3D OCT images," in Proceedings of SPIE Medical Imaging, 2007, p. 66960P.

[40] V. Kajić, M. Esmaeelpour, B. Považay, D. Marshall, P. L. Rosin, and W. Drexler, "Automated choroidal segmentation of 1060 nm OCT in healthy and pathologic eyes using a statistical model," Biomedical optics express, vol. 3, pp. 86–103, 2012.

[41] V. Kajić, B. Považay, B. Hermann, B. Hofer, D. Marshall, P. L. Rosin, et al., "Robust segmentation of intraretinal layers in the normal human fovea using a novel statistical model based on texture and shape analysis," Opt. Express, vol. 18, pp. 14730-14744, 2010.

[42] M. Forouzanfar and H. Moghaddam, "A directional multiscale approach for speckle reduction in optical coherence tomography images," in IEEE International Conference on Electrical Engineering. ICEE'07. , Lahore 2007, pp. 1-6.


[43] L. Fang, S. Li, Q. Nie, J. A. Izatt, C. A. Toth, and S. Farsiu, "Sparsity based denoising of spectral domain optical coherence tomography images," Biomedical optics express, vol. 3, pp. 927-942, 2012.

[44] F. Luan and Y. Wu, "Application of RPCA in optical coherence tomography for speckle noise reduction," Laser Physics Letters, vol. 10, pp. 35-43, 2013.

[45] R. Kafieh, H. Rabbani, M. D. Abramoff, and M. Sonka, "Intra-Retinal Layer Segmentation of 3D Optical Coherence Tomography Using Coarse Grained Diffusion Map," Medical Image Analysis, vol. 17, no. 8, June 2013, pp. 907-928.

[46] J. Jalili, H. Rabbani, M. Akhlaghi, R. Kafieh, and A. Mehridehnavi, "Forming projection images from each layer of retina using diffusion map based OCT segmentation," in IEEE International Conference on Information Science, Signal Processing and their Applications (ISSPA), Canada, 2012, pp. 930-934.

[47] R. Kafieh, H. Rabbani, and S. Gazor, "Combination of graph theoretic grouping and time-frequency analysis for image segmentation," in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, 2014, pp. 5115-5119.

[48] R. Kafieh and H. Rabbani, "Optical Coherence Tomography noise reduction over learned dictionaries with introduction of complex wavelet for start dictionary," in SPIE Optical Engineering+ Applications, 2013, pp. 885826-885826-8.

[49] P. Milanfar, "A tour of modern image filtering: new insights and methods, both practical and theoretical," IEEE Signal Processing Magazine, vol. 30, pp. 106-128, 2013.

[50] B. Potsaid, I. Gorczynska, V. J. Srinivasan, Y. Chen, J. Jiang, A. Cable, et al., "Ultrahigh speed spectral/Fourier domain OCT ophthalmic imaging at 70,000 to 312,500 axial scans per second," Optics express, vol. 16, pp. 15149-15169, 2008.

[51] M. Szkulmowski, I. Gorczynska, D. Szlag, M. Sylwestrzak, A. Kowalczyk, and M. Wojtkowski, "Efficient reduction of speckle noise in Optical Coherence Tomography," Optics express, vol. 20, pp. 1337-1359, 2012.

[52] M. Bashkansky and J. Reintjes, "Statistics and reduction of speckle in optical coherence tomography," Optics Letters, vol. 25, pp. 545-547, 2000.

[53] D. D. Duncan, S. J. Kirkpatrick, and R. K. Wang, "Statistics of local speckle contrast," JOSA A, vol. 25, pp. 9-15, 2008.

[54] J. W. Goodman, "Some fundamental properties of speckle," JOSA, vol. 66, pp. 1145-1150, 1976.
[55] B. Karamata, K. Hassler, M. Laubscher, and T. Lasser, "Speckle statistics in optical coherence tomography," JOSA A, vol. 22, pp. 593-596, 2005.
[56] J. M. Schmitt, S. Xiang, and K. M. Yung, "Speckle in optical coherence tomography," Journal of Biomedical Optics, vol. 4, pp. 95-105, 1999.
[57] N. M. Grzywacz, J. de Juan, C. Ferrone, D. Giannini, D. Huang, G. Koch, et al., "Statistics of optical coherence tomography data from human retina," IEEE Transactions on Medical Imaging, vol. 29, pp. 1224-1237, 2010.

[58] M. Hughes, M. Spring, and A. Podoleanu, "Speckle noise reduction in optical coherence tomography of paint layers," Applied optics, vol. 49, pp. 99-107, 2010.

[59] N. Iftimia, B. E. Bouma, and G. J. Tearney, "Speckle reduction in optical coherence tomography by “path length encoded” angular compounding," Journal of Biomedical Optics, vol. 8, pp. 260-263, 2003.

[60] L. Ramrath, G. Moreno, H. Mueller, T. Bonin, G. Huettmann, and A. Schweikard, "Towards multi-directional OCT for speckle noise reduction," in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2008, ed: Springer, 2008, pp. 815-823.

[61] A. Desjardins, B. Vakoc, G. Tearney, and B. Bouma, "Speckle reduction in OCT using massively-parallel detection and frequency-domain ranging," Optics express, vol. 14, pp. 4736–4745, 2006.


[62] M. Pircher, E. Go, R. Leitgeb, A. F. Fercher, and C. K. Hitzenberger, "Speckle reduction in optical coherence tomography by frequency compounding," Journal of Biomedical Optics, vol. 8, pp. 565-569, 2003.

[63] B. Sander, M. Larsen, L. Thrane, J. Hougaard, and T. Jørgensen, "Enhanced optical coherence tomography imaging by multiple scan averaging," British journal of ophthalmology, vol. 89, pp. 207-212, 2005.

[64] E. Götzinger, M. Pircher, and C. K. Hitzenberger, "High speed spectral domain polarization sensitive optical coherence tomography of the human retina," Optics express, vol. 13, pp. 10217–10229, 2005.

[65] T. M. Jørgensen, J. Thomadsen, U. Christensen, W. Soliman, and B. Sander, "Enhancing the signal-to-noise ratio in ophthalmic optical coherence tomography by image registration—method and clinical examples," Journal of Biomedical Optics, vol. 12, pp. 041208-10, 2007.

[66] R. D. Ferguson, D. X. Hammer, L. A. Paunescu, S. Beaton, and J. S. Schuman, "Tracking optical coherence tomography," Optics Letters, vol. 29, pp. 2139-2141, 2004.

[67] Spectralis. Available: http://www.heidelbergengineering.com/products/spectralis-models/.
[68] M. R. Hee, J. A. Izatt, E. A. Swanson, D. Huang, J. S. Schuman, C. P. Lin, et al., "Optical coherence tomography of the human retina," Archives of Ophthalmology, vol. 113, pp. 325-332, 1995.
[69] K. L. Boyer, A. Herzog, and C. Roberts, "Automatic recovery of the optic nervehead geometry in optical coherence tomography," IEEE Transactions on Medical Imaging, vol. 25, pp. 553-570, 2006.

[70] A. George, J. Dillenseger, A. Weber, and A. Pechereau, "Optical coherence tomography image processing," Investigat. Ophthalmol.Vis.Sci. , vol. 41, pp. 165–173, 2000.

[71] A. Herzog, K. L. Boyer, and C. Roberts, "Robust extraction of the optic nerve head in optical coherence tomography," in Computer Vision and Mathematical Methods in Medical and Biomedical Image Analysis, ed: Springer, 2004, pp. 395-407.

[72] D. Koozekanani, K. Boyer, and C. Roberts, "Retinal thickness measurements from optical coherence tomography using a Markov boundary model," IEEE Transactions on Medical Imaging, vol. 20, pp. 900-916, 2001.

[73] K. Lee, M. D. Abràmoff, M. Niemeijer, M. K. Garvin, and M. Sonka, "3-D segmentation of retinal blood vessels in spectral-domain OCT volumes of the optic nerve head," presented at the Proc. of SPIE Medical Imaging: Biomedical Applications in Molecular, Structural, and Functional Imaging, 2010.

[74] J. Rogowska and M. E. Brezinski, "Image processing techniques for noise removal, enhancement and segmentation of cartilage OCT images," Physics in medicine and biology, vol. 47, p. 641, 2002.

[75] M. Shahidi, Z. Wang, and R. Zelkha, "Quantitative thickness measurement of retinal layers imaged by optical coherence tomography," American journal of ophthalmology, vol. 139, pp. 1056-1061, 2005.

[76] V. J. Srinivasan, B. K. Monson, M. Wojtkowski, R. A. Bilonick, I. Gorczynska, R. Chen, et al., "Characterization of outer retinal morphology with high-speed, ultrahigh-resolution optical coherence tomography," Investigative ophthalmology & visual science, vol. 49, pp. 1571-1579, 2008.

[77] H. Ishikawa, D. M. Stein, G. Wollstein, S. Beaton, J. G. Fujimoto, and J. S. Schuman, "Macular segmentation with optical coherence tomography," Investigative ophthalmology & visual science, vol. 46, pp. 2012-2017, 2005.

[78] M. Mayer, R. Tornow, R. Bock, J. Hornegger, and F. Kruse, "Automatic Nerve Fiber Layer Segmentation and Geometry Correction on Spectral Domain OCT Images Using Fuzzy C-Means Clustering," Invest. Ophthalmol. Vis. Sci., vol. 49, pp. E-Abstract 1880, 2008.


[79] A. Ozcan, A. Bilenca, A. E. Desjardins, B. E. Bouma, and G. J. Tearney, "Speckle reduction in optical coherence tomography images using digital filtering," JOSA A, vol. 24, pp. 1901-1910, 2007.

[80] M. Baroni, P. Fortunato, and A. La Torre, "Towards quantitative analysis of retinal features in optical coherence tomography," Medical engineering & physics, vol. 29, pp. 432-441, 2007.

[81] D. L. Marks, T. S. Ralston, and S. A. Boppart, "Speckle reduction by I-divergence regularization in optical coherence tomography," JOSA A, vol. 22, pp. 2366-2371, 2005.

[82] D. Cabrera Fernández, N. Villate, C. Puliafito, and P. Rosenfeld, "Comparing total macular volume changes measured by Optical Coherence Tomography with retinal lesion volume estimated by active contours," Investigative Ophtalmology and Visual Science, vol. 45, pp. 3072-3079, 2004.

[83] M. K. Garvin, M. D. Abràmoff, R. Kardon, S. R. Russell, X. Wu, and M. Sonka, "Intraretinal layer segmentation of macular optical coherence tomography images using optimal 3-D graph search," IEEE Transactions on Medical Imaging, vol. 27, pp. 1495-1505, 2008.

[84] G. Gregori and R. Knighton, "A robust algorithm for retinal thickness measurements using optical coherence tomography (Stratus OCT)," Investigative Ophtalmology and Visual Science, vol. 45, pp. 3007-3017, 2004.

[85] P. Puvanathasan and K. Bizheva, "Interval type-II fuzzy anisotropic diffusion algorithm for speckle noise reduction in optical coherence tomography images," Optics express, vol. 17, pp. 733-746, 2009.

[86] R. Bernardes, C. Maduro, P. Serranho, A. Araújo, S. Barbeiro, and J. Cunha-Vaz, "Improved adaptive complex diffusion despeckling filter," Optics Express, vol. 18, pp. 24048-24059, 2010.

[87] H. M. Salinas and D. C. Fernández, "Comparison of PDE-based nonlinear diffusion approaches for image enhancement and denoising in optical coherence tomography," IEEE Transactions on Medical Imaging, vol. 26, pp. 761-771, 2007.

[88] A. M. Bagci, M. Shahidi, R. Ansari, M. Blair, N. P. Blair, and R. Zelkha, "Thickness profiles of retinal layers by optical coherence tomography image segmentation," American journal of ophthalmology, vol. 146, pp. 679–687, 2008.

[89] J. Rogowska and M. E. Brezinski, "Evaluation of the adaptive speckle suppression filter for coronary optical coherence tomography imaging," IEEE Transactions on Medical Imaging, vol. 19, pp. 1261-1266, 2000.

[90] A. Mishra, A. Wong, K. Bizheva, and D. A. Clausi, "Intra-retinal layer segmentation in optical coherence tomography images," Opt. Express, vol. 17, pp. 23719-23728, 2009.

[91] A. R. Fuller, R. J. Zawadzki, S. Choi, D. F. Wiley, J. S. Werner, and B. Hamann, "Segmentation of three-dimensional retinal image data," IEEE Transactions on Visualization and Computer Graphics, vol. 13, pp. 1719-1726, 2007.

[92] A. Wong, A. Mishra, K. Bizheva, and D. A. Clausi, "General Bayesian estimation for speckle noise reduction in optical coherence tomography retinal imagery," Opt. Express, vol. 18, pp. 8338-8352, 2010.

[93] R. Kafieh and H. Rabbani, "Curvature correction of retinal OCTs using graph-based geometry detection," Physics in Medicine and Biology, vol. 58, pp. 2925-2938, 2013.

[94] M. D. Abràmoff, K. Lee, M. Niemeijer, W. L. Alward, E. C. Greenlee, M. K. Garvin, et al., "Automated segmentation of the cup and rim from spectral domain OCT of the optic nerve head," Investigative ophthalmology & visual science, vol. 50, pp. 5778-5784, 2009.

[95] R. Kafieh, H. Rabbani, M. Abramoff, M. Sonka, "Intra-retinal layer segmentation of optical coherence tomography using diffusion map," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.1080-1084, 26-31 May 2013.


[96] Q. Yang, C. A. Reisman, Z. Wang, Y. Fukuma, M. Hangai, N. Yoshimura, et al., "Automated layer segmentation of macular OCT images using dual-scale gradient information," Optics express, vol. 18, pp. 21293–21307, 2010.

[97] A. Yazdanpanah, G. Hamarneh, B. Smith, and M. Sarunic, "Intra-retinal layer segmentation in optical coherence tomography using an active contour approach," in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2009, ed: Springer, 2009, pp. 649-656.

[98] F. Leyuan, L. Shutao, R. P. McNabb, N. Qing, A. N. Kuo, C. A. Toth, et al., "Fast Acquisition and Reconstruction of Optical Coherence Tomography Images via Sparse Representation," Medical Imaging, IEEE Transactions on, vol. 32, pp. 2034-2049, 2013.

[99] H. Bogunovic, M. Sonka, Y. Kwon, P. Kemp, M. Abramoff, and X. Wu, "Multi-Surface and Multi-Field Co-Segmentation of 3-D Retinal Optical Coherence Tomography," IEEE Transaction on Medical Imaging, vol. 99, pp. 247-253, 2014.

[100] F. Rathke, S. Schmidt, and C. Schnörr, "Probabilistic intra-retinal layer segmentation in 3-D OCT images using global shape regularization," Medical image analysis, vol. 18, pp. 781-794, 2014.

[101] P. Chatterjee and P. Milanfar, "Clustering-based denoising with locally learned dictionaries," IEEE Transactions on Image Processing, vol. 18, pp. 1438-1451, 2009.

[102] M. Aharon, M. Elad, and A. Bruckstein, "k -SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation," Signal Processing, IEEE Transactions on, vol. 54, pp. 4311-4322, 2006.

[103] M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Transactions on Image Processing, vol. 15, pp. 3736-3745, 2006.

[104] R. Rubinstein, M. Zibulevsky, and M. Elad, "Efficient implementation of the K-SVD algorithm using Batch Orthogonal Matching Pursuit," Technion - Computer Science Department, Israel, 2008.

[105] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition," in The Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, 1993, pp. 40-44

[106] R. Rubinstein, M. Zibulevsky, and M. Elad, "Double sparsity: Learning sparse dictionaries for sparse signal approximation," IEEE Transactions on Signal Processing, vol. 58, pp. 1553-1564, 2010.

[107] N. Kingsbury, "Complex wavelets for shift invariant analysis and filtering of signals," Applied and computational harmonic analysis, vol. 10, pp. 234-253, 2001.

[108] I. W. Selesnick, R. G. Baraniuk, and N. C. Kingsbury, "The dual-tree complex wavelet transform," IEEE Signal Processing Magazine, vol. 22, pp. 123-151, 2005.

[109] I. W. Selesnick and K. Y. Li, "Video denoising using 2D and 3D dual-tree complex wavelet transforms," Wavelets: Applications in Signal and Image Processing X, vol. 5207, pp. 607-618, 2003.

[110] W. Y. Lo and I. W. Selesnick, "Wavelet-domain soft-thresholding for non-stationary noise," in IEEE International Conference on Image Processing, 2006, pp. 1441-1444.

[111] C. Wang and S. Mahadevan, "Multiscale analysis of document corpora based on diffusion models," in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2009, pp. 1592-1597.

[112] P. Jitpakdee, P. Aimmanee, and B. Uyyanonvara, "A survey on hemorrhage detection in diabetic retinopathy retinal images," in Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2012 9th International Conference on, 2012, pp. 1-4.

[113] "Matlab Implementation of Wavelet Transforms " Ivan Selesnick [114] N. Kingsbury, "toolbox of Matlab code to implement the Q-shift DT-CWT ".