FEATURE EXTRACTION OF BRAIN MRI BY STATIONARY WAVELET TRANSFORM AND ITS APPLICATIONS

DOI: 10.1142/S0218339010003652


Journal of Biological Systems, Vol. 18 (October 2010), Special Issue, pp. 115–132. © World Scientific Publishing Company


YUDONG ZHANG, SHUIHUA WANG, YUANKAI HUO, LENAN WU

School of Information Science and Engineering, Southeast University, Nanjing 210096, China

[email protected]

AIJUN LIU

State Key Laboratory of Mechanical Transmissions, Chongqing University, Chongqing 400030, China

[email protected]

Received (June 20 2010)

Revised (July 21 2010)

The wavelet transform is widely used for feature extraction in magnetic resonance imaging. However, the traditional discrete wavelet transform (DWT) is translation variant, so it may extract significantly different features from two images of the same subject that differ only by a slight movement. To solve this problem, this paper uses the stationary wavelet transform (SWT) instead of the DWT to extract features. Experiments on a normal brain MRI demonstrate that wavelet coefficients obtained via SWT are superior to those obtained via DWT in terms of translation invariance. In addition, we applied SWT to the classification of normal and abnormal brains. The results demonstrate that the SWT-based classifier is more accurate than the DWT-based one.

Keywords: Magnetic resonance imaging; discrete wavelet transform; stationary wavelet transform; feature extraction; translation invariance; principal component analysis; Fisher discriminant analysis.

1. Introduction

Magnetic resonance imaging (MRI) is an imaging technique used primarily in clinical diagnosis and biomedical research to produce high-resolution, high-contrast images of parts of the human body such as the brain. The most striking advantages of MRI are its non-invasive nature and the rich information it provides about soft tissue anatomy [1]. Accurate analysis of MR images relies not only on the expertise of physicians or investigators but also, increasingly, on automated feature extraction methods. The wavelet transform (WT) has recently been used for feature extraction from MR images, since it provides good localization in both the spatial and spectral domains [2]. However, the discrete wavelet transform (DWT) is translation variant; namely, the wavelet coefficients behave unpredictably under translation of the input signal [3]. The features obtained by DWT may change remarkably when the brain MR image is only slightly shifted because of movement of the subject. In the worst case, a DWT-based classifier may even recognize two images of one subject as images of two different subjects when the centers of the images are located at slightly different positions [4].

In this study, the translation-variance problem of the wavelet coefficients is solved by using the stationary wavelet transform (SWT). Compared with the traditional DWT, SWT has three major merits [5]: (i) SWT is translation invariant, that is, if the signal is shifted, the coefficients are shifted by the same amount but otherwise unchanged; (ii) SWT performs better in denoising and edge detection; (iii) DWT is only suitable for discrete signals or images whose size is a power of 2, whereas SWT can be applied to images of arbitrary size [6].

Realistic brain MRIs were used to validate this SWT-based method. The results show that the features extracted by SWT perform better than those extracted by DWT with regard to translation invariance. Moreover, DWT-based and SWT-based classification of normal and abnormal brains from structural MRI are compared, and the SWT-based classifier is found to be more accurate.

The paper is organized as follows. Section 2 introduces the basic concepts of DWT. Section 3 introduces the basic concepts of SWT. Section 4 contains the experiments, which demonstrate that SWT is translation invariant and DWT is translation variant. We also verify that the translation invariance of SWT does not depend on the type of wavelet. Section 5 applies SWT to structural MRI classification. Section 6 is devoted to the conclusion.

2. Discrete Wavelet Transform

The discrete wavelet transform (DWT) is a powerful implementation of the wavelet transform using dyadic scales and positions. The fundamentals of DWT are introduced below.

2.1. Advantages of wavelet transform

The most commonly used tool in signal analysis is the Fourier transform (FT), which breaks down a signal into constituent sinusoids of different frequencies, thereby transforming the signal from the time domain to the frequency domain. However, FT has a serious drawback: it loses the time information of the signal. For example, an analyst is unable to tell from a Fourier spectrum when a particular event took place.
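The loss of timing can be illustrated with a small numerical sketch. It is not part of the original experiments; the signal length and event positions are arbitrary choices made for illustration. Two signals containing the same impulse at different times have identical magnitude spectra, so the spectrum alone cannot tell when the event occurred (the timing is carried only by the phase).

```python
import numpy as np

n = 64
early = np.zeros(n)
late = np.zeros(n)
early[5] = 1.0    # a single event near the start of the record
late[40] = 1.0    # the same event, occurring much later

# The magnitude spectra coincide; the event time survives only in the phase.
mag_early = np.abs(np.fft.fft(early))
mag_late = np.abs(np.fft.fft(late))
same_magnitude = bool(np.allclose(mag_early, mag_late))
print("identical magnitude spectra:", same_magnitude)
```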

Gabor adapted the FT to analyze only a small section of the signal at a time. The technique is called windowing or the short-time Fourier transform (STFT) [7]. It applies a window of a particular shape to the signal. STFT can be regarded as a compromise between time information and frequency information: it provides some information about both the time and frequency domains. However, the precision of this information is limited by the size of the window.

Wavelet transform (WT) represents the next logical step: a windowing technique with

variable size. Thus, it preserves both time and frequency information of the signal. The

development of signal analysis is shown in Fig. 1.

Fig. 1. The development of signal analysis

Another advantage of WT is that it adopts “scale” instead of traditional “frequency”,

namely, it does not produce a time-frequency view but a time-scale view of the signal. The

time-scale view is a different way to view data, but it is a more natural and powerful way [8].

2.2. Discrete wavelet transform

Suppose x(t) is a square-integrable function. The continuous wavelet transform of x(t) relative to a given wavelet $\psi(t)$ is defined as

$$W_\psi(a, b) = \int_{-\infty}^{\infty} x(t)\, \psi_{a,b}(t)\, dt \qquad (2.1)$$

where

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t - b}{a}\right) \qquad (2.2)$$

Here, the wavelet $\psi_{a,b}(t)$ is calculated from the mother wavelet $\psi(t)$ by translation and dilation: a is the dilation factor and b is the translation parameter (both real positive numbers). Several different kinds of wavelets have gained popularity throughout the development of wavelet analysis. The simplest yet most important wavelet is the Haar wavelet, which is often the preferred one in many applications [9].



Eq. (2.1) can be discretized by restraining a and b to a discrete lattice ($a = 2^b$, $a > 0$) to give the discrete wavelet transform, which can be expressed as follows.

$$cA_{j,k}(n) = DS\Big[\sum_n x(n)\, l_j^*(n - 2^j k)\Big]$$

$$cD_{j,k}(n) = DS\Big[\sum_n x(n)\, h_j^*(n - 2^j k)\Big] \qquad (2.3)$$

where cAj,k and cDj,k refer to the coefficients of the gross components and the detailed

components, respectively; l(n) and h(n) denote the low-pass filter and high-pass filter,

respectively; j and k represent the wavelet scale and translation factors, respectively; and DS

operator means downsampling.

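[Figure: decomposition tree. The signal S splits into cA1 and cD1; cA1 splits into cA2 and cD2; cA2 splits into cA3 and cD3.]

Fig. 2. A 3-level wavelet decomposition tree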

The above decomposition process can be iterated on successive approximations, so that one signal is broken down into various levels of resolution [10]. The whole process is called the wavelet decomposition tree, as shown in Fig. 2.
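The iteration described above can be sketched in a few lines for the Haar wavelet. This is a minimal illustration and not the Matlab toolbox routine used in the experiments; the function names, the sign convention of the detail coefficients, and the requirement that the signal length be divisible by 2 at every level are assumptions of this sketch.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt_step(x):
    """One DWT level: Haar low-/high-pass filtering followed by downsampling by 2."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    cA = (even + odd) / SQRT2   # approximation coefficients
    cD = (even - odd) / SQRT2   # detail coefficients
    return cA, cD

def haar_wavedec(x, levels):
    """Iterate on the approximation; returns [cA_J, cD_J, ..., cD_1]."""
    details = []
    cA = np.asarray(x, dtype=float)
    for _ in range(levels):
        cA, cD = haar_dwt_step(cA)
        details.append(cD)
    return [cA] + details[::-1]

signal = np.arange(16, dtype=float)
coeffs = haar_wavedec(signal, 3)
lengths = [len(c) for c in coeffs]
print(lengths)  # coefficient count halves at every level
```

Because the Haar filters are orthonormal, the total energy of the coefficients equals the energy of the signal, which is a convenient sanity check for any implementation.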



2.3. 2D DWT

Fig. 3. Schematic diagram of 2D DWT

In the case of images, the DWT is applied to each dimension separately. Fig. 3 illustrates the schematic diagram of 2D DWT. As a result, there are four sub-band images at each scale. The sub-band cAj+1 is used for the next 2D DWT. As the level of decomposition increases, a more compact but coarser approximation component is obtained. Thus, wavelets provide a simple hierarchical framework for interpreting the image information [11].
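A single level of the separable 2D transform can be sketched as follows, again for the Haar wavelet. The sub-band names cA, cH, cV, cD follow the usual approximation/horizontal/vertical/diagonal labeling, which is assumed here rather than taken from Fig. 3; the helper names are hypothetical, and both image dimensions are assumed to be even.

```python
import numpy as np

def haar_step(x, axis):
    """Haar filtering plus downsampling by 2 along one axis."""
    x = np.moveaxis(np.asarray(x, dtype=float), axis, 0)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

def haar_dwt2(img):
    """One level of separable 2D DWT: filter along rows, then along columns."""
    lo, hi = haar_step(img, axis=1)   # low/high-pass along each row
    cA, cH = haar_step(lo, axis=0)    # approximation and horizontal detail
    cV, cD = haar_step(hi, axis=0)    # vertical and diagonal detail
    return cA, cH, cV, cD

image = np.arange(64, dtype=float).reshape(8, 8)   # toy ramp image
cA, cH, cV, cD = haar_dwt2(image)
print(cA.shape, cH.shape, cV.shape, cD.shape)      # four half-size sub-bands
```

Feeding cA back into `haar_dwt2` gives the next level, which is the hierarchy described in the text.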

3. Stationary Wavelet Transform

The DWT is translation variant, meaning that the DWT of a translated version of a signal X is not the translated version of the DWT of X. Suppose I denotes a given MR image and T the translation operator; then

$$DWT(T(I)) \neq T(DWT(I)) \qquad (3.1)$$

3.1. ε-decimated DWT

There exist many slightly different ways to handle the DWT. The decimation in DWT retains the even-indexed elements, which is where the time/spatial variance problem lies. The decimation could equally be carried out by choosing the odd-indexed elements instead. The choice of even or odd concerns every step of the decomposition process. If we perform all the different possible decompositions of the original signal for a given maximum level J, then we will have $2^J$ different decompositions [12].



Fig. 4. A graphical illustration of ε = 10110

Suppose $\varepsilon_j$ = 1 or 0 denotes the choice of odd- or even-indexed elements at step j. Then every decomposition is labeled by a sequence of 0s and 1s, namely $\varepsilon = \varepsilon_1 \varepsilon_2 \ldots \varepsilon_J$. This transform is called the ε-decimated DWT. A graphical example of ε = 10110 is shown in Fig. 4.
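The labeling can be made concrete with a short sketch for the Haar wavelet. The function names are hypothetical, and periodic extension at the signal boundary is an assumption of this sketch. The example also shows why the even/odd choice matters: shifting the signal by one sample and keeping even samples gives exactly the coefficients obtained from the unshifted signal when odd samples are kept.

```python
import itertools
import numpy as np

def haar_filter(x):
    """Haar low-/high-pass filtering without decimation (periodic extension)."""
    nxt = np.roll(x, -1)
    return (x + nxt) / np.sqrt(2.0), (x - nxt) / np.sqrt(2.0)

def eps_decimated_dwt(x, eps):
    """eps is a string of 0s and 1s; eps[j-1] keeps even (0) or odd (1) samples at step j."""
    cA = np.asarray(x, dtype=float)
    details = []
    for e in eps:
        lo, hi = haar_filter(cA)
        start = int(e)
        cA = lo[start::2]
        details.append(hi[start::2])
    return cA, details

J = 3
labels = ["".join(bits) for bits in itertools.product("01", repeat=J)]
print(len(labels))  # number of distinct decompositions for J levels

x = np.arange(8.0) ** 2
cA_even_of_shifted, _ = eps_decimated_dwt(np.roll(x, -1), "0")
cA_odd_of_original, _ = eps_decimated_dwt(x, "1")
cA_even_of_original, _ = eps_decimated_dwt(x, "0")
```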

3.2. 1D SWT

The SWT can calculate all the ε-decimated DWTs for a given signal at one time. More precisely, for level 1, the SWT is obtained by convolving the signal with the appropriate filters as in the DWT but without downsampling. The approximation and detail coefficients at level 1 therefore have the same length as the signal.

Fig. 5. Schematic diagram of 1D SWT



The general step j convolves the approximation coefficients at level j-1, with appropriate

filters but without downsampling, to produce the approximation and detail coefficients at

level j. The schematic diagram is shown in Fig. 5.
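A minimal sketch of this recursion for the Haar wavelet is given below. Two points are assumptions of the sketch rather than statements from the text: periodic extension is used at the boundaries, and the "appropriate filters" at level j are taken to be the level-1 filters with their taps spread 2^(j-1) samples apart, as in the usual undecimated ("à trous") scheme. The function name is hypothetical.

```python
import numpy as np

def haar_swt(x, levels):
    """Stationary Haar transform with periodic extension and no downsampling.

    Returns [(cA_1, cD_1), ..., (cA_J, cD_J)]; every array has the length of x.
    """
    cA = np.asarray(x, dtype=float)
    out = []
    for j in range(levels):
        step = 2 ** j                  # spacing between the two filter taps
        nxt = np.roll(cA, -step)
        cD = (cA - nxt) / np.sqrt(2.0)
        cA = (cA + nxt) / np.sqrt(2.0)
        out.append((cA, cD))
    return out

rng = np.random.default_rng(1)
x = rng.random(16)
coeffs = haar_swt(x, 3)
shifted_coeffs = haar_swt(np.roll(x, 5), 3)   # transform of the shifted signal
```

Under a circular shift of the input, every coefficient array is shifted by the same amount, which is the translation-invariance property examined in Section 4.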

3.3. 2D SWT

The algorithm of 1D SWT can be easily extended to the 2D case. Fig. 6 shows the schematic

diagram of 2D SWT.

Fig. 6. Schematic diagram of 2D SWT

4. Experiments

The experiments were carried out on an IBM Pentium 4 machine with a 3 GHz processor and 2 GB of memory, running the Windows XP operating system. The algorithm was developed with the Wavelet Toolbox of Matlab 2010a. The programs can be run or tested on any computer platform where Matlab is available.

4.1. DWT and SWT results

First, we apply DWT and SWT to a normal MR brain image. The Haar wavelet is used, and the decomposition level is set to 3. The results are shown in Fig. 7. For DWT, the size of the coefficients at level j is $2^{-j}$ times the original size in both the x and y directions, whereas for SWT the size of the coefficients at any level is the same as the size of the original image.
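As a worked example of these sizes, take a 256 × 256 image (the in-plane resolution quoted for the dataset in Section 5.4) and assume that boundary handling adds no extra samples, as is the case for the Haar wavelet when the side length is divisible by $2^j$:

```latex
\begin{aligned}
\text{DWT, level } j:&\quad (256 \cdot 2^{-j}) \times (256 \cdot 2^{-j}) \\
j = 1:&\quad 128 \times 128 \\
j = 2:&\quad 64 \times 64 \\
j = 3:&\quad 32 \times 32 \\
\text{SWT, any level}:&\quad 256 \times 256
\end{aligned}
```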



Fig. 7. Wavelet Analysis of a normal brain image based on Haar Wavelet. (a) Normal Brain Image, (b)

DWT Coefficients, (c) SWT Coefficients.

4.2. Translation invariance of SWT

Suppose I denotes a given MR image, C the wavelet coefficients (DWT or SWT), and T the translation operator. Then, for a translation-invariant wavelet transform, the following relation is expected:

$$T[C(I)] - C[T(I)] = 0 \qquad (4.1)$$

However, the equality does not always hold. Thus, a translation error is defined to evaluate the deviation from translation invariance:

$$E = \mathrm{mae}\{T[C(I)] - C[T(I)]\} \qquad (4.2)$$

where “mae” represents “mean absolute error.” For a specific translation $(x_0, y_0)$, the corresponding error can be written as

$$E(x_0, y_0) = \mathrm{mae}\{T_{(x_0, y_0)}[C(I)] - C[T_{(x_0, y_0)}(I)]\} \qquad (4.3)$$

It is clear that a wavelet transform performs better with respect to translation invariance if its error is closer to 0.

Fig. 8. Process diagram of calculating error of translate invariance

The processing diagram is illustrated in Fig. 8. First, a 2 dimensional translation was

chosen as (x0, y0). Second, the corresponding error was calculated via the above process

diagram, namely, we obtained Error(x0, y0). Third, the value (x0, y0) was changed along the

whole image, and finally the whole Error image was obtained, where Error(x, y) denotes the

error value of (x, y) translation.
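The procedure can be sketched as follows. This is an illustration under several assumptions that are not stated in the text: the translation T is a circular shift, only the level-1 Haar approximation band is used as C, a random array stands in for the MR image, and the decimated (DWT) band is repeated 2 × 2 so that it lives on the same grid as the image. All function names are hypothetical.

```python
import numpy as np

def shift2d(a, x0, y0):
    """Translation operator T: circular shift by x0 columns and y0 rows."""
    return np.roll(a, (y0, x0), axis=(0, 1))

def translation_error(transform, img, x0, y0):
    """E(x0, y0) = mae{ T[C(I)] - C[T(I)] }, following Eq. (4.3)."""
    diff = shift2d(transform(img), x0, y0) - transform(shift2d(img, x0, y0))
    return float(np.mean(np.abs(diff)))

def swt_approx(img):
    """Level-1 undecimated Haar approximation band (periodic extension)."""
    rows = (img + np.roll(img, -1, axis=1)) / np.sqrt(2.0)
    return (rows + np.roll(rows, -1, axis=0)) / np.sqrt(2.0)

def dwt_approx(img):
    """Level-1 decimated Haar approximation, repeated 2x2 to match the image grid."""
    return np.kron(swt_approx(img)[0::2, 0::2], np.ones((2, 2)))

rng = np.random.default_rng(0)
image = rng.random((16, 16))      # stand-in for an MR slice
shifts = list(range(-3, 4))
err_swt = np.array([[translation_error(swt_approx, image, x, y) for x in shifts]
                    for y in shifts])
err_dwt = np.array([[translation_error(dwt_approx, image, x, y) for x in shifts]
                    for y in shifts])
```

The arrays `err_swt` and `err_dwt` play the role of the Error image: one entry per tested translation, with the zero shift at the center.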

The errors of both DWT and SWT on 10 different decomposition layers are shown in Fig. 9. The error of SWT on any decomposition level is zero, which corresponds to the blue plane passing through the origin and perpendicular to the z-axis, validating our supposition that SWT is fully translation invariant. On the other hand, the error surface of DWT is shaped like a tunnel. The error is zero only when the shifts along the x-axis and y-axis are both 0; everywhere except the origin the error is above zero. In short, DWT is translation variant.



Fig. 9. Error between SWT and DWT of Haar wavelet

4.3. Different Wavelets Test



[Figure panels: cH1, cH2, cH3, cV1, cV2, cV3, cD1, cD2, cD3, cA3; both axes of each panel are labeled from 50 to 250.]

Fig. 10. Wavelet Analysis of a normal brain image based on db2 Wavelet: (a) Normal Brain Image, (b)

DWT Coefficients, (c) SWT Coefficients.

Fig. 11. Error between SWT and DWT of db2 wavelet

In the previous section we demonstrated that SWT is translation invariant and DWT is translation variant. In this experiment the type of wavelet is taken into consideration. The db2 wavelet [13] was used to repeat the above experiments. The DWT and SWT coefficients of the untranslated brain image are shown in Fig. 10. Comparing with Fig. 7, we find that both the Haar and db2 wavelets extract edges and textures efficiently.

Next, the translation errors of the db2 wavelet were calculated; they are shown in Fig. 11. As before, the errors of SWT at every decomposition level are zero, in contrast to the tunnel shape of the DWT errors. Besides db2, we repeated the experiments on the db, sym, coif, bior, and rbior wavelet families [14]. The results, like those shown in Fig. 11, resemble those in Fig. 9.

5. Application

We applied our SWT method to the classification problems of structural MRI.

5.1. Schematic Diagram

In this application, we use the SWT for normal and abnormal brain image classification, which is valuable for clinical diagnosis. Our strategy is as follows. First, we extract features via SWT; then we reduce the features via principal component analysis (PCA); finally, we classify the brains as normal or abnormal using Fisher discriminant analysis (FDA). The comparative algorithm is identical except that DWT is used instead of SWT. The schematic diagram of this application is shown in Fig. 12.

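[Figure: the structural MRI is processed along two parallel branches, DWT → PCA → FDA and SWT → PCA → FDA; the classification accuracies of the two branches are compared, leading to the conclusion that SWT is better.]

Fig. 12. Schematic diagram of comparing SWT and DWT on the structural MRI classification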



5.2. PCA

PCA is an efficient tool to reduce the dimension of a data set consisting of a large number of interrelated variables while retaining most of the variation [15]. It is achieved by transforming the data set to a new set of variables ordered according to their variances or importance. This technique has three effects: it orthogonalizes the components of the input vectors so that they are uncorrelated with each other, it orders the resulting orthogonal components so that those with the largest variation come first, and it eliminates those components contributing the least to the variation in the data set. More specifically, for a given n × m data matrix, where n and m are the number of variables and the number of observations, respectively, the p principal axes (p << n) are the orthogonal axes onto which the retained variance of the projected data is maximal [16]. PCA describes the original data by projecting them onto a basis of eigenvectors. The corresponding eigenvalues account for the energy of the process in the eigenvector directions. It is assumed that most of the information in the observation vectors is contained in the subspace spanned by the first p principal components (PCs). By restricting the projection to the p eigenvectors with the highest eigenvalues, an effective reduction in the dimensionality of the original data can be achieved with minimal information loss. Reducing the dimensionality of the n-dimensional input space by projecting the input data onto the eigenvectors corresponding to the first p eigenvalues is an important step that facilitates the subsequent analysis [17].
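The projection described above can be sketched with an eigendecomposition of the sample covariance matrix. This is a generic illustration, not the implementation used in the experiments; the function name, the synthetic data, and the choice p = 2 are assumptions made for the example.

```python
import numpy as np

def pca_reduce(X, p):
    """X is n x m (n variables, m observations).

    Returns the p x m projected data, the n x p principal axes, and their variances.
    """
    Xc = X - X.mean(axis=1, keepdims=True)        # center each variable
    cov = Xc @ Xc.T / (X.shape[1] - 1)            # n x n sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:p]         # indices of the p largest
    axes = eigvecs[:, order]
    return axes.T @ Xc, axes, eigvals[order]

rng = np.random.default_rng(0)
scales = np.array([[5.0], [3.0], [1.0], [0.5], [0.1]])
X = rng.normal(size=(5, 200)) * scales            # 5 variables, 200 observations
scores, axes, variances = pca_reduce(X, p=2)
retained = variances.sum() / np.trace(np.cov(X))  # fraction of variance kept
```

The projected components are mutually uncorrelated and ordered by decreasing variance, which are the first two effects listed in the text; dropping the remaining axes is the third.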

5.3. FDA

FDA considers maximizing the following objective:

$$J(w) = \frac{w^T S_B w}{w^T S_W w} \qquad (5.1)$$

where w is the projection vector, $S_B$ denotes the “between class scatter matrix” and $S_W$ denotes the “within class scatter matrix” [18]. Note that because scatter matrices are proportional to the covariance matrices, we could have defined J using covariance matrices; the proportionality constant would have no effect on the solution. The definitions of the scatter matrices are:

$$S_B = \sum_c (\mu_c - \bar{x})(\mu_c - \bar{x})^T \qquad (5.2)$$

$$S_W = \sum_c \sum_{i \in c} (x_i - \mu_c)(x_i - \mu_c)^T \qquad (5.3)$$

where $\bar{x}$ is the overall mean of the data cases and $\mu_c$ is the mean of the data of the cth class.



The concept of FDA lies in that a good solution is one where the class-means are well

separated, measured relative to the (sum of the) variances of the data assigned to a particular

class [19]. This is precisely what we want, because it implies that the gap between the classes

is expected to be big.
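A small numerical sketch of this idea follows. It computes the two scatter matrices for labeled data and picks the direction that maximizes the ratio of between-class to within-class scatter, using the standard result that this direction is the leading eigenvector of the inverse within-class scatter times the between-class scatter. The between-class scatter is an unweighted sum over classes here (some texts weight each term by the class size), and the function names and synthetic data are assumptions of the sketch.

```python
import numpy as np

def scatter_matrices(X, y):
    """X: samples x features, y: class labels. Returns (S_B, S_W)."""
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        diff = (mu_c - overall_mean)[:, None]
        S_B += diff @ diff.T                 # spread of class means
        centered = Xc - mu_c
        S_W += centered.T @ centered         # spread inside each class
    return S_B, S_W

def fisher_objective(w, S_B, S_W):
    return float(w @ S_B @ w) / float(w @ S_W @ w)

def fisher_direction(X, y):
    """Unit vector maximizing the objective (leading eigenvector of S_W^{-1} S_B)."""
    S_B, S_W = scatter_matrices(X, y)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    w = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)),
               rng.normal(size=(200, 2)) + [4.0, 0.0]])  # two classes split along axis 0
y = np.array([0] * 200 + [1] * 200)
w = fisher_direction(X, y)
S_B, S_W = scatter_matrices(X, y)
```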

To simplify the problem, note that J is unchanged when w is rescaled, so we can always choose w such that the denominator is simply $w^T S_W w = 1$, since it is a scalar itself. Afterwards, we can transform the problem of maximizing J into the following constrained optimization problem [20]:

$$\min_w \; -\frac{1}{2}\, w^T S_B w \quad \text{s.t.} \quad w^T S_W w = 1 \qquad (5.4)$$
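One way to see how this constrained problem is solved is through a Lagrange multiplier. The steps below are standard calculus rather than something taken from the paper; w denotes the projection vector and λ the multiplier, both of which are notational assumptions.

```latex
\begin{aligned}
L(w, \lambda) &= -\tfrac{1}{2}\, w^T S_B w + \tfrac{1}{2}\, \lambda \left( w^T S_W w - 1 \right) \\
\frac{\partial L}{\partial w} = 0 \;&\Rightarrow\; S_B w = \lambda\, S_W w \\
J(w) &= \frac{w^T S_B w}{w^T S_W w} = \lambda
\end{aligned}
```

The optimum is therefore the generalized eigenvector with the largest eigenvalue; when $S_W$ is invertible, this is the leading eigenvector of $S_W^{-1} S_B$.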

5.4. Dataset

The dataset consists of T2-weighted MR brain images in the axial plane with 256×256 in-plane resolution, which were downloaded from the Harvard Medical School website (http://med.harvard.edu/AANLIB/). We randomly selected 80 images, consisting of 10 normal and 70 abnormal images. The distribution of the training and test images is shown in Table 1.

Table 1. Setting of training and test images

Total No.    Training (40)         Testing (40)
of images    Normal    Abnormal    Normal    Abnormal
80           5         35          5         35

The abnormal brain MR images consist of the following diseases: glioma, meningioma,

Alzheimer’s disease, Alzheimer’s disease plus visual agnosia, Pick’s disease, sarcoma, and

Huntington’s disease. A sample of each is shown in Fig. 13.



Fig. 13. Sample of normal and abnormal brain image (a) normal brain; (b) glioma; (c) meningioma (d)

Alzheimer’s disease; (e) Alzheimer’s disease with visual agnosia; (f) Pick’s disease; (g) sarcoma; (h)

Huntington’s disease.

5.5. Results

The confusion matrices obtained by our method based on SWT and DWT are shown in Table 2. Here 1 and 0 denote normal and abnormal images, respectively, and T and O denote target and output, respectively. The table indicates that the SWT-based strategy is more accurate, with an accuracy of (5+35)/40 = 100% on the training images and (4+34)/40 = 95% on the test images, whereas the classification accuracy of the DWT-based method is only (5+32)/40 = 92.5% on the training images and (3+28)/40 = 77.5% on the test images.

Table 2. Comparison of SWT and DWT on normal and abnormal brain image classification

DWT      Training          Test
         1 (O)    0 (O)    1 (O)    0 (O)
1 (T)    5        0        3        2
0 (T)    3        32       7        28

SWT      Training          Test
         1 (O)    0 (O)    1 (O)    0 (O)
1 (T)    5        0        4        1
0 (T)    0        35       1        34
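The accuracies follow directly from the confusion matrices: the correctly classified images lie on the diagonal. The short sketch below recomputes them from the entries of Table 2; the dictionary layout and function name are choices made for this illustration.

```python
import numpy as np

# Rows: target (normal, abnormal); columns: output (normal, abnormal). Values from Table 2.
confusion = {
    ("DWT", "training"): np.array([[5, 0], [3, 32]]),
    ("DWT", "test"):     np.array([[3, 2], [7, 28]]),
    ("SWT", "training"): np.array([[5, 0], [0, 35]]),
    ("SWT", "test"):     np.array([[4, 1], [1, 34]]),
}

def accuracy(cm):
    """Fraction of images on the diagonal of the confusion matrix."""
    return float(np.trace(cm)) / float(cm.sum())

results = {key: accuracy(cm) for key, cm in confusion.items()}
for (method, split), acc in results.items():
    print(f"{method} {split}: {100 * acc:.1f}%")
```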

6. Conclusions

The superiority of SWT for feature extraction from brain MR images was investigated in this paper. SWT is superior to the classical discrete wavelet transform with respect to translation invariance, and this advantage does not depend on the wavelet family. Moreover, we applied the SWT to the classification of normal and abnormal structural brain images. The results show that the SWT-based classifier is more accurate than the DWT-based one.

We anticipate that SWT based features will be further explored for other applications in

MRI, such as de-noising, compression, and fusion.

Acknowledgements

The research is financed by the following projects: (1) National Natural Science Foundation of China (#60872075); (2) National Technical Innovation Project Essential Project Cultivate Project (#706928); and (3) Nature Science Fund in Jiangsu Province (#BK2007103).

References

[1] L. Knutsson, et al., "Absolute quantification of cerebral blood flow: correlation

between dynamic susceptibility contrast MRI and model-free arterial spin labeling,"

Magnetic Resonance Imaging, vol. 28, pp. 1-7, 2010.

[2] M. Fedi, et al., "2D Continuous Wavelet Transform of potential fields due to

extended source distributions," Applied and Computational Harmonic Analysis, vol.

28, pp. 320-337, 2010.

[3] N. Saravanan and K. I. Ramachandran, "Fault diagnosis of spur bevel gear box using

discrete wavelet features and Decision Tree classification," Expert Systems with

Applications, vol. 36, pp. 9564-9573, 2009.

[4] P. S. Bhowmik, et al., "A novel wavelet transform aided neural network based

transmission line fault analysis method," International Journal of Electrical Power

& Energy Systems, vol. 31, pp. 213-219, 2009.

[5] L. A. N. M. Lopez, et al., "Removing Eddy-Current probe wobble noise from steam

generator tubes testing using Wavelet Transform," Progress in Nuclear Energy, vol.

50, pp. 828-835, 2008.

[6] S. Lesecq, et al., "Fault detection based on wavelets transform. Application to a

roughing mill," in Fault Detection, Supervision and Safety of Technical Processes

2006, Z. Hong-Yue, Ed., ed Oxford: Elsevier Science Ltd, 2007, pp. 1115-1120.

[7] E. D. Ryan, et al., "Time and frequency domain responses of the mechanomyogram

and electromyogram during isometric ramp contractions: A comparison of the short-

time Fourier and continuous wavelet transforms," Journal of Electromyography and

Kinesiology, vol. 18, pp. 54-67, 2008.

[8] E. Avci, "Selecting of the optimal feature subset and kernel parameters in digital

modulation classification by using hybrid genetic algorithm-support vector machines:

HGASVM," Expert Systems with Applications, vol. 36, pp. 1391-1402, 2009.



[9] D. Heric and D. Zazula, "Combined edge detection using wavelet transform and

signal registration," Image and Vision Computing, vol. 25, pp. 652-662, 2007.

[10] L. W. Taylor, "Using the Haar wavelet transform in the semiparametric specification

of time series," Economic Modelling, vol. 26, pp. 392-403, 2009.

[11] B. Martínez and M. A. Gilabert, "Vegetation dynamics from NDVI time series

analysis using the wavelet transform," Remote Sensing of Environment, vol. 113, pp.

1823-1842, 2009.

[12] L. H. Cherif, et al., "Choice of the wavelet analyzing in the phonocardiogram signal

analysis using the discrete and the packet wavelet transform," Expert Systems with


[13] X. Delaunay, et al., "Satellite image compression by post-transforms in the wavelet

domain," Signal Processing, vol. 90, pp. 599-610, 2010.

[14] A. Belsak and J. Flasker, "Determining cracks in gears using adaptive wavelet

transform approach," Engineering Failure Analysis, vol. 17, pp. 664-671, 2010.

[15] J. Camacho, et al., "Data understanding with PCA: Structural and Variance

Information plots," Chemometrics and Intelligent Laboratory Systems, vol. 100, pp.

48-56, 2010.

[16] L. Nanni and A. Lumini, "Ensemble generation and feature selection for the

identification of students with learning disabilities," Expert Systems with


[17] M. Goodarzi and M. P. Freitas, "MIA-QSAR coupled to principal component

analysis-adaptive neuro-fuzzy inference systems (PCA-ANFIS) for the modeling of

the anti-HIV reverse transcriptase activities of TIBO derivatives," European Journal

of Medicinal Chemistry, vol. 45, pp. 1352-1358, 2010.

[18] H. Yamamoto, et al., "Dimensionality reduction for metabolome data using PCA,

PLS, OPLS, and RFDA with differential penalties to latent variables,"

Chemometrics and Intelligent Laboratory Systems, vol. 98, pp. 136-142, 2009.

[19] N. Louw and S. J. Steel, "Variable selection in kernel Fisher discriminant analysis

by means of recursive feature elimination," Computational Statistics & Data

Analysis, vol. 51, pp. 2043-2055, 2006.

[20] G. C. Cawley and N. L. C. Talbot, "Efficient leave-one-out cross-validation of

kernel fisher discriminant classifiers," Pattern Recognition, vol. 36, pp. 2585-2592,

2003.

Yudong Zhang was born in Suzhou, China. He received a B.S. degree in

2004 and M.S. in 2007 in electronic engineering from Nanjing

University of Aeronautics and Astronautics, Nanjing. He received a

Ph.D. degree in 2010 in communication and signal process from

Southeast University, Nanjing.

He is now a postdoctoral researcher at Columbia University and a research fellow at the New York State Psychology Institute. His research interests in computer science are in the areas of data mining for artificial neural networks, decision trees, expert systems, artificial intelligence, genetic algorithms, and particle swarm optimization. He is the first author of over 20 papers and is an active consultant for industry in these areas.

Mr Zhang is an editor of “International Journal of Intelligent Information Processing” and “Journal of Convergence Information Technology”. He is a member of IEEE, New York Academy of Science, Chinese Computer Federation (CCF), and China Institute of Electronics (CIE). He is also a Council Member of Audio/Video Technician Association and Multimedia Institute in Nanjing.

