Fast adaptive wavelet packet image compression

Submitted to: IEEE Transactions on Image Processing, April 1998. Revised: June 1999.

Francois G. Meyer, Yale University, New Haven, USA

Amir Averbuch, Tel Aviv University, Tel Aviv, Israel

Jan-Olov Stromberg, Royal Institute of Technology, Stockholm, Sweden

Corresponding author:

Francois G. Meyer

Yale University School of Medicine, Department of Diagnostic Radiology

333 Cedar Street, P.O. Box 208042, New Haven, CT 06520-8042, USA

tel: (203) 737 6037, fax: (203) 737 4273, e-mail: [email protected]

Category: IP 1.1 Coding

Abstract

Wavelets are ill suited to represent oscillatory patterns: rapid variations of intensity can only

be described by the small scale wavelet coefficients. These diffused coefficients carry very little

energy, and are often quantized to zero, even at high bit rates. Our goal in this paper is to provide

a fast numerical implementation of the best wavelet packet algorithm [8] in order to demonstrate

that an advantage can be gained by constructing a basis adapted to a target image. Emphasis

in this paper has been placed on developing algorithms that are computationally efficient. We

developed a new fast 2-D convolution-decimation algorithm with factorized non-separable 2-D

filters. The algorithm is 4 times faster than a standard convolution-decimation. An extensive

evaluation of the algorithm was performed on a large class of textured images. Because of its

ability to reproduce textures so well, the wavelet packet coder significantly outperforms one of

the best wavelet coders [26] on images such as Barbara and Fingerprints, both visually and in terms

of PSNR.

1 Introduction

The main defect of the windowed Fourier based compression methods (such as the DCT) is due to

the limitation put on the size of the blocks, and the inability to adjust the patterns to the nature of the

picture. An answer to this problem is provided by a multiscale decomposition of the image: low

frequency trends occurring at a large scale in the image can be efficiently coded with very few coefficients.

Wavelets with many vanishing moments yield sparse decompositions of piece-wise smooth

surfaces, and are very effective for coding piece-wise smooth images [1, 14, 26, 28, 30]. Wavelets,

however, are ill suited to represent oscillatory patterns. Rapid variations of intensity can only be de-

scribed by the small scale wavelet coefficients. Long oscillatory patterns thus require many of such

fine scale coefficients. Unfortunately, those small scale coefficients carry very little energy, and are

often quantized to zero, even at high bit rates. Much larger libraries of functions, called wavelet

packets, have been constructed [7] to address this problem. The wavelet packets include wavelets, as

well as cosine-like waveforms. In the two dimensional (2-D) case, wavelet packets are patterns

that can vary in scale, frequency, and location. Because the collection of wavelet packets is over-

complete (there are many more basis functions than the dimension of the input space) one can now

construct a basis that is fitted for a target image (or for a class of images). In general a basis is a good

basis if it can describe the target image with a very small number of basis vectors. In their original

paper Coifman and Wickerhauser proposed a very generic metric to assess the efficiency of a basis

[8]. A more meaningful measure considers the number of bits needed to approximate the image with

a given error. Ramchandran and Vetterli [24] followed this path, and wedded the bit allocation algo-

rithm of Shoham and Gersho [29] to the best basis algorithm [8]. Unfortunately this approach (and

its variation [34]) is extremely computationally intensive (as explained in Section 4.2 the problem in

[24] involves 3 layers of non linear approximations, only one of which lends itself to a fast algorithm).

Very little work has been expended beyond the papers [8, 24], and as a result adapted wavelet packet

bases remain a theoretical curiosity, with no clear practical advantage, and that cannot be computed

within a reasonable amount of time.

Our goal in this paper is to provide a fast numerical implementation of the best wavelet packet

algorithm, in order to demonstrate that an advantage can be gained by constructing a basis adapted

to a target image. Many classes of images, or multidimensional signals, have very diffuse representations

in a standard wavelet basis. Fingerprints and seismic signals are a few examples of signals that are not

wavelet-friendly. An adapted wavelet packet basis can often provide a very sparse representation

of such images. Fast algorithms for choosing a best basis are therefore of fundamental importance.


Emphasis in this paper has been placed on developing algorithms that are computationally efficient.

Three original contributions result from this work:

1. A new fast 2-D convolution-decimation algorithm with factorized non-separable 2-D filters.

The number of operations is reduced by a factor 4 in comparison to a standard implementation,

and the transform is performed in place (no transpose).

2. A cost function that takes into account the cost of coding the output levels of the quantizers,

and the cost of coding the significance map.

3. A context-based entropy coder that conditions the probability of significance of a given pixel

on the probability of its neighbors using a space filling curve.

This paper is organized as follows. In the next section we provide a general description of the prin-

ciples of the algorithm. In section 3 we review the wavelet packet library. In section 4 we explain

how to select, among a large collection of bases, that basis which is best adapted to encode a given

image. The factorization of the conjugate quadrature (lowpass and highpass) filters is described in

section 5. In section 6 we describe the quantization, and the context-based entropy coding. Results

of experiments are presented in Section 7.

[Figure 1 diagram. Block labels: original image; collection of libraries of bases; (1) best basis, expansion; description of the best basis; (2) quantization; budget; (3) entropy coding; bit stream.]

Figure 1: Block diagram of the wavelet packet compression algorithm. The compression consists of three

parts: (1) best basis selection, and calculation of the coefficients of the image, (2) quantization of the coefficients,

and (3) entropy coding. The quadtree that describes the best basis is also entropy coded.

2 General description of the compression algorithm

A block diagram of the algorithm is shown in Fig. 1. The algorithm is divided into three parts.

In the first part (1) we select that wavelet packet basis which is best adapted to encode the image.

During the second part (2), the wavelet packet coefficients of the image are quantized; we assume


a Laplacian distribution and we use an efficient near optimal scalar quantizer [31]. Finally in (3),

the significance map, and the output levels are entropy coded. We exploit a higher order arithmetic

coder that relies on a context consisting of pixels in the causal neighborhood.

3 The wavelet packet library

The wavelet packet library [7] is composed of functions, with different time frequency localizations,

that provide a highly redundant representation: there is not a unique decomposition of each image over the library. Let $\{h_n\}$, $\{g_n\}$ be biorthogonal filters, and let $\{\tilde{h}_n\}$, $\{\tilde{g}_n\}$ be the dual filters; $\{h_n\}$, $\{\tilde{h}_n\}$ are the lowpass filters, and the conjugate quadrature filter sequences are:

$$g_n = (-1)^n\, \tilde{h}_{1-n}, \qquad \tilde{g}_n = (-1)^n\, h_{1-n} \qquad (1)$$

Let x be a discrete signal $x = \{x_n\}$, $n = 0, \ldots, N-1$. The wavelet packet coefficients $w_{n,j,l}$ are defined by the following recursion:

$$w_{2n,j,l} = \sum_k g_{k-2l}\, w_{n,j+1,k}, \qquad l = 0, \ldots, N 2^{j-J} \qquad (2)$$

$$w_{2n+1,j,l} = \sum_k h_{k-2l}\, w_{n,j+1,k}, \qquad l = 0, \ldots, N 2^{j-J} \qquad (3)$$

$$w_{0,J,l} = x_l, \qquad l = 0, \ldots, N \qquad (4)$$

The indices are interpreted as follows:

• j is the scale index: the size of the support of the corresponding wavelet packet is $2^{-j}$. The signal x is sampled at the finest scale J: the distance between two samples is $2^{-J}$.

• l is the localization parameter: the corresponding wavelet packet is located at $l\, 2^{-j}$.

• n is the frequency index: the wavelet packet has roughly n oscillations (a frequency $2^j n$).

As shown in Fig. 2 the library organizes itself into a binary tree, where the nodes of the tree represent

subspaces with different time-frequency localization characteristics. The standard dyadic wavelet

basis is obtained by iterating the decomposition process on the low frequency bands only, without

further decomposing the high frequency component at each level of the tree.
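The tree construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the paper: it uses the orthogonal Haar pair instead of the biorthogonal filters, a signal whose length is a power of two, and hypothetical function names. Nodes are labelled with strings of "s" (lowpass) and "d" (highpass) in the order the filters are applied, which is one possible reading of the labels in Fig. 2.

```python
import math

def split(x):
    """One Haar convolution-decimation: returns (lowpass s, highpass d)."""
    r = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    s = [r * (x[2 * k] + x[2 * k + 1]) for k in range(half)]
    d = [r * (x[2 * k] - x[2 * k + 1]) for k in range(half)]
    return s, d

def wavelet_packet_tree(x, levels):
    """Return {path: coefficients} for every node down to the given depth."""
    tree = {"": list(x)}          # the root holds the signal itself
    frontier = [""]
    for _ in range(levels):
        next_frontier = []
        for path in frontier:
            s, d = split(tree[path])
            tree[path + "s"] = s  # both children are decomposed further,
            tree[path + "d"] = d  # unlike in the dyadic wavelet transform
            next_frontier += [path + "s", path + "d"]
        frontier = next_frontier
    return tree

tree = wavelet_packet_tree([1, 2, 3, 4, 5, 6, 7, 8], 3)
```

Keeping only the nodes "d", "sd", "ssd" and "sss" of this dictionary gives the standard dyadic wavelet coefficients; any other set of nodes that covers the signal exactly once is another basis of the library.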

[Figure 2 diagram: binary tree of lowpass (H) and highpass (G) convolution-decimations applied to the samples x1, ..., x8; node labels are built from the prefixes s and d.]

Figure 2: Wavelet packet tree. At each node of the tree, we apply a convolution and a decimation with the

lowpass filter H, and the highpass filter, G. The prefix “s” stands for the sum, or lowpass filter, and “d” stands

for difference or highpass filter.

3.1 Arbitrary image size, and boundary extension

In order to apply the lowpass and highpass filters to a sequence of arbitrary finite length N, we extend the sequence at both end points. Because we are using odd-length filters we extend the sequence symmetrically [4]. This approach does not introduce any discontinuity at the boundaries. When the number of samples in the sequence is odd, $N = 2n+1$, we obtain $n+1$ lowpass coefficients $s^j_k$, $k = 0, \ldots, n$, and $n$ highpass coefficients $d^j_k$, $k = 0, \ldots, n-1$, as shown in Fig. 3.

Figure 3: 1-D periodic even extension. N = 2n + 1. (s = lowpass, d = highpass).

Figure 4: 1-D periodic even extension. N = 2n. (s = lowpass, d = highpass).

4 Best basis algorithm

Clearly the library provides an overcomplete description of the signal x. We need to know how to

assemble the elements of the library to obtain an orthogonal basis. Loosely speaking, wavelet packets

make it possible to adaptively tile the frequency domain into different bands of arbitrary size; if a collection of functions in the library provides a cover of the time-frequency plane, then this set of functions is an orthonormal basis. If we associate the dyadic frequency interval $[2^j n, 2^j(n+1))$ to the wavelet packet coefficient $w_{n,j,l}$, then we can build orthonormal bases from the binary tree [7]:

Theorem 1 [7] If a subset $E \subset \mathbb{N} \times \mathbb{Z}$ has the property that the union of intervals

$$[2^j n, 2^j(n+1)), \qquad j \in \mathbb{N},\ 0 \le n < 2^j,\ (n, j) \in E$$

is a disjoint cover of $[0, 1)$, then the set of wavelet packet coefficients $w_{n,j,l}$, with $(n, j) \in E$, are the coefficients of x in an orthonormal basis.

The greater flexibility offered by the redundancy of the wavelet packet library can be exploited to

increase the efficiency of the representation. For each target image, we select from the library a basis

that is better fitted for the compression of that image.
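The selection can be pictured as a bottom-up pruning of the tree, which is the dynamic programming strategy detailed in Section 4.1: a node is kept whole if its own cost is no larger than the combined best cost of its two children. The sketch below is a 1-D illustration under assumptions that are not the paper's: Haar filters, a signal length that is a power of two, hypothetical function names, and a simple additive cost (the number of coefficients above a threshold) in place of the cost function of Section 4.2.

```python
import math

def split(x):
    """One Haar convolution-decimation: returns (lowpass s, highpass d)."""
    r = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    s = [r * (x[2 * k] + x[2 * k + 1]) for k in range(half)]
    d = [r * (x[2 * k] - x[2 * k + 1]) for k in range(half)]
    return s, d

def cost(coeffs, threshold=0.5):
    """Additive cost (illustrative choice): number of coefficients above a threshold."""
    return sum(1 for c in coeffs if abs(c) > threshold)

def best_basis(x, levels, path=""):
    """Return (cost, [(path, coeffs), ...]) for the cheapest cover below this node."""
    own = cost(x)
    if levels == 0 or len(x) < 2:
        return own, [(path, list(x))]
    s, d = split(x)
    cost_s, basis_s = best_basis(s, levels - 1, path + "s")
    cost_d, basis_d = best_basis(d, levels - 1, path + "d")
    if cost_s + cost_d < own:      # the children describe x more cheaply
        return cost_s + cost_d, basis_s + basis_d
    return own, [(path, list(x))]  # otherwise keep the node whole

# A purely oscillatory signal: 8 significant samples, 1 significant packet coefficient.
total, basis = best_basis([1.0, -1.0] * 4, 3)
```

Each node is visited once and its cost is computed once, which is why the additivity of the cost matters: the cost of a union of subbands must be the sum of the costs of the subbands.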

4.1 A fast dynamic programming approach

From the previous theorem we learn that we have an extremely large amount of freedom for the

construction of orthogonal bases from the wavelet packet library. In fact, if we consider only dyadic

subdivision at each level, we still get $2^{4^L}$ bases for L levels. An exhaustive search inside the library

is impossible: for 6 levels there exist $2^{4096} \approx 10^{1200}$ bases! In the case of wavelet packets constructed by dyadic subdivision, Coifman and Wickerhauser [8] suggested using a fast dynamic

programming algorithm (order N log(N), where N is the number of pixels in the image) to search for

that best basis which is optimal according to a given cost function. A key criterion must be met in

order to invoke a dynamic programming strategy [9]: the objective function should be separable. In

the context of the best basis algorithm, this condition states that one must limit oneself to additive cost functions [32]. Let x be a vector in one of the subspaces defined by the wavelet packet tree, and let $B = \{\varphi_k\}$ be any basis of that subspace. M is an additive cost function if there exists a positive function $\mu$ such that [32]:

$$\mu(0) = 0 \quad \text{and} \quad M(x) = \sum_k \mu(x_k) \qquad (5)$$

where $x_k = \langle x, \varphi_k \rangle$.

4.2 Choice of a cost function

The metric defined by the objective function defines the optimality criterion. In this work, one basis

is better than another if it provides a better reconstruction quality for the same number of bits spent

in coding the coefficients, or if it requires fewer bits to achieve the same reconstruction quality. Initially

Coifman and Wickerhauser [8] proposed to use the entropy of the vector x:

$$h(x) = -\sum_k \frac{|x_k|^2}{\|x\|^2} \log \frac{|x_k|^2}{\|x\|^2} \qquad (6)$$

as a cost function. Clearly h is not an additive cost function; however $\left(h(x) - \log \|x\|^2\right)\|x\|^2 = -\sum_k |x_k|^2 \log |x_k|^2$ is an additive cost function, and minimizing the latter function over a set of bases that preserve $\|x\|^2$ will minimize (6). It is important to realize that (6) bears no connection with the entropy $H(x)$ of the probability distribution of the $\{x_k\}$. For instance, if all $x_k$ are equal, then $h(x)$ is maximal, but the entropy of the distribution $H(x)$ is minimal. In practice, we have noticed that the cost function (6) is usually of little value because it fails to discover any meaningful bases. Because $h(x)$ is not related to the theoretical number of bits required to code the coefficients $\langle x, \varphi_k \rangle$, Ramchandran and Vetterli [24] used the optimal bit allocation algorithm of Shoham and Gersho [29] to select the best basis B according to the rate distortion criterion:

$$M(x, Q, \lambda) = D(x, Q) + \lambda R(x, Q) \qquad (7)$$

Given a set of quantizers $\mathcal{Q}$, the rate $R(x, Q)$ is estimated with the first order approximation of the entropy $H(Q(x))$, and the distortion $D(x, Q)$ is defined as the mean square error: $\frac{1}{N_p} \sum_k (x_k - Q(x_k))^2$. The selection of the best basis is part of an embedded optimization problem that involves three non-linear optimizations:

$$\max_{\lambda} \left( \min_{B \in \mathcal{B}} \left( \sum_{\text{node} \in B} \min_{Q \in \mathcal{Q}} \left\{ D(x, Q) + \lambda R(x, Q) \right\} \right) \right) \qquad (8)$$

Each node of the wavelet packet tree is associated with the best scalar quantizer Q for that node using an exhaustive search among a predefined set $\mathcal{Q}$ of scalar quantizers. Then the best basis B is obtained using the fast dynamic programming algorithm described above. Unfortunately the dynamic programming procedure needs to be iterated many times to find the optimal slope $\lambda$ on the rate distortion curve, at which all the quantizers Q operate. The approach in [24] is therefore very computationally intensive. A theoretical problem with (7) is that the cost function (7) is not additive: the mean square error is not additive (the $l^2$ error is), and the entropy $H(x)$ is not additive. Indeed, if $x = \{x_k\}$ and $y = \{y_k\}$ are the coefficients of the two children of a node in the wavelet packet tree, then we have the following well known equation [10]:

$$H(x, y) = H(x) + H(y) - I(x; y) \qquad (9)$$

where $I(x; y)$ is the mutual information [10], a measure of the amount of information that x contains about y. Because the subbands x, y are not independent (this is in fact the tenet of the zero-tree based coding algorithms), $I(x; y)$ is usually not zero. Finally, we note that the results published in [24] correspond to hypothetical compression rates, since the first order entropy was chosen to measure the rate.

Instead of using the rate distortion framework, we designed a cost function that returns an estimate of the actual rate achieved by each node. The cost function mimics the actual scalar quantization, and entropy coding, which are presented in Section 6. However, the cost function is much faster to compute. It is composed of two complementary terms:

• $c_1(x)$, the cost of coding the sign and the magnitude of the non zero output levels of the scalar quantizer,

• $c_2(x)$, the cost of coding the locations of the non zero output levels (significance map).

If $x = \{x_k\}$ is an N dimensional vector, a first order approximation of the cost of coding the magnitude of the output levels $\{|Q(x_k)|\}$ is given by the number of bits needed to represent the set $\{|Q(x_k)| : Q(x_k) \neq 0\}$:

$$c_1(x) = \sum_{k \,:\, Q(x_k) \neq 0} \max\left(\log_2 |Q(x_k)|,\ 0\right) \qquad (10)$$

A fast implementation of $c_1$ can be devised using the standard representation of floating point numbers: $\log_2 |Q(x_k)|$ is obtained using a logical “AND” and a mask. The sign of $Q(x_k)$ is extracted in a similar manner.

The second term provides an estimate of the number of bits needed to code the significance map.

This term is calculated using the first order entropy of a Bernoulli process: each coefficient $x_k$ is significant with a probability p, and we assume that the significances of the coefficients are independent events. This memoryless property is obviously not true, but since we do not take advantage of the correlation across subbands in the entropy coding, this hypothesis yields a good estimate of our actual coding cost. We get

$$c_2(x) = -N \left( p \log_2(p) + (1-p) \log_2(1-p) \right) \qquad (11)$$

The computation of the cost function requires quantizing the coefficients. We use the scalar quantizer described in Section 6. An initial estimate of the quantization step is required to compute the cost function. This estimate can be refined if needed, after a first compression. We noticed that the best basis selected with this cost function varies with the particular choice of this initial quantization step, but the overall compression result varies slowly as a function of this parameter.
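The two terms (10) and (11) can be sketched as follows. The code is a minimal illustration with hypothetical names, and it makes two simplifying assumptions: the quantizer is a plain truncation toward zero rather than the quantizer of Section 6, and $\log_2$ is computed with the math library rather than with the bit mask mentioned above. As in (10), only the magnitudes are counted.

```python
import math

def quantize(x, step):
    """Stand-in quantizer (assumption): truncation toward zero, so |v| < step maps to 0."""
    return [int(v / step) for v in x]

def c1(levels):
    """Eq. (10): bits needed for the magnitudes of the non-zero output levels."""
    return sum(max(math.log2(abs(q)), 0.0) for q in levels if q != 0)

def c2(levels):
    """Eq. (11): first order entropy of the significance map (Bernoulli model)."""
    n = len(levels)
    p = sum(1 for q in levels if q != 0) / n
    if p == 0.0 or p == 1.0:
        return 0.0                # a constant map carries no information
    return -n * (p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def node_cost(x, step):
    """Estimated number of bits for one node of the wavelet packet tree."""
    levels = quantize(x, step)
    return c1(levels) + c2(levels)

bits = node_cost([0.2, -4.1, 8.0, 0.0], 1.0)
```

In the example the output levels are 0, -4, 8, 0: the magnitudes cost 2 + 3 bits and the significance map, with half of the coefficients significant, costs 4 bits.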


5 Fast convolution-decimation: factoring the biorthogonal filters

The best basis algorithm requires calculating the coefficients at each node of the wavelet packet tree.

In any practical situation the best basis algorithm is only applicable if the calculation of the coeffi-

cients does not require an absurdly large amount of time. This section presents an efficient numerical

algorithm for computing multidimensional convolution-decimation. This way of calculating the co-

efficients at each level of the wavelet packet tree results in an efficient implementation that divides

the number of operations by a factor 2 in 1-D, and a factor 4 in 2-D. The scheme is validated with ex-

periments with images of several sizes. We also note that a parallel implementation of the algorithm

can clearly be devised.

5.1 One dimensional case

Several methods [25] have been proposed in the literature for the fast implementation of convolution

and decimation. While the FFT implementation [25] is useful for large filters (of length 64 or 128), fast

running convolutions methods are best for medium size filters (of length 6 to 12). If one assumes that

the cost of a multiplication is similar to the cost of an addition (a reasonable assumption in terms of

number of cycles on RISC, and non RISC architectures), then the fast running convolutions methods

do not bring any improvement over a straightforward implementation [25].

Recently, several authors [11, 15, 17, 20, 27] have proposed efficient implementations of one di-

mensional (1-D) biorthogonal filters (perfect reconstruction filter banks) using a factorization of the

filters into smaller filters. In [17] the authors show that all biorthogonal filters can be factored into a

sequence of elementary ladder steps. Each ladder step transforms a pair $(x_{2k}, x_{2k+1})$ of even and odd samples of a 1-D vector as follows:

$$\begin{pmatrix} x_{2k} \\ x_{2k+1} \end{pmatrix} \longrightarrow \begin{pmatrix} x_{2k} + f(x_{2k+1}) \\ x_{2k+1} \end{pmatrix} \qquad (12)$$

The proof of the factorization of 1-D biorthogonal filters into ladder steps relies on the use of the

Euclidean algorithm to factor polynomials in one variable [17]. A similar result was obtained in [11]

using very similar methods; in [11] the ladder step is called a lifting step. A more direct proof of the

factorization was derived in [16]. The proof did not use the z-transform, and only relied on matrix

factorization.

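We now introduce the notations and recall the key results. For the purpose of the exposition we consider the case where $h$ is of size 5, and $\tilde{h}$ is of size 3. We merge the decimated lowpass $h$ and highpass $g$ filters into an $N \times N$ orthogonal operator

$$W = \begin{pmatrix}
h_0 & h_1 & h_2 & & & & h_{-2} & h_{-1} \\
-g_1 & g_0 & -g_{-1} & & & & & \\
h_{-2} & h_{-1} & h_0 & h_1 & h_2 & & & \\
 & 0 & -g_1 & g_0 & -g_{-1} & & & \\
 & & h_{-2} & h_{-1} & h_0 & h_1 & h_2 & \\
 & & & & -g_1 & g_0 & -g_{-1} & \\
h_2 & & & & h_{-2} & h_{-1} & h_0 & h_1 \\
-g_{-1} & & & & & & -g_1 & g_0
\end{pmatrix} \qquad (13)$$

with $y = Wx$ and

$$\begin{cases} y_{2k} = \sum_i h_{i-2k}\, x_i \\ y_{2k+1} = \sum_i g_{i-2k}\, x_i \end{cases} \qquad (14)$$

We seek a factorization of the form:

$$W = H_{\alpha_m} G_{\beta_m} H_{\alpha_{m-1}} G_{\beta_{m-1}} \cdots H_{\alpha_0} G_{\beta_0} \qquad (15)$$

where $H_\alpha$ and $G_\beta$ have the following form:

$$H_\alpha = \begin{pmatrix}
1 & \alpha & & & & \alpha \\
0 & 1 & 0 & & & \\
 & \alpha & 1 & \alpha & & \\
 & & 0 & 1 & 0 & \\
 & & & \ddots & & \\
 & & & \alpha & 1 & \alpha \\
 & & & & 0 & 1
\end{pmatrix}
\quad \text{and} \quad
G_\beta = \begin{pmatrix}
1 & 0 & & & & \\
\beta & 1 & \beta & & & \\
 & 0 & 1 & 0 & & \\
 & & \beta & 1 & \beta & \\
 & & & \ddots & & \\
 & & & 0 & 1 & 0 \\
\beta & & & & \beta & 1
\end{pmatrix} \qquad (16)$$

We have the following result:

Lemma 1 [11, 16, 17] Let h, g be any symmetric biorthogonal filters of length $2m+1$, then W can be factored into at most m operators:

$$W = H_{\alpha_m} G_{\beta_m} H_{\alpha_{m-1}} G_{\beta_{m-1}} \cdots H_{\alpha_0} G_{\beta_0}. \qquad (17)$$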

We note that this result extends to orthogonal filters as shown in [11, 16]. Proofs of the factorization can be found for instance in [11, 16]. $H_\alpha$ is performed explicitly by (see Fig. 5):

$$\begin{cases} y_{2i} = x_{2i} + \alpha\,(x_{2i-1} + x_{2i+1}) \\ y_{2i+1} = x_{2i+1} \end{cases} \qquad (18)$$

and $G_\beta$ is computed as follows

$$\begin{cases} y_{2i} = x_{2i} \\ y_{2i+1} = x_{2i+1} + \beta\,(x_{2i} + x_{2i+2}) \end{cases} \qquad (19)$$

The computational complexity of each ladder step (18), (19) is 2 additions, and 1 multiplication.

Assuming that additions, and multiplications have similar complexity (same number of cycles), an

elementary ladder step requires 3 operations per 2 samples.
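The ladder steps (18) and (19) translate directly into code. The sketch below makes assumptions that differ from the paper: the signal has even length and is extended periodically rather than symmetrically, the function names are hypothetical, and the coefficients are those of the well known unnormalized 5-3 lifting scheme ($\beta = -1/2$, then $\alpha = 1/4$) rather than the 9-7 filters used in the experiments. Because each step only adds a function of the untouched samples, it is inverted by changing the sign of its coefficient.

```python
def ladder_h(x, alpha):
    """Eq. (18): even samples are updated from their two odd neighbours."""
    n = len(x)
    y = list(x)
    for i in range(0, n, 2):
        y[i] = x[i] + alpha * (x[i - 1] + x[(i + 1) % n])  # 2 additions, 1 multiplication
    return y

def ladder_g(x, beta):
    """Eq. (19): odd samples are updated from their two even neighbours."""
    n = len(x)
    y = list(x)
    for i in range(1, n, 2):
        y[i] = x[i] + beta * (x[i - 1] + x[(i + 1) % n])
    return y

def forward(x, alpha=0.25, beta=-0.5):
    """Apply G first, then H; even outputs are lowpass, odd outputs are highpass."""
    return ladder_h(ladder_g(x, beta), alpha)

def inverse(y, alpha=0.25, beta=-0.5):
    """Undo the steps in reverse order with opposite signs."""
    return ladder_g(ladder_h(y, -alpha), -beta)

signal = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]
restored = inverse(forward(signal))
```

A constant signal gives zero at every odd (highpass) position and is left unchanged at the even (lowpass) positions, which is a quick sanity check on the signs of the two coefficients.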

Figure 5: Ladder step $H_{\alpha_0}$.

Figure 6: One only needs to apply 4 ladder steps (shown in the shaded regions) to construct the 9-7 lowpass and highpass filtered coefficients $y_{2k}, y_{2k+1}$.

5.1.1 Fast convolution-decimation

A fast algorithm for convolution-decimation can be derived from the factorization (17). After the calculation of the first lowpass and highpass coefficients $(y_0, y_1)^T = (h, g)(x_0, x_1)^T$, each new pair of lowpass and highpass coefficients only requires a cascade of m ladder steps (18) and (19). The principle of the algorithm is shown in Fig. 6, where the m ladder steps that are necessary to calculate $(y_{2k}, y_{2k+1})$ appear in the shaded region. The computational complexity of the computation of $(y_{2k}, y_{2k+1})$ is therefore $3m$ operations. A standard implementation of convolution-decimation requires $8m+2$ operations per two samples. The fast ladder structure divides the number of operations by a factor $\frac{8m+2}{3m} > 2.67$.

5.2 Two dimensional case

In 2-D, one needs to apply the transform W along the rows, and along the columns. One could use the 1-D factorization described above to process rows and columns independently, and this would yield a speed up factor of:

$$\frac{4(4m+1)(m+1)}{3m(2m+3)} \qquad (20)$$

Indeed, the computational complexity of a regular convolution-decimation is $4(4m+1)(m+1)$ operations in order to obtain the GG, HG, GH and HH values at each point. If we were to follow this approach the computational complexity for a filter of size 9 would decrease by a factor of at most 2.57.
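The two ratios quoted so far are easy to evaluate numerically. The helper names below are hypothetical; the formulas are the 1-D ratio $(8m+2)/(3m)$ of Section 5.1.1 and the separable 2-D ratio (20), with a filter of length $2m+1$.

```python
def speedup_1d(m):
    """Operations of a standard convolution-decimation over those of the ladder, in 1-D."""
    return (8 * m + 2) / (3 * m)

def speedup_2d_separable(m):
    """Eq. (20): the 1-D ladder applied to rows and columns independently."""
    return 4 * (4 * m + 1) * (m + 1) / (3 * m * (2 * m + 3))

m = 4  # a filter of size 9 corresponds to 2m + 1 = 9
ratio_1d = speedup_1d(m)            # 34 / 12
ratio_2d = speedup_2d_separable(m)  # 340 / 132
```

The 1-D ratio decreases toward 8/3 as m grows, which is where the bound 2.67 comes from, and for m = 4 the separable 2-D ratio is 340/132, the factor of about 2.57 quoted above.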

However one can significantly improve this result by merging the horizontal, and vertical fac-

torizations. This new algorithm provides a speed up factor of 4, and our numerical experiments

validated this result. This algorithm is a new and original contribution of the present work, and was

never proposed in [11, 17].

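Let $W_x$ be the biorthogonal filter that is applied along the rows, and let $W_y$ be the filter that is applied along the columns. Both filters can be decomposed as in (17):

$$W_x = \prod_{l=1}^{m} H_{x,\alpha_l} G_{x,\beta_l} \quad \text{and} \quad W_y = \prod_{l=1}^{m} H_{y,\alpha_l} G_{y,\beta_l} \qquad (21)$$

The 2-D convolution-decimation operator $W_2$ is given by:

$$W_2 = W_x W_y = \prod_{l=1}^{m} \prod_{k=1}^{m} H_{y,\alpha_l} G_{y,\beta_l} H_{x,\alpha_k} G_{x,\beta_k} \qquad (22)$$

Because the terms commute, we combine $H_{x,\alpha_l}$ and $H_{y,\alpha_l}$ into one 2-D filter:

$$H_{2,\alpha_l} = H_{y,\alpha_l} H_{x,\alpha_l} \qquad (23)$$

and we do the same for $G_{x,\beta_l}$ and $G_{y,\beta_l}$:

$$G_{2,\beta_l} = G_{y,\beta_l} G_{x,\beta_l} \qquad (24)$$

We obtain the following factorization of $W_2$:

$$W_2 = \prod_{l=1}^{m} H_{2,\alpha_l} G_{2,\beta_l} \qquad (25)$$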


5.2.1 Fast convolution-decimation

This decomposition suggests a 2-D ladder structure. Each 2-D ladder step $H_{2,\alpha_l}$ can be written as follows:

$$\begin{cases}
x^{l+1}_{2i+1,2j+1} = x^{l}_{2i+1,2j+1} \\
x^{l+1}_{2i,2j+1} = x^{l}_{2i,2j+1} + \alpha_l \left( x^{l}_{2i-1,2j+1} + x^{l}_{2i+1,2j+1} \right) \\
x^{l+1}_{2i+1,2j} = x^{l}_{2i+1,2j} + \alpha_l \left( x^{l}_{2i+1,2j-1} + x^{l}_{2i+1,2j+1} \right) \\
x^{l+1}_{2i,2j} = x^{l}_{2i,2j} + \alpha_l \left( x^{l}_{2i,2j-1} + x^{l}_{2i,2j+1} + x^{l+1}_{2i-1,2j} + x^{l+1}_{2i+1,2j} \right)
\end{cases} \qquad (26)$$

where $x^{l}$ are the values on the grid at level l, and $x^{l+1}$ are values that have been already calculated at the current level $l+1$, and have been stored (see Fig. 7). Similarly, the 2-D highpass ladder step $G_{2,\beta_l}$ has the following expression:

$$\begin{cases}
x^{l+1}_{2i,2j} = x^{l}_{2i,2j} \\
x^{l+1}_{2i,2j+1} = x^{l}_{2i,2j+1} + \beta_l \left( x^{l}_{2i,2j} + x^{l}_{2i,2j+2} \right) \\
x^{l+1}_{2i+1,2j} = x^{l}_{2i+1,2j} + \beta_l \left( x^{l}_{2i,2j} + x^{l}_{2i+2,2j} \right) \\
x^{l+1}_{2i+1,2j+1} = x^{l}_{2i+1,2j+1} + \beta_l \left( x^{l}_{2i+1,2j} + x^{l}_{2i+1,2j+2} + x^{l+1}_{2i,2j+1} + x^{l+1}_{2i+2,2j+1} \right)
\end{cases} \qquad (27)$$

Figure 7: Elementary 2-D ladder step $H_{2,\alpha_l}$. Points in the shaded regions at level l are the input to the computation of $x^{l+1}_{2i,2j}$, $x^{l+1}_{2i+1,2j}$, $x^{l+1}_{2i,2j+1}$, $x^{l+1}_{2i+1,2j+1}$.

The number of elementary operations necessary to compute a 2-D ladder step is: 8 additions and 3 multiplications. The 2-D ladder structure is shown in Fig. 8. After having computed the first point $W(x_0, y_0)^T$, the steady mode in 2-D is similar to the 1-D case: we only need to apply the filters $H_{2,\alpha_l}$ and $G_{2,\beta_l}$ at a small number of positions at each level l. In 2-D we also need m layers of ladders to cover the grid. It is easy to verify that the total number of operations per four filtered values HH, HL, LH, LL is:

$$\begin{cases}
11\, \dfrac{m}{2} \left( \dfrac{m}{2} + 1 \right) & \text{if } m \text{ is even} \\[2mm]
11 \left( \dfrac{m+1}{2} \right)^2 & \text{otherwise}
\end{cases} \qquad (28)$$

Figure 8: Five layers of 2-D ladder steps. Shaded blocks describe the pixels where a computation needs to be performed in order to obtain the four filtered values $HHx_{i,j}$, $HLx_{i,j}$, $LHx_{i,j}$, $LLx_{i,j}$ at the lowest level.

Fig. 8 shows in dark gray the 2-D ladder steps needed to compute $W(x_k, y_k)^T$. In the case of the 9-7 filters we get a theoretical computational gain of 4.54. Another major advantage of the 2-D ladder structure is that it does not require transposing the image; this benefit is even more important in 3-D. Some of the outputs generated by a ladder step have to be stored temporarily, and this requirement may influence the global performance of the algorithm. Finally, it is clear that the 2-D ladder structure lends itself to a parallel implementation where several ladder steps can be run in parallel as a moving front at each level.
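The merged step (26) can be checked against the separable computation on a small grid. The sketch below is an illustration with hypothetical names and periodic boundaries (the paper uses a symmetric extension); it processes the whole image at once rather than as a moving front. `ladder_h_2d` follows (26) term by term, and `ladder_h_separable` applies the 1-D step (18) to every row and then to every column, which requires two transpositions.

```python
def ladder_h_1d(v, alpha):
    """Eq. (18) with periodic boundaries."""
    n = len(v)
    out = list(v)
    for i in range(0, n, 2):
        out[i] = v[i] + alpha * (v[i - 1] + v[(i + 1) % n])
    return out

def ladder_h_separable(img, alpha):
    """Rows first, then columns (uses two transpositions)."""
    rows = [ladder_h_1d(r, alpha) for r in img]
    cols = [ladder_h_1d(list(c), alpha) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def ladder_h_2d(img, alpha):
    """Eq. (26): the same result in one sweep, without transposing."""
    n, m = len(img), len(img[0])
    out = [list(r) for r in img]            # (odd, odd) points are copied unchanged
    for i in range(0, n, 2):                # (even, odd): neighbours along the first index
        for j in range(1, m, 2):
            out[i][j] = img[i][j] + alpha * (img[i - 1][j] + img[(i + 1) % n][j])
    for i in range(1, n, 2):                # (odd, even): neighbours along the second index
        for j in range(0, m, 2):
            out[i][j] = img[i][j] + alpha * (img[i][j - 1] + img[i][(j + 1) % m])
    for i in range(0, n, 2):                # (even, even): two old and two updated neighbours
        for j in range(0, m, 2):
            out[i][j] = img[i][j] + alpha * (
                img[i][j - 1] + img[i][(j + 1) % m]
                + out[i - 1][j] + out[(i + 1) % n][j])
    return out

image = [[float((3 * i + 5 * j) % 7) for j in range(4)] for i in range(6)]
merged = ladder_h_2d(image, 0.25)
separable = ladder_h_separable(image, 0.25)
```

Per 2 × 2 block the merged step performs 2 + 2 + 4 = 8 additions and 3 multiplications, in agreement with the count given in the text.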

5.2.2 Experimental validation of the fast convolution-decimation

We report in Table 1 the average processor time needed for computing the convolution-decimation

using both a regular implementation, and the ladder structure. Times are given for an entire image.

The processing was performed on a standard Pentium, running Linux. No particular optimization

was performed. We note that the theoretical speed up factor is reached for images of size 1024 × 1024. As the image size decreases the speed up factor slowly decreases to 3.67. We note that no other existing algorithm achieves a comparable speed up for short filters.

Image size     Convolution-decimation   Ladder    Speed up
256 × 256      110 ms                   30 ms     3.67
512 × 512      477 ms                   130 ms    3.67
1024 × 1024    2146 ms                  536 ms    4.0

Table 1: Average processor time needed for computing the convolution-decimation on an entire image, using

a regular implementation, and the ladder structure.

6 Quantization and entropy coding

After several experiments we have noticed that the most efficient way of organizing the coefficients

consists in scanning the wavelet packet subbands by increasing frequency. We start with the smallest

frequency wavelet packet band, and continue until the wavelet packet with the highest frequency.

The 2-D order in the frequency plane is based on the $l^1$ norm in $\mathbb{R}^2$.

6.1 Laplacian based scalar quantization

Within each subband the distribution of the wavelet packet coefficients is approximated with a Laplacian distribution. As shown in [3], Generalized Gaussian models provide a better fit than Laplacian models, but they only outperform the Laplacian models by a small margin [3]. Furthermore, the Laplacian distribution yields tractable computations of the optimal entropy constrained scalar quantizers [31], as well as some near optimal scalar quantizers [31]. A particularly efficient near optimal scalar quantizer relies on three ingredients:

• $[-\Delta + \delta,\ \Delta - \delta]$, the symmetric dead-zone,

• $\Delta$, the quantizer step size,

• $\delta$, the reconstruction offset.

The principle of the quantizer is shown in Fig. 9. The optimal (for mean square error) reconstruction offset is given by [31]: $\delta = 1 - \dfrac{\Delta e^{-\Delta}}{1 - e^{-\Delta}}$. The theoretical performance of the quantizer is very close to the optimal behavior of the entropy constrained scalar quantizer [31], but has a much simpler rule for reconstruction. Finally, we apply a dichotomic search to find the optimal value of $\Delta$ in order to exactly match the budget.
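The closed form for $\delta$ can be checked by a short calculation. The derivation below is a sketch under an assumption that is not stated in the text: the coefficients are normalized so that the Laplacian density is proportional to $e^{-|x|}$, and $\delta$ is read as the distance between the lower edge of a quantization bin of width $\Delta$ and the centroid of the density over that bin, which is the reconstruction point that minimizes the mean square error within the bin.

```latex
\begin{align*}
\delta
  &= \frac{\int_0^{\Delta} t\, e^{-t}\, dt}{\int_0^{\Delta} e^{-t}\, dt}
   = \frac{\bigl[-t e^{-t} - e^{-t}\bigr]_0^{\Delta}}{1 - e^{-\Delta}}
   = \frac{1 - e^{-\Delta} - \Delta e^{-\Delta}}{1 - e^{-\Delta}}
   = 1 - \frac{\Delta e^{-\Delta}}{1 - e^{-\Delta}} .
\end{align*}
```

Because the exponential density has the same shape on every bin, the same offset applies to all bins outside the dead-zone. For small $\Delta$ the offset tends to the midpoint $\Delta/2$, and for large $\Delta$ it tends to 1, the mean of the one-sided density.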


Figure 9: Scalar quantizer, with a dead zone.

6.2 Ordering of the coefficients

After scalar quantization, the positions of the non-zero output levels are recorded in a significance

map. Because large output levels often appear in clusters, one can exploit the spatial correlation

between neighboring pixels for the lossless compression of the significance map. Correlations also

exist across subbands. In the case of the wavelet basis several authors have exploited these corre-

lations to describe with quadtrees large regions where the quantized coefficients are equal to zero

[12, 18, 26, 28, 33]. Such partitioning techniques take full advantage of the self similar structure of

natural images across scales [13]. While correlations also exist across subbands in a wavelet packet

basis, in general we cannot condition the probability of significance of a given pixel on the probabil-

ity of significance of the pixels in its parent subband. Because the wavelet packet basis is adapted to

the frequency content of the target image, we can expect significant high frequency coefficients.

Another technical difficulty comes from the fact that with a general wavelet packet tree we usually

cannot define the “parent” subband. Some attempts have been made to use zerotrees with wavelet

packets, but the approach in [34] effectively requires a wavelet-like structure: the scale index j of

the wavelet packet node should be a non decreasing function of the frequency index n of the node.

Because of all these issues, we limit the context to be a spatial context.

The spatial context is defined as follows. For each subband, we scan the pixels inside that subband

using a Hilbert space filling curve (see Fig. 10). This self similar curve exploits the coherence in

two dimensions, and it guarantees that [2]: (i) all pixels in an area are visited before moving on to

another area, (ii) the sequentially visited pixels are always adjacent in the 2-D image plane. The

spatial context is then defined as the $n_C$ pixels that appear before the current pixel in the Hilbert scan.

In the experiments we use $n_C = 3$. We then use an $n_C$ order arithmetic coder to encode the Hilbert scan of the significance map. This efficient context modeling makes it possible to condition the probability of significance of a given pixel on the probability of significance of its neighbors.
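The scan and the context extraction can be sketched as follows. The code is an illustration with hypothetical names, and it assumes a square subband whose side is a power of two, which the paper does not require. `hilbert_d2xy` is the standard iterative conversion from a position along the curve to grid coordinates. `contexts` pairs each symbol of the scan with the $n_C$ symbols that precede it, which is the information an $n_C$ order adaptive coder would condition on; the arithmetic coder itself is not shown.

```python
def hilbert_d2xy(n, d):
    """Map position d along the Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:               # rotate the current quadrant
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_scan(bitmap):
    """Read a square significance map along the Hilbert curve."""
    n = len(bitmap)
    scan = []
    for d in range(n * n):
        x, y = hilbert_d2xy(n, d)
        scan.append(bitmap[y][x])
    return scan

def contexts(scan, nc=3):
    """Pair each symbol with the nc symbols scanned just before it (zero padded at the start)."""
    padded = [0] * nc + list(scan)
    return [(tuple(padded[k:k + nc]), padded[k + nc]) for k in range(len(scan))]

pairs = contexts(hilbert_scan([[1, 0], [0, 1]]))
```

Since consecutive positions on the curve are always adjacent pixels, the three previous symbols of the scan are spatial neighbours of the current pixel, which is what makes such a short context informative.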

Figure 10: Hilbert space filling curve.

6.3 Entropy coding

The significance map is coded with an $n_C$ order arithmetic coder. The signs of the output levels are

not entropy coded, and are simply packed. The magnitudes of the output levels are variable length

encoded, using an arithmetic coder to encode the length. The best basis geometry is described by a

quadtree. We code the quadtree with an adaptive arithmetic coder.

7 Experiments

We implemented the Fast Wavelet Packet (FWP) coder and decoder, and an actual bit stream is

generated by the coder. Note that for all experiments we generated a compressed file with a size

equal to the targeted budget. The FWP code that was used for the experiments is available from

http://noodle.med.yale.edu/~meyer/profile.html. For all experiments we have used the factorized

biorthogonal filters 9-7 [6]. We present the results of the wavelet packet compression algorithm, using

the following four test images: 512 × 512 “Barbara”, 512 × 512 “Fingerprints”, 512 × 512 “Houses”,

and 512 × 512 “Lighthouse”. All these images are difficult to compress because they contain a mixture

of large smooth regions, and long oscillatory patterns. In order to evaluate the performance of

our algorithm, we compared it to one of the best wavelet coders that was available to us: the SPIHT

wavelet coder of Amir Said and William A. Pearlman [26]. A comparison with other wavelet coders

(e.g. [28, 30, 33]) would result in different but comparable results. The performance of the algorithm

is summarized in Tables 1 and 2. We work with 8 bit images, and we define the Peak Signal to Noise

Ratio (PSNR) of the compressed image $I_c$ as

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\frac{1}{N^2} \sum_{i,j=0}^{N-1} |I(i,j) - I_c(i,j)|^2}.$$

In terms of performance, it took 1564 milliseconds to calculate all the coefficients of a 6 level wavelet packet tree, to

calculate the cost of each node, and to prune the tree. It took another 6899 milliseconds to quantize

the image Barbara for a compression ratio of 32. All computations were performed on a regular Pentium.

Note that most of the quantization time is spent on the dichotomic (bisection) search

for the optimal value of the quantization step. Because we want to match exactly the budget (with a

precision of one byte), we iterate the quantization procedure a large number of times.
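Two pieces of the protocol above lend themselves to a short sketch: the PSNR definition, and the dichotomic search for the quantization step that meets a byte budget. The code below is a minimal illustration, not the FWP coder. In particular `toy_coded_size` is a hypothetical stand-in for the real coder (it simply counts a significance flag, a sign bit and the magnitude bits of each quantized level), and the dead-zone quantizer, the search bounds and the iteration count are assumptions. The search only relies on the coded size being non-increasing when the step grows.

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two N x N images given as lists of rows."""
    n = len(original)
    mse = sum((a - b) ** 2
              for row_a, row_b in zip(original, decoded)
              for a, b in zip(row_a, row_b)) / (n * n)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def budget_bytes(width, height, compression, bits_per_pixel=8):
    """Byte budget that corresponds to a given compression ratio."""
    return width * height * bits_per_pixel // (8 * compression)


def toy_coded_size(coeffs, step):
    """Hypothetical stand-in for the coder: bytes needed for the quantized levels."""
    bits = 0
    for c in coeffs:
        level = int(abs(c) / step)           # assumed dead-zone uniform quantizer
        bits += 1                            # significance flag
        if level:
            bits += 1 + level.bit_length()   # sign bit plus magnitude bits
    return (bits + 7) // 8


def find_step(coeffs, budget, lo=1e-3, hi=1e6, iterations=60):
    """Bisection for the smallest step whose coded size fits within the budget."""
    if toy_coded_size(coeffs, hi) > budget:
        raise ValueError("budget too small for this set of coefficients")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if toy_coded_size(coeffs, mid) <= budget:
            hi = mid   # fits: try a finer quantizer
        else:
            lo = mid   # too large: use a coarser quantizer
    return hi
```

With these definitions, `budget_bytes(512, 512, 32)` gives the 8192 byte budget of a 512 × 512 image compressed by a factor of 32 (0.25 bpp). Each bisection step requires one full quantization and coding pass, which is consistent with the observation that the search dominates the quantization time.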

7.1 Artifacts created by wavelet packets

In general the quantization of a wavelet coefficient $\langle x, \varphi_k \rangle$ has the following effect: it will add to the

original image the vector $(\langle x, \varphi_k \rangle - Q(\langle x, \varphi_k \rangle))\,\varphi_k$. The size of the artifacts depends on the size of

the support of the function $\varphi_k$. Large wavelet coefficients occur around edges, and the quantization

of these coefficients results in ringing artifacts. If the quantization affects fine scale wavelets, that

have short supports (large j), then only a few neighbors will be affected. The case of wavelet packets

is more complicated. Because a wavelet packet basis usually contains many oscillatory waveforms

with a precise frequency localization, the fine scale wavelets with short support, may no longer be

part of the basis. Edges need to be reconstructed from wavelet packets that have a longer support,

and quantization of these coefficients will consequently affect a larger region around edges. Our

implementation addresses this problem by preventing further splits of the higher frequency bands

below a certain scale (i.e. we do not iterate equation (3) when j becomes too small).
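The link between the support of a basis function and the extent of the quantization artifact can be checked on a toy example. The sketch below uses the orthonormal Haar basis of four samples as a stand-in for the basis actually selected by the coder, and a plain rounding quantizer; the helper names and the test signal are assumptions made for illustration only. Quantizing a single coefficient changes the reconstruction by the coefficient error times the corresponding basis vector, so the error is confined to the support of that vector.

```python
import math


def inner(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def quantize(c, step):
    """Uniform scalar quantizer: round to the nearest multiple of step."""
    return step * round(c / step)


# Orthonormal Haar basis of four samples: supports of length 4, 4, 2 and 2.
R = 1 / math.sqrt(2)
BASIS = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
    [R, -R, 0.0, 0.0],
    [0.0, 0.0, R, -R],
]


def reconstruct(x, step, quantized_index):
    """Rebuild x from its coefficients, quantizing only one of them."""
    out = [0.0] * len(x)
    for k, phi in enumerate(BASIS):
        c = inner(x, phi)
        if k == quantized_index:
            c = quantize(c, step)
        out = [o + c * p for o, p in zip(out, phi)]
    return out


def error_support(x, step, k, tol=1e-12):
    """Indices of the samples changed by quantizing coefficient k."""
    x_hat = reconstruct(x, step, k)
    return [i for i, (a, b) in enumerate(zip(x, x_hat)) if abs(a - b) > tol]
```

For the signal `[3.0, 1.0, 4.0, 1.5]` and a step of 1, quantizing the coefficient of the short vector `BASIS[2]` only alters samples 0 and 1, whereas quantizing the coefficient of the longer vector `BASIS[1]` alters all four samples. This is the same mechanism that spreads ringing over a larger neighborhood when edges must be rebuilt from wavelet packets with long supports.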

Barbara

Fig. 13 shows the original image Barbara. Fig. 14 shows the geometry of the best wavelet packet basis

chosen for a compression of 32. The smaller boxes in the central column of the segmentation map

correspond to large patterns oscillating in the horizontal direction. These basis functions obviously

match the texture on the scarf, and on the legs of Barbara, as well as the checker pattern on the

tablecloth. Because the basis is well fitted for the image, the FWP coder has no difficulty preserving

the oscillatory texture everywhere in the image (see Fig. 16), whereas SPIHT entirely fails to represent

the texture in these regions (see Fig. 15). As mentioned in section 7.1, ringing artifacts are visible at

sharp edges (around the legs of the table, and the arms of Barbara), both for SPIHT and FWP. As

shown in the magnified details in Figs. 17 and 18, the artifacts of the wavelets and the wavelet packets

have similar intensity, but the artifacts created by wavelet packets affect more pixels around the edge.

Because of its ability to reproduce the texture so well, FWP significantly outperforms SPIHT, by 1.14

dB on the average.

Fingerprints

The original Fingerprints image is shown on the left of Fig. 19. The geometry of the best wavelet

packet basis chosen for a compression of 40 is shown in Fig. 20. The smaller boxes in Fig. 20 corre-

spond to large patterns oscillating in the vertical, horizontal, and diagonal directions. We expected

such a basis for an image composed of concentric circular patterns. We note that the wavelet packets

can only provide “criss-cross” patterns, and that a better basis should contain steerable filters [22].

Fig. 21 shows the result of a compression of 40 using SPIHT, and Fig. 22 shows the result of FWP at

the same compression rate. A magnified region of the image is shown in Fig. 23 and Fig. 24. We note

that the FWP decoded image is much crisper than the result of SPIHT. FWP also outperforms SPIHT in

terms of PSNR.

Houses

Fig. 25 shows the original image Houses, and Fig. 26 shows the geometry of the best wavelet packet

basis chosen for a compression of 25. The small boxes on the first row and first column of the best

basis correspond to the many horizontal or vertical oscillating patterns that are present in the image.

Fig. 27 shows the result of a compression of 25, using SPIHT, and Fig. 28 is the result of FWP at

the same compression rate. Two magnified regions of the image are available in Figs. 29, 30, 31, and

32. We notice in Fig. 30 and in Fig. 32 that FWP has kept all the details on the shutters, as well as

the texture on the roof. All these details have been erased by SPIHT. Ringing artifacts are visible on

the left border of the central house, where the intensity abruptly changes. While similar artifacts are

visible for wavelets, the artifacts have a larger extent for the wavelet packets, as explained in section

7.1.

Lighthouse

The last image, Lighthouse, is shown in the left of Fig. 33. The geometry of the best wavelet packet

basis chosen for a compression of 40 is shown in Fig. 34. Again, the best basis is selecting many basis

functions that correspond to horizontal or vertical oscillating patterns. Fig. 35 shows the result of a

compression of 40 using SPIHT, and Fig. 36 shows the result of FWP at the same compression rate.

A detailed view, shown in Figs. 37 and 38, demonstrates that the wavelet packet coder has better preserved

the texture on the lighthouse, and has not smeared the fence. Again, artifacts on the limb of the

lighthouse are clearly noticeable both for SPIHT (see Fig. 37), and FWP (see Fig. 38). While FWP

outperforms SPIHT in terms of PSNR at low bit rates, SPIHT performs as poorly as FWP in terms of

ringing artifacts.

8 Discussion and Conclusion

This work provides a fast numerical implementation of the best wavelet packet algorithm, and

demonstrates that an advantage can be gained by constructing a basis adapted to a target image

without requiring an absurdly large amount of time. We designed a fast wavelet packet coder that,

combined with a simple quantization scheme, could significantly outperform a sophisticated wavelet

coder, with a negligible increase in computational load. We developed a new fast 2-D convolution-

decimation algorithm with factorized non-separable 2-D filters. The algorithm is 4 times faster than

a standard convolution-decimation. We proposed a cost function that takes into account the cost of

coding the output levels of the quantizers, and the cost of coding the significance map. A context-

based entropy coder was used to condition the probability of significance of a given pixel on the

probability of its neighbors using a space filling curve.

An extensive evaluation of the algorithm was performed on a large class of textured images. Our

evaluation included not only quantitative figures (PSNR), but also subjective visual appearance. On

the one hand our results indicate that our wavelet packet coder tends to create artifacts at the same

locations (i.e., at strong edges) as the wavelet coder does, with a similar intensity. The main difference

is that artifacts created by wavelet packets affect more pixels than those created by wavelets, as the

analysis in section 7.1 suggested. On the other hand, the basis selected by the algorithm is usually

well adapted to the target image, and the wavelet packet coder has no difficulty preserving the os-

cillatory textures. Because of its ability to reproduce textures so well, the FWP coder significantly

outperforms SPIHT on images such as Barbara, and Fingerprints, both visually and in terms of PSNR.

A number of interesting open problems have been raised by this work. We realized that when

coding images that contain a mixture of smooth and textured features, the best basis algorithm is

always trying to find a compromise between two conflicting goals: describe the large scale smooth

regions and edges, and describe the oscillatory patterns. The best basis may not always yield “visu-

ally pleasant” images. As explained in section 7.1, we notice ringing artifacts on the border of smooth

regions when the basis is mostly composed of oscillatory patterns. This problem could be addressed

by considering other criteria to measure the image quality, as suggested in [14]. Yet another approach

consists in giving up the basis structure, and picking a collection of functions from a large dictionary

that can include wavelets, and oscillatory waveforms [5, 19, 23]. In our very recent work [21]

we explore a different approach. We propose to encode an image with a multi-layered representation

technique, based on a cascade of compressions, each time using a different basis.

References

[1] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies. Image coding using wavelet transform.

IEEE Trans. on Image Processing, 1(2):205–220, 1992.

[2] T. Bially. Space filling curves: Their generation and their application to bandwidth reduction.

IEEE Trans. Inf. Theor., 15(6):658–664, 1969.

[3] K.A. Birney and T.R. Fischer. On the modeling of DCT and subband image data for compres-

sion. IEEE Trans. on Image Process., 4(2):186–193, 1995.

[4] C.M. Brislawn. Preservation of subband symmetry in multirate signal coding. IEEE Trans. on

Signal Process., 43(12):3046–3050, Dec. 1995.

[5] S.S. Chen. Basis Pursuit. PhD thesis, Stanford University, Dept. of Statistics, November 1995.

[6] A. Cohen, I. Daubechies, and J. Feauveau. Bi-orthogonal bases of compactly supported wavelets.

Comm. Pure Appl. Math., 45:485–560, 1992.

[7] R.R. Coifman and Y. Meyer. Size properties of wavelet packets. In Ruskai et al., editor, Wavelets

and their Applications, pages 125–150. Jones and Bartlett, 1992.

[8] R.R. Coifman and M.V. Wickerhauser. Entropy-based algorithms for best basis selection. IEEE

Trans. on Information Theory, 38(2):713–718, March 1992.

[9] L. Cooper and M.W. Cooper. Introduction to Dynamic Programming. Pergamon, 1988.

[10] T. Cover and J. Thomas. Elements of Information Theory. John Wiley, 1991.

[11] I. Daubechies and W. Sweldens. Factoring wavelet transforms into lifting steps. J. Fourier Anal.

Appl., to appear, 1997.

[12] G. Davis and S. Chawla. Image coding using optimized significance tree. In IEEE Data Compres-

sion Conference -DCC’97, pages 387–396, 1997.

[13] G.M. Davis. A wavelet-based analysis of fractal image compression. IEEE Trans. on Image Pro-

cessing, 7(2):141–154, 1998.

[14] R.A. DeVore, B. Jawerth, and B.J. Lucier. Image compression through wavelet transform coding.

IEEE Trans. on Information Theory, 38(2):719–746, March 1992.

[15] R.E. Van Dyck and T.G. Marshall. Ladder realizations of fast subband/VQ coders with diamond

support for color images. In IEEE Int. Sympos. on Circ. & Sys., pages I–677–70, 1993.

[16] E. Fossgaard. Fast computational algorithms for the discrete wavelet transform and applications

of localized orthonormal bases in signal classification. Technical report, Dept of Mathematics

and Statistics, University of Tromsø, Norway, Nov. 1997.

[17] A.A.C. Kalker and I.A. Shah. Ladder structures for multidimensional linear phase perfect re-

construction filter banks and wavelets. In Visual Com. and Image Process.’92, pages 12–20, 1992.

[18] A.S. Lewis and G. Knowles. Image compression using the 2-D wavelet transform. IEEE Trans.

on Image Processing, 1(2):244–250, 1992.

[19] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Trans. on

Signal Processing, 41(12):3397–3415, Dec. 1993.

[20] T.G. Marshall. U-L block-triangular matrix and ladder realizations of subband coders. In

ICASSP-93, pages 177–80, 1993.

[21] F.G. Meyer, A.Z. Averbuch, J-O. Stromberg, and R.R. Coifman. Multi-layered image representa-

tion: Application to image compression. In IEEE Int. Conf. on Image Process., ICIP’98.

[22] F.G. Meyer and R.R. Coifman. Brushlets: a tool for directional image analysis and image com-

pression. Applied and Computational Harmonic Analysis, pages 147–187, 1997.

[23] R. Neff and A. Zakhor. Very low bit-rate video coding based on matching pursuits. IEEE Trans.

Circ. & Sys. for Video Tech., 7(1):158–171, Feb. 1997.

[24] K. Ramchandran and M. Vetterli. Best wavelet packet bases in a rate-distortion sense. IEEE

Trans. on Image Processing, 2(2):160–175, April 1993.

[25] O. Rioul and P. Duhamel. Fast algorithms for discrete and continuous wavelet transforms. IEEE

Trans. on Information Theory, 38(2):569–586, March 1992.

[26] Amir Said and William A. Pearlman. A new fast and efficient image codec based on set partition-

ing in hierarchical trees. IEEE Trans. on Circ. & Sys. for Video Tech., 6:243–250, June 1996.

[27] I.A. Shah and A.A.C. Kalker. On ladder structures and linear phase conditions for biorthogonal

filter banks. In ICASSP-94, pages III,181–184, 1994.

[28] J.M. Shapiro. Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans. on

Signal Processing, 41(12):3445–3462, Dec. 1993.

[29] Y. Shoham and A. Gersho. Efficient bit allocation for an arbitrary set of quantizers. IEEE Trans.

on Acoustics, Speech, and Signal Process., 36(9):1445–1453, Sept. 1988.

[30] P. Sriram and M.W. Marcellin. Image coding using wavelet transforms and entropy-constrained

trellis quantization. IEEE Trans. on Image Processing, 4:725–733, 1995.

[31] G.J. Sullivan. Efficient scalar quantization of exponential and Laplacian random variables. IEEE

Trans. on Information Theory, 42(5):1365–1374, Sept. 1996.

[32] M.V. Wickerhauser. Adapted Wavelet Analysis from Theory to Software. A.K. Peters, 1995.

[33] Z. Xiong, K. Ramchandran, and M.T. Orchard. Space-frequency quantization for wavelet image

coding. IEEE Trans. on Image Process., 6(5):677–693, 1997.

[34] Z. Xiong, K. Ramchandran, and M.T. Orchard. Wavelet packets coding using space-frequency

quantization. IEEE Trans. on Image Process., 7(6):892–898, 1998.

Table 2: Coding results (PSNR in dB). Left: 8bpp. 512x512 Barbara. Right: 8bpp. 512x512 Fingerprints

                              Barbara            Fingerprints
Rate (bpp)   Compression   SPIHT     FWP       SPIHT     FWP
1            8             36.41     37.24     36.01     36.85
0.8          10            34.66     35.73     34.29     35.09
0.67         12            33.40     34.58     33.07     33.70
0.5          16            31.39     32.82     31.27     31.79
0.4          20            30.10     31.53     29.91     30.43
0.308        26            28.66     30.12     28.41     28.96
0.25         32            27.58     29.12     27.12     27.59
0.20         40            26.65     28.11     26.00     26.80
0.16         50            25.91     27.17     25.10     25.75
0.125        64            24.86     26.22     23.97     24.71
0.10         80            24.25     25.40     23.23     23.89

Table 3: Coding results (PSNR in dB). Left: 8bpp. 512x512 Houses. Right: 8bpp. 512x512 Lighthouse

                              Houses             Lighthouse
Rate (bpp)   Compression   SPIHT     FWP       SPIHT     FWP
1            8             30.84     30.64     34.03     34.06
0.8          10            29.14     29.13     32.69     32.69
0.67         12            28.07     28.04     31.64     31.77
0.5          16            26.15     26.48     30.25     30.44
0.4          20            25.06     25.39     29.29     29.55
0.308        26            24.04     24.23     28.27     28.63
0.25         32            23.17     23.41     27.43     27.98
0.20         40            22.33     22.59     26.58     27.28
0.16         50            21.65     21.86     25.85     26.59
0.125        64            20.98     21.11     24.98     25.86
0.10         80            20.37     20.49     24.40     25.23

Figure 11: Comparisons of FWP, and SPIHT [26] for Barbara (left), and Fingerprints (right). [Plots of PSNR (dB) versus bit rate (bpp), from 0.1 to 1 bpp; panels: Barbara 512x512 and Fingerprints 512x512; curves: FWP and SPIHT.]

Figure 12: Comparisons of FWP, and SPIHT [26] for Houses (left), and Lighthouse (right). [Plots of PSNR (dB) versus bit rate (bpp), from 0.1 to 1 bpp; panels: Houses 512x512 and Lighthouse 512x512; curves: FWP and SPIHT.]

Figure 13: Barbara: original image.

Figure 14: Best wavelet packet basis geometry of Barbara at 0.25 bpp.

Figure 15: Decoded Barbara using SPIHT, bit rate = 0.25 bpp, PSNR = 27.58 dB.

Figure 16: Decoded Barbara using FWP, bit rate = 0.25 bpp, PSNR = 29.12 dB.

Figure 17: Magnified detail of decoded Barbara using SPIHT, bit rate = 0.25 bpp.

Figure 18: Magnified detail of decoded Barbara using FWP, bit rate = 0.25 bpp.

Figure 19: Fingerprints: original image.

Figure 20: Best wavelet packet basis geometry of Fingerprints at 0.20 bpp.

Figure 21: Decoded Fingerprints using SPIHT, bit rate = 0.20 bpp, PSNR = 26.00 dB.

Figure 22: Decoded Fingerprints using FWP, bit rate = 0.20 bpp, PSNR = 26.80 dB.

Figure 23: Magnified detail of decoded Fingerprints using SPIHT, bit rate = 0.20 bpp.

Figure 24: Magnified detail of decoded Fingerprints using FWP, bit rate = 0.20 bpp.

Figure 25: Houses: original image.

Figure 26: Best wavelet packet basis geometry of Houses at 0.32 bpp.

Figure 27: Decoded Houses using SPIHT, bit rate = 0.32 bpp, PSNR = 24.17 dB.

Figure 28: Decoded Houses using FWP, bit rate = 0.32 bpp, PSNR = 24.40 dB.

Figure 29: Magnified detail (shutters) of decoded Houses using SPIHT, bit rate = 0.32 bpp.

Figure 30: Magnified detail (shutters) of decoded Houses using FWP, bit rate = 0.32 bpp.

Figure 31: Magnified detail (roof) of decoded Houses using SPIHT, bit rate = 0.32 bpp.

Figure 32: Magnified detail (roof) of decoded Houses using FWP, bit rate = 0.32 bpp.

Figure 33: Lighthouse: original image.

Figure 34: Best wavelet packet basis geometry of Lighthouse at 0.20 bpp.

Figure 35: Decoded Lighthouse using SPIHT, bit rate = 0.20 bpp, PSNR = 26.58 dB.

Figure 36: Decoded Lighthouse using FWP, bit rate = 0.20 bpp, PSNR = 27.28 dB.

Figure 37: Magnified detail of decoded Lighthouse using SPIHT, bit rate = 0.20 bpp.

Figure 38: Magnified detail of decoded Lighthouse using FWP, bit rate = 0.20 bpp.
