Image Enhancement of Historical Documents using Directional Wavelet
Qian Wang1, Tao Xia2, Chew Lim Tan1, Lida Li1
1School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 117543
[email protected], [email protected], [email protected]
2Centre for Wavelets, Approximation and Information Processing, Department of Mathematics, National University of Singapore
2, Science Drive 2, Singapore 117543 [email protected]
Abstract - This paper proposes a novel algorithm to clean up a large collection of
historical handwritten documents kept in the National Archives of Singapore. Due to the
seepage of ink over a long period of storage, the front page of each document has
been severely marred by the reverse side writing. Earlier attempts have been made to
match both sides of a page to identify the offending strokes originating from the back so
as to eliminate them with the aid of a wavelet transform. Perfect matching, however, is
difficult due to document skews, differing resolutions, inadvertently missing out reverse
side and warped pages during image capture. A new approach is now proposed to do
away with double side mapping by using a directional wavelet transform that is able
to distinguish the foreground and reverse side strokes much better than the conventional
wavelet transform. Experiments have shown that the method indeed enhances the
readability of each document significantly after the directional wavelet operation without
the need for mapping with its reverse side.
Keywords – Document image analysis, directional wavelet transform, thresholding
AMS Subject Classification – 65T60, 68U10
1. Introduction
The National Archives of Singapore keeps a large collection of double-sided
handwritten historical documents. Due to seeping of ink over long periods of storage, the
front page of each document has been severely marred by the reverse side writing. As the
original copies of these historical documents are carefully preserved and not available for
public reading, photocopying of these documents for public access makes the documents
even more difficult to read (See Figure 1). There is thus a need to enhance these images
by removing the interfering strokes.
When we were first approached by the National Archives of Singapore, we tried to
look for existing methods to deal with this problem. In the process, we found Negishi's
automatic thresholding algorithms [1] for extracting character bodies from the
noisy background. The algorithms dealt with terribly dirty and considerably large images,
and cases where the gray levels of the character parts overlap with that of the
background. We later found Liang and Ahmadi's [2] morphological approach to extract
text strings from regular periodic overlapping text/background images. Searching
further, we saw two binarization algorithms by Chang et al. [3] and Jang and Hong [4] using
histogram-based edge detection and thin line modeling, respectively. Finally, Don's [5]
method utilized the noise attribute features based on a simple noise model to overcome
the difficulty that some objects do not form prominent peaks in the histogram.
(a) (b)
Figure 1. Two sample historical document images.

While appealing as they are, the above methods cannot be applied directly to our problem, mainly because the interfering strokes appear in varying intensities relative to the original foreground strokes in different documents. In certain cases, the edges of the foreground strokes are more prominent than the interfering strokes. Image (a) in Figure 1 presents such a case. On the other hand, the interfering strokes sometimes look even darker than the foreground strokes, as we can observe from image (b) in Figure 1. So an entirely different approach has to be resorted to for these historical documents.

In our earlier work [6][7], we introduced methods for matching both sides of a page to identify the offending strokes originating from the back so as to eliminate such strokes from the front. This is based on the observation that an interfering stroke cannot be stronger than its originating stroke, because not all the ink seeped through the page.

In [6], a point pattern matching algorithm is used to retrieve the correspondence between the two sides. Once the matching pixels are found, the intensity difference determines which pixel originated from the reverse side and should then be removed. The adopted algorithm is tolerant to varying transformations due to image distortion. On
the other hand, as the matching is processed pixel by pixel, the method is not efficient
enough to be used in real-time applications.
We further proposed a wavelet transform [7] to strengthen the foreground strokes and
weaken the interfering strokes. The method first manually matches both sides of a
document page. Then the foreground and interfering strokes are identified roughly, which
guide the revision of wavelet coefficients during the wavelet transform. The wavelet
transform is performed iteratively to improve the readability of the document image step
by step. Perfect mapping of strokes from both sides, however, is difficult due to reasons
such as (1) different document skews and resolutions during image capture of both sides,
(2) inadvertently missing out of a reverse page during scanning, and (3) warped surfaces
caused by the placement of the thick bound volume of the documents on the scanner's
glass plate. It is with these considerations that another approach without the need for the
reverse page is proposed in this paper.
It is observed that the writing style of these documents is slanting from the lower-left
to upper-right. In contrast, by its mirror image effect, the interfering strokes originating
from the reverse side are slanting from the upper-left to lower-right. We therefore would
like to take advantage of this distinguishing feature to separate the foreground strokes and
the interfering strokes using techniques such as wavelets. However, the conventional
wavelet transform for images highlights and separates the horizontal and vertical edges
into different wavelet frequency domains [8]. Therefore it is not suitable for our
application. To exploit the directional property of the strokes in these documents, we
develop a directional wavelet transform for 2-D images which separates the foreground
strokes and the interference mainly into different wavelet frequency domains. A theoretical
analysis shows that it can be implemented by oriented filtering operations with
conventional filters.
In section 2, we will elaborate on the directional wavelet-based method. Section 3 will
present the experimental results of the proposed system, which is followed by the
conclusion in section 4.
2. Proposed Method
The writing style of the document determines the differences between the orientations
of the foreground and the interfering strokes. Basically the foreground and the interfering
strokes are slanting along the directions of 45° and 135° respectively. It is known that a
two-dimensional wavelet transform extracts the spatial information contained in the
image, more precisely, the horizontal, vertical and diagonal components. However, in our
application, we need to differentiate 45° and 135° components. One way is to rotate the
document image 45° clockwise. Thus the foreground strokes become horizontal and the
interfering strokes are parallel to the vertical axis. But this approach introduces additional
storage and computational cost, as well as some image distortion caused by the rotation
operation. A more straightforward method is to use a directional wavelet transform for
the image, in which the wavelet filters are convolved along the directions of 45° and 135°
instead of horizontally and vertically, so that the foreground strokes will be captured in
one component and on the other hand, the interfering strokes will contribute to the other
component.
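The diagonal filtering idea described above can be sketched as follows. This is an illustrative fragment, not the paper's implementation: the smoothing taps are placeholders rather than wavelet filters, and boundaries are handled by wrap-around.

```python
import numpy as np

def diagonal_convolve(image, taps, direction):
    """Convolve a 1-D filter along one family of diagonals of an image.

    direction=+1 steps along one diagonal orientation (e.g. 45 degrees),
    direction=-1 along the perpendicular one (e.g. 135 degrees).
    """
    out = np.zeros_like(image, dtype=float)
    center = len(taps) // 2
    for t, tap in enumerate(taps):
        k = t - center
        # each tap samples one pixel further along the chosen diagonal
        shifted = np.roll(np.roll(image, -k, axis=0), -direction * k, axis=1)
        out += tap * shifted
    return out

# a simple smoothing filter applied along each diagonal orientation
img = np.eye(8)                              # a single diagonal "stroke"
taps = [0.25, 0.5, 0.25]
along = diagonal_convolve(img, taps, +1)     # filters along one orientation
across = diagonal_convolve(img, taps, -1)    # filters along the other
```

Filtering a stroke along its own orientation leaves it largely intact, while filtering across it spreads its energy, which is the separation the directional transform exploits.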
2.1. Directional Wavelet Transform
Let $f(x,y)$ denote the 2-D continuous signal. The directional wavelet transform of $f(x,y)$ is as follows:

$$
\begin{aligned}
(C_j f)(m,n) &= \big\langle f(x,y),\ \Phi_{j,m,n}(x,y) \big\rangle \\
(D_j^1 f)(m,n) &= \big\langle f(x,y),\ \Psi^1_{j,m,n}(x,y) \big\rangle \\
(D_j^2 f)(m,n) &= \big\langle f(x,y),\ \Psi^2_{j,m,n}(x,y) \big\rangle \\
(D_j^3 f)(m,n) &= \big\langle f(x,y),\ \Psi^3_{j,m,n}(x,y) \big\rangle
\end{aligned}
\qquad (m,n)\in\mathbb{Z}^2
\tag{1}
$$

where $j$ is the scale index, $(C_j f)(m,n)$ are the wavelet approximation coefficients of $f(x,y)$ at scale $2^j$, while $(D_j^k f)(m,n)$ $(k=1,2,3)$ are the wavelet coefficients accordingly. For simplicity, we use $C_j, D_j^1, D_j^2, D_j^3$ to represent $C_j f, D_j^1 f, D_j^2 f, D_j^3 f$.
In the normal wavelet transform, the 2-D scaling and wavelet functions are constructed by a tensor product of 1-D scaling and wavelet functions of two separable variables [8]. The proposed 2-D directional wavelet transform for images is constructed as follows:

$$
\begin{aligned}
\Phi_{j,m,n}(x,y) &= \varphi_{j,m}(x_1)\,\varphi_{j,n}(y_1) \\
\Psi^1_{j,m,n}(x,y) &= \varphi_{j,m}(x_1)\,\psi_{j,n}(y_1) \\
\Psi^2_{j,m,n}(x,y) &= \psi_{j,m}(x_1)\,\varphi_{j,n}(y_1) \\
\Psi^3_{j,m,n}(x,y) &= \psi_{j,m}(x_1)\,\psi_{j,n}(y_1)
\end{aligned}
\tag{2}
$$

where $\varphi_{j,m}(x)=\frac{1}{\sqrt{2^j}}\,\varphi\big(\frac{x}{2^j}-m\big)$, $\psi_{j,m}(x)=\frac{1}{\sqrt{2^j}}\,\psi\big(\frac{x}{2^j}-m\big)$, $\varphi,\psi$ are conventional 1-D scaling and wavelet functions, and

$$
\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = A \begin{pmatrix} x \\ y \end{pmatrix},
\qquad
A = c \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix},
\qquad
\begin{pmatrix} x \\ y \end{pmatrix} = A^{-1} \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}.
$$

In this paper we use the parameters $c=\sqrt{2}$, $\theta=\pi/4$ for the document image analysis application.
It is well known that the wavelet transform is translation variant. However, translation invariance is desired for some applications, such as signal denoising and our document image analysis. A translation invariant wavelet transform has been proposed for signal applications [9]. It is equivalent to the wavelet transform without downsampling, in other words, frames in some sense. The following theorem gives the translation invariant directional wavelet transform.

Theorem: Using the orthogonal/biorthogonal 2-D directional wavelet defined in (1) and (2), the directional wavelet transform decomposition and reconstruction are computed as in (3) and (4), where $C_0(m,n)$ is the approximation coefficient at scale 0, inserting $2^j-1$ zeros between each sample of a filter $k(n)$ makes the dilated filter $k_j(n)$, $\{h(n)\}_n,\{g(n)\}_n$ are the analysis filters associated with the scaling and wavelet functions respectively, and $\{\tilde h(n)\}_n,\{\tilde g(n)\}_n$ are the synthesis filters associated with the 1-D scaling and wavelet functions; $\tilde h(n)=h(n)$, $\tilde g(n)=g(n)$ hold for orthogonal wavelets and do not hold for biorthogonal wavelets. The proof is given in the appendix.
$$
\begin{aligned}
C_{j+1}(m,n) &= \sum_l \sum_k C_j(m+k+l,\ n-k+l)\, h_j(k)\, h_j(l) \\
D^1_{j+1}(m,n) &= \sum_l \sum_k C_j(m+k+l,\ n-k+l)\, h_j(k)\, g_j(l) \\
D^2_{j+1}(m,n) &= \sum_l \sum_k C_j(m+k+l,\ n-k+l)\, g_j(k)\, h_j(l) \\
D^3_{j+1}(m,n) &= \sum_l \sum_k C_j(m+k+l,\ n-k+l)\, g_j(k)\, g_j(l)
\end{aligned}
\qquad j\in\mathbb{Z},\ (m,n)\in\mathbb{Z}^2
\tag{3}
$$
$$
\begin{aligned}
C_j(m,n) = \frac{1}{4}\sum_l \sum_k \Big(
&\ C_{j+1}(m-k-l,\ n+k-l)\,\tilde h_j(k)\,\tilde h_j(l) \\
{}+{}&\ D^1_{j+1}(m-k-l,\ n+k-l)\,\tilde h_j(k)\,\tilde g_j(l) \\
{}+{}&\ D^2_{j+1}(m-k-l,\ n+k-l)\,\tilde g_j(k)\,\tilde h_j(l) \\
{}+{}&\ D^3_{j+1}(m-k-l,\ n+k-l)\,\tilde g_j(k)\,\tilde g_j(l) \Big)
\end{aligned}
\qquad j\in\mathbb{Z},\ (m,n)\in\mathbb{Z}^2
\tag{4}
$$
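The filter dilation that the theorem describes, inserting $2^j-1$ zeros between consecutive taps, can be sketched as follows (an illustrative helper, not the authors' code):

```python
import numpy as np

def dilate_filter(taps, j):
    """Insert 2**j - 1 zeros between consecutive samples of a filter,
    producing the dilated filter k_j(n) used at scale j."""
    step = 2 ** j
    out = np.zeros(step * (len(taps) - 1) + 1)
    out[::step] = taps
    return out

dilate_filter([1, 2, 3], 1)   # -> [1., 0., 2., 0., 3.]
```

At $j=0$ the filter is returned unchanged, so the same routine serves every level of the undecimated transform.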
In the application, the original image is regarded as $\{C_0(m,n)\}_{(m,n)\in\mathbb{Z}^2}$. It is obvious that (3) and (4) can be implemented by the convolution of conventional wavelet filters along 45° and 135°.
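A minimal sketch of one decomposition level following the index pattern of (3), assuming periodic boundary handling and filters given as plain tap lists; it is not the paper's implementation:

```python
import numpy as np

def directional_level(C, h, g, j):
    """One level of the undecimated directional decomposition in (3).

    C    : approximation coefficients at scale j (2-D array)
    h, g : 1-D analysis filter taps
    j    : scale index; shifts are dilated by 2**j (a trous style)
    Returns (C_next, D1, D2, D3); boundaries wrap around.
    """
    step = 2 ** j
    out = {name: np.zeros_like(C, dtype=float) for name in ("C", "D1", "D2", "D3")}
    pairs = {"C": (h, h), "D1": (h, g), "D2": (g, h), "D3": (g, g)}
    for name, (fk, fl) in pairs.items():
        acc = out[name]
        for k, ak in enumerate(fk):
            for l, al in enumerate(fl):
                # diagonal shift (m+k+l, n-k+l), dilated by 2**j
                dm, dn = step * (k + l), step * (l - k)
                acc += ak * al * np.roll(np.roll(C, -dm, axis=0), -dn, axis=1)
    return out["C"], out["D1"], out["D2"], out["D3"]
```

The pair of nested shifts $(k+l,\ l-k)$ is exactly what turns a separable filter pass into a convolution along the two diagonal orientations.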
The three-level wavelet decomposition generates the wavelet coefficients and approximation coefficients as below. The results for the first-level decomposition are shown in Figure 2.

$$
Wf(m,n) = \{\, C_3(m,n),\ D_j^k(m,n),\ j=1,2,3;\ k=1,2,3 \,\}
\tag{5}
$$
(a1) (b1)
(a2) (b2)
(a3) (b3)
(a4) (b4)
Figure 2. Directional wavelet decomposition results: (a1)(b1) $C_1(m,n)$; (a2)(b2) $D_1^1(m,n)$; (a3)(b3) $D_1^2(m,n)$; (a4)(b4) $D_1^3(m,n)$.

As we can observe from the images above, the foreground and background strokes with the diagonal orientation are distinct and highlighted in the images of $D_j^1(m,n)$ and $D_j^2(m,n)$ respectively; therefore the directional wavelet transform output is suitable for further image processing operations on the document image [7]. An alternative is to enhance $D_j^1(m,n)$ and smear $D_j^2(m,n)$ as below:

$$
\tilde D_j^k(m,n) = e_j^k\, D_j^k(m,n), \qquad j=1,2,3
\tag{6}
$$

where the factors $e_j^k$ ($e_j^1>1$, $1>e_j^2>0$, $j=1,2,3$) are set empirically. The inverse directional wavelet transform (4) is used for the reconstruction from the processed wavelet coefficients together with the approximation coefficients $\{\, C_3(m,n),\ \tilde D_j^1(m,n),\ \tilde D_j^2(m,n),\ D_j^3(m,n),\ j=1,2,3 \,\}$.
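The enhance-and-smear step of (6) is a per-band scaling of the detail coefficients. A minimal sketch, with illustrative factors (the paper sets $e_j^k$ empirically):

```python
import numpy as np

def enhance_and_smear(bands, e1=1.5, e2=0.3):
    """Scale detail coefficients as in (6): boost the foreground band D1
    (e1 > 1), attenuate the interference band D2 (0 < e2 < 1), and leave
    D3 untouched. bands is a list of (D1, D2, D3) tuples, one per level.
    The values 1.5 and 0.3 are placeholders, not the paper's settings."""
    return [(e1 * np.asarray(D1), e2 * np.asarray(D2), np.asarray(D3))
            for (D1, D2, D3) in bands]
```

The scaled bands are then fed back into the inverse transform (4) together with the untouched approximation coefficients.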
Daubechies [10] has proved that there is no symmetric orthogonal wavelet except for the Haar wavelet, so in image compression and image processing applications biorthogonal wavelets are in popular use. In this paper we use biorthogonal wavelets with (5,3) taps.
The directional wavelet transform can strengthen the foreground strokes and weaken
the interference in the reconstructed image, as we can see from Figure 3.
(a) (b)
Figure 3. Final results of directional wavelet reconstruction.

It is noticed that the reconstructed foreground strokes are much darker than the interference. So unwanted noise can be easily removed after binarization (see Figure 4).

(a) (b)
2.2 Image Recovery
Directional wavelet transform produces a clean output; however, some of the foreground strokes become broken when interfering strokes that were intersecting with these foreground strokes have been removed. It is also obvious that small pieces of strokes may have orientations extremely different from the majority, and the system will remove them together with the interference. To deal with this problem, the reconstructed image from the above now serves as loci for us to recover streaks of gray level images from the original document image, such that the neighboring pixels within a 7×7 window centered on each edge pixel are recovered. It may be perceived that while tracing along the edges, a small 7×7 window is opened up to view the original document image (see Figure 5). The size of the window is based on the average width of the strokes in the documents. This window is more reasonably defined for performing adaptive thresholding. Through this recovery, isolated or broken foreground strokes are fully restored. Figure 6 shows the restored images of foreground strokes. To improve the final
Figure 4. Reconstruction images after binarization.
appearance, the restored images are binarized using Niblack's [11] method and the
resultant images are shown in Figure 7.
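The window-based recovery described above can be sketched as follows. This is an illustrative fragment, not the authors' code; the white background value of 255 is an assumption about the gray-level convention:

```python
import numpy as np

def recover_strokes(original, edge_mask, win=7):
    """Recover gray-level streaks from the original image around each
    detected edge pixel, using a win x win window (7x7 in the paper).

    original  : gray-level document image (2-D array)
    edge_mask : boolean array, True where the reconstruction has an edge
    Returns the original gray values inside the windows, white elsewhere.
    """
    half = win // 2
    keep = np.zeros_like(edge_mask, dtype=bool)
    h, w = edge_mask.shape
    for r, c in zip(*np.nonzero(edge_mask)):
        keep[max(0, r - half):min(h, r + half + 1),
             max(0, c - half):min(w, c + half + 1)] = True
    recovered = np.full(original.shape, 255, dtype=original.dtype)
    recovered[keep] = original[keep]
    return recovered
```

Because every edge pixel opens its own window, pixels of a broken stroke that fall near any surviving edge are pulled back from the original image, which is how isolated fragments get restored.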
(a) (b)
(a) (b)
Figure 6. Restored foreground text images.
Figure 7. Final binarized output.
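Niblack's method [11] thresholds each pixel against the mean and standard deviation of its local window, $T = m + k\,s$. A minimal sketch; the window size and $k$ below are typical illustrative choices, not necessarily the paper's settings:

```python
import numpy as np

def niblack_binarize(image, win=15, k=-0.2):
    """Niblack's local thresholding: T = local mean + k * local std.
    Pixels darker than T are classified as ink (0), others as white (255)."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    half = win // 2
    padded = np.pad(img, half, mode="edge")
    out = np.zeros((h, w), dtype=np.uint8)
    for r in range(h):
        for c in range(w):
            patch = padded[r:r + win, c:c + win]
            T = patch.mean() + k * patch.std()
            out[r, c] = 0 if img[r, c] < T else 255  # ink is dark
    return out
```

With a negative $k$ the threshold sits below the local mean, which keeps flat background regions white while still catching faint strokes.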
Figure 5. Recovery of the foreground strokes images from the original document image while tracing along edges.
3. Experiment Results And Discussion
The performance of our approach has been evaluated based on the scanned images of
historical handwritten documents from the National Archives of Singapore. About 200
images have been tested and found to produce readable outputs. For illustration purposes,
we choose 50 images that contain serious interference and manually count the following
numbers of words: words in the original documents, words that are fully restored, words
or parts of words from the interference, and words that are impaired by the interference.
The two evaluation metrics, precision and recall [12], defined below, are used to measure
the performance of the system.
$$
\text{Precision} = \frac{\text{No. of words correctly detected}}{\text{Total no. of words detected}}
\tag{7}
$$

$$
\text{Recall} = \frac{\text{No. of words correctly detected}}{\text{Total no. of words in the document}}
\tag{8}
$$
In equations (7) and (8), the total number of words detected refers to the words appearing
in the final output, some of which are the original words on the front side and some are
from the reverse side (interfering words). The total number of words in the document
refers to all the original words on the front side of a document image. If some words on
the front side are lost or not recovered properly in the resultant image, the whole word is
considered lost and not counted. If parts of a word from the reverse side appear, the total
number of words detected will be increased by 1. Precision shows how well the system
can remove the interfering strokes while recall is an indication of the performance of the
system in restoring the front page to its original state.
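The word-count bookkeeping of (7) and (8) reduces to two ratios; a trivial sketch with made-up counts for illustration:

```python
def precision_recall(correct, detected_total, document_total):
    """Precision and recall as defined in (7) and (8).

    correct        : number of words correctly detected
    detected_total : all words in the final output (front-side words plus
                     any residual interfering words)
    document_total : all original words on the front side of the document
    """
    return correct / detected_total, correct / document_total

# e.g. 90 correct words, 96 words detected, 100 words in the document
p, r = precision_recall(90, 96, 100)   # p = 0.9375, r = 0.9
```

A residual interfering word inflates `detected_total` and lowers precision only, while a lost front-side word lowers recall only, matching the discussion above.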
The evaluation of the 50 images is shown in Table 1, and the first image with its final binarized results is shown in Figure 8. Table 1 shows a high average precision and recall of 87.5% and 92.4%, respectively. By enhancing and smearing wavelet coefficients
sufficiently, almost all the original foreground strokes can be detected. However, in the
image recovery process, although the interfering strokes have already been removed, bits
and pieces of interfering strokes can still fall into the 7×7 window and remain as
interference in the foreground. On the other hand, a few strong interfering strokes are
erroneously regarded as the foreground strokes. These have thus prevented the system
from achieving a perfect recall and precision.
(a) (b)
Figure 8. Result of interference removal of the entire document page: (a) original image; (b) final binarized result.
Image          1     2     3     4     5     6     7     8     9     10    11    12
Total # words  188   204   162   166   247   146   186   187   194   172   113   124
Precision (%)  93.7  96.6  67.8  98.8  90.8  96    90    86.8  96.4  93.7  91.3  92.3
Recall (%)     94.1  97.1  72.2  100   91.9  98    92    88.2  96.9  95.3  92.9  96.8

Image          13    14    15    16    17    18    19    20    21    22    23    24
Total # words  180   173   157   150   99    113   120   115   109   124   127   139
Precision (%)  87.5  89.7  89.9  86.7  85    91.3  92    93.2  85.1  92.3  82    77.6
Recall (%)     93.3  96    96.1  91.3  91.9  92.9  95.8  94.8  89    96.8  89.8  87.1

Image          25    26    27    28    29    30    31    32    33    34    35    36
Total # words  173   180   211   175   95    150   157   159   91    78    98    48
Precision (%)  89.7  87.5  85.6  86.7  74.3  86.7  89.9  75    77.9  79.1  87    84
Recall (%)     96    93.3  92.9  93.1  82.1  91.3  96.2  84.9  81.3  87.1  96    87.5

Image          37    38    39    40    41    42    43    44    45    46    47    48
Total # words  147   109   113   199   130   135   124   149   127   40    66    117
Precision (%)  80.1  86.2  92.2  87.7  88.8  84.4  89.4  92.3  94.7  95.1  83.6  84.9
Recall (%)     87.8  97.2  94.7  89.9  97.9  88.1  95.2  96.6  97.6  97.5  92.4  91.5

Image          49    50    Average
Total # words  142   88    140
Precision (%)  77.5  89.5  87.5
Recall (%)     87.3  96.6  92.4
4. Conclusions And Future Work

In this paper we introduce a new method based on the directional wavelet to remove interference appearing in historical handwritten documents. This new algorithm improves the appearance of the original documents significantly. Currently we are looking into the development of a flexible method to work with strokes of arbitrary $\theta$ and a more general affine transform $A$, other than a constant multiple of a unitary matrix as in (2). This will then allow the directional wavelet to be applied to other historical documents with different stroke orientations.
Table 1. Evaluation of image restoration results
Acknowledgement
This research is supported by a joint grant R-252-000-071-112/303 provided by the
Agency for Science, Technology and Research (A*STAR) and the Ministry of Education,
Singapore. We would like to thank the National Archives of Singapore, for the
permission to use their archival documents.
References
[1] H. Negishi, J. Kato, H. Hase and T. Watanabe, Character extraction from noisy
background for an automatic reference system, Proc. of 5th International
Conference on Document Analysis &Recognition, India, 1999, pp. 143-146.
[2] S. Liang, M. Ahmadi, A morphological approach to text string extraction from
regular periodic overlapping text/background images, Graphical Models and
Image Processing: CVGIP, Vol. 56, No. 5, Sep. 1994, pp. 402-413.
[3] M. S. Chang, S. M. Kang, W. S. Rho, H. G. Kim and D. J. Kim, Improved
binarization algorithm for document image by histogram and edge detection, Proc.
of 3rd International Conference on Document Analysis and Recognition, Canada,
1995, pp. 636-639.
[4] J. H. Jang and K. S. Hong, Binarization of noisy gray-scale character images by thin
line modeling, Pattern Recognition, 32, 1999, pp. 743-752.
[5] H. S. Don, A noise attribute thresholding method for document image binarization,
Proc. of 3rd International Conference on Document Analysis and Recognition,
Canada, 1995, pp. 231-234.
[6] Q. Wang, C. L. Tan, Matching of double-sided document images to remove
interference, IEEE Conference on Computer Vision and Pattern Recognition,
CVPR2001, Hawaii, USA, 8-14 Dec 2001.
[7] C. L. Tan, R. Cao, P. Shen, Restoration of archival documents using a wavelet
technique, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24,
No. 10, October 2002, pp. 1399-1404.
[8] S. Mallat, A wavelet tour of signal processing, Academic Press, 1998.
[9] R. R. Coifman and D. Donoho, Translation invariant de-noising, Technical Report,
475, Dept. of Statistics, Stanford University, May, 1995.
[10] I. Daubechies, Ten lectures on wavelets, SIAM, Philadelphia, PA, 1992.
[11] W. Niblack, An introduction to digital image processing, Englewood Cliffs, N. J.,
Prentice Hall, pp. 115-116, 1986.
[12] M. Junker, R. Hoch and A. Dengel, On the evaluation of document analysis
components by recall, precision, and accuracy, Proc. of 5th International
Conference on Document Analysis and Recognition, India, 1999, pp. 713-716.
Appendix

Proof

We only prove the first equation of (3); all the other equations in (3) and (4) can be proved in the same way.

We first prove the following equation:

$$
C_{j+1}(m,n)=\sum_l\sum_k\Big\langle f(x,y),\ \frac{1}{2^{j}}\,\varphi\Big(\frac{x+y}{2^{j}}-(m+k+l)\Big)\,\varphi\Big(\frac{y-x}{2^{j}}-(n-k+l)\Big)\Big\rangle\,h_j(k)\,h_j(l).
\tag{9}
$$

According to the definition,

$$
C_{j+1}(m,n)=\big\langle f(x,y),\ \varphi_{j+1,m}(x_1)\,\varphi_{j+1,n}(y_1)\big\rangle
=\Big\langle f(x,y),\ \frac{1}{2^{j+1}}\,\varphi\Big(\frac{x+y}{2^{j+1}}-m\Big)\,\varphi\Big(\frac{y-x}{2^{j+1}}-n\Big)\Big\rangle .
$$

We prove that (9) holds in two cases, according to the parity of $m+n$.

1) If $\frac{m+n}{2}\in\mathbb{Z}$, then $\frac{m-n}{2}\in\mathbb{Z}$ as well. We transform the coordinates of the $xy$-plane by the affine transform

$$
A_1:\ \begin{pmatrix}x'\\ y'\end{pmatrix}=A\begin{pmatrix}x\\ y\end{pmatrix},\qquad
A=\begin{pmatrix}1 & 1\\ -1 & 1\end{pmatrix},
$$

where $A$ is the matrix in (2) with $c=\sqrt{2}$, $\theta=\pi/4$. Under $A_1$, $f(x,y)$ is carried to $g(x',y')$ and

$$
\frac{1}{2^{j+1}}\,\varphi\Big(\frac{x+y}{2^{j+1}}-m\Big)\,\varphi\Big(\frac{y-x}{2^{j+1}}-n\Big)
\ \longrightarrow\
\frac{1}{2^{j+1}}\,\varphi\Big(\frac{x'}{2^{j+1}}-\frac{m-n}{2}\Big)\,\varphi\Big(\frac{y'}{2^{j+1}}-\frac{m+n}{2}\Big),
$$

thus

$$
C_{j+1}(m,n)=\Big\langle g(x',y'),\ \frac{1}{2^{j+1}}\,\varphi\Big(\frac{x'}{2^{j+1}}-\frac{m-n}{2}\Big)\,\varphi\Big(\frac{y'}{2^{j+1}}-\frac{m+n}{2}\Big)\Big\rangle .
$$

In the new coordinate system, the conventional 2-D wavelet transform $(\tilde C_j g)(m,n)$, $(\tilde D_j^k g)(m,n)$, $k=1,2,3$, $j\in\mathbb{Z}$, can be defined as in (1) with the 2-D wavelets and scaling function

$$
\begin{aligned}
\Phi_{j,m,n}(x,y)&=\varphi_{j,m}(x)\,\varphi_{j,n}(y)\\
\Psi^1_{j,m,n}(x,y)&=\varphi_{j,m}(x)\,\psi_{j,n}(y)\\
\Psi^2_{j,m,n}(x,y)&=\psi_{j,m}(x)\,\varphi_{j,n}(y)\\
\Psi^3_{j,m,n}(x,y)&=\psi_{j,m}(x)\,\psi_{j,n}(y).
\end{aligned}
\tag{2′}
$$

A simple extension of Mallat's work [8] to the 2-D undecimated case leads to

$$
\begin{aligned}
C_{j+1}(m,n)&=(\tilde C_{j+1}g)\Big(\frac{m-n}{2},\ \frac{m+n}{2}\Big)\\
&=\sum_l\sum_k(\tilde C_j g)\Big(\frac{m-n}{2}+k,\ \frac{m+n}{2}+l\Big)\,h_j(k)\,h_j(l)\\
&=\sum_l\sum_k\Big\langle g(x',y'),\ \frac{1}{2^{j}}\,\varphi\Big(\frac{x'}{2^{j}}-\frac{m-n}{2}-k\Big)\,\varphi\Big(\frac{y'}{2^{j}}-\frac{m+n}{2}-l\Big)\Big\rangle\,h_j(k)\,h_j(l).
\end{aligned}
$$

Applying the inverse transform $A_1^{-1}$ then gives

$$
C_{j+1}(m,n)=\sum_l\sum_k\Big\langle f(x,y),\ \frac{1}{2^{j}}\,\varphi\Big(\frac{x+y}{2^{j}}-(m+k+l)\Big)\,\varphi\Big(\frac{y-x}{2^{j}}-(n-k+l)\Big)\Big\rangle\,h_j(k)\,h_j(l).
$$

2) If $\frac{m+n}{2}\notin\mathbb{Z}$, then $\frac{m+n+1}{2},\ \frac{m-n+1}{2}\in\mathbb{Z}$. We transform the coordinates of the $xy$-plane by the affine transform

$$
A_2:\ \begin{pmatrix}x'\\ y'\end{pmatrix}=A\begin{pmatrix}x\\ y\end{pmatrix}+\begin{pmatrix}1\\ 0\end{pmatrix},
$$

so that

$$
C_{j+1}(m,n)=\Big\langle g(x',y'),\ \frac{1}{2^{j+1}}\,\varphi\Big(\frac{x'}{2^{j+1}}-\frac{m-n+1}{2}\Big)\,\varphi\Big(\frac{y'}{2^{j+1}}-\frac{m+n+1}{2}\Big)\Big\rangle,
$$

where the shift indices are now integers. Similar to case 1), we apply the conventional 2-D wavelet transform defined in (1) and (2′) and then the inverse transform $A_2^{-1}$, which again gives the same expression as in case 1).

So in both cases (9) holds. From (9) we have

$$
\begin{aligned}
C_{j+1}(m,n)&=\sum_l\sum_k\big\langle f(x,y),\ \Phi_{j,\,m+k+l,\,n-k+l}(x,y)\big\rangle\,h_j(k)\,h_j(l)\\
&=\sum_l\sum_k C_j(m+k+l,\ n-k+l)\,h_j(k)\,h_j(l).
\end{aligned}
$$

That is the first equation of (3). This completes the proof.