High-capacity reversible data hiding in binary images using pattern substitution

Computer Standards & Interfaces 31 (2009) 787–794


Yu-An Ho a, Yung-Kuan Chan b, Hsien-Chu Wu c,⁎, Yen-Ping Chu d

a Department of Computer Science, National Chung Hsing University, No. 250, Kuokuang Rd., Taichung 402, Taiwan, ROC
b Department of Management Information Systems, National Chung Hsing University, No. 250, Kuokuang Rd., Taichung 402, Taiwan, ROC
c Department of Computer Science and Information Engineering, National Taichung Institute of Technology, No. 129, Sec. 3, San-min Rd, Taichung 404, Taiwan, ROC
d Computer Science and Information Engineering, Tunghai University, 181, Section 3, Taichung Port Road, Situn District, Taichung City, Taiwan, ROC

⁎ Corresponding author. Fax: +886 4 22196311.
E-mail addresses: [email protected] (Y.-A. Ho), [email protected] (Y.-K. Chan), [email protected] (H.-C. Wu), [email protected] (Y.-P. Chu).

0920-5489/$ – see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.csi.2008.09.014

Article info

Article history:
Received 24 March 2007
Received in revised form 12 September 2008
Accepted 28 September 2008
Available online 28 October 2008

Keywords:
Data hiding
Reversible data hiding
Pattern substitution

Abstract

Data hiding, as the term itself suggests, means the hiding of secret data in a cover image. The result is a so-called stego-image. Reversible data hiding is a technique in which not only can the secret data be extracted from the stego-image, but the cover image can also be completely rebuilt after the extraction of the secret data. Therefore, reversible data hiding is the method of choice for secret data hiding when the recovery of the cover image is required. In this paper, we propose a high-capacity reversible data hiding scheme based on pattern substitution. Our scheme gathers statistical data concerning the occurrence frequencies of various patterns and quantifies how the occurrence frequency differs from pattern to pattern. In this way, pattern exchange relationships can be established, and pattern substitution can thus be used for data hiding. In the extraction stage, we reverse these patterns to their original forms and rebuild an undistorted cover image. Our experimental results demonstrate the practicability of the proposed method. In fact, our new scheme performs better than pair-wise logical computation (PWLC) in terms of both hiding capacity and stego-image visual quality.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

In recent years, data hiding techniques [1] have been playing an important role in information transmission security and data authentication. Data hiding delivers an encrypted secret message hidden in a cover image. The image with the encrypted message is known as a stego-image. The sender hides the encrypted message in the cover image and sends the resultant stego-image to a receiver via the Internet or other transmission media, and the receiver receives the stego-image and extracts the secret message, using the corresponding decrypting method. Such secret messages can be authentication data for the cover images or other important data. So far, quite a number of data hiding techniques [2–7] have been proposed, but unfortunately those methods are suitable for gray or color images only.

In comparison with gray and color images, binary images have fewer bit planes where secret data can be hidden. Data hiding techniques for binary images from the literature include methods of dithering patterns [8], line spacing and character spacing [9], and line shifting, word shifting or feature shifting coding [10,11]. Additionally, Wu et al. [12] offered a data hiding method for binary images that manipulates “flippable” pixels to create specific block-based relationships for embedding a significant amount of data. The hiding capacity of Wu's method depends on the size of the image block.


The methods of data hiding mentioned above can, however, cause permanent distortion of the cover image. In other words, the cover image becomes distorted as it is transformed into the stego-image because of the embedding of the secret message. The original cover image cannot be fully restored from the stego-image after the secret message extraction. In contrast to data hiding methods that cause permanent distortion, data hiding techniques that can fully recover the original image from the stego-image are known as reversible data hiding techniques. In authentication cases where medical images, official documents, signatures or legal documents are involved, after successfully extracting the secret message, the receiver has to reconstruct the original image to the last detail, without any of the distortion visible in the stego-image. For example, if diagnosis and medical treatment depend on an image, the distortion of the image can mean a serious, sometimes even fatal difference. Therefore, to ensure that data-tampering problems are avoided during delivery, secret messages should be properly hidden and then extracted for authentication. A stego-image that has passed authentication should then be fully recoverable.

Obviously, the data hiding techniques mentioned above do not qualify as reversible data hiding techniques. Recently, some reversible data hiding techniques have been proposed. In the spatial domain, Honsinger et al. [13] used modulo 256 addition to embed the hash value of the original image. Another spatial domain technique was proposed in [14], where lossless compression of some selected bit planes left space for data embedding. Macq et al. [15] developed another technique in the transform domain based on a lossless multi-resolution transformation.

Fig. 1. Data hidden by PWLC: (a) example of hiding “0”; (b) example of hiding “1”.

Fig. 2. The flowchart of the proposed method.


Finally, Ni et al. [16] utilized the zero or the minimum point grayscale values to embed data in virtually all types of images. The development of these techniques has demonstrated the importance and practicability of reversible data hiding. Yet, for binary images, Tsai et al. [17] proposed a reversible data hiding mechanism based on pair-wise logical computation (PWLC). Tsai's mechanism can perform a lossless reconstruction of the cover image without utilizing any information from the original cover image.

In this paper, we propose a novel reversible data hiding technique for binary images that is capable of both successfully extracting the secret data from a stego-image and completely rebuilding the binary image after the extraction. Our new method is based on pattern substitution (PS). Adaptive pattern exchange relationships are established and used to embed secret data into the binary image. In this way, the visual quality of the stego-image is superior to that produced by traditional approaches at the same embedding scale. Additionally, the data hiding capacity of our new method proves to be higher.

The remainder of this paper is organized as follows. The PWLC approach is reviewed in Section 2. Then, in Section 3, our new method of data hiding and data extraction is introduced in detail, followed by our experimental results in Section 4. Finally, we conclude in Section 5.

Fig. 3. The results of the patterns against the original image pixels in D: (a) P0000; (b) P0001; (c) P0010; (d) P0100; (e) P1000; (f) P0011; (g) P0110; (h) P1100.

Fig. 4. The results of the patterns against the original image pixels in D: (a) P1001; (b) P1010; (c) P0101; (d) P0111; (e) P1110; (f) P1101; (g) P1011; (h) P1111.

Fig. 6. The secret header is composed of PM, PF, K, and PFP.

Table 1. The result of the patterns separated into two groups.

Group 1: {P0000, P0011, P0110, P1100, P1001, P1010, P0101, P1111}
Group 2: {P0001, P0010, P0100, P1000, P0111, P1110, P1101, P1011}

Fig. 7. The extra information and the secret data itself.

2. Pair-wise logical computation

In a binary image, one single bit is enough to indicate whether a pixel is black or white. Most commonly, a black pixel appears where the image foreground or the focus is, while a white pixel usually appears as a part of the background. If the result of secret data hiding is that some black pixels are turned into white ones or some white pixels are changed into black ones, then the damage done to the cover image is usually quite serious. To avoid such damage, Tsai et al. [17] proposed a pair-wise logical computation based approach for reversible data hiding. The approach is called PWLC data hiding.

First, the pair-wise logical computation approach converts the secret data into a bit stream. Suppose that the bit stream is composed of M bits. A bit “1” is added in front of each bit of the stream, so the bit stream grows to 2M bits. Then, every two bits are combined into a pair, and the bit stream becomes Sp, p ∈ {0, 1, ..., N}, where there are N pairs of bits. In this pair-wise bit stream, the first place of every pair is called Sp1, and the second place is called Sp2. Because the first place can be nothing else but the fixed interleaved “1,” only two possibilities exist for Sp: one is “10,” and the other is “11”.

Next, the cover image is scanned from left to right and top to bottom to locate the bit patterns into which the secret pairs can be embedded. Here, the edges of the image need to be identified; in other words, the middle pairs Hp in “011111” and “100000” are the ones to be manipulated, as shown in Fig. 1. These two pairs Hp are “11” and “00”. Using Eq. (1) below, the stego-image can be produced.

$$S_p \oplus H_p = H'_p \tag{1}$$

where ⊕ stands for an exclusive-or operation, the secret pair is Sp ∈ {10, 11}, the cover pair is Hp ∈ {00, 11}, and the stego-image pair is H'p ∈ {10, 11, 01, 00}.

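Fig. 5. Data hidden by pattern substitution against the original image pixels in D: (a) P0001 to P0111; (b) P0001 to P1110; (c) P0001 to P1011; (d) P0001 to P1101; (e) P1000 to P0111; (f) P1000 to P1110; (g) P1000 to P1011; (h) P1000 to P1101; (i) P0011 to P1111.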

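To extract the secret data, the stego-image pairs H'p are located first. Eqs. (2) and (3) explain the extraction process: because Sp1 is always “1,” Hp1 is the complement of H'p1, so every Sp2 can be extracted directly as the exclusive-or of the complement of H'p1 with H'p2, and the original secret data can be rebuilt.

$$S_{p1} \oplus H_{p1} = H'_{p1} \;\Rightarrow\; 1 \oplus H_{p1} = H'_{p1} \;\Rightarrow\; H_{p1} = \overline{H'_{p1}} = H_{p2} \tag{2}$$

$$S_{p2} \oplus H_{p2} = H'_{p2} \;\Rightarrow\; S_{p2} = \overline{H'_{p1}} \oplus H'_{p2} \tag{3}$$

To restore the cover image, Eq. (4) is applied to H'p to obtain Hp. Because Hp1 = Hp2, the entire binary image can finally be rebuilt.

$$H_{p1} = H_{p2} = \overline{H'_{p1}} \tag{4}$$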
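As a minimal sketch of Eqs. (1) to (4) at the level of a single pair, the following Python code embeds one secret bit into a cover pair and then recovers both the bit and the cover pair. The function names `pwlc_embed` and `pwlc_extract` are hypothetical, and the sketch assumes that the positions of the pairs are already known; it does not locate the “011111” and “100000” patterns in an image.

```python
def pwlc_embed(cover_pair, secret_bit):
    """Embed one secret bit into a cover pair Hp in {(0, 0), (1, 1)} via Eq. (1)."""
    if cover_pair not in ((0, 0), (1, 1)):
        raise ValueError("cover pair must be 00 or 11")
    secret_pair = (1, secret_bit)  # Sp: fixed leading "1" followed by the secret bit
    return (secret_pair[0] ^ cover_pair[0], secret_pair[1] ^ cover_pair[1])


def pwlc_extract(stego_pair):
    """Recover (secret_bit, cover_pair) from a stego pair H'p via Eqs. (2)-(4)."""
    h = 1 ^ stego_pair[0]           # Hp1 = Hp2 = complement of H'p1
    secret_bit = h ^ stego_pair[1]  # Sp2 = complement(H'p1) XOR H'p2
    return secret_bit, (h, h)


if __name__ == "__main__":
    for cover in ((0, 0), (1, 1)):
        for bit in (0, 1):
            stego = pwlc_embed(cover, bit)
            print(cover, bit, "->", stego, "->", pwlc_extract(stego))
```

Every combination of cover pair and secret bit maps to a distinct stego pair, which is why extraction needs no side information about the cover image.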

Obviously, PWLC is capable of reversible data hiding. However, the number of patterns it can find is small, and therefore the number of bits it can hide is limited. If too many secret bits are forced in, the result will be the obvious destruction of the cover image. To improve on this, in the next section we present the new method we have developed, which offers a greater hiding capacity with much less damage done to the cover image.

3. Novel technique for reversible data hiding

Modifying or adding a pixel on the edge of a binary image is not discernible to human eyes. Therefore, this paper uses the flips between black and white pixels to find the boundaries where black and white pixels meet. The result is called the image differencing D [18]. Modifying the pixels along these boundaries keeps the changes imperceptible. We group four pixels into a pattern P and pair it with a pattern Q, because the two can replace each other without causing serious image modification. The characteristic of Q is that it appears only occasionally among the patterns of image D. Finally, we locate the places corresponding to pattern P or Q in the image differencing D. When hiding “1,” we choose a pattern P; when hiding “0,” we choose a pattern Q. This way, the secret bits can be successfully hidden with no remaining trace. Fig. 2 is the flowchart of the proposed method.

Table 2. In five test images, we divide images into non-overlapping patterns and calculate the numbers by using four kinds of PM and PF pairs.

(PM, PF)          Aerobics    Biker       Doves        Mixed text    English text
(P0001, P0111)    (222, 0)    (757, 0)    (1211, 10)   (814, 141)    (729, 121)
(P1000, P1110)    (295, 0)    (768, 1)    (1355, 9)    (781, 124)    (694, 101)
(P0011, P1111)    (2, 0)      (41, 0)     (115, 1)     (798, 111)    (888, 45)
(P1100, P1111)    (4, 0)      (37, 0)     (139, 1)     (595, 111)    (751, 45)

Table 3. In the five test images, we find corresponding patterns by overlapping and by calculating the numbers of the four kinds of PM and PF pairs.

(PM, PF)          Aerobics     Biker        Doves        Mixed text     English text
(P0001, P0111)    (1016, 0)    (3036, 1)    (4784, 12)   (2393, 184)    (2402, 331)
(P1000, P1110)    (1018, 2)    (3077, 7)    (4937, 32)   (2228, 338)    (2219, 429)
(P0011, P1111)    (9, 0)       (114, 0)     (476, 0)     (2404, 60)     (3242, 48)
(P1100, P1111)    (8, 0)       (129, 0)     (507, 3)     (2240, 137)    (3089, 147)

When we discover a corresponding pattern, we search the remaining places for it.

Table 4. Assume that the size of the secret message is 1000 bits. The PSNR values of the worst case, the average case, and the best case are analyzed as follows.

Methods    Worst PSNR    Average PSNR    Best PSNR
PWLC       39.22         40.47           42.23
PS         42.23         45.24           ∞

3.1. Image differencing

Image differencing [18] is a method used to identify the edges of a binary image. It records the places where the pixel value flips between black and white as “1,” and all of the other places as “0.” Eq. (5) is used to form image D. In image D, “1” is shown as a black pixel. In fact, the black pixels in image D mark the places in the original binary image where boundaries between white and black pixels occur.

Fig. 8. Binary images of general pictures: (a) Aerobics; (b) Biker; (c) Doves.

$$D(j,i) = \begin{cases} H(j,i) & \text{if } j = 1 \text{ and } i = 1 \\ H(j,i) \oplus H(j-1,i) & \text{if } j \neq 1 \text{ and } i = 1 \\ H(j,i) \oplus H(j,i-1) & \text{otherwise} \end{cases} \tag{5}$$

In Eq. (5) above, ⊕ represents the exclusive-OR logic operation, and j and i represent the rows and columns of the cover image, respectively. D represents the image differencing and H is the original binary image.
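The following Python sketch illustrates Eq. (5) on an image stored as a list of rows of 0/1 values, using 0-based indices instead of the 1-based indices of the equation. The inverse function is not given explicitly in the text; it is an assumption obtained by solving Eq. (5) for H, and both function names are hypothetical.

```python
def image_differencing(H):
    """Compute D from the binary image H following Eq. (5)."""
    rows, cols = len(H), len(H[0])
    D = [[0] * cols for _ in range(rows)]
    for j in range(rows):
        for i in range(cols):
            if i == 0:
                # First column: copy the top-left pixel, otherwise XOR with the pixel above.
                D[j][i] = H[j][i] if j == 0 else H[j][i] ^ H[j - 1][i]
            else:
                # Every other pixel: XOR with its left neighbour.
                D[j][i] = H[j][i] ^ H[j][i - 1]
    return D


def inverse_differencing(D):
    """Rebuild H from D by undoing Eq. (5) in scan order (assumed inverse)."""
    rows, cols = len(D), len(D[0])
    H = [[0] * cols for _ in range(rows)]
    for j in range(rows):
        for i in range(cols):
            if i == 0:
                H[j][i] = D[j][i] if j == 0 else D[j][i] ^ H[j - 1][i]
            else:
                H[j][i] = D[j][i] ^ H[j][i - 1]
    return H


if __name__ == "__main__":
    H = [[0, 0, 1, 1],
         [1, 1, 0, 0]]
    D = image_differencing(H)
    print(D)
    print(inverse_differencing(D) == H)
```

Because each pixel of H is rebuilt from a pixel that has already been rebuilt, an unmodified D always reproduces H exactly.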

3.2. Pattern substitution

In this section, we illustrate pattern substitution. After the image differencing discussed in Section 3.1, image D has two characteristics. One is that runs of consecutive “1” bits occur infrequently, unless the black and white pixels interlace continuously in the original binary image. Due to this characteristic, a large number of continuous sets of 4 pixels can be picked out from image D as patterns. Sixteen such patterns exist, and the changes of these patterns compared to the original image pixels are shown in Figs. 3 and 4. Among these patterns, the greater the number of “1” bits, the lower the occurrence probability.

The other characteristic is that if only one bit in D is modified, all the pixels of the rebuilt image following the modification will differ from their originals in the initial image. As a result, modifying a single bit of a pattern causes serious damage to the rebuilt image. Therefore, if a pattern has an odd number of “1” bits, the number of “1” bits must remain odd after the modification; likewise, if a pattern has an even number of “1” bits, the number of “1” bits has to remain even after the modification. Table 1 shows the result of separating the 16 kinds of patterns accordingly. As the table suggests, at most 4 pixels will turn out different if two patterns from the same group are exchanged. By contrast, serious damage would take place if patterns were exchanged between the two groups.
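The parity argument can be checked with a short Python sketch. It regenerates the two groups of Table 1 and counts how many rebuilt pixels change when one 4-bit pattern in a row of D is replaced. As a simplifying assumption, a single row is treated like the first row of the image, so that the rebuilt pixels are the running exclusive-or of the row of D; the helper names are hypothetical.

```python
from itertools import accumulate, product
from operator import xor


def parity_groups():
    """Split the sixteen 4-bit patterns into even-parity and odd-parity groups."""
    even, odd = [], []
    for bits in product("01", repeat=4):
        pattern = "".join(bits)
        (even if pattern.count("1") % 2 == 0 else odd).append(pattern)
    return even, odd


def rebuild_row(d_row):
    """Rebuild one row of H from a row of D (running XOR, first-row case)."""
    return list(accumulate(d_row, xor))


def changed_pixels(d_row, start, new_pattern):
    """Count rebuilt pixels that differ after overwriting 4 bits of D at `start`."""
    modified = d_row[:start] + [int(b) for b in new_pattern] + d_row[start + 4:]
    before, after = rebuild_row(d_row), rebuild_row(modified)
    return sum(a != b for a, b in zip(before, after))


if __name__ == "__main__":
    even, odd = parity_groups()
    print("even parity:", even)
    print("odd parity: ", odd)
    row = [0, 0, 0, 1, 0, 0, 0, 0]           # starts with P0001
    print(changed_pixels(row, 0, "0111"))    # same parity: change stays local
    print(changed_pixels(row, 0, "0000"))    # parity broken: change propagates
```

A same-parity substitution leaves the running exclusive-or unchanged after the pattern, so the difference cannot spread beyond the four pixels involved.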

Among these patterns, we take two as a pair such that the substitution difference between them is small. When hiding “1,” we choose one of the two patterns; when hiding “0,” we choose the other pattern. First, as Table 1 shows, we put the patterns that have odd numbers of “1” bits into one group, and all the other patterns, whose numbers of “1” bits must be even, are assigned to the other group. If we take two patterns in the same group as a pair, the property of this pair is that one pattern, PM, occurs frequently while the other, PF, occurs rarely or not at all. Such a pair of patterns can be found by using the property of image D that the more “1” bits a pattern has, the lower its probability of appearing.

For example, suppose we use P0001 from the second group and look for the less frequent patterns in the same group; then only the patterns with three “1” bits fit this condition. A part of the exchange results is shown in Fig. 5. As Fig. 5(a) shows, if P0001 is replaced by P0111, only one bit in a cover image H will be modified. According to Fig. 5(b), if P0001 is replaced by P1110, three bits in H will be modified. On the other hand, searching with P1000, we find from Fig. 5(f) that if P1000 is replaced by P1110, then only one bit in H will have to be modified. Fig. 5(e) demonstrates the situation that occurs if P1000 is replaced with P0111: only two bits in H will go through modification. Then, as Fig. 5(i) shows, if P0011 is replaced with P1111, only one bit will need to be modified. From the above observations, we learn that for each pattern, we can always pick out another pattern that requires the least modification and then pair up the two to indicate whether “0” or “1” is hidden.

Table 5. The PSNR value at the maximum data hiding capacity.

                PWLC             PS-K             PS-E
Images          MDHC    PSNR     MDHC    PSNR     MDHC    PSNR
Aerobics        960     40.55    1016    44.94    1016    44.94
Biker           2723    36.11    3036    40.41    2999    40.41
Doves           3857    34.61    4784    38.46    4571    38.46
Mixed text      1857    37.75    2404    41.26    1423    41.26
English text    1742    38.02    3242    40.13    2454    40.13

After choosing a pair of patterns, we are ready to scan image D from left to right and from top to bottom. Every place that matches PM (the more frequent pattern) or PF (the less frequent pattern) is recorded as a hiding place. After scanning, we obtain the numbers of PM and PF occurrences, which add up to the MDHC (maximum data hiding capacity) [17] of the pair. Then the X coordinate and Y coordinate of each PFi = xi|yi are recorded in the form PFP = x1|y1|…|xn|yn. After that, a pseudo random number generator is employed to generate a key that decides where the secret data S is hidden within the MDHC. Then, PFP and the key can be delivered or embedded into the image.
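The scanning step might be sketched in Python as follows. This is a simplified illustration under stated assumptions: patterns are taken to be four horizontally adjacent bits within one row of D, a matched window is skipped entirely before the search resumes, and the overlap rules introduced in Section 3.4 are not applied. The function name `find_hiding_places` is hypothetical.

```python
def find_hiding_places(D, pm, pf):
    """Scan D row by row and return (x, y, pattern) for every PM or PF match."""
    places = []
    for y, row in enumerate(D):
        bits = "".join(str(b) for b in row)
        x = 0
        while x + 4 <= len(bits):
            window = bits[x:x + 4]
            if window in (pm, pf):
                places.append((x, y, window))
                x += 4  # assumption: hiding places do not overlap
            else:
                x += 1
    return places


if __name__ == "__main__":
    D = [[0, 0, 0, 1, 0, 0, 0, 0],
         [0, 1, 1, 1, 0, 0, 0, 1]]
    places = find_hiding_places(D, pm="0001", pf="0111")
    mdhc = len(places)                                  # PM count + PF count
    pfp = [(x, y) for x, y, p in places if p == "0111"]  # coordinates of each PF
    print(mdhc, pfp)
```

The list of PF coordinates plays the role of PFP: it tells the receiver which hiding places originally held PF rather than PM.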

3.2.1. Pattern substitution by pseudo random number generator (PS-K)

For security reasons, the secret message is hidden and extracted in an encrypted form, and so the receiver needs the right key to decrypt it after successfully extracting the encrypted secret message. We utilize a pseudo random number generator to produce a key K that records the hiding places. Assuming the MDHC is 10 bits, the generator produces a sequence of 10 bits whose initial value is “0000000000.” If the secret bit stream has four bits, “1011,” then four hiding places are picked according to K and marked as “1.” For example, in the bit stream “1010001001,” the four “1” bits are the hiding places. In other words, the hiding places are the first, the third, the seventh and the tenth bit. These places hold PM or PF, depending on whether the hidden value is “1” or “0.” Finally, with PM, PF, K, and PFP put together, we have a secret header SH for delivery, as shown in Fig. 6. The information SH has to be encrypted and delivered to the receiver so that the receiver can recover the secret message step by step. This way, security can be ensured.
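The example with K = “1010001001” and the secret bits “1011” can be traced with the following Python sketch. It works on the ordered list of patterns found at the hiding places rather than on image coordinates, and it assumes that PM encodes “1” and PF encodes “0,” following the P/Q convention stated at the start of Section 3. The function names, and the use of list indices in place of the X/Y coordinates of PFP, are hypothetical simplifications.

```python
def embed_with_key(place_patterns, key, secret, pm, pf):
    """Write secret bits into the hiding places marked "1" in the key."""
    if len(key) != len(place_patterns) or key.count("1") != len(secret):
        raise ValueError("key must cover all places and mark one place per secret bit")
    stego = list(place_patterns)
    bits = iter(secret)
    for index, flag in enumerate(key):
        if flag == "1":
            stego[index] = pm if next(bits) == "1" else pf
    return stego


def extract_with_key(stego_patterns, key, pm):
    """Read the secret bits back from the places marked "1" in the key."""
    return "".join("1" if pattern == pm else "0"
                   for pattern, flag in zip(stego_patterns, key) if flag == "1")


def restore_places(stego_patterns, key, original_pf_indices, pm, pf):
    """Reset used places to PM, then put back the recorded original PFs."""
    restored = [pm if flag == "1" else pattern
                for pattern, flag in zip(stego_patterns, key)]
    for index in original_pf_indices:
        restored[index] = pf
    return restored


if __name__ == "__main__":
    PM, PF = "0001", "0111"
    cover = [PM] * 10
    cover[4] = PF                      # one original PF, recorded in PFP
    key, secret = "1010001001", "1011"
    stego = embed_with_key(cover, key, secret, PM, PF)
    print(extract_with_key(stego, key, PM))
    print(restore_places(stego, key, [4], PM, PF) == cover)
```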

3.2.2. Pattern substitution by embedding secret header (PS-E)

With the secret header embedded in the image, the receiver only needs to know which PM and PF to use to extract the secret message. To extract the message directly, the receiver needs to know the size of the secret data and the number of PFs in PFP. The bits required to store these two values are denoted SDB and PFPB. Moreover, the secret data SD and PFP are combined to form the new secret data NSD. Fig. 7 shows all the embedded data forms.

Then, to hide SDB, PFPB and NSD in D, the MDHC has to be larger than SDB, PFPB and NSD combined. First, we take a fixed-size front part of the MDHC to hide SDB and PFPB. Then, following Eq. (6), we compute the hiding gap HG for the remaining part of the MDHC, because the hiding places are determined by the hiding gap. For example, if HG = 2, the hiding places are the first bit, the third, the fifth, etc.

$$\mathrm{HG} = \left\lfloor \frac{\mathrm{MDHC} - \mathrm{SDB} - \mathrm{PFPB}}{\mathrm{NSD}} \right\rfloor \tag{6}$$

The purpose of HG is to hide the secret data by dispersing it across the image. This way, the secret data is broken into pieces, which further increases the security level.

Fig. 9. Binary images of general documents: (a) English text; (b) Mixed text.
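A small Python sketch of Eq. (6) follows. The numbers in the usage example are hypothetical, the function names are not from the original scheme, and the positions are 0-based indices into the ordered sequence of hiding places, with SDB and PFPB assumed to occupy the first SDB + PFPB places.

```python
def hiding_gap(mdhc, sdb, pfpb, nsd):
    """Eq. (6): floor of the remaining capacity divided by the NSD length."""
    if nsd <= 0:
        raise ValueError("NSD must contain at least one bit")
    gap = (mdhc - sdb - pfpb) // nsd
    if gap < 1:
        raise ValueError("MDHC is too small for SDB, PFPB and NSD combined")
    return gap


def hiding_positions(mdhc, sdb, pfpb, nsd):
    """Indices of the hiding places used for the NSD bits, spaced HG apart."""
    gap = hiding_gap(mdhc, sdb, pfpb, nsd)
    start = sdb + pfpb  # the front part is reserved for SDB and PFPB
    return [start + k * gap for k in range(nsd)]


if __name__ == "__main__":
    # Hypothetical sizes: 100 hiding places, 13 + 8 header bits, 30 NSD bits.
    print(hiding_gap(100, 13, 8, 30))
    print(hiding_positions(100, 13, 8, 30)[:5])
```

With HG = 2 the NSD bits land on every second remaining place, which matches the "first, third, fifth" pattern described in the text.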

We provide two methods so that users can choose the one that meets their requirements. The PS-K method delivers the encrypted secret header to the receivers. The receivers then decrypt the secret header, extract the secret data from the stego-image, and rebuild the original binary image. This method is the more secure of the two. The PS-E method shows that our secret header does not require too much space and can be embedded into the cover image directly. Receivers can extract the secret header and the secret data directly, and can rebuild the original binary image.

3.3. Data extraction and recovery

To extract the secret data from the stego-image in PS-K, the receiver reads the fixed-size PM and PF after decrypting the SH received through the secure channel. In our experiment, the first four bits are recorded as PM, and the following four bits are recorded as PF. Scanning the image with this pair of patterns, we obtain the maximum data hiding capacity, MDHC. Then, using K, we determine the hiding places, and SD can be extracted. In fact, once a PF is identified at a hiding place, we can not only extract SD but also change PF back to PM, because most hiding places originally held PM. Finally, following PFP, we can extract the X and Y coordinates of each PFi. In our experiment, a unit of 16 bits is required for each PFi: the first 8 bits are the X coordinate, and the subsequent 8 bits are the Y coordinate. According to the extracted coordinates, all original PFs can be recovered.
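The field sizes mentioned above (4 bits for PM, 4 bits for PF, and 16 bits per PFi) can be illustrated with a hypothetical parser for the decrypted secret header. The layout assumed here, PM, PF, then a key K of MDHC bits, then PFP, follows the order in which Fig. 6 lists the fields; the exact layout and the function name `parse_secret_header` are assumptions rather than details confirmed by the text.

```python
def parse_secret_header(sh_bits, mdhc):
    """Split a decrypted header bit string into PM, PF, K and the PFP coordinates."""
    pm, pf = sh_bits[:4], sh_bits[4:8]   # first four bits PM, next four bits PF
    key = sh_bits[8:8 + mdhc]            # assumed: K has one bit per hiding place
    pfp_bits = sh_bits[8 + mdhc:]
    if len(key) != mdhc or len(pfp_bits) % 16 != 0:
        raise ValueError("header length does not match the assumed layout")
    pfp = [(int(pfp_bits[k:k + 8], 2), int(pfp_bits[k + 8:k + 16], 2))
           for k in range(0, len(pfp_bits), 16)]  # 8-bit X, then 8-bit Y
    return pm, pf, key, pfp


if __name__ == "__main__":
    header = "0001" + "0111" + "1010001001" + "00000011" + "00000101"
    print(parse_secret_header(header, mdhc=10))
```

Eight bits per coordinate are enough for the 256×256 test images used in Section 4.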

In PS-E, PM and PF can be delivered to the receiver through either a public method or a private secure channel. Based on PM and PF, the stego-image can be scanned to obtain the MDHC. Then, the fixed-size SDB and PFPB can be extracted from the front part of the extracted secret data. The size of SD is given by SDB, and the number of PFs by PFPB. With MDHC, SDB, and PFPB, we can get HG, using Eq. (6). Because HG serves as the extraction distance indicator, it can be used, together with the size given by SDB and the number given by PFPB, to extract SD and PFP. During the extraction of SD, if the pattern conforms to PF, not only can SD be extracted, but PF can also be changed back to PM. Finally, according to PFP, the X and Y coordinates of each PFi can be extracted.

Table 6. PSNR comparisons between PWLC and PS at the same hiding capacity levels.

Images                     PWLC PSNR (dB)    PS-K PSNR (dB)    PS-E PSNR (dB)
Aerobics (240 bits)        46.62             51.30             51.30
Biker (680 bits)           42.06             46.67             46.51
Doves (964 bits)           40.53             45.13             44.30
Mixed text (464 bits)      43.79             48.56             42.41
English text (433 bits)    44.08             48.89             44.18

Fig. 10. Stego-image comparison between PWLC (a)(c)(e) and PS-K (b)(d)(f) with general pictures: (a)(b) secret size is 240 bits; (c)(d) secret size is 680 bits; (e)(f) secret size is 964 bits.

Fig. 11. Stego-image comparison: PWLC (a)(c) and PS-K (b)(d) with general document binary images: (a)(b) secret size is 464 bits; (c)(d) secret size is 433 bits.

3.4. Data hiding capacity

In this section, we discuss the maximum data hiding capacity (MDHC) and evaluate the worst case scenario of the visual quality of the stego-image. Generally, there are two methods for locating hiding patterns. One is a fixed-place non-overlapping method, and the other finds a corresponding pattern through overlapping, searching from one place to the next for the pattern. In this paper, we adopt the second method. We search every bit from left to right and from top to bottom to find a suitable pattern. Tables 2 and 3 show the statistics of PM and PF pairs found by the two aforementioned methods in the five test images. They demonstrate that the second method finds more patterns and thus increases the hiding capacity.

The second method requires two rules to avoid the overlapping problem between PM and PF. For example, consider the continuous bit stream “0000011111.” We first find a pattern P0001; assume that P0001 is substituted by P0111 to hide the secret data, giving the new bit stream “0001111111.” During extraction, we would then mistakenly take P0001 as the first pattern. To avoid this situation, even though P0001 has been found, we search for a P0111 right behind it. If a P0111 is found, then we use that P0111 to hide the secret data. Therefore, in the bit stream “0000011111,” we use P0111 as the first pattern for hiding secret data. Now suppose that this P0111 is substituted by P0001, giving the new bit stream “0000000111.” During extraction, a P0111 is again found, although the hiding place actually holds the P0001 pattern. To avoid the above two conditions, we define two rules for hiding data:

Rule 1 When PM and PF overlap, take PF as the hiding pattern.

Rule 2 If “1” occurs five times continuously, the “11111” is an unacceptable hiding place.
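The ambiguity that motivates the two rules can be reproduced with a few lines of Python operating on the example bit stream. The helpers `occurrences` and `substitute` are hypothetical names introduced only for this illustration; the sketch shows the problem and does not implement the rules themselves.

```python
def occurrences(bits, pattern):
    """Start indices (0-based) of every, possibly overlapping, match of pattern."""
    width = len(pattern)
    return [i for i in range(len(bits) - width + 1) if bits[i:i + width] == pattern]


def substitute(bits, start, pattern):
    """Overwrite len(pattern) bits of the stream beginning at `start`."""
    return bits[:start] + pattern + bits[start + len(pattern):]


if __name__ == "__main__":
    cover = "0000011111"
    print(occurrences(cover, "0001"), occurrences(cover, "0111"))  # overlapping matches
    stego = substitute(cover, occurrences(cover, "0001")[0], "0111")
    print(stego)
    # The decoder now sees P0001 at a different position than the encoder used.
    print(occurrences(stego, "0001"), occurrences(stego, "0111"))
```

In the cover stream, P0001 and P0111 overlap; after the substitution, a left-to-right scan meets a P0001 that the encoder never used, which is exactly the mistake described above.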

According to Tsai et al., the maximum data hiding capacity (MDHC) of their PWLC [17] data hiding method can be calculated as:

$$\mathrm{MDHC} \le (M \times N)/6$$

Here M is the width, and N is the height of the cover image. Comparatively, the method used in this paper provides a higher maximum data hiding capacity (MDHC). It can be calculated as:

$$\mathrm{MDHC} \le (M \times N)/4$$

As for the visual quality of the stego-image, where PWLC modifies one or two bits, our method needs to modify none or only one bit. Table 4 assumes that the size of the secret message is 1000 bits and analyzes the PSNR values of the worst case, the average case, and the best case scenario. In the worst case, with PS, 1000 bits may have to be modified, which translates, using Eq. (8), to MSE = 3.89. We then obtain 42.23 dB through Eq. (7). By contrast, for PWLC, the worst case with 1000 secret bits to be embedded is that up to 2000 bits may need modification. Translated using Eq. (8), MSE = 7.78. We then put that figure into Eq. (7) and get 39.22 dB. By the same token, the best and average cases can also be estimated. Table 4 shows these analyses.

Table 7. Recognition results by OCR after secret message hiding by PS and PWLC.

                           Recognition (%)
Images                     Cover image    PWLC      PS-K      PS-E
Mixed text (464 bits)      92.13%         66.54%    80.31%    53.15%
English text (433 bits)    98.05%         79.77%    95.33%    89.88%

The PSNR value can be calculated as follows:

$$\mathrm{PSNR} = 10 \log_{10} \frac{(255)^2}{\mathrm{MSE}} \ \mathrm{dB}, \tag{7}$$

where MSE can be computed as

$$\mathrm{MSE} = \left(\frac{1}{M \times N}\right) \sum_{i=1}^{M} \sum_{j=1}^{N} \left(b_{ij} - b'_{ij}\right)^2 \times 255 \tag{8}$$

Here M and N are the cover image's width and height, $b_{ij}$ stands for the pixel value of the original binary image at (i, j), and $b'_{ij}$ represents the pixel value at (i, j) after modification.
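The worst-case figures quoted for PS and PWLC can be checked numerically. The following is a minimal sketch under stated assumptions: an all-zero 256×256 binary image stands in for a cover image, the modified bits are simply the first ones in raster order, and the helper names (`mse_binary`, `psnr_db`, `flip_first_bits`) are hypothetical.

```python
import math

def mse_binary(original, modified):
    """Eq. (8): mean squared bit difference over the image, scaled by 255."""
    rows, cols = len(original), len(original[0])
    squared = sum(
        (original[i][j] - modified[i][j]) ** 2
        for i in range(rows)
        for j in range(cols)
    )
    return squared / (rows * cols) * 255

def psnr_db(mse):
    """Eq. (7): PSNR in dB for a peak value of 255."""
    return 10 * math.log10(255 ** 2 / mse)

def flip_first_bits(image, count):
    """Return a copy of the image with its first `count` bits (raster order) flipped."""
    cols = len(image[0])
    flat = [bit for row in image for bit in row]
    for k in range(count):
        flat[k] ^= 1
    return [flat[r * cols:(r + 1) * cols] for r in range(len(image))]

cover = [[0] * 256 for _ in range(256)]  # stand-in cover image (assumption)

ps_mse = mse_binary(cover, flip_first_bits(cover, 1000))    # PS worst case
pwlc_mse = mse_binary(cover, flip_first_bits(cover, 2000))  # PWLC worst case

print(round(ps_mse, 2), round(psnr_db(ps_mse), 2))
print(round(pwlc_mse, 2), round(psnr_db(pwlc_mse), 2))
```

Because Eq. (8) depends only on how many bits differ, the positions of the flipped bits do not affect the result.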

4. Experimental results

Compared with the well-accepted PWLC method, the experimental results show that our method offers not only a higher maximum data hiding capacity but also better visual quality. Fig. 8 shows three 256×256 binary images that we used in our experiments as test images [19]. Fig. 8(a) is a simple binary image, Fig. 8(c) is a complex binary image, and Fig. 8(b) is a binary image whose complexity lies between that of Fig. 8(a) and that of Fig. 8(c). We also used binary images of text data as test images, which are shown in Fig. 9: a binary image of an English document (see Fig. 9(a)) and a binary image of a document in Chinese mixed with English (see Fig. 9(b)).

By choosing PM and PF, we obtain the MDHC of each image and examine the PSNR value after hiding the maximum capacity. Table 5 shows the PSNR and MDHC of both PWLC and PS. As the results show, the PS-K method performed better than PWLC in terms of both hiding capacity and PSNR value. Because some extra information such as PFP had to be embedded, a drop occurred in MDHC. However, PS-E's MDHC was lower than that of PWLC only in the mixed text case, because SDB, PFPB, and PFP had to be subtracted from MDHC. In this case, SDB was 13 bits, PFPB was 8 bits, and PFP turned out to be 60 × 16 = 960 bits. As a result, the final MDHC value was 2404 − 13 − 8 − 960 = 1423 bits.

Table 6 shows the PSNR performance of both PWLC and PS at the same hiding capacity levels. As the results reveal, the PSNR performance of the PS-K method was always better than that of PWLC, no matter which test image was used. The PS-E method also did better than PWLC with all but the mixed text image. In the mixed text case, PS-E had to record so much extra information that its PSNR performance was poor. Figs. 10 and 11 show that the visual changes made by PS-K are less detectable than those of PWLC at the same hiding capacity level.
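The overhead subtraction reported above for the mixed text image under PS-E can be written out as a small calculation. This is only a sketch: the function and parameter names are hypothetical, and reading the factors 60 and 16 as a number of recorded entries and a number of bits per entry is an assumption.

```python
def net_capacity(raw_mdhc, sdb_bits, pfpb_bits, pfp_entries, bits_per_entry):
    """Capacity left for the secret message after subtracting the side information."""
    pfp_bits = pfp_entries * bits_per_entry
    return raw_mdhc - sdb_bits - pfpb_bits - pfp_bits

# Values quoted in the text for the mixed text image.
mixed_text_net = net_capacity(2404, 13, 8, 60, 16)
print(mixed_text_net)
```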

From Table 7, we learn that the recognition rate of the English document provided by our PS method is better than that of PWLC. However, for the mixed text, our PS-E gave a poor recognition rate because so much extra information was hidden. The recognition rate (RR) can be calculated using Eq. (9). The comparison results show that PS-K is the best choice when secret messages are to be hidden in images of text data with high expectations for good visual quality.

$$\mathrm{RR} = \frac{AW - EW}{AW} \times 100\% \tag{9}$$

Here AW is the total number of words, and EW is the total number of words misrecognized by OCR.
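The recognition rate is a plain percentage of correctly recognized words. The sketch below uses hypothetical word counts chosen only for illustration (they are not taken from the experiments), and the function name `recognition_rate` is likewise an assumption.

```python
def recognition_rate(total_words, misrecognized_words):
    """Eq. (9): percentage of words that the OCR step recognized correctly."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100 * (total_words - misrecognized_words) / total_words

# Hypothetical counts: 200 words in the document, 15 misread by OCR.
example_rr = recognition_rate(200, 15)
print(example_rr)
```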

5. Conclusion

In this paper, we proposed a reversible data hiding method with pattern substitution. The proposed method can recover the cover image after the secret message is extracted. The quality of the stego-image is greatly improved, and the maximum data hiding capacity is increased.

Experimental results show that, with the same secret information hidden, our new method offers better stego-image visual quality than PWLC. Using our new PS method, even when the amount of secret data hidden is extremely large, the PSNR value can remain around 40 dB. Additionally, when cover images of text data are used, the recognition rate of the stego-image is also high.

References

[1] W. Zeng, Digital watermarking and data hiding: technologies and applications, Proc. Int. Conf. Inf. Syst., Anal. Synth., vol. 3, 1998, pp. 223–229.

[2] C.C. Thien, J.C. Lin, A simple and high-hiding capacity method for hiding digit-by-digit data in images based on modulus function, Pattern Recognition 36 (13) (2003) 2875–2881.

[3] C.K. Chan, L.M. Cheng, Hiding data in images by simple LSB substitution, Pattern Recognition 37 (3) (2004) 469–474.

[4] J. Wang, L. Ji, A region and data hiding based error concealment scheme for images, IEEE Transactions on Consumer Electronics 47 (2) (2001) 257–262.

[5] R.Z. Wang, C.F. Lin, J.C. Lin, Image hiding by optimal LSB substitution and genetic algorithm, Pattern Recognition 34 (3) (2001) 671–683.

[6] S.L. Li, K.C. Leung, L.M. Cheng, C.K. Chan, A novel image-hiding scheme based on block difference, Pattern Recognition 39 (6) (2006) 1168–1176.

[7] Y.K. Lee, L.H. Chen, An adaptive image steganographic model based on minimum-error LSB replacement, Proceedings of the Ninth National Conference on Information Security, Taichung, Taiwan, 14–15, 1999, pp. 8–15, May.

[8] K. Matsui, K. Tanaka, Video-steganography: how to secretly embed a signature in a picture, Proc. IMA Intellectual Property Project, 1, No. 1, 1994.

[9] N.F. Maxemchuk, S. Low, Marking text documents, Proc. IEEE. ICIP'97, 1997.

[10] J. Brassil, S. Low, N. Maxemchuk, L. O'Gorman, Electronic marking and identification techniques to discourage document copying, IEEE Journal on Selected Areas in Communications 13 (8) (1995) 1495–1504.

[11] S.H. Low, N.F. Maxemchuk, J.T. Brassil, L. O'Gorman, Document marking and identification using both line and word shifting, Proceedings of Infocom, Boston, MA, 1995, pp. 853–860.

[12] M. Wu, B. Liu, Data hiding in binary image for authentication and annotation, IEEE Transactions on Multimedia, 6, No. 4, 2004, pp. 528–538, Aug.

[13] C.W. Honsinger, P. Jones, M. Rabbani, J.C. Stoffel, “Lossless Recovery of an Original Image Containing Embedded Data,” U.S. Patent 6 278 791 B1, Aug. 21, 2001.

[14] J. Fridrich, M. Goljan, R. Du, Invertible authentication, Proc. SPIE Security Watermarking Multimedia Contents, San Jose, CA, 2001, pp. 197–208, Jan.

[15] B. Macq, F. Deweyand, Trusted headers for medical images, DFG VIII-D II Watermarking Workshop, Erlangen, Germany, 1999, Oct.

[16] Z. Ni, Y.Q. Shi, N. Ansari, W. Su, Reversible data hiding, IEEE Transactions on Circuits and Systems for Video Technology, 16, No. 3, 2006, pp. 354–362, March.

[17] C.L. Tsai, H.F. Chiang, K.C. Fan, C.D. Chung, Reversible data hiding and lossless reconstruction of binary images using pair-wise logical computation mechanism, Pattern Recognition 38 (11) (2005) 1993–2006.

[18] G.R. Robertson, M.F. Aburdene, R.J. Kozick, Differential block coding of bi-level images, IEEE Transactions on Image Processing 5 (9) (1996) 1368–1370, Sep.

[19] C.L. Wang, S.C. Wu, Y.K. Chan, R.F. Chang, Quad tree and statistical model-based lossless binary image compression method, Imaging Science Journal 53 (2) (2005) 95–103.

794 Y.-A. Ho et al. / Computer Standards & Interfaces 31 (2009) 787–794

Yu-An Ho received the B.S. degree from the Department of Information Management, National Taichung Institute of Technology, Taichung, Taiwan, in 2002. He received his Ph.D. in Computer Science and Engineering in 2008 from National Chung Hsing University, Taichung, Taiwan. His research interests include image processing, image retrieval, image compression, and image hiding.

Yung-Kuan Chan received his M.S. degree in Computer Science in 1991 from New Mexico Institute of Mining and Technology, U.S.A. He received his Ph.D. in Computer Science and Information Engineering in 2000 from National Chung Cheng University, Chiayi, Taiwan. From 2001 to 2002, he worked as an assistant professor at the Department of Information Management, Chaoyang University of Technology. From August 2002 to July 2003, he was an assistant professor at the Department of Computer Science and Information Engineering, National Huwei Institute of Technology. From August 2003 to July 2005, he was an assistant professor, from August 2005 to July 2008 an associate professor, and he is now a full professor of the Management Information Systems Department at National Chung Hsing University, Taichung, Taiwan. His research interests include image processing, image retrieval, image compression, image hiding, database systems, and information engineering.

Hsien-Chu Wu received the B.S. and M.S. degrees in Applied Mathematics in 1985 and 1987, respectively, from the National Chung Hsing University, Taichung, Taiwan. She received her Ph.D. in Computer Science and Information Engineering in 2002 from National Chung Cheng University, Chiayi, Taiwan. From 1987 to 2002, she was a lecturer of the Department of Information Management at National Taichung Institute of Technology, Taichung, Taiwan. From August 2002 to July 2005, she was an associate professor of the Department of Information Management at National Taichung Institute of Technology, Taichung, Taiwan. Since August 2005, she has worked as a professor of the Department of Information Management at National Taichung Institute of Technology, Taichung, Taiwan. Her research interests include image authentication, digital watermarking, data hiding, image processing, and information security.

Yen-Ping Chu studied in the Ph.D. program of the Institute of Electrical Engineering at National Cheng Kung University, Tainan, Taiwan, from 1981 to 1986. During the academic years of 1981–1986, he worked as a lecturer of the Institute of Applied Mathematics at the National Chung Hsing University. From 1986 to 2000, he was an associate professor of the Institute of Applied Mathematics at the National Chung Hsing University. From August 2000 to July 2003, he worked as a professor of Computer Science at the National Chung Hsing University. From 2003 to 2005, he was a professor and founding chair of the Department of Management Information Systems at the National Chung Hsing University. Since August 2005, he has been a professor in Computer Science and Information Engineering and the library director at Tunghai University, Taichung, Taiwan. His research interests include multimedia and WWW on education, E-learning, distributed system design, and computer network design.