
Transcript of Document Image Processing for Paper Side Communications


Document Image Processing for Paper Side Communications

Paulo Vinicius Koerich Borges, Student Member, IEEE, Joceli Mayer, Member, IEEE, Ebroul Izquierdo, Senior Member, IEEE

Abstract— This paper proposes the use of higher order statistical moments in document image processing to improve the performance of systems which transmit side information through the print and scan channel. Examples of such systems are multi-level 2-D bar codes and certification via text luminance modulation. These systems print symbols with different luminances, according to the target side information. In previous works, the detection of a received symbol is usually performed by evaluating the average luminance or spectral characteristics of the received signal. This paper points out that, whenever halftoning algorithms are used in the printing process, detection can be improved by observing that third and fourth order statistical moments of the transmitted symbol also change, depending on the luminance level. This work provides a thorough analysis for those moments used as detection metrics. A print and scan channel model is exploited to derive the relationship between the modulated luminance level and the higher order moments of a halftone image. This work employs a strategy to merge the different moments into a single metric to achieve a reduced detection error rate. A transmission protocol for printed documents is proposed which takes advantage of the resulting higher robustness achieved with the combined detection metrics. The applicability of the introduced document image analysis approach is validated by comprehensive computer simulations.

I. INTRODUCTION

Printed paper and its corresponding document image processing is critical for information storage, presentation and transmission of analogue and digital information. In addition to conventional text and images, paper communications include, for example, bar codes, bank and identity documents, and hardcopy text certification.

Regarding bar codes, multi-level two-dimensional (2-D) bar codes [1], [2] have gained increased attention in the past few years. Instead of representing information with only black and white symbols, multi-level codes use gray levels to increase the symbol rate, as illustrated in Fig. 1(a). Consequently, a higher capacity version of 1-D bar codes is achieved. Examples of their utilization include representing encrypted information or serving as an auxiliary verification channel. A discussion regarding more applications and aspects related to coding/decoding of 2-D bar codes is given in [1], [2].

With respect to hardcopy text certification, several techniques have been proposed in the literature. Brassil et al. [26]

This work was supported by CNPq, Proc. No. 202288/2006-4. Paulo Borges and Joceli Mayer are with the LPDS, Dept. of Electrical Engineering - Federal University of Santa Catarina, Florianópolis, Brazil, 88.040-900. Tel: +55 48 3721-7627, Fax: +55 48 3721-9280. Ebroul Izquierdo is with the Multimedia and Vision Lab - Dept. of Electronic Engineering - Queen Mary University of London, Mile End Road, London E1 4NS, UK. Tel: +44 20 7882 5354, Fax: +44 20 7882 7997.

(a) Multi-level 2-D bar code.


(b) Example of text certification through luminance modulation.

Fig. 1: Illustration of side communications over paper.

propose and discuss several methods to embed and decode information in documents, which can survive the print and scan (PS) channel. In one of the methods, called line-shift coding, a line is moved up or down according to the bit to be embedded. In order to perform blind detection, line centroids can be used as references. One disadvantage of this method is that the centroids must be uniformly spaced, which does not always occur in documents. Variations of the method include word-shift and character-shift coding [27], [23], [25], but they are essentially different implementations of the fundamental idea. Unfortunately, the line and word shifting techniques assume predictable spacing in the document. Equations, titles, and variable size logos or symbols complicate the coding process due to non-uniform spacing.

An alternative class of methods (called pixel flipping) performs modifications on the characters' pixels [28], [13], such as flipping a pixel from black to white, and vice versa. In [30], for example, the modifications are performed according to the shape and connectivity of the characters. In weak noise conditions, this type of method presents a very high information embedding rate, but since the method relies on small dots, when the PS distortions are considered, it requires very high resolutions in both printing and scanning to reduce detection errors. For these pixel flipping techniques, a useful detection statistic when the signal is submitted to the PS process is proposed in [31], based on the compression bit rate of the modified signals.

Another important technique is called text luminance modulation (TLM) [3], [4], [5]. It slightly modulates the luminance of text characters to embed side information. This modification is performed according to the target side information and can be set to cause a very low perceptual impact while remaining detectable after printing and scanning. An example of this technique is given in Fig. 1(b), where the intensity changes have been augmented to make them visible and to illustrate the underlying process. In contrast to some of the limitations


of the methods detailed above (non-blind detection, need for uniform spacing between lines, errors from segmentation, etc.), TLM can be designed to be more robust to these issues, as discussed in [6].

Due to the usefulness of TLM and multi-level 2-D bar codes, this work focuses on improving the detection in such systems. The contributions presented in this paper are manifold:

(i) In TLM and multi-level 2-D bar codes, extraction of the embedded side information is usually performed by evaluating the average amplitude [1], [5], [6] or spectral characteristics [5] of the region of interest. However, considering that halftoning is usually employed in the printing process, other statistics of the received signal can also be exploited in the detection. One example is the sample variance, which can be effectively used as a detection metric, as proposed in [3]. In this work, higher order statistical moments such as skewness and kurtosis are used as detection metrics, extending the work presented in [3].

(ii) The provided analysis shows the relationship between the modulated luminance and the higher order statistics of a printed and scanned halftone region. Statistical assumptions related to the 1st and 2nd order moments and to the PS channel model are based on [3]. This model includes the quantization characteristics of the halftoning process, which are exploited here to derive the higher order statistics based detection metrics.

(iii) Robustness against PS distortions and consequently reduced detection error, obtained by combining the proposed metrics into a single highly efficient metric. This unified model allows previously proposed metrics [5], [6] to be combined with the ones proposed in this paper.

(iv) A practical protocol for the certification of printed documents is proposed which exploits the resulting high robustness of the proposed unified metrics.

This paper is organized as follows. Section II describes the halftoning process and a PS model, which can be seen as a noisy communications channel. Section III analyzes the relationship between the different moments and the modulated average luminance before PS. Section IV performs a similar analysis considering the PS distortions. Section V proposes that the discussed metrics be combined into a single metric using the Bayes classifier. Section VI proposes an authentication protocol for printed text documents. Experimental results are presented in Section VII, followed by conclusions in Section VIII.

II. THE PRINT AND SCAN CHANNEL

A. The Halftoning Process

Due to limitations of most printing devices, image signals are quantized to 0 or 1 prior to printing, where '0' represents a white pixel (do not print a dot) and '1' represents a black pixel (print a dot) throughout this paper. A binary halftone image b is generated from an original image s, and this quantization is performed according to a direct comparison between the elements in a dithering matrix D and the elements in s. A detailed description of halftoning algorithms can be found in [7], [8]. The binary signal b is included in the PS channel model described in the next section.
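The comparison-based quantization just described can be sketched in a few lines. The 4 × 4 Bayer matrix, the block size and the luminance value below are illustrative assumptions, not choices prescribed by the text; the sketch only follows the conventions stated above (luminance in [0, 1], 0 = white, 1 = black, print a dot where the input exceeds the local dither threshold).

import numpy as np

# 4x4 Bayer dithering matrix normalized to (0, 1); any matrix D following the
# same convention could be substituted (this particular D is illustrative).
BAYER_4 = (np.array([[ 0,  8,  2, 10],
                     [12,  4, 14,  6],
                     [ 3, 11,  1,  9],
                     [15,  7, 13,  5]]) + 0.5) / 16.0

def halftone(s, D=BAYER_4):
    """Quantize a gray-level image s (0 = white, 1 = black) to a binary
    halftone b by comparing each pixel with the tiled dither matrix D:
    a dot is printed (b = 1) wherever s exceeds the local threshold."""
    s = np.asarray(s, dtype=float)
    reps = (-(-s.shape[0] // D.shape[0]), -(-s.shape[1] // D.shape[1]))
    D_tiled = np.tile(D, reps)[:s.shape[0], :s.shape[1]]
    return (s > D_tiled).astype(np.uint8)

# A constant block of luminance s0 = 0.3 yields roughly 30% printed dots.
b = halftone(np.full((32, 32), 0.3))
print(b.mean())   # close to 0.3, anticipating p = s0 in Section III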

B. A Print and Scan Channel

A number of analytical models of the PS channel have been presented in the literature [1], [2], [9], [6]. In general, the proposed models of the PS channel assume that the process can be modeled by low-pass filtering, the addition of Gaussian noise (white or colored), and non-linear gains, such as brightness and gamma alteration. Geometric distortions such as possible rotation, re-scaling, and cropping may also occur, but they are assumed controlled, as they can be compensated for.

The contributions presented in this work are directly related to the effects caused by the halftoning process. For this reason, a PS model which includes the halftoned signal is employed to describe the process. A detailed description of this model is given in [3], where the PS operation is described by

$$ y(m,n) = g_s\Big( \big\{ g_{pr}[b(m,n)] + \eta_1(m,n) \big\} * h(m,n) \Big) + \eta_3(m,n) \qquad (1) $$

In this equation, η1 represents the microscopic ink and paper imperfections and η3 is a noise term combining illumination noise and microscopic particles on the scanner surface. The term h represents a blurring effect combining the low-pass effect due to printing and due to scanning. The term gpr(·) in (1) represents a gain in the printing process and the term gs(·) represents the response of scanners.

In the model in (1), b represents the halftone signal, generated from an original signal s, as described in Section II-A.
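A minimal simulation sketch of the model in (1) is given below. It assumes the linearized scanner response of Section IV-A (gs taken as the identity), a single multiplicative gain α per symbol standing in for gpr, zero-mean Gaussian η1 and η3 with the typical values reported in Section VII, and a small placeholder kernel for h; none of these implementation choices are prescribed by the text.

import numpy as np
from scipy.ndimage import convolve

def print_scan(b, mu_alpha=0.8, sigma_alpha=0.03,
               sigma_eta1=0.018, sigma_eta3=0.01, h=None, rng=None):
    """Sketch of (1): y = gs({gpr[b] + eta1} * h) + eta3, with gs = identity
    and gpr modeled as one gain alpha ~ N(mu_alpha, sigma_alpha^2) per symbol.
    Noise and gain values follow Section VII; h defaults to a 3x3 average."""
    rng = np.random.default_rng() if rng is None else rng
    if h is None:
        h = np.ones((3, 3)) / 9.0            # placeholder low-pass kernel
    alpha = rng.normal(mu_alpha, sigma_alpha)
    eta1 = rng.normal(0.0, sigma_eta1, b.shape)
    eta3 = rng.normal(0.0, sigma_eta3, b.shape)
    return convolve(alpha * b + eta1, h, mode="reflect") + eta3

# Example: distort the halftone block b from the halftoning sketch above.
# y = print_scan(b.astype(float))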

III. EFFECTS INDUCED BY THE HALFTONE

The halftoning algorithm quantizes the input to 'print a dot' and 'do not print a dot' according to the dither matrix coefficients. This quantization causes a predictable effect on the variance, skewness, and kurtosis of a halftoned region, according to the input luminance. For the variance, this dependence is discussed in [3]. In the following, this approach is extended to the skewness and the kurtosis, to help improve the detection in printed communications.

A. Skewness

The skewness measures the degree of asymmetry of a distribution around its mean [14]. It is zero when the distribution is symmetric, positive if the distribution shape is more spread to the right and negative if it is more spread to the left, as illustrated in Fig. 2(a).

The skewness of a halftone block b0 of size J × J is given by:

$$ \gamma_{1b_0} = \frac{1}{J^2} \sum_{m=1}^{J} \sum_{n=1}^{J} \frac{[b_0(m,n) - \bar{b}_0]^3}{\sigma_{b_0}^3} \qquad (2) $$

where b0(m,n) ∈ {0, 1}, b̄0 = (1/J²) Σ_{m=1}^{J} Σ_{n=1}^{J} b0(m,n), and J² is the number of coefficients in the dithering matrix D.


(a) Skewness. (b) Kurtosis.

Fig. 2: Illustration of the effect of positive and negative skewness and kurtosis.

Since b0(m,n) ∈ {0, 1}, b0²(m,n) = b0(m,n), and (2) can be written as

$$ \gamma_{1b_0} = \frac{\bar{b}_0 - 3\bar{b}_0^2 + 2\bar{b}_0^3}{\left(\bar{b}_0 - \bar{b}_0^2\right)^{3/2}} \qquad (3) $$

B. Kurtosis

The kurtosis is a measure of the relative flatness or peakedness of a distribution about its mean, with respect to a normal distribution [14]. A high kurtosis distribution has a sharper peak and fatter tails, while a low kurtosis distribution has a more rounded peak with wider "shoulders," as illustrated in Fig. 2(b).

The kurtosis of a halftone blockb0 of sizeJ × J is givenby:

γ2b0 =

1

J2

J∑

m=1

J∑

n=1

[b0(m,n)− b0]4

σ4b0

− 3

=b0 − 4b0

2+ 6b0

3− 3b0

4

(

b0 − b02)2 − 3 (4)

To derive γ1b0 and γ2b0 as a function of the input luminance s(m,n), b0(m,n) must be generated from a constant gray level region, that is, s(m,n) = s0, m,n = 1, . . . , J, where s0 is a constant. Assuming that D is approximately uniformly distributed, as illustrated by pD in Fig. 3, the probability p of b(m,n) = 1, which is Pr[s0 > D(m,n)], is given by

$$ p = \Pr[s_0 > D(m,n)] = \frac{1}{J^2} \sum_{b(m,n)=1} b(m,n) = \frac{1}{J^2} \sum_{m=1}^{J} \sum_{n=1}^{J} b(m,n) = \bar{b} = s_0 \qquad (5) $$

as illustrated by the area p in Fig. 3. Substituting this result into (3) and (4) yields

Fig. 3: Representation of the uniform distribution pD assumed for the coefficients of D (area p = s0) and the distribution pb of b (values 0 and 1 with probabilities 1 − s0 and s0).

$$ \gamma_{1b_0}(s_0) = \frac{s_0 - 3s_0^2 + 2s_0^3}{(s_0 - s_0^2)^{3/2}} \qquad (6) $$

$$ \gamma_{2b_0}(s_0) = \frac{s_0 - 4s_0^2 + 6s_0^3 - 3s_0^4}{(s_0 - s_0^2)^{2}} \qquad (7) $$

where γ1b(s0) and γ2b(s0) represent respectively the skewness and the kurtosis of a halftoned block that represents a region of constant luminance s0.

In [3], an analysis similar to the above is presented for the variance, and it is shown that with the same assumptions the variance σ²b0(s0) is given by

$$ \sigma^2_{b_0}(s_0) = s_0 - s_0^2 \qquad (8) $$

C. Comments on γ1b and γ2b

The halftone signal b0 is binary and it is distributed according to b0(m,n) ∈ {0, 1}, with probabilities 1 − s0 and s0, respectively, as illustrated by pb in Fig. 3. Because the skewness and the kurtosis of b0 depend on s0, these moments can be used as detection metrics in text luminance modulation and multi-level bar codes, in addition to the average and variance metrics employed in [3].

Regarding the skewness, it is equal to zero when s0 = 0.5 and the distribution of b0 is symmetric, represented by two peaks of equal probability. The two symmetric peaks also flatten the distribution of y in (1), minimizing the kurtosis. When s0 < 0.5, b is composed of more white dots than black dots, leaning the distribution of y to the left and causing a positive skewness. The opposite occurs when s0 > 0.5, yielding a negative skewness. Likewise, the distribution of y becomes more peaked as s0 approaches the limits of the luminance range, consequently increasing the kurtosis.
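A quick numerical check of (6)-(8) compares the closed-form moments with the sample moments of a dithered block. The sketch below reuses the halftone() helper from the earlier halftoning sketch; the block size and luminance are illustrative, and the agreement is only approximate because a finite Bayer block approximates the uniform-threshold assumption.

import numpy as np
from scipy.stats import skew, kurtosis

def halftone_moments(s0):
    """Closed-form variance, skewness and kurtosis of a halftone block with
    constant input luminance s0, from (8), (6) and (7)."""
    var = s0 - s0**2                                     # eq. (8)
    g1 = (s0 - 3*s0**2 + 2*s0**3) / var**1.5             # eq. (6)
    g2 = (s0 - 4*s0**2 + 6*s0**3 - 3*s0**4) / var**2     # eq. (7)
    return var, g1, g2

s0 = 0.3
b0 = halftone(np.full((32, 32), s0))     # helper from the halftoning sketch
print(halftone_moments(s0))
print(b0.var(), skew(b0.ravel()), kurtosis(b0.ravel(), fisher=False))
# fisher=False returns the (non-excess) kurtosis used in (7).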

IV. EFFECTS INDUCED BY THE PS CHANNEL

The statistical moments described in (3) and (4) are affected by the low-pass characteristic and the noise in the PS channel. Considering these channel distortions, γ1y and γ2y are derived in Sections IV-C and IV-D, respectively. Statistical and distortion assumptions for the analyses are discussed in Section IV-A. For simplicity, the (m,n) coordinate system is mapped to a one-dimensional notation.

A. Statistical and Distortion Assumptions

In the model in (1), let b(n) = b̄ + η2(n). The noise η2 is zero-mean with variance given by σ²η2 = σ²b, and it is distributed according to η2 ∈ {−s0, 1 − s0}, as illustrated in Fig. 4.


Fig. 4: Distribution of the noise η2 (values −s0 and 1 − s0, with probabilities 1 − s0 and s0).

Although gs in (1) is generally defined as non-linear, in many devices it can be approximated by a linear model [1]. This is particularly reasonable in a TLM application, because the detector operates over a small portion of the luminance range [0, 1] due to the low perceptual impact requirement. For this reason, gs is assumed linear and φ in gs is approximated by 1 for simplicity.

Assuming that b(m,n) is generated from a constant gray level region, that is, s(m,n) = s0 = b̄, (1) can be written as

$$ y(n) = \big\{ \alpha[\bar{b} + \eta_2(n)] + \eta_1(n) \big\} * h(n) + \eta_3(n) \qquad (9) $$

The term α represents a gain (see gpr in (1)) that varies slightly throughout a full page due to non-uniform printer toner distribution. Due to its slow rate of change, α is modeled as constant in n, but it varies in each realization i, satisfying α ∼ N(µα, σ²α), where i represents the i-th symbol of a 2-D bar code or the i-th character in TLM.

Due to the nature of the noise (discussed in Section II) and based on experimental observations, η1 and η3 can be generally modeled as zero-mean mutually independent Gaussian noise [1], [11], [2].

B. Variance

Based on the assumptions described above, it is shown in [3] that the sample variance of a scanned symbol y is given by

$$ \mu_{\sigma^2_y} = (\mu_\alpha^2 + \sigma_\alpha^2)\,\sigma^2_{\eta_2}\, r_h(0) + \sigma^2_{\eta_1}\, r_h(0) + \sigma^2_{\eta_3} \qquad (10) $$

In the following, an extension to the skewness and to the kurtosis is presented.

C. Skewness

The sample skewness of a scanned symbol y is given by

$$ \mu_{\gamma_{1y}} = E\left\{ \frac{1}{\sigma_y^3 N} \sum_{n=1}^{N} \big[\alpha s_0 + \alpha\eta_2(n) * h(n) + \eta_1(n) * h(n) + \eta_3(n) - \bar{y}\big]^3 \right\} = \frac{1}{\sigma_y^3 N} \sum_{n=1}^{N} E\Big\{ \big[\alpha\eta_2(n) * h(n) + \eta_1(n) * h(n) + \eta_3(n)\big]^3 \Big\} \qquad (11) $$

Recalling that η1, η2 and η3 are zero-mean mutually independent random variables, and that the third-order moments of the zero-mean Gaussian terms η1 and η3 are zero, (11) becomes

$$ \mu_{\gamma_{1y}} = \frac{1}{\sigma_y^3}\, E\{\alpha^3\}\, E\big\{[\eta_2(n) * h(n)]^3\big\} \qquad (12) $$

Appendix I derives the term E{[η2(n) ∗ h(n)]³} in the equation above, yielding

$$ \mu_{\gamma_{1y}} = \frac{1}{(\sigma_y^2)^{3/2}}\, (3\sigma_\alpha^2\mu_\alpha + \mu_\alpha^3)\, \big[(1 - s_0)(-s_0)^3 + (1 - s_0)^3 s_0\big]\, h_3 \qquad (13) $$

where σ²y is described by (10) and h3 is given by:

$$ h_3 = \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} \sum_{r=-\infty}^{\infty} h(k)h(l)h(r) \qquad (14) $$

D. Kurtosis

The sample kurtosis of a scanned symbol is given by

$$ \mu_{\gamma_{2y}} = E\left\{ \frac{1}{\sigma_y^4 N} \sum_{n=1}^{N} \big[\alpha s_0 + \alpha\eta_2(n) * h(n) + \eta_1(n) * h(n) + \eta_3(n) - \bar{y}\big]^4 \right\} $$
$$ = \frac{1}{\sigma_y^4 N} \sum_{n=1}^{N} E\Big\{ \big[\alpha\eta_2(n) * h(n) + \eta_1(n) * h(n) + \eta_3(n)\big]^4 \Big\} $$
$$ = \frac{1}{\sigma_y^4 N} \sum_{n=1}^{N} \Big( E\big\{\alpha^4[\eta_2(n) * h(n)]^4\big\} + 6E\big\{\alpha^2[\eta_2(n) * h(n)]^2[\eta_1(n) * h(n)]^2\big\} + 6E\big\{\alpha^2[\eta_2(n) * h(n)]^2\eta_3^2(n)\big\} + E\big\{[\eta_1(n) * h(n)]^4\big\} + 6E\big\{[\eta_1(n) * h(n)]^2\eta_3^2(n)\big\} + E\big\{\eta_3^4(n)\big\} \Big) \qquad (15) $$

Appendix II derives the term E{[η2(n) ∗ h(n)]⁴} in the equation above, yielding

$$ \mu_{\gamma_{2y}} = \frac{1}{\sigma_y^4} \Big( (3\sigma_\alpha^4 + 6\sigma_\alpha^2\mu_\alpha^2 + \mu_\alpha^4)\big[(1 - s_0)(-s_0)^4 + (1 - s_0)^4 s_0\big]\, h_4 + 6(\sigma_\alpha^2 + \mu_\alpha^2)\sigma^2_{\eta_1}\sigma^2_{\eta_2}\, r_h^2(0) + 6(\sigma_\alpha^2 + \mu_\alpha^2)\sigma^2_{\eta_2}\sigma^2_{\eta_3}\, r_h(0) + 3\sigma^4_{\eta_1}\, r_h^2(0) + 6\sigma^2_{\eta_1}\sigma^2_{\eta_3}\, r_h(0) + 3\sigma^4_{\eta_3} \Big) \qquad (16) $$

where h4 is given by:

$$ h_4 = \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} \sum_{r=-\infty}^{\infty} \sum_{s=-\infty}^{\infty} h(k)h(l)h(r)h(s) \qquad (17) $$
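The post-PS expressions (10), (13) and (16) can be evaluated numerically as sketched below. The sketch assumes a one-dimensional blur response h, takes the sums h3 and h4 of (14) and (17) as powers of Σ h(k), and reads r_h²(0) as [r_h(0)]²; the gain and noise values are the typical ones reported in Section VII. These are reading choices for the reconstructed equations, not additional results.

import numpy as np

def ps_moments(s0, h, mu_a=0.8, sig_a=0.03, sig_e1=0.018, sig_e3=0.01):
    """Theoretical post-PS variance, skewness and kurtosis of a symbol with
    input luminance s0, following (10), (13) and (16)."""
    h = np.asarray(h, dtype=float)
    rh0 = np.sum(h**2)                    # autocorrelation of h at lag 0
    h3, h4 = np.sum(h)**3, np.sum(h)**4   # sums in (14) and (17)
    var_e2 = s0 - s0**2                   # sigma_eta2^2 = sigma_b^2, eq. (8)
    m3 = (1 - s0)*(-s0)**3 + (1 - s0)**3 * s0     # E{eta2^3}, eq. (22)
    m4 = (1 - s0)*(-s0)**4 + (1 - s0)**4 * s0     # E{eta2^4}, eq. (26)
    var_y = (mu_a**2 + sig_a**2)*var_e2*rh0 + sig_e1**2*rh0 + sig_e3**2   # (10)
    skew_y = (3*sig_a**2*mu_a + mu_a**3) * m3 * h3 / var_y**1.5           # (13)
    kurt_y = ((3*sig_a**4 + 6*sig_a**2*mu_a**2 + mu_a**4)*m4*h4
              + 6*(sig_a**2 + mu_a**2)*sig_e1**2*var_e2*rh0**2
              + 6*(sig_a**2 + mu_a**2)*var_e2*sig_e3**2*rh0
              + 3*sig_e1**4*rh0**2 + 6*sig_e1**2*sig_e3**2*rh0
              + 3*sig_e3**4) / var_y**2                                   # (16)
    return var_y, skew_y, kurt_y

# Example with a hypothetical 3-tap blur:
# print(ps_moments(0.84, h=np.ones(3) / 3.0))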

V. COMBINING THE METRICS

Extending the approach which combines the metrics average and variance discussed in [3], it is also possible to combine the skewness and the kurtosis along with other metrics of a received symbol into a single metric to reduce the detection error rate. Considering a stochastic interpretation of the detection metrics, the result of each metric is approximately normally distributed. For this reason, in this work the Bayes classifier


[15] is employed to combine the metrics, due to its optimal properties for normally distributed patterns [15]. Results of the detection using the Bayes classifier are given in Section VII. The reader is referred to [3] for an example on how to apply this classifier in the scenario discussed in this paper.

Although some detection metrics have better performance than others, because all the first four statistical moments are useful to separate classes, combining them increases the distance between classes and consequently reduces the detection error rate [16], at the expense of increasing computational complexity. It is also possible to combine useful spectral or other non-statistical metrics, although this is not discussed in this paper.
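As a concrete illustration of the combination step, the sketch below extracts the four moment metrics from a scanned symbol and feeds them to a Gaussian quadratic discriminant (scikit-learn's QuadraticDiscriminantAnalysis), used here as a stand-in for the Bayes classifier of [15]; the calibration data names (train_symbols, train_bits) are hypothetical placeholders, not data from the paper.

import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def moment_features(symbol):
    """First four sample moments of a scanned symbol region, i.e. the
    detection metrics discussed in Sections III and IV."""
    x = np.asarray(symbol, dtype=float).ravel()
    return [x.mean(), x.var(), skew(x), kurtosis(x)]

# Hypothetical calibration symbols printed and scanned with known bits:
# X = np.array([moment_features(s) for s in train_symbols])
# clf = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X, train_bits)
# detected_bit = clf.predict([moment_features(received_symbol)])[0]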

VI. A PRACTICAL AUTHENTICATION PROTOCOL

Taking advantage of the reduced error rate provided by combining the detection metrics, a practical protocol for document authentication based on TLM [5], [6] is proposed. It is similar to the system proposed in [33], but with an alternative detection method. Instead of employing TLM as a side message transmitter, the modulated characters (see Figure 1(b)) are used to ensure that no character in the document has been altered. This is achieved by combining cryptography, optical character recognition (OCR) [21], [22], and the detection of characters with modulated luminances, discussed in this paper. Notice, however, that the proposed system can also be used to authenticate digital documents that are not subject to the PS channel.

The proposed framework for authentication scrambles the binary representation of the original text string with a key that depends on the string. The resulting scrambled vector is used to create another vector of dimension equal to the number of characters in the document. This is used as a rule to modulate each character individually, as illustrated in Figure 1(b).

A related approach for image authentication, in which a digital watermark [10] is generated with a key that is a function of some feature f in the original image, has been proposed in the literature, as in [19], [20], [24], for example. To avoid f being modified by the embedding of the watermark itself, hence frustrating the watermark detection process, only characteristics of a portion of the image must be used. It is possible, for example, to extract features from the low-frequency components, and to embed the watermark in the high-frequency components, as discussed in [10].

In contrast, in the authentication system proposed here, the modified characters' luminances do not alter the feature used to generate the permutation key, which is the characters' "meanings." The system is described in the following.

A. Encryption

• Let vector c = [c1, c2, . . . , cK] of size K represent a text string with K characters.
• Let vector s = [s1, s2, . . . , sK] represent the luminances of characters [c1, c2, . . . , cK], respectively.
• Let ci ∈ Ω (Ω = {a, b, c, . . . , X, Y, Z}, for example), where Ω has cardinality S.
• Let cbi be the binary representation of symbol ci.
• Let cb be the binary representation of c, where cb has size |cb| = K log2 S.
• Let κ = f(cb) be a function of cb. κ is used as a key to generate a pseudo-random sequence (PRS) k, such that the PRS's are ideally orthogonal for different keys κ.
• Let c′b = cb ⊕ k, where ⊕ represents the "exclusive or" (XOR) logical operation.
• Let M be a function that maps c′b, with |cb| bits, to another vector w, with K bits.

Fig. 5: Encryption block diagram. Block 's/b' represents string-to-binary conversion. Block 'M' represents a mapping of c′b from |cb| bits to K bits. The symbol ⊕ represents the "exclusive or" (XOR) logical operation.

In order to provide security, w is encrypted with the private key of a public key cryptosystem [17]. Public key cryptosystems use two different keys, one for encryption, κe, and one for decryption, κd. The private key κe is only available to users who are allowed to perform the authentication process. On the other hand, anyone can have access to the public key κd to only check whether a document is authentic, without the ability to generate a new authenticated document.

Let we be the encrypted version of w based on the key κe, using a public key encryption scheme such as RSA [17], for example. To authenticate the text document, vector s (which represents the luminances of the characters in the document) is modified such that s = we. Therefore, the document is authenticated by setting the luminance of each character ci equal to si.
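A toy sketch of the encryption path of Fig. 5 is given below. The text leaves f, the PRS generator and the mapping M unspecified, so a SHA-256 hash, a seeded pseudo-random generator and an XOR-folding map are used purely as placeholders; the RSA encryption of w is stubbed out (we = w), and the two-level luminance alphabet is borrowed from Experiment 4.

import hashlib
import numpy as np

LUM = {0: 0.95, 1: 0.84}    # hypothetical two-level luminance alphabet

def to_bits(text):
    """8-bit ASCII representation cb of the text string c."""
    return np.unpackbits(np.frombuffer(text.encode("ascii"), dtype=np.uint8))

def scramble(cb):
    """kappa = f(cb) via a hash (one possible choice of f), then
    c'b = cb XOR k, with k a PRS seeded by kappa."""
    kappa = int.from_bytes(hashlib.sha256(cb.tobytes()).digest()[:4], "big")
    k = np.random.default_rng(kappa).integers(0, 2, size=cb.size, dtype=np.uint8)
    return cb ^ k

def mapping_M(cb_prime, K):
    """Illustrative mapping M from |cb| bits to K bits: XOR-fold each
    character's 8 scrambled bits into a single bit."""
    return cb_prime.reshape(K, -1).sum(axis=1) % 2

def encrypt_luminances(text):
    """Per-character luminance vector s = we. The private-key encryption of
    w (e.g. RSA signing) is omitted in this sketch, so we = w."""
    w = mapping_M(scramble(to_bits(text)), len(text))
    we = w                               # placeholder for the RSA-encrypted w
    return np.array([LUM[bit] for bit in we])

# s = encrypt_luminances("INTERNATIONAL ...")   # one luminance per character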

B. Decryption

In the verification process, OCR is applied to the printed document. In addition, the luminance of each character is determined using the metrics proposed in this paper. Therefore, when testing for the authenticity of the document, one has access to a received ĉ and a received ŝ, where ĉ and ŝ represent the received versions of vectors c and s, respectively. It is assumed that the conditions are controlled such that no OCR or luminance detection errors occur. Moreover, one has access to a public key κd for decryption in the RSA algorithm and a scrambling key κ = f(ĉb), which depends on ĉ.

Using the public key κd, it is possible to decrypt ŝ = we into w.


Using κ, it is possible to scramble ĉb (the binary representation of ĉ), yielding ĉ′b. Applying the same mapping rule M of the encryption process to ĉ′b yields a new vector w′.

If w′ = w, the document is assumed authentic. Else, it is assumed that one or more characters have been altered. A block diagram of the authentication test process is given in Fig. 6.

Fig. 6: Decryption block diagram. Block 's/b' represents string-to-binary conversion. Block 'M' represents a mapping of c′b from |cb| bits to K bits. The symbol ⊕ represents the "exclusive or" (XOR) logical operation.

If an attacker changes one or more characters in the document such that ĉ ≠ c, w and w′ are two completely different sequences (quasi-orthogonal) with very high probability, failing the authentication test. A practical example of the proposed system is given in Section VII.

Although OCR has been included in the detection process assuming that the document has been printed and scanned, the proposed authentication protocol can be applied to digital documents. It does not require the use of appended files and it is robust to format conversions, such as .pdf to .ps, for example. Hence, unlike a digital signature, which protects the binary codes of the documents, the system proposed here protects the visual content or the meaning of the document.
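For completeness, a matching sketch of the verification path of Fig. 6 is shown below. It reuses the placeholder LUM, to_bits, scramble and mapping_M helpers from the encryption sketch, again stubs out the public-key decryption of s = we, and assumes error-free OCR and luminance detection, as the text does.

import numpy as np

def verify(ocr_text, detected_luminances):
    """Equality test of Fig. 6: recover w from the detected character
    luminances, recompute w' from the OCR'd text, and compare them."""
    # Nearest alphabet level -> bit (luminance detection done upstream).
    w = np.array([min(LUM, key=lambda bit: abs(LUM[bit] - lum))
                  for lum in detected_luminances])
    # Recompute w' exactly as in the encryption sketch.
    w_prime = mapping_M(scramble(to_bits(ocr_text)), len(ocr_text))
    return bool(np.array_equal(w, w_prime))

# verify(ocr_text, detected_luminances) is True only if no character changed.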

VII. EXPERIMENTS

The goal of this section is to validate experimentally the analyses of Sections III and IV and to illustrate that higher order moments can be used to detect a luminance change in printed symbols.

During the experiments, the noise and the distortion parameters of the PS channel vary depending on the printing and scanning devices used. The experiments are conducted with different combinations of printers and scanners, according to the legend in Table I. The printing and scanning resolutions were set to 300 dots/inch and pixels/inch, respectively. Typical values for the parameters in (1) are ση1 = 0.018, ση3 = 0.01, µα = 0.8, σα = 0.03.

TABLE I: Combinations of printers and scanners used in the experiments.

Printer        Scanner        Legend
HP IJ-855C     Genius HR6X    C1
HP IJ-855C     HP 2300C       C2
HP IJ-855C     HP SJ-5P       C3
HP IJ-870Cxi   Genius HR6X    C4
HP IJ-870Cxi   HP 2300C       C5
HP IJ-870Cxi   HP SJ-5P       C6
HP LJ-1100     Genius HR6X    C7
HP LJ-1100     HP 2300C       C8
HP LJ-1100     HP SJ-5P       C9

As discussed in [3], comparing the frequency responses of an original digital image and its PS version after several experiments, the response h(m,n) is well represented by a traditional Butterworth low-pass filter described by

$$ H(f_1, f_2) = \frac{1}{1 + [F(f_1, f_2)/F_0]^{2Q}} \qquad (18) $$

where Q is the filter order, F0 is the cutoff frequency, and F(f1, f2) is the Euclidean distance from point (f1, f2) to the origin (center) of the frequency spectrum. Although different filters could be used, for this model the filter order Q and cut-off frequency F0 which yield the best approximation of the frequency response of the process are determined experimentally through curve fitting. In the tests, these parameters are given by Q = 1 and F0 = 0.17 for the devices used.

Using the noise, gain and blurring filter parameters described above, a character or symbol distorted with the proposed PS model is perceptually similar to an actual printed and scanned character, as illustrated in [3].
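A sketch of the fitted blur is shown below: it builds the Butterworth response of (18) on a discrete frequency grid with the fitted values Q = 1 and F0 = 0.17 and applies it in the frequency domain. The grid convention and the FFT-based application are implementation assumptions, not part of the model in [3].

import numpy as np

def butterworth_lowpass(shape, F0=0.17, Q=1):
    """Frequency response H(f1, f2) of (18) on a centered grid of normalized
    frequencies; Q and F0 default to the values fitted in the text."""
    f1 = np.fft.fftshift(np.fft.fftfreq(shape[0]))
    f2 = np.fft.fftshift(np.fft.fftfreq(shape[1]))
    F = np.sqrt(f1[:, None]**2 + f2[None, :]**2)   # distance to spectrum center
    return 1.0 / (1.0 + (F / F0)**(2 * Q))

def apply_ps_blur(img):
    """Apply the fitted low-pass response in the frequency domain, one way of
    realizing h(m, n) in the PS model (other filters could be used)."""
    H = np.fft.ifftshift(butterworth_lowpass(img.shape))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * H))

# blurred = apply_ps_blur(b.astype(float))   # b from the halftoning sketch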

Regarding the perceptual impact of the method, if the modulation intensity exceeds a given perceptual threshold, it becomes less difficult for a human viewer to notice the modifications. Nevertheless, unlike regular images (such as natural photos), where the "meaning" of the image depends on the values of the pixels, in text documents the "meaning" is given by the shape of the characters, such that letters can be recognized and words interpreted. In this sense, characters could be of any color and, as long as they are readable, the information content of the modified document will be exactly the same as that of the original all-black character document. The relevance of the perceptual constraint is that it refrains the reader from being bothered or annoyed while reading, helping to ensure that the reader's attention is not drawn to the fact that the characters are modified.

A. Experiment 1

The effect of a halftone skewness level that depends on the input luminance is illustrated in Fig. 7, where two curves are presented. The black curve ('Theoretical') represents the theoretical skewness presented in (6). The gray curve ('Bayer') represents the skewness of a halftone block (before PS)


Fig. 7: The effect of skewness dependent on the input luminance ('Theoretical' vs. 'Bayer' curves).

Fig. 8: The effect of kurtosis dependent on the input luminance ('Theoretical' vs. 'Bayer' curves).

generated using the Bayer dithering matrix [18]. Similar experiments are presented regarding the kurtosis, as shown in Fig. 8. These figures illustrate that the analyses of Section III are in accordance with the results obtained from a practical halftone matrix.

B. Experiment 2

This experiment illustrates the validity of the channel model described in Section II and the expected values of the higher order moments as a function of the input luminance, determined analytically in Section IV. The effect of a printed and scanned variance level that depends on the input luminance is illustrated in Fig. 9, where two curves are presented. The black curve ('Theoretical') represents the theoretical variance determined in (10). The gray curve ('Experimental') represents the variance of printed and scanned blocks, originally of size 32 × 32. Similar experiments are presented regarding the skewness and the kurtosis determined in (13) and (16), as shown in Figures 10 and 11, respectively. The 'Experimental' curve in these figures corresponds to the averaging of the results obtained for the nine combinations C1-C9 of PS devices.

Figure 12(a) shows the histogram of a PS block generated from a constant luminance value s0 = 0. Similarly, histograms for s0 = 90 and s0 = 180 are presented in Figures 12(b) and 12(c), respectively, illustrating a change in the shape of the distribution.

C. Experiment 3

In this experiment a multi-level 2-D bar code is printed with a sequence of 56000 symbols with four possible

Fig. 9: The effect of variance dependent on the input luminance, after PS ('Theoretical' vs. 'Experimental' curves).

Fig. 10: The effect of skewness dependent on the input luminance, after PS ('Theoretical' vs. 'Experimental' curves).

Fig. 11: The effect of kurtosis dependent on the input luminance, after PS ('Theoretical' vs. 'Experimental' curves).

TABLE II: Experimental error rates for 2-D bar codes.

Metric                    # of Errors    Error Rate
Average (µ)               667            1.19 × 10^-2
Kurtosis (γ2)             1860           3.32 × 10^-2
Comb. (µ, σ²)             114            2.04 × 10^-3
Comb. (µ, γ1)             259            4.63 × 10^-3
Comb. (µ, σ², γ1)         50             8.93 × 10^-4
Comb. (µ, σ², γ1, γ2)     22             3.93 × 10^-4

TABLE III: Experimental error rates for text watermarking.

Metric                    # of Errors    Error Rate
Average (µ)               157            1.03 × 10^-2
Variance (σ²)             144            9.48 × 10^-3
Skewness (γ1)             280            1.84 × 10^-2
Kurtosis (γ2)             328            2.16 × 10^-2
Comb. (µ, σ²)             14             9.22 × 10^-4
Comb. (µ, γ1)             27             1.78 × 10^-3
Comb. (µ, σ², γ1)         8              5.27 × 10^-4
Comb. (µ, σ², γ1, γ2)     3              1.98 × 10^-4


(a) Histogram for s0 = 0. (b) Histogram for s0 = 0.65. (c) Histogram for s0 = 0.3.

Fig. 12: Histograms with different shapes illustrating the change in the skewness and in the kurtosis of a PS region, according to the input luminance s0.

luminance levels (2 bits/symbol) drawn from the alphabet {0.08, 0.34, 0.65, 0.95}. Optimum values for the alphabet depend on the PS devices used, as discussed in [1] and [2], where the authors present a study on multilevel coding for the PS channel assumed. The original size (prior to printing) of each symbol is 8 × 8, corresponding to the size of one halftone block. Table II shows the obtained bit error rates when performing the detection using the four suggested metrics (average, variance, skewness, kurtosis) separately. This table also presents the result of combining the metrics with the Bayes classifier, illustrating a smaller error rate. Because the variance and the kurtosis are symmetric around the middle of the luminance range, they cannot be used alone as detection metrics.

In [2], for example, in an experiment comparable to the first row (average detection) in Table II, the observed error rate was 1.817 × 10^-2. This rate is significantly improved when the multilevel coding with multistage decoding (MLC/MSD) proposed in [2] is employed. Notice, however, that such more advanced coding/decoding methods can also be applied to the higher order statistics proposed in this work.

The size of the 8 × 8 cell used is comparable to traditional non-multilevel 2-D bar codes used in commercial applications. In [2] the authors compare the rate (in bytes per square inch)

Fig. 13: Decision boundary combining two metrics (variance versus luminance, classes 'Bit 0' and 'Bit 1'). Note the dispersion and the offset caused by the distortions of the PS channel.

of well-known 2-D bar codes used in practice, such as Data Matrix, Aztec Code and QR code, to the rate of multilevel 2-D bar codes. The multilevel codes show a superior performance in comparison to the non-multilevel codes, when the modulation levels of the multilevel codes are properly assigned according to the print-scan devices employed.

D. Experiment 4

This experiment implemented the text hardcopy watermarking system [5], [6], which embeds data by performing modifications in the luminances of characters, respecting a perceptual transparency requirement. A sequence of 15180 characters (as in 'abcdef...') is printed and scanned. The font type tested was 'Arial', size 12. The luminances of the characters were randomly modified to {0.95, 0.84} with equal probability, where 0.95 corresponds to bit 0 and 0.84 corresponds to bit 1. To determine to which class (bit 0 or bit 1) each received character belongs, the four metrics discussed so far were tested. The resulting error rates are given in Table III. An example of employing the Bayes classifier to combine two metrics (the average and the variance) is given in Fig. 13. Note that the decision boundary combining the metrics yields a reduced error rate, in comparison to the boundaries based solely on the average or the variance. Similarly, Fig. 14 illustrates a surface separating the two classes in a 3-D space formed by the average, the variance, and the skewness.

Notice that this small error rate is achieved using standard consumer printing and scanning devices and regular paper. Using professional equipment, it is reasonable to assume that the error rate is close to zero in small documents (such as identification cards, passports, etc.), especially if complete perceptual transparency is not a requirement. These assumptions make the authentication protocol proposed in Section VI practical.

E. Experiment 5

This section illustrates two applications of the authentication protocol proposed in Section VI: one in printed form and one in digital form.


Fig. 14: Decision boundary combining three metrics (average, variance and skewness; classes 'Bit 0' and 'Bit 1').

Fig. 15: ID authenticated using the protocol proposed in Section VI. Notice the modified character luminances.

1) ID Card: An identification card is authenticated using TLM, as shown in Fig. 15. Notice that the intensity changes are visible to illustrate the underlying process. A modified version of the card is generated, where the last digit in the 'Valid Until' field is modified. To reduce the probability of OCR and luminance detection errors, only numbers and upper- and lower-case characters of the alphabet are considered in this test.

For the non-tampered document in Fig. 15, using an 8-bit ASCII table to represent the characters [17], the following parameters (discussed in Section VI) are obtained:

• K = 124.
• c = [I, N, T, E, R, N, . . . , 2, 0, 0, 8].
• cb = [0100100101001110, . . . , 00111000].
• κ = f(cb) = 1176020.
• k = [0100110110101110, . . . , 00010101].
• c′b = cb ⊕ k = [0000010011100000, . . . , 00101101].
• w = [11100, . . . , 0110].
• we = [01101, . . . , 0111].

we is composed of K elements, corresponding to the number of characters in the document. The document is authenticated by altering the luminance of each character ci in c to wei in we. Notice that the characters' luminances in the document in Fig. 15 are modified according to we.

After printing, the parameters are again obtained, based on the tampered-with printed document in Fig. 16. Because the last digit of the document is different, w and w′ are two completely different sequences, failing the equality test shown in Fig. 6.

2) Paper Title: Some of the characters' luminances in the title of this paper are slightly modulated to a gray level. This can be verified using any screen capture tool and common image processing software. Increasing the luminance gain to a visible level, the text becomes:

Fig. 16: Scanned ID. The last digit in the birth date is modified from 8 to 9, as indicated by the arrow.

Document Image Processing for Paper Side Communications

Using the 8-bit ASCII standard, the parameters for this sample are:

• K = 49.
• c = [D, o, c, u, m, e, . . . , i, o, n, s].
• cb = [0100010001101111, . . . , 01110011].

which yields the modulation string

we = [10011101, . . . , 1011].

as illustrated in the characters' luminances above. If the 'D' in 'Document' is modified to 'd':

• K = 49.
• c = [d, o, c, u, m, e, . . . , i, o, n, s].
• cb = [0110010001101111, . . . , 01110011].

a completely different string we is generated:

we = [01100100, . . . , 1000].

and the authentication is not verified.

VIII. CONCLUSIONS

This paper reduces the detection error rate of printed symbols, in cases where the luminances of the symbols depend on a message to be transmitted through the PS channel. It has been observed that, as a consequence of modifying the luminances, the halftoning in the printing process also modifies the higher order statistical moments of a symbol, such as the variance, the skewness and the kurtosis. Therefore, in addition to the average luminance and spectral metrics, these moments can also be used to detect a received symbol. This is achieved without any modifications in the transmitting function. A PS channel model which accounts for the halftoning noise is described. Analyses determining the relationship between the average luminance and the higher order moments of a halftone image are presented, justifying the use of the new detection metrics. In addition to the extended detection metrics, this paper also proposes an authentication protocol for printed and digital documents, where it is possible to determine whether one or more characters have been modified in a text document. The experiments illustrated: the successful applicability of the new metrics; that a reduced error rate is achieved when the metrics are combined according to the Bayes classifier; two possible applications of the proposed authentication protocol,


using as an example the title of this paper. Notice that the contributions presented can be combined with other methods, serving as a practical alternative for document authentication.

APPENDIX I

This appendix derives the result presented in (13).

$$ \mu_y = E\big\{[\eta_2(n) * h(n)]^3\big\} = E\left\{ \left( \sum_{k=-\infty}^{\infty} h(k)\eta_2(n-k) \right)^3 \right\} = E\left\{ \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} \sum_{r=-\infty}^{\infty} h(k)h(l)h(r)\, \eta_2(n-k)\eta_2(n-l)\eta_2(n-r) \right\} \qquad (19) $$

Let

$$ h_3 = \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} \sum_{r=-\infty}^{\infty} h(k)h(l)h(r) \qquad (20) $$

Recalling that η2 is uncorrelated noise and η2 ∈ {−s0, 1 − s0} with probabilities {1 − s0, s0} yields

$$ E\{\eta_2(n-k)\eta_2(n-l)\eta_2(n-r)\} \neq 0 \iff k = l = r \qquad (21) $$

and

$$ E\{\eta_2^3\} = \int_{-\infty}^{\infty} \eta_2^3 f_{\eta_2}(\eta_2)\, d\eta_2 = \int_{-\infty}^{\infty} \eta_2^3 \big[\delta(\eta_2 + s_0)(1 - s_0) + \delta(\eta_2 - 1 + s_0)s_0\big]\, d\eta_2 = (-s_0)^3(1 - s_0) + (1 - s_0)^3 s_0 \qquad (22) $$

Therefore, equation (19) can be written as

$$ \mu_y = h_3\big[(-s_0)^3(1 - s_0) + (1 - s_0)^3 s_0\big] \qquad (23) $$

where h3 is given by (20).

APPENDIX II

This appendix derives the result presented in (16).

$$ \mu_y = E\big\{[\eta_2(n) * h(n)]^4\big\} = E\left\{ \left( \sum_{k=-\infty}^{\infty} h(k)\eta_2(n-k) \right)^4 \right\} = E\left\{ \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} \sum_{r=-\infty}^{\infty} \sum_{s=-\infty}^{\infty} h(k)h(l)h(r)h(s)\, \eta_2(n-k)\eta_2(n-l)\eta_2(n-r)\eta_2(n-s) \right\} \qquad (24) $$

Let

$$ h_4 = \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} \sum_{r=-\infty}^{\infty} \sum_{s=-\infty}^{\infty} h(k)h(l)h(r)h(s) \qquad (25) $$

Recalling that η2 is uncorrelated noise and η2 ∈ {−s0, 1 − s0} with probabilities {1 − s0, s0} yields

$$ E\{\eta_2^4\} = \int_{-\infty}^{\infty} \eta_2^4 f_{\eta_2}(\eta_2)\, d\eta_2 = \int_{-\infty}^{\infty} \eta_2^4 \big[\delta(\eta_2 + s_0)(1 - s_0) + \delta(\eta_2 - 1 + s_0)s_0\big]\, d\eta_2 = (-s_0)^4(1 - s_0) + (1 - s_0)^4 s_0 \qquad (26) $$

Therefore, (24) can be written as

$$ \mu_y = h_4\big[(-s_0)^4(1 - s_0) + (1 - s_0)^4 s_0\big] \qquad (27) $$

REFERENCES

[1] N. D. Quintela and F. Pérez-González, "Visible encryption: Using paper as a secure channel," in Proc. of SPIE, USA, 2003.
[2] R. Villán, S. Voloshynovskiy, O. Koval, and T. Pun, "Multilevel 2D bar codes: towards high capacity storage modules for multimedia security and management," IEEE Transactions on Information Forensics and Security, vol. 1, no. 4, pp. 405-420, December 2006.
[3] P. Borges and J. Mayer, "Text luminance modulation for hardcopy watermarking," Signal Processing, vol. 87, pp. 1754-1771, 2007.
[4] A. K. Bhattacharjya and H. Ancin, "Data embedding in text for a copier system," in Proc. of IEEE Int'l Conf. on Image Processing, vol. 2, 1999.
[5] R. Villán, S. Voloshynovskiy, O. Koval, J. Vila, E. Topak, F. Deguillaume, Y. Rytsar and T. Pun, "Text data-hiding for digital and printed documents: theoretical and practical considerations," in Proc. of SPIE, Electronic Imaging, USA, 2006.
[6] P. V. Borges and J. Mayer, "Document watermarking via character luminance modulation," in IEEE Int'l Conf. on Acoustics, Speech and Signal Processing, May 2006.
[7] R. A. Ulichney, "Dithering with blue noise," in Proc. of the IEEE, vol. 76, no. 1, 1988.
[8] R. A. Ulichney, Digital Halftoning, 1988.
[9] K. Solanki, U. Madhow, B. S. Manjunath, and S. Chandrasekaran, "Modeling the print-scan process for resilient data hiding," in Proc. of SPIE, Electronic Imaging, USA, 2005.
[10] I. J. Cox, M. L. Miller, and J. A. Bloom, Digital Watermarking, Morgan Kaufmann, 2002.
[11] S. Voloshynovskiy, O. Koval, F. Deguillaume, and T. Pun, "Visual communications with side information via distributed printing channels: extended multimedia and security perspectives," in Proc. of SPIE, Electronic Imaging 2004, San Jose, USA, January 18-22, 2004.
[12] M. Norris and E. H. B. Smith, "Printer modeling for document imaging," in Proc. Int'l Conf. on Imaging Science, Systems and Technology, USA, 2004.
[13] T. Amano, "A feature calibration method for watermarking of document images," in IEEE Proc. of the Fifth Int'l Conf. on Document Analysis and Recognition, ICDAR '99, 20-22 Sept. 1999.
[14] D. Manolakis, V. Ingle, and S. Kogon, Statistical and Adaptive Signal Processing, McGraw-Hill, 2000.
[15] R. Duda, P. Hart, and D. Stork, Pattern Classification, Wiley-Interscience, 2000.
[16] S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, 2006.
[17] B. Sklar, Digital Communications, Prentice-Hall, 2001.
[18] B. E. Bayer, "An optimum method for two-level rendition of continuous tone pictures," in IEEE Int'l Conf. on Communications, 1973.
[19] J. Cannons and P. Moulin, "Design and statistical analysis of a hash-aided image watermarking system," IEEE Trans. on Image Processing, vol. 13, no. 10, pp. 1393-1408, Oct. 2004.
[20] X. Li and X. Xue, "Fragile authentication watermark combined with image feature and public key cryptography," in 7th International Conference on Signal Processing, ICSP '04, vol. 3, 31 Aug.-4 Sept. 2004.
[21] S. Mori, C. Y. Suen, and K. Yamamoto, "Historical review of OCR research and development," Proceedings of the IEEE, vol. 80, no. 7, pp. 1029-1058, July 1992.
[22] Y. Xu and G. Nagy, "Prototype extraction and adaptive OCR," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 21, no. 12, pp. 1280-1296, Dec. 1999.
[23] A. M. Alattar and O. M. Alattar, "Watermarking electronic text documents containing justified paragraphs and irregular line spacing," Proc. of SPIE, vol. 5306, June 2004.
[24] P. W. Wong and N. Memon, "Secret and public key image watermarking schemes for image authentication and ownership verification," IEEE Trans. on Image Processing, vol. 10, no. 10, Oct. 2001.
[25] H. Yang and A. C. Kot, "Text document authentication by integrating inter character and word spaces watermarking," in Proc. IEEE Int'l Conf. on Multimedia and Expo, 2004.
[26] J. T. Brassil, S. Low, and N. F. Maxemchuk, "Copyright protection for the electronic distribution of text documents," Proc. of the IEEE, vol. 87, no. 7, pp. 1181-1196, July 1999.
[27] D. Huang and H. Yan, "Interword distance changes represented by sine waves for watermarking text images," IEEE Trans. on Circuits and Systems for Video Technology, vol. 11, no. 12, pp. 1237-1245, Dec. 2001.
[28] M. Wu and B. Liu, "Data hiding in binary image for authentication and annotation," IEEE Trans. on Multimedia, August 2004.
[29] M. Mese and P. P. Vaidyanathan, "Recent advances in digital halftoning and inverse halftoning methods," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 49, no. 6, pp. 790-805, June 2002.
[30] Q. Mei, E. K. Wong, and N. Memon, "Data hiding in binary text documents," in SPIE Proc. Security and Watermarking of Multimedia Contents III, vol. 2, San Jose, CA, January 2001.
[31] M. Jiang, E. K. Wong, N. Memon, and X. Wu, "Steganalysis of degraded document images," in IEEE Workshop on Multimedia Signal Processing, October 2005.
[32] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Volume I, Prentice Hall, 1993.
[33] R. Villán, S. Voloshynovskiy, O. Koval, F. Deguillaume, and T. Pun, "Tamper-proofing of electronic and printed text documents via robust hashing and data-hiding," in Proc. of SPIE-IST Electronic Imaging 2007, Security, Steganography, and Watermarking of Multimedia Contents IX, San Jose, USA, 2007.