Towards Image Data Hiding via Facial Stego Synthesis With Generative Model

Li Dong1,2, Jie Wang1,2, Rangding Wang1,2, Yuanman Li3 and Weiwei Sun4

1 Faculty of Electrical Engineering and Computer Science, Ningbo University, Zhejiang, China, 315211
2 Southeast Digital Economic Development Institute, Zhejiang, China, 324000
3 Shenzhen University, Guangdong, China, 518061
4 Alibaba Group, Zhejiang, China, 310052

Abstract
Stego synthesis-based data hiding aims to directly produce a plausible natural image to convey a secret message. However, most existing works neglect the possible communication degradations and forensic actions that commonly occur in practice. In this paper, we devise a generative adversarial network (GAN)-based framework to synthesize facial stego images. The framework consists of four components: a generator, an extractor, a discriminator and a forensic network. Specifically, the generator is deployed to generate a realistic facial stego image from the secret message and key, while the extractor aims at extracting the secret message from the stego image with the provided secret key. To combat forensics, we explicitly integrate a forensic network into the proposed framework, which is responsible for guiding the update of the generator. Three degradation layers are further incorporated, enforcing the generator to characterize the communication degradations. Experimental results demonstrate that the proposed framework can accurately extract the secret message and effectively resist forensic detection and certain degradations, while attaining realistic facial stego images.

Keywords
data hiding, stego synthesis, generative adversarial network

1. Introduction

Data hiding aims to embed a secret message into a cover signal without incurring the awareness of an adversary. It is widely used in many applications, e.g., covert communication [1] and multimedia data protection [2, 3]. The primitive ad-hoc Least-Significant Bit (LSB) method replaces the bit in the least significant bit-plane of each pixel with a secret bit, while modern data hiding methods attempt to eliminate the traces of the data hiding action and improve the steganographic capacity. For example, content-adaptive steganography [1] designs a sophisticated distortion function according to prior knowledge and uses Syndrome-Trellis coding to embed the secret message. Recently, neural network-based data hiding has become one of the active research directions. Baluja [4] employed convolutional neural networks to hide an entire secret image inside a cover image in an end-to-end fashion. The work SSGAN [5] attempted to exploit GAN to synthesize a cover image that is more suitable for subsequent steganographic data embedding. ASDL-GAN [6] integrated content-adaptive steganography and GAN, in which the generator was able to produce the modification probability maps.

International Workshop on Safety & Security of Deep Learning, 21st-26th August, 2021
[email protected] (L. Dong); [email protected] (J. Wang); [email protected] (R. Wang); [email protected] (Y. Li); [email protected] (W. Sun)

© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org

The methods HayersGAN [7], HiDDeN [8] and SteganoGAN [9] all designed encoder-decoder-like frameworks based on GAN, which can automatically learn suitable areas for embedding the secret bitstream message.

In recent years, adversarial examples for neural networks have met data hiding, continuously drawing extensive attention from the community. Some studies, e.g., [10, 11], found that adding slight perturbations to the input data can paralyze the prediction capability of learning-based classifiers. As the opponent of data hiding, steganalysis aims to expose the data hiding in a stego signal and usually involves machine-learning classifiers. Therefore, it is possible for data hiding methods to bypass steganalysis by borrowing strategies from the work on adversarial examples. Tang et al. [12] presented the Adversarial Embedding (ADV-EMB) method, which adjusts the modification cost of image elements according to the gradients back-propagated from the target steganalytic neural network. The constructed adversarial stego can effectively fool the steganalytic network, revealing the vulnerability of deep learning-based steganalyzers.

Note that all the aforementioned data hiding techniques are based on cover modification; their common characteristic is that they cannot avoid modifying the given cover image, which inevitably leaves artifacts exposed to steganalysis. On the contrary, stego synthesis-based data hiding, e.g., [13, 14], synthesizes the stego image directly from the secret message, which poses more challenges for steganalysis.

Under this concept, traditional methods tried to produce stego images based on hand-crafted designs. Although the capacity was relatively high, they were limited to synthesizing patterned images, such as textures and fingerprints. As an alternative, some methods [15, 16] use GAN to synthesize stego images with rich semantics, e.g., faces and food. However, the accuracy of message extraction was unsatisfactory under image degradations. Moreover, the synthesized stego images can be easily identified by a well-trained forensic detector. It is thus urgent to further improve the robustness of message extraction and the anti-forensic capability of stego synthesis-based data hiding methods.

In this work, we propose a Facial Stego Image Synthesis method for data hiding with GAN, termed FSIS-GAN. Unlike cover modification-based data hiding methods, FSIS-GAN does not require a cover image to be provided beforehand. Compared with existing stego synthesis-based methods, FSIS-GAN can not only synthesize realistic facial stego images, but also achieve superior performance in terms of robustness and anti-forensic capability. Experimental results on a public facial dataset validate these merits of the proposed method. The main contributions of this work can be summarized as follows.

• We explicitly consider the image degradation during covert communication and integrate multiple degradation layers into the framework. This boosts the robustness of message extraction.

• We incorporate a forensic network when training FSIS-GAN. By exploiting the gradients from this forensic network, the stego images produced by the learned generator can effectively fool the forensic network.

• We explicitly adopt a secret key in the data hiding procedure of FSIS-GAN, which further improves the reliability of the secret message extraction.

The rest of this paper is organized as follows. Section 2 briefly reviews the related work on stego synthesis-based data hiding. Section 3 describes the proposed FSIS-GAN, including the network architecture and loss functions. Section 4 presents the experimental results, and conclusions are drawn in Section 5.

2. Stego Synthesis-based Data Hiding

The majority of data hiding methods involve modification of a given cover image.

Figure 1: Overview of the proposed FSIS-GAN framework. The secret message m and secret key k are fed to the generator G to produce the synthesized stego image S; S passes through the degradation layers to the extractor E, which outputs the extracted message m′. The discriminator D compares S with a genuine image I (adversarial training), and the forensic network Fθ drives the anti-forensics objective.

However, such cover modification leaves embedding traces that can be detected. To resist detection by a steganalyzer, stego synthesis-based data hiding methods directly produce the stego image from the given secret message. As an early attempt, Wu et al. [13] proposed a texture synthesis-based method, which selectively distributes source patches of the original texture image onto the synthesized stego image; message hiding and extraction depend on the choice of source patches. Motivated by fingerprint biometrics, Li et al. [14] proposed to use a hologram phase constructed from the secret message to synthesize a fingerprint stego image. The hologram phase consists of two components: the first, a spiral phase, encodes the secret message into two-dimensional points with different polarities, and the second, a continuous phase, is used to synthesize the fingerprint image. It is worth noting that conventional stego image synthesis-based methods can only synthesize patterned stego images such as textures; lacking rich semantics, this limits their practical applications.

Instead, Hu et al. [15] suggested using the generator of a GAN to synthesize a facial stego image from the secret message. Meanwhile, the secret message can be extracted from the stego image by a corresponding extractor network. Similarly, Zhang et al. [16] exploited GAN to generate stego images with different semantic labels, which improves the robustness of data extraction but significantly sacrifices the steganographic capacity. The main advantage of these GAN-based works is that they can synthesize stego images with rich semantics. However, we shall note that the synthesized stego images can still be easily identified by well-trained forensic networks. In addition, a favorable trade-off between capacity and extraction accuracy has yet to be achieved.

3. Facial Image Data Hiding via Generative Stego Synthesis

In this section, we first give an overview of the proposed FSIS-GAN framework and then introduce each component of the framework, accompanied by a thorough discussion of the loss functions, network structures and training procedure.

3.1. Overview of FSIS-GAN

The proposed FSIS-GAN framework is illustrated in Figure 1. In general, it is an end-to-end framework consisting of three parts, where each part is designed to achieve a specific goal. First, the part of facial stego image synthesis and message extraction contains a generator G, an extractor E and the degradation layers N. The generator G is deployed to convert the secret message, along with the secret key, into a facial stego image. The degradation layers N are used to simulate common image degradations within the communication channel. The extractor E is learned to recover the secret message from the degraded stego image. Second, there is a discriminator D in the adversarial training part, which aims at distinguishing genuine data samples from the ones produced by the generator G. Third, a well-trained existing forensic network F_θ (parameterized by θ) is introduced in the anti-forensics part, which can distinguish genuine from synthesized facial stego images. Note that this target forensic network is treated as a fixed adversary, and its parameters are always frozen.

3.2. Stego Image Synthesis and Message Extraction

The part of facial stego image synthesis and message extraction achieves two functionalities. First, by using the generator G, one can convert the given secret message into a facial stego image. Second, the extractor E is responsible for extracting the secret message from the input stego image. Furthermore, a secret key is introduced to ensure the reliability of the communication and a high diversity of the generated facial stego images. Generally, the generator G and the extractor E aim to learn

two mappings, i.e., mapping the given secret message into a stego image, and vice versa. More formally, let m ∈ {0, 1}^{l_m} and k ∈ {0, 1}^{l_k} be the binary secret message and the secret key, respectively. The generator G is intended to learn the first mapping, transforming the message m along with the secret key k into a stego image:

S = G(m,k), (1)

where S denotes the synthesized facial stego image of shape C × H × W. To recover the secret message, we next introduce the extractor E. Considering that the facial stego image S may be degraded during transmission, the second mapping should map the degraded stego image, along with the secret key k, back to the secret message, which can be expressed as

m′ = E(N(S), k),    (2)

where N(·) models the image degradation process and N(S) is the degraded stego image. Here, m′ ∈ (0, 1)^{l_m} denotes the extracted secret message. Note that the extracted message m′ should be (approximately) equal to the original secret message m, so that an error-correcting mechanism can be employed to fully correct the erroneous bits.

To measure the distortion between the original secret message m and the extracted message m′, we use the cross-entropy loss to compute the message extraction loss L_E, which is given by

L_E(m, m′) = −(1/l_m) Σ_{i=1}^{l_m} [ m_i log(m′_i) + (1 − m_i) log(1 − m′_i) ],    (3)

where m_i and m′_i are the i-th elements of m and m′, respectively.

Note that our proposed FSIS-GAN framework explicitly receives a secret key as an input, which is designed to satisfy Kerckhoffs' principle. Even if the extractor network E is completely exposed to an attacker, the secret message m can be recovered only when the receiver possesses both the secret key k and the facial stego image S. It is worth emphasizing that most existing GAN-based methods, e.g., [15, 16], do not involve a secret key. Further notice that, as an input to the extractor E, the secret key k has a much smaller dimension than the facial stego image S; the extractor E thus tends to discard the secret key because it carries much less information. To mitigate this issue, we propose to also feed a randomly generated incorrect secret key k̂ ∈ {0, 1}^{l_k}, with k̂ ≠ k, during the training stage. Besides minimizing the difference between the extracted and original message under the correct key, we maximize this difference when the incorrect key is applied. Mathematically, this loss term, the inverse loss L_Ê, can be expressed as the negative cross-entropy:

L_Ê(m, m̂′) = (1/l_m) Σ_{i=1}^{l_m} [ m_i log(m̂′_i) + (1 − m_i) log(1 − m̂′_i) ],    (4)

where m̂′_i is the i-th element of the message m̂′ = E(N(S), k̂) extracted with the incorrect key k̂.
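For concreteness, the two losses above can be written down directly in PyTorch, the framework used for the experiments in Section 4.1. The following is a minimal sketch, not the authors' code: the helper names, the eps stabilizer inside the logarithms, and the rejection sampling of the wrong key are our own assumptions.

```python
import torch

def extraction_loss(m, m_pred, eps=1e-8):
    # Eq. (3): cross-entropy between the original bits m in {0,1} and the
    # extracted probabilities m_pred in (0,1), averaged over the bits.
    return -(m * torch.log(m_pred + eps)
             + (1 - m) * torch.log(1 - m_pred + eps)).mean()

def inverse_loss(m, m_pred_wrong_key, eps=1e-8):
    # Eq. (4): negative cross-entropy, i.e. the extraction error is
    # maximized when an incorrect secret key is fed to the extractor.
    return (m * torch.log(m_pred_wrong_key + eps)
            + (1 - m) * torch.log(1 - m_pred_wrong_key + eps)).mean()

def random_wrong_key(k):
    # Draw a random binary key guaranteed to differ from the correct key k.
    while True:
        k_hat = torch.randint(0, 2, k.shape, device=k.device).to(k.dtype)
        if not torch.equal(k_hat, k):
            return k_hat
```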

Enhancing robustness with degradation layers: In a practical communication channel, the synthesized stego image S is often degraded while being transmitted to the receiver. The data hiding system therefore requires a certain robustness to ensure the accuracy of message extraction. In this work, we take three representative degradations into account: noise pollution, blurring, and compression. For noise pollution, we consider one of the most widely-used noise models, Gaussian noise. For blurring, Gaussian blurring is used. For compression, JPEG image compression is employed, which is extensively used for reducing the transmission bandwidth. In experiments, we implement these three types of degradation as neural network layers N that degrade the stego image, with one layer per degradation type. The Gaussian noise layer (GNL) adds Gaussian noise to the facial stego image S, and the Gaussian blurring layer (GBL) blurs S. For JPEG compression, considering that the quantization operation is non-differentiable, we approximate it with a differentiable polynomial function; this differentiable approximation follows the work HiDDeN [8].

3.3. Adversarial Training Part

As aforementioned, the hand-crafted stego synthesis-based data hiding methods [13, 14] could only synthesize patterned images such as textures and fingerprints, limiting their practical applications. Synthesizing a natural image with rich semantics is a challenging task. However, this problem can be alleviated with the guidance of adversarial training. In this part, the purpose of the discriminator D is to conduct adversarial training with the generator G and improve the plausibility of the synthesized facial stego images.

More specifically, let I be a genuine facial image sample of shape C × H × W from a publicly available facial image dataset. The discriminator D estimates how likely a given image sample was synthesized by the generator G, while the generator G attempts to fool the discriminator D. Through such adversarial training, the generator G is encouraged to synthesize much more realistic facial stego images. As a variant of GAN, the network structure and loss function of BEGAN [17] provide a good reference for improving training stability; we therefore employ the adversarial training loss used in BEGAN. Mathematically, the adversarial loss L_adv for the generator G is

L_adv(D(S), S) = (1/(CHW)) |D(S) − S|,    (5)

where the output D(S) has the same shape as the facial stego image S. The adversarial loss L_D for the discriminator D is

L_D(I, S) = (1/(CHW)) [ |D(I) − I| − h_t · |D(S) − S| ],    (6)

where h_t controls the discrimination ability of D in the t-th training step so as to equilibrate the adversarial training. It is updated as

h_{t+1} = h_t + (λ/(CHW)) [ γ |D(I) − I| − |D(S) − S| ].    (7)

Here the parameter λ is the learning rate of the update, and γ is a hyper-parameter that controls the diversity of the synthesized facial images. The quality and diversity of the facial stego images can be freely adjusted by tuning γ.
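A compact sketch of Eqs. (5)-(7) is given below, assuming the auto-encoder discriminator D of BEGAN. The clamping of the balance term to [0, 1] follows the BEGAN paper, while the default value of lam is our own assumption.

```python
import torch

def began_losses(D, I, S, h_t, gamma=0.7, lam=1e-3):
    # Per-pixel L1 reconstruction errors of the auto-encoder discriminator,
    # i.e. (1/CHW)|D(x) - x| as used in Eqs. (5)-(7).
    # (S is typically detached, S.detach(), when optimizing D so that
    # discriminator updates do not back-propagate into the generator.)
    rec_real = (D(I) - I).abs().mean()
    rec_fake = (D(S) - S).abs().mean()
    loss_adv = rec_fake                                           # Eq. (5)
    loss_D = rec_real - h_t * rec_fake                            # Eq. (6)
    h_next = h_t + lam * (gamma * rec_real - rec_fake).item()     # Eq. (7)
    h_next = min(max(h_next, 0.0), 1.0)   # keep the balance term bounded
    return loss_adv, loss_D, h_next
```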

3.4. Anti-forensics Part

Recall that no explicit cover image is involved in stego synthesis-based data hiding methods. This merit allows such data hiding methods to effectively resist conventional steganalysis. However, as pointed out in [15], a well-trained forensic network can readily distinguish a synthesized stego image from a genuine one, even when the synthesized stego image shows no perceptual differences to an observer.

Although F_θ is an expert in such a detection task, studies [10, 11] have shown that deep neural network-based classifiers are vulnerable to adversarial examples. Inspired by this, we propose to apply strategies for crafting adversarial examples to evade the stego detection network, as a way of realizing anti-forensics. In the FSIS-GAN framework, we consider a white-box scenario, i.e., we assume full knowledge of the target forensic network. The target forensic network F_θ is trained with genuine images from a publicly available facial dataset and with synthesized images produced by BEGAN [17]. We then integrate the well-trained F_θ into the FSIS-GAN framework, where F_θ receives the synthesized facial stego image S and outputs its confidence. The gradients back-propagated through F_θ are used to update the parameters of the generator G. To measure how well forensic detection is resisted, we define the anti-forensic loss L_Fθ as the cross-entropy between the output of F_θ and the target genuine-image label:

L_Fθ(S) = − log(1 − F_θ(S)),    (8)

where F_θ(S) ∈ (0, 1) is the confidence output by F_θ. Clearly, a decrease of L_Fθ indicates an increased probability of S being identified as a genuine image by F_θ.
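Eq. (8) translates into a one-line objective. The sketch below assumes F_theta returns its confidence that the input is synthesized, and the eps term is added only for numerical stability; neither detail is taken from the paper.

```python
import torch

def anti_forensic_loss(F_theta, S, eps=1e-8):
    # Eq. (8): F_theta(S) in (0,1) is the frozen forensic network's
    # confidence that S is synthesized; driving this loss down pushes
    # the generator toward images labelled as genuine.
    return -torch.log(1.0 - F_theta(S) + eps).mean()
```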

3.5. Network Structure and Training Strategy

The network architectures of the generator G and the extractor E are shown in Figure 2. For the generator G, the secret key vector k is first concatenated with the secret message vector m and then fed to the subsequent layers. G applies two fully-connected (FC) layers and three convtranspose (ConvT) layers to produce the facial stego image S. In particular, after each FC or ConvT layer, we apply batch normalization (BN) [18] and the ReLU activation function to the intermediate features. In experiments, we found that feeding the raw binary vectors m and k (composed of 0s and 1s) directly as input was unsuitable and made the adversarial training loss diverge. To solve this issue, additional BN layers were added so that normalization is carried out inside the network. Experimental results show that this trick greatly alleviates the divergence problem.
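To make the description concrete, a rough PyTorch sketch of such a generator is given below. The layer widths, the 8 × 8 starting resolution and the final Tanh are illustrative assumptions; only the overall layout (concatenate m and k, two FC layers, three ConvT layers with BN and ReLU, 3 × 64 × 64 output) follows the text.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    # Sketch of the generator described in Sec. 3.5; not the authors'
    # exact configuration.
    def __init__(self, l_m=100, l_k=100):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(l_m + l_k, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, 256 * 8 * 8), nn.BatchNorm1d(256 * 8 * 8), nn.ReLU(),
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, m, k):
        x = self.fc(torch.cat([m, k], dim=1))       # (B, l_m + l_k) -> features
        return self.deconv(x.view(-1, 256, 8, 8))   # -> (B, 3, 64, 64) stego S
```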

Figure 2: Network structure of the generator G (a) and the extractor E (b). "Concat", "FC", "ConvT", "BN", "Conv" denote the concatenation, fully-connected layer, convtranspose layer, batch norm, and convolution layer, respectively.

For the extractor E, we shall fuse the secret key vector k and the facial stego image S in a way such that the extractor E does not neglect the information provided by the secret key. To this end, the extractor E first applies an FC layer to the secret key to form an intermediate matrix of shape 1 × W × H. Then, the facial stego image S and the intermediate matrix are concatenated, and the fused tensor is fed to four convolutional (Conv) layers. Finally, the extractor E applies an FC layer and the Sigmoid activation function to produce the message vector m′ (or m̂′) of size 1 × l_m.
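A matching sketch of the extractor is shown below; the channel widths and strides are assumptions, and only the key-to-plane expansion, the concatenation with S, the four Conv layers and the FC + Sigmoid head follow the description above.

```python
import torch
import torch.nn as nn

class Extractor(nn.Module):
    # Sketch of the extractor described in Sec. 3.5; not the authors'
    # exact configuration.
    def __init__(self, l_m=100, l_k=100, H=64, W=64):
        super().__init__()
        self.H, self.W = H, W
        self.key_fc = nn.Linear(l_k, H * W)
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, 3, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, 3, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, 128, 3, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
        )
        self.head = nn.Linear(128 * (H // 16) * (W // 16), l_m)

    def forward(self, S, k):
        key_plane = self.key_fc(k).view(-1, 1, self.H, self.W)   # key -> 1 x H x W
        x = self.conv(torch.cat([S, key_plane], dim=1))          # fuse with image
        return torch.sigmoid(self.head(x.flatten(1)))            # probabilities m'
```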

For the discriminator D, we adopt the auto-encoder-like structure from BEGAN [17]. For the target forensic network F, we use Ye-Net [19], which is a widely-used steganalytic network.

The training process of the proposed FSIS-GAN framework iteratively optimizes the loss function of each network, except for the well-trained forensic network F_θ. We apply the extraction loss L_E and the adversarial loss L_D as the loss functions of the extractor E and the discriminator D, respectively. The total loss L_G for the generator G is a fusion of the four losses introduced above:

L_G = L_adv + α (L_E + L_Ê) + β L_Fθ,    (9)

where L_adv is the adversarial loss for G, L_Ê is the inverse loss, and L_Fθ is the anti-forensic loss. The hyper-parameters α and β control the relative importance among the four losses.
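In code, Eq. (9) is a simple weighted sum. The helper below is only a sketch, with the individual loss terms assumed to be computed as in the earlier snippets.

```python
def generator_total_loss(loss_adv, loss_e, loss_e_inv, loss_f,
                         alpha=0.1, beta=0.1):
    # Eq. (9): weighted fusion of the four generator losses;
    # alpha = beta = 0.1 are the values reported in Sec. 4.1.
    return loss_adv + alpha * (loss_e + loss_e_inv) + beta * loss_f
```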

4. Experimental Results

In this section, we first introduce the experimental setup. Then, to verify the robustness of the proposed FSIS-GAN, it is evaluated both with and without image degradation. Finally, the anti-forensic capability of FSIS-GAN is validated.

4.1. Experimental Setup

Our experiments are conducted on the CelebA dataset [20], where the facial region is identified and cropped. All images are resized to 3 × 64 × 64. The following three metrics are used for evaluation:

• Fréchet Inception Distance (FID) [21], a widely-used perceptual image quality metric for synthesized images and a de facto metric for assessing the quality of images created by GAN generators. A lower FID score indicates better consistency with human perception of natural images.

• Accuracy of message extraction (ACC), computed as ACC = L_Ext / L, where L_Ext is the number of correctly extracted bits and L is the length of the secret message m.

• Probability of missed detection (PMD), calculated as PMD = FN / (FN + TP), where FN (False Negative) is the ratio of synthesized facial images misclassified as genuine ones and TP (True Positive) is the ratio of synthesized facial images correctly detected. A larger PMD indicates a stronger ability to resist the forensic network. (A small code sketch of these metrics follows this list.)

The proposed FSIS-GAN framework is implemented in PyTorch and trained on four NVIDIA GTX 1080Ti GPUs with 11 GB memory. The number of training epochs is set to 400 with a mini-batch size of 64. We use Adam [22] as the optimizer with a learning rate of 2 × 10⁻⁴. The hyper-parameters α and β in (9) are empirically set to 0.1 after a number of trials. The parameter γ in (7) is set to 0.7, which is expected to produce reasonably diverse facial stego images. The competing method is the most closely related work [15]. We implemented this work ourselves because there is no publicly available code; with some tweaking and fine-tuning, the obtained results were comparable to those originally reported in [15]. For a fair comparison, the lengths of the secret message l_m and the secret key l_k are both set to 100, so that the payload is identical to that of [15].

Figure 3: Comparison of exemplar synthesized stego images. Top: Hu et al. [15]; Bottom: proposed FSIS-GAN-WD.

4.2. Performance Without Degradations

Notice that the competing method [15] does not consider image degradations. To verify the effectiveness of the proposed method under the same setting and make a fair comparison, in this subsection we evaluate the performance without the degradation layers N; the facial stego image S is transmitted to the extractor E without any degradation. To avoid confusion, this variant of our proposed method is termed FSIS-GAN-WD (WD abbreviates Without Degradations). We first compare the visual quality of the facial stego images with the competing method [15]. As can be seen from Figure 3, the proposed FSIS-GAN-WD synthesizes more realistic facial stego images than Hu et al. [15]. Upon closer inspection, one can notice that the stego images produced by FSIS-GAN-WD are more vivid and have more correct semantic structures; it is difficult for an ordinary human observer to notice the inauthenticity of the facial stego images synthesized by FSIS-GAN-WD. In contrast, the stego images generated by Hu et al. [15] are typically blurry and severely distorted, which would readily draw attention from a forensic analyst. For the FID evaluation, we use 10,000 pairs of genuine images and synthesized facial stego images to compute the FID score. The FID score of FSIS-GAN-WD is 23.20, which is much smaller than the 32.07 of Hu et al. [15].

Then, we evaluate the extraction accuracy for the case without degradation. The results are tabulated in Table 1. To demonstrate the impact of the inverse loss L_Ê on the extraction accuracy, an ablation experiment is also conducted by excluding the inverse loss during training; this ablated version is denoted as FSIS-GAN-WD (ex L_Ê).

Table 1
Comparison of message extraction accuracy (%) for the case of no communication degradations. Here, k and k̂ denote the correct and incorrect secret key, respectively. FSIS-GAN-WD is a variant of the proposed method that excludes the degradation layers, and FSIS-GAN-WD (ex L_Ê) represents FSIS-GAN-WD trained without the inverse loss L_Ê.

Scheme    | Hu et al. [15] | FSIS-GAN-WD        | FSIS-GAN-WD (ex L_Ê)
          |                | with k  | with k̂  | with k  | with k̂
Accuracy  | 85.23          | 98.76   | 71.50   | 99.41   | 97.01

From Table 1, one can draw the following conclusions. First, the extraction accuracy of FSIS-GAN-WD with the correct secret key k is 98.76%, which dramatically outperforms the 85.23% of the competing method [15]. Second, by comparing FSIS-GAN-WD and FSIS-GAN-WD (ex L_Ê), one can see that the extraction accuracy of FSIS-GAN-WD with the correct secret key k is slightly inferior to that of FSIS-GAN-WD (ex L_Ê), suggesting that the introduced inverse loss marginally harms the extraction accuracy under the correct key. However, for the case of an incorrect key k̂, the inverse loss L_Ê significantly reduces the extraction accuracy from 97.01% to 71.50%, while the accuracy under the correct key remains almost unchanged. This phenomenon means that the secret key has no real effect if the inverse loss is excluded: FSIS-GAN-WD (ex L_Ê) still attains a rather high extraction accuracy (> 97%) even with the incorrect key k̂. In short, without the inverse loss L_Ê, the variant FSIS-GAN-WD (ex L_Ê) violates Kerckhoffs' principle.

4.3. Performance With Degradations

In this subsection, we test the robustness of the proposed framework under certain image degradations. The degradation type and level are given as prior knowledge; this scenario is common in practice because one can obtain such knowledge by probing the communication channel. Thus, the degradation layers N and their associated parameters can be fixed during the training stage. Specifically, in our experiments, the standard deviation σ1 of the Gaussian noise layer (GNL) is set to 0.2. The kernel width d and the standard deviation σ2 of the Gaussian blurring layer (GBL) are set to 3 and 1, respectively. The differentiable JPEG compression layer (JCL) is implemented as suggested by the work HiDDeN [8]. For ease of reference, this variant is termed FSIS-GAN-FD (FD abbreviates Fixed Degradation) in the sequel.

Firstly, the stego images synthesized by FSIS-GAN-FD are shown in Figure 4. One can observe that some speckle noise emerges in the generated stego images, which is clearly visible in the highlighted regions in Figure 4(b). Quantitatively, the FID score of FSIS-GAN-FD is 41.40, which is inferior to that of FSIS-GAN-WD (23.20) and Hu et al. [15] (32.07). Nevertheless, the stego images produced by FSIS-GAN-FD are still intuitively more realistic than those of Hu et al. [15].

Secondly, in Table 2, we report the extraction accuracy under fixed degradations. Not surprisingly, the extraction accuracy of Hu et al. [15] and FSIS-GAN-WD degrades greatly, which can be attributed to overlooking the issue of degradation-resistant message extraction.

Figure 4: Comparison of synthesized facial stego images: (a) images produced by FSIS-GAN-WD; (b) images produced by FSIS-GAN-FD. With the introduction of degradation layers, minor speckle noise emerges (highlighted with red rectangles).

Table 2
Comparison of message extraction accuracy (%) under various degradation conditions. The bold value and the value marked with an asterisk (*) denote the highest extraction accuracy with the correct secret key k and the lowest extraction accuracy with the incorrect secret key k̂, respectively.

Scheme           | Hu et al. [15] | FSIS-GAN-WD        | FSIS-GAN-FD
                 |                | with k  | with k̂  | with k  | with k̂
W/o degradation  | 85.23          | 98.76   | 71.50*  | 98.22   | 72.08
Fixed GNL        | 52.72          | 59.78   | 56.23*  | 95.58   | 72.74
Fixed GBL        | 69.68          | 57.52   | 54.68*  | 98.58   | 73.78
Fixed JCL        | 65.33          | 61.38   | 58.00*  | 98.46   | 72.67

In contrast, FSIS-GAN-FD exhibits quite promising results: under all three types of degradation layers, its extraction accuracy typically exceeds 94% (though lower than that of FSIS-GAN-WD, which is specifically designed for the non-degradation scenario). The results verify that, for the case of known degradations, the proposed framework can learn to effectively resist the fixed degradations by employing the fixed degradation layers during training.

Finally, to illustrate how the robustness of message extraction changes under different degradation levels, we test different degradation types with a variety of degradation levels. Due to space limits, we only report the JPEG compression degradation in Figure 5. As can be seen, with decreasing quality factor (QF), the extraction accuracy generally decreases. Although the JCL adopted from HiDDeN [8] can handle the non-differentiable JPEG compression, it cannot perfectly reproduce the JPEG compression artifacts. Nevertheless, FSIS-GAN-FD still achieves superior robustness compared with the other two schemes.

4.4. Performance of Anti-forensics

Recall that, since no cover image is involved in the data hiding, our method has relatively good undetectability when exposed to a steganalyzer.

Figure 5: Comparison of the message extraction accuracy (%) of Hu et al. [15], FSIS-GAN-WD and FSIS-GAN-FD under various levels of JPEG compression degradation (quality factors from 95 down to 5).

However, as pointed out in [15], a well-trained forensic network can effectively identify a synthesized image. To address this issue, we explicitly consider the anti-forensics scenario and introduce the anti-forensic loss L_Fθ.

To demonstrate the influence of the anti-forensic loss L_Fθ, we conduct an ablation experiment by excluding this loss term; the resulting variant is termed FSIS-GAN (ex L_Fθ). As a concrete example, we employ the well-trained forensic network Ye-Net [19] F_θ to detect 3000 facial stego images produced by the different methods and record the probability of missed detection (PMD). The PMDs of Hu et al. [15], FSIS-GAN (ex L_Fθ), and FSIS-GAN are 3.23%, 8.84%, and 89.91%, respectively. As clearly shown, for FSIS-GAN (ex L_Fθ), although the facial stego images look natural to humans, they are easily exposed to the forensic network, with a PMD below 10%. In contrast, by introducing the anti-forensic loss term, the PMD of FSIS-GAN reaches 89.91%. This means the proposed FSIS-GAN can effectively bypass the existing forensic network, retaining a good anti-forensic capability.

5. Conclusion

In this work, we proposed a stego synthesis-based data hiding method using a generative neural network, explicitly considering image degradation and anti-forensic needs. Specifically, the generator synthesizes a facial stego image from the given secret message and secret key, and the extractor recovers the secret message with the secret key. Through adversarial training with the discriminator, the generator can produce realistic facial stego images. Degradation layers are introduced during training, which significantly enhance the robustness of message extraction. A forensic network is also incorporated during training, in response to possible adversarial forensic analysis in the communication channel. Experimental results verified that our approach generates more natural facial stego images, while retaining higher message extraction accuracy and good anti-forensic ability.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61901237, in part by the Open Project Program of the State Key Laboratory of CAD&CG, Zhejiang University under Grant A2006, and in part by the Ningbo Natural Science Foundation under Grant 2019A610103. Thanks to the Southeast Digital Economic Development Institute for supporting the computing facility.

References

[1] V. Sedighi, R. Cogranne, J. Fridrich, Content-adaptive steganography by minimizing statistical detectability, IEEE Trans. Inf. Forensics Security 11 (2015) 221–234.

[2] J. Zhou, W. Sun, L. Dong, X. Liu, O. C. Au, Y. Y. Tang, Secure reversible image data hiding over encrypted domain via key modulation, IEEE Trans. Circuits Syst. Video Technol. 26 (2015) 441–452.

[3] L. Dong, J. Zhou, W. Sun, D. Yan, R. Wang, First steps toward concealing the traces left by reversible image data hiding, IEEE Trans. Circuits Syst. II, Exp. Briefs 67 (2020) 951–955.

[4] S. Baluja, Hiding images within images, IEEE Trans. Pattern Anal. Mach. Intell. 42 (2020) 1685–1697.

[5] H. Shi, J. Dong, W. Wang, Y. Qian, X. Zhang, SSGAN: Secure steganography based on generative adversarial networks, in: Pacific Rim Conference on Multimedia, 2017, pp. 534–544.

[6] W. Tang, S. Tan, B. Li, J. Huang, Automatic steganographic distortion learning using a generative adversarial network, IEEE Signal Process. Lett. 24 (2017) 1547–1551.

[7] J. Hayes, G. Danezis, Generating steganographic images via adversarial training, in: Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 1954–1963.

[8] J. Zhu, R. Kaplan, J. Johnson, F. Li, HiDDeN: Hiding data with deep networks, in: Proc. Eur. Conf. Comput. Vis., 2018, pp. 657–672.

[9] K. A. Zhang, A. Cuesta-Infante, L. Xu, K. Veeramachaneni, SteganoGAN: High capacity image steganography with GANs, arXiv preprint arXiv:1901.03892 (2019).

[10] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv preprint arXiv:1312.6199 (2013).

[11] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, arXiv preprint arXiv:1412.6572 (2014).

[12] W. Tang, B. Li, S. Tan, M. Barni, J. Huang, CNN-based adversarial embedding for image steganography, IEEE Trans. Inf. Forensics Security 14 (2019) 2074–2087.

[13] K. Wu, C. Wang, Steganography using reversible texture synthesis, IEEE Trans. Image Process. 24 (2014) 130–139.

[14] S. Li, X. Zhang, Toward construction-based data hiding: From secrets to fingerprint images, IEEE Trans. Image Process. 28 (2018) 1482–1497.

[15] D. Hu, L. Wang, W. Jiang, S. Zheng, B. Li, A novel image steganography method via deep convolutional generative adversarial networks, IEEE Access 6 (2018) 38303–38314.

[16] Z. Zhang, G. Fu, R. Ni, J. Liu, X. Yang, A generative method for steganography by cover synthesis with auxiliary semantics, Tsinghua Science and Technology 25 (2020) 516–527.

[17] D. Berthelot, T. Schumm, L. Metz, BEGAN: Boundary equilibrium generative adversarial networks, arXiv preprint arXiv:1703.10717 (2017).

[18] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167 (2015).

[19] J. Ye, J. Ni, Y. Yi, Deep learning hierarchical representations for image steganalysis, IEEE Trans. Inf. Forensics Security 12 (2017) 2545–2557.

[20] Z. Liu, P. Luo, X. Wang, X. Tang, Deep learning face attributes in the wild, in: Proc. IEEE Int. Conf. Comput. Vis., 2015.

[21] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, in: Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 6629–6640.

[22] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).