Abstract

The traditional reversible data hiding technique is based on cover image modification, which inevitably leaves traces of rewriting that can be analyzed and attacked by the warder. Inspired by cover-synthesis steganography based on generative adversarial networks, this paper proposes a novel generative reversible data hiding (GRDH) scheme by image translation. First, an image generator is used to obtain a realistic image, which serves as the input to an image-to-image translation model built on CycleGAN. After image translation, a stego image with different semantic information is obtained. The secret message and the original input image can be recovered separately by a well-trained message extractor and the inverse transform of the image translation. The experimental results have verified the effectiveness of the scheme.

1. Introduction

Information hiding [1–5], also called data hiding, is an important information security technique widely used in secret transmission [6], digital copyright protection [7], and other scenarios. Classified by the reversibility of the cover image, data hiding can generally be divided into two types: irreversible data hiding (IDH) [8, 9] and reversible data hiding (RDH) [10–12]. Traditional data hiding methods fall into the former type, while the latter is a special technique mainly applied in the medical, judicial, and military fields.

With the emergence and development of artificial intelligence [13–16] and other new techniques, IDH methods using deep learning models have achieved a series of breakthroughs in both methodology and performance and have become the development trend in this field [17–20]. Among these new methods, secret data can be hidden and extracted without any modification of the original cover image and cannot be detected by the warder (steganalysis algorithm). Comparatively, RDH with deep learning has received less attention. To the best of our knowledge, no existing RDH method can hide data without modification. At present, RDH methods can be divided into two types: RDH in unencrypted images [21–23] and RDH in encrypted images [24, 25]. In all of these RDH methods, data hiding is based on cover image modification, which is increasingly easy to detect with ever more advanced machine-learning detection tools.

In [18], a new image steganography method via deep convolutional generative adversarial networks (DCGANs) was proposed. In this method, a mapping from the secret data to random noise is designed. With this mapping, a correspondence between the secret data and the stego image generated by the DCGAN model is obtained, and an extractor is trained to extract the secret data. This method has a strong ability to resist state-of-the-art detection tools, and it provides great inspiration for RDH without modification.

Cycle-consistent generative adversarial network (CycleGAN) [26] is a recently proposed image-to-image translation model, which learns to automatically translate an image from a source domain into a target domain in the absence of paired examples. In this generative adversarial network (GAN) model, there are two generators and two discriminators, and a cycle-consistency loss is defined to train the model. Using CycleGAN, one type of picture can be transformed into another, and this transformation is reversible. This kind of technique can therefore be applied to the RDH field.

In [27], a framework for RDH in encrypted images based on reversible image transformation was proposed. First, a cover image is transformed into another target image by image transformation. Then, secret data are embedded into the transformed image, which is regarded as the encrypted image. There have been many similar methods [28–30]. In this type of method, the image transformation is regarded as a special type of image encryption, and the embedding step is a traditional one that essentially relies on pixel modification. In contrast, a generative model uses neural networks to learn the data distribution of the samples, and the generated image has strong randomness, which enhances the security of the data hiding algorithm. This advantage makes it far superior to traditional methods.

In this paper, a generative reversible data hiding (GRDH) method based on the GAN model is proposed. Drawing on the secret data mapping method in [18] and the image recovery mechanism of CycleGAN, a new GRDH framework is constructed. In this framework, a cover image is generated from a noise vector obtained by transforming the secret data. Then, the cover image is transformed into a marked image by the CycleGAN model. Similar to the framework in [27], the transformed image can be regarded as a special encrypted image. In addition, a new extractor is trained to extract the secret data, which makes the data hiding framework reversible. The experimental results have proved the feasibility of the proposed GRDH method.

2. DCGAN and CycleGAN

DCGAN [31] is an upgraded version of GAN. In the DCGAN model, convolutional neural networks are introduced to design the generator and discriminator. To improve the quality of the generated samples and the speed of convergence, some changes have been made to the structure of the original convolutional neural network. With the powerful feature extraction ability of the convolutional neural network, the learning effect of the generative adversarial network is significantly improved. An illustration of the DCGAN model is shown in Figure 1. A fake image is generated from random noise by the generator. The discriminator is designed to judge whether a given image is real or fake. The goal of the generator is to generate realistic images that deceive the discriminator, while the goal of the discriminator is to separate fake images from real ones as accurately as possible. In this way, the generator and discriminator constitute a dynamic game.
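
As a concrete reference for this architecture, the sketch below shows a DCGAN-style generator/discriminator pair in PyTorch. The framework choice and exact kernel/stride settings are our assumptions, not taken from the paper; the 100-dimensional noise and 64 × 64 × 3 images match the dimensions used in Section 3.

```python
# Minimal DCGAN-style generator/discriminator sketch (PyTorch assumed;
# layer settings are illustrative, dims follow the 100-dim noise and
# 64x64x3 images used later in the paper).
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            # (N, 100, 1, 1) -> (N, 512, 4, 4)
            nn.ConvTranspose2d(z_dim, ch * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ch * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 8, ch * 4, 4, 2, 1, bias=False),  # -> 8x8
            nn.BatchNorm2d(ch * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1, bias=False),  # -> 16x16
            nn.BatchNorm2d(ch * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1, bias=False),      # -> 32x32
            nn.BatchNorm2d(ch), nn.ReLU(True),
            nn.ConvTranspose2d(ch, 3, 4, 2, 1, bias=False),           # -> 64x64
            nn.Tanh(),  # image values in [-1, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1, bias=False),                    # -> 32x32
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch * 2, 4, 2, 1, bias=False),               # -> 16x16
            nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 2, ch * 4, 4, 2, 1, bias=False),           # -> 8x8
            nn.BatchNorm2d(ch * 4), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 4, ch * 8, 4, 2, 1, bias=False),           # -> 4x4
            nn.BatchNorm2d(ch * 8), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 8, 1, 4, 1, 0, bias=False),                # real/fake score
        )

    def forward(self, x):
        return self.net(x).view(-1)
```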

CycleGAN [26] is essentially a ring network made up of two symmetrical GAN models. An illustration of the CycleGAN model is shown in Figure 2. The two GAN models share the two generators, and each GAN model contains its own discriminator; thus, there are two generators and two discriminators in a CycleGAN model. Figure 3 shows an illustration of a one-way GAN model. Real images in domain A are transformed into fake images in domain B by one generator and then transformed back into recovered images in domain A by the other generator.

3. Generative Reversible Data Hiding

The illustration of GRDH is shown in Figure 4.

There are two processes in GRDH, a preparation process and an implementation process, which together consist of the following four phases:

Phase 1 (CycleGAN training). A generator $G$ and a restorer $F$ are obtained by the CycleGAN method. With two discriminators $D_X$ and $D_Y$, two image mapping goals are achieved: $G: X \to Y$ and $F: Y \to X$, where $X$ and $Y$ are image collections.

Phase 2 (generator training). A generator $G'$ is obtained by a GAN method (e.g., DCGAN or BEGAN) with the help of a discriminator $D'$.

Phase 3 (extractor training). In this phase, based on the two generators $G'$ and $G$ obtained earlier, we can achieve the transformation from random noise $z$ to the image collection $Y$. Then, we train a new extractor $E$ based on the GAN technique and ensure that its output is as close as possible to the input $z$.

Phase 4 (send and receive). Before data hiding, the sender sends the extractor $E$ and the restorer $F$ to the receiver. Both sides share a mapping from the secret data to the noise $z$. Corresponding to traditional RDH methods, the images generated by $G'$ and $G$ can be regarded as the cover image and the marked image, respectively. The sender then sends the marked image to the receiver. At the receiver side, the recovered image can be obtained and the embedded data can be extracted. This flow is sketched in code below; we go into detail about each phase in the following.
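
As an overview, the following Python sketch summarizes Phase 4 as a sender/receiver protocol. Here G_prime, G, E, and F stand for the trained networks from Phases 1–3, and map_bits_to_noise/map_noise_to_bits are hypothetical names for the bit-to-noise mapping detailed in Section 3.2; none of these identifiers come from the paper itself.

```python
# Sketch of the GRDH send/receive flow (Phase 4). G_prime, G, E, F are
# the trained networks from Phases 1-3; the two mapping helpers are
# hypothetical names for the bit<->noise mapping of Section 3.2.

def sender_side(secret_bits, G_prime, G):
    z = map_bits_to_noise(secret_bits)      # secret data -> noise vector
    cover = G_prime(z)                      # noise -> cover image (Phase 2 model)
    marked = G(cover)                       # cover -> marked image (Phase 1 model)
    return marked                           # only the marked image is transmitted

def receiver_side(marked, E, F):
    recovered_cover = F(marked)             # restorer inverts the image translation
    z_hat = E(marked)                       # extractor estimates the noise vector
    secret_bits = map_noise_to_bits(z_hat)  # noise -> secret data
    return recovered_cover, secret_bits
```

Note that data extraction and image recovery are independent of each other, which is what makes the scheme separable.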

3.1. Preparation Process

The purpose of the preparation process is to train related models and prepare for data hiding, including three phases: CycleGAN training, generator training, and extractor training. The detailed steps can be described as follows:

Phase 1 (CycleGAN training). Our goal in this phase is to obtain a marked-image generator $G$ and a recovered-image restorer $F$. Without loss of generality, we choose the original CycleGAN model for training. Assume $X$ and $Y$ denote two image collections, corresponding to the cover images and marked images, respectively. First, two image databases are built: a cover image database $X$ and a marked image database $Y$, each containing one type of image. For example, if $X$ contains images of horses and $Y$ contains images of zebras, then the cover image in the following phases will be an image of a horse, and the marked image sent to the receiver will be an image of a zebra. In Phase 1, the training process is based on the original CycleGAN model. We apply the two adversarial losses $\mathcal{L}_{GAN}(G, D_Y, X, Y)$ and $\mathcal{L}_{GAN}(F, D_X, Y, X)$ for the mappings $G: X \to Y$ and $F: Y \to X$ defined in [26], and the full objective function can be described as follows:

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda\, \mathcal{L}_{cyc}(G, F),$$

where $\mathcal{L}_{cyc}(G, F)$ denotes the cycle-consistency loss and $\lambda$ controls its relative importance.
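
In code, this full objective can be assembled as below. This is a sketch assuming PyTorch and the least-squares adversarial loss that [26] adopts in practice, with λ as the cycle-consistency weight; the function and variable names are ours.

```python
# Sketch of the CycleGAN full objective (Phase 1), assuming PyTorch and
# least-squares adversarial losses; G, F are the two generators and
# D_X, D_Y the two discriminators.
import torch
import torch.nn.functional as Fn

def cyclegan_loss(G, F, D_X, D_Y, real_x, real_y, lam=10.0):
    fake_y = G(real_x)   # X -> Y (cover -> marked)
    fake_x = F(real_y)   # Y -> X (marked -> cover)

    # Adversarial terms: each generator tries to make its discriminator
    # output 1 on generated images.
    adv_G = Fn.mse_loss(D_Y(fake_y), torch.ones_like(D_Y(fake_y)))
    adv_F = Fn.mse_loss(D_X(fake_x), torch.ones_like(D_X(fake_x)))

    # Cycle-consistency terms: F(G(x)) ~ x and G(F(y)) ~ y (L1 norm).
    cyc = Fn.l1_loss(F(fake_y), real_x) + Fn.l1_loss(G(fake_x), real_y)

    return adv_G + adv_F + lam * cyc
```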

Phase 2 (generator training). Our goal in this phase is to obtain a cover-image generator $G'$. Without loss of generality, we can choose the original DCGAN model for training. According to the principle of the DCGAN model, the generator $G'$ can be trained on the cover image database $X$ together with a discriminator $D'$, so that the mapping from random noise $z$ to the image collection $X$ is learned. The structure of the DCGAN model is introduced in [31]; both the generator and the discriminator are CNNs. Denote $x$ and $p_{data}(x)$ as a real image and its distribution over the cover image database $X$; then the objective function to be optimized is as follows:

$$\min_{G'} \max_{D'} V(D', G') = \mathbb{E}_{x \sim p_{data}(x)}[\log D'(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D'(G'(z)))].$$

Other unsupervised GAN models (e.g., BEGAN) can also be used for generator training.
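
This objective is typically optimized by alternating discriminator and generator updates. The following is a hedged PyTorch sketch of one such step; the non-saturating generator loss is a common practical substitute for the log(1 − D'(G'(z))) term, and the uniform noise in [−1, 1] matches the noise range of the mapping in Section 3.2.

```python
# One alternating training step for the DCGAN objective (Phase 2),
# a sketch assuming PyTorch and logit outputs from the discriminator.
import torch
import torch.nn.functional as Fn

def dcgan_step(G_prime, D_prime, opt_g, opt_d, real_x, z_dim=100):
    n = real_x.size(0)
    z = torch.rand(n, z_dim) * 2 - 1       # noise in [-1, 1]
    fake_x = G_prime(z)
    real_lbl, fake_lbl = torch.ones(n), torch.zeros(n)

    # Discriminator: maximize log D'(x) + log(1 - D'(G'(z))).
    d_loss = (Fn.binary_cross_entropy_with_logits(D_prime(real_x), real_lbl)
              + Fn.binary_cross_entropy_with_logits(D_prime(fake_x.detach()), fake_lbl))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: non-saturating trick, maximize log D'(G'(z)).
    g_loss = Fn.binary_cross_entropy_with_logits(D_prime(fake_x), real_lbl)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```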

Phase 3 (extractor training). After Phases 1 and 2, the two generators $G'$ and $G$ have been trained. Based on these two generators and input noise $z$, the marked image can be generated as $G(G'(z))$. Our goal in Phase 3 is to train an extractor $E$ for the secret data. We draw on Hu's method [18]. The construction of $E$ is similar to that of the discriminator in the DCGAN model: it has four convolutional layers and a fully connected layer, with a leaky ReLU activation function and batch normalization in each layer. Unlike conventional CNN models, there is no pooling layer or dropout operation in the extractor. An illustration of the extractor in the GRDH method is shown in Figure 5.

If the input of $E$ is a marked image of size 64 × 64 × 3, the output of the first layer has size 32 × 32 × 64. In each following layer, the spatial dimensions are halved and the number of channels is doubled relative to the previous layer. The final output is a noise vector of 100 dimensions, with each value between −1 and 1. The loss function for extractor training can be described as follows:

$$\mathcal{L}_E = \mathbb{E}_{z \sim p_z(z)}\left[\,\lVert E(G(G'(z))) - z \rVert_2^2\,\right].$$

We use this loss function to train the extractor so that its output is as close as possible to the input noise $z$.
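
A hedged PyTorch sketch of this architecture and loss follows; the framework and exact kernel/stride settings are our assumptions, while the layer sizes follow the halving/doubling rule just described.

```python
# Sketch of the extractor E (Phase 3) as described: four strided conv
# layers with batch norm and leaky ReLU (no pooling or dropout), then a
# fully connected layer to a 100-dim vector in [-1, 1]; PyTorch assumed.
import torch
import torch.nn as nn

class Extractor(nn.Module):
    def __init__(self, z_dim=100, ch=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1),            # 64x64x3  -> 32x32x64
            nn.BatchNorm2d(ch), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch * 2, 4, 2, 1),       # -> 16x16x128
            nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2),
            nn.Conv2d(ch * 2, ch * 4, 4, 2, 1),   # -> 8x8x256
            nn.BatchNorm2d(ch * 4), nn.LeakyReLU(0.2),
            nn.Conv2d(ch * 4, ch * 8, 4, 2, 1),   # -> 4x4x512
            nn.BatchNorm2d(ch * 8), nn.LeakyReLU(0.2),
        )
        self.fc = nn.Linear(ch * 8 * 4 * 4, z_dim)

    def forward(self, marked):
        h = self.conv(marked).flatten(1)
        return torch.tanh(self.fc(h))  # noise estimate in [-1, 1]

def extractor_loss(E, G, G_prime, z):
    # Train E so that E(G(G'(z))) is as close as possible to z.
    return nn.functional.mse_loss(E(G(G_prime(z))), z)
```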

3.2. Implementation Process

After the preparation process, the two generators $G'$ and $G$, the extractor $E$, and the restorer $F$ have been obtained. As shown in Figure 4, the sender holds $G'$ and $G$ and sends $E$ and $F$ in advance over a secure channel. This process is similar to key distribution in public key cryptography.

According to the steps of the preparation process, the noise vector $z$ is first transformed into an image by $G'$ and then into another image by $G$. From the viewpoint of the RDH technique, the first image can be seen as the cover image and the second as the marked image. In the implementation process, the only thing the image owner needs to do is send the marked image to the receiver. At the receiving end, the receiver can recover the cover image with the restorer $F$ and extract the noise vector with the extractor $E$. From the viewpoint of RDH in encrypted images, this process at the receiving end constitutes a separable scheme: the receiver can recover the image either before or after data extraction. Beyond that, the mapping method proposed in [18] is used to map the secret binary bits to the noise vector. The mapping method can be described as follows.

First, the secret binary bits are divided into several groups, each containing $k$ bits. For example, {110101100} is divided into the three groups {110}, {101}, and {100} when $k = 3$. Then, each group is mapped to a random noise value within a given interval according to the following equation:

$$z \in \left[\frac{2m}{2^k} - 1 + \frac{\delta}{2},\ \frac{2(m+1)}{2^k} - 1 - \frac{\delta}{2}\right],$$

where $m$ denotes the decimal value of the group to be mapped and $\delta$ denotes the gap between the divided intervals. For example, with $k = 3$, every three secret bits are mapped to a random noise value between −1 and 1; the mapping from groups to intervals is shown in Table 1. Finally, all the mapped noise values are packed into a vector. This mapping allows a deviation tolerance in data extraction and ensures the extraction accuracy of the secret data during the implementation process. The mapping method is shared by both the sender and the receiver: the sender maps the secret data to the noise vector, and the receiver maps the extracted noise vector back to the secret data.
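
A minimal NumPy sketch of this mapping and its inverse is given below. The interval formula follows our reconstruction above, and the default gap δ = 0.1 is an illustrative value, not one taken from the paper.

```python
# Sketch of the bit<->noise mapping of Section 3.2 (after Hu et al. [18]),
# under the reconstructed interval formula
# [2m/2^k - 1 + delta/2, 2(m+1)/2^k - 1 - delta/2]; delta=0.1 is assumed.
import numpy as np

def bits_to_noise(bits, k=3, delta=0.1):
    """Map a bit string, e.g. '110101100', to noise values in [-1, 1]."""
    width = 2.0 / 2 ** k                       # nominal interval width
    noise = []
    for i in range(0, len(bits), k):
        m = int(bits[i:i + k], 2)              # decimal value of the group
        lo = m * width - 1 + delta / 2
        hi = (m + 1) * width - 1 - delta / 2
        noise.append(np.random.uniform(lo, hi))
    return np.array(noise)

def noise_to_bits(noise, k=3):
    """Inverse map: assign each noise value to its interval's group."""
    width = 2.0 / 2 ** k
    bits = ''
    for z in noise:
        m = int(np.clip((z + 1) // width, 0, 2 ** k - 1))
        bits += format(m, '0{}b'.format(k))
    return bits

# Round trip: 300 secret bits fill a 100-dim noise vector when k = 3.
secret = ''.join(str(b) for b in np.random.randint(0, 2, 300))
assert noise_to_bits(bits_to_noise(secret)) == secret
```

The gap δ is what provides the deviation tolerance: an extracted noise value may drift within its interval without changing the recovered group.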

4. Experimental Results

In this section, a group of experiments is conducted to verify the effectiveness of the proposed GRDH method. The experiments consist of two parts. First, we train the GAN models and the extractor for GRDH preparation. Then, we use these trained models to verify the feasibility of the method. In all experiments, we generate random bits (using the random.randint function in NumPy) as the secret information. All images in the datasets were resized to 64 × 64 in advance for model training. All experimental results were obtained on a Lenovo ThinkStation P500 graphic workstation with an NVIDIA GeForce GTX 1080 Ti GPU and 128 GB of memory.

4.1. Experiments for GRDH Preparation
4.1.1. CycleGAN Model Training

We select the image database of horses and zebras from [32] as training samples. In the CycleGAN training stage, we set the initial learning rate to 0.0002 and the batch size to 100. Stochastic gradient descent (SGD) [33] is selected as the optimization algorithm for model training. Some visual results of CycleGAN training are shown in Figure 6.

The first row shows the mapping from the X-domain image (horse) to the Y-domain image (zebra); the images from left to right are the original image, the transformed image, and the reconstructed image. The second row shows the mapping from the Y-domain image (zebra) to the X-domain image (horse). From Figure 6, it can be seen that after the model has been trained for 210,000 steps, the visual results of image transformation and image recovery are acceptable in scenarios where the demand for reversibility is not high. The visual results depend on the chosen image database and sample size.

We also use the man2woman image set [34] to train the model. The batch size is set to 100, the random number seed is set to 1234, and the initial learning rate is set to 0.002; the rate remains constant for the first 10,000 steps and then decays every 10,000 steps until it reaches zero (one step represents one batch of image training). Assuming that the Man image set is X and the Woman image set is Y, the adjustment parameter λ is set to 10.0. In training both directions X → Y and Y → X, the first-moment parameter of the gradient-based optimizer is set to 0.5, and the number of filters in the first convolutional layer is set to 64. Figure 7 shows the image quality of the CycleGAN model after different numbers of training steps. The CycleGAN model is trained for 600,000 steps in about 52 hours, that is, about 11,500 steps per hour. It can be seen from the figure that the quality of the gender-converted images reaches an acceptable level after more than 100,000 training steps. As the number of training steps continues to increase, the quality of both gender conversion and image restoration gradually improves.

4.1.2. Generator Training

Because of the high quality of the images BEGAN produces, we directly use the BEGAN model for generator training. We use the CelebA image library [35]; the batch size is set to 16, and the initial learning rate is set to 0.001 and gradually attenuated. Furthermore, the initial value of the control parameter is k0 = 0, and the parameter γ = 2. The quality of the images generated by the BEGAN model after different numbers of training steps is shown in Figure 8. It can be seen from the figure that the quality of the generated images gradually increases with the number of training steps, approaching the realism of natural images.

It is worth noting that an important factor affecting the quality of the generated image is the type of GAN model selected. The BEGAN model is relatively simple, its training time is short, and its image quality is acceptable. Compared with BEGAN, the StyleGAN model takes much more time to train, but the generated images are more realistic and closer to natural images. Figure 9 shows some images generated by the StyleGAN model trained on the FFHQ image library. They are very close to natural images, but even on an NVIDIA Tesla V100 GPU, training takes about five weeks.

4.1.3. Extractor Training

First, we generate 10,000 marked images by feeding random noise through the trained BEGAN generator and the trained CycleGAN. Then, we use these 10,000 images, together with their corresponding noise vectors, as the training set for the extractor. In the training procedure, the minibatch size is set to 100, Adam optimization is used, and the learning rate is set to 0.0002. We train the extractor for 200,000 steps. The loss function value of the extractor is shown in Figure 10.
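
The training loop under these settings could look like the following sketch; it reuses the hypothetical Extractor and extractor_loss definitions from the Section 3 sketch, and G and G_prime stand for the already-trained (frozen) CycleGAN and BEGAN generators.

```python
# Sketch of the extractor training loop with the stated settings
# (minibatch 100, Adam, learning rate 0.0002, 200,000 steps).
import torch

E = Extractor()
opt = torch.optim.Adam(E.parameters(), lr=2e-4)
for step in range(200_000):
    z = torch.rand(100, 100) * 2 - 1         # minibatch of noise in [-1, 1]
    loss = extractor_loss(E, G, G_prime, z)  # only E's parameters are updated
    opt.zero_grad()
    loss.backward()
    opt.step()
```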

From the experimental results, it can be seen that the extractor converges rapidly and eventually settles at a small loss value, which means that the output of the extractor is very close to the input noise vector. This means that we can map the output of the extractor back to the secret message using the intervals in Table 1.

4.2. Experiments for GRDH Implementation

First, the secret information bit string (generated using random.randint in NumPy) is divided into several segments (3 bits per segment, i.e., k = 3), and each segment is converted into a random noise value in the range [−1, 1] by the mapping equation in Section 3.2. Image generation, image transformation, and image restoration are performed using the trained CycleGAN and BEGAN models. The CycleGAN model is trained on the Man and Woman image sets for 600,000 steps, and the BEGAN model is trained on the CelebA image library for 400,000 steps. The noise dimension is set to 100, so the amount of embedded data in the experiment is 300 bits (i.e., 100 × 3). The experimental results of image generation, image transformation, and image restoration for some samples are shown in Figure 11(a). Because of the limited quality of the images generated by BEGAN, the visual quality of gender conversion and image restoration is relatively low. Figure 11(b) shows some experimental results when the StyleGAN model is adopted in the image generation process.

In addition, we calculated the average peak signal-to-noise ratio (PSNR) over 1,000 recovered images for the two GAN models separately, as shown in Table 2.
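
Averages like those in Table 2 follow the standard PSNR definition for 8-bit images; a minimal NumPy sketch:

```python
# Standard PSNR between two 8-bit images of equal shape (NumPy).
import numpy as np

def psnr(original, recovered):
    """Peak signal-to-noise ratio in dB for uint8 images."""
    diff = original.astype(np.float64) - recovered.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10 * np.log10(255.0 ** 2 / mse)
```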

Although the average PSNR of the restored images is not high, the subjective visual quality is acceptable. Meanwhile, as Figure 11(b) shows, the visual quality of gender conversion and image restoration is relatively high when StyleGAN is used for image generation. Therefore, in both the preparation phase and the implementation phase, the visual quality of the images is mainly determined by the type of GAN model selected.

To test the extractor, we measured the accuracy of secret information extraction over 1,000 marked images. The average extraction accuracy of the extractor is 88.7%. In addition, to test the effects of the parameters k and δ on the extractor recovery accuracy, some further experiments were carried out. The effects of parameters k and δ on recovery accuracy are shown in Tables 3 and 4, respectively.

From the tables, it can be seen that the recovery accuracy significantly decreases as the parameter k rises and slightly increases as the parameter δ rises. This is because the smaller the parameter k is, the wider each mapping interval becomes and the better the error tolerance of the algorithm is. Although the extractor recovery accuracy is not perfect, this problem can be addressed by applying error-correcting codes to the secret data before mapping them to noise. The bit-level accuracy reported above could be measured as sketched below.
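
The following sketch shows one way to compute such a bit-level accuracy; extractor, marked_images, and true_bits are placeholders for the trained model and test data, and noise_to_bits is the inverse mapping from the Section 3.2 sketch.

```python
# Sketch of measuring bit-level extraction accuracy over a test set of
# marked images (placeholder model and data; noise_to_bits as sketched
# in Section 3.2).

def extraction_accuracy(extractor, marked_images, true_bits, k=3):
    correct = total = 0
    for img, bits in zip(marked_images, true_bits):
        z_hat = extractor(img)         # estimated 100-dim noise vector
        rec = noise_to_bits(z_hat, k)  # recovered bit string
        correct += sum(a == b for a, b in zip(rec, bits))
        total += len(bits)
    return correct / total             # fraction of correctly recovered bits
```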

Because generative steganography fully fits the data distribution of a natural image dataset through the trained generator, this technique is very resistant to machine learning-based steganalyzers, as mentioned in [18]. In other words, it is safer than traditional modification-based steganography. On the other hand, because of its data hiding principle, the embedding capacity is currently very limited. In the future, with the progress of GAN models, the steganographic capacity of this scheme will gradually increase.

5. Conclusions

In this paper, a novel RDH scheme named GRDH based on the GAN model is proposed. First, we use a GAN model to train a powerful image generator that produces realistic images. The generated image is fed into the CycleGAN model to obtain an image with different semantic information. To achieve message embedding, we establish a mapping relationship between noise and messages. Message extraction is achieved by training an extractor that recovers the noise from the final marked image. The experimental results have demonstrated the effectiveness of the proposed method. Although 100 percent reversibility cannot be achieved with the current performance of CycleGAN, the proposed method is, to our knowledge, the first RDH scheme without cover modification. Compared with traditional methods, the embedding capacity of the proposed method is very limited, but its security is higher. In future work, we will experiment with new generative models to improve the quality of the restored images and the steganographic capacity of the method.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (nos. 61379152, 61403417, 61402530, and 61872384) and Shaanxi Provincial Natural Science Foundation (2014JQ8301).