Abstract

Pruning is a method of compressing a neural network model, and it affects both the accuracy and the computing time of the model's predictions. This paper puts forward the hypothesis that the pruning proportion is positively correlated with the compression scale of the model but not with the prediction accuracy or the computing time. To test the hypothesis, a group of experiments is designed in which MNIST is used as the data set to train a neural network model based on TensorFlow. Pruning experiments are then carried out on this model to investigate the relationship between pruning proportion and compression effect. For comparison, six different pruning proportions are set, and the experimental results confirm the hypothesis.

1. Introduction

Model compression is a common method for moving artificial intelligence from the cloud to embedded terminals. Network pruning is a particularly effective compression solution for models [1, 2]. In [1, 3], Han et al. proposed a compression method based on pruning but did not investigate the relationship between pruning proportion and compression effect. He et al. [2] studied channel pruning for accelerating very deep neural networks, yet the effect of the pruning rate on prediction is not stated. Several pruning methods have been studied in recent years. However, to the best of our knowledge, there are very few studies on the relationship between the pruning proportion and the model size, the accuracy, and the computing time needed to make predictions. This gap is the motivation of our research.

In a trained neural network model, pruning sets all parameters whose absolute values are less than a specific threshold to zero. After pruning, retraining and sparsification are normally conducted, where sparsification deletes the connections with zero values to compress the size of the model [4, 5]. As an example, two figures compare the network before and after pruning: Figure 1 shows the original structural diagram, and Figure 2 shows the structural diagram after pruning.
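The two operations described above can be illustrated with a minimal sketch. It assumes that the threshold is compared against the magnitude of each weight and that sparsification keeps the surviving weights as (index, value) pairs. The function names and the toy weights are hypothetical and are not taken from the experiment code.

```python
def prune_by_threshold(weights, threshold):
    """Zero every weight whose absolute value is below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def sparsify(weights):
    """Keep only the non-zero weights as (index, value) pairs."""
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]


# Toy weights, chosen only for illustration.
weights = [0.40, -0.02, 0.07, -0.31, 0.01, 0.15]
pruned = prune_by_threshold(weights, threshold=0.10)
sparse = sparsify(pruned)
print(pruned)
print(sparse)
```

Note that the sparse form stores an index next to every value, so it is only smaller than the dense form when enough weights have been removed.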

Here, based on TensorFlow, we use MNIST as the data set to train a neural network model. TensorFlow is an open-source machine learning framework. Specifically, it is a software library with which users build mathematical models by programming in Python and other languages. These models are used in artificial intelligence applications. MNIST is a data set of handwritten digit images, with 60,000 images in the training set and 10,000 in the test set. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

In this paper, we make the hypothesis that the pruning proportion is positively correlated with the compression scale of the model but not with the prediction accuracy and calculation time. So, our research object is the preliminary relationship between pruning proportion and compression effect in the neural network model. Specifically, this paper studies the relationship from three aspects: first, the relationship between pruning proportion and model size; second, the relationship between pruning proportion and model prediction accuracy; lastly, the relationship between pruning proportion and computing time for model predictions. For the above objective, a great number of experiments are carried out to investigate the relationship between pruning proportion and compression effect, and the above hypothesis is confirmed, which is our main contribution in this paper.

The rest of this paper is organized as follows. In Section 2, the neural network model is proposed first. To test the hypothesis, an original model and an experimental plan are introduced in Section 3. Section 4 gives the experimental procedures, and Section 5 gives the experimental results and analysis. Finally, Section 6 concludes this paper.

2. Neural Network Model

A neural network is constituted by one input layer, one or several hidden layers, and one output layer, and every layer is constituted by a certain number of neurons. These neurons are interconnected, just like the nerve cells of humans. Figure 3 shows the structure of the neural network.

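We assume that $X_i$ is the $i$th individual (solution) in the population. The mutation operator aims to generate mutant solutions. For each solution $X_i$, a mutant solution $V_i$ is created by the corresponding mutation scheme. Some classical mutation schemes are listed as follows:

(1) DE/rand/1: $V_i = X_{r_1} + F\,(X_{r_2} - X_{r_3})$
(2) DE/rand/2: $V_i = X_{r_1} + F\,(X_{r_2} - X_{r_3}) + F\,(X_{r_4} - X_{r_5})$
(3) DE/best/1: $V_i = X_{best} + F\,(X_{r_1} - X_{r_2})$
(4) DE/best/2: $V_i = X_{best} + F\,(X_{r_1} - X_{r_2}) + F\,(X_{r_3} - X_{r_4})$

where $r_1, r_2, r_3, r_4, r_5$ are five randomly selected individual indices between 1 and $N$, $F$ is the scale factor, and $X_{best}$ is the global best individual (solution).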

The crossover operator focuses on recombining two different individuals and creates a new one. In DE, a trial solution $U_i$ is created based on the following crossover operation:

$$U_{ij} = \begin{cases} V_{ij}, & \text{if } rand_j \le CR \text{ or } j = j_r, \\ X_{ij}, & \text{otherwise,} \end{cases}$$

where $CR$ is called the crossover rate, the random value $rand_j$ is in the range [0, 1], and $j_r$ is a randomly selected dimension index. As seen, $U_i$ inherits from $V_i$ and $X_i$ based on the value of $CR$. For a large $CR$, most dimensions of $U_i$ are taken from $V_i$. For a small $CR$, most dimensions of $U_i$ are taken from $X_i$. In the latter case, $U_i$ is similar to its parent $X_i$.
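The DE/rand/1 mutation and the crossover rule described above can be sketched as follows. This is a minimal illustration, not a full DE implementation. The function names, the scale value 0.5, the crossover rate 0.9, the population size, and the choice to exclude index i when drawing the random indices are all assumptions made for the example.

```python
import random


def mutate_rand_1(population, i, scale, rng):
    """DE/rand/1: a random base vector plus one scaled difference vector."""
    # Assumption: the random indices are distinct and differ from i.
    candidates = [k for k in range(len(population)) if k != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    x1, x2, x3 = population[r1], population[r2], population[r3]
    return [a + scale * (b - c) for a, b, c in zip(x1, x2, x3)]


def crossover(parent, mutant, cr, rng):
    """Take the mutant's value when rand_j <= CR or j == jr, else the parent's."""
    jr = rng.randrange(len(parent))
    return [
        m if (rng.random() <= cr or j == jr) else p
        for j, (p, m) in enumerate(zip(parent, mutant))
    ]


rng = random.Random(0)
# Hypothetical population: 6 individuals with 4 dimensions each.
population = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(6)]
mutant = mutate_rand_1(population, i=0, scale=0.5, rng=rng)
trial = crossover(population[0], mutant, cr=0.9, rng=rng)
print(trial)
```

Because dimension jr is always copied from the mutant, the trial solution differs from its parent in at least one dimension whenever the mutant does.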

3. Design of the Experiment

3.1. Structure of the Original Model

The basic neural network structure consists of the following layers in sequence: a convolutional layer, a pooling layer, a second convolutional layer, a second pooling layer, and two fully connected layers [6, 7], as shown in Figure 4. In the experiment plan, pruning is performed by default on the weight parameters of the two fully connected layers. Alternatively, pruning can be performed on all network parameters; the specific operation is selected by changing the command-line parameters [8, 9].
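As a quick arithmetic check, the weight counts of the two fully connected layers reported in Section 5 (3,211,264 and 10,240) can be reproduced from layer shapes. The shapes used below are an assumption, not stated in the text: a 28 x 28 input halved by two pooling layers to 7 x 7, 64 feature maps after the second convolution, 1,024 hidden units, and 10 output classes.

```python
def dense_weight_count(inputs, outputs):
    """Number of weights (without biases) in a fully connected layer."""
    return inputs * outputs


# Assumed shapes; they are chosen because they match the reported counts.
flattened_features = 7 * 7 * 64
hidden_units = 1024
classes = 10

fc1_weights = dense_weight_count(flattened_features, hidden_units)
fc2_weights = dense_weight_count(hidden_units, classes)
print(fc1_weights, fc2_weights)
```

Under these assumptions almost all prunable weights sit in the first fully connected layer, which is consistent with pruning the fully connected layers by default.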

3.2. Experiment Plan

The experiment is based on the TensorFlow framework and uses MNIST as the dataset. An original model is trained first, and then six pruning runs with different pruning proportions are carried out [10, 11]. After each pruning, retraining and sparsification are performed. When all three operations have been completed on the original model, the task of pruning compression is finished [12, 13]. The data (model size, accuracy, and computing time for making predictions) are then collected and analysed for comparison.

4. Experimental Procedures

4.1. Run Command of the Pruning Experiment

Model pruning is executed by the following command: python train.py -1 -2 -3 --train_data_dir /tmp/mnist_data --train_dir /tmp/mnist_train --variables_dir /tmp/mnist_variables --max_steps 10000 --batch_size 32 --sparse_ratio 0.9 --pruning_variable_names w_fc1, w_fc2. Table 1 specifies the parameters in this command [14–16].
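Since the experiment repeats this command for six pruning proportions, the runs can be generated programmatically. The sketch below only builds and prints the command lines and does not execute them. The flags and paths are copied from the command above. Reusing the same directories for every run and passing the two variable names as one comma-separated value are assumptions.

```python
# Pruning proportions used in Section 5.
PRUNING_PROPORTIONS = [0.1, 0.3, 0.5, 0.7, 0.8, 0.9]


def build_train_command(sparse_ratio):
    """Return the argument list for one pruning run of train.py."""
    return [
        "python", "train.py", "-1", "-2", "-3",
        "--train_data_dir", "/tmp/mnist_data",
        "--train_dir", "/tmp/mnist_train",
        "--variables_dir", "/tmp/mnist_variables",
        "--max_steps", "10000",
        "--batch_size", "32",
        "--sparse_ratio", str(sparse_ratio),
        # Assumption: both variable names form a single comma-separated value.
        "--pruning_variable_names", "w_fc1,w_fc2",
    ]


commands = [build_train_command(ratio) for ratio in PRUNING_PROPORTIONS]
for command in commands:
    print(" ".join(command))
```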

4.2. Pruning Effect View Command

The effects of the -1 or -2 parameters can be viewed through eval_predict_with_dense_network.py. The specific command is python eval_predict_with_dense_network.py --test_data_dir /tmp/mnist_data --checkpoint_dir /tmp/mnist_train/step_2_2 --batch_size 32 --max_steps 10. Table 2 specifies the parameters in this command [17–19].

The effect of the -3 sparsification can be viewed through eval_predict_with_sparse_network.py. The specific command is python eval_predict_with_sparse_network.py --test_data_dir /tmp/mnist_data --checkpoint_dir /tmp/mnist_train/step_3 --batch_size 32 --max_steps 10. Table 3 specifies the parameters in this command.

4.3. Hardware and Software Configuration of the Experiment

Pruning experiments are based on the following hardware and software parameters and versions [20, 21]:

Operating system: Windows 10
GPU: NVIDIA GeForce GTX 1080 Ti 11.0 GB
CPU: Intel(R) Core(TM) i3-4160 CPU @ 3.60 GHz
Memory: 16.0 GB DDR3
Disk: Lenovo SSD SL700 240 GB
Software: TensorFlow-GPU 1.5.0, Python 3.6

4.4. Construction of Experimental Environment

The experiment is based on the MNIST dataset and the TensorFlow framework. The experimental environment was constructed in the following three steps [22–24]:

Step 1: construct the Python environment. Download Anaconda and run the installer Anaconda3-4.3.1-Windows-x86_64.exe.
Step 2: construct the NVIDIA GPU plug-ins. Run cuda_9.0.103_win10.exe for installation. Then unzip cudnn-9.0-windows10-x64-v7.zip and copy its contents to the folder C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0.
Step 3: construct the TensorFlow environment. Execute the installation command in CMD: pip install tensorflow-gpu==1.5.0.

5. Experimental Results and Analysis

In this section, six different pruning proportions are employed. For each proportion, a table gives the specific data on model size, accuracy, and computing time for predictions.

5.1. 10% Pruning Proportion

First, the pruning proportion is set to 10%; Table 4 shows the parameters of pruning effect in the first scene. In this group of experiments, the parameter threshold values of the two fully connected layers are set to 0.012034996 and 0.013038448. In this way, the valid parameter numbers are reduced from 3,211,264 and 10,240 to 2,890,137 and 9,215, respectively, making exactly 10% of the parameter values of the two fully connected layers equal to 0. However, the model size after the pruning, retraining, and sparsification is 66.5 M, which is larger than the size (37.5 M) of the original model. Hence, no compression effect is achieved. In addition, compared with the original model, the accuracy does not change, and the computing time for predictions slightly increases [25, 26].
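The threshold for a given pruning proportion can be read off the sorted weight magnitudes. The sketch below is a hedged illustration, not the code used in the experiment. It assumes that the threshold is the magnitude at the percentile position and that weights equal to the threshold are pruned as well. These two assumptions were chosen because they reproduce the valid parameter counts reported in Sections 5.1 to 5.6, each of which is one less than the exact proportion would give. The function names and the toy weights are hypothetical.

```python
def threshold_for_percent(weights, percent):
    """Return the magnitude found at the requested percentile position."""
    magnitudes = sorted(abs(w) for w in weights)
    position = len(magnitudes) * percent // 100
    return magnitudes[min(position, len(magnitudes) - 1)]


def prune(weights, threshold):
    """Zero every weight whose magnitude does not exceed the threshold."""
    return [0.0 if abs(w) <= threshold else w for w in weights]


def retained_count(total, percent):
    """Weights left when the weight at the percentile position is pruned too."""
    return total - total * percent // 100 - 1


# Toy weights with distinct magnitudes, chosen only for illustration.
weights = [0.9, -0.8, 0.7, -0.6, 0.5, -0.4, 0.3, -0.2, 0.1, -0.05]
threshold = threshold_for_percent(weights, 30)
kept = sum(1 for w in prune(weights, threshold) if w != 0.0)
print(threshold, kept)

# Retained counts for the two fully connected layers at each proportion.
for percent in (10, 30, 50, 70, 80, 90):
    print(percent, retained_count(3211264, percent), retained_count(10240, percent))
```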

5.2. 30% Pruning Proportion

Second, the pruning proportion is set to 30%; Table 5 shows the parameters of pruning effect in the second scene. In this group of experiments, the parameter threshold values of the two fully connected layers are set to 0.036936015 and 0.039559085. In this way, the valid parameter numbers are reduced from 3,211,264 and 10,240 to 2,247,884 and 7,167, respectively, making exactly 30% of the parameter values of the two fully connected layers equal to 0. However, the model size after the pruning, retraining, and sparsification is 51.8 M, which is larger than the size (37.5 M) of the original model. Hence, no compression effect is achieved. Again, compared with the original model, the accuracy does not change, and the computing time for predictions slightly increases.

5.3. 50% Pruning Proportion

Third, the pruning proportion is set to 50%; Table 6 shows the parameters of pruning effect in the third scene. In this group of experiments, the parameter threshold values of the two fully connected layers are set to 0.06429165 and 0.068891354. In this way, the valid parameter numbers are reduced from 3,211,264 and 10,240 to 1,605,631 and 5,119, respectively, making exactly 50% of the parameter values of the two fully connected layers equal to 0. The model size after pruning, retraining, and sparsification is 37.1 M, which is slightly smaller than the size (37.5 M) of the original model. Here, compression takes effect. Besides, both accuracy and computing time for predictions slightly decrease as compared with those of the original model.

5.4. 70% Pruning Proportion

Fourth, the pruning proportion is set to 70%; Table 7 shows the parameters of pruning effect in the fourth scene. In this group of experiments, the parameter threshold values of the two fully connected layers are set to 0.09749276 and 0.10360378. In this way, the valid parameter numbers are reduced from 3,211,264 and 10,240 to 963,379 and 3,071, respectively, making exactly 70% of the parameter values of the two fully connected layers equal to 0. The model size after pruning, retraining, and sparsification is 22.3 M, which is smaller than the size (37.5 M) of the original model. The compression effect is obvious. Moreover, both accuracy and computing time for predictions slightly decrease as compared with those of the original model.

5.5. 80% Pruning Proportion

Fifth, the pruning proportion is set to 80%; Table 8 shows the parameters of pruning effect in the fifth scene. In this group of experiments, the parameter threshold values of the two fully connected layers are set to 0.11903707 and 0.12662686. In this way, the valid parameter numbers are reduced from 3,211,264 and 10,240 to 642,252 and 2,047, respectively, making exactly 80% of the parameter values of the two fully connected layers equal to 0. The model size after pruning, retraining, and sparsification is 14.9 M, which is smaller than the size (37.5 M) of the original model and corresponds to a compression of about 60%. Additionally, as compared with the original model, the accuracy slightly decreases and the computing time for predictions slightly increases.

5.6. 90% Pruning Proportion

Lastly, the pruning proportion is set to 90%; Table 9 shows the parameters of pruning effect in the sixth scene. In this group of experiments, the parameter threshold values of the two fully connected layers are set to 0.14814831 and 0.15710811. In this way, the valid parameter numbers are reduced from 3,211,264 and 10,240 to 321,126 and 1,023, respectively, making exactly 90% of the parameter values of the two fully connected layers equal to 0. The model size after pruning, retraining, and sparsification is 7.6 M, which corresponds to a compression of about 80% relative to the original model (37.5 M). Furthermore, both accuracy and computing time for predictions slightly decrease as compared with those of the original model.

5.7. Comparison Results

Figure 5 compares the persisted model size of the four networks. As the pruning ratio increases, the model size represented by the red columns decreases gradually. The pruning proportion is therefore positively correlated with the compression scale, that is, negatively correlated with the model size.
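The size figures reported in Sections 5.1 to 5.6 can be turned into size reductions and a correlation coefficient with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text; the helper names are hypothetical. A negative reduction means that the sparsified model is larger than the original.

```python
ORIGINAL_SIZE = 37.5  # size of the original model, in M
# Pruning proportion (%) -> model size after pruning, retraining, and sparsification (M).
SIZES = {10: 66.5, 30: 51.8, 50: 37.1, 70: 22.3, 80: 14.9, 90: 7.6}


def size_reduction_percent(size, original=ORIGINAL_SIZE):
    """Percentage by which the model shrinks relative to the original."""
    return (1.0 - size / original) * 100.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


reductions = {p: size_reduction_percent(s) for p, s in SIZES.items()}
r_size = pearson(list(SIZES), list(SIZES.values()))
for proportion, reduction in reductions.items():
    print(proportion, round(reduction, 1))
print(round(r_size, 4))
```

The reductions at 80% and 90% round to the 60% and 80% quoted in the text, and the coefficient between pruning proportion and model size is close to -1, which corresponds to a positive correlation with the compression scale.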

Figure 6 compares the testing accuracy of the four networks. As the pruning ratio increases, the testing accuracy represented by the red columns shows no obvious change. This means that there is no positive relationship between pruning proportion and accuracy.

Figure 7 compares the computing time of the four networks. As the pruning ratio increases, the computing time for predictions represented by the red columns changes irregularly. Hence, there is also no positive relationship between pruning ratio and computing time for predictions.

6. Conclusions

By comparing the experimental data of six different pruning proportions, it is found that pruning does not necessarily compress the size of the model. Compression takes effect only when the pruning proportion reaches 50% or more. Furthermore, we found a positive relationship between the pruning proportion and the compression scale: the model size decreases steadily as the pruning proportion increases. However, there was no positive relationship between pruning proportion and accuracy or between pruning proportion and computing time for predictions.
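One possible reading of the 50% break-even point is that a sparse representation stores an index next to every retained value. The sketch below is a hypothetical cost model, not a description of the storage format used in the experiment. The 4-byte value, the 4-byte index, and the function names are assumptions made for illustration.

```python
def dense_bytes(total, value_bytes=4):
    """Storage for a dense layer: one value per weight."""
    return total * value_bytes


def sparse_bytes(total, pruned_fraction, value_bytes=4, index_bytes=4):
    """Storage for a sparse layer: one value and one index per retained weight."""
    kept = round(total * (1 - pruned_fraction))
    return kept * (value_bytes + index_bytes)


def break_even_fraction(value_bytes=4, index_bytes=4):
    """Pruned fraction at which sparse and dense storage cost the same."""
    return index_bytes / (value_bytes + index_bytes)


total = 3211264  # weights in the first fully connected layer
for fraction in (0.1, 0.5, 0.9):
    print(fraction, sparse_bytes(total, fraction), dense_bytes(total))
print(break_even_fraction())
```

Under these assumptions the sparse form is larger than the dense form below a pruning proportion of 50% and smaller above it, which would match the trend observed in the experiments.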

Since no experimental verification has been carried out on other models, the conclusion cannot be assumed to apply to them. Additionally, the experiment is based on pruning, which is only one of several model compression methods; thus, the conclusion of this study is not applicable to other compression methods [27–29].

Data Availability

The data used to support the findings of this study are publicly available at http://yann.lecun.com/exdb/mnist/.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the key project of the Natural Science Research of Higher Education Institutions in Anhui Province (grant no. KJ2018A0461); Anhui Province Key Research and Development Program Project (grant no. 201904a05020091); and a provincial quality engineering project from Department of Education Anhui Province (grant no. 2019mooc283).