Abstract

Traditional fault diagnosis methods require complex signal processing and expert experience, and the accuracy of fault identification is low. To solve these problems, a fault diagnosis method based on an improved convolutional neural network (CNN) is proposed. Based on the traditional CNN model, a variety of convolution stride modes were added to extract features of different scales of signals and expand the feature dimension. Firstly, the vibration signals were collected and grouped. Then, the data were divided into a training set and a test set and input into the improved CNN for feature extraction and model training to realize fault identification. The proposed model achieved a classification accuracy of 99.3% when tested on the vibration data of the armored vehicle. Finally, the proposed model was used to classify different fault types of planetary gearboxes. The gradient-weighted class activation mapping (Grad-CAM) method was used to visualize the classification weight of samples. The results showed that the classification accuracy reached 98% under various working conditions of the planetary gearbox.

1. Introduction

The gearbox is an important part of the transmission unit of armored vehicle chassis. The gears are the key components of the gearbox. Their breakdowns will threaten the normal operation of the transmission unit and even the whole equipment. Therefore, it is of great importance to study the fault diagnosis methods of gears.

The vibration data are widely used signals since the characteristics of vibration signals will change with the appearance of faults in rotating machinery. Ma and Chu [1] proposed a novel ensemble learning approach for fault diagnosis of the rotor-bearing system. Experimental studies for rotor-bearing fault diagnosis were demonstrated for validation of the proposed method. Wang and Xiang [2] proposed a novel minimum entropy deconvolution (MED)-based CNN to classify faults in axial piston pumps. An experimental data investigation of an axial piston pump was performed to manifest the superiority of the method. Song et al. [3] proposed a simulation model-based fault diagnosis method by a combination of finite element method (FEM), wavelet packet transform (WPT), and support vector machine (SVM). Experimental investigations were performed to verify the effectiveness of the method. Jiang et al. [4] proposed an improved deep recurrent neural network (DRNN). The proposed method was verified with experimental rolling bearing data. Shao et al. [5] proposed a new deep-learning method to automatically learn the useful fault features from the raw vibration signals. The proposed method was applied to the fault diagnosis of the rotor and bearing. Wang et al. [6] proposed a novel three-stage intelligent fault diagnosis approach for practical industrial process monitoring. The proposed method was able to reliably and accurately identify different faults with less prior knowledge.

Compared with other signal processing methods, the advantage of the CNN is that it can automatically extract fault features. By inputting the original data into the model, it can intelligently complete the whole process of feature extraction and classification, so as to achieve the purpose of fault diagnosis [7, 8]. Guo et al. [9] proposed an adaptive deep convolution neural network (ADCNN) and applied it to face recognition. The experimental results showed that the algorithm could accelerate the convergence procedure and improve the recognition accuracy. Fuan et al. [10] proposed a novel method called the adaptive deep CNN for rolling bearing fault diagnosis. The proposed method was applied to diagnose rolling bearing faults, and the results confirmed that it was more effective than other intelligent methods. Gienger et al. [11] used different fault scenarios for training a CNN with dropout regularization and achieved good classification results. Yang et al. [12] proposed a novel fault diagnosis method based on Spearman rank correlation-based convolutional neural networks (SR-CNNs) for a complicated system, and the model worked well in fault diagnosis. Chen et al. [13] proposed a deep transfer convolutional neural network (DTCNN) model. The proposed model was validated on two datasets collected from motor bearings. Zhang et al. [14] proposed a new method that combined a deep convolutional neural network (DCNN) and transfer learning (TL) for fault diagnosis to handle different fault types. Eren [15] proposed a one-dimensional convolutional neural network (1D-CNN) for a fast and accurate bearing fault detection system. Wang et al. [16] presented convolutional neural network-based hidden Markov models (CNN-HMMs) to classify multifaults in mechanical systems. Classification results confirmed the superior performance of the model. Shenfield and Howarth [17] proposed a novel dual-path recurrent neural network with a wide first kernel and deep convolutional neural network pathway (RNN-WDCNN) capable of operating on raw temporal signals to diagnose rolling element bearing faults in data acquired from electromechanical drive systems.

In recent decades, numerous outstanding studies have been carried out. For example, Jia et al. [18] proposed a fault-diagnosis algorithm based on 1D-CNN. Experiments showed that the algorithm could achieve more than 99% accuracy. Nishat Toma et al. [19] used the generated images of various fault conditions to train an appropriate CNN model. The image generated with fault signatures accurately classified multiple faults with CNN. Qu et al. [20] proposed an adaptive fault diagnosis algorithm based on 1D-CNN called ADCNN-FD. Experiments of rolling bearing datasets demonstrated that the proposed method achieved more than 99% fault recognition accuracy. Won et al. [21] presented the use of 1D-CNN for automated structural damage detection. The presented approach used a convolutional network to extract damage-sensitive features for automated structural damage identification. Yao et al. [22] proposed a novel deep learning-based gear fault diagnosis method based on sound signal analysis. Experiment results showed that the method achieved much better performance on gear fault diagnosis compared with traditional gear fault diagnosis methods. Hao et al. [23] proposed a fault diagnosis method for planetary gearbox based on deep belief networks (DBNs). The identification accuracy reached 97% under five working conditions.

Previous research has shown that, with the help of sensors, a large amount of data can be collected and used for the health management of mechanical equipment. Fault recognition of mechanical equipment based on deep learning has gradually become a research hotspot. Among various deep learning methods, CNN is one of the most widely used models in the field of mechanical equipment fault recognition. CNN can adaptively learn features from signals and has higher recognition accuracy than other methods. However, CNN can lose some valuable information in the process of feature extraction. To solve this problem, many researchers have begun to modify the CNN model. This study improved the traditional CNN model and realized the fault identification of mechanical equipment.

Although deep learning methods have developed rapidly, what a network model focuses on during feature extraction is still poorly understood, and the relationship between the input and output of the network model has low interpretability. By understanding and exploring the feature extraction process of the network model, the reliability and effectiveness of the data-driven deep learning method can be improved. The Grad-CAM method is an effective tool for this purpose, since it can explain which part of the input the CNN model pays more attention to when extracting features. Selvaraju et al. [24] proposed the Grad-CAM method, which uses the gradients of any target concept flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image. Jonas et al. [25] used the Grad-CAM method to identify which electroencephalogram (EEG) features were used by the CNN to classify an EEG epoch as a favorable or unfavorable outcome. To date, there are still several problems in this research area, such as complex processes of feature extraction and model construction and a strong dependence on sample set size and working conditions. To solve these problems, this study proposes an improved CNN model, which is applied to fault identification of armored vehicles. The Grad-CAM method is used to visualize the weight vectors of convolutional layers and locate the signal segment of interest to the model. The proposed model has been verified by testing the planetary gearboxes of armored vehicles.

The main contributions of this study are as follows:
(1) The CNN model was improved to increase the dimension of feature extraction and the diversity of features. A variety of convolution stride modes were added in the improved CNN model to extract features of different scales of signals and expand the feature dimension. The training efficiency of the model was improved. It has been applied and verified in the planetary gearbox of an armored vehicle.
(2) t-SNE was used to present the learning situation in different layers of the improved CNN. The Grad-CAM method was used to visualize the classification weight of samples and locate the signal segment concerned by the model, which enhanced the interpretability of the model.

2. Structure of the Improved CNN Model

Common data characteristics include skewness, kurtosis, shape factor, crest factor, impulse factor, clearance factor, and kurtosis value. Feature extraction requires complex calculations and depends on expert experience. For different experimental data, the selected characteristics have a certain randomness, the efficiency of fault diagnosis is low, and the universality is poor. In this study, an improved CNN model was used to analyze and process the experimental data, which could realize the automatic extraction of features, and the diagnosis accuracy was high.

2.1. Structure of Typical CNN

CNN is a neural network model inspired by the visual receptors of the mammalian cortex. It normally consists of three parts, i.e., an input layer, a feature extraction layer, and a classification layer, as shown in Figure 1. The feature extraction layer usually includes two or more groups of a convolution layer and a pooling layer. The convolution layer obtains data characteristics through the convolution kernel operation. The pooling layer compresses the incoming characteristics, reduces the complexity of the convolution layer features, and suppresses overfitting. Pooling methods include average pooling, maximum pooling, and L2-norm pooling. After the groups of convolution and pooling layers, there is a classification layer, which includes a full connection layer, a middle layer, and a classifier. The neurons in the classification layer are fully connected. Because of its excellent ability for feature extraction, classification, and recognition, CNN is widely used in fault diagnosis.

When data enter the CNN through the input layer, the feature extraction layer will extract the characteristics of the data and output a column vector as the input of the full connection layer. The full connection layer integrates the input characteristics of the feature extraction layer. Characteristics are passed to the classifier through the middle layer. The characteristics are classified by the softmax regression layer, and the mapping relationship is established with the output label. Assuming that there is a K-class classification problem, the output of softmax regression can be calculated as follows:
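In its standard form, K-class softmax regression can be written as below. The notation is an assumption made here for illustration and is not taken from the original equation: $x$ denotes the feature vector passed to the softmax layer, and $W_k$ and $b_k$ denote the weight row and bias associated with class $k$.

```latex
O_k = P(y = k \mid x; W, b)
    = \frac{\exp(W_k x + b_k)}{\sum_{j=1}^{K} \exp(W_j x + b_j)},
\qquad k = 1, \dots, K
```

Each $O_k$ lies in $(0, 1)$ and the $K$ outputs sum to 1, so the predicted label is the class with the largest $O_k$.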

The weights and biases of the CNN are trained by the backpropagation (BP) algorithm. W and b are the weight matrix and bias value of the convolution kernel, respectively, and O is the final output of the CNN. The loss function is cross entropy.
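As a minimal sketch of the softmax output and the cross-entropy loss mentioned above, the following pure-Python example may help. The class scores are hypothetical values chosen for illustration, and the function names are not from the original work.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of K class scores."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Cross-entropy loss for one sample with an integer class label."""
    return -math.log(probs[true_class])


# Hypothetical scores for a 4-class problem (e.g., four health states).
logits = [2.0, 0.5, -1.0, 0.1]
probs = softmax(logits)
loss = cross_entropy(probs, true_class=0)
```

The loss is small when the probability assigned to the true class is close to 1 and grows without bound as that probability approaches 0, which is what drives the weight updates during backpropagation.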

2.2. Structure of Improved CNN

In order to increase the dimension of the data characteristics, the aforementioned CNN structure was optimized, and convolution kernels with three different moving steps (strides) were used to extract data characteristics. The structure of the improved CNN is shown in Figure 2. The improved model is still composed of an input layer, a feature extraction layer, and a classification layer. In the feature extraction layer, the number of groups of convolution and pooling layers can be changed according to the complexity of the data, as discussed in Section 2.2.1. Convd1, Convd2, and Convd3 represent convolution kernels with moving steps of 1, 2, and 3, respectively. Different CNN layer groups are matched with convolution kernels with different moving steps, as discussed in Section 2.2.2. Because kernels with different moving steps sample the signal at different scales, the convolution operation yields data characteristics of different scales and dimensions. The numbers of convolution kernels in the two CNN layer groups are N1 and N2, respectively.
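The effect of the moving step on the extracted features can be illustrated with a small sketch. It assumes a convolution without padding, for which an input of length n and a kernel of length k give (n - k) // s + 1 outputs at moving step s. The toy signal and averaging kernel are hypothetical, and how the three branches are merged in the actual model is not specified here.

```python
def conv1d(signal, kernel, stride):
    """Valid (no padding) 1D convolution, computed as cross-correlation."""
    k = len(kernel)
    return [
        sum(signal[start + i] * kernel[i] for i in range(k))
        for start in range(0, len(signal) - k + 1, stride)
    ]


def output_length(n, k, stride):
    """Number of outputs for input length n and kernel length k, no padding."""
    return (n - k) // stride + 1


signal = [float(i % 7) for i in range(100)]  # hypothetical toy signal
kernel = [1.0 / 30] * 30                     # length-30 averaging kernel

# One feature map per moving step; the three scales are kept side by side.
features = {s: conv1d(signal, kernel, s) for s in (1, 2, 3)}
lengths = {s: len(f) for s, f in features.items()}
```

For this toy input the three branches produce 71, 36, and 24 values. A larger moving step gives a coarser view of the same signal, which is the sense in which the branches capture different scales.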

Signal S_i is selected, and the first-layer feature map is obtained after the first convolution operation, where the convolution result is de-averaged (its mean is subtracted) to increase the sparsity of the model and improve the training speed of the model. An activation function is then applied, and the resulting variable is pooled. The output of the first CNN layer group is as follows:

Element in the second CNN layer group is the average value of outputs. The outputs are obtained from the output of the first CNN layer group after the convolution operation as follows:

The output of the second CNN layer group is as follows:

The outputs of the second CNN layer group are connected to form a column vector, which is input into the full connection layer. The full connection layer, middle layer, and output layer constitute a traditional neural network.
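The processing inside one CNN layer group (convolution, de-averaging, activation, pooling) and the final connection into a column vector can be sketched as follows. This is only an illustration under stated assumptions: ReLU is assumed as the activation function and non-overlapping max pooling as the pooling method, since neither is fixed by the text above. The kernels and signal are hypothetical, and the second layer group is omitted for brevity.

```python
def conv1d(signal, kernel, stride=1):
    """Valid (no padding) 1D convolution, computed as cross-correlation."""
    k = len(kernel)
    return [
        sum(signal[start + i] * kernel[i] for i in range(k))
        for start in range(0, len(signal) - k + 1, stride)
    ]


def layer_group(signal, kernel, stride, pool_size):
    """Convolution -> de-averaging -> activation -> pooling for one kernel."""
    conv = conv1d(signal, kernel, stride)
    mean = sum(conv) / len(conv)
    centered = [v - mean for v in conv]          # de-averaging
    activated = [max(0.0, v) for v in centered]  # ReLU (assumed activation)
    # Non-overlapping max pooling (assumed); a trailing partial window is dropped.
    return [
        max(activated[i:i + pool_size])
        for i in range(0, len(activated) - pool_size + 1, pool_size)
    ]


def flatten(feature_maps):
    """Connect all feature maps end to end into one column vector."""
    return [v for fmap in feature_maps for v in fmap]


signal = [float(i) for i in range(10)]   # hypothetical toy signal
kernels = [[1.0, 1.0], [1.0, -1.0]]      # two hypothetical kernels
maps = [layer_group(signal, k, stride=1, pool_size=2) for k in kernels]
column = flatten(maps)
```

After de-averaging, roughly half of the values fall below zero and are set to zero by the activation, which is one way to read the statement that de-averaging increases sparsity.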

2.2.1. Number of CNN Layers

Before the model is trained, the convolution kernel weight vector and bias are initialized. After initialization, the weight vector is normally distributed with a standard deviation of 0.1. In the training process, the weights are updated iteratively, and the number of CNN layer groups is determined by the distribution of the weight vector after training. The frequency distributions of the weights of each layer group after 500 rounds of training are shown in Figure 3. It can be seen that the weight frequency distributions of the second, third, and fourth layer groups are similar to the initial normal distribution. The contribution of adding the third and fourth layer groups to the model is therefore not obvious, so two CNN layer groups are selected.
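The comparison between trained and initial weight distributions is made visually in Figure 3. One hypothetical way to quantify it, shown here only as a sketch, is to compare relative-frequency histograms with the total variation distance. The function names, bin settings, and the two stand-in "trained" weight sets are assumptions for illustration, not the procedure used in this study.

```python
import random
import statistics


def init_weights(n, std=0.1, seed=0):
    """Draw n weights from a normal distribution with the given std."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(n)]


def histogram(values, lo=-0.5, hi=0.5, bins=20):
    """Relative frequency per bin; values outside [lo, hi) are ignored."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(values) for c in counts]


def histogram_distance(a, b):
    """Total variation distance between two relative-frequency histograms."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


initial = init_weights(10000, seed=0)
# Stand-ins for trained weights: one layer that barely moved from its
# initial distribution and one whose weights spread out during training.
barely_changed = init_weights(10000, seed=1)
clearly_changed = [2.5 * w for w in initial]

d_same = histogram_distance(histogram(initial), histogram(barely_changed))
d_changed = histogram_distance(histogram(initial), histogram(clearly_changed))
```

A distance near zero suggests that training left the layer close to its initialization, which matches the argument that such a layer group contributes little to the model.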

2.2.2. Length of Convolution Kernel

The number of convolution kernels in each layer usually increases by a multiple of 8. The number of convolution kernels is 8 in the first CNN layer group and 16 in the second CNN layer group. The length of the convolution kernel is halved from one CNN layer group to the next to ensure training stability. A convolution kernel that is too long leads to overfitting, whereas one that is too short reduces the recognition rate of the model. The convolution kernel length is 30 in CNN1 and 15 in CNN2.
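The kernel settings above can be written down as a small configuration helper. This is a sketch under one assumption: "increases by a multiple of 8" is read as 8, 16, 24, and so on, which agrees with the two stated values but is not confirmed beyond two layer groups. The function and key names are hypothetical.

```python
def layer_group_config(n_groups=2, base_kernels=8, first_length=30):
    """Kernel count grows by 8 per group; kernel length halves per group."""
    return [
        {
            "group": i + 1,
            "num_kernels": base_kernels * (i + 1),
            "kernel_length": first_length // (2 ** i),
        }
        for i in range(n_groups)
    ]


config = layer_group_config()
```

With the default arguments, the helper reproduces the two layer groups used in this study: 8 kernels of length 30, followed by 16 kernels of length 15.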

2.3. Grad-CAM for Signal Weight Visualization

The convolution layers before the full connection layer of the CNN retain spatial information that is lost in the full connection layer. Grad-CAM uses the gradient information of the last convolution layer, which can also be backpropagated to other convolution layers, to determine the importance of each feature map in the convolution layer for a decision on a specific class, that is, its weight. When the feature maps of the convolution layer are multiplied by their corresponding importance weights, the feature location map is obtained. Grad-CAM can be applied directly to general networks with a full connection layer. It can be divided into the following steps:
(1) Suppose $L^c \in \mathbb{R}^{u \times v}$ is the location map of category $c$, where $u$ and $v$ are its width and height.
(2) The score $y^c$ of category $c$ is taken before the output layer, and its gradient is calculated.
(3) The measured convolution parameters are extracted. If the number of convolution kernels is $Q$, $y^c$ is used to calculate the partial derivatives with respect to the $Q$ feature maps, respectively. The $q$th feature map is $A^q$, and the partial derivative is $\partial y^c / \partial A^q$.
(4) The gradients of the $Q$ feature maps are pixel averaged to obtain the importance weight of each feature map as follows:
$$\alpha_q^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^q},$$
where $\alpha_q^c$ represents the importance of the $q$th feature map to the decision of category $c$, $Z$ is the number of pixels of $A^q$, and $A_{ij}^q$ is the pixel value at $(i, j)$ in $A^q$.
(5) The $Q$ feature maps are multiplied by their corresponding weights and then summed to obtain $L^c$. Values that contribute positively to category $c$ should be kept, and elements that contribute negatively should be discarded. A ReLU function is therefore added, so that values less than 0 are set to 0. The calculation formula is as follows:
$$L^c = \mathrm{ReLU}\left( \sum_{q=1}^{Q} \alpha_q^c A^q \right).$$
(6) Finally, the obtained $L^c$ is resampled to match the input sample size, which gives the location map of the network's classification weight for the samples.
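For a one-dimensional vibration signal, the steps above reduce to a few lines. The sketch below assumes that the feature maps of the last convolution layer and the gradients of the class score with respect to them have already been obtained from a deep learning framework; here they are hypothetical hand-written lists. Linear interpolation is assumed for the resampling in step (6), which the text does not specify.

```python
def grad_cam_1d(feature_maps, gradients, input_length):
    """Grad-CAM for 1D signals.

    feature_maps, gradients: Q lists of equal length Z (one list per kernel).
    Returns a location map resampled to input_length points.
    """
    # Step (4): importance weight = gradient averaged over all positions.
    alphas = [sum(g) / len(g) for g in gradients]
    z = len(feature_maps[0])
    # Step (5): weighted sum of the feature maps, then ReLU.
    cam = [
        max(0.0, sum(a * fmap[i] for a, fmap in zip(alphas, feature_maps)))
        for i in range(z)
    ]
    # Step (6): resample to the input length by linear interpolation (assumed).
    resampled = []
    for n in range(input_length):
        pos = n * (z - 1) / (input_length - 1) if input_length > 1 else 0.0
        left = int(pos)
        right = min(left + 1, z - 1)
        frac = pos - left
        resampled.append(cam[left] * (1 - frac) + cam[right] * frac)
    return resampled


# Hypothetical values: Q = 2 feature maps of length Z = 3.
feature_maps = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
gradients = [[0.5, 0.5, 0.5], [-1.0, -1.0, -1.0]]
location_map = grad_cam_1d(feature_maps, gradients, input_length=5)
```

In this toy case the second feature map has a negative weight, so positions where it dominates are suppressed by the ReLU and only the end of the signal receives a nonzero classification weight.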

3. Validation of the Proposed Method in Vehicle Health Status Recognition

The effect of the improved CNN model on health status recognition is verified by the experimental data of different armored vehicle health statuses. The sensor installation diagram and test system are shown in Figure 4.

A certain type of armored vehicle was taken as the research object, and four armored vehicles of the same model but with different health statuses were selected for the experiment. The engine speed was set to 600 r/min. The armored vehicle vibration signals were collected by the vehicle working condition data acquisition system, which was composed of a CoCo90 dynamic signal analyzer and EDM software. The CoCo90 dynamic signal analyzer is a powerful, high-precision, and easy-to-use data recorder and dynamic signal analyzer. It is equipped with two USB interfaces, a 100BaseT Ethernet interface, an SD card interface, an audio input/output port, a 5.7-inch color LCD display, and a key panel. The collected data can be downloaded to the PC for management and analysis by using the EDM software. In the experiment, the sampling frequency was set to 250 Hz. The sensor sensitivity was 10 mV/g. The sensor model was DYTRAN 3023M2 S/N 7238. The vibration signals of the four armored vehicles are shown in Figure 5.

To demonstrate the health status recognition effect of the improved CNN model, the first 80% of the data were taken as the training set, and the last 20% were taken as the test set. For the improved CNN, the learning rate is 0.0004, the batch size is 100, the loss function is cross entropy, and the optimizer is AdamOptimizer. The number of neurons in the middle layer is 1024. The structure parameters of the improved CNN are shown in Table 1. The training curves of the improved CNN, compared with those of the traditional CNN model and the 1D-CNN model, are shown in Figure 6.
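The training settings and the chronological 80/20 split described above can be collected in a short sketch. The dictionary keys and function names are hypothetical, and the list of integers merely stands in for the grouped vibration samples.

```python
# Hyperparameters as stated in the text; key names are illustrative only.
config = {
    "learning_rate": 0.0004,
    "batch_size": 100,
    "loss": "cross_entropy",
    "optimizer": "Adam",
    "middle_layer_neurons": 1024,
}


def chronological_split(samples, train_percent=80):
    """First train_percent % of the samples for training, the rest for testing."""
    cut = len(samples) * train_percent // 100
    return samples[:cut], samples[cut:]


def batches(samples, batch_size):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


samples = list(range(1000))  # stand-in for 1000 grouped signal samples
train_set, test_set = chronological_split(samples)
n_batches = len(list(batches(train_set, config["batch_size"])))
```

Because the split is chronological rather than shuffled, the test set contains only data recorded after the training data, which avoids overlap between the two sets.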

The confusion matrix of the classification results of the test samples is shown in Figure 7. The vertical axis represents the real situation, and the horizontal axis represents the prediction. The accuracy of the training and classification results reaches 99.3%.
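A confusion matrix of this form, with true classes on the vertical axis and predicted classes on the horizontal axis, and the accuracy derived from it can be computed as in the following sketch. The labels are hypothetical and do not reproduce the results in Figure 7.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal, i.e., correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels for four health states (0 to 3).
true_labels = [0, 0, 1, 1, 2, 2, 3, 3]
predicted_labels = [0, 0, 1, 2, 2, 2, 3, 3]
matrix = confusion_matrix(true_labels, predicted_labels, n_classes=4)
acc = accuracy(matrix)
```

Off-diagonal entries show which health states are confused with each other, information that the single accuracy figure does not convey.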

In order to intuitively reflect how the improved CNN learns different data characteristics, the clustering effect of the data characteristics in the improved CNN is analyzed by t-distributed stochastic neighbor embedding (t-SNE). t-SNE is an effective nonlinear dimensionality reduction method. Based on the probability distribution of a random walk on the proximity graph, it can find structural relationships in the data. It focuses on maintaining the local structure of the data, and the dimension is reduced to a two-dimensional space while preserving the manifold structure of the data. The data characteristics of the input layer, first CNN layer group, second CNN layer group, and output layer are visualized in two dimensions, and the results are shown in Figure 8.

It can be seen from Figure 8 that, as the data pass through the first CNN layer group, second CNN layer group, and output layer, the distribution boundaries of the different health states gradually become clear. The four health states are gradually separated, and their separability is gradually strengthened. Note that the distances between the four health states in the figure do not represent the real distances between the data characteristics; the figure is only a clustering diagram.

4. Fault Diagnosis of Planetary Gearbox Based on Improved CNN

To further verify the fault diagnosis capabilities of the improved CNN model, a planetary gearbox with artificial faults was tested. The test bench of the planetary gearbox is shown in Figure 9. The test bench was mainly composed of a planetary gearbox, generator, hydraulic station, variable frequency motor, sensors, and so on. The planetary gearbox was taken as the research object. The faulty gears were machined by cutting. During the experiment, two working conditions were set, each of which included four kinds of faults: broken planetary gear with 15 teeth, broken planetary gear with 18 teeth, broken sun gear with 30 teeth, and broken sun gear with 31 teeth.

The main control platform was used to control the start and stop of the planetary gearbox, adjust the speed, and set the experimental conditions. In working condition 1, the load was set to 900 N·m, and the output shaft speed was set to 1500 r/min. In working condition 2, the load was set to 1200 N·m, and the output shaft speed was set to 1500 r/min. The measuring points were set on the surface of the planetary gearbox. Vibration acceleration sensors were selected for the experiment. The data acquisition system was used to collect the vibration signal, and the sampling frequency was 20 kHz. Vibration signals were collected for five gear states: normal gear, broken planetary gear with 15 teeth, broken planetary gear with 18 teeth, broken sun gear with 30 teeth, and broken sun gear with 31 teeth (Figure 10).
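Combining the two load settings with the five gear states gives the ten classes that the model has to distinguish. A minimal sketch of such a label map is given below. The ordering of the class indices is an assumption for illustration and need not match the order used in the figures.

```python
from itertools import product

# Load in N·m for each working condition; the speed is 1500 r/min in both.
LOADS_NM = {1: 900, 2: 1200}

GEAR_STATES = [
    "normal gear",
    "broken planetary gear with 15 teeth",
    "broken planetary gear with 18 teeth",
    "broken sun gear with 30 teeth",
    "broken sun gear with 31 teeth",
]

# Map each (working condition, gear state) pair to an integer class label.
labels = {
    (condition, state): index
    for index, (condition, state) in enumerate(product(LOADS_NM, GEAR_STATES))
}
```

Each vibration recording can then be tagged with `labels[(condition, state)]` before it is grouped into training and test samples.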

To ensure both the diagnostic accuracy and the calculation speed of the model, each sample must cover the full-cycle sampling time of every gear, that is, more than 1835 points. Each working cycle was 0.125 s. Under the experimental conditions, 704512 points were sampled for the broken sun gear with 30 teeth, and 700416 points were sampled for each of the other four states. To facilitate the calculation, the first 700000 data points were selected for analysis.
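The numbers above can be checked with a short calculation. The sketch assumes, purely for illustration, that one sample spans exactly one 0.125 s working cycle; the text only requires that a sample contain more than 1835 points, and the actual sample length is not stated.

```python
SAMPLING_FREQUENCY_HZ = 20_000
WORKING_CYCLE_S = 0.125
POINTS_USED = 700_000
MIN_POINTS_PER_SAMPLE = 1835

# Points recorded during one working cycle at 20 kHz.
points_per_cycle = round(SAMPLING_FREQUENCY_HZ * WORKING_CYCLE_S)

# Number of non-overlapping one-cycle samples in each recording (assumed).
samples_per_recording = POINTS_USED // points_per_cycle


def segment(signal, window):
    """Cut a signal into non-overlapping windows; a partial tail is dropped."""
    return [
        signal[i:i + window]
        for i in range(0, len(signal) - window + 1, window)
    ]
```

Under this assumption, one working cycle contains 2500 points, which satisfies the 1835-point requirement, and each 700000-point recording yields 280 samples.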

The technical roadmap adopted in this study is shown in Figure 11. The vibration signals of the planetary gearbox were collected and grouped. The signals were analyzed in the time domain. An improved CNN model was established. Network parameters of the improved CNN model were determined. The data were divided into a training set and a test set. The training set was used for feature extraction and classification recognition of improved CNN, and the test set was used to verify the feature extraction ability of improved CNN.

The collected vibration data were analyzed in the time domain. Due to the high sampling frequency, the regularity of the time-domain waveform was not obvious. Time-domain diagram of the vibration signal is shown in Figure 12.

Multiple groups of vibration data were collected on the planetary gearbox test bench. The data were grouped and input into the model. As the proportion of the data assigned to the training set increases, the accuracy of the model first increases and then decreases. The accuracy under different grouping proportions is shown in Figure 13. It can be seen that the accuracy of the model is highest when the training set accounts for 80% of the data and the test set accounts for 20%.

In order to prove the diagnosis effect of improved CNN, the first 80% of the data were selected as the training set, and the last 20% of the data were selected as the test set. If the number of iterations is properly selected, the best fitting effect can be obtained, and the time cost can be minimized. As the number of iterations increases, the recognition accuracy continues to improve. When the number of iterations reaches 200, the recognition accuracy is basically stable. The curves of training results are shown in Figure 14. It can be seen from Figure 14 that the improved CNN model has higher recognition accuracy and faster convergence speed than the traditional network model.

The fault features of the output layer are visualized in two dimensions, and the results are shown in Figure 15. The ten states are basically separated in the output layer, and the data features show a good clustering effect in the model. In order to test the fault diagnosis effectiveness of the improved CNN model, the confusion matrix of the classification results of the test samples is drawn in Figure 16. The vertical axis represents the real situation, and the horizontal axis represents the prediction. The accuracy of training and classification reaches 98%.

The weight of signal classification in the last convolution layer of the model, extracted qualitatively by the Grad-CAM algorithm, is visualized in Figure 17. As the color changes from blue to red, the weight increases gradually. As can be seen from Figures 17(a) to 17(j), the model focuses on different parts of the signal in each of the ten working conditions, which is why the model can learn characteristics from the signal and classify faults.

This section explored how the model learned and classified signals and where the focus of classification was. The Grad-CAM algorithm was used to visualize the classification weight of the signal by the model. The figures correspond to the weight of the signal concerned by the model. The color changes from blue to red, and the attention of the model increases gradually. The model pays different attention to different fault modes, indicating that the characteristics of different positions of the signal affect the classification results.

5. Conclusion

In this study, the improved CNN model was established and its network parameters were determined for fault recognition. The classification accuracy of the improved CNN classifier was verified by the experimental data. The collected fault data were analyzed in the time domain. The training set and test set were distinguished. The collected data were input into an improved CNN model to realize vibration signal feature extraction and fault identification of the planetary gearbox of an armored vehicle. The Grad-CAM method was used to visualize the classification weight of samples. The results show that an improved CNN model can be used for fault recognition with high accuracy.

Data Availability

The data used in this study can be obtained from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China, under Grants 51875576 and 52005510.