Abstract

In the state analysis of rolling bearings using collaborative representation theory, constructing a good redundant dictionary that collaboratively represents the acquired normal or abnormal data remains a significant issue. This paper therefore proposes a new method for fault detection and classification of rolling bearings. The proposed algorithm consists of three main components. First, a wavelet transform is employed to extract features, taking advantage of the observation that vibration signals under different conditions have similar frequency spectra. This similarity ensures that any test sample can be collaboratively represented using the training samples. Second, under the similarity assumption, a dictionary pair learning strategy is employed to build an overcomplete dictionary pair, which is used to realize an optimal representation of the vibration signal. A sparsity constraint is also taken into account during dictionary training to enhance the robustness of the classification. Finally, the learned dictionary is combined with collaborative representation to perform pattern classification of rolling bearings. The effectiveness of the method is verified by applying the proposed algorithm to simulated and real vibration signals. The results show that, for different fault categories generated from different fault sizes and motor loads, our method can rapidly and accurately identify the fault category to which the input sample belongs.

1. Introduction

Rolling bearings are among the most important and frequently encountered components of rotating machines, and their operating state directly affects the operating efficiency of the entire machine or system. Therefore, the state detection and classification of rolling bearings have long been, and will remain, a research hotspot, because they enable low-cost maintenance and reduce unpredictable influences in some cases [1]. In the past few years, the state diagnosis of bearings has been intensively studied with respect to vibration and acoustic measurement analysis, and the research has achieved many satisfactory results [2–4]. However, due to environmental noise and variations in the shaft rotational speed, the acquired vibration signals of rolling bearings are often complicated. Thus, how to determine the state of bearings from the acquired vibration signals is still an important issue worth studying.

In the early stage of fault diagnosis, statistics such as the root mean square and kurtosis were extensively used to extract the fault characteristics of the original signals in the time domain. Although these statistical parameters can be computed using fairly simple methods [5, 6], they cannot completely represent the fault information when analyzing vibration signals with complex faults, because of the existence of unknown periods or frequencies; this may lead to low classification precision. To obtain more detailed fault information, many new methods have been developed in recent years. Feature-based algorithms, the most popular methods in fault detection, assume that each fault of a rolling bearing produces a specific vibration that can be measured and has a unique characteristic. With respect to state detection, a variety of feature-based detection techniques (e.g., frequency-domain methods and time-frequency methods) [7–10] have been studied and successfully applied in practice. In general, feature-based fault diagnosis methods mainly involve feature extraction and pattern classification. First, a distinct feature can improve the accuracy rate and reduce the complexity of the classifier. In [11], Wang et al. extracted a distinct feature (i.e., the singular values of a feature matrix) from the considered data using singular value decomposition (SVD) and empirical mode decomposition and employed a very simple classifier based on the Mahalanobis distance to accomplish fault clustering. In [12], a characteristic information vector is extracted using the wavelet packet transform, and the modified features (after dimensionality reduction) are then fed into a twin support vector machine (SVM) for fault detection. Second, a good classifier can provide excellent predictions for both training samples and testing (or query) samples and has good compatibility with the extracted features. For example, an SVM can produce highly accurate classification and diagnosis due to its excellent generalization performance [13–15]. In addition, Wang [16] constructed redundant statistical features using a binary wavelet packet transform with the Daubechies 44 wavelet and then employed the simple K-nearest neighbors classifier to identify the features, which achieved high prediction accuracies. However, despite the theoretical soundness of these methods, the selection of features requires a wealth of prior knowledge and the features vary greatly in practice, which limits their adaptivity. Meanwhile, for some classifiers, parameter selection and optimization on larger sample sets are also challenging tasks, which may increase the complexity of the model.

To improve diagnostic accuracy, sparse representation (SR) theory has been applied to fault diagnosis. SR represents a signal using a dictionary that is either predefined or learned from the data being processed, which offers greater flexibility and adaptivity than traditional orthogonal transforms. More importantly, the discriminative information in the SR vector is convenient for pattern classification. Since its advent, SR has been intensively studied and widely used in signal processing (e.g., image processing [17, 18], compressed sensing [19], and fault diagnosis [20, 21]). The sparse characteristics of dictionary-based SR have also been applied in various classification tasks. In [22], a model named SR-based classification was devised and achieved impressive results for face classification. However, the sparsity of SR classification may cause weak fault features to be lost, which leads to poor classification performance for complicated data with multiple features.

To capture the weak features, collaborative representation (CR) was devised by Zhang et al. to conduct classification more effectively than $\ell_1$-norm sparsity (i.e., SR classification); its representation is less sparse than that of SR classification [23]. Since its introduction, CR has been widely applied, and a great number of CR-based methods have been devised and successfully employed in visual classification [24–28]. Nevertheless, CR-based algorithms for fault classification are few and far between. The success of CR classification lies in the fact that acquired images with natural similarity can be used directly as samples to produce a redundant dictionary [29], and samples belonging to other classes can also be employed to express any test example. The dictionary is designed so that the atoms collaborate to obtain optimal coefficients that yield the minimum residuals for the classification. In the framework of CR classification, dictionary learning is considered more important than similar-feature extraction; therefore, CR can also be viewed as a learning method based on similar features. However, there are no such similarities in the raw vibration signals of rolling bearings, and thus a specific class of signal may not be accurately represented by all the samples contained in the dictionary. Fortunately, some artificial similarity can be generated by using available feature extraction methods such as the fast Fourier transform and the wavelet transform. In this work, the wavelet transform is employed to build similar features, and a feature matrix whose columns are formed from the wavelet coefficients is constructed. Since the features are obtained directly from the signals, the learned dictionary has great flexibility. However, when the training matrix contains many similar samples, the traditional learning method leads to a redundant dictionary with highly correlated atoms, which results in poor signal reconstruction performance, and the ability of CR depends strongly on the capacity of the trained dictionary. Therefore, to improve signal reconstruction, a strategy called dictionary pair learning (DPL), proposed in [30], is used to train a pair of dictionaries to analyze and synthesize the signals.

Inspired by DPL and the traditional CR classification, we propose an improved CR classification method to diagnose bearings in this paper. The main novelty of our work consists of two aspects: (1) the features are extracted directly from the input raw signals using the wavelet transform, and the feature matrix is trained by DPL; and (2) in the fault diagnosis phase, the sparse and projection coefficients are combined to enhance the robustness and accuracy of the CR classification. Since CR represents a query sample using a dictionary that is formed from all of the raw samples without any training, the plain CR-based classification method lacks robustness. Combining DPL and CR improves the performance of the CR classification and further enhances the collaboration between different categories. The proposed method can also handle classification with small samples. With the trained dictionary pair, the case in which a query sample is represented using an incomplete subdictionary can be avoided, which has a positive effect on the results of the fault diagnosis. The complete flow chart of the proposed diagnostic model is shown in Figure 1. Since dictionary learning is time-consuming when the samples are high-dimensional, principal component analysis (PCA) is employed to reduce the dimensionality before dictionary training, which reduces the computational cost while maintaining the main features of the data. In fact, the function of PCA here is mainly to remove the redundant components of the feature vectors.

The rest of this paper is organized as follows. In Section 2, the principles of SR and CR classification are reviewed, and the details of the improved augmented CR classification algorithm based on DPL are described. In Section 3, quantitative evaluations and comparisons are presented on the basis of experimental results from simulated and real vibration signals. Finally, we summarize the paper in Section 4.

2. CR Classification Based on DPL

2.1. Brief Introduction of CR Classification

SR is an effective signal representation method that represents a sample as a linear combination of training samples of the same type. Suppose that all samples in the training set T are arranged to form an initial dictionary $C$, that is,

$$C = [C_1, C_2, \ldots, C_m], \tag{1}$$

where $C_j \in \mathbb{R}^{r \times n}$ comes from the j-th class, which contains n samples of r dimensions; m is the number of classes, and the total number of samples is $mn$. Thus, any test example q in the test set Q that belongs to the j-th class can be faithfully expressed using the samples from the j-th class:

$$q = C_j \alpha_j, \tag{2}$$

where $\alpha_j \in \mathbb{R}^{n}$ is the coefficient vector. However, in most cases, the number of training samples in a dataset T is limited, and representing q using only $C_j$ is not optimal. Thus, it is a wise strategy to represent q using $C_j$ combined with the other classes.

The innovation of CR is that it uses all samples in the dictionary to achieve an optimal representation of q, which yields a minimum reconstruction residual that determines the category to which the query sample belongs. To this end, CR uses the $\ell_2$ norm as the regularization term instead of the $\ell_1$ norm. Thus, the CR problem can be formulated as

$$\hat{\alpha} = \arg\min_{\alpha} \|q - C\alpha\|_2^2 + \rho\|\alpha\|_2^2, \tag{3}$$

where ρ balances the residual and the sparsity of the solution, the coefficient vector $\alpha$ corresponds to the training samples of all categories, and $\|\cdot\|_2$ denotes the $\ell_2$ norm, which introduces some sparsity and yields a stable solution. Since the objective function in (3) is quadratic and differentiable, it has a closed-form solution and the computational complexity is low. The residual generated from the j-th class can then be calculated directly as $e_j = \|q - C_j\hat{\alpha}_j\|_2$, where $\hat{\alpha}_j$ collects the entries of $\hat{\alpha}$ associated with the j-th class, and the class that produces the minimum residual is taken as the category to which q belongs. The CR classification algorithm is summarized in Algorithm 1.
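
For reference, the closed-form solution mentioned above can be obtained by setting the gradient of the objective to zero. The short derivation below is the standard ridge-regression argument and assumes that the CR objective is the $\ell_2$-regularized least-squares problem over the whole dictionary; the symbols $J$, $\hat{\alpha}$, and $I$ (identity matrix) are notation introduced here for illustration.

```latex
\begin{aligned}
J(\alpha) &= \|q - C\alpha\|_2^2 + \rho\|\alpha\|_2^2,\\
\nabla_{\alpha} J &= -2C^{\top}(q - C\alpha) + 2\rho\alpha = 0,\\
\hat{\alpha} &= (C^{\top}C + \rho I)^{-1}C^{\top}q .
\end{aligned}
```

Under this assumption, the matrix $(C^{\top}C + \rho I)^{-1}C^{\top}$ does not depend on q, so it can be computed once and reused for every test sample, which is consistent with the low computational complexity noted above.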

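Algorithm 1: CR classification.
Input: dictionary $C$ and regularization parameter ρ
Main procedure:
(1) Normalize the columns of $C$ to unit $\ell_2$ norm
(2) Encode q on C through $\hat{\alpha} = (C^{\top}C + \rho I)^{-1}C^{\top}q$
(3) Calculate the residuals $e_j = \|q - C_j\hat{\alpha}_j\|_2$, $j = 1, \ldots, m$
(4) Identify the class of q using $\mathrm{TD}(q) = \arg\min_j e_j$, where TD(q) is the label of the identified class to which the sample q belongs.
Output: identified class to which q belongs.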
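
The four steps of the CR classification procedure can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `crc_classify`, the per-atom `labels` array, the default value of `rho`, and the use of an unnormalized residual are all assumptions made for the example.

```python
import numpy as np

def crc_classify(C, labels, q, rho=1e-3):
    """Collaborative representation classification (illustrative sketch).

    C      : (r, N) dictionary whose columns are training samples (atoms)
    labels : length-N array with the class label of each atom (assumed input)
    q      : length-r query sample
    rho    : regularization weight (hypothetical default)
    """
    labels = np.asarray(labels)
    q = np.asarray(q, dtype=float)
    # Step 1: normalize every atom (column) to unit l2 norm.
    C = C / np.linalg.norm(C, axis=0, keepdims=True)
    # Step 2: ridge-regularized coding of q over the whole dictionary.
    gram = C.T @ C + rho * np.eye(C.shape[1])
    alpha = np.linalg.solve(gram, C.T @ q)
    # Step 3: class-wise reconstruction residuals.
    classes = np.unique(labels)
    residuals = np.array([
        np.linalg.norm(q - C[:, labels == c] @ alpha[labels == c])
        for c in classes
    ])
    # Step 4: the class with the smallest residual is the prediction.
    return classes[np.argmin(residuals)], residuals
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly; when many queries share one dictionary, the projection matrix could instead be precomputed once.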

2.2. Dictionary Learning for DPL

2.2.1. Dictionary Learning Model

The collected vibration signals are always contaminated by noise, which is superimposed on every point of the underlying vibration signal. Since the noise is concentrated mainly in the high-frequency band, it can be separated from the main content of the underlying vibration signal after wavelet decomposition, and the low-frequency coefficients of the samples are more similar to one another than the samples in their original forms. In this paper, we construct the feature matrix (subdictionary) using the coefficients of the wavelet transform computed directly from the raw bearing signals. Even though the resulting dictionary C is better than one formed from the original raw samples, it is still a predefined, nonadaptive dictionary. Thus, to obtain better results, we construct a synthesis dictionary $C = [C_1, \ldots, C_m]$ and an analysis dictionary $D = [D_1; \ldots; D_m]$ by employing the DPL strategy, where $C_j$ is the learned subdictionary used to reconstruct the raw signal, and $D_j$ is the learned projection subdictionary used to analyze the raw signals from the j-th class.
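
The construction of the feature matrix from low-frequency wavelet coefficients can be illustrated with a small NumPy sketch. It uses the Haar wavelet adopted in the experiments; the function names, the `levels` parameter, and the padding rule for odd-length inputs are assumptions made for this example, since the decomposition depth is not stated here.

```python
import numpy as np

def haar_approximation(x, levels=1):
    """Low-frequency (approximation) Haar coefficients of a 1-D signal."""
    a = np.asarray(x, dtype=float)
    for _ in range(levels):
        if a.size % 2:
            # Assumed padding rule: repeat the last point of odd-length input.
            a = np.append(a, a[-1])
        # Orthonormal Haar low-pass step: scaled sums of neighbouring pairs.
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a

def build_feature_matrix(samples, levels=1):
    """Stack the Haar approximation of each sample as one column."""
    return np.column_stack([haar_approximation(s, levels) for s in samples])
```

Each decomposition level halves the feature length, so a 2000-point sample yields 1000 coefficients after one level and 500 after two; the high-frequency (detail) coefficients, where most of the noise is expected, are simply discarded in this sketch.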

Under each synthesis subdictionary , the coefficient matrix of corresponding dataset is . Then, the DPL model can be obtained by solving the following optimization problem:where is the complementary matrix of . This dictionary learning model (4) is different from most of the current dictionary learning methods, where the term is the reconstruction error and the other term is a regularizer that is used to penalize the projection coefficients that are not related to .
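
As a point of reference, the baseline DPL objective of [30] can be written in the naming used in this section. This is a sketch under stated assumptions, not the model of this paper: $X_j$ (training features of the j-th class), $\bar{X}_j$ (its complementary matrix), $\lambda$, and $c_i$ (the i-th atom of the synthesis dictionary) are symbols assumed here, and the additional sparsity constraint described in this paper is not included.

```latex
\min_{C,\,D}\; \sum_{j=1}^{m}
  \left\| X_j - C_j D_j X_j \right\|_F^2
  + \lambda \left\| D_j \bar{X}_j \right\|_F^2
\quad \text{s.t.}\quad \| c_i \|_2^2 \le 1 .
```

In this baseline form, the first term is the reconstruction error of the j-th class and the second term penalizes the projection of samples from the other classes onto $D_j$, which matches the roles of the two terms described above.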

2.2.2. Optimization

On the basis of obtaining the features of the raw signals, a dictionary pair C and D can be found by using an appropriate optimization method. Since the expression of the objective function in (4) is nonconvex, we introduce a variable matrix V to obtain a closed-form solution. Thus, the optimization problem in (4) can be reformulated as

Obviously, the converted optimization problem in (5) has a closed-form solution. As with general multivariable optimization problems, it can be resolved iteratively through two alternative optimization steps. At the beginning of iteration, the corresponding parameters that are involved in the objective function need to be initialized. In this paper, the columns of dictionary pair C and D are randomly initialized using the norm. In addition, to reduce the unnecessary time consumption, a stopping criterion is set; that is, when the residual is less than a very small positive value (e.g., ), the iterative operation terminates. When C and D are fixed, the optimization problem in (5) can be simplified to the following form:

Since the objective function in (6) is separable, the variable V can be obtained by solving m suboptimization problems and each has the following expression:

Similarly, when V is fixed, C and D can be obtained as follows:

In addition, the optimal values of C and D can be solved class by class as follows: where γ is a very small positive value. According to the above analysis, both C and D have closed-form expressions that can be computed efficiently in each iteration. Once C and D are obtained, we can use them to collaboratively represent the test samples. To assess the convergence of the DPL method, the relation between the value of the objective function and the iteration number is illustrated in Figure 2.
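
The alternating scheme described above can be sketched as follows. This is an illustrative implementation of the baseline DPL updates from [30], not the exact updates of this paper: the relaxed objective with a coupling weight `tau`, the default values of `tau`, `lam`, and `gamma`, the initialization scaling, and the use of a regularized least-squares step followed by norm clipping for the synthesis dictionary are all assumptions. The sparsity constraint used in this paper is omitted, and at least two classes are required.

```python
import numpy as np

def train_dpl(X_list, d, tau=0.05, lam=3e-3, gamma=1e-4, n_iter=20, seed=0):
    """Alternating closed-form updates for a dictionary pair (baseline sketch).

    X_list : list of (r, n_j) feature matrices, one per class
    d      : number of atoms in each subdictionary
    """
    rng = np.random.default_rng(seed)
    r, m = X_list[0].shape[0], len(X_list)
    I_r, I_d = np.eye(r), np.eye(d)
    # Random initialization (the scaling used here is an assumption).
    C = [rng.standard_normal((r, d)) for _ in range(m)]
    C = [Cj / np.linalg.norm(Cj, axis=0, keepdims=True) for Cj in C]
    D = [rng.standard_normal((d, r)) / np.sqrt(r) for _ in range(m)]
    # This factor depends only on the data, so it is computed once.
    inv_fac = []
    for j in range(m):
        Xbar = np.hstack([X for k, X in enumerate(X_list) if k != j])
        inv_fac.append(np.linalg.inv(
            tau * X_list[j] @ X_list[j].T + lam * Xbar @ Xbar.T + gamma * I_r))
    for _ in range(n_iter):
        for j, Xj in enumerate(X_list):
            # Fix C and D: update the coding matrix V_j.
            Vj = np.linalg.solve(C[j].T @ C[j] + tau * I_d,
                                 tau * D[j] @ Xj + C[j].T @ Xj)
            # Fix V: update the analysis subdictionary D_j.
            D[j] = tau * Vj @ Xj.T @ inv_fac[j]
            # Fix V: regularized least-squares update of C_j, then shrink
            # any atom whose l2 norm exceeds 1 (a simplification).
            Cj = Xj @ Vj.T @ np.linalg.inv(Vj @ Vj.T + gamma * I_d)
            C[j] = Cj / np.maximum(np.linalg.norm(Cj, axis=0, keepdims=True), 1.0)
    return C, D
```

A fixed iteration count is used here for simplicity; a stopping rule based on the change in the objective value could replace it.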

2.3. Proposed Classification Algorithm

A classification algorithm based on collaborative representation using DPL is proposed in this subsection. The analysis dictionary D can produce a coding coefficient vector for the given sample q. In addition, the samples from the j-th category can be reconstructed via the synthesis subdictionary using the coding coefficient vector with a fairly small residual . Now, according to the CR classification, we can discriminate the fault classes by using the minimum residual. Since there are some correlations among the atoms in C, more than one linear combination can be used to represent the corresponding x with the same residual. Thus, to find the appropriate (or correct) linear combination that coincides with the actual situation (i.e., the sample belonging to the j-th class should be represented by using the atoms from the j-th class), we introduce the sparsity of the coding coefficient vector into the CR representation to augment the dense representation to improve the performance of CR-based classification [31].

To find a sparse representation, the orthogonal matching pursuit (OMP) [32] is used to perform the following optimization:

In general, contains large positive coefficients whose indexes are related to the right category, and it has either small positive or negative coefficients for other indices. Thus, if and are properly fused, the probability of finding the correct linear combination will be greatly enhanced. For simplicity, the improved CR coefficient is written as follows:

Obviously, amplifies the coefficients of the correct classes and reduces the relative energy of the unamplified coefficients; thus, can effectively approximate the test sample. Meanwhile, this strategy also enhances the stability of CR classification. Subsequently, we can calculate the residuals corresponding to all the classes as follows:

The test example is classified using $\mathrm{TD}(q) = \arg\min_j e_j$, where $e_j$ is the residual of the j-th class and TD(q) is the label of the identified class to which the sample q belongs. To study the adaptiveness of the dictionaries learned from DPL to the test samples, the relation between training accuracy and each iteration of DPL training is shown in Figure 3. It can be seen that the accuracy rate gradually rises as the number of iterations increases.
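
The classification step of this subsection can be sketched as follows. The fusion rule used here, a normalized sum of the sparse OMP code and the projection code, is only one plausible choice in the spirit of the sparsity-augmented approach cited above and is an assumption, as are the function names, the per-atom `labels` array, and the sparsity level `k`. `C` is the concatenated synthesis dictionary and `D` the stacked analysis dictionary.

```python
import numpy as np

def omp(C, q, k):
    """Greedy orthogonal matching pursuit with at most k selected atoms."""
    q = np.asarray(q, dtype=float)
    residual, support, coef = q.copy(), [], None
    alpha = np.zeros(C.shape[1])
    for _ in range(k):
        idx = int(np.argmax(np.abs(C.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Least-squares refit on all atoms selected so far.
        coef, *_ = np.linalg.lstsq(C[:, support], q, rcond=None)
        residual = q - C[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    if support:
        alpha[support] = coef
    return alpha

def classify_fused(C, D, labels, q, k=5):
    """Classify q from class-wise residuals of a fused coefficient vector."""
    labels = np.asarray(labels)
    q = np.asarray(q, dtype=float)
    sparse = omp(C, q, k)   # sparse code found by OMP
    dense = D @ q           # projection code from the analysis dictionary
    fused = sparse + dense  # assumed fusion rule: normalized sum
    norm = np.linalg.norm(fused)
    if norm > 0:
        fused = fused / norm
    classes = np.unique(labels)
    residuals = np.array([
        np.linalg.norm(q - C[:, labels == c] @ fused[labels == c])
        for c in classes
    ])
    return classes[np.argmin(residuals)], residuals
```

Because the sparse code is nonzero only on a few atoms, adding it to the dense projection code raises the relative weight of those atoms, which is the amplification effect described above.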

2.4. Computational Complexity

The complexity of the proposed algorithm is investigated in this subsection. Since the final classification step is a simple computation of the residuals, the vast majority of the computational complexity of the proposed method comes from the training process of a pair of dictionaries. Recall that C, D, and V have closed-form solutions, and the computational costs of updating , , and are , , and (where r denotes the dimension of the features, m is the number of samples in each class, and d is the number of atoms in each subdictionary), respectively, in each iteration. Among , , and , the largest computational cost is the update of because the calculation of an matrix is involved in the training of and the number of subdictionary atoms and training samples in each class are smaller than r. Fortunately, the factor in can be precomputed before the training process since it only relates to and the other predetermined parameters, which can greatly reduce the training time. In summary, the complexity of the proposed algorithm is .

3. Experiments

In this section, one set of simulated vibration signals, three sets of laboratory bench signals, and one set of real-operating signals were constructed to evaluate the performance of the proposed algorithm. To exhibit the advantages of the proposed algorithm, four other comparison classification methods which include the SVM [13], k nearest neighbor (KNN) [33], decision tree (DT) [34], and back-propagation neural network (BPNN) [35] are also employed to conduct the experiments. The wavelet used in our method is the Haar wavelet, and the reason is that (1) the Haar wavelet has the shortest filter length which can avoid excessive smoothing of the low-frequency component of signal and effectively ensure that the main features are not lost; and (2) Haar wavelet transform has the character of easy calculation, which ensures that the construction of feature matrix using wavelet coefficients is very efficient. Since the parameters have important impacts on the performance of a given method, the parameters involved in the proposed algorithm were optimized by hand to obtain the best results in the experiments; that is, the corresponding values were set as , , , , and , respectively. Moreover, 70 atoms were chosen in each subdictionary, which are randomly selected from all the samples in each class. For the four other machine learning algorithms that were used for the comparison, the optimal values of the free parameters were taken from the reference papers.

To evaluate the performance of the proposed method, the accuracy (Acc) is adopted as the indicator in this paper, which is defined as

$$\mathrm{Acc} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}(\hat{y}_i = y_i),$$

where N denotes the number of test samples, $\mathbb{1}(\cdot)$ represents the indicator function, and $\hat{y}_i$ and $y_i$ represent the predicted label and the true label of the i-th test sample, respectively. In general, the Acc is calculated by using a confusion matrix [36].
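
The relation between the accuracy and the confusion matrix can be made concrete with a short sketch. The function names and the convention that labels are integers starting at 0 are assumptions for this example.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with true classes on rows and predicted classes on columns."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    """Fraction of correctly classified samples: diagonal sum over total."""
    return np.trace(cm) / cm.sum()
```

The diagonal of the matrix counts the test samples whose predicted label equals the true label, so dividing its sum by the total number of test samples gives the Acc defined above.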

3.1. Experiment Using Simulated Data

In this subsection, a set of simulated signals are generated to test the proposed method. In general, when a rotating bearing has a fault in some key components, the vibration of the bearing will be intensified and impact the performance, which results in the instantaneous high-frequency resonance of the bearing system. Therefore, the vibration signals of some failures are usually characterized using impulsive series. According to this, we generate the simulated signals using the following model [37]:where is an impulsive component, is the resonance component, is the vibration component that is produced by the mechanical equipment, and denotes the additive Gaussian noise. In this experiment, six different frequencies (corresponding to one normal state and five fault states) were used to generate six fault vibration signals which are shown in Figure 4. The vibration signals are formulated as follows:where n denotes the damping factor contributing to the signal that affects the pulse duration, and denote carrier centers and characteristic frequencies, respectively, and is the duration of the simulated signal. The sampling frequency was set to 12 kHz, and points were collected for the simulated signal under the condition that the value of the damping factor is 500. Considering the stability of the classification and the simplicity of the training, 200 samples (each sample contains 2000 points) were generated from the simulated signal for each class, in which 100 samples were randomly selected to train the subdictionary and the other 100 were used as test samples. After the PCA, the dimension of the features was reduced to 300. In addition, under the trained dictionary pair, the reconstruction precisions of each test sample are assessed using all subdictionaries and in the experiment, and the corresponding errors are shown in Figure 5. 
It can be clearly seen that when a test sample belongs to a certain class, it produces the smallest error for that corresponding class because almost all of the energy of the given test example is projected in its corresponding analysis subdictionary , which ensures that the reconstructed sample using the synthesis subdictionary closely approximates the original sample.
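
The generation and segmentation of a simulated fault signal can be sketched as follows. This is a simplified illustration, not the model of [37]: it assumes that each impact excites an exponentially damped sinusoid, includes only this impulsive response plus Gaussian noise, and omits the machine vibration component. The sampling frequency (12 kHz), damping factor (500), and sample length (2000 points) follow the text; the characteristic frequency, carrier frequency, noise level, and function names are hypothetical.

```python
import numpy as np

def simulate_fault_signal(n_points, fs=12_000, f_char=100.0, f_carrier=3_000.0,
                          damping=500.0, noise_std=0.1, seed=0):
    """Train of damped sinusoidal impulses plus additive Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_points) / fs
    # Time elapsed since the most recent impact (one impact every 1/f_char s).
    t_local = np.mod(t, 1.0 / f_char)
    impulses = np.exp(-damping * t_local) * np.sin(2 * np.pi * f_carrier * t_local)
    return impulses + noise_std * rng.standard_normal(n_points)

def segment(signal, length=2000):
    """Cut a long signal into non-overlapping samples of equal length."""
    n = signal.size // length
    return signal[: n * length].reshape(n, length)
```

Changing `f_char` from class to class gives signals whose impact spacing differs, which is the kind of difference the six simulated states are meant to capture.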

Moreover, DPL contributes greatly to the accuracy of the CR-based classification methods. A comparative experiment was conducted to verify this point, and the results are reported in Figure 6. From the figure, one can clearly see that the accuracy obtained by the proposed algorithm is 100% in all cases, while the result using the traditional CR classification without the DPL strategy is only approximately 91% and is also unstable. It should also be noted that the sparsity constraint involved in DPL makes a certain contribution to the classification accuracy.

By applying the four comparison methods to the simulated samples, we further investigate the performance of the proposed method. The results obtained by our algorithm and the comparison methods are listed in Table 1. From the table, it can be seen that the classification accuracy obtained by our algorithm is still 100% in all cases, while the results produced by the other four methods are all lower than ours. This result indicates that the bearing states can be precisely classified using our algorithm. The reason is that the features are extracted directly from the raw data without manual intervention, and the method adapts to the input data. Table 1 also shows that the DT algorithm performs worst among the tested methods. This is because the features in DT are extracted by finding the smallest error after a manual operation. This minimum value is not accurate enough, and it lowers the accuracy of DT, especially when there is more than one class. The classification accuracies of the other three methods lie between those of the DT method and the proposed algorithm. The SVM achieves the second-best results for all the fault states. The main reason is that only 100 samples are used for each class, which is a small-sample problem for which the SVM is particularly well suited. However, the problem encountered in this paper is not just a small-sample problem but also a multiclass problem; thus, the SVM only outperforms the KNN, BPNN, and DT. Meanwhile, skewed datasets that arise in the multiclass problem also degrade the performance of the SVM. Moreover, the confusion matrix used to compute the Acc for the proposed method is shown in Figure 7. The elements on the main diagonal represent the rate of correct classification for each state, while the off-diagonal elements denote the rates of misclassification. Figure 7 shows that the values on the main diagonal are all equal to 1, which means that all the test samples are correctly classified by our method.

Finally, to compare the computational burden of our method with that of the other four methods, the running times are assessed using Matlab (R2018a, 64 bit) on a personal computer with an i5-7500 CPU (3.40 GHz) and 8.00 GB of memory. The running times consumed by the SVM, KNN, DT, BPNN, and our method are 74.73 s, 126.03 s, 4.07 s, 80.86 s, and 3.85 s, respectively. The proposed method is therefore the fastest: it is marginally faster than the DT and roughly 19 to 33 times faster than the SVM, BPNN, and KNN. This is because (1) the feature extraction step is fairly simple, which dramatically decreases the computational time, and (2) the proposed method does not need to estimate many parameters, which also saves a large portion of the execution time.

3.2. Experiment on Real Data

In this subsection, two types of real data are employed to further evaluate the performance of the proposed method. One consists of vibration signals generated from bearings with predefined faults, and the other consists of signals generated from bearings without predefined faults. The parameters involved in the tested methods are the same as those used in the above simulation experiment.

3.2.1. Real Data from Bearings with Predetermined Faults

The experimental data of rolling bearings are provided by Case Western Reserve University [38]. First, we briefly introduce the equipment (Figure 8) that is used to generate the real vibration signals. The test bearings (a 6203 fan end bearing (FE) and a 6205 drive end bearing (DE)) with single faults were tested under different loads (0–3 horsepower). The single faults with 4 different defect diameters (0.028, 0.021, 0.014, and 0.007 inches) are introduced to the rolling bearings using electro-discharge machining. Two sampling frequencies (12 kHz and 48 kHz) were used to collect the vibration signals. In this experiment, we selected twelve running states of rolling bearings with different types of faults and fault severities. To visualize the real vibration signals, twelve raw signals (DE with different faults) with a sampling frequency of 12 kHz are illustrated in Figure 9. Based on the raw fault signals, we generated 200 samples (each sample contains 2000 points) for each class, and the detailed information on the generation of the raw samples is reported in Table 2.

Once the samples were acquired, we randomly and equally divided them into two sets. One set (100 samples) is used to train the dictionaries, and the other (100 samples) is used to test the proposed model. To obtain reliable results, this experiment was repeated 30 times, and the averaged results are reported as the final output. The Accs obtained by our method and the other comparison methods are listed in Table 3, and the corresponding confusion matrix for our method is shown in Figure 10. It can be clearly seen that the classification accuracy obtained by our method is still 100%, which indicates that the proposed method is effective and stable. Table 3 also shows a sharp distinction between the proposed method and the DT; namely, our method obtains the best result, and the DT obtains the lowest accuracy in all fault cases. Overall, the results obtained in this experiment accord with those of the simulated experiment; that is, our method is also effective for real vibration signals and outperforms the other classification methods.

3.2.2. Real Data from Bearings without Predetermined Faults

To assess the performance of our method for bearings in real operating situations, the vibration signals from the accelerated degradation of bearings (without predetermined faults) were analyzed in this subsection. The data are provided by the Changxing Sumyoung Technology and the Xi’an Jiaotong University (https://www.mediafire.com/folder/m3sij67rizpb4/XJTU-SY_Bearing_Datasets). The vibration signals were obtained by running a bearing from a normal state until it reached various faults under different conditions, which reflects the real operating situations of bearings. Four fault conditions were adopted in this experiment: inner-race fault, cage fault, ball fault, and outer-race fault. For simplicity, we labeled them as 1, 2, 3, and 4, respectively, and the parameters of the bearings are listed in Table 4. In addition, 200 samples (each sample contains 2000 points) were generated for each fault class, of which 100 samples were randomly selected to train the subdictionary, and the other 100 were used as test samples.

The accuracies that were obtained by all the fault diagnosis methods are reported in Table 5, and the confusion matrix for the proposed method is illustrated in Figure 11. As presented in the previous experiments, our proposed method acquires a 100% recognition rate in all fault states, the DT has the worst performance, and the Accs that are obtained by the rest are between the proposed method and the DT algorithm. These results agree with the results of the simulated experiment. The results also indicate that our method can identify the fault states of the bearings operating in real situations.

4. Conclusion

In this paper, a classification method using CR based on DPL has been proposed to detect and classify the states of bearings. The Haar wavelet is employed to decompose the raw samples, and the resulting coefficients form a feature matrix that is used to train the dictionary pair. Two operations are adopted to improve the accuracy of classification: one is the introduction of a sparsity constraint into the dictionary training of DPL, and the other is the fusion of sparse coefficients and projection coefficients. The results obtained on the simulated and real vibration signals indicate that the proposed method classifies the bearing states accurately in all cases and that its recognition performance is very stable. However, since the OMP algorithm is used to find the sparse vector, the computational burden of our method is increased. Moreover, the size of the training samples used in the proposed method is relatively small. In fact, long samples are common in many practical situations, and a DPL-based fault diagnosis algorithm that handles large samples will be studied in the future.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was funded by the Key Science Program of Anhui Education Department (KJ2018A0012) and the Research Fund for Doctor of Anhui University (J01003266) and was also supported by the National Natural Science Foundation of China (NSFC) (61402004 and 61601003).