Abstract

Axle box bearings are among the most critical mechanical components of railway vehicles, and condition monitoring helps ensure that they remain healthy in service. This paper proposes a novel fault diagnosis model for axle box bearings that uses vibration signals and is based on symmetric alpha-stable distribution feature extraction and least squares support vector machines (LS-SVM). The model is built in three main steps. First, fast nonlocal means is used for denoising and ensemble empirical mode decomposition is applied to extract fault feature information. Second, a new statistical method of feature extraction, the symmetric alpha-stable distribution, is employed to obtain representative features from the intrinsic mode functions. Third, the hybrid fault feature sets are input into LS-SVM to identify the fault type. To enhance the performance of LS-SVM on small-scale samples, the Morlet wavelet kernel function is combined with LS-SVM (LS-WSVM) to classify fault type and fault severity, and particle swarm optimization is used to optimize the LS-WSVM parameters. The experimental results demonstrate that the proposed approach is more effective and robust than the compared methods on small-scale samples for fault detection and classification of railway vehicle bearings.

1. Introduction

Rolling element bearings are widely used in industrial applications, and axle box bearings are among the critical mechanical components of railway vehicles. Frequent failures of train bearings, including pitting, stripping, wear, cracking, and abrasion, have a great influence on traffic safety. Therefore, effective identification of bearing health status is indispensable for monitoring the working condition of axle box bearings and for train maintenance [1, 2]. Currently, vibration analysis and acoustic analysis are the two main approaches to defect detection [3, 4]. Vibration-based diagnosis has become the most common monitoring technique because of its higher reliability.

In the process of fault diagnosis, extracting defect features from noisy vibration signals remains a great challenge. Many sources of signal contamination, including additive noise and the signals from shafts, gearboxes, and other mechanical components of railway vehicles, overlap the signals of interest in both time and frequency. Thus, it is vital to develop signal denoising methods that remove the noise and expose the fault characteristics, and many algorithms have been developed for vibration signal denoising. In recent years, methods based on discrete wavelet transform (DWT) coefficient shrinkage [5–7], empirical mode decomposition (EMD) [8–13], and nonlocal means (NLM) [14, 15] have been introduced as three popular approaches to rotating machinery fault diagnosis. DWT analyzes signals on multiple scales and denoises them by discarding the lower-magnitude coefficients; its performance relies on the selection of the wavelet basis function. In the EMD method, the clean vibration signal is obtained by discarding the first few intrinsic mode functions (IMFs). Mode mixing [16], which results from signal intermittence, is the main disadvantage of EMD. To overcome this obstacle, the ensemble EMD (EEMD) was proposed by Wu and Huang [17]. The fast NLM (FNLM) approach is a very successful image denoising method [14] that has also been applied to rotating machinery fault diagnosis. In this study, the FNLM approach and the EEMD approach are combined to denoise the raw vibration signal.

After the denoising step, the feature parameters that correctly represent the health status of the axle box bearing should be extracted. According to previous studies, fault features such as permutation entropy [18, 19], subband energy [20], and statistical features (variance, kurtosis) [13] can be extracted in the time domain, the frequency domain, and the time-frequency domain. However, these methods are not stable for complex signals. Because many non-Gaussian signals in engineering are impulsive and heavy-tailed, the alpha-stable distribution has been widely applied in various fields [21–23]. Three estimation methods for the alpha-stable distribution were compared theoretically in [21], and the results showed that the empirical characteristic function (ECF) method ranked first in stability and estimation accuracy. The kurtogram and the stable parameter α were proposed to detect incipient bearing faults in [22]. In [23], optimal parameters were selected for bearing fault diagnosis after an analysis of the stability and sensitivity of the parameters.

However, the symmetry parameter β and the location parameter μ are estimated from the estimated values of α and γ, so the estimation errors of α and γ propagate and accumulate in β and μ. Meanwhile, because the characteristic function of the α-stable distribution is discontinuous at α = 1, the estimation error is particularly serious in β and μ. Furthermore, the geometry of the bearing structure is symmetric, so the symmetric α-stable (SαS) distribution is a more accurate statistical model for describing the bearing signals. Therefore, to enhance the computational efficiency and recognition accuracy of rolling bearing diagnosis, this paper extracts fault features using the symmetric α-stable distribution.

After feature extraction and selection, early faults of the axle box bearing should be detected by classifying the selected fault characteristics. Based on statistical learning theory, the support vector machine (SVM) has recently been widely used in pattern classification and fault diagnosis of rotating machinery because of its high classification accuracy [18, 19, 24–27]. Owing to its low complexity and improved computational efficiency, the LS-SVM performs better in applications. The kernel function of the LS-SVM is critical for a good classification result. Several kinds of kernel function, including the polynomial kernel, the Gaussian kernel, and the sigmoid kernel, are applicable. To obtain better performance, a wavelet SVM (WSVM) that combines the Morlet wavelet kernel with the SVM is proposed here. Compared with the RBF kernel, the Morlet wavelet kernel yields a more reasonable hyperplane. Thus, this article employs an LS-SVM with a wavelet kernel function and parameters optimized by particle swarm optimization (PSO) to enhance the accuracy of fault diagnosis.

The remainder of this paper is organized as follows. We briefly describe the FNLM and EEMD denoising methods in Section 2. The introduction of the feature extraction based on symmetric alpha-stable distribution is presented in Section 3. Section 4 describes the proposed PSO-LSWSVM method. In Section 5 the proposed approach is validated by experimental data. The conclusion is drawn in Section 6.

2. Denoising and Feature Extraction

2.1. Fast Nonlocal Means Algorithm

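With the additive noise model, the noisy signal can be expressed as $y = u + n$, where $u$ is the true signal and $n$ is additive noise. For a given sample $s$, the estimate $\hat{u}(s)$ of the signal is a weighted sum of the values within its neighbourhood $N(s)$:

$$\hat{u}(s) = \frac{1}{Z(s)} \sum_{t \in N(s)} w(s,t)\, y(t), \tag{1}$$

where $Z(s) = \sum_{t} w(s,t)$, and the weights are [14]

$$w(s,t) = \exp\left(-\frac{\sum_{\delta \in \Delta} \left(y(s+\delta) - y(t+\delta)\right)^2}{2 L_\Delta \lambda^2}\right) \equiv \exp\left(-\frac{d^2(s,t)}{2 L_\Delta \lambda^2}\right). \tag{2}$$

The similarity is measured via the Euclidean distance between patches. The weight $w(s,t)$ takes a large value if the patch centred on $s$ is similar to the patch centred on $t$, and a small value otherwise. In (2), $\lambda$ is a bandwidth parameter, while $\Delta$ stands for a local patch of samples surrounding $s$, with $L_\Delta$ samples included. To reduce the computing time, the fast NLM has been proposed. For a signal of length $N$, given a translation vector $d_x$,

$$S_{d_x}(p) = \sum_{k=0}^{p} \left(y(k) - y(k+d_x)\right)^2, \quad p = 0, 1, \ldots, N-1, \tag{3}$$

corresponds to the discrete integration of the squared difference of the signal $y$ and its translation by $d_x$. Now let $t = s + d_x$ and define $\hat{p} = s + \delta$; with the patch $\Delta = \{-P, \ldots, P\}$, the patch size is $L_\Delta = 2P + 1$. Thus, $d^2(s,t)$ can be rewritten as follows:

$$d^2(s,t) = \sum_{\hat{p}=s-P}^{s+P} \left(y(\hat{p}) - y(\hat{p}+d_x)\right)^2. \tag{4}$$

Splitting the sum and using the identity in (3), we obtain

$$d^2(s,t) = S_{d_x}(s+P) - S_{d_x}(s-P-1). \tag{5}$$

This is the key expression: it computes the weight for a pair of samples in constant time, independent of the patch size.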
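The constant-time patch distance can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fast_nlm_1d`, the parameter names `half_patch`, `half_search`, and `lam`, their default values, and the edge-replication boundary handling are all choices made here for the example.

```python
import math


def fast_nlm_1d(y, half_patch=2, half_search=5, lam=0.5):
    """Denoise a 1-D sequence with nonlocal means using prefix sums.

    half_patch (P), half_search (search half-width) and lam (bandwidth) are
    illustrative names and defaults, not values taken from the paper.
    """
    n = len(y)
    pad = half_patch + half_search
    # Assumption: replicate the edge samples so every patch stays in range.
    yp = [y[0]] * pad + list(y) + [y[-1]] * pad
    patch_len = 2 * half_patch + 1
    num = [0.0] * n
    den = [0.0] * n
    for dx in range(-half_search, half_search + 1):
        # prefix[i] is the running sum of squared differences between the
        # padded signal and its translation by dx (the role of S_dx).
        prefix = [0.0]
        for q in range(half_search, len(yp) - half_search):
            diff = yp[q] - yp[q + dx]
            prefix.append(prefix[-1] + diff * diff)
        for s in range(n):
            # Patch distance as a difference of two prefix sums: O(1) per pair.
            d2 = prefix[s + patch_len] - prefix[s]
            w = math.exp(-d2 / (2.0 * patch_len * lam * lam))
            num[s] += w * yp[s + pad + dx]
            den[s] += w
    return [num[s] / den[s] for s in range(n)]
```

The prefix sums are rebuilt once per translation, so the cost per sample no longer grows with the patch size; only the number of translations in the search window matters.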

2.2. Ensemble Empirical Mode Decomposition

As an improved version of EMD, EEMD can decrease the mode mixing effect. The algorithm can be given as follows [17].

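(1) Add white noise with the given amplitude to the original signal $x(t)$ to generate a new signal:

$$x_i(t) = x(t) + n_i(t), \tag{6}$$

where $x_i(t)$ represents the noise-added signal of the $i$th trial and $n_i(t)$ is the white noise added in that trial, while $i = 1, 2, \ldots, M$.

(2) With the EMD algorithm, the signal $x_i(t)$ is decomposed into IMFs:

$$x_i(t) = \sum_{j=1}^{J} c_{i,j}(t) + r_i(t), \tag{7}$$

where $c_{i,j}(t)$ represents the $j$th IMF of the $i$th trial, $r_i(t)$ is the final residue, and $J$ is the number of IMFs.

(3) Repeat steps (1) and (2) while $i < M$, with a different white noise series each time, to acquire an ensemble of IMFs.

(4) Calculate the ensemble mean of the corresponding IMFs of the decompositions; the final result is

$$c_j(t) = \frac{1}{M} \sum_{i=1}^{M} c_{i,j}(t), \tag{8}$$

where $c_j(t)$ is the $j$th IMF decomposed by EEMD, while $i = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, J$.

The IMFs cover different frequency bands ranging from high to low. In this study, the first five IMFs are chosen for analysis.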
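The ensemble-averaging loop of EEMD can be sketched as follows. This is only an illustration of the noise-adding and averaging steps: the decomposer is passed in as a function, and the `placeholder_decompose` routine used here is a hypothetical moving-average stand-in, not the EMD sifting procedure. The names `ensemble_mean_modes`, `n_trials`, and `noise_std` are likewise assumptions made for the example. A real EMD may return a different number of IMFs in each trial, which this sketch does not handle.

```python
import random


def moving_average(x, width):
    """Centered moving average with shrinking windows at the edges."""
    half = width // 2
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def placeholder_decompose(x, n_modes=2):
    """Hypothetical stand-in for EMD (NOT real sifting).

    Peels off fast detail with moving averages and returns (modes, residue),
    ordered from high to low frequency like IMFs.
    """
    modes = []
    residue = list(x)
    width = 3
    for _ in range(n_modes):
        smooth = moving_average(residue, width)
        modes.append([r - s for r, s in zip(residue, smooth)])
        residue = smooth
        width = 2 * width + 1
    return modes, residue


def ensemble_mean_modes(x, decompose, n_trials=50, noise_std=0.2, seed=0):
    """Add white noise, decompose each noisy copy, and average the modes."""
    rng = random.Random(seed)
    mean_modes = None
    for _ in range(n_trials):
        noisy = [v + rng.gauss(0.0, noise_std) for v in x]
        modes, _ = decompose(noisy)
        if mean_modes is None:
            mean_modes = [[0.0] * len(x) for _ in modes]
        for j, mode in enumerate(modes):
            for t, v in enumerate(mode):
                mean_modes[j][t] += v / n_trials
    return mean_modes
```

Swapping `placeholder_decompose` for an actual EMD routine with the same return shape would give the full procedure; the averaging is what cancels the added noise across trials.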

3. Symmetric Alpha-Stable Distribution

3.1. Alpha-Stable Distribution

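Alpha-stable distributions provide useful models for non-Gaussian signals with impulsive waveforms and heavy-tailed probability densities. Although the probability density function of an alpha-stable random variable cannot in general be given in closed form, its characteristic function can always be written as

$$\varphi(t) = \exp\left\{ j\mu t - \gamma |t|^{\alpha} \left[ 1 + j\beta\, \mathrm{sign}(t)\, \omega(t,\alpha) \right] \right\}, \tag{9}$$

where

$$\omega(t,\alpha) = \begin{cases} \tan\dfrac{\alpha\pi}{2}, & \alpha \neq 1, \\ \dfrac{2}{\pi}\log|t|, & \alpha = 1. \end{cases} \tag{10}$$

Thus, the characteristic function defines a four-parameter family of distributions, denoted by $S(\alpha, \beta, \gamma, \mu)$. The first parameter $\alpha \in (0, 2]$ is the characteristic exponent, which describes the tail of the density function. The second parameter $\beta \in [-1, 1]$ is the symmetry parameter, which controls the skewness. The parameters $\gamma > 0$ and $\mu$ are the scale parameter and the location parameter, respectively.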
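To make the role of the four parameters concrete, the sketch below evaluates an alpha-stable characteristic function at a single point. It assumes the common signal-processing parameterization exp{jμt − γ|t|^α [1 + jβ sign(t) ω(t, α)]}, with ω = tan(απ/2) for α ≠ 1 and ω = (2/π) log|t| for α = 1; other references use different sign and scale conventions, so the form and the function name `stable_cf` should be read as assumptions.

```python
import cmath
import math


def stable_cf(t, alpha, beta, gamma, mu):
    """Alpha-stable characteristic function at t (assumed parameterization).

    phi(t) = exp{ j*mu*t - gamma*|t|^alpha * [1 + j*beta*sign(t)*omega] }
    """
    if t == 0:
        return complex(1.0, 0.0)
    sign = 1.0 if t > 0 else -1.0
    if alpha == 1.0:
        omega = (2.0 / math.pi) * math.log(abs(t))
    else:
        omega = math.tan(alpha * math.pi / 2.0)
    exponent = complex(0.0, mu * t) - gamma * abs(t) ** alpha * complex(1.0, beta * sign * omega)
    return cmath.exp(exponent)
```

With β = 0 and μ = 0 the function is real-valued; α = 2 then reduces to the Gaussian form exp(−γt²) and α = 1 to the Cauchy form exp(−γ|t|).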

3.2. Symmetric Alpha-Stable Distribution

In the case of $\beta = 0$, the distribution is symmetric about $\mu$ and is called symmetric alpha-stable ($S\alpha S$); its characteristic function is

$$\varphi(t) = \exp\left( j\mu t - \gamma |t|^{\alpha} \right). \tag{11}$$

Furthermore, because of its large estimation error, the location parameter $\mu$ cannot describe the health condition of bearings. Thus, $\mu$ is set to zero to improve processing speed, and the characteristic function can be rewritten as

$$\varphi(t) = \exp\left( -\gamma |t|^{\alpha} \right). \tag{12}$$

3.3. Empirical Characteristic Function Parameter Estimation Method

In engineering practice, real-time estimation of the parameters of an alpha-stable distribution from a random sequence is crucial. Three major methods are used in the literature to obtain the parameter values: the quantile method, the logarithmic moment method, and the empirical characteristic function (ECF) method. According to the comparative analysis in [21], the ECF approach has the highest estimation accuracy and the best stability for the four parameters of the alpha-stable distribution. The parameter estimation process based on ECF is described as follows [28]:

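The sample characteristic function is calculated as follows:

$$\hat{\varphi}(t) = \frac{1}{N} \sum_{n=1}^{N} \exp(j t x_n), \tag{13}$$

where $x_n$, $n = 1, 2, \ldots, N$, are the samples of the random variable.

Equation (14) follows directly from (12):

$$\log\left(-\log|\varphi(t)|^2\right) = \log(2\gamma) + \alpha \log|t|. \tag{14}$$

The characteristic exponent $\alpha$ and the scale parameter $\gamma$ are obtained by linear regression:

$$y_k = m + \alpha w_k + \varepsilon_k, \quad k = 1, 2, \ldots, K, \tag{15}$$

where $y_k = \log\left(-\log|\hat{\varphi}(t_k)|^2\right)$, $m = \log(2\gamma)$, $w_k = \log|t_k|$, $\varepsilon_k$ is the random error, $t_k = \pi k/25$, and the parameter $K$ is taken from the table in [28].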
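The regression step can be sketched in a few lines of Python. The sketch assumes the symmetric characteristic function exp(−γ|t|^α), so that log(−log|φ(t)|²) = log(2γ) + α log|t|, and it uses the regression points t_k = πk/25. The function name `ecf_estimate` and the default K = 10 are assumptions for illustration; in [28] the value of K is read from a table and the data are standardized first, which is omitted here.

```python
import cmath
import math


def ecf_estimate(samples, K=10):
    """Estimate (alpha, gamma) of a symmetric alpha-stable sample by ECF regression.

    Assumes phi(t) = exp(-gamma * |t|**alpha); K = 10 is an illustrative default.
    """
    n = len(samples)
    ws, ys = [], []
    for k in range(1, K + 1):
        t = math.pi * k / 25.0
        # Sample characteristic function at t.
        phi = sum(cmath.exp(1j * t * x) for x in samples) / n
        ys.append(math.log(-math.log(abs(phi) ** 2)))
        ws.append(math.log(t))
    # Ordinary least squares fit of y = m + alpha * w.
    w_mean = sum(ws) / K
    y_mean = sum(ys) / K
    sxy = sum((w - w_mean) * (y - y_mean) for w, y in zip(ws, ys))
    sxx = sum((w - w_mean) ** 2 for w in ws)
    alpha = sxy / sxx
    m = y_mean - alpha * w_mean
    gamma = math.exp(m) / 2.0
    return alpha, gamma
```

As a sanity check, zero-mean Gaussian data with unit variance correspond to α = 2 and γ = 1/2 in this parameterization, so the estimates should land near those values for a few thousand samples.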

The $S\alpha S$ distribution densities for different values of $\alpha$ and $\gamma$ are shown in Figure 1. For a faulty bearing signal, the defect characteristic parameters that represent the health status of the bearing, namely the characteristic exponent $\alpha$, the scale parameter $\gamma$, and the maximum PDF (MPDF) value, can be obtained with the $S\alpha S$ distribution method.
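One way to obtain a maximum PDF value from α and γ is sketched below. It rests on two assumptions that the text does not confirm: that the characteristic function is exp(−γ|t|^α) with zero location, and that the MPDF is the density at the mode x = 0. Under those assumptions, Fourier inversion gives f(0) = (1/π)∫₀^∞ exp(−γt^α) dt = Γ(1/α) / (π α γ^(1/α)). The function names are hypothetical, and the numerical cross-check is only meant for moderately large α, where the integrand decays quickly.

```python
import math


def sas_peak_density(alpha, gamma):
    """Density at x = 0 of a symmetric alpha-stable law (closed form)."""
    return math.gamma(1.0 / alpha) / (math.pi * alpha * gamma ** (1.0 / alpha))


def sas_peak_density_numeric(alpha, gamma, t_max=60.0, steps=60000):
    """Trapezoidal cross-check of (1/pi) * integral_0^inf exp(-gamma * t**alpha) dt."""
    h = t_max / steps
    total = 0.5 * (1.0 + math.exp(-gamma * t_max ** alpha))
    for i in range(1, steps):
        total += math.exp(-gamma * (i * h) ** alpha)
    return total * h / math.pi
```

For α = 2 and γ = σ²/2 the closed form reduces to the Gaussian peak 1/(σ√(2π)), and for α = 1 it reduces to the Cauchy peak 1/(πγ). A larger γ spreads the density and lowers the peak, which is why the peak value can carry information that α alone does not.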

4. Bearing Defect Diagnosis Methodology Based on PSO-LS-WSVM

4.1. Morlet Wavelet Kernel-Based SVM

Support vector machines (SVM) have been shown to be effective for many classification problems. The SVM, a supervised machine learning technique based on statistical learning theory, is designed to find optimal separating hyperplanes for the input training data in a high dimensional feature space. The testing data can then be classified with these hyperplanes. To reduce computing time and enhance recognition accuracy, the least squares SVM (LS-SVM) was proposed. Kernel mapping is applied to map the data from the input space to a high dimensional feature space in which the problem is linearly separable. Therefore, the kernel function is a critical factor for classification accuracy. Several types of kernel function, including the sigmoid kernel, the Gaussian (RBF) kernel, and the polynomial kernel, are used in many applications; the Gaussian kernel in particular has been widely used because of its excellent performance. In recent years, it has been shown that the wavelet kernel, as a type of multidimensional wavelet function, can approximate arbitrary nonlinear functions, and Zhang et al. reported that the wavelet kernel performs better than the Gaussian kernel [26].
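The computational appeal of LS-SVM is that training reduces to one linear system instead of a quadratic program. The sketch below uses the standard binary LS-SVM classifier formulation: solve [[0, yᵀ], [y, Ω + I/c]] [b; α] = [0; 1] with Ω_ij = y_i y_j K(x_i, x_j). It is an illustration, not the paper's seven-class classifier: the function names, the regularization name `reg`, the RBF kernel, and the toy data in the usage note are assumptions.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel; any other kernel with the same signature works."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def lssvm_train(X, y, kernel, reg=10.0):
    """Train a binary LS-SVM (labels +1/-1); returns bias b and multipliers."""
    n = len(X)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        A[0][i + 1] = y[i]
        A[i + 1][0] = y[i]
        for j in range(n):
            A[i + 1][j + 1] = y[i] * y[j] * kernel(X[i], X[j])
        A[i + 1][i + 1] += 1.0 / reg
    sol = solve_linear(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]


def lssvm_predict(x, X, y, b, alphas, kernel):
    """Sign of the LS-SVM decision function at x."""
    score = sum(a * yi * kernel(x, xi) for a, yi, xi in zip(alphas, y, X)) + b
    return 1 if score >= 0 else -1
```

For example, training on two well-separated toy clusters labelled +1 and −1 and then calling `lssvm_predict` on a point near either cluster returns that cluster's label. A multiclass problem would need a scheme such as one-against-one or one-against-all on top of this binary building block.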

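The idea of wavelet analysis is to approximate a function with a family of functions generated by dilating and translating a mother wavelet function:

$$h_{a,c}(x) = |a|^{-1/2}\, h\left(\frac{x-c}{a}\right), \tag{16}$$

where $x, a, c \in R$, $a$ is a dilation factor, $c$ is a translation factor, and $h(x)$ is the mother wavelet. A multidimensional wavelet function can be written as the product of one-dimensional wavelet functions:

$$h(\mathbf{x}) = \prod_{i=1}^{N} h(x_i), \tag{17}$$

where $\mathbf{x} = (x_1, \ldots, x_N) \in R^N$. If $\mathbf{x}, \mathbf{x}' \in R^N$, the dot-product wavelet kernels are

$$K(\mathbf{x}, \mathbf{x}') = \prod_{i=1}^{N} h\left(\frac{x_i - c_i}{a}\right) h\left(\frac{x_i' - c_i'}{a}\right), \tag{18}$$

and the translation-invariant wavelet kernels are

$$K(\mathbf{x}, \mathbf{x}') = \prod_{i=1}^{N} h\left(\frac{x_i - x_i'}{a}\right). \tag{19}$$

Without loss of generality, the Morlet wavelet function can be used to construct a translation-invariant wavelet kernel:

$$h(x) = \cos(1.75x) \exp\left(-\frac{x^2}{2}\right). \tag{20}$$

The Morlet wavelet function is shown in Figure 2. With this function as the mother wavelet, the wavelet kernel can be described as follows:

$$K(\mathbf{x}, \mathbf{x}') = \prod_{i=1}^{N} \cos\left(1.75\,\frac{x_i - x_i'}{a}\right) \exp\left(-\frac{(x_i - x_i')^2}{2a^2}\right). \tag{21}$$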
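A translation-invariant Morlet wavelet kernel is straightforward to compute. The sketch below assumes the commonly used mother wavelet h(u) = cos(1.75u) exp(−u²/2) and a single dilation factor `a` shared by all dimensions; the function names are chosen for this example.

```python
import math


def morlet_wavelet(u):
    """Morlet mother wavelet h(u) = cos(1.75 u) * exp(-u^2 / 2)."""
    return math.cos(1.75 * u) * math.exp(-u * u / 2.0)


def morlet_kernel(x, z, a=1.0):
    """Translation-invariant wavelet kernel: product of h((x_i - z_i) / a)."""
    k = 1.0
    for xi, zi in zip(x, z):
        k *= morlet_wavelet((xi - zi) / a)
    return k


def gram_matrix(X, a=1.0):
    """Kernel (Gram) matrix of a list of feature vectors."""
    return [[morlet_kernel(xi, xj, a) for xj in X] for xi in X]
```

The kernel equals 1 when both arguments coincide and, unlike the RBF kernel, can take negative values because of the cosine factor. The dilation factor `a` plays the role of a kernel width and is one of the parameters an optimizer such as PSO would tune.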

4.2. Particle Swarm Optimization for Parameter of LS-WSVM

Particle swarm optimization (PSO) is a population-based stochastic optimization technique inspired by the social behavior of bird flocking and fish schooling [29]. Compared with the genetic algorithm (GA) [30], PSO is easy to implement and has few parameters to adjust. Thus, it performs well on optimization problems.

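In PSO, suppose that the search space is $D$-dimensional and that there are $m$ particles in the population. The position of the $i$th particle in generation $k$ is expressed as a $D$-dimensional vector, $x_i^k = (x_{i1}^k, x_{i2}^k, \ldots, x_{iD}^k)$, and its velocity is represented by the vector $v_i^k = (v_{i1}^k, v_{i2}^k, \ldots, v_{iD}^k)$. The position and velocity of each particle are updated continuously according to the following formulas:

$$v_i^{k+1} = w v_i^k + c_1 r_1 \left(p_i - x_i^k\right) + c_2 r_2 \left(p_g - x_i^k\right), \tag{22}$$

$$x_i^{k+1} = x_i^k + v_i^{k+1}, \tag{23}$$

where $k$ is the iteration index of the particle. The best position found by the $i$th particle in the $D$-dimensional search space is recorded as $p_i = (p_{i1}, p_{i2}, \ldots, p_{iD})$, and the best position in the whole swarm is recorded as $p_g = (p_{g1}, p_{g2}, \ldots, p_{gD})$. $c_1$ and $c_2$ are the acceleration constants. $r_1$ and $r_2$ are two independent random numbers that are uniformly distributed on $[0, 1]$. The inertia weight $w$, which balances the capabilities of global and local exploration, is set as follows:

$$w = w_{\max} - \frac{\left(w_{\max} - w_{\min}\right)\,\mathrm{iter}}{\mathrm{iter}_{\max}}, \tag{24}$$

where $w_{\min}$ is the minimal inertia weight, $w_{\max}$ is the maximal inertia weight, $\mathrm{iter}$ is the current iteration number, and $\mathrm{iter}_{\max}$ is the maximum iteration number. The optimization procedure is illustrated in Figure 3.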
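The update rules can be sketched as a small, generic minimizer. This is an illustrative sketch rather than the paper's optimizer: the inertia-weight bounds 0.9 and 0.4, the velocity clamp at 20% of each search range, the position clipping, and the function name `pso_minimize` are assumptions made here. In the diagnosis setting the objective would be a validation error of the LS-WSVM as a function of its parameters; the usage note uses a simple test function instead.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, seed=0):
    """Minimize objective over box bounds with a linearly decreasing inertia weight."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Assumption: clamp velocities to 20% of each range to keep the swarm bounded.
    v_max = [0.2 * (hi - lo) for lo, hi in bounds]
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for it in range(n_iter):
        # Inertia weight decreases linearly from w_max towards w_min.
        w = w_max - (w_max - w_min) * it / n_iter
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max[d], min(v_max[d], v))
                vel[i][d] = v
                lo, hi = bounds[d]
                pos[i][d] = max(lo, min(hi, pos[i][d] + v))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history
```

Calling `pso_minimize(lambda p: sum(v * v for v in p), [(-5.0, 5.0), (-5.0, 5.0)])` searches for the minimum of a two-dimensional sphere function; the returned `history` is non-increasing because the swarm best is only replaced by better positions.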

5. The Proposed Intelligent Bearing Fault Diagnosis Method and Experimental Results

5.1. The Proposed Intelligent Bearing Fault Diagnosis Methodology

On the basis of the advantages of FNLM, EEMD, SαS, and PSO-LS-WSVM, we propose a new bearing fault diagnosis approach for classifying the normal condition and multiple fault types of bearings. Figure 4 shows the proposed procedure, and the steps are as follows.
(1) Samples of vibration signals are taken by acceleration sensors at a particular sampling frequency under various operating conditions.
(2) The FNLM and EEMD methods are applied to preprocess the vibration signals in order to acquire a set of IMF components. The first five IMF components, which carry the most important state information, are then chosen for feature extraction.
(3) The α, γ, and MPDF feature parameters are extracted using SαS; the best IMF, whose feature parameters describe the bearing health condition, is then chosen to construct the new fault feature vector.
(4) The obtained feature data are partitioned into training samples and testing samples.
(5) The Morlet wavelet kernel function is chosen for LS-WSVM classification, and its parameters are optimized by PSO.
(6) The training and testing samples are fed into the classifier to conduct automatic fault diagnosis.

5.2. Experimental Results

To examine the effectiveness of the proposed approach, axle box bearing vibration data are used as an example. Figure 5 displays the experimental test on railway axle box bearings conducted on the test rig. The test rig for data acquisition consists of two motors, two friction wheels, a hydraulic loading installation, and control electronics (not shown). The experimental bearing is mounted on the wheelset, which is fixed in the test rig and driven by a friction wheel. Figure 6 likewise displays the experimental test on the axle box bearings conducted on the test rig. Every fault condition comes in two sizes: the width is 0.1 mm in both cases, and the depth is 0.23 mm or 0.43 mm. An accelerometer glued to the housing collects the vibration data. Because the six o’clock position of the housing is occupied, the accelerometer is located at the two o’clock position on the housing, and the sampling frequency is 25.6 kHz. The parameters of the axle box bearings are shown in Table 1, and the defect frequencies of the axle box bearings are shown in Table 2.

As shown in Table 2, 7 classes in total need to be distinguished. For each condition, 80 samples are obtained, each containing 5000 data points. The samples of each condition are divided into training samples and testing samples. The training samples are used to train the classifier model, and the testing samples are used to evaluate the effectiveness of the proposed fault diagnosis method.
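The sample preparation can be sketched as follows. The sketch assumes that each sample is a non-overlapping window cut from a longer record and that the split into training and testing sets is random; the text states neither, and the function names `segment_signal` and `split_train_test` are chosen for this example.

```python
import random


def segment_signal(signal, sample_len=5000):
    """Cut a record into consecutive, non-overlapping samples of sample_len points."""
    n = len(signal) // sample_len
    return [signal[i * sample_len:(i + 1) * sample_len] for i in range(n)]


def split_train_test(samples, n_train, seed=0):
    """Randomly split the samples into n_train training samples and the rest for testing."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    train = [samples[i] for i in idx[:n_train]]
    test = [samples[i] for i in idx[n_train:]]
    return train, test
```

At the stated sampling frequency of 25.6 kHz, one 5000-point sample spans 5000 / 25600 ≈ 0.195 s of vibration, and 80 such samples per condition split evenly give 40 training and 40 testing samples.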

Figure 7 shows typical time-domain waveforms obtained with the FNLM denoising method and the EEMD decomposition algorithm. Generally, the defect information is contained in the first five IMF components, which can be used to extract defect features using SαS. The detailed feature extraction steps are described in Section 3.

Table 3 shows the characteristic exponent values of all five modes. The third mode (c3) gives the best fault indication; its values under the normal condition, the inner race fault (0.43 mm), and the roller fault (0.43 mm) are 1.1757, 1.1618, and 1.1409, respectively. These three values are very close, so the α values of mode c3 alone fail to distinguish the bearing health states. The scale parameter and the MPDF value of mode c3 in Tables 4 and 5 show the distinctions between the different bearing fault conditions. Thus, the characteristic exponent, the scale parameter, and the MPDF value are combined to describe the working condition of the axle box bearing. The parameters α and γ and the MPDF value of 40 training samples are plotted in Figures 8(a), 8(b), and 8(c), respectively. None of the three parameters alone can identify the different fault types, but when the three parameters are combined, samples of the same class cluster very well, as shown in Figure 8(d).

After feature extraction, the different feature sets, namely α, γ, MPDF, and the hybrid feature set, are used as input to the wavelet-based LS-SVM for fault diagnosis. Based on experience and experimental tests, and taking computational complexity into consideration, the PSO parameters are set as follows: the number of particles is 20, both acceleration constants are 2.0, and the number of generations is 100. As shown in Table 6, with 40 training samples and 40 testing samples as input, the classifier yields recognition rates of 88.57% for the α feature set, 92.86% for the γ feature set, 93.57% for the MPDF feature set, and 95.71% for the hybrid feature set. This shows that the hybrid feature set contains more information characterizing the condition of the axle box bearing. Moreover, Figure 9 shows that the Morlet wavelet kernel takes less time than the RBF kernel.
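The reported recognition rates can be related to counts of correctly classified test samples with a little arithmetic. The sketch assumes that the test set holds 40 samples for each of the 7 classes, that is, 280 samples in total; the counts it produces are implied by the percentages under that assumption and are not reported in the text.

```python
def recognition_rate(n_correct, n_total):
    """Recognition rate in percent, rounded to two decimals."""
    return round(100.0 * n_correct / n_total, 2)


n_classes = 7
n_test_per_class = 40  # assumption: 40 testing samples for each class
n_total = n_classes * n_test_per_class

# Rates reported for the four feature sets, in percent.
reported = {"alpha": 88.57, "gamma": 92.86, "MPDF": 93.57, "hybrid": 95.71}

# Number of correct decisions implied by each rate under the assumption above.
implied_correct = {name: round(rate * n_total / 100.0) for name, rate in reported.items()}
```

Under this assumption the hybrid feature set corresponds to 268 correct decisions out of 280, and the gap between the hybrid set and the α feature set alone amounts to 20 test samples.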

Because the performance of diagnosis methods is closely related to the number of training samples, we study the recognition rates for different sample sizes. In the testing experiment, 5, 10, 20, 30, and 40 samples per class were randomly selected as the training and testing sets in order to assess the classification accuracy of the different methods. These sets were input to the wavelet-based LS-SVM and the RBF-based LS-SVM to compare their classification accuracy for different numbers of samples. The recognition results are compared in Figure 10, which shows that the proposed method reaches higher recognition accuracy than the RBF-based LS-SVM for every training sample size. The classification accuracy rises as the number of samples increases, and the proposed approach performs well even with a very small number of samples.

To obtain better recognition accuracy, optimization algorithms such as PSO, GA, and grid search (GS) have been combined with the LS-SVM classifier. Because PSO is used in this paper, GA and GS are compared with PSO for parameter optimization. The GA parameters are set as follows: the population size is 20, the number of iterations is 100, and the crossover and mutation probabilities are 0.5 and 0.1, respectively. The comparison for 40 samples per fault class is shown in Table 7 and Figure 11; the classification accuracy of PSO-LS-WSVM is 95.71%, compared with 92.14% and 93.57% for GS-LS-WSVM and GA-LS-WSVM, respectively. As shown in Figure 11, PSO-LS-WSVM takes longer than the other two approaches. The main reason is that PSO is not good at binary coding. Moreover, the average classification accuracy of the wavelet-based LS-SVM optimized by PSO, GA, and GS is compared for different numbers of training and testing samples. Figure 12 shows that the recognition rate of the PSO algorithm is clearly higher than that of the other two methods. It can also be concluded that the classification accuracy is affected by the number of training samples.

It should be noted that the Morlet-LSSVM method performs better than RBF-LSSVM on small feature datasets. The above analysis shows that the result of RBF-LSSVM is seriously affected by the number of training samples when the sample size is small. The training time and the testing time of the classifiers depend on the sample size and on the implementation. Hence, under the same conditions, the smaller the sample size, the less time is consumed. Furthermore, the Morlet-LSSVM method takes less time than RBF-LSSVM. This may be attributed to the fact that the Morlet kernel is approximately orthonormal, whereas the RBF kernel is not.

6. Conclusion

We proposed a novel bearing multifault diagnosis method based on FNLM and EEMD for denoising, the symmetric alpha-stable distribution for feature calculation, and a PSO-LS-WSVM classifier. The experimental results suggest that the FNLM-EEMD denoising method improves the efficiency of defect feature extraction and that the proposed SαS parameter extraction method yields discriminative and efficient features for fault diagnosis. By combining the feature parameters with LS-SVM classifiers based on the Morlet wavelet kernel and on the RBF kernel, and then optimizing them with different algorithms, the classification capacity of these methods was studied for various sizes of training and testing sets. All results reveal that the wavelet-LSSVM performs better than the RBF-LSSVM when the data sample is very small. Owing to its higher recognition accuracy and computational efficiency, bearing fault diagnosis based on FNLM-EEMD, SαS, and the PSO-LS-WSVM classifier is an effective and powerful tool for monitoring the health status of axle box bearings.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by National Natural Science Foundation of China (no. U1234208) and the Fundamental Research Funds of the Traction Power State Key Laboratory, Southwest Jiaotong University (2013TPL_T04).