Abstract

Industrial big data bring large numbers of high-dimensional sample datasets. Although a deep learning network can effectively mine the internal nonlinear structure of a dataset, constructing a deep learning model requires a lot of computing time and hardware. At the same time, nonlinear problems such as noise and fluctuation exist in industrial data, which make the deep architecture extremely complex and make the recognition accuracy of the diagnosis model difficult to guarantee. To solve this problem, a new method for dimension reduction, named the stochastic learning (SL) algorithm, is proposed in this paper. The proposed method consists of three steps: first, to increase the computational efficiency of the model, the dimension of the high-dimensional data is reduced by establishing a random matrix; second, to enhance the clustering behavior of the samples, the input data are enhanced by feature processing; third, to make the clustering effects more pronounced, the noise and interference in the data are removed, using a singular value decomposition (SVD)-based denoising method applied to both the training and test data. To demonstrate the superiority of the SL method, we conducted two sets of experiments, on a wind turbine gearbox and on a benchmark dataset. The experimental results show that the SL method not only improves the classification accuracy but also reduces the computational burden.

1. Introduction

Deep neural networks (DNNs) have extensive applications in artificial intelligence, including computer vision [1–8], speech recognition [9–13], medical detection [14–19], and mechanical fault diagnosis [20–28]. Compared with human ability, the DNN model is more capable of solving such complicated problems, but it also faces certain challenges. For example, to complete different tasks effectively, different DNN models need to be trained, tuning the parameters through repeated trial and error to optimize the model structure [29]. Training a DNN model to handle a specific task effectively can therefore occupy an entire computing cluster for days or even weeks [30]. In addition, the parameter optimization of a DNN model not only requires high-performance hardware such as GPUs and TPUs but also places high demands on the dataset. DNNs are therefore unsuitable for applications that require high real-time performance or whose data samples lack labels [31].

In addition to deep learning methods, shallow learning algorithms (PCA, KNN, LPP, etc.) are still widely applied in the artificial intelligence area [32–37]. Although this kind of shallow learning algorithm has the advantages of a simple structure, low hardware requirements, and relatively high computational efficiency, it also has limitations that are difficult to overcome, such as classifying data that contain a large number of variables from a limited sample set, and handling strongly nonlinear data [38]. To overcome these shortcomings, random forest (RF) was proposed and validated [39, 40]. RF also has many other advantages [41]: it is easy to understand and simple to implement, is fast at test time, handles outliers and nonlinearity well, and performs well in parallel training and on big data. As a result, it is widely used in medicine, computer vision, machine learning, and other fields, where it has achieved excellent results [42–50]. However, the number of decision trees is an important parameter of RF that affects its classification accuracy and computational efficiency. In this paper, a better machine learning algorithm, stochastic learning (SL), is put forward to solve this problem. The SL method uses a random mapping matrix to reduce the dimensionality of the high-dimensional data, enhances the data after the dimensionality reduction, and denoises the features based on SVD to improve the classification rate. This method therefore has three outstanding advantages: (1) the sample dimension is greatly reduced after processing; (2) the computational efficiency of the random forest is improved; and (3) the recognition performance of stochastic learning is ensured by the reinforcement process.

The remainder of this paper is arranged as follows: Section 2 presents the theory behind the proposed method; Section 3 describes the experimental setup and analyzes the experiments and their results; and Section 4 gives the conclusions.

2. The Proposed Method

This section describes the proposed stochastic learning method in detail; the structure of the proposed model is shown in Figure 1. The strategy of stochastic learning is introduced in Section 2.1. The basics of stochastic learning used for classification are explained in detail in Section 2.2. A proof concerning the stochastic learning strategy is presented in Section 2.3. Section 2.4 details the processing steps of the proposed algorithm.

2.1. Strategies for the Present Stochastic Learning Method

Sample size and dimensionality directly affect the computational efficiency of a machine learning method. Inspired by the operating mechanism of the extreme learning machine, the input data are randomly reduced in dimensionality to obtain low-dimensional sample data. Unlike the extreme learning machine, the samples are not classified immediately but are first enhanced in two steps: first, the sample features are strengthened to obtain a good clustering effect; second, a feature denoising method is applied to further improve the clusters. Based on these steps, both the recognition accuracy and the computational efficiency of the model can be improved.
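As a minimal sketch of this first step (the shapes, seed, and variable names here are illustrative; the formal definition appears as equation (1) in Section 2.2), the random dimensionality reduction amounts to multiplying the samples by a fixed random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed: W and beta never change
X = rng.random((2000, 50))            # high-dimensional input samples
W = rng.standard_normal((50, 20))     # random mapping matrix, D = 50 -> d = 20
beta = rng.standard_normal(20)        # random biases
Z = X @ W + beta                      # low-dimensional features, shape (2000, 20)
```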

2.2. Stochastic Learning Strategy

(1) The derivation in this part follows reference [51]. For an input dataset $X \in \mathbb{R}^{N \times D}$, if the random feature extraction layer has $d$ nodes, then $Z \in \mathbb{R}^{N \times d}$ holds the features of that layer. The randomly extracted features are expressed by the formula

$$Z = XW + \beta, \quad (1)$$

where $\beta$ denotes the biases and $W$ represents the input weights. $\beta$ and $W$ are produced randomly before training begins, and they remain fixed during training without any iterations.

(2) The derivation in this part also follows reference [51]. Since the input data are nonlinear, an aggregation through an activation function is introduced to improve classification. The random features $Z$ are transformed through the activation function into

$$T = Q(Z)W_{im}, \quad (2)$$

where $W_{im}$ represents the weight matrix connecting the improved classification operation to the random feature output, $T$ represents the output of the improved classification layer, and $Q$ is an activation function. Assuming that the training samples contain $c$ categories, $q_i$ represents the number of samples in the $i$-th category ($i = 1, 2, \ldots, c$) and $p_i$ represents the center of the $i$-th category; then

$$p_i = \frac{1}{q_i} \sum_{z_j \in \text{class } i} Q(z_j), \quad (3)$$

where the center matrix is represented by $t$, whose $j$-th row is the center $p_i$ of the class to which sample $j$ belongs. To obtain the connection weights $W_{im}$, the improved classification output and the random features are related by

$$Q(Z)W_{im} = t. \quad (4)$$

To guarantee the stability of the mapping between the input and the target while preventing overfitting, regularization is applied. The regularization formula is

$$W_{im} = \left( Q(Z)^{T} Q(Z) + \lambda e \right)^{-1} Q(Z)^{T} t, \quad (5)$$

where $e$ represents the unit diagonal (identity) matrix and $\lambda$ represents the penalty coefficient. Therefore, the output dataset $Y$ is expressed as

$$Y = Q(Z)W_{im}. \quad (6)$$

The above expressions cover the model-building process. For the test data, assume the input dataset is $X_{test}$. The random features are

$$Z_{test} = X_{test}W + \beta. \quad (7)$$

The random features $Z_{test}$ are passed through the activation function $Q$,

$$H_{test} = Q(Z_{test}), \quad (8)$$

and the output of the improved classification layer for the test dataset is

$$Y_{test} = H_{test}W_{im}. \quad (9)$$

(3) Although the above process reduces the impact of noise and data fluctuation on the clustering results, overfitting cannot be avoided completely, and noise still remains after the transformation. Therefore, the training samples and test samples are denoised at the same time. After dimension reduction, the sample dimension and the computational complexity are both greatly reduced. Traditional data mining methods rely only on the training samples to build the model and cannot process the test samples in this way.
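Continuing the sketch above, the enhancement of equations (2)–(6) can be illustrated as follows; this is a minimal sketch assuming tanh as the activation $Q$ (the paper does not fix $Q$), with $d$, $\lambda$, and the class labels chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, d, lam = 200, 50, 20, 1e-3
X = rng.random((N, D))                        # training inputs
labels = rng.integers(0, 5, N)                # c = 5 classes, illustrative

W, beta = rng.standard_normal((D, d)), rng.standard_normal(d)
H = np.tanh(X @ W + beta)                     # Q(Z), Eqs. (1)-(2)

# Center matrix t: row j holds the class center of sample j, Eq. (3)
t = np.stack([H[labels == c].mean(axis=0) for c in labels])

# Regularized connection weights, Eq. (5)
W_im = np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ t)

Y = H @ W_im                                  # enhanced output dataset, Eq. (6)
```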

First, based on the phase space reconstruction method, each feature vector of $Y$ can be converted into an $n \times m$ matrix $F$, where $m$ represents the embedding dimension of the phase space, which can be determined by the mutual information and false nearest neighbor approaches, respectively [19, 20]. Then, SVD is applied for exact factorization of the matrix $F$:

$$F = U \Lambda V^{T}, \quad (10)$$

where $U$ is an $n \times n$ real or complex unitary matrix, $V^{T}$ is the transpose of an $m \times m$ real or complex unitary matrix, and $\Lambda$ is an $n \times m$ diagonal matrix whose diagonal entries are the singular values of $F$.

The distribution of the singular values reflects the energy concentration of the features and the noise. The useful features correspond to the larger singular values, so the first $k$ singular values are retained, while the noise corresponds to the smaller singular values, which are set to zero. A new diagonal matrix $\Lambda'$ is thereby obtained, and the noise-free features are recovered through the inverse singular value transformation:

$$F' = U \Lambda' V^{T}. \quad (11)$$

Finally, the noise-free features $F'$ are transformed back into the feature space by inverse phase space reconstruction, and the new feature space can be written as $Y'$.

Likewise, the clustering effect of the test samples can be greatly improved by denoising them directly in the same way.
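A sketch of this denoising step follows, assuming the phase space reconstruction is realized as a Hankel (trajectory) matrix and the inverse reconstruction as anti-diagonal averaging (common choices; the paper defers the embedding details to [19, 20], and the values of $m$ and $k$ here are illustrative):

```python
import numpy as np

def svd_denoise(y, m=10, k=3):
    """Denoise one feature vector via phase-space SVD truncation."""
    n = len(y) - m + 1
    # Phase space reconstruction: n x m trajectory (Hankel) matrix F
    F = np.stack([y[i:i + m] for i in range(n)])
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    s[k:] = 0.0                       # zero the small (noise) singular values
    F_clean = (U * s) @ Vt            # F' = U Lambda' V^T, Eq. (11)
    # Inverse phase space reconstruction: average the anti-diagonals
    out = np.zeros(len(y))
    counts = np.zeros(len(y))
    for i in range(n):
        out[i:i + m] += F_clean[i]
        counts[i:i + m] += 1.0
    return out / counts
```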

2.3. Proof

Assume the input dataset can be represented as

$$X = \hat{X} + \Delta X, \quad (12)$$

where $\hat{X}$ represents the ideal dataset while $\Delta X$ represents the perturbation dataset. Then,

$$Z = XW + \beta = (\hat{X} + \Delta X)W + \beta. \quad (13)$$

For the random features $Z$, the dataset is represented as

$$Z = \hat{Z} + \Delta Z, \qquad \hat{Z} = \hat{X}W + \beta, \qquad \Delta Z = \Delta X W, \quad (14)$$

where $\hat{Z}$ represents the ideal dataset while $\Delta Z$ represents the perturbation dataset.

For the random features $Z$, through the activation function, the dataset can be converted to

$$Y = Q(\hat{Z} + \Delta Z)W_{im}. \quad (15)$$

As described in (5), the relationship is expressed by the formula

$$W_{im} = \left( Q(Z)^{T} Q(Z) + \lambda e \right)^{-1} Q(Z)^{T} t. \quad (16)$$

Thus,

$$Y = Q(\hat{Z} + \Delta Z)\left( Q(Z)^{T} Q(Z) + \lambda e \right)^{-1} Q(Z)^{T} t. \quad (17)$$

That is, the output $Y$ is the regularized projection of the class-center matrix $t$ onto the range of $Q(Z)$: every sample is pulled toward the center of its own class, so the influence of the perturbation $\Delta X$ on the clustering result is bounded by the regularized mapping rather than amplified.

End.

2.4. Detailed Description of the Proposed Algorithm

First, the dimensionality of the input dataset is reduced; the transformation is then optimized through the improved classification, and the within-class structure is further improved. Relying on a big-data-driven artificial intelligence network in this way can effectively increase the recognition accuracy of fault diagnosis, reduce the calculation time, improve the computational efficiency, and avoid the need to extract features manually based on physical-domain knowledge.

On the basis of the above ideas, the schematic diagram of the proposed SL method in actual fault diagnosis is shown in Figure 2. The concrete steps are as follows, with a code sketch of the training and testing flow after this list:

Step 1: install the sensor at the position where the vibration is most direct, conduct experiments under different working conditions, and collect the raw data.
Step 2: process the sensor's full-channel data to generate the sample datasets and corresponding labels required for training and testing.
Step 3: train the SL model with the training dataset.
  Substep 1: reduce the input dataset to a low dimension by random feature extraction.
  Substep 2: aggregate the nonlinear mapping matrix using the random features extracted from the samples and the samples' information labels, shortening the distance between samples of the same class.
  Substep 3: denoise the training samples after the improved classification by SVD.
  Substep 4: input the information labels and training samples to build the classifier model.
Step 4: test the trained SL model with the testing dataset.
Step 5: finally, output the fault diagnosis accuracy.
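The following sketch ties Steps 3 and 4 together, assuming tanh as the activation $Q$ and scikit-learn's RandomForestClassifier as the decision-tree ensemble (the paper does not prescribe either choice; labels is assumed to be a 1-D integer NumPy array, the SVD denoising substep is shown in Section 2.2, and d, lam, and n_trees are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_sl(X, labels, d=20, lam=1e-3, n_trees=300, seed=0):
    """Step 3: random projection -> class-center enhancement -> forest."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], d))      # random input weights, fixed
    beta = rng.standard_normal(d)                 # random biases, fixed
    H = np.tanh(X @ W + beta)                     # Q(Z); tanh assumed for Q
    t = np.stack([H[labels == c].mean(axis=0) for c in labels])  # class centers
    W_im = np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ t)   # Eq. (5)
    Y = H @ W_im                                  # enhanced training features
    # Substep 3 (SVD denoising of Y) would be applied here; see Section 2.2.
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    clf.fit(Y, labels)                            # Substep 4: build the classifier
    return clf, (W, beta, W_im)

def test_sl(clf, params, X_test):
    """Step 4: apply the fixed SL front end to test data and classify."""
    W, beta, W_im = params
    Y_test = np.tanh(X_test @ W + beta) @ W_im    # Eqs. (7)-(9)
    return clf.predict(Y_test)
```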

3. Stochastic Learning Algorithm for Condition Recognition

Fault diagnosis experiments are carried out on the wind turbine gearbox dataset and the bearing benchmark dataset, respectively, to verify the effectiveness of the proposed SL method. The results of the two experiments are analyzed to confirm whether the proposed method improves both the calculation efficiency and the diagnosis accuracy.

In the experiments, the effect of the model parameters of the proposed stochastic learning method (such as the number of decision trees and the data dimension) on computation time and classification accuracy is determined. The number of decision trees ranges from 100 to 500 in steps of 50, and the dimensions are set according to the input dataset.
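A sketch of this parameter sweep, assuming the decision-tree stage is realized with scikit-learn's RandomForestClassifier (Y_train and Y_test stand for the SL-enhanced features; the random placeholders below are purely illustrative):

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
Y_train, y_train = rng.random((1000, 50)), rng.integers(0, 5, 1000)  # placeholders
Y_test, y_test = rng.random((1000, 50)), rng.integers(0, 5, 1000)

for n_trees in range(100, 501, 50):               # 100 to 500, step 50
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    start = time.perf_counter()
    clf.fit(Y_train, y_train)
    acc = clf.score(Y_test, y_test)
    print(f"trees={n_trees:3d}  acc={acc:.4f}  time={time.perf_counter() - start:.2f}s")
```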

3.1. Fault Diagnosis for Wind Turbine Gearbox
3.1.1. Failure Experiment Setup

We conducted experiments on the drivetrain diagnostics simulator (DDS) transmission platform designed by SpectraQuest Inc. (company website: http://www.pinxuntech.com/); the experimental transmission is shown in Figure 3. The experimental device consists mainly of a drive motor, a two-stage planetary gearbox, a two-stage parallel-shaft gearbox, and a magnetic powder brake. In the experiment, four typical gear failures were studied, namely, surface wear, tooth cracks, chipped teeth, and missing teeth. At the same time, to keep the experiments consistent, the transmission platform uses ordinary gears.

The gearbox is the main component of the drivetrain diagnostics simulator and is also the part most prone to failure. In the experiment, the secondary sun gear of the planetary gearbox was diagnosed, and an acceleration sensor was used to collect the vibration signals of the different fault types during transmission. Table 1 shows the fault state settings and the working condition settings of the gearbox.

During the experiment, a total of 5 patterns were set up: 1 normal pattern and 4 different failure patterns. The vibration signal in each pattern was collected by the acceleration sensor. We performed a wavelet transform on the collected vibration signals to extract impulse signals. Following the literature [52], features were extracted from the original signal and the impulse signal, respectively, giving a total of 50 features. We thereby obtain 2000 × 50 feature samples from these experiments. For model training, the dataset is split evenly: 50% of the samples are used for training and 50% for testing.

The experimental results of the SL algorithm are shown in Figure 4. From Figure 4(a) we conclude that the data dimension and the number of decision trees both influence the classification accuracy of the SL algorithm, but the recognition accuracy is less sensitive to the dimension than to the number of decision trees. Figure 4(b) shows that the number of decision trees mainly determines the calculation time of the SL algorithm, while the data dimension has only a small effect. Therefore, to ensure that the model achieves high classification accuracy and, at the same time, high computational efficiency, the SL algorithm should be configured with more decision trees and fewer dimensions. From the yellow dot area in Figure 4(b), we can see that the SL algorithm achieves a classification accuracy of up to 92%, with a relatively low corresponding calculation time of 15 seconds.

3.1.2. Comparison of Different Methods of Diagnosis

The data collected in the experiment are fed to different algorithm models for fault diagnosis; the recognition accuracies of the different methods are compared, and the diagnostic performance of the proposed SL method is discussed further. The classification accuracy and calculation time of the different methods are shown in Table 2. In Table 2, the average recognition accuracy rates of SVM, ESN, SAE + ESN, and SAE + Softmax are 22.10%, 76.98%, 44.73%, and 62.00%, respectively, which is unacceptable in actual engineering applications; the corresponding standard deviations are 0, 0.679, 1.988, and 2.770, respectively. In addition, as a shallow learning algorithm, the SVM method naturally has a higher computational efficiency than the deep learning algorithm SAE: the calculation times of SAE + ESN and SAE + Softmax exceed 12 s, while those of SVM and ESN are 0.96 s and 0.68 s, respectively. The classification accuracy and calculation time of the SL method are shown in Figure 5, where blue denotes the recognition accuracy and red the calculation time. Across different numbers of decision trees, the identification accuracy of the SL method exceeds 90%, ranging from 90.08% to 90.28%, far above the recognition accuracy of the other algorithms. The greater the number of decision trees, the longer the calculation time; the calculation time of the SL method ranges from 3.25 s to 14.55 s. Taking Table 2 and Figure 5 together, the SL method clearly surpasses the other methods in recognition accuracy while keeping its calculation time at an acceptable level.

3.2. Fault Diagnosis for the Benchmark Dataset
3.2.1. Failure Experiment Setup

To further demonstrate not only the effectiveness but also the superiority of the SL method, benchmark data are used to evaluate the proposed SL model. The benchmark dataset is a rolling bearing fault dataset provided by the Case Western Reserve University Bearing Data Center (CWRU) [53]. For the benchmark dataset, we consider bearings under normal conditions and bearings with 3 different fault types, namely, outer ring faults, inner ring faults, and ball faults. Each fault type includes 3 defect levels (0.18 mm, 0.36 mm, and 0.53 mm wide grooves). In this experiment, the sensor sampling frequency was set to 48 kHz, the data were collected under loads of 1, 2, and 3 HP, and the acquisition of each group of vibration signals lasted 10 s. A total of 10 sets of bearing data in different states were obtained, including 1 set of healthy bearings and 9 sets of faulty bearings, denoted NR and Fault 1 through Fault 9.

The dataset is extracted from the original signals: 800 small sample sets are collected for each load, each small sample set contains 300 data points, so each group of bearing data has 2400 small sample sets, and the benchmark dataset comprises a total of 24,000 small sample sets. Therefore, the size of the original data in the high-dimensional space is 24000 × 300, and the data plan used to train the model is the same as in Section 3.1.
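A sketch of this segmentation, assuming non-overlapping 300-point slices (the slicing scheme is an assumption; the raw signals below are random placeholders for the 30 load-state records):

```python
import numpy as np

def segment(signal, n_segments=800, seg_len=300):
    """Slice one raw record into non-overlapping fixed-length samples."""
    return signal[: n_segments * seg_len].reshape(n_segments, seg_len)

# 10 bearing states x 3 loads = 30 records, 800 segments each
records = [np.random.randn(240000) for _ in range(30)]   # placeholder signals
X = np.vstack([segment(r) for r in records])
print(X.shape)                                           # (24000, 300)
```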

Figure 6 displays the experimental results of the SL algorithm. As shown in Figure 6, the SL algorithm achieves a recognition accuracy of over 98% across different numbers of decision trees and different data dimensions, and the algorithm is not sensitive to changes in either parameter. When the number of decision trees is 400 and the dimension is 100, the classification accuracy of the SL method peaks at almost 100%. Generally speaking, the identification accuracy of the algorithm is influenced by the input dimension and the number of decision trees, the former being the more influential, and the recognition accuracy is positively correlated with the number of decision trees.

3.2.2. Comparison of Different Methods of Diagnosis

As described above, the corresponding comparison results are shown in Table 3 and Figure 7. The average classification accuracy rates of SVM, ESN, SAE + ESN, and SAE + Softmax are 26.28% ± 0.673, 39.07% ± 1.27, 37.14% ± 0.902, and 10.02% ± 0.0289, respectively. Furthermore, the shallow SVM and ESN methods have a higher computational efficiency than the deep learning algorithm SAE: the computing times of SAE + ESN and SAE + Softmax are about 290 s, while those of SVM and ESN are 171.55 s and 7.95 s, respectively. Figure 7 shows the classification accuracy and calculation time of the SL method under different numbers of decision trees, with blue denoting the recognition accuracy and red the calculation time. The average recognition accuracy ranges from 96.08% to 97.46%, and increasing the number of decision trees improves the classification accuracy; meanwhile, the calculation time also increases, ranging from 75.36 s to 341.7 s.

4. Conclusions

This paper proposes a stochastic learning (SL) dimensionality reduction algorithm and applies it to machine fault state recognition. The classification accuracy and operating efficiency of a dimension reduction algorithm are affected by the size of the dimensions. In the SL method, random feature extraction is performed on the input high-dimensional data through a random mapping matrix, and the resulting low-dimensional feature data are used for model training. Therefore, after feature extraction, the dimension of the samples fed into the SL model is greatly decreased, and the calculation efficiency is greatly improved. Training on the information-enhanced data guarantees the classification performance of the SL algorithm.

Data Availability

The data used to support the findings of this study are obtained from previously reported studies. These prior studies (and datasets) are cited at relevant places within the text as references.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this study.

Acknowledgments

This work was supported in part by the Special Projects in Key Fields of Ordinary Colleges and Universities in Guangdong Province (2020ZDZX3029) and the Dongguan Science and Technology Commissioner Project (20201800500212 and 20201800500282).