Abstract

Grey prediction models perform well on small-data problems and have been widely used in various research fields. However, when the data show oscillation characteristics, grey prediction models perform poorly. To address this, a new method was proposed for modelling small-sample oscillation sequences with a grey prediction model. Based on the idea of information decomposition, the new method employed a grey prediction model to capture the trend characteristic of a complex system and an ARMA model to describe its random oscillation characteristic. The crops disaster area in China was selected as a case study, and eight years of historical data published by government departments were fed into the proposed model. The modelling results of the new model were compared with those of traditional mainstream prediction models, and the new model performed markedly better. This indicates that the proposed model can help solve small-sample oscillation problems and improve the applicability of grey prediction models.

1. Introduction

Big data technology is a computational strategy and method for processing large data sets, and it has gradually become a research hotspot in recent years. However, large data sets are sometimes difficult to obtain. Owing to technological limitations or historical reasons, many problems still involve only small data, such as unconventional energy production, short-term traffic flow, sulfur dioxide emissions and crops disaster area [1–4]. These problems show that the real world contains many grey systems whose data are limited, and big data technology cannot effectively describe such systems from small data.

The grey prediction model is a useful method for studying uncertain systems with partly known and partly unknown information [5, 6]. At present, two kinds of sequences are mainly suitable for grey prediction models: monotone sequences [7–10] and saturated S-shaped sequences [11–13]. For other sequences, such as oscillation or fluctuation sequences, grey prediction models perform poorly. However, the real world is complex. Monotone and saturated S-shaped sequences are only two special cases, and many more sequences show oscillation characteristics [14–16]. Therefore, how to construct a reasonable grey prediction model for oscillation sequences has become a research trend.

Grey prediction models have already achieved some results in modelling oscillation sequences. These studies fall mainly into three categories. (a) Increasing the smoothness of oscillation sequences: poor smoothness is the main reason for the low modelling accuracy of oscillation sequences, so smoothing the sequence is one way to improve accuracy. At present, smoothness is mainly improved by sequence transformations such as the smoothness operator and amplitude compression [17–20]. (b) Modelling the oscillation interval by envelopes: from the perspective of range, the envelopes of the oscillation sequence are modelled with a grey prediction model, which simulates and predicts the range within which the sequence varies [21–23]. (c) Improving the structure of the grey prediction model with a periodic operator: to fit periodic sequences, scholars have introduced periodic trigonometric factors and established periodic grey prediction models that match the periodic fluctuation of the sequence and reduce modelling error [24, 25].

The above methods improve the ability of grey prediction models to handle oscillation sequences to a certain extent, but they still have shortcomings. Sequence transformation destroys the characteristics of the original sequence and cannot make full use of the information the sequence carries. Envelope design is too arbitrary, so its generalization ability is weak. The grey periodic prediction model not only increases the complexity of the model structure but also works only for sequences with periodic, regular fluctuations. When the oscillation is irregular, its performance is poor.

An oscillation sequence is composed of information at different scales, such as trend, randomness and periodicity. It reflects the combined outcome of the system under the influence of various uncertainties [26–28]. A single prediction method is suited to sequences with a single time scale. It cannot simultaneously simulate and predict two or more time scales of information in an oscillation sequence and therefore cannot achieve the intended effect.

However, preprocessing a complex sequence into simpler modes often leads to satisfactory prediction results. The empirical mode decomposition (EMD) algorithm is a multi-scale analysis method. It decomposes a complex oscillation sequence into a set of sub-sequences that contain the information of the original sequence at different time scales [29]. An appropriate model is then selected to simulate and predict each sub-sequence according to its characteristics. The simulated and predicted values of the sub-sequences are summed to obtain those of the original sequence.

When decomposed by the EMD algorithm, a small-sample oscillation sequence usually yields two kinds of sub-sequences: a short-term trend sub-sequence and one or more random fluctuation sub-sequences. The GM(1,1) model is the most classical grey prediction model and needs only a few data points (at least 4). It extracts the trend of the system through grey generation and then simulates and predicts it, so GM(1,1) performs well on short trend sub-sequences. Random fluctuation sub-sequences are usually modelled by the ARMA model. Based on these facts, we use the GM(1,1) model and the ARMA model to simulate and predict the trend and fluctuation sub-sequences, respectively. Depending on the decomposition result, other kinds of sub-sequences may appear, but trend and fluctuation sub-sequences are the most common case. Therefore, we mainly study this general case and analyse other cases individually.

In this paper, a hybrid grey model for predicting small-sample oscillation sequences is proposed based on information decomposition. To verify the validity of the proposed model, we select the crops disaster area in China as the modelling object, which is a small-sample oscillation sequence. We compare the simulation accuracy of the new model with that of the traditional ARIMA and GM(1,1) models. The results show that the new model is clearly superior to the traditional models, which supports its validity.

The remainder of this paper is organized as follows. Section 2 introduces the principle of empirical mode decomposition. Section 3 proposes the EMD-ARMA-GM(1,1) prediction model. Section 4 studies the modelling condition and the error checking method of the model. Section 5 compares the proposed model with the ARIMA and GM(1,1) models and uses it to predict the crops disaster area in China. Finally, conclusions are drawn in Section 6.

A chart showing the structure of this paper is given as Figure 1.

2. Empirical Mode Decomposition Principle

Empirical mode decomposition (EMD) is a signal decomposition method. It does not require any prior assumption about the data and relies entirely on the intrinsic characteristics of the data itself. EMD decomposes the original data adaptively, and the resulting intrinsic mode functions (IMFs) reflect the inherent characteristics of the data [30]. An IMF satisfies the following two conditions simultaneously [31]:

(i) over the whole data set, the number of extrema and the number of zero-crossings must be equal or differ by at most one;

(ii) at any point, the mean of the envelope defined by the local maxima and the envelope defined by the local minima is zero.

The EMD algorithm for an oscillation sequence proceeds as follows [32]:

Step 1. Identify all local maxima and local minima of the sequence $X^{(0)} = (x^{(0)}(1), x^{(0)}(2), \dots, x^{(0)}(n))$. Fit all local maxima with a cubic spline interpolation function to form the upper envelope. Then fit all local minima in the same way to form the lower envelope. Denote the two envelopes by $u_1(k)$ and $l_1(k)$, respectively.

Step 2. At each time point $k = 1, 2, \dots, n$, denote the mean of the upper and lower envelopes of $X^{(0)}$ by $m_1(k)$, calculated as

$$m_1(k) = \frac{u_1(k) + l_1(k)}{2}.$$

Step 3. Subtract the mean envelope from $X^{(0)}$:

$$h_1(k) = x^{(0)}(k) - m_1(k).$$

Suppose $h_1$ has negative local maxima or positive local minima, so that it does not yet satisfy the two IMF conditions. In that case, $h_1$ is regarded as a new original sequence. Steps 1 to 3 are then repeated until the result after $j$ iterations, $h_{1j}$, satisfies the two IMF conditions. This result is denoted $c_1 = h_{1j}$ and is called the first IMF component of the original sequence $X^{(0)}$.

Step 4. Separate $c_1$ from the original sequence to obtain the residual component $r_1$:

$$r_1(k) = x^{(0)}(k) - c_1(k).$$

Step 5. Regard $r_1$ as a new original sequence and repeat the sifting process from Step 1 to obtain $c_2, r_2, \dots$, until no further IMF component can be separated. At this point, the EMD algorithm has sifted the original sequence $X^{(0)}$ into $s$ IMFs and one residual component $r_s$, where

$$x^{(0)}(k) = \sum_{i=1}^{s} c_i(k) + r_s(k), \quad k = 1, 2, \dots, n.$$
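The five steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this paper. It makes several assumptions. The envelopes are natural cubic splines. The two end points of the sequence are treated as both maxima and minima, which is a simple boundary rule. Condition (ii) is approximated with a tolerance `tol` on the mean envelope. Decomposition stops when the residual has fewer than two interior extrema. All function names are hypothetical.

```python
def natural_cubic_spline(xs, ys, x_eval):
    """Evaluate the natural cubic spline through the knots (xs, ys) at each point of x_eval."""
    n = len(xs)
    if n == 1:
        return [float(ys[0])] * len(x_eval)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    M = [0.0] * n  # second derivatives at the knots; natural spline: M[0] = M[-1] = 0
    if n > 2:
        # Tridiagonal system for the interior second derivatives M[1..n-2]
        sub = [h[i - 1] for i in range(1, n - 1)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        sup = [h[i] for i in range(1, n - 1)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n - 1)]
        for i in range(1, n - 2):  # Thomas algorithm: forward elimination
            w = sub[i] / diag[i - 1]
            diag[i] -= w * sup[i - 1]
            rhs[i] -= w * rhs[i - 1]
        M[n - 2] = rhs[-1] / diag[-1]
        for i in range(n - 4, -1, -1):  # back substitution
            M[i + 1] = (rhs[i] - sup[i] * M[i + 2]) / diag[i]
    out = []
    for x in x_eval:
        j = 0
        while j < n - 2 and x > xs[j + 1]:  # locate the spline segment containing x
            j += 1
        a, b, hj = xs[j + 1] - x, x - xs[j], h[j]
        out.append(M[j] * a ** 3 / (6 * hj) + M[j + 1] * b ** 3 / (6 * hj)
                   + (ys[j] / hj - M[j] * hj / 6) * a + (ys[j + 1] / hj - M[j + 1] * hj / 6) * b)
    return out


def extrema(h):
    """Indices of the interior local maxima and minima of h."""
    maxima = [i for i in range(1, len(h) - 1) if h[i - 1] < h[i] >= h[i + 1]]
    minima = [i for i in range(1, len(h) - 1) if h[i - 1] > h[i] <= h[i + 1]]
    return maxima, minima


def mean_envelope(h):
    """Steps 1 and 2: spline envelopes through the extrema and their pointwise mean."""
    maxima, minima = extrema(h)
    t = list(range(len(h)))
    up = [0] + maxima + [len(h) - 1]  # assumption: end points act as both maxima and minima
    lo = [0] + minima + [len(h) - 1]
    upper = natural_cubic_spline(up, [h[i] for i in up], t)
    lower = natural_cubic_spline(lo, [h[i] for i in lo], t)
    return [(u + l) / 2.0 for u, l in zip(upper, lower)]


def is_imf(h, tol):
    """Approximate IMF test: condition (i) exactly, condition (ii) up to tolerance tol."""
    maxima, minima = extrema(h)
    crossings = sum(1 for a, b in zip(h, h[1:]) if a * b < 0)
    scale = max(abs(v) for v in h) or 1.0
    mean_ok = max(abs(m) for m in mean_envelope(h)) <= tol * scale
    return abs(len(maxima) + len(minima) - crossings) <= 1 and mean_ok


def sift(x, tol=0.05, max_iter=50):
    """Step 3: subtract the mean envelope repeatedly until h is (approximately) an IMF."""
    h = list(x)
    for _ in range(max_iter):
        m = mean_envelope(h)
        h = [a - b for a, b in zip(h, m)]
        if is_imf(h, tol):
            break
    return h


def emd(x, max_imfs=10):
    """Steps 4 and 5: extract IMFs until the residual has fewer than two interior extrema."""
    imfs, r = [], [float(v) for v in x]
    for _ in range(max_imfs):
        maxima, minima = extrema(r)
        if len(maxima) + len(minima) < 2:
            break
        c = sift(r)
        imfs.append(c)
        r = [a - b for a, b in zip(r, c)]  # Step 4: r = r - c
    return imfs, r
```

For example, `imfs, r = emd(data)` returns the IMFs $c_1, \dots, c_s$ and the residual $r_s$. By construction, their sum reproduces the input sequence, which mirrors the final identity of Step 5.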

An example of the empirical mode decomposition of an oscillating sequence is shown in Figure 2.

3. A Hybrid Grey Prediction Model of EMD-ARMA-GM(1,1)

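Definition 1. Assume the sequence $X^{(0)} = (x^{(0)}(1), x^{(0)}(2), \dots, x^{(0)}(n))$. If there exist $k, k' \in \{2, 3, \dots, n\}$ such that $x^{(0)}(k) - x^{(0)}(k-1) > 0$ and $x^{(0)}(k') - x^{(0)}(k'-1) < 0$, then $X^{(0)}$ is called an oscillation sequence.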

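Definition 2. Assume $X^{(0)}$ is an oscillation sequence that satisfies the following conditions: (a) $E[x^{(0)}(k)] = \mu$; (b) $\mathrm{Var}[x^{(0)}(k)] = \sigma^2$, where $\mu$ and $\sigma^2$ are constants; (c) $\mathrm{Cov}(x^{(0)}(k), x^{(0)}(k+\tau)) = \gamma(\tau)$, $\tau = 0, 1, 2, \dots$, where $\gamma(\tau)$ is independent of $k$. Then $X^{(0)}$ is a stationary oscillation sequence.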
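Conditions (a) to (c) refer to population moments. In practice, they are usually checked informally through sample estimates. The following sketch computes the sample mean and the sample autocovariance and autocorrelation functions, which are also the quantities used when identifying ARMA orders. It assumes the common biased estimator that divides by $n$, and the function names are hypothetical.

```python
def sample_mean(y):
    """Sample estimate of the constant mean mu in condition (a)."""
    return sum(y) / len(y)


def sample_autocovariance(y, tau):
    """Biased sample estimate of gamma(tau) in condition (c) (divides by n)."""
    n, mu = len(y), sample_mean(y)
    return sum((y[k] - mu) * (y[k + tau] - mu) for k in range(n - tau)) / n


def sample_acf(y, max_lag):
    """Sample autocorrelations gamma(tau) / gamma(0) for tau = 0..max_lag."""
    g0 = sample_autocovariance(y, 0)  # sample variance, the quantity in condition (b)
    return [sample_autocovariance(y, tau) / g0 for tau in range(max_lag + 1)]
```

For the alternating sequence `[1, -1, 1, -1, 1, -1]`, `sample_acf(..., 1)` gives `[1.0, -0.8333...]`, since five of the six lag-1 products equal $-1$.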

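Definition 3. Assume $X^{(0)}$ is an oscillation sequence, and $c_1, c_2, \dots, c_s$ and $r_s$ are the IMFs and the residual component of $X^{(0)}$ obtained by the EMD algorithm, respectively. Let $Y = (y(1), y(2), \dots, y(n))$ with $y(k) = \sum_{i=1}^{s} c_i(k)$. Then

$$y(k) = \varphi_1 y(k-1) + \cdots + \varphi_p y(k-p) + \varepsilon(k) + \theta_1 \varepsilon(k-1) + \cdots + \theta_q \varepsilon(k-q) \tag{7}$$

is called the ARMA($p$, $q$) model. The value $\hat{y}(k)$ computed from Equation (7) with the estimated parameters is called the simulated data when $k \le n$ and the predicted data when $k > n$.
In Equation (7), $Y$ is a stationary oscillation sequence and $\varepsilon(k)$ is a white-noise sequence. The orders $p$ and $q$ are identified from the tail (cut-off) behaviour of the partial autocorrelation function (PACF) and the autocorrelation function (ACF) of $Y$, respectively. The real parameters $\varphi_1, \dots, \varphi_p$ and $\theta_1, \dots, \theta_q$ are estimated with the identification function ARMAX.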
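Once $p$, $q$ and the parameters are available (in this paper, from ARMAX), the simulated values $\hat{y}(k)$ of Equation (7) can be computed recursively as one-step-ahead predictions. Predicted values beyond $n$ follow by setting future innovations to their zero mean. The sketch below assumes that pre-sample values and innovations are zero. It does not estimate the parameters itself, and the parameters in the usage check are made up. The function names are hypothetical.

```python
def _arma_step(ys, es, phi, theta):
    """Right-hand side of Eq. (7) for the next time point, excluding the new innovation."""
    k = len(ys)
    ar = sum(phi[i] * ys[k - 1 - i] for i in range(len(phi)) if k - 1 - i >= 0)
    ma = sum(theta[j] * es[k - 1 - j] for j in range(len(theta)) if k - 1 - j >= 0)
    return ar + ma


def arma_one_step(y, phi, theta):
    """Simulated values y_hat(k), k <= n, and the implied innovations eps(k)."""
    fitted, eps = [], []
    for k in range(len(y)):
        y_hat = _arma_step(y[:k], eps, phi, theta)  # uses only data before time k
        fitted.append(y_hat)
        eps.append(y[k] - y_hat)                    # innovation = observed - fitted
    return fitted, eps


def arma_forecast(y, eps, phi, theta, steps):
    """Predicted values y_hat(k), k > n, with future innovations set to zero."""
    ys, es, out = list(y), list(eps), []
    for _ in range(steps):
        v = _arma_step(ys, es, phi, theta)
        out.append(v)
        ys.append(v)    # the prediction stands in for the unknown observation
        es.append(0.0)  # future innovations take their zero mean
    return out
```

For example, with $\varphi_1 = 0.5$, $\theta_1 = 0.2$ and $Y = (1, 2, 3)$, the simulated values are $(0, 0.7, 1.26)$ and the next two predicted values are $1.848$ and $0.924$, up to floating-point rounding.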

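Definition 4. Assume the sequence $R^{(0)} = (r^{(0)}(1), r^{(0)}(2), \dots, r^{(0)}(n))$ is the residual component $r_s$ stated in Definition 3. $R^{(1)} = (r^{(1)}(1), r^{(1)}(2), \dots, r^{(1)}(n))$ is the first-order accumulating generation sequence of $R^{(0)}$. $Z^{(1)} = (z^{(1)}(2), z^{(1)}(3), \dots, z^{(1)}(n))$ is the mean sequence of consecutive neighbours of $R^{(1)}$, where

$$r^{(1)}(k) = \sum_{i=1}^{k} r^{(0)}(i), \quad k = 1, 2, \dots, n,$$

$$z^{(1)}(k) = \frac{1}{2}\left( r^{(1)}(k) + r^{(1)}(k-1) \right), \quad k = 2, 3, \dots, n.$$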

Definition 5. Assume $R^{(0)}$, $R^{(1)}$ and $Z^{(1)}$ are as stated in Definitions 3 and 4. Then the equation

$$r^{(0)}(k) + a z^{(1)}(k) = b \tag{10}$$

is called the GM(1,1) model, where $a$ and $b$ are real parameters.

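Theorem 6. Assume that $\hat{\beta} = (a, b)^{T}$ is the parameter vector of Equation (10), where

$$B = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix}, \quad Y_N = \begin{bmatrix} r^{(0)}(2) \\ r^{(0)}(3) \\ \vdots \\ r^{(0)}(n) \end{bmatrix}.$$

Then the parameters of the GM(1,1) model are identified as $\hat{\beta} = (B^{T}B)^{-1}B^{T}Y_N$.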

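Proof. The GM(1,1) model can be rewritten as

$$r^{(0)}(k) = -a z^{(1)}(k) + b, \quad k = 2, 3, \dots, n.$$

In matrix form, the GM(1,1) model is

$$\begin{bmatrix} r^{(0)}(2) \\ r^{(0)}(3) \\ \vdots \\ r^{(0)}(n) \end{bmatrix} = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix},$$

that is,

$$Y_N = B\hat{\beta}.$$

In these equations, $Y_N$ and $B$ are known and $\hat{\beta}$ is the vector of undetermined parameters. There are $n-1$ equations but only two unknowns. When the equations are incompatible, there is no exact solution, but a least squares solution can be obtained by the least squares method (LSM).
Let $e(k)$ be the error sequence

$$e(k) = r^{(0)}(k) + a z^{(1)}(k) - b, \quad k = 2, 3, \dots, n.$$

Let

$$s = \sum_{k=2}^{n} e(k)^2,$$

that is,

$$s = \sum_{k=2}^{n} \left( r^{(0)}(k) + a z^{(1)}(k) - b \right)^2.$$

According to LSM, $s$ is minimised with respect to $a$ and $b$ by setting

$$\frac{\partial s}{\partial a} = 2\sum_{k=2}^{n} \left( r^{(0)}(k) + a z^{(1)}(k) - b \right) z^{(1)}(k) = 0,$$

$$\frac{\partial s}{\partial b} = -2\sum_{k=2}^{n} \left( r^{(0)}(k) + a z^{(1)}(k) - b \right) = 0.$$

Solving these two equations gives the parameters $a$ and $b$:

$$a = \frac{\frac{1}{n-1}\sum_{k=2}^{n} r^{(0)}(k) \sum_{k=2}^{n} z^{(1)}(k) - \sum_{k=2}^{n} z^{(1)}(k)\, r^{(0)}(k)}{\sum_{k=2}^{n} \left( z^{(1)}(k) \right)^2 - \frac{1}{n-1}\left( \sum_{k=2}^{n} z^{(1)}(k) \right)^2}, \tag{20}$$

$$b = \frac{1}{n-1}\left( \sum_{k=2}^{n} r^{(0)}(k) + a \sum_{k=2}^{n} z^{(1)}(k) \right). \tag{21}$$

Equations (20) and (21) are the expanded form of the parameter identification $\hat{\beta} = (B^{T}B)^{-1}B^{T}Y_N$. This completes the proof.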
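Equations (20) and (21) translate directly into code. The sketch below builds the accumulating generation sequence and the consecutive-neighbour means of Definition 4. It then evaluates the closed-form least squares estimates. It assumes a plain Python list of at least four values, and the function name is hypothetical.

```python
from itertools import accumulate


def gm11_params(r0):
    """Least squares estimates (a, b) of GM(1,1), following Eqs. (20) and (21)."""
    n = len(r0)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 data points")
    r1 = list(accumulate(r0))                              # r1(k) = r0(1) + ... + r0(k)
    z1 = [0.5 * (r1[k] + r1[k - 1]) for k in range(1, n)]  # z1(k), k = 2..n
    y = r0[1:]                                             # r0(k), k = 2..n
    m = n - 1                                              # number of equations
    sz, sy = sum(z1), sum(y)
    szy = sum(z * v for z, v in zip(z1, y))
    szz = sum(z * z for z in z1)
    a = (sy * sz / m - szy) / (szz - sz * sz / m)          # Eq. (20)
    b = (sy + a * sz) / m                                  # Eq. (21)
    return a, b
```

For instance, `gm11_params([2, 4, 8, 16])` returns $a = -2/3$ and $b = 4/3$ up to rounding. For this doubling sequence, the GM(1,1) equation happens to hold exactly at $k = 2, 3, 4$.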

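Definition 7. Assume $R^{(0)}$, $R^{(1)}$, $a$ and $b$ are as stated in Definition 4 and Theorem 6. Then the equation

$$\frac{dr^{(1)}(t)}{dt} + a r^{(1)}(t) = b \tag{22}$$

is named the whitenization equation of the GM(1,1) model.
The solution of differential Equation (22), with the initial condition $\hat{r}^{(1)}(1) = r^{(0)}(1)$, is

$$\hat{r}^{(1)}(k+1) = \left( r^{(0)}(1) - \frac{b}{a} \right) e^{-ak} + \frac{b}{a}, \quad k = 0, 1, 2, \dots \tag{23}$$

The values of $R^{(0)}$ are restored by the inverse accumulating generation

$$\hat{r}^{(0)}(k+1) = \hat{r}^{(1)}(k+1) - \hat{r}^{(1)}(k) = \left( 1 - e^{a} \right)\left( r^{(0)}(1) - \frac{b}{a} \right) e^{-ak}, \quad k = 1, 2, \dots$$

Equation (23) is also called the time response function of the whitenization differential equation. When $k + 1 \le n$, $\hat{r}^{(0)}(k+1)$ is called the simulated data; when $k + 1 > n$, $\hat{r}^{(0)}(k+1)$ is called the predicted data.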
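The following is a sketch of Equation (23) together with the inverse accumulating generation that restores $\hat{r}^{(0)}$. It assumes $a \neq 0$ and uses the first observed value $r^{(0)}(1)$ as the initial condition. The first $n$ returned values are simulated data and any further values are predicted data. The function name is hypothetical.

```python
import math


def gm11_time_response(r0_first, a, b, total):
    """Return [r0_hat(1), ..., r0_hat(total)] from Eq. (23) and the inverse accumulation."""
    if a == 0:
        raise ValueError("Eq. (23) requires a != 0")
    c = r0_first - b / a
    # r1_hat(k + 1) for k = 0, 1, ..., total - 1
    r1_hat = [c * math.exp(-a * k) + b / a for k in range(total)]
    # r0_hat(1) = r1_hat(1) = r0(1); later values are first differences of r1_hat
    return [r1_hat[0]] + [r1_hat[k] - r1_hat[k - 1] for k in range(1, total)]
```

For example, `gm11_time_response(2.0, -2/3, 4/3, 6)` returns six values. For a four-point sequence, these are four simulated values followed by two predicted values. From the second value onward, consecutive values grow by the constant factor $e^{-a}$, as the restored formula implies.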

Definition 8. Assume $\hat{y}(k)$ and $\hat{r}^{(0)}(k)$ are as stated in Definitions 3 and 7. Then the equation

$$\hat{x}^{(0)}(k) = \hat{y}(k) + \hat{r}^{(0)}(k)$$

is called the EMD-ARMA-GM(1,1) model.

In the hybrid prediction model, the EMD algorithm decomposes the original time series into the IMF sequence $Y$ and the residual sequence $R^{(0)}$ to extract the intrinsic characteristics of the complex system. $Y$ is fed into the ARMA model to describe the random changes, and $R^{(0)}$ is substituted into the GM(1,1) model to describe the trend. Superposing $\hat{y}(k)$ and $\hat{r}^{(0)}(k)$ gives the simulated or predicted values of the original sequence. The flow of the EMD-ARMA-GM(1,1) model is shown in Figure 3.

4. Modelling Condition and Error Checking Method for the EMD-ARMA-GM(1,1) Model

4.1. Modelling Condition of the EMD-ARMA-GM(1,1) Model

Each prediction model has a specific modelling condition and applicable range. A model can be used for prediction only when its modelling condition is satisfied.

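Definition 9. Assume that $X^{(0)} = (x^{(0)}(1), x^{(0)}(2), \dots, x^{(0)}(n))$, where $x^{(0)}(k) \ge 0$ for $k = 1, 2, \dots, n$. Then

$$\rho(k) = \frac{x^{(0)}(k)}{\sum_{i=1}^{k-1} x^{(0)}(i)}, \quad k = 2, 3, \dots, n$$

is referred to as the smoothness ratio of $X^{(0)}$. The smoothness ratio reflects the smoothness of a sequence. Clearly, the smoother the change of the sequence, the smaller its smoothness ratio.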

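Definition 10. Suppose a sequence $X^{(0)} = (x^{(0)}(1), x^{(0)}(2), \dots, x^{(0)}(n))$, where $x^{(0)}(k) \ge 0$ for $k = 1, 2, \dots, n$, satisfies the following conditions. Then $X^{(0)}$ is referred to as a quasi-smooth sequence: (1) $\frac{\rho(k+1)}{\rho(k)} < 1$, $k = 2, 3, \dots, n-1$; (2) $\rho(k) \in [0, \varepsilon]$, $k = 3, 4, \dots, n$; (3) $\varepsilon < 0.5$. The quasi-smooth condition of the residual component serves as the criterion for deciding whether an oscillation sequence can be used to establish an EMD-ARMA-GM(1,1) model.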
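Definitions 9 and 10 can be checked mechanically. The sketch below computes $\rho(2), \dots, \rho(n)$ and tests the three conditions. It takes $\varepsilon$ as the largest of $\rho(3), \dots, \rho(n)$, which is the tightest bound allowed by condition (2). It assumes a sequence of at least three strictly positive values, and the function names are hypothetical.

```python
def smoothness_ratios(x):
    """rho(k) = x(k) / (x(1) + ... + x(k-1)) for k = 2..n (Definition 9)."""
    ratios, running = [], x[0]
    for v in x[1:]:
        ratios.append(v / running)
        running += v
    return ratios


def is_quasi_smooth(x):
    """Check conditions (1)-(3) of Definition 10 for a strictly positive sequence x."""
    rho = smoothness_ratios(x)          # rho[0] corresponds to rho(2)
    # (1) rho(k+1) / rho(k) < 1, written as rho(k+1) < rho(k) because rho(k) > 0
    cond1 = all(rho[i + 1] < rho[i] for i in range(len(rho) - 1))
    tail = rho[1:]                      # rho(3), ..., rho(n)
    eps = max(tail)                     # tightest epsilon satisfying condition (2)
    cond2 = all(0 <= r <= eps for r in tail)
    cond3 = eps < 0.5
    return cond1 and cond2 and cond3
```

For example, $(10, 9, 8, 7, 6)$ gives ratios of about $0.90, 0.42, 0.26, 0.18$ and is quasi-smooth. By contrast, $(1, 5, 1, 5)$ violates condition (1).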

4.2. Error Checking Method for the EMD-ARMA-GM(1,1) Model

A model's performance can be judged by testing, and only a model that passes the test can be meaningfully used to make predictions.

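Definition 11. Assume that a raw sequence is

$$X^{(0)} = (x^{(0)}(1), x^{(0)}(2), \dots, x^{(0)}(n)).$$

The EMD-ARMA-GM(1,1) model is employed to simulate $X^{(0)}$, and its corresponding simulation sequence is

$$\hat{X}^{(0)} = (\hat{x}^{(0)}(1), \hat{x}^{(0)}(2), \dots, \hat{x}^{(0)}(n)).$$

The residual sequence of $\hat{X}^{(0)}$ is

$$\varepsilon^{(0)} = (\varepsilon(1), \varepsilon(2), \dots, \varepsilon(n)),$$

where $\varepsilon(k) = x^{(0)}(k) - \hat{x}^{(0)}(k)$. The relative simulation percentage error (RSPE) of the simulation sequence is $\Delta = (\Delta_1, \Delta_2, \dots, \Delta_n)$, where

$$\Delta_k = \left| \frac{\varepsilon(k)}{x^{(0)}(k)} \right| \times 100\%, \quad k = 1, 2, \dots, n.$$

The mean relative simulation percentage error (MRSPE) of the simulation sequence is

$$\bar{\Delta} = \frac{1}{n} \sum_{k=1}^{n} \Delta_k.$$

Consider a given threshold $\alpha$, set according to the specific situation of the system. When $\bar{\Delta} < \alpha$ holds, the model is said to be error-satisfactory.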
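The error measures of Definition 11 are short to compute. The sketch below returns errors in percent and compares the MRSPE with a user-chosen threshold `alpha`. It assumes all raw values are non-zero, and the function names are hypothetical.

```python
def rspe(actual, simulated):
    """Relative simulation percentage errors Delta_k = |eps(k) / x(k)| * 100, in percent."""
    return [abs((x - xh) / x) * 100.0 for x, xh in zip(actual, simulated)]


def mrspe(actual, simulated):
    """Mean relative simulation percentage error, in percent."""
    errors = rspe(actual, simulated)
    return sum(errors) / len(errors)


def error_satisfactory(actual, simulated, alpha):
    """True when the MRSPE is below the threshold alpha (given in percent)."""
    return mrspe(actual, simulated) < alpha
```

For example, raw values $(100, 200)$ simulated as $(90, 210)$ give RSPEs of $10\%$ and $5\%$ and an MRSPE of $7.5\%$.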

5. Application

China is a large agricultural country, but its geographical location and climate make natural disasters frequent, and these damage large areas of crops every year. Large-scale crop disasters seriously affect national grain security, the fundamental status of agriculture and the sustainable development of the rural economy. A scientific prediction of the crops disaster area can provide a reasonable reference for arranging agricultural production subsidies and disaster relief subsidies. This is of positive significance for the sustainable development of agriculture and of China's economy.

Crop disasters in China have a long history. To prevent and mitigate disasters, the Chinese government has proposed and implemented many significant policies since 2010. These policies have effectively improved the crop disaster situation and profoundly influenced the crops disaster area in China. The data of the crops disaster area in China from 2010 to 2017 form a small-sample oscillation sequence.

The data of crops disaster area in China from 2010 to 2017 are shown in Table 1.

5.1. Data Decomposing

The EMD algorithm is applied to decompose the sequence of the crops disaster area in China, yielding one IMF (IMF1) and one residual component. The results are shown in Figures 4 and 5, respectively.

As can be seen from Figure 4, IMF1 oscillates around the horizontal axis, reflecting the random oscillation characteristic of the original sequence.

As Figure 5 shows, the residual component is a monotonically decreasing curve, reflecting the decreasing trend of the original sequence.

5.2. Checking the Quasi-Smooth Condition before Modelling

According to Definition 10, a sequence can be used to build the new model only when its residual component satisfies the quasi-smooth condition. Therefore, we check the quasi-smooth condition of the residual component before building the new model to predict the crops disaster area.

Using Definition 9, we obtain the smoothness ratios of the residual component, shown in Table 2. These values satisfy the three conditions of Definition 10. The residual component of the crops disaster area sequence is therefore quasi-smooth, and the sequence can be used to build the new model. The modelling process is detailed in the next subsection.

5.3. Modelling

First, IMF1 is fed into the ARMA model. The orders are increased gradually so that the model captures more of the dependence structure of the data. The search stops when the fit is best, which gives the values of $p$ and $q$. The identification function ARMAX is then used to estimate the parameters $\varphi_i$ and $\theta_j$. The optimal orders and parameters are shown in Table 3.
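The paper selects $p$ and $q$ by increasing the orders until the fit is best, and it estimates the parameters with ARMAX. The sketch below is a simplified, self-contained stand-in and not the authors' procedure. It handles pure AR models only. It applies the Durbin-Levinson recursion to the sample autocovariances to obtain Yule-Walker estimates for orders $0, 1, \dots, P$. It then picks the order with the smallest Akaike information criterion, $n \ln \hat{\sigma}_p^2 + 2p$. Using AIC as the "best fit" criterion is an assumption, and all function names are hypothetical.

```python
import math


def autocov(y, tau):
    """Biased sample autocovariance at lag tau."""
    n, mu = len(y), sum(y) / len(y)
    return sum((y[k] - mu) * (y[k + tau] - mu) for k in range(n - tau)) / n


def durbin_levinson(y, max_order):
    """Yule-Walker AR fits for orders 0..max_order as (coefficients, noise variance)."""
    g = [autocov(y, t) for t in range(max_order + 1)]
    phi, v = [], g[0]
    fits = [([], v)]
    for k in range(1, max_order + 1):
        if v <= 0:  # a perfect fit was already reached; higher orders are undefined
            break
        acc = g[k] - sum(phi[j] * g[k - 1 - j] for j in range(k - 1))
        kappa = acc / v  # partial autocorrelation at lag k
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        v *= 1.0 - kappa ** 2
        fits.append((phi, v))
    return fits


def select_ar_order(y, max_order):
    """AR order p with the smallest AIC = n * ln(sigma_p^2) + 2p."""
    n = len(y)
    best_p, best_aic = 0, float("inf")
    for p, (_, v) in enumerate(durbin_levinson(y, max_order)):
        if v <= 0:  # the logarithm is undefined for a zero variance
            continue
        aic = n * math.log(v) + 2 * p
        if aic < best_aic:
            best_p, best_aic = p, aic
    return best_p
```

Applied to IMF1, this would only suggest an AR order. The MA order $q$ and the final parameters would still require a full ARMA estimator such as ARMAX.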

As shown in Table 3, the appropriate values are $p = 2$ and $q = 1$. We therefore use the ARMA(2,1) model to simulate IMF1; the simulated curve is shown in Figure 6.

Next, the residual component is fed into the GM(1,1) model, and its parameters are estimated by the least squares method, as shown in Table 4.

Substituting the parameters in Table 4 into the time response function of the GM(1,1) whitenization equation gives the simulated values of the residual component. Their curve is shown in Figure 7.

Finally, summing the simulated values of IMF1 and the residual component gives the simulated values of China's crops disaster area. The simulated curve of the crops disaster area in China is shown in Figure 8.

5.4. Result and Analysis

To verify the performance of the EMD-ARMA-GM(1,1) model, we compare its MRSPE with those of traditional mainstream prediction models, namely the ARIMA model and the GM(1,1) model. The simulated values, RSPEs and MRSPEs of the three models are presented in Table 5.

As shown in Table 5, the proposed EMD-ARMA-GM(1,1) model has the lowest MRSPE of the three models, 4.0393%, whereas the MRSPEs of the other two models exceed 10%. The GM(1,1) model ranks second because it does not consider the random oscillation characteristic. The ARIMA model performs worst because it does not consider the trend characteristic. To illustrate the simulation effects of the three models for China's crops disaster area, we plot their simulated curves and errors in MATLAB using the data in Table 5, as shown in Figures 9–12.

Figures 9–12 confirm that the EMD-ARMA-GM(1,1) model performs best among the three models. Its performance is therefore better than that of the traditional mainstream prediction models.

5.5. Prediction of Crops Disaster Area in China

The EMD-ARMA-GM(1,1) model is used to predict the crops disaster area in China from 2018 to 2021, and the results are shown in Table 6.

Table 6 shows that the crops disaster area in China is expected to decrease overall over the next four years, but it remains very large. By 2021, it is predicted to reach 19,633,390 hectares. Such a large disaster area may cause grain shortages and inhibit the rural economy. Therefore, to maintain the sustainable development of agriculture and the national economy, the Chinese government needs to formulate production subsidy and disaster relief policies. It also needs to set aside sufficient funds to deal with crop failures caused by future natural disasters.

6. Conclusion

In this paper, the shortcomings of grey prediction models in modelling small-sample oscillation sequences are analysed. By examining the intrinsic characteristics of oscillation sequences, we identify why grey prediction models are ineffective for them: the underlying system is complex, and trend and random oscillation are often combined. Therefore, a hybrid grey prediction model is established based on information decomposition, with the aim of extracting the intrinsic characteristics of the sequence. The case study shows that the proposed model accounts for the complexity of system information and effectively describes the behaviour and patterns of the system. It also performs better than single classical prediction models.

The new hybrid grey prediction model provides a new idea and method for modelling small-sample oscillation sequences. However, when an oscillation sequence is long, big data methods such as neural networks and support vector machines can also be used to simulate and predict it. In that case, the performance of the new hybrid grey prediction model should be compared with that of the big data methods using their simulation and prediction errors. The better method should then be selected for studying the oscillation sequence.

In future work, we will further consider other characteristics of the sub-sequences generated by the EMD algorithm and establish suitable methods for studying oscillation sequences.

Data Availability

The China’s crop disaster area data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (71771033).