Abstract

The monthly precipitation data from 29 stations in Serbia during the period of 1946–2012 were considered. Precipitation trends were calculated using linear regression method. Three CLINO periods (1961–1990, 1971–2000, and 1981–2010) in three subregions were analysed. The CLINO 1981–2010 period had a significant increasing trend. Spatial pattern of the precipitation concentration index (PCI) was presented. For the purpose of PCI prediction, three Support Vector Machine (SVM) models, namely, SVM coupled with the discrete wavelet transform (SVM-Wavelet), the firefly algorithm (SVM-FFA), and using the radial basis function (SVM-RBF), were developed and used. The estimation and prediction results of these models were compared with each other using three statistical indicators, that is, root mean square error, coefficient of determination, and coefficient of efficiency. The experimental results showed that an improvement in predictive accuracy and capability of generalization can be achieved by the SVM-Wavelet approach. Moreover, the results indicated the proposed SVM-Wavelet model can adequately predict the PCI.

1. Introduction

Precipitation is one of the most important climatic variables because changes in its intensity and amount affect the occurrence of hydrological hazards such as floods and droughts [1]. Therefore, numerous studies on precipitation variability and on the development of statistical indices to evaluate changes in precipitation have been undertaken [2–7]. In this study, the precipitation concentration index (PCI) is analysed. The PCI quantifies the relative distribution of precipitation patterns. It also provides a good representation of the spatial variability of monthly precipitation [5, 8] and information on long-term total variability in the precipitation record [9, 10]. The PCI can be used as an indicator of hydrological hazard risks such as floods and droughts.

In this study, a prediction model of the PCI is introduced using a soft computing method, namely, the Support Vector Machine (SVM). The SVM, one of the novel soft computing learning algorithms, has found wide application in computing, hydrology, and environmental science [11–16]. Furthermore, it has been widely applied in pattern recognition, forecasting, classification, and regression analysis [17–20]. The most commonly used kernels are the linear, polynomial, and radial basis function (RBF) kernels; the choice among them depends on the nature of the observed data [21]. Shamshirband et al. [22] used an adaptive neurofuzzy inference system (ANFIS) and support vector regression (SVR) for precipitation estimation, while S. Chattopadhyay and G. Chattopadhyay [23], Nastos et al. [24], and Wu and Chau [25] applied artificial neural networks (ANNs). Chen et al. [26] implemented SVM and multivariate analysis to project daily precipitation. Meyer et al. [27] compared four machine learning algorithms for their applicability in rainfall retrievals.

Metaheuristic optimization algorithms such as ant colony optimization (ACO), genetic algorithm (GA), particle swarm optimization (PSO), and cuckoo search (CS) have been applied in different fields of science [28–37]. These algorithms are based on the mechanism of selection of the fittest in biological systems. A more recent biologically inspired metaheuristic optimization algorithm is the firefly algorithm (FFA) developed by Yang [38]. The FFA has been judged to be more efficient and robust in finding both local and global optima than other biologically inspired optimization algorithms [39–43]. The prediction accuracy of the SVM model relies heavily on proper determination of the model parameters [44–47]. Although organized strategies for selecting parameters are important, the model parameters also need to be tuned. In this study, the FFA is used to determine the SVM parameters, and the SVM is also coupled with the discrete wavelet transform.

The wavelet transform (WT) offers a number of basis functions, and the choice among them depends on the analysed signal. Wavelet analysis is used to decompose a time series into its various components, after which the decomposed components can be used as inputs for the SVM model. Over the past few years, this technique has attracted enormous interest in engineering applications [48–51]. Nalley et al. [52] used the discrete wavelet transform (DWT) to analyse trends in precipitation in Canada, while Hsu and Li [53] clustered spatial-temporal precipitation data using the WT. Partal and Kucuk [54] analysed long-term precipitation trends in Turkey using the DWT. Kisi and Cimen [55] applied a wavelet-Support Vector Machine conjunction model for daily precipitation forecasting and concluded that the proposed model increases forecast accuracy.

The objectives of the current study are as follows: to provide presentation of the spatial variability of monthly precipitation and information on long-term total variability in the precipitation data using precipitation concentration index and to construct, develop, and evaluate the results of SVM-Wavelet, SVM-FFA, and SVM-RBF for PCI prediction.

2. Materials and Methods

2.1. Study Area and Used Data

Monthly precipitation data were chosen from 29 meteorological stations in Serbia (Figure 1(a)) over the period of 1946–2012. Data were obtained from the Republic Hydro Meteorological Service of Serbia (http://www.hidmet.gov.rs/). There are no missing values in the data set.

According to Gocic and Trajkovic [56], precipitation increases with altitude; that is, dry areas in the northeastern part of Serbia receive less than 600 mm of precipitation, the area along the valley of the South Morava to Vranje receives up to 650 mm, and precipitation in the mountains may reach 1000 mm per year. The mean annual precipitation for the observed period for the whole country is 662.4 mm. The spatial distribution of the mean annual precipitation in Serbia for the analysed period is illustrated in Figure 1(b).

2.2. Methodology for Precipitation Analysis

The spatial distribution of the number of wet and dry years can be obtained using a transformed annual precipitation departure for each station:

$$z = \frac{P - \bar{P}}{\sigma}, \tag{1}$$

where $P$ is the annual precipitation, $\bar{P}$ is the annual mean precipitation, and $\sigma$ is the standard deviation of the annual precipitation. A year is classified as dry when $z$ falls below a lower threshold and as wet when $z$ exceeds an upper threshold [57].
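The classification of years by standardized departure can be sketched as follows. This is a minimal illustration, not the procedure of [57]: the function name `classify_years`, the symmetric `threshold` argument, and the use of the sample standard deviation are all assumptions, and the threshold value must be supplied by the user.

```python
from statistics import mean, stdev


def classify_years(annual_totals, threshold):
    """Label each year as 'dry', 'wet', or 'normal' from its standardized departure.

    `threshold` is a user-supplied, symmetric cut-off (an assumption of this
    sketch); the sample standard deviation is used here, also an assumption.
    """
    mu = mean(annual_totals)
    sigma = stdev(annual_totals)
    labels = []
    for total in annual_totals:
        z = (total - mu) / sigma  # standardized annual departure
        if z < -threshold:
            labels.append("dry")
        elif z > threshold:
            labels.append("wet")
        else:
            labels.append("normal")
    return labels


# Hypothetical annual totals in mm, for illustration only.
labels = classify_years([500.0, 600.0, 700.0, 800.0, 900.0], threshold=1.0)
```

Counting the `'dry'` and `'wet'` labels per station gives the kind of tally that can then be mapped spatially.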

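Precipitation concentration index (PCI) [58] is calculated as follows:

$$\mathrm{PCI} = 100 \cdot \frac{\sum_{i=1}^{12} p_i^{2}}{\left(\sum_{i=1}^{12} p_i\right)^{2}}, \tag{2}$$

where $p_i$ is the precipitation amount in month $i$. Classification of PCI values is shown in Table 1.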
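As a small worked illustration of the annual PCI, the sketch below computes 100 times the sum of squared monthly totals divided by the squared annual total. The function name `pci` and the sample values are hypothetical. A perfectly uniform year gives the lower bound 100/12 ≈ 8.33, and a year with all precipitation in a single month gives 100.

```python
def pci(monthly):
    """Annual precipitation concentration index from 12 monthly totals (mm)."""
    if len(monthly) != 12:
        raise ValueError("expected 12 monthly precipitation totals")
    total = sum(monthly)
    if total <= 0:
        raise ValueError("annual precipitation must be positive")
    return 100.0 * sum(p * p for p in monthly) / (total * total)


uniform_year = pci([50.0] * 12)                 # lower bound, 100/12
single_month_year = pci([0.0] * 11 + [120.0])   # upper bound, 100
```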

2.3. Soft Computing Methodologies
2.3.1. Support Vector Machine

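Support Vector Machine (SVM) [59, 60] is based on machine learning theory to maximize predictive accuracy; that is,

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{l}\left(\xi_i + \xi_i^{*}\right) \quad \text{subject to} \quad \begin{cases} d_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \\ w^{T}\varphi(x_i) + b - d_i \le \varepsilon + \xi_i^{*}, \\ \xi_i,\ \xi_i^{*} \ge 0, \quad i = 1, \dots, l, \end{cases} \tag{3}$$

where $w$ is a normal vector, $\frac{1}{2}\|w\|^{2}$ is the regularization term, $C$ is the error penalty factor, $b$ is a bias, $\varepsilon$ is the parameter of the loss function, $x_i$ is the input vector, $d_i$ is the target value, $l$ is the number of elements in the training data set, $\varphi(x_i)$ is a feature space, and $\xi_i$ and $\xi_i^{*}$ are the upper and lower excess deviations.

The architecture of SVM is shown in Figure 2. The kernel function, that is, the radial basis function (RBF), is denoted as

$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^{2}\right), \tag{4}$$

where $x_i$ and $x_j$ are vectors in the input space and $\gamma$ is the regularization parameter of the kernel. The Lagrange multipliers are denoted $\alpha_i$ and $\alpha_i^{*}$.

The accuracy of prediction depends on the selection of three parameters, that is, $C$, $\gamma$, and $\varepsilon$, whose values are determined using the firefly algorithm.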
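To make the role of the tunable parameters concrete, the sketch below evaluates a Gaussian RBF kernel of the assumed form exp(-gamma * ||x - z||^2) and the epsilon-insensitive loss that is standard in support vector regression. The function names and the numbers are illustrative assumptions, not the authors' implementation.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def eps_insensitive_loss(target, prediction, epsilon):
    """Zero inside the epsilon tube around the target, linear outside it."""
    return max(0.0, abs(target - prediction) - epsilon)


# A larger gamma makes the kernel more local: similarity decays faster with distance.
near = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)  # exp(-1)
far = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=5.0)   # exp(-10)
```

Errors smaller than `epsilon` are not penalized at all, while the penalty factor weights the errors that fall outside the tube, which is why the three parameters interact and are usually tuned together.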

2.3.2. Firefly Algorithm

The firefly algorithm (FFA) [38, 61, 62] is based on the behaviour of fireflies. The major issues in FFA development are the formulation of the objective function and the variation of the light intensity.

A firefly is an insect that uses bioluminescence to attract mates or prey. The light produced by a firefly enables other fireflies to follow its path in search of prey. This concept of light production underlies algorithms that solve optimization problems.

For example, in an optimal design problem involving maximization of an objective function, the fitness function is proportional to the brightness, that is, the amount of light emitted by the firefly. A decrease in light intensity with the distance between fireflies therefore leads to variations of intensity and lessens the attractiveness among them. The light intensity with varying distance can be represented as

$$I(r) = I_0 \exp\left(-\gamma r^{2}\right), \tag{5}$$

where $I(r)$ is the light intensity at distance $r$ from a firefly, $I_0$ represents the initial light intensity, that is, when $r = 0$, and $\gamma$ is the light absorption coefficient. As a firefly's attractiveness is proportional to the light intensity observed by adjacent fireflies, the attractiveness $\beta$ at a distance $r$ from the firefly can be represented as

$$\beta(r) = \beta_0 \exp\left(-\gamma r^{2}\right), \tag{6}$$

where $\beta_0$ represents the attractiveness at distance $r = 0$.

The Cartesian distance between any two fireflies $i$ and $j$ is given by

$$r_{ij} = \|x_i - x_j\| = \sqrt{\sum_{k=1}^{d}\left(x_{i,k} - x_{j,k}\right)^{2}}, \tag{7}$$

where $x_{i,k}$ is the $k$th component of the position $x_i$ of firefly $i$ and $d$ is the number of dimensions. The movement of firefly $i$ as it is attracted to another brighter firefly $j$ can be represented as

$$\Delta x_i = \beta_0 \exp\left(-\gamma r_{ij}^{2}\right)\left(x_j - x_i\right) + \alpha\,\varepsilon_i, \tag{8}$$

where the first term in the equation is due to the attraction, the second term represents the randomization with $\alpha$ as the randomization coefficient, and $\varepsilon_i$ is the random number vector derived from a Gaussian distribution. The next movement of firefly $i$ is updated as

$$x_i^{t+1} = x_i^{t} + \Delta x_i. \tag{9}$$

Steps in FFA development are presented in Figure 3.
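The attraction and random-walk steps described above can be sketched as a compact search loop. This is a minimal illustration under stated assumptions, not the configuration used in the study: it minimizes rather than maximizes (a lower cost is treated as a brighter firefly), positions are clipped to the search bounds, and the function name `firefly_minimize` and all default parameter values are hypothetical.

```python
import math
import random


def firefly_minimize(objective, bounds, n_fireflies=15, n_iter=50,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimal firefly search; returns (best_position, best_cost)."""
    rng = random.Random(seed)
    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds]
             for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness at distance r
                    moved = []
                    for k, (lo, hi) in enumerate(bounds):
                        step = (beta * (swarm[j][k] - swarm[i][k])
                                + alpha * rng.gauss(0.0, 1.0))  # attraction + randomization
                        moved.append(min(hi, max(lo, swarm[i][k] + step)))  # clip (assumption)
                    swarm[i] = moved
                    cost[i] = objective(moved)
    best = min(range(n_fireflies), key=cost.__getitem__)
    return swarm[best], cost[best]


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_cost = firefly_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

For SVM tuning, each position would hold a candidate parameter vector and `objective` would return a validation error for the SVM trained with those parameters; that wiring is left out to keep the sketch self-contained.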

2.3.3. Discrete Wavelet Transform

The wavelet transform (WT) is a mathematical tool for decomposing a time series signal into different frequency components. In this study, wavelet analysis was used to decompose the time series of precipitation data into various components, after which the decomposed components were used as inputs for the SVM model. The flow chart of the discrete wavelet algorithm that is used to determine the SVM parameters is shown in Figure 4.

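The continuous wavelet transform (CWT) of a signal $f(t)$ is a time-scale technique of signal processing that can be defined as the integral of the signal over the entire period multiplied by scaled, shifted versions of the wavelet function $\psi$, given mathematically as

$$W(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} f(t)\, \psi\!\left(\frac{t - b}{a}\right) dt, \tag{10}$$

where $\psi$ is the mother wavelet function, $a$ is the scale index parameter, that is, the inverse of the frequency, and $b$ is the time shifting parameter, also known as translation. The discrete wavelet transform (DWT) can be derived by discretizing (10), where the parameters $a$ and $b$ are given as follows:

$$a = 2^{m}, \qquad b = n\,2^{m}, \tag{11}$$

where the variables $m$ and $n$ are integers. Replacing $a$ and $b$ in (10) gives

$$W(m, n) = 2^{-m/2} \int_{-\infty}^{\infty} f(t)\, \psi\!\left(2^{-m} t - n\right) dt. \tag{12}$$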
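A single level of a discrete wavelet decomposition can be illustrated with the Haar wavelet, the simplest dyadic case. The source does not state which mother wavelet was used, so Haar is an assumption made only for this sketch, as are the function names and the sample series.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the original samples from both components."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / s, (a - d) / s])
    return signal


# Hypothetical short series, for illustration only.
series = [4.0, 6.0, 10.0, 12.0]
approx, detail = haar_dwt(series)
restored = haar_idwt(approx, detail)
```

The approximation carries the slowly varying part of the series and the detail carries the fast fluctuations; applying `haar_dwt` again to the approximation gives the next, coarser level. Components of this kind are what a wavelet-coupled model would take as inputs.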

2.4. Evaluating Accuracy of Proposed Models

In this study, the following statistical indicators were applied to compare the developed SVM models:

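root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^{2}}, \tag{13}$$

coefficient of determination ($R^{2}$):

$$R^{2} = \frac{\left[\sum_{i=1}^{n}\left(O_i - \bar{O}\right)\left(P_i - \bar{P}\right)\right]^{2}}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^{2}\,\sum_{i=1}^{n}\left(P_i - \bar{P}\right)^{2}}, \tag{14}$$

coefficient of efficiency (EI):

$$\mathrm{EI} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)^{2}}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^{2}}, \tag{15}$$

where $O_i$ and $P_i$ are the experimental and predicted values of the PCI, respectively, $\bar{O}$ and $\bar{P}$ are their mean values, and $n$ is the size of the test data.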
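The three indicators can be computed directly from paired observed and predicted values. The sketch below assumes the usual definitions: RMSE as the root of the mean squared error, the coefficient of determination as the squared Pearson correlation, and the coefficient of efficiency in the Nash-Sutcliffe form. The function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination as the squared Pearson correlation."""
    o_bar = sum(observed) / len(observed)
    p_bar = sum(predicted) / len(predicted)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(observed, predicted))
    var_o = sum((o - o_bar) ** 2 for o in observed)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def efficiency(observed, predicted):
    """Coefficient of efficiency: 1 minus error sum of squares over observed variance."""
    o_bar = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - sse / sum((o - o_bar) ** 2 for o in observed)


# Hypothetical PCI values: a constant bias of 0.5 in the predictions.
obs = [10.0, 11.0, 12.0, 13.0]
pred = [10.5, 11.5, 12.5, 13.5]
scores = (rmse(obs, pred), r_squared(obs, pred), efficiency(obs, pred))
```

In this example the correlation-based indicator stays at 1 despite the bias, while RMSE and the coefficient of efficiency both register it, which is one reason to report the three indicators together.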

3. Results and Discussion

3.1. Analysis of Precipitation Distribution

The number of dry and wet years is tabulated in Table 2. Dry years are most frequent in the north of Serbia, while the number of wet years is greater than the number of dry years in the west of the country. For Serbia as a whole, the number of dry years is 20, while the number of wet years is 19.

According to Gocic and Trajkovic [56], three precipitation subregions were detected. The first subregion (12 stations) is located in the northern part of the country, with precipitation ranging from 223 to 1051 mm and an average of 608.2 mm. The second subregion (7 stations) is the wettest one and includes stations in the west of the country, with precipitation between 385 and 1282 mm and a mean of 784.5 mm. The third subregion (10 stations) lies in the eastern and southern parts of Serbia, with precipitation between 302 and 1113 mm and a mean of 623.3 mm.

The annual precipitation shows an increasing trend in Serbia during the period of 1946–2012 (stronger in two of the three subregions). Three CLINO periods (1961–1990, 1971–2000, and 1981–2010) are illustrated in Figure 5. The CLINO period 1981–2010 shows a significant increasing trend in all subregions. The highest monthly precipitation occurs in June, with a value of 80.8 mm in Serbia (41.15% of total summer precipitation), which is directly connected with the intensive convection of colder and humid, usually maritime, air masses.

Precipitation distribution is determined using the PCI. Figure 6 illustrates the spatial distribution of PCI in Serbia. The minimum PCI values were detected in Zlatibor (10.43) and Pozega (10.83), while the maximum was in Negotin (12.49). The majority of the stations had the values between 11.12 in Sjenica and 11.94 in Banatski Karlovac.

3.2. Performance Evaluation of Proposed SVM Models

Precipitation data were used to obtain six parameters, namely, annual total precipitation, mean winter precipitation amount, mean spring precipitation amount, mean summer precipitation amount, mean autumn precipitation amount, and mean precipitation for the vegetation period (April–September). For the experiments, 38 years (57% of the data) were used as training samples and the subsequent 29 years (43% of the data) served as test samples. Table 3 summarizes the six variables using the following statistical indicators: the minimum, maximum, median, mean, standard deviation, and skewness.

In this study, three SVM models, that is, SVM-Wavelet, SVM-RBF, and SVM-FFA, were analysed to predict the PCI. The RBF was implemented as the kernel function, which requires three parameters, $C$, $\gamma$, and $\varepsilon$, whose selection directly influences prediction accuracy. Table 4 provides the optimal values of the parameters for the proposed SVM models. The firefly algorithm finds the optimal SVM parameters through its search procedure. For the SVM-Wavelet and SVM-RBF approaches, the parameters were selected manually after several trial-and-error iterations.

To evaluate SVM model performance, the calculated PCI values were plotted against the predicted ones. Figure 7(a) presents the accuracy of the developed SVM-Wavelet PCI predictive model, while Figures 7(b) and 7(c) present the accuracy of the developed SVM-RBF and SVM-FFA PCI predictive models, respectively. Most of the points fall along the diagonal line for the SVM-Wavelet prediction model, which means that the prediction results are in very good agreement with the measured values for this model. This is confirmed by the high value of the coefficient of determination $R^{2}$.

Figure 8 illustrates the spatial distribution of PCI in Serbia using three SVM methods, that is, SVM-Wavelet, SVM-FFA, and SVM-RBF. According to the obtained results, it can be concluded that the spatial distribution using values of SVM-Wavelet method is similar to the spatial distribution in Figure 6.

3.3. Performance Comparison of SVM Models

To illustrate the performance characteristics of the developed SVM models for PCI prediction, the prediction accuracies of the three SVM models were compared with each other. The statistical indicators RMSE, $R^{2}$, and EI were used for comparison. Table 5 summarizes the prediction results for the test data sets, since training error is not a credible indicator of the prediction potential of a particular model. The results in Table 5 were obtained with the same number of runs for each method, and the values reported are averages over those runs. The same number of iterations was used in order to make the comparison fair and accurate. SVM-Wavelet produced better results than the other two approaches because the wavelet algorithm decomposes a nonlinear series into multiple linearized series, which are easier to regress.

The SVM-Wavelet model outperforms the SVM-RBF and the SVM-FFA models according to the obtained results. The predictions from the SVM models correlate highly with the actual PCI data.

4. Conclusion

The study followed a systematic approach to create SVM models for PCI prediction, namely, SVM-Wavelet, SVM-RBF, and SVM-FFA. The proposed SVM-Wavelet model was obtained by combining two methods, that is, the SVM and the wavelet transform. The RBF was selected as the kernel function for the SVM, while the FFA was used to obtain the SVM parameters.

Each of these SVM approaches has some advantages and disadvantages. SVM-FFA uses the firefly search algorithm to find optimal SVM parameters. The wavelet approach divides a series into subseries in order to make each more linear, and at the end all subseries are merged. The SVM-RBF approach is the basic approach with manual estimation of SVM parameters. Therefore, the SVM-RBF results are not as good as those of the other two approaches, as was presented.

A comparison of the SVM-Wavelet, the SVM-RBF, and the SVM-FFA was performed in order to assess the prediction accuracy. Accuracy results, measured in terms of RMSE, $R^{2}$, and EI, indicate that SVM-Wavelet predictions are superior to those of the SVM-RBF and the SVM-FFA.

The main advantages of the SVM schemes are that they are computationally efficient and adapt well to optimization and adaptive techniques. The developed strategy is not only simple but also reliable, and it may be easy to implement in real-time applications using interfacing cards for the control of various parameters. It can be combined with expert systems and rough sets for other applications.

Further research will test the proposed soft computing methods in different parts of the world and in different climate types to confirm the results. In addition, some hybrid soft computing models will be applied and compared with the models developed in this study.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The study is supported by the Ministry of Education, Science and Technological Development, Republic of Serbia (Grant no. TR37003) and the ICT COST Action IC1408 Computationally Intensive Methods for the Robust Analysis of Non-Standard Data (CRoNoS). This research is supported by University of Malaya under UMRG grant (Project no. RP036A-15AET: Efficient Detection and Reporting of Flood Wastes using Wireless Sensor Network; Project no. RP036B-15AET: Measurement of the Flood Waste Volume based on the Digital Image; Project no. RP036C-15AET: Intelligent Estimation and Prediction of Flood Wastes).