Trivariate Stochastic Weather Model for Predicting Maize Yield

Chidzalo, Patrick; Ngare, Phillip O.; Mung’atu, Joseph K.

doi:https://doi.org/10.1155/2022/3633658

Journal of Applied Mathematics

On this page

Abstract Introduction Conclusion Data Availability Conflicts of Interest Acknowledgments Supplementary Materials References Copyright Related Articles

Research Article | Open Access

Volume 2022 | Article ID 3633658 | https://doi.org/10.1155/2022/3633658

Trivariate Stochastic Weather Model for Predicting Maize Yield

Patrick Chidzalo,¹Phillip O. Ngare,²and Joseph K. Mung’atu¹

Academic Editor: Wei-Chiang Hong

Received12 Sept 2022

Revised21 Oct 2022

Accepted22 Oct 2022

Published15 Nov 2022

Abstract

Maize yield prediction in the sub-Saharan region is imperative for mitigation of risks emanating from crop loss due to changes in climate. Temperature, rainfall amount, and reference evapotranspiration are major climatic factors affecting maize yield. They are not only interdependent but also have significantly changed due to climate change, which causes nonlinearity and nonstationarity in weather data. Hence, there exists a need for a stochastic process for predicting maize yield with higher precision. To solve the problem, this paper constructs a joint stochastic process that acquires joints effects of the three weather processes from joint a probability density function (pdf) constructed using copulas that maintain interdependence. Stochastic analyses are applied on the pdf and process to account for nonlinearity and nonstationarity, and also establish a corresponding stochastic differential equation (SDE) for maize yield. The trivariate stochastic process predicts maize yield with and under a deep learning framework. Its aggregated values predict maize yield with up to 0.9765 and under common machine learning algorithms. Comparatively, the is 0.8829% and , under the maize yield SDE. Thus, the joint stochastic process can be used to predict maize yield with higher precision.

1. Introduction

Seasonal realizations of variation in climatic factors are responsible for loss in maize yield recorded in the sub-Saharan Africa. It has been shown that patterns in major weather elements have considerably changed in the region. In addition to factors emanating from adaptation to population rise, such as increase in cultivation area and the use of improved technology, climatic factors like temperature, rainfall, and reference evapotranspiration are the most significant determinants for maize yield in the region [1]. These climatic factors exhibit both comonotonic and counter-monotonic dependence nature [2], which makes derivation of maize yield forecast from them a difficult undertaking. Maize yield prediction is imperative in the process of alleviating the risk of loss in maize grain harvest due to weather changes through correct pricing of a crop harvest insurance or weather derivative hedge instruments.

The oldest approach of predicting maize yield is that of using a multivariate linear regression with climate, production potential, and management factors as explanatory variables. One challenge of this approach is the difficulty of collecting the data for climate and management factors simultaneously [3]. Secondly, management data is usually unavailable. Lastly, interdependence of climatic factors and linearity assumptions in the model imply that the model cannot accurately predict maize yield because the relationship between climate factors and yield is nonlinear [4]. To take care of error independence violation for incorporation of dependent weather variables in a linear regression, a time series model can be fitted which does not violate first order Markov-Chain assumption [5]. Most time series model might fail to account for change in climate because of stationarity assumption [6], which might lower crop prediction precision [7].

Another classical approach in literature [8] is that of fitting multivariate regressions through random forests, support vector machines, and artificial neural network in which both climate and nonclimate data are used to predict maize yield. These machine learning models can be trained to recognize nonlinearity in a data set [9]. This approach is suitable for experimental data set [10] in which a number of explanatory variables can be collected almost simultaneously. Precise crop yield forecast using one weather variable only at a time has been reported in many research works [11]. This approach has been more successful at predicting crop yield especially through quantile random forest and decision trees regression (DTR). The common problem of these approaches is that they do not account for simultaneous or joint effect and interdependence of weather random variables.

A viable way of constructing a process with joint effect of the weather variables is through construction of a joint probability distribution of rainfall, temperature, and reference transpiration, which, by definition, is a probability that the three weather conditions attain at most some values at a point simultaneously. To derive a multivariate distribution function, authors use a variety of methods such as mixtures, convolutions, variable transformations, and copulas [12]. In the case of obtaining probability distribution of a sum of random variables, the use of convolutions can be important [13]. In a mixture the domains of marginal probability distributions should be identical. Variable transformations usually assume independence or identical probability distributions [14].

A copula is different in a joint distribution that is formed from cumulative distribution functions rather than the probability distribution functions themselves. This eliminates the need for ensuring that domains are identical. This is so because it can be proved that a cumulative distribution function is a uniform random variable in the domain [15] . In addition, a copula obeys invariance property under comonotonic transformation like the cumulative distributions themselves. Therefore, a copula itself is a measure of dependence among random variables for which a joint distribution is to be formulated. Perfect comonotonic and counter-monotonic properties are satisfied by Fletcher-Hoeffding bounds [16].

It is very common, in literature, that authors use copulas to measure dependence; derive joint probability distributions; and calculate covolatility. The applications range from image processing, flood, to weather modeling in physics, statistics, and finance. The pioneering use of copulas in order to achieve similar aims of this paper can be seen in the works of Leobacher and Ngare [17] where they are using them to combine a Markov-Chain probability density with that of a gamma random variable for rainfall modeling. A composite modeling could as well achieve the same as seen in Dzupire et al. [18], because the Markov chain variable for rainfall frequency could simply be modeled as a Poisson or Negative binomial process. The trend was maintained till the work of Dzupire et al. [19] extended the modeling to two different weather processes of rainfall and temperature. The research was replicated by Bressan and Romagnoli [20] who never extended to three weather variables.

Common methods of constructing multivariate copulas include direct substitution method, conditioning, and nesting. The direct method involves that of simply substituting the cumulative distribution functions into a copula [21]. The only advantage of this method is that it is simple. It requires one to know exact formulas of copula. Archimedean copulas are well suited to this method. In addition this method requires variables to exhibit positive dependence. That is, it cannot be well suited to weather modeling that exhibit negative dependencies in some cases.

Once copulas are constructed, authors like Pietsch et al. [22], use uniform distribution sampling techniques that investigate whether pairwise dependencies are preserved in the distribution. Whenever these algorithms are applied to copulas obtained through the use of direct substitution method, pairwise dependencies preservation cannot be satisfied. Nesting also faces the same problem. It can pass on some but not all pairwise dependencies.

The most successful way of constructing copulas in a way that preserves pairwise dependencies is through use of conditioning. In this method, the copula is expressed as a product of two or three probabilities in lower variables. Acar et al. [23] used this type of method to construct trivariate copulas of vine type. Sometimes, components of the product cannot be easily constructed or their closed forms are not known. In Salvadori and De Michele [24] methods of approximating product components are presented. This clearly means that this method may not give unique copulas because the final copula is dependent on the method used to approximate a conditional copula in the product.

To take care of problems such as these, Plackett [25] derived a class of copulas called Plackett Family that are constructed by making sure that pairwise dependence is satisfied through a measure called product ratio. A trivariate copula is then constructed using another ratio by solving a fourth order polynomial. This method has been used to model drought variables by Song and Singh [26]. Further applications in engineering as well as advantages of this copula including easiness of getting measures pairwise dependence are fully described in Zhang and Singh [27].

In summary, there are three problems exhibited in literature. The first problem is that of existence of a joint stochastic process of the three major processes of temperature, rainfall amount, and reference evapotranspiration that affect maize yield, given that there are interdependent and have significantly changed over time. Weather data shows not only nonlinearity but also nonstationarity due to climate change. The second problem is of existence of stochastic model of maize yield that takes in the joint process as argument. In other words, how can the joint stochastic process be incorporated in a multivariate stochastic differential to predict maize yield? The last problem which is dependent on the first two problems is this: given such model is derived, how does it compare with common and most recent methods of grain yield prediction in sub-Saharan Africa?

The aim of this paper is to model and predict maize yield using a stochastic weather process of rainfall amount, temperature, and reference evapotranspiration. Specifically, we construct the stochastic process in a manner that takes care of joint or simultaneous effect of the three weather processes as well as the impact climate change which causes nonlinearity and nonstationarity in the weather data. By deriving these qualities from trivariate joint probability distribution and the Fokker-Planck equation, the joint process increases accuracy in weather elements and maize yield prediction.

In summary, this paper makes the following contributions: (i)We construct a joint stochastic process from the weather processes of temperature, rainfall amount, and reference evapotranspiration. Through copulas and sampling from solutions of Fokker-Planck equation in , the trivariate joint process captures interdependence, simultaneous weather events, and nonlinearity and nonstationarity information in the data that are caused by climate change. This makes it give precise prediction of weather variables even before using it in forecasting maize yield.(ii)We model maize yield in terms of the joint stochastic process. This model captures volatility variation, usual linear trend in maize yield, and also significant nonlinear and nonstationary trend caused by simultaneous effects of the weather variables.(iii)We predict maize yield using the stochastic maize yield, and the joint stochastic weather models constructed. The two stochastic processes give accurate and precise maize yield predictions under Monte-Carlo simulations, and machine learning methods. The results show that the joint stochastic process increases the performance of models in predicting maize yield.

The rest of this paper is organized as follows: stochastic models for each random variable are analyzed or derived in Section 2; joint probability densities of rainfall, temperature and reference evapotranspiration are derived and analyzed in Section 3; a joint stochastic process that follows the joint probability density is constructed and validated in Section 4, which is used to predict maize yield in Section 5. Conclusions are presented in the last section.

2. Stochastic Models

2.1. Rainfall Amount

In order to model rainfall process for derivative pricing or crop yield prediction, authors have used models that are based on Poisson, Gamma, and mixed Exponential distributions. Rainfall is considered to be a stochastic process consisting of two random variables: one representing frequency, which is a two state Markov Chain, and the other for rainfall amount. These variables can be modeled separately as in Cabrales et al. [5]. They can also be combined as a composite variable as in Dzupire et al. [18]. They are also studied jointly through copulas as in Leobacher and Ngare [17]. This paper considers modeling rainfall amount only. With respect to the data collected for this study, from Chitedze in Malawi, rainfall amount follows Gamma probability distribution (Figure 1(a)).

(a) Histogram for rainfall amount

(b) Rainfall amount vs quantiles

Let random variable represent daily rainfall amounts, then the probability density of the is given by the form in

It can also be assumed that ’s are autocorrelated, that is . If the probability density [23] satisfies Fokker-Planck partial differential equation, then the corresponding stochastic differential is given by where is a standard Brownian motion. Ross [28] defines the Wiener Process as the stochastic process such that , , , , and is the independent process. The model in Equation (2) is an example of Ito’s diffusion equality of the form where is called drift coefficient and the diffusion coefficient. Thus, Model [29] can be derived by using the method in Bykhovsky [30] as follows:

Putting ([7, 8] in Equation (3) gives [29].

2.2. Reference Evapotranspiration

The following formula is used to calculate daily reference evapotranspiration () data for Malawian conditions [2]

is the reference evapotranspiration measured in , is solar radiation in mega joules per square meter, is soil heat flux density in the same units of solar radiation, is the average of maximum and minimum temperatures in degree Celsius, is the wind speed in meters per second, is the actual water vapor measured in kilo pascals, is the pressure gradient in kilo pascals per unit of temperature, and is the constant of phychometry.

There has been no consensus in the suitable type of probability distribution for evapotranspiration. Khanmohammadi [31] fitted annual reference evapotranspiration data to Lognormal, generalized logistic, and Pearson III distributions and realized that Pearson III was the most appropriate. Uliana et al. [32] fitted evapotranspiration to Gamma distribution, which is a type of Pearson III distribution. Msowoya et al. [33] fitted the evapotranspiration data to the generalized logistic distribution. This suggests that for daily data and sub-Saharan conditions, the best candidate distributions are generalized logistic and the Gamma distribution. The data used for this study shows that reference evapotranspiration also follows gamma distribution (Figure 2).

Therefore, reference evapotranspiration can also be modeled by the stochastic differential equation of the form in

2.3. Daily Temperature

Authors usually model temperature () using the mean reverting Ornstein–Uhlenbeck process [34] of the form where is cyclic seasonal mean, is daily average temperature, is rate of mean reversion, and is seasonal volatility of . The advantages of using this model is that the model takes into account of the fact that, given a source of randomness in temperature, tends to move back to its seasonal trend say and the speed of doing that is captured by .

Dzupire et al. [19] assume that is Normal Inverse Gaussian Levy process. This would mean that is an independent process [35]. To use this model in maize prediction, one can derive a new partial differential equation of the Ornstein-Unlenbeck process, and then, calculate derivative predictions using Alaton or finite difference method. Such forecasts could not be computationally realistic as temperature is also dependent on rainfall as well as reference evapotranspiration. Alternatively, one could assume that is fractional Brownian motion. Under this assumption, the model would take the following form [36]

This would explain how the seasonal mean temperature would be updated by mean reverting process. The disadvantage of such assumption would be that predictions estimated from a stochastic equations realized from such assumption would not be computationally more realistic than that one assuming Levy Process. The stochastic differential equation in Model [30] can be expressed in the form

Therefore, if , then is approximately normally distributed with mean and variance [37] whenever and . Similarly, if , then is approximately normally distributed with mean and variance .

To prove this, consider

and that

Then by the method in Bykhovsky [30], we have

Hence, the stochastic differential equation in Model [14] can be expressed in the form

The stochastic models in Equations (13) and (10) are suitable for the temperature data at hand (Figure 3). Table 1 shows that we cannot reject the null hypothesis that the data for the variables at hand follow the assumed probability distributions.

(a) Histogram for daily temperature

(b) Temperature vs. quantiles

3. Trivariate Probability Density

3.1. Verification of Pairwise Dependence

The bivariate copulas, from Frank family, are used to find joint cumulative distributions , , and . Frank family of copulas are picked because, unlike other types of copula, they admit both negative (counter monotonic) and positive (comonotonic) dependencies [27]. This is done to ensure preservation of pairwise dependencies in the trivariate copula. The bivariate Frank copula is given by Equation [38].

Since is related to Kendall’s by the formula where , is estimated using nonparametric method in Equation (15). Upon estimating three values of Kendall’s , corresponding values are estimated using in MATLAB2021a. The results (Table 2) are summarized.

The estimates of both and for rainfall and temperature are found to be negative and smaller in magnitude, which suggests slight counter-monotonic dependence. Similar interpretation can be conveyed for rainfall and evapotranspiration pair, where the negative dependence seems to be slightly higher. The values for the relationship between reference evapotranspiration and temperature are all positive and bigger in magnitude, which suggests higher positive dependence (Figure 4(b)). Hence, the joint cumulative distribution were found using the relations: , and , where , , and .

(a) Joint CDF

(b) Copula density

To verify pairwise dependence, the random variable that follow each of the joint distribution is calculated using the nonparametric method in Equation (16)

The expected values of this random variable are then calculated using the method in Equation (17). Then, a plot, which is a graph of the order statistic against is plotted. The curve existing below line represents counter-monotonic dependence; coinciding with implies independence, and moving above represents comonotonic dependence [38].

Figure 5(a) suggests that there is higher comonotonic (positive) dependence between temperature and reference evapotranspiration , because the curve is on the upper side of the diagonal line and much closer to the curve of perfect positive dependence. For rainfall and evapotranspiration, and also rainfall and temperature, the curve is located below the line , which suggests negative (counter-monotonic) dependence. However, Figure 5(c) suggests that there is slight dependence between temperature and rainfall, because the curve is much closer to the line . The curve does not coincide with the independence straight line , which suggests that there is interdependence among the three variables. To get significant evidence of interdependence among the three variables, cross product ratios are calculated among pairs using the following formula in Equation [9].

(a) vs. temperature

(b) vs. rainfall

(c) Temperature vs. rainfall

In addition, a nonparametric form of the above equation can also be used. Its formula is given by formula in where ’s are medians. Both methods were used to calculate the values of the cross-product ratios. Nonparametric method allows for checking whether the data can give reliable estimates of the ’s or not.

Figure 6 shows that the cross-product ratio for the three pairs are constant (Figure 6), which agrees with hypothesis in Plackett [25]. The central lines (averages) are summarized in Table 3.

Clearly, , and (Table 3), which implies that there is significant negative dependence between rainfall and evapotranspiration, and also between rainfall and temperature. Similarly, which means there is significantly high positive interdependence between temperature and evapotranspiration. In addition, the nonparametric estimations from the data set are close to the parametric ones because the sample size is high. Hence, nonparametric method are used for .

3.2. Trivariate Probability Density

The trivariate cross product ratio is given by formula in where , , , , , , , and .

Since , and there is need to calculate the trivariate copula, nonparametric methods are applied to estimate . In order to use such methods as these, the probabilities are replaced by counts that satisfy the conditions for which the probabilities in the formula can be calculated. For example, is replaced by and can also be substituted by [39], where ’s are medians. The estimates are done using the data sets for , and also using estimated in the first section. The results in Table 4 are obtained. The fourth order polynomial in Equation (21) is solved to get trivariate copula of the form by using the results in all previous steps where , , , , , , , and . Factor method is used instead of Newton-Raphson method, because by fundamental theorem of algebra, every polynomial has a solution in . The Newton-Raphson method requires one to guess the initial value of , which is difficult in Plackett copulas [40]. The factor method gives four solutions at each value of coefficient. The cross-product ratio is always constant. The solution that satisfies definition of copula and Frechet-Hoeffding bounds is chosen. The results are summarized in Figure 7. This figure shows how the values of trivariate joint cumulative distribution change with time in the growing seasons between 2019 and 2021. It also shows changes in rainfall in the same growing seasons.

Since the cumulative marginal distribution functions are continuous, Sklar’s Theorem [16] guarantees existence of a copula such that . Assuming smoothness of and the definition of joint probability density function

Since there is no closed form of , the values of are calculated using algorithm in Zhang and Singh [27] and also using the fact that

Figure 8 shows three scatter plots of the joint against each weather variable for data sampled during the maize growing seasons only. The patterns are different and points rise considerably to greater height which informs uniqueness and maintenance of pairwise dependence.

(a) vs. temperature

(b) vs. rainfall

(c) vs. evapotranspiration

The trivariate Frank copula that is not symmetric is given by the Formula (24). This is calculated through nesting method of the form [41]. A closed form of can be calculated using symbolic integration in MATLAB2021a or Python 3.10.1. Such expression is very complicated but it is the exact copula density. Figure 9 are scatter plots of against the weather variables. The points pattern are different which suggests that the information of pairwise dependence is passed on to the trivariate copula. However, the points are dense at the bottom which suggests that uniqueness is not guaranteed by the copula. where .

(a) vs. temperature

(b) vs. rainfall

(c) vs. evapotranspiration

The definition of conditional probability density provides that . It can also be proved that

Thus, Equation (25) can be expressed as follows

Using results in the previous section

Uniqueness of can be affected if one chooses to be . However, for Frank copula, , can attain the following form where .

The parameter is estimated by pseudo-maximum likelihood estimation method [42]. Let , , . We have

Let , which is the denominator.

Let , which is the coefficient of in right hand side.

Let , which is the coefficient of in the numerator. Let , which is the coefficient of in the right hand side.

Then, the whole expression is reduced to

If , then

This is an implicit function of because contains . Thus, it can be solved recursively. It is also solved explicitly by assuming , because by putting the numerator tends faster to than the denominator . Using MATLAB, under the second assumption we have

Since , we choose the parameter to be . Assuming that , we solve this equation recursively for . Since , we initialize to . The results are compared with those for , where is assumed to be , in Figure 10.

Clearly, the estimate of whenever reverts to that of calculated upon assuming that (Figure 10). Hence, we take as reasonable estimate of . In Acar et al. [23] such estimate as this is also a function of . However, this quantity is estimated as cubic regression which is avoided in the estimation adopted in this study.

Figure 11 suggests also that the dependence structure of the three variables is maintained since the scatter plots are different. It also suggests that there is higher level of uniqueness because the points are not clustered at the bottom. It also suggests that the vine copula is better than the other copulas with respect to the data at hand.

(a) vs. temperature

(b) vs. rainfall

(c) vs. evapotranspiration

Akaike Information Criterion (AIC) is given by , where is the maximum value of likelihood function (MLF), and is the number of parameters in the joint distribution function. Let and be MLF’s for determined through Plackett copula, Frank copula, and conditioning method.

The Schwartz Bayesian Information Criterion (SBIC) is given by . Table 5 shows that the Frank trivariate copula obtained through nesting is not well suited to the data at hand because it has high value of AIC. It also suggests that copula vine, copula found through conditioning is comparable to the Plackett copula.

4. Joint Stochastic Process

Definition 1. An Ito’s differential equation is the one of the form where and ’s are drift functions, ’s are diffusion coefficient functions, ’s are correlation coefficients, and ’s are Wiener processes.

Definition 2. Let and where is copula density. Then the Kolmogorov forward (Fokker-Planck) [43] equation is given

In order to use the Fokker Plank equation we put and . The Fokker-Planck equation can be solved by an operator splitting technique with six split subproblems. Thus, where

Solving for yields the following equation that also depends on copula density

This result is also true for the second split subproblem because both rainfall and reference evapotranspiration follow gamma distribution. This mean . Hence, without loss of generalization, the third and forth split subproblems can also be expressed as follows

Solving for yields the following

Coupling results in Equations (42) and (44) yields

Considering the fifth split subproblem and also the fact that and where is a constant that does not explicitly contain variables. That is,

Finding the second order partial derivatives results in

Solving this for gives the following results

Picking the last split subproblem, we have

Isolating yields the following

Combining the results in Equations (49) and (51) yields

Thus, the whole Fokker-Planck equation is reduced to the form

Since , then the copula density can also be approximated by the formula in Equation [44].

Numerical values of , using results in Equations (45) and (52) give values that are in the interval .

In order to use ’s, , , and in the Stochastic Differential Equations (2), (3), and (6), their values were modeled by use of truncated Fourier series. Temperature and evapotranspiration data for cool and hot dry seasons were omitted in the analysis. The stochastic differential equations were solved through Monte Carlo Euler-Muruyama method. The results (Figures 12, 13(d)–13(f)), show that the stochastic differential equations predict weather variables are with higher precision.

(a) Prediction of temperature

(b) Prediction of evapotranspiration

(a) Predicting

(b) Predicting

(c) per growing season

(d) Rainfall vs. temperature

(e) Temperature vs.

(f) vs. rainfall

We propose a joint process such that where . To derive , let process , which is a random variable whose probability density is . Assume , , and , where ’s are cross-correlations among the standard Brownian motions. By Ito’s lemma we have

If is a Wiener process that is independent of both and , then it can be proved that . Similarly, . Therefore, . Choosing , , , , , and , the process defined by Equation (56) is the proposed multivariate process whose probability distribution is given by Equation (22). The results of modeling the process , both under uniform and nonuniform noises assumption, are given in Figures 13(a) and 13(b).

The coefficients of determination in predicting from Model [11] are per season (Figure 13(c)). The overall coefficient of determination is for the model is under correlated ’s assumption is 0.8899 and was estimated using the formula . The values of were also estimated per growing season (Figure 13(c)) and 95% confidence intervals for each values was estimated using , where [45, 46], is length of growing season, and is number of parameters in the SDE. Therefore, the stochastic process predict wether variables with high precision.

5. Maize Yield Modeling and Prediction

The aim of this section is to model maize yield through stochastic models. According to Allen [29], amount produced by unit change in follows the differential equation where is the volatility of process , ’s are weight parameters, and is the number of times ’s are observed in time . The Equation (57) has continuous form below

Maize yield is realized once at the end of growing season. We assume that the mass of grains of the maize seed only represents the maize yield. As such we can put and therefore, . This assumption means that record of maize yield are taken at the end of the growing season. Thus, it can be assumed that is a function of time in years only, that is . Under this assumption, we have so that the model in Equation [32] can be expressed as

The definition of Riemann integral requires that

The number of days in a season to which maize plants are exposed to rainfall, temperature, and evapotranspiration, respectively, is constant. The plants are exposed to the last two processes continuously but rainfall does not fall every second. Hence, it can be assumed that . Thus,

The changes in the processes have an effect on variations in yields. However, the amounts of temperature, rainfall, and evapotranspiration have significant impact on maize yield in the region. Under such a scenario is replaced by . Thus, this section assumes that

The weights can also be assumed to be constant in one growing season only and hence, one can write so that

Considering the last components of Equation (59), the definition of integration, assumptions made in this subsection, and properties of partial derivatives require that

Since is not a smooth function, it can be assumed that

The constant of proportionality is chosen to be and hence, the whole model for maize yield can be expressed as

This stochastic maize model(SMM) is slightly different from the model in Equation (57). The constant for each growing season can be estimated through a pseudo-maximum likelihood techniques according to where , and . Thus, and . Finding , where yields

Putting , gives

The variable is modeled as simple linear regression first to take on impacts of time or technological changes and also to be consistent with the assumptions above. Then, is found by calculating standard deviation of each random error of the regression and then finding a quadratic linear regression of the standard deviations. These are included in model in the Equation (66) to include the effect of ’s on maize yield. Thus, the proposed maize yield model is taking into account both linear trend model and the joint model . Another model to predict maize could be that of choosing intervals , such that for each realization of Model [26],

This is modeling is called decision tree regression [47] and it was implemented in python. The most common method for predicting crop yield is the neural network (NN) model of the form [9]

In the context of this study, stands for the vector of values estimated by model in Equation (55), ’s are vectors of parameters to estimated in order to estimate , and ’s are activation functions. In this case, is chosen to be to minimize computational time [44]. The advantage of model in Equation (72) is that values of which are calculated daily need not to be aggregated. Their contribution to a single annual value of yield is determined by weights. Another advantage is that different joint probability densities, estimated in previous sections, can be compared through the loss function of the form where is Lagrange multiplier. In order to find ’s and that minimize , gradient descent optimization method of the form , where , is applied. Numerical methods are implemented in python.

Thus, maize yield estimate is . The metrics are summarized in Table 6. Similarly, we also predict maize yield by finding Support vector machine regression (SVM) of the form [48] where ’s are Lagrange multipliers, are maize yield data values from training sample, and is the model used to predict maize yield. Figure 14 summarizes the methodology used to achieve the objectives in order to derive maize yield .

The results in Table 7 shows that Frank-Copula is not suitable for modeling the trivariate density in order to predict maize yield because is negative. This can also be seen in Figure 15(b) in which the maize yield prediction is always lower than the actual values. Similarly, the forecast from Plackett density and in the neural network is accurate with and . The seasonal aggregates of ’s in SVM predict maize yield with and which a bit lower than those of Decision trees where and . These results are better than the training sample is for growing seasons between and .

(a) Yield from SVM (74)

(b) Yield from neural network (72)

(c) Yield from decision trees (70)

(d) Yield from multivariate SDE (66)

We also conducted backtesting (BT), for the models that performed well, using historic data set between and . The results show that the probability density for Vine and Placket have comparable precision. In fact, the vine probability density gives and MAPE values that are slightly better than those for Plackett probability density.

We also conducted uncertainty and reliability analysis using common formulas in literature [49].

The results are summarized (Table 8). It is clear, from uncertainty analysis, that if the study is replicated using the same methods, then the interval for which results can be found to be different are quite similar for all the methods. Reliability was calculated as the probability that absolute percentage error (APE) is less than . This cut off point was arrived from the results in Table 7 in which the MAPE for models with high performance is generally less than . It is very plausible to note that the stochastic maize yield model, despite having modesty and slightly higher of MAPE, has lowest scatter index (SI) and highest reliability (). This is a strong evidence that the trivariate stochastic model can predict maize yield with higher accuracy in many models. The low reliability for Frank copula is an indication that it is not suitable for modeling the joint stochastic model.

6. Conclusion

The main aim of this paper is to predict maize yield from a stochastic process of rainfall amount, temperature, and reference evapotranspiration. To achieve this, trivariate probability distribution functions were derived through Plackett, Frank asymmetric, and vine copulas. It is clear that the Frank asymmetric 3-copula, derived through nesting, is not suitable for both modeling the joint stochastic weather and maize yield processes. The Plackett and vine copulas are suitable for modeling the joint stochastic weather and maize yield processes.

The stochastic process derived the quality to account for nonlinearity and nonstationarity through satisfaction of Fokker-Planck equation with much success. The evidence for this is the fact that it has led to precise and accurate maize yield forecasts in frameworks of the derived multivariate stochastic maize yield process, the neural networks, and the common machine learning methods. Thus, the joint stochastic weather process is suitable for predicting maize yield.

Data Availability

Data can be provided on formal request from the authors

Conflicts of Interest

Authors declare no conflict of interest as regards to publication of this paper.

Acknowledgments

The research is funded by the Pan African University of Basic Science, Technology and Innovations, and the African Union

Supplementary Materials

algorithms written in MATLAB2021a and Python 3.10.1 are provided as supplementary materials. (Supplementary Materials)

References

A. L. Hoffman, A. R. Kemanian, and C. E. Forest, “Analysis of climate signals in the crop yield record of sub-Saharan Africa,” Global Change Biology, vol. 24, no. 1, pp. 143–157, 2018.
View at: Publisher Site | Google Scholar
A. M. Komarek, C. Thierfelder, and P. R. Steward, “Conservation agriculture improves adaptive capacity of cropping systems to climate stress in Malawi,” Agricultural Systems, vol. 190, article 103117, 2021.
View at: Publisher Site | Google Scholar
T. Van-Klompenburg, A. Kassahun, and C. Catal, “Crop yield prediction using machine learning: a systematic literature review,” Computers and Electronics in Agriculture, vol. 177, article 105709, 2020.
View at: Publisher Site | Google Scholar
J. Pant, R. Pant, M. Singh, M. Singh, and H. Pant, “Analysis of agricultural crop yield prediction using statistical techniques of machine learning,” Materials Today: Proceedings, vol. 46, pp. 10922–10926, 2021.
View at: Publisher Site | Google Scholar
S. Cabrales, J. Solano, C. Valencia, and R. Bautista, “Pricing rainfall derivatives in the equatorial pacific,” Agricultural Finance Review, vol. 80, no. 4, pp. 589–608, 2020.
View at: Publisher Site | Google Scholar
S. Khaki and L. Wang, “Crop yield prediction using deep neural networks,” Frontiers in Plant Science, vol. 10, p. 621, 2019.
View at: Publisher Site | Google Scholar
G. Benrhmach, K. Namir, A. Namir, and J. Bouyaghroumni, “Nonlinear autoregressive neural network and extended Kalman filters for prediction of financial time series,” Journal of Applied Mathematics, vol. 2020, Article ID 5057801, 6 pages, 2020.
View at: Publisher Site | Google Scholar
R. Ben Ayed and M. Hanana, “Artificial intelligence to improve the food and agriculture sector,” Journal of Food Quality, vol. 2021, Article ID 5584754, 7 pages, 2021.
View at: Publisher Site | Google Scholar
S. A. Gyamerah, P. Ngare, and D. Ikpe, “Probabilistic forecasting of crop yields via quantile random forest and epanechnikov kernel function,” Agricultural and Forest Meteorology, vol. 280, article 107808, 2020.
View at: Publisher Site | Google Scholar
S. Gupta, A. Geetha, K. S. Sankaran et al., “Machine learning- and feature selection-enabled framework for accurate crop yield prediction,” Journal of Food Quality, vol. 2022, Article ID 6293985, 7 pages, 2022.
View at: Publisher Site | Google Scholar
J. Shook, T. Gangopadhyay, L. Wu, B. Ganapathysubramanian, S. Sarkar, and A. K. Singh, “Crop yield prediction integrating genotype and weather variables using deep learning,” PLoS One, vol. 16, no. 6, article e0252402, 2021.
View at: Publisher Site | Google Scholar
R.-G. Cong and M. Brady, “The interdependence between rainfall and temperature: copula analyses,” The Scientific World Journal, vol. 2012, Article ID 405675, 11 pages, 2012.
View at: Publisher Site | Google Scholar
R. Hogg, J. Mc Kean, and A. Craig, Introduction to Mathematical Statistics, Pearson, Springer Science & Business Media, Boston, Massachusetts, vol. 202, 2019.
V. H. L. Dávila, C. R. B. Cabral, and C. B. Zeller, “Scale mixtures of skew-normal distributions,” Finite Mixture of Skewed Distributions, Springer, pp. 15–36, 2018.
View at: Publisher Site | Google Scholar
A. J. Patton, “A review of copula models for economic time series,” Journal of Multivariate Analysis, vol. 110, pp. 4–18, 2012.
View at: Publisher Site | Google Scholar
R. B. Nelsen, An Introduction to Copulas, Springer Science & Business Media, 2007.
G. Leobacher and P. Ngare, “On modelling and pricing rainfall derivatives with seasonality,” Applied Mathematical Finance, vol. 18, no. 1, pp. 71–91, 2011.
View at: Publisher Site | Google Scholar
N. C. Dzupire, P. Ngare, and L. Odongo, “A poisson-gamma model for zero inflated rainfall data,” Journal of Probability and Statistics, vol. 2018, Article ID 1012647, 12 pages, 2018.
View at: Publisher Site | Google Scholar
N. C. Dzupire, P. Ngare, and L. Odongo, “A copula based bi-variate model for temperature and rainfall processes,” Scientific African, vol. 8, article e00365, 2020.
View at: Publisher Site | Google Scholar
G. M. Bressan and S. Romagnoli, “Climate risks and weather derivatives: a copula-based pricing model,” Journal of Financial Stability, vol. 54, article 100877, 2021.
View at: Publisher Site | Google Scholar
C. Genest, J. Nešlehová, and N. B. Ghorbal, “Estimators based on Kendall's tau in multivariate copula models,” Australian & New Zealand Journal of Statistics, vol. 53, no. 2, pp. 157–177, 2011.
View at: Publisher Site | Google Scholar
C. Pietsch, U. Kelter, and T. Kehrer, “From pairwise to family-based generic analysis of delta-oriented model-based spls,” in Proceedings of the 25th ACM International Systems and Software Product Line Conference - Volume A, pp. 13–24, Leicester, United Kingdom, 2021.
View at: Publisher Site | Google Scholar
E. F. Acar, C. Genest, and J. Nešlehová, “Beyond simplified pair-copula constructions,” Journal of Multivariate Analysis, vol. 110, pp. 74–90, 2012.
View at: Publisher Site | Google Scholar
G. Salvadori and C. De Michele, “Statistical characterization of temporal structure of storms,” Advances in Water Resources, vol. 29, no. 6, pp. 827–842, 2006.
View at: Publisher Site | Google Scholar
R. L. Plackett, “A class of bivariate distributions,” Journal of the American Statistical Association, vol. 60, no. 310, pp. 516–522, 1965.
View at: Publisher Site | Google Scholar
S. Song and V. P. Singh, “Frequency analysis of droughts using the plackett copula and parameter estimation by genetic algorithm,” Stochastic Environmental Research and Risk Assessment, vol. 24, no. 5, pp. 783–805, 2010.
View at: Publisher Site | Google Scholar
L. Zhang and V. P. Singh, Copulas and their Applications in Water Resources Engineering, Cambridge University Press, 2019.
View at: Publisher Site
S. M. Ross, Introduction to Probability Models, Academic Press, 2014.
E. J. Allen, “Derivation of stochastic partial differential equations,” Stochastic Analysis and Applications, vol. 26, no. 2, pp. 357–378, 2008.
View at: Publisher Site | Google Scholar
D. Bykhovsky, “Simple generation of gamma, gamma–gamma, and k distributions with exponential autocorrelation function,” Journal of Lightwave Technology, vol. 34, no. 9, pp. 2106–2110, 2016.
View at: Publisher Site | Google Scholar
N. Khanmohammadi, H. Rezaie, M. Montaseri, and J. Behmanesh, “Regional probability distribution of the annual reference evapotranspiration and its effective parameters in Iran,” Theoretical and Applied Climatology, vol. 134, no. 1-2, pp. 411–422, 2018.
View at: Publisher Site | Google Scholar
E. M. Uliana, D. D. da Silva, J. G. F. da Silva, M. de Fraga, and L. Lisboa, “Estimate of reference evapotranspiration through continuous probability modelling,” Engenharia Agrícola, vol. 37, no. 2, pp. 257–267, 2017.
View at: Publisher Site | Google Scholar
K. Msowoya, K. Madani, R. Davtalab, A. Mirchi, and J. R. Lund, “Climate change impacts on maize production in the warm heart of Africa,” Water Resources Management, vol. 30, no. 14, pp. 5299–5312, 2016.
View at: Publisher Site | Google Scholar
P. Li, “Pricing weather derivatives with partial differential equations of the Ornstein-Uhlenbeck process,” Computers & Mathematics with Applications, vol. 75, no. 3, pp. 1044–1059, 2018.
View at: Publisher Site | Google Scholar
T. Wang, “Ergodic convergence rates for time-changed symmetric Levy processes in dimension one,” Statistics & Probability Letters, vol. 183, article 109343, 2022.
View at: Publisher Site | Google Scholar
S. Prabakarana, I. Garciab, and J. Morac, “A temperature stochastic model for option pricing and its impacts on the electricity market,” Economic Analysis and Policy, vol. 68, pp. 58–77, 2020.
View at: Publisher Site | Google Scholar
S. A. Gyamerah, P. Ngare, and D. Ikpe, “Regime-switching temperature dynamics model for weather derivatives,” International Journal of Stochastic Analysis, vol. 2018, Article ID 8534131, 15 pages, 2018.
View at: Publisher Site | Google Scholar
C. Genest and A.-C. Favre, “Everything you always wanted to know about copula modeling but were afraid to ask,” Journal of Hydrologic Engineering, vol. 12, no. 4, pp. 347–368, 2007.
View at: Publisher Site | Google Scholar
S.-C. Kao and R. S. Govindaraju, “Trivariate statistical analysis of extreme rainfall events via the plackett family of copulas,” Water Resources Research, vol. 44, no. 2, 2008.
View at: Publisher Site | Google Scholar
F. N. Ogana, J. J. Gorgoso-Varela, and J. S. A. Osho, “Modelling joint distribution of tree diameter and height using frank and plackett copulas,” Journal of Forestry Research, vol. 31, no. 5, pp. 1681–1690, 2020.
View at: Publisher Site | Google Scholar
F. Durante and C. Sempi, Principles of Copula Theory, CRC Press Boca Raton, Boca Raton, FL, vol. 474, 2016.
H. Joe and D. Kurowicka, Eds., Dependence Modeling: Vine Copula Handbook, World Scientific, 2011.
N. Ince and A. Shamilov, “An application of new method to obtain probability density function of solution of stochastic differential equations,” Applied Mathematics and Nonlinear Sciences, vol. 5, no. 1, pp. 337–348, 2020.
View at: Publisher Site | Google Scholar
E. Samikwa, T. Voigt, and J. Eriksson, “Flood prediction using IoT and artificial neural networks with edge computing,” in 2020 International Conferences on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics), pp. 234–240, Rhodes, Greece, 2020.
View at: Publisher Site | Google Scholar
A. Barzkar, M. Najafzadeh, and F. Homaei, “Evaluation of drought events in various climatic conditions using data-driven models and a reliability-based probabilistic model,” Natural Hazards, vol. 110, no. 3, pp. 1931–1952, 2022.
View at: Publisher Site | Google Scholar
D. Chicco, M. J. Warrens, and G. Jurman, “The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation,” PeerJ Computer Science, vol. 7, article e623, 2021.
View at: Publisher Site | Google Scholar
J. Suzuki, Statistical Learning with Math and Python, Springer, 2021.
View at: Publisher Site
C. M. Bishop and N. M. Nasrabadi, Pattern Recognition and Machine Learning, Springer, vol. 4, no. 4, 2006.
F. Saberi-Movahed, M. Najafzadeh, and A. Mehrpooya, “Receiving more accurate predictions for longitudinal dispersion coefficients in water pipelines: training group method of data handling using extreme learning machine conceptions,” Water Resources Management, vol. 34, no. 2, pp. 529–561, 2020.
View at: Publisher Site | Google Scholar

Copyright

Copyright © 2022 Patrick Chidzalo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

PDF Download Citation

Download other formats

Order printed copies

Views

540

Downloads

551

Citations