Abstract

Maize yield prediction in the sub-Saharan region is imperative for mitigation of risks emanating from crop loss due to changes in climate. Temperature, rainfall amount, and reference evapotranspiration are major climatic factors affecting maize yield. They are not only interdependent but also have significantly changed due to climate change, which causes nonlinearity and nonstationarity in weather data. Hence, there exists a need for a stochastic process for predicting maize yield with higher precision. To solve the problem, this paper constructs a joint stochastic process that acquires joints effects of the three weather processes from joint a probability density function (pdf) constructed using copulas that maintain interdependence. Stochastic analyses are applied on the pdf and process to account for nonlinearity and nonstationarity, and also establish a corresponding stochastic differential equation (SDE) for maize yield. The trivariate stochastic process predicts maize yield with and under a deep learning framework. Its aggregated values predict maize yield with up to 0.9765 and under common machine learning algorithms. Comparatively, the is 0.8829% and , under the maize yield SDE. Thus, the joint stochastic process can be used to predict maize yield with higher precision.

1. Introduction

Seasonal realizations of variation in climatic factors are responsible for loss in maize yield recorded in the sub-Saharan Africa. It has been shown that patterns in major weather elements have considerably changed in the region. In addition to factors emanating from adaptation to population rise, such as increase in cultivation area and the use of improved technology, climatic factors like temperature, rainfall, and reference evapotranspiration are the most significant determinants for maize yield in the region [1]. These climatic factors exhibit both comonotonic and counter-monotonic dependence nature [2], which makes derivation of maize yield forecast from them a difficult undertaking. Maize yield prediction is imperative in the process of alleviating the risk of loss in maize grain harvest due to weather changes through correct pricing of a crop harvest insurance or weather derivative hedge instruments.

The oldest approach of predicting maize yield is that of using a multivariate linear regression with climate, production potential, and management factors as explanatory variables. One challenge of this approach is the difficulty of collecting the data for climate and management factors simultaneously [3]. Secondly, management data is usually unavailable. Lastly, interdependence of climatic factors and linearity assumptions in the model imply that the model cannot accurately predict maize yield because the relationship between climate factors and yield is nonlinear [4]. To take care of error independence violation for incorporation of dependent weather variables in a linear regression, a time series model can be fitted which does not violate first order Markov-Chain assumption [5]. Most time series model might fail to account for change in climate because of stationarity assumption [6], which might lower crop prediction precision [7].

Another classical approach in literature [8] is that of fitting multivariate regressions through random forests, support vector machines, and artificial neural network in which both climate and nonclimate data are used to predict maize yield. These machine learning models can be trained to recognize nonlinearity in a data set [9]. This approach is suitable for experimental data set [10] in which a number of explanatory variables can be collected almost simultaneously. Precise crop yield forecast using one weather variable only at a time has been reported in many research works [11]. This approach has been more successful at predicting crop yield especially through quantile random forest and decision trees regression (DTR). The common problem of these approaches is that they do not account for simultaneous or joint effect and interdependence of weather random variables.

A viable way of constructing a process with joint effect of the weather variables is through construction of a joint probability distribution of rainfall, temperature, and reference transpiration, which, by definition, is a probability that the three weather conditions attain at most some values at a point simultaneously. To derive a multivariate distribution function, authors use a variety of methods such as mixtures, convolutions, variable transformations, and copulas [12]. In the case of obtaining probability distribution of a sum of random variables, the use of convolutions can be important [13]. In a mixture the domains of marginal probability distributions should be identical. Variable transformations usually assume independence or identical probability distributions [14].

A copula is different in a joint distribution that is formed from cumulative distribution functions rather than the probability distribution functions themselves. This eliminates the need for ensuring that domains are identical. This is so because it can be proved that a cumulative distribution function is a uniform random variable in the domain [15] . In addition, a copula obeys invariance property under comonotonic transformation like the cumulative distributions themselves. Therefore, a copula itself is a measure of dependence among random variables for which a joint distribution is to be formulated. Perfect comonotonic and counter-monotonic properties are satisfied by Fletcher-Hoeffding bounds [16].

It is very common, in literature, that authors use copulas to measure dependence; derive joint probability distributions; and calculate covolatility. The applications range from image processing, flood, to weather modeling in physics, statistics, and finance. The pioneering use of copulas in order to achieve similar aims of this paper can be seen in the works of Leobacher and Ngare [17] where they are using them to combine a Markov-Chain probability density with that of a gamma random variable for rainfall modeling. A composite modeling could as well achieve the same as seen in Dzupire et al. [18], because the Markov chain variable for rainfall frequency could simply be modeled as a Poisson or Negative binomial process. The trend was maintained till the work of Dzupire et al. [19] extended the modeling to two different weather processes of rainfall and temperature. The research was replicated by Bressan and Romagnoli [20] who never extended to three weather variables.

Common methods of constructing multivariate copulas include direct substitution method, conditioning, and nesting. The direct method involves that of simply substituting the cumulative distribution functions into a copula [21]. The only advantage of this method is that it is simple. It requires one to know exact formulas of copula. Archimedean copulas are well suited to this method. In addition this method requires variables to exhibit positive dependence. That is, it cannot be well suited to weather modeling that exhibit negative dependencies in some cases.

Once copulas are constructed, authors like Pietsch et al. [22], use uniform distribution sampling techniques that investigate whether pairwise dependencies are preserved in the distribution. Whenever these algorithms are applied to copulas obtained through the use of direct substitution method, pairwise dependencies preservation cannot be satisfied. Nesting also faces the same problem. It can pass on some but not all pairwise dependencies.

The most successful way of constructing copulas in a way that preserves pairwise dependencies is through use of conditioning. In this method, the copula is expressed as a product of two or three probabilities in lower variables. Acar et al. [23] used this type of method to construct trivariate copulas of vine type. Sometimes, components of the product cannot be easily constructed or their closed forms are not known. In Salvadori and De Michele [24] methods of approximating product components are presented. This clearly means that this method may not give unique copulas because the final copula is dependent on the method used to approximate a conditional copula in the product.

To take care of problems such as these, Plackett [25] derived a class of copulas called Plackett Family that are constructed by making sure that pairwise dependence is satisfied through a measure called product ratio. A trivariate copula is then constructed using another ratio by solving a fourth order polynomial. This method has been used to model drought variables by Song and Singh [26]. Further applications in engineering as well as advantages of this copula including easiness of getting measures pairwise dependence are fully described in Zhang and Singh [27].

In summary, there are three problems exhibited in literature. The first problem is that of existence of a joint stochastic process of the three major processes of temperature, rainfall amount, and reference evapotranspiration that affect maize yield, given that there are interdependent and have significantly changed over time. Weather data shows not only nonlinearity but also nonstationarity due to climate change. The second problem is of existence of stochastic model of maize yield that takes in the joint process as argument. In other words, how can the joint stochastic process be incorporated in a multivariate stochastic differential to predict maize yield? The last problem which is dependent on the first two problems is this: given such model is derived, how does it compare with common and most recent methods of grain yield prediction in sub-Saharan Africa?

The aim of this paper is to model and predict maize yield using a stochastic weather process of rainfall amount, temperature, and reference evapotranspiration. Specifically, we construct the stochastic process in a manner that takes care of joint or simultaneous effect of the three weather processes as well as the impact climate change which causes nonlinearity and nonstationarity in the weather data. By deriving these qualities from trivariate joint probability distribution and the Fokker-Planck equation, the joint process increases accuracy in weather elements and maize yield prediction.

In summary, this paper makes the following contributions: (i)We construct a joint stochastic process from the weather processes of temperature, rainfall amount, and reference evapotranspiration. Through copulas and sampling from solutions of Fokker-Planck equation in , the trivariate joint process captures interdependence, simultaneous weather events, and nonlinearity and nonstationarity information in the data that are caused by climate change. This makes it give precise prediction of weather variables even before using it in forecasting maize yield.(ii)We model maize yield in terms of the joint stochastic process. This model captures volatility variation, usual linear trend in maize yield, and also significant nonlinear and nonstationary trend caused by simultaneous effects of the weather variables.(iii)We predict maize yield using the stochastic maize yield, and the joint stochastic weather models constructed. The two stochastic processes give accurate and precise maize yield predictions under Monte-Carlo simulations, and machine learning methods. The results show that the joint stochastic process increases the performance of models in predicting maize yield.

The rest of this paper is organized as follows: stochastic models for each random variable are analyzed or derived in Section 2; joint probability densities of rainfall, temperature and reference evapotranspiration are derived and analyzed in Section 3; a joint stochastic process that follows the joint probability density is constructed and validated in Section 4, which is used to predict maize yield in Section 5. Conclusions are presented in the last section.

2. Stochastic Models

2.1. Rainfall Amount

In order to model rainfall process for derivative pricing or crop yield prediction, authors have used models that are based on Poisson, Gamma, and mixed Exponential distributions. Rainfall is considered to be a stochastic process consisting of two random variables: one representing frequency, which is a two state Markov Chain, and the other for rainfall amount. These variables can be modeled separately as in Cabrales et al. [5]. They can also be combined as a composite variable as in Dzupire et al. [18]. They are also studied jointly through copulas as in Leobacher and Ngare [17]. This paper considers modeling rainfall amount only. With respect to the data collected for this study, from Chitedze in Malawi, rainfall amount follows Gamma probability distribution (Figure 1(a)).

Let random variable represent daily rainfall amounts, then the probability density of the is given by the form in

It can also be assumed that ’s are autocorrelated, that is . If the probability density [23] satisfies Fokker-Planck partial differential equation, then the corresponding stochastic differential is given by where is a standard Brownian motion. Ross [28] defines the Wiener Process as the stochastic process such that , , , , and is the independent process. The model in Equation (2) is an example of Ito’s diffusion equality of the form where is called drift coefficient and the diffusion coefficient. Thus, Model [29] can be derived by using the method in Bykhovsky [30] as follows:

Putting ([7, 8] in Equation (3) gives [29].

2.2. Reference Evapotranspiration

The following formula is used to calculate daily reference evapotranspiration () data for Malawian conditions [2]

is the reference evapotranspiration measured in , is solar radiation in mega joules per square meter, is soil heat flux density in the same units of solar radiation, is the average of maximum and minimum temperatures in degree Celsius, is the wind speed in meters per second, is the actual water vapor measured in kilo pascals, is the pressure gradient in kilo pascals per unit of temperature, and is the constant of phychometry.

There has been no consensus in the suitable type of probability distribution for evapotranspiration. Khanmohammadi [31] fitted annual reference evapotranspiration data to Lognormal, generalized logistic, and Pearson III distributions and realized that Pearson III was the most appropriate. Uliana et al. [32] fitted evapotranspiration to Gamma distribution, which is a type of Pearson III distribution. Msowoya et al. [33] fitted the evapotranspiration data to the generalized logistic distribution. This suggests that for daily data and sub-Saharan conditions, the best candidate distributions are generalized logistic and the Gamma distribution. The data used for this study shows that reference evapotranspiration also follows gamma distribution (Figure 2).

Therefore, reference evapotranspiration can also be modeled by the stochastic differential equation of the form in

2.3. Daily Temperature

Authors usually model temperature () using the mean reverting Ornstein–Uhlenbeck process [34] of the form where is cyclic seasonal mean, is daily average temperature, is rate of mean reversion, and is seasonal volatility of . The advantages of using this model is that the model takes into account of the fact that, given a source of randomness in temperature, tends to move back to its seasonal trend say and the speed of doing that is captured by .

Dzupire et al. [19] assume that is Normal Inverse Gaussian Levy process. This would mean that is an independent process [35]. To use this model in maize prediction, one can derive a new partial differential equation of the Ornstein-Unlenbeck process, and then, calculate derivative predictions using Alaton or finite difference method. Such forecasts could not be computationally realistic as temperature is also dependent on rainfall as well as reference evapotranspiration. Alternatively, one could assume that is fractional Brownian motion. Under this assumption, the model would take the following form [36]

This would explain how the seasonal mean temperature would be updated by mean reverting process. The disadvantage of such assumption would be that predictions estimated from a stochastic equations realized from such assumption would not be computationally more realistic than that one assuming Levy Process. The stochastic differential equation in Model [30] can be expressed in the form

Therefore, if , then is approximately normally distributed with mean and variance [37] whenever and . Similarly, if , then is approximately normally distributed with mean and variance .

To prove this, consider

and that

Then by the method in Bykhovsky [30], we have

Hence, the stochastic differential equation in Model [14] can be expressed in the form

The stochastic models in Equations (13) and (10) are suitable for the temperature data at hand (Figure 3). Table 1 shows that we cannot reject the null hypothesis that the data for the variables at hand follow the assumed probability distributions.

3. Trivariate Probability Density

3.1. Verification of Pairwise Dependence

The bivariate copulas, from Frank family, are used to find joint cumulative distributions , , and . Frank family of copulas are picked because, unlike other types of copula, they admit both negative (counter monotonic) and positive (comonotonic) dependencies [27]. This is done to ensure preservation of pairwise dependencies in the trivariate copula. The bivariate Frank copula is given by Equation [38].

Since is related to Kendall’s by the formula where , is estimated using nonparametric method in Equation (15). Upon estimating three values of Kendall’s , corresponding values are estimated using in MATLAB2021a. The results (Table 2) are summarized.

The estimates of both and for rainfall and temperature are found to be negative and smaller in magnitude, which suggests slight counter-monotonic dependence. Similar interpretation can be conveyed for rainfall and evapotranspiration pair, where the negative dependence seems to be slightly higher. The values for the relationship between reference evapotranspiration and temperature are all positive and bigger in magnitude, which suggests higher positive dependence (Figure 4(b)). Hence, the joint cumulative distribution were found using the relations: , and , where , , and .

To verify pairwise dependence, the random variable that follow each of the joint distribution is calculated using the nonparametric method in Equation (16)

The expected values of this random variable are then calculated using the method in Equation (17). Then, a plot, which is a graph of the order statistic against is plotted. The curve existing below line represents counter-monotonic dependence; coinciding with implies independence, and moving above represents comonotonic dependence [38].

Figure 5(a) suggests that there is higher comonotonic (positive) dependence between temperature and reference evapotranspiration , because the curve is on the upper side of the diagonal line and much closer to the curve of perfect positive dependence. For rainfall and evapotranspiration, and also rainfall and temperature, the curve is located below the line , which suggests negative (counter-monotonic) dependence. However, Figure 5(c) suggests that there is slight dependence between temperature and rainfall, because the curve is much closer to the line . The curve does not coincide with the independence straight line , which suggests that there is interdependence among the three variables. To get significant evidence of interdependence among the three variables, cross product ratios are calculated among pairs using the following formula in Equation [9].

In addition, a nonparametric form of the above equation can also be used. Its formula is given by formula in where ’s are medians. Both methods were used to calculate the values of the cross-product ratios. Nonparametric method allows for checking whether the data can give reliable estimates of the ’s or not.

Figure 6 shows that the cross-product ratio for the three pairs are constant (Figure 6), which agrees with hypothesis in Plackett [25]. The central lines (averages) are summarized in Table 3.

Clearly, , and (Table 3), which implies that there is significant negative dependence between rainfall and evapotranspiration, and also between rainfall and temperature. Similarly, which means there is significantly high positive interdependence between temperature and evapotranspiration. In addition, the nonparametric estimations from the data set are close to the parametric ones because the sample size is high. Hence, nonparametric method are used for .

3.2. Trivariate Probability Density

The trivariate cross product ratio is given by formula in where , , , , , , , and .

Since , and there is need to calculate the trivariate copula, nonparametric methods are applied to estimate . In order to use such methods as these, the probabilities are replaced by counts that satisfy the conditions for which the probabilities in the formula can be calculated. For example, is replaced by and can also be substituted by [39], where ’s are medians. The estimates are done using the data sets for , and also using estimated in the first section. The results in Table 4 are obtained. The fourth order polynomial in Equation (21) is solved to get trivariate copula of the form by using the results in all previous steps where , , , , , , , and . Factor method is used instead of Newton-Raphson method, because by fundamental theorem of algebra, every polynomial has a solution in . The Newton-Raphson method requires one to guess the initial value of , which is difficult in Plackett copulas [40]. The factor method gives four solutions at each value of coefficient. The cross-product ratio is always constant. The solution that satisfies definition of copula and Frechet-Hoeffding bounds is chosen. The results are summarized in Figure 7. This figure shows how the values of trivariate joint cumulative distribution change with time in the growing seasons between 2019 and 2021. It also shows changes in rainfall in the same growing seasons.

Since the cumulative marginal distribution functions are continuous, Sklar’s Theorem [16] guarantees existence of a copula such that . Assuming smoothness of and the definition of joint probability density function

Since there is no closed form of , the values of are calculated using algorithm in Zhang and Singh [27] and also using the fact that

Figure 8 shows three scatter plots of the joint against each weather variable for data sampled during the maize growing seasons only. The patterns are different and points rise considerably to greater height which informs uniqueness and maintenance of pairwise dependence.

The trivariate Frank copula that is not symmetric is given by the Formula (24). This is calculated through nesting method of the form [41]. A closed form of can be calculated using symbolic integration in MATLAB2021a or Python 3.10.1. Such expression is very complicated but it is the exact copula density. Figure 9 are scatter plots of against the weather variables. The points pattern are different which suggests that the information of pairwise dependence is passed on to the trivariate copula. However, the points are dense at the bottom which suggests that uniqueness is not guaranteed by the copula. where .

The definition of conditional probability density provides that . It can also be proved that

Thus, Equation (25) can be expressed as follows

Using results in the previous section

Uniqueness of can be affected if one chooses to be . However, for Frank copula, , can attain the following form where .

The parameter is estimated by pseudo-maximum likelihood estimation method [42]. Let , , . We have

Let , which is the denominator.

Let , which is the coefficient of in right hand side.

Let , which is the coefficient of in the numerator. Let , which is the coefficient of in the right hand side.

Then, the whole expression is reduced to

If , then

This is an implicit function of because contains . Thus, it can be solved recursively. It is also solved explicitly by assuming , because by putting the numerator tends faster to than the denominator . Using MATLAB, under the second assumption we have

Since , we choose the parameter to be . Assuming that , we solve this equation recursively for . Since , we initialize to . The results are compared with those for , where is assumed to be , in Figure 10.

Clearly, the estimate of whenever reverts to that of calculated upon assuming that (Figure 10). Hence, we take as reasonable estimate of . In Acar et al. [23] such estimate as this is also a function of . However, this quantity is estimated as cubic regression which is avoided in the estimation adopted in this study.

Figure 11 suggests also that the dependence structure of the three variables is maintained since the scatter plots are different. It also suggests that there is higher level of uniqueness because the points are not clustered at the bottom. It also suggests that the vine copula is better than the other copulas with respect to the data at hand.

Akaike Information Criterion (AIC) is given by , where is the maximum value of likelihood function (MLF), and is the number of parameters in the joint distribution function. Let and be MLF’s for determined through Plackett copula, Frank copula, and conditioning method.

The Schwartz Bayesian Information Criterion (SBIC) is given by . Table 5 shows that the Frank trivariate copula obtained through nesting is not well suited to the data at hand because it has high value of AIC. It also suggests that copula vine, copula found through conditioning is comparable to the Plackett copula.

4. Joint Stochastic Process

Definition 1. An Ito’s differential equation is the one of the form where and ’s are drift functions, ’s are diffusion coefficient functions, ’s are correlation coefficients, and ’s are Wiener processes.

Definition 2. Let and where is copula density. Then the Kolmogorov forward (Fokker-Planck) [43] equation is given

In order to use the Fokker Plank equation we put and . The Fokker-Planck equation can be solved by an operator splitting technique with six split subproblems. Thus, where

Solving for yields the following equation that also depends on copula density

This result is also true for the second split subproblem because both rainfall and reference evapotranspiration follow gamma distribution. This mean . Hence, without loss of generalization, the third and forth split subproblems can also be expressed as follows

Solving for yields the following

Coupling results in Equations (42) and (44) yields

Considering the fifth split subproblem and also the fact that and where is a constant that does not explicitly contain variables. That is,

Finding the second order partial derivatives results in

Solving this for gives the following results

Picking the last split subproblem, we have

Isolating yields the following

Combining the results in Equations (49) and (51) yields

Thus, the whole Fokker-Planck equation is reduced to the form

Since , then the copula density can also be approximated by the formula in Equation [44].

Numerical values of , using results in Equations (45) and (52) give values that are in the interval .

In order to use ’s, , , and in the Stochastic Differential Equations (2), (3), and (6), their values were modeled by use of truncated Fourier series. Temperature and evapotranspiration data for cool and hot dry seasons were omitted in the analysis. The stochastic differential equations were solved through Monte Carlo Euler-Muruyama method. The results (Figures 12, 13(d)13(f)), show that the stochastic differential equations predict weather variables are with higher precision.

We propose a joint process such that where . To derive , let process , which is a random variable whose probability density is . Assume , , and , where ’s are cross-correlations among the standard Brownian motions. By Ito’s lemma we have

If is a Wiener process that is independent of both and , then it can be proved that . Similarly, . Therefore, . Choosing , , , , , and , the process defined by Equation (56) is the proposed multivariate process whose probability distribution is given by Equation (22). The results of modeling the process , both under uniform and nonuniform noises assumption, are given in Figures 13(a) and 13(b).

The coefficients of determination in predicting from Model [11] are per season (Figure 13(c)). The overall coefficient of determination is for the model is under correlated ’s assumption is 0.8899 and was estimated using the formula . The values of were also estimated per growing season (Figure 13(c)) and 95% confidence intervals for each values was estimated using , where [45, 46], is length of growing season, and is number of parameters in the SDE. Therefore, the stochastic process predict wether variables with high precision.

5. Maize Yield Modeling and Prediction

The aim of this section is to model maize yield through stochastic models. According to Allen [29], amount produced by unit change in follows the differential equation where is the volatility of process , ’s are weight parameters, and is the number of times ’s are observed in time . The Equation (57) has continuous form below

Maize yield is realized once at the end of growing season. We assume that the mass of grains of the maize seed only represents the maize yield. As such we can put and therefore, . This assumption means that record of maize yield are taken at the end of the growing season. Thus, it can be assumed that is a function of time in years only, that is . Under this assumption, we have so that the model in Equation [32] can be expressed as

The definition of Riemann integral requires that

The number of days in a season to which maize plants are exposed to rainfall, temperature, and evapotranspiration, respectively, is constant. The plants are exposed to the last two processes continuously but rainfall does not fall every second. Hence, it can be assumed that . Thus,

The changes in the processes have an effect on variations in yields. However, the amounts of temperature, rainfall, and evapotranspiration have significant impact on maize yield in the region. Under such a scenario is replaced by . Thus, this section assumes that

The weights can also be assumed to be constant in one growing season only and hence, one can write so that

Considering the last components of Equation (59), the definition of integration, assumptions made in this subsection, and properties of partial derivatives require that

Since is not a smooth function, it can be assumed that

The constant of proportionality is chosen to be and hence, the whole model for maize yield can be expressed as

This stochastic maize model(SMM) is slightly different from the model in Equation (57). The constant for each growing season can be estimated through a pseudo-maximum likelihood techniques according to where , and . Thus, and . Finding , where yields

Putting , gives

The variable is modeled as simple linear regression first to take on impacts of time or technological changes and also to be consistent with the assumptions above. Then, is found by calculating standard deviation of each random error of the regression and then finding a quadratic linear regression of the standard deviations. These are included in model in the Equation (66) to include the effect of ’s on maize yield. Thus, the proposed maize yield model is taking into account both linear trend model and the joint model . Another model to predict maize could be that of choosing intervals , such that for each realization of Model [26],

This is modeling is called decision tree regression [47] and it was implemented in python. The most common method for predicting crop yield is the neural network (NN) model of the form [9]

In the context of this study, stands for the vector of values estimated by model in Equation (55), ’s are vectors of parameters to estimated in order to estimate , and ’s are activation functions. In this case, is chosen to be to minimize computational time [44]. The advantage of model in Equation (72) is that values of which are calculated daily need not to be aggregated. Their contribution to a single annual value of yield is determined by weights. Another advantage is that different joint probability densities, estimated in previous sections, can be compared through the loss function of the form where is Lagrange multiplier. In order to find ’s and that minimize , gradient descent optimization method of the form , where , is applied. Numerical methods are implemented in python.

Thus, maize yield estimate is . The metrics are summarized in Table 6. Similarly, we also predict maize yield by finding Support vector machine regression (SVM) of the form [48] where ’s are Lagrange multipliers, are maize yield data values from training sample, and is the model used to predict maize yield. Figure 14 summarizes the methodology used to achieve the objectives in order to derive maize yield .

The results in Table 7 shows that Frank-Copula is not suitable for modeling the trivariate density in order to predict maize yield because is negative. This can also be seen in Figure 15(b) in which the maize yield prediction is always lower than the actual values. Similarly, the forecast from Plackett density and in the neural network is accurate with and . The seasonal aggregates of ’s in SVM predict maize yield with and which a bit lower than those of Decision trees where and . These results are better than the training sample is for growing seasons between and .

We also conducted backtesting (BT), for the models that performed well, using historic data set between and . The results show that the probability density for Vine and Placket have comparable precision. In fact, the vine probability density gives and MAPE values that are slightly better than those for Plackett probability density.

We also conducted uncertainty and reliability analysis using common formulas in literature [49].

The results are summarized (Table 8). It is clear, from uncertainty analysis, that if the study is replicated using the same methods, then the interval for which results can be found to be different are quite similar for all the methods. Reliability was calculated as the probability that absolute percentage error (APE) is less than . This cut off point was arrived from the results in Table 7 in which the MAPE for models with high performance is generally less than . It is very plausible to note that the stochastic maize yield model, despite having modesty and slightly higher of MAPE, has lowest scatter index (SI) and highest reliability (). This is a strong evidence that the trivariate stochastic model can predict maize yield with higher accuracy in many models. The low reliability for Frank copula is an indication that it is not suitable for modeling the joint stochastic model.

6. Conclusion

The main aim of this paper is to predict maize yield from a stochastic process of rainfall amount, temperature, and reference evapotranspiration. To achieve this, trivariate probability distribution functions were derived through Plackett, Frank asymmetric, and vine copulas. It is clear that the Frank asymmetric 3-copula, derived through nesting, is not suitable for both modeling the joint stochastic weather and maize yield processes. The Plackett and vine copulas are suitable for modeling the joint stochastic weather and maize yield processes.

The stochastic process derived the quality to account for nonlinearity and nonstationarity through satisfaction of Fokker-Planck equation with much success. The evidence for this is the fact that it has led to precise and accurate maize yield forecasts in frameworks of the derived multivariate stochastic maize yield process, the neural networks, and the common machine learning methods. Thus, the joint stochastic weather process is suitable for predicting maize yield.

Data Availability

Data can be provided on formal request from the authors

Conflicts of Interest

Authors declare no conflict of interest as regards to publication of this paper.

Acknowledgments

The research is funded by the Pan African University of Basic Science, Technology and Innovations, and the African Union

Supplementary Materials

algorithms written in MATLAB2021a and Python 3.10.1 are provided as supplementary materials. (Supplementary Materials)