Abstract

Parameter estimation is a growing area of interest in statistical signal processing. In many real-life applications, some parameters vary in space, as opposed to parameters that are static. The most common methods of estimating parameters involve solving an optimization problem in which the cost function is assembled in various ways, for example, maximum likelihood and maximum a posteriori methods. However, these methods do not admit exact solutions for most real-life problems. It is for this reason that Monte Carlo methods are preferred. In this paper, we treat the estimation of parameters which vary with space. We use the Metropolis-Hastings algorithm as a selection criterion for maximising the filter likelihood. Comparisons are made with joint estimation of both the spatially varying parameters and the state. We illustrate the procedures employed in this paper by means of two hyperbolic SPDEs: the advection equation and the wave equation. The Metropolis-Hastings procedure registers the better estimates.

1. Introduction

Most state-space models are characterized by, among other things, parameters, which can be constant or varying. A parameter is understood here as a measurable factor which defines a model and influences its operation. As the parameter changes, so does the model; put differently, a parameter is specific to the model it characterizes. The choice of a certain model, therefore, is achieved by choosing the right parameters. It happens more often than not, in hidden Markov models for instance, that measurements are available but the underlying signal is not readily apparent. This is a setting where parameter estimation is paramount: measurements are used to learn the model parameters, which, in turn, are used to fit the model.

Consider the signal and measurement equations
$$\mathrm{d}X_t = f(X_t; \theta)\,\mathrm{d}t + \Sigma^{1/2}\,\mathrm{d}W_t, \quad (1a)$$
$$\mathrm{d}Y_t = h(X_t)\,\mathrm{d}t + \Gamma^{1/2}\,\mathrm{d}V_t, \quad (1b)$$
with a spatially varying parameter signified by a $d$-dimensional vector, $\theta$, where the terms are as described in Table 1.

The parameter estimation problem concerns finding the optimal parameter so that the signal best fits the data [1, 2]. Classically, this is achieved by an optimization procedure in which a cost function is minimized [3]. The cost function usually quantifies the discrepancy between the state and the measurements. Intuitively, parameter estimation can thus be seen as a procedure for seeking the parameter value that gives the least discrepancy between the state and the corresponding measurements (also known as the algebraic distance or the residual). The method of least squares has been used extensively to define such an objective function. Given the increment in measurement, $\Delta y_k$, of the state, $x_k$, at time $t_k$, the objective function, $J(\theta)$, in the least-squares sense, is given by
$$J(\theta) = \sum_{k} \big(\Delta y_k - h(x_k; \theta)\,\Delta t\big)^{\top} W_k\,\big(\Delta y_k - h(x_k; \theta)\,\Delta t\big),$$
where $W_k$ is the weighting function.

The most commonly used procedures in the least-squares framework include linear least squares, orthogonal least squares, gradient-weighted least squares, and bias-corrected renormalization. A study of least-squares approaches, however, is outside the scope of this paper; suffice it to direct the interested reader to the article [4] for an elaborate explanation and application of least-squares methods in computer vision. Instead, we study parameter estimation by means of filtering. But before that, we mention a few merits and demerits of least squares and other methods defined by algebraic distances.

The use of algebraic distances in defining a cost function is computationally efficient, and closed-form solutions are possible. The end result, however, is not always satisfactory. This is due, on the one hand, to the fact that the objective function is usually not invariant with respect to Euclidean transformations, for example, translation, which limits the coordinate systems that can be used. On the other hand, outliers may not contribute to the parameters in the same way as inliers [4]. More satisfactory parameter estimation methodologies are therefore highly desirable. We consider, in this paper, the use of Bayesian inference techniques.

Estimation of parameters by means of a filter can be achieved in a number of ways. One is the use of the filter evidence, or a close approximation of it, together with a selection criterion for parameters which give a reasonable estimate of the evidence. The second method involves updating the parameters and the state at the same time. This is known as dual estimation, which further subdivides into joint estimation and the dual filter. Joint estimation entails subjoining the vector of parameters to the state vector to form an extended state-space. The filter is then implemented and run forward in time with the hope that it converges to the optimal state and parameter values. A dual filter, on the other hand, involves implementing one filter for the state and another for the parameters simultaneously. The filters provide a self-correcting mechanism which may lead to convergence of the state and parameter estimates.

The rest of this paper is arranged as follows. In the first place, we introduce some notions on Bayesian inference of parameters, where we pass to the limit in the mean in order to move from a discrete setting to a continuum in time. We introduce the different ways in which a filter can be used for parameter estimation. In anticipation of the applications in the later part of the paper, we introduce the stochastic advection and wave equations together with their spatial discretisation. We then illustrate how parameters are estimated by means of a filter likelihood and by dual filters. Results and discussions follow, and the conclusion forms the closing part of this paper.

2. Bayesian Parameter Inference

In Bayesian inference of parameters, the parameters are treated as random variables. The parameter is assigned a prior, $p(\theta)$, based on some initial belief. Let $0 = t_0 < t_1 < \cdots < t_N = T$ be a partition of the interval $[0, T]$ and let $y_{1:N} = \{y_1, \ldots, y_N\}$ denote the corresponding measurements. Bayes’ rule gives the joint posterior of the parameters and the states,
$$p(\theta, x_{0:N} \mid y_{1:N}) = \frac{p(y_{1:N} \mid x_{0:N}, \theta)\,p(x_{0:N} \mid \theta)\,p(\theta)}{p(y_{1:N})}, \quad (3a)$$
where $x_{0:N} = \{x_0, x_1, \ldots, x_N\}$.

Now, to arrive at the marginal posterior of the parameters, we integrate the states out of the joint posterior of states and parameters, Equation (3a):
$$p(\theta \mid y_{1:N}) = \int p(\theta, x_{0:N} \mid y_{1:N})\,\mathrm{d}x_{0:N}. \quad (4)$$

It turns out that direct computation of the integral in (4) is difficult, especially as the number of measurements grows [2]. This challenge is circumvented through the use of recursive techniques, which include the use of filters and smoothers, maximum a posteriori (MAP) estimates, and drawing samples from the posterior using Markov Chain Monte Carlo (MCMC) methods.

To use recursive methods, we begin with the following parameter posterior:
$$p(\theta \mid y_{1:k}) \propto p(y_k \mid y_{1:k-1}, \theta)\,p(\theta \mid y_{1:k-1}),$$
where
$$p(y_k \mid y_{1:k-1}, \theta) = \int p(y_k \mid x_k)\,p(x_k \mid y_{1:k-1}, \theta)\,\mathrm{d}x_k.$$

The state’s predictive distribution, $p(x_k \mid y_{1:k-1}, \theta)$, can be computed recursively as follows [2]:
$$p(x_k \mid y_{1:k-1}, \theta) = \int p(x_k \mid x_{k-1}, \theta)\,p(x_{k-1} \mid y_{1:k-1}, \theta)\,\mathrm{d}x_{k-1},$$
where $p(x_{k-1} \mid y_{1:k-1}, \theta)$ is the filtering distribution at time $t_{k-1}$.

Instead of working with the posterior, $p(\theta \mid y_{1:N})$, it is quite convenient to use the negative log-posterior obtained by expressing the posterior as follows:
$$p(\theta \mid y_{1:N}) \propto \exp\big(-\varphi_N(\theta)\big),$$
where the energy function, $\varphi_N(\theta)$, is given by
$$\varphi_N(\theta) = -\log p(\theta) - \sum_{k=1}^{N} \log p(y_k \mid y_{1:k-1}, \theta).$$

The maximum a posteriori (MAP) estimate can then be obtained by the mode of the posterior distribution or, equivalently, the minimum of the energy function, the latter of which is easier to compute; that is,
$$\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} p(\theta \mid y_{1:N}) = \arg\min_{\theta} \varphi_N(\theta). \quad (9)$$

One demerit of the MAP estimate is that it yields a point estimate of the parameter posterior and therefore ignores the dispersion of the estimate. Setting the prior, $p(\theta)$, to be a uniform density, (9) yields the maximum likelihood estimate.
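To make the recursion concrete, the following MATLAB sketch evaluates the energy function of (9) for a candidate parameter, under the assumption of scalar measurements and Gaussian one-step predictive densities; the helper filter_predict and the handle log_prior are hypothetical stand-ins, not the code used for the results in this paper.

% Sketch: evaluate the energy function phi_N(theta), assuming scalar
% measurements and a hypothetical helper filter_predict that returns the
% Gaussian one-step predictive mean and variance of the next measurement.
function phi = energy_function(theta, y, log_prior)
    phi = -log_prior(theta);                    % contribution of the prior
    for k = 1:numel(y)
        [mu, S] = filter_predict(theta, k);     % hypothetical filter step
        r   = y(k) - mu;                        % innovation
        phi = phi + 0.5*(log(2*pi*S) + r^2/S);  % minus log N(y_k; mu, S)
    end
end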

3. Metropolis-Hastings Method

Metropolis-Hastings (named after Nicholas Constantine Metropolis (1915-1999) and Wilfred Keith Hastings (1930-2016)) [5] is a Markov Chain Monte Carlo sampling algorithm which, under the conditions below, converges to the target distribution. It is premised on detailed balance and ergodicity. Given a probability density, say $\pi$, from which it is difficult to sample (for instance, if the said density is known only up to a normalization constant), and given another distribution, say $q(\theta' \mid \theta)$, from which it is easy to sample, detailed balance is the condition
$$\pi(\theta)\,T(\theta' \mid \theta) = \pi(\theta')\,T(\theta \mid \theta'),$$
where $T$ is a transition distribution. The detailed balance condition is sufficient for the random walk to converge asymptotically to the stationary distribution $\pi$. Ergodicity means that there is a nonzero probability of moving from any state to any other state in a Markov Chain.

The following algorithm (Algorithm 1) summarises the Metropolis-Hastings procedure.

1: Draw $\theta' \sim q(\theta' \mid \theta^{m-1})$
2: Set $\theta^m = \theta'$ with probability $\alpha = \min\left\{1,\ \dfrac{\pi(\theta')\,q(\theta^{m-1} \mid \theta')}{\pi(\theta^{m-1})\,q(\theta' \mid \theta^{m-1})}\right\}$
3: Otherwise, set $\theta^m = \theta^{m-1}$
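A minimal MATLAB sketch of the procedure follows, assuming a symmetric Gaussian random-walk proposal (so the proposal ratio cancels) and a user-supplied log-target known only up to an additive constant; the step size s and the number of draws are assumptions.

% Minimal random-walk Metropolis-Hastings sketch. log_target is the log
% of the target density, known only up to a normalization constant; the
% proposal is symmetric, so its ratio cancels in the acceptance step.
function chain = mh_sample(log_target, theta0, s, n_draws)
    d     = numel(theta0);
    chain = zeros(d, n_draws);
    theta = theta0;
    lp    = log_target(theta);
    for m = 1:n_draws
        prop    = theta + s*randn(d, 1);   % symmetric Gaussian proposal
        lp_prop = log_target(prop);
        if log(rand) < lp_prop - lp        % accept with prob min(1, ratio)
            theta = prop;
            lp    = lp_prop;
        end
        chain(:, m) = theta;               % store the current state
    end
end

As an illustrative usage, chain = mh_sample(@(t) -0.5*sum(t.^2), zeros(2,1), 0.5, 1e4) draws samples from a standard bivariate Gaussian.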

4. Dual Estimation

Dual estimation refers to the simultaneous estimation of the state and parameters by means of an appropriate filter. The self-correcting mechanism of the filter is taken advantage of to converge to both the true state and the true parameters. Depending on the initial parameter, the filter sooner or later converges to the true parameter value. Dual estimation can be achieved in two ways: by joint estimation and by a dual filter [6–8].

4.1. Joint Estimation (Augmented State-Space)

In joint estimation, the state vector is augmented with the vector of parameters to form an extended state-space, and then, the filter is run forward in time for an update of both the state and the parameters. The parameters are induced with artificial dynamics or are made to assume a random walk; that is, respectively,
$$\mathrm{d}\theta_t = \epsilon\,g(\theta_t)\,\mathrm{d}t \qquad \text{or} \qquad \mathrm{d}\theta_t = \epsilon\,\mathrm{d}B_t,$$
in which $B_t$ is a $d$-dimensional standard Brownian motion vector process and $\epsilon$ is a small constant. A filter is then implemented with the augmented state $Z_t = (X_t^{\top}, \theta_t^{\top})^{\top}$ in the place of $X_t$. The demerit of this method is that the extended state-space has an increased degree of freedom owing to many unknowns, of both the state and the parameters, which renders the filter unstable and intractable, especially in nonlinear models [8].
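As an illustration, the MATLAB sketch below forms the augmented state and lets the parameter block follow the random walk above with a small intensity; the drift f, the dimensions, and all numerical values are assumptions.

% Sketch: joint (augmented) state z = [x; theta] with a random walk on
% the parameter block. The drift f, dimensions, and constants are
% illustrative assumptions.
nx = 100;  d = 21;  dt = 1e-3;  eps_rw = 1e-2;
f  = @(x, th) -0.5*x;                   % placeholder drift
z  = [randn(nx, 1); zeros(d, 1)];       % initial augmented state
for k = 1:1000
    x  = z(1:nx);
    th = z(nx+1:end);
    x  = x + f(x, th)*dt + 0.1*sqrt(dt)*randn(nx, 1);  % state dynamics
    th = th + eps_rw*sqrt(dt)*randn(d, 1);             % parameter random walk
    z  = [x; th];
    % a filter analysis step on z would follow here at observation times
end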

4.2. Dual Filter

Dual filtering of the state and parameters is attained by the use of two filters, one for the state update and another for updating the parameters, both run simultaneously. The two filters interact symbiotically in that one provides the update of the state to be used by the other, whilst the other provides an update of the parameters to be used by the former. A very good example in the literature is the dual extended Kalman filter [9] used for estimating neural network models and their weights. In this case, the state is the model signal and the weights are the parameters. Another example appears in [10], where a dual filter comprising the ensemble transform particle filter (ETPF) and the feedback particle filter (FPF) is used for simultaneous estimation of the state of a wave equation and its speed.

The model for the dual filter of our consideration comprises a $d$-dimensional vector equation of artificial dynamics of the parameters together with the state-space model, Equations (1a) and (1b):
$$\mathrm{d}\theta_t = \epsilon\,\mathrm{d}B_t,$$
where the nomenclature and dimensions remain as stipulated for Equations (1a) and (1b).

In the following, we introduce the equations to which we shall apply dual filters in estimating spatially varying parameters.

5. Application Equations

5.1. Advection Equation

We take up an advection equation, excited with the space-time white noise process, with some diffusion added to it, and on a periodic spatial domain of length $L$, which we write as follows:
$$\frac{\partial u}{\partial t} + v(x)\,\frac{\partial u}{\partial x} = D\,\frac{\partial^2 u}{\partial x^2} + \sigma\,\dot{W}(x, t), \quad (14)$$
where $v(x)$ is a spatially varying velocity (of which constant velocity is a special case), $D$ is a constant, and $u(x, t)$ is the function to be obtained, which describes the state of the signal. $\sigma$ is a constant whilst $\dot{W}(x, t)$ is the space-time white noise process, where the dot denotes the singularity of the noise process.

Equation (14) needs a remark owing to the roughness of the stochastic-forcing term $\dot{W}(x, t)$, which is a mixed distributional derivative of the Brownian sheet. As is well known (see [11] for details), the Brownian sheet is nowhere differentiable. We, however, use the method introduced in [12]; that is, we approximate the noise term as follows. Let the domain $[0, L] \times [0, T]$ be tessellated into rectangles of dimensions $\Delta x \times \Delta t$ each, for $j = 1, \ldots, n$ and $k = 1, \ldots, N$, so that $n\,\Delta x = L$ and $N\,\Delta t = T$. Then,
$$\dot{W}(x, t) \approx \frac{1}{\sqrt{\Delta x\,\Delta t}} \sum_{j=1}^{n} \sum_{k=1}^{N} \xi_{jk}\,\phi_j(x)\,\psi_k(t),$$
where $\xi_{jk}$ are independent and identically distributed random variables of mean $0$ and unit variance. $\phi_j$ and $\psi_k$ are characteristic functions and are given by
$$\phi_j(x) = \mathbf{1}_{[(j-1)\Delta x,\ j\Delta x)}(x), \qquad \psi_k(t) = \mathbf{1}_{[(k-1)\Delta t,\ k\Delta t)}(t).$$

By a three-point upwind scheme in space [13], we discretise (14) and arrive at the following:
$$\mathrm{d}u_j = \left(-v_j\,\frac{3u_j - 4u_{j-1} + u_{j-2}}{2\,\Delta x} + D\,\frac{u_{j+1} - 2u_j + u_{j-1}}{\Delta x^2}\right)\mathrm{d}t + \frac{\sigma}{\sqrt{\Delta x}}\,\mathrm{d}\beta_j,$$
where $\Delta x$ is the spatial step size and $\beta_j(t)$ is the standard Brownian motion process. The $j$th grid point is represented by $x_j = j\,\Delta x$. With this notation, $u_j(t)$ is understood to mean the value of $u$ at the $j$th grid point at time $t$.

Time discretisation, by means of the Euler-Maruyama scheme, leads to
$$u_j^{k+1} = u_j^k + \left(-v_j\,\frac{3u_j^k - 4u_{j-1}^k + u_{j-2}^k}{2\,\Delta x} + D\,\frac{u_{j+1}^k - 2u_j^k + u_{j-1}^k}{\Delta x^2}\right)\Delta t + \frac{\sigma}{\sqrt{\Delta x}}\,\Delta\beta_j^k, \quad (18)$$
where $\Delta\beta_j^k$ is a Gaussian random variable of mean $0$ and variance $\Delta t$. The time increment, $\Delta t = t_{k+1} - t_k$, is such that $\Delta t \to 0$ as the number of time steps grows. The indices $j$ are taken modulo $n$ on account of the periodic domain. We use a prescribed initial value, $u_j^0$.

Considering every grid point in (18) leads to a vector representation of the signal $u^k = (u_1^k, \ldots, u_n^k)^{\top}$. To do so requires the following shorthand for the difference operations: let $D_1$ denote the matrix of three-point upwind differences and $D_2$ the matrix of central second differences, both with periodic boundary conditions, so that the advection and diffusion terms in (18) read $-V D_1 u^k$ and $D\,D_2 u^k$, respectively.

We finally have
$$u^{k+1} = \left[I_n + \Delta t\left(-V D_1 + D\,D_2\right)\right] u^k + \frac{\sigma}{\sqrt{\Delta x}}\,\Delta\beta^k,$$
where $u^k$ is an $n$-dimensional column vector at time $t_k$ comprising the elements $u_j^k$, $\Delta\beta^k$ is the corresponding vector of Brownian increments, $V$ is a diagonal matrix made of the elements of $v = (v_1, \ldots, v_n)^{\top}$, and $I_n$ is the $n$th-order identity matrix.
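The following MATLAB sketch integrates the discretised system above on a periodic grid; all numerical values (grid size, velocity field, noise and diffusion amplitudes, initial value) are illustrative assumptions and not the settings behind Figure 1.

% Sketch: Euler-Maruyama integration of the discretised stochastic
% advection-diffusion equation on a periodic grid. All numerical values
% are illustrative assumptions.
L = 1;  n = 100;  dx = L/n;  dt = 1e-4;  nsteps = 10000;
x = (0:n-1)'*dx;
v = 1 + 0.5*sin(2*pi*x);               % spatially varying velocity, positive
D = 1e-3;  sigma = 0.1;                % diffusion and noise amplitudes
u = sin(2*pi*x);                       % illustrative initial value
for k = 1:nsteps
    um  = circshift(u, 1);             % u_{j-1} (periodic)
    umm = circshift(u, 2);             % u_{j-2}
    up  = circshift(u, -1);            % u_{j+1}
    adv = -v.*(3*u - 4*um + umm)/(2*dx);          % three-point upwind term
    dif = D*(up - 2*u + um)/dx^2;                 % central diffusion term
    u   = u + (adv + dif)*dt + sigma*sqrt(dt/dx)*randn(n, 1);  % EM step
end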

The surface and contour plots for a single realisation of the stochastic advection equation are shown in Figure 1. The ruggedness evident in Figure 1 is consequent upon the addition of noise to the underlying dynamics.

In the next subsection, we introduce the wave equation.

5.2. Wave Equation

The wave equation, for our consideration, is given by
$$\frac{\partial^2 u}{\partial t^2} = c(x)^2\,\frac{\partial^2 u}{\partial x^2} + \sigma\,\dot{W}(x, t), \quad (26)$$
where $c(x)$ is the wave velocity, which varies in space for a wave travelling in a heterogeneous medium, and $u(x, t)$ is the function to be obtained, which, as in the previous subsection, describes the state of the signal. Moreover, $\sigma$ is a constant and $\dot{W}(x, t)$, as before, is the space-time white noise.

We employ mixed difference schemes to discretise (26) in space, so that we have
$$\mathrm{d}u_j = p_j\,\mathrm{d}t, \quad (27a)$$
$$\mathrm{d}p_j = c_j^2\,\frac{u_{j+1} - 2u_j + u_{j-1}}{\Delta x^2}\,\mathrm{d}t + \frac{\sigma}{\sqrt{\Delta x}}\,\mathrm{d}\beta_j, \quad (27b)$$
where $c_j = c(x_j)$ and $\Delta x$ is the spatial step size. For time integration, we use Verlet’s method, because of its geometric properties, namely, volume preservation, symplecticity, conservation of first integrals, and reversibility [14, 15]. The method, applied to the deterministic part of Equations (27a) and (27b), yields
$$p_j^{k+1/2} = p_j^k + \frac{\Delta t}{2}\,c_j^2\,\frac{u_{j+1}^k - 2u_j^k + u_{j-1}^k}{\Delta x^2}, \quad (28a)$$
$$u_j^{k+1} = u_j^k + \Delta t\,p_j^{k+1/2}, \quad (28b)$$
$$p_j^{k+1} = p_j^{k+1/2} + \frac{\Delta t}{2}\,c_j^2\,\frac{u_{j+1}^{k+1} - 2u_j^{k+1} + u_{j-1}^{k+1}}{\Delta x^2}, \quad (28c)$$
where $\Delta t$ is the time step. Substituting Equation (28b) into Equations (28a) and (28c), and reinstating the noise increment in the momentum update, we get
$$u_j^{k+1} = u_j^k + \Delta t\,p_j^k + \frac{\Delta t^2}{2}\,c_j^2\,\frac{u_{j+1}^k - 2u_j^k + u_{j-1}^k}{\Delta x^2}, \quad (29a)$$
$$p_j^{k+1} = \frac{u_j^{k+1} - u_j^k}{\Delta t} + \frac{\Delta t}{2}\,c_j^2\,\frac{u_{j+1}^{k+1} - 2u_j^{k+1} + u_{j-1}^{k+1}}{\Delta x^2} + \frac{\sigma}{\sqrt{\Delta x}}\,\Delta\beta_j^k. \quad (29b)$$

We use the initial values given in Equations (30a) and (30b), where $L$ is the length of the domain.

Considering every grid point leads to a vector representation of the signal $z^k = \big((u^k)^{\top}, (p^k)^{\top}\big)^{\top}$. We use the following shorthand: let $D_2$ be the matrix of central second differences with periodic boundary conditions and $C = \mathrm{diag}(c_1^2, \ldots, c_n^2)$.

Equations (29a) and (29b) then become
$$z^{k+1} = M\,z^k + \Delta\eta^k, \quad (33)$$
where
$$M = \begin{pmatrix} I_n + \frac{\Delta t^2}{2}\,C D_2 & \Delta t\,I_n \\ \frac{\Delta t}{2}\,C D_2\left(2 I_n + \frac{\Delta t^2}{2}\,C D_2\right) & I_n + \frac{\Delta t^2}{2}\,C D_2 \end{pmatrix}, \qquad \Delta\eta^k = \begin{pmatrix} 0_n \\ \frac{\sigma}{\sqrt{\Delta x}}\,\Delta\beta^k \end{pmatrix},$$
in which $I_n$ is the identity matrix of order $n$ whilst $0_{n \times n}$ is an $n$th-order null matrix and $0_n$ is an $n$-dimensional null vector.
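A MATLAB sketch of a single realisation follows, integrating the semi-discrete system with the Verlet scheme and adding the noise in the momentum update; the wave speed, initial data, and all constants are illustrative assumptions, not the settings behind Figure 2.

% Sketch: one realisation of the discretised stochastic wave equation,
% integrated with the Verlet (kick-drift-kick) scheme; the noise enters
% the momentum update. All numerical values are illustrative assumptions.
L = 1;  n = 100;  dx = L/n;  dt = 1e-3;  nsteps = 1000;
x   = (0:n-1)'*dx;
c2  = (1 + 0.3*cos(2*pi*x)).^2;        % squared wave speed c(x)^2
sigma = 0.05;
u   = exp(-100*(x - L/2).^2);          % illustrative initial displacement
p   = zeros(n, 1);                     % initial momentum
lap = @(w) (circshift(w, -1) - 2*w + circshift(w, 1))/dx^2;  % periodic Laplacian
for k = 1:nsteps
    p = p + 0.5*dt*c2.*lap(u);                    % half kick
    u = u + dt*p;                                 % drift
    p = p + 0.5*dt*c2.*lap(u) ...
          + sigma*sqrt(dt/dx)*randn(n, 1);        % half kick plus noise
end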

The contour and surface plots for a single realisation of the stochastic wave equation are shown in Figure 2.

For the purpose of estimating the speed of the wave, which varies spatially, we give a little prelude to estimation of spatially varying parameters in the next section.

6. Estimating Spatially Varying Parameters

Varying parameters exist naturally, an example being the speed of a wave moving in a heterogeneous medium. Varying parameters present a challenge owing to their varying nature, as opposed to static (in time) parameters. There are three broad categories of varying parameters: spatially varying parameters, parameters that vary with time, and parameters that vary with both time and space. Estimation of time-varying parameters, albeit for deterministic models, with an application to the estimation of parameters in a HIV/AIDS model, appears in [16], in which least-squares methods have been used. Although consistent estimates are realised there, extension to nonlinear models remains to be realised. In this section, we study spatially varying parameters using filtering algorithms. We also provide applications in order to evaluate the performance of the algorithms introduced here.

The strategy employed in this study comprises three steps:
(1) Express the parameter as a Fourier series with a given number of modes, say $M$, and collect all the coefficients in a vector.
(2) By means of an appropriate filtering algorithm, estimate the vector of hyperparameters.
(3) Substitute the estimated constant coefficients back into the Fourier series to obtain an estimate of the parameter.

To illustrate this method, we employ it to estimate the velocity of a wave travelling in a heterogeneous medium. Let the parameter, $c(x)$, $x \in [0, L]$, be given by the truncated Fourier series
$$c(x) = a_0 + \sum_{j=1}^{M} \left(a_j \cos\frac{2\pi j x}{L} + b_j \sin\frac{2\pi j x}{L}\right), \quad (39)$$
where the coefficients $a_0$, $a_j$, and $b_j$ are drawn randomly from a normal distribution whose mean and variance are chosen so that the parameter is positive for all values of $x$. The aim of this section is to obtain an estimate of the wave velocity by means of the Kalman-Bucy filter (KBF) and the ensemble Kalman-Bucy filter (EnKBF) [17]. A sketch of this construction is given below.
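In the MATLAB sketch, the mode count $M = 10$ (hence $2M + 1 = 21$ hyperparameters), the coefficient scales, and the device used to keep $c(x)$ positive are assumptions for illustration.

% Sketch: a spatially varying velocity built from the truncated Fourier
% series (39) with M = 10 modes, i.e. 21 hyperparameters. The coefficient
% scales, chosen to keep c(x) positive, are illustrative assumptions.
L = 1;  M = 10;  n = 100;
x  = (0:n-1)'*L/n;
a0 = 2 + 0.1*randn;                    % mean term, large relative to the modes
a  = 0.1*randn(M, 1);
b  = 0.1*randn(M, 1);
c  = a0*ones(n, 1);
for j = 1:M
    c = c + a(j)*cos(2*pi*j*x/L) + b(j)*sin(2*pi*j*x/L);
end
theta = [a0; a; b];                    % vector of hyperparameters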

Let $\theta$ be a vector whose elements are the coefficients of the function $c(x)$; that is,
$$\theta = (a_0, a_1, \ldots, a_M, b_1, \ldots, b_M)^{\top}.$$

Estimation of the spatially varying parameter, $c(x)$, is equivalent to estimating the parameters that form the vector $\theta$. We consider two methods: using the likelihood with the Metropolis-Hastings method and using a dual filter. Let us now consider each method in turn.

7. Using the Likelihood with Metropolis-Hastings Method

We now adapt Algorithm 1 to estimating the vector of parameters, $\theta$. We let each parameter have artificial dynamics with the transition density
$$q\big(\theta'_j \mid \theta_j\big) = \mathcal{N}\big(\theta'_j;\ \theta_j,\ s^2\gamma^{-2j}\big),$$
where $j$ is the mode number and the constants $s$ and $\gamma$ are to be chosen. The target density is defined by the filter likelihood,
$$\pi(\theta) \propto \prod_{k=1}^{N} p\big(\Delta y^k \mid \hat{u}^{k,-}(\theta)\big),$$
where $\hat{u}^{k,-}(\theta)$ is the filter prediction of the state at time $t_k$. Notice that the parameters, $\theta$, are implicitly contained in the dynamics.

Example 1. Advection equation.
We take the advection equation, of Section 5.1, and the given initial conditions. Furthermore, let there be time-continuous measurements of the state, $u$, given by
$$\mathrm{d}Y_t = u_t\,\mathrm{d}t + \Gamma^{1/2}\,\mathrm{d}B_t, \quad (42)$$
where $B_t$ is the standard space-time Brownian motion process. The initial values of $u$, $W$, and $B$ are uncorrelated. The aim is to estimate the spatially varying velocity, $v(x)$, by means of the filter likelihood and the Metropolis-Hastings algorithm.

We follow the discretisation described in Section 5.1 for the advection equation.

The measurements’ equation, (42), is discretised in time using the Euler-Maruyama scheme to yield
$$\Delta y^k = u^k\,\Delta t + \Gamma^{1/2}\,\Delta B^k, \quad (43)$$
upon substituting $\Delta y^k = Y_{t_{k+1}} - Y_{t_k}$ and $\Delta B^k = B_{t_{k+1}} - B_{t_k}$, where $\Delta B^k \sim \mathcal{N}(0, \Delta t\,I)$. The observation likelihood pdf for the KBF is Gaussian since the initial condition and the observation errors are Gaussian. So is the posterior pdf. With observation increments expressed as in (43), the observation increment likelihood pdf is
$$p\big(\Delta y^k \mid u^k\big) = \mathcal{N}\big(\Delta y^k;\ u^k\,\Delta t,\ \Gamma\,\Delta t\big).$$

Similarly, the filter forecast pdf is
$$p\big(u^k \mid \Delta y^{1:k-1}\big) = \mathcal{N}\big(u^k;\ \hat{u}^{k,-},\ P^{k,-}\big),$$
with the forecast mean $\hat{u}^{k,-}$ and covariance $P^{k,-}$ delivered by the KBF prediction step.

The next step is to implement a KBF and EnKBF and to obtain the likelihood at each time step. This is followed by implementing a Metropolis-Hastings algorithm.

Require: $\theta^0$, $m_0$, $P_0$, $\Delta t$, $N$, and $N_{\mathrm{MH}}$.
Ensure: $\theta^1, \ldots, \theta^{N_{\mathrm{MH}}}$.
1: for $m = 1$ to $N_{\mathrm{MH}}$ do
2: Draw $\theta' \sim q(\theta' \mid \theta^{m-1})$.
3: Compute the velocity field from $\theta'$ via the Fourier series (39).
4: for $k = 1$ to $N$, do
5:  Run a single-step KBF prediction mean
6:  Run a single-step KBF prediction covariance
7:  Run a single-step KBF analysis mean
8:  Run a single-step KBF analysis covariance
9: end for
10: Metropolis-Hastings
11: Compute the filter likelihood of $\theta'$
12: Compute the acceptance probability $\alpha$
13: if $\mathcal{U}(0, 1) < \alpha$ then
14:  $\theta^m = \theta'$
15: else
16:  $\theta^m = \theta^{m-1}$
17: end if
18: end for
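The inner loop (lines 4-9) amounts to accumulating the innovation log-likelihood of a Kalman-type filter. In the MATLAB sketch below, a standard discrete-time Kalman filter stands in for the KBF prediction and analysis steps; the helper build_model and all matrices are assumptions.

% Sketch of lines 4-9 of Algorithm 2: a discrete-time Kalman filter
% stands in for the KBF steps and accumulates the log-likelihood that
% scores theta' in the Metropolis-Hastings step. build_model is a
% hypothetical helper returning the discretised model matrices.
function ll = filter_loglik(theta, y)
    [A, H, Q, R] = build_model(theta);    % hypothetical helper
    n  = size(A, 1);
    m  = zeros(n, 1);  P = eye(n);  ll = 0;
    for k = 1:size(y, 2)
        m = A*m;                          % prediction mean
        P = A*P*A' + Q;                   % prediction covariance
        S = H*P*H' + R;                   % innovation covariance
        r = y(:, k) - H*m;                % innovation
        K = (P*H')/S;                     % Kalman gain
        m = m + K*r;                      % analysis mean
        P = P - K*H*P;                    % analysis covariance
        ll = ll - 0.5*(log(det(S)) + r'*(S\r) + numel(r)*log(2*pi));
    end
end

The value ll then enters the acceptance ratio of lines 11-13 as the logarithm of the filter likelihood of $\theta'$.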

The same is repeated but with EnKBF in the place of KBF. Algorithm 3 is the pseudocode showing the basic steps.

Require: $\theta^0$, the ensemble size $M$, the initial ensemble $\{u^{(i)}_0\}_{i=1}^{M}$, $\Delta t$, $N$, $N_{\mathrm{MH}}$, and the proposal $q$.
Ensure: $\theta^1, \ldots, \theta^{N_{\mathrm{MH}}}$.
1: for $m = 1$ to $N_{\mathrm{MH}}$ do
2: Draw $\theta' \sim q(\theta' \mid \theta^{m-1})$.
3: Compute the velocity field from $\theta'$ via the Fourier series (39).
4: for $k = 1$ to $N$, do
5:  for $i = 1$ to $M$ do
6:   Run a single-step EnKBF prediction ensemble
7:   Run a single-step EnKBF analysis ensemble
8:  end for
9:  Compute the prediction ensemble mean
10:  Compute the analysis ensemble mean
11:  Compute the ensemble covariance
12: end for
13: Metropolis-Hastings
14: Compute the filter likelihood of $\theta'$
15: Compute the acceptance probability $\alpha$
16: if $\mathcal{U}(0, 1) < \alpha$ then
17:  $\theta^m = \theta'$
18: else
19:  $\theta^m = \theta^{m-1}$
20: end if
21: end for
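For lines 6-7, the MATLAB sketch below performs one Euler-discretised step of a deterministic EnKBF variant; the matrices A and H, the noise factor Qh, the observation noise covariance R, and the observation increment dy are assumptions.

% Sketch of lines 6-7 of Algorithm 3: one Euler-discretised step of a
% deterministic EnKBF. ens is an n-by-M matrix of ensemble members; A, H,
% Qh, R, and the observation increment dy are illustrative assumptions.
function ens = enkbf_step(ens, A, H, Qh, R, dy, dt)
    [n, M] = size(ens);
    ens = ens + dt*(A*ens) + sqrt(dt)*Qh*randn(n, M);  % prediction step
    mu  = mean(ens, 2);                                % ensemble mean
    X   = (ens - mu)/sqrt(M - 1);                      % anomalies
    K   = (X*X')*H'/R;                                 % gain, P*H'/R
    ens = ens + K*(dy - 0.5*dt*H*(ens + mu));          % analysis nudge
end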

Results for the first 2 parameters are shown in Figure 3. The results in Figure 3 show that the EnKBF performs like the KBF. This agrees with the theory (see, for example, [17]) that the EnKBF yields optimal results in the limit of a large ensemble size, $M \to \infty$. It is also noteworthy that the Metropolis-Hastings algorithm converges to the true parameter estimate. This can be seen in Figure 3(a), for example, where the filter estimate converges after about 100 parameter draws.

We now look at the errors in the parameter estimates, in order to better assess the performance of the filters for the hyperparameters.

Figure 4 shows the boxplots of the dispersion of the parameter estimates resulting from the use of the EnKBF (Figure 4(a)) and the root mean square errors of the parameter estimates for both the EnKBF and the KBF (Figure 4(b)). The RMSE values are computed as follows:
$$\mathrm{RMSE}_i = \sqrt{\frac{1}{N_{\mathrm{MH}}}\sum_{m=1}^{N_{\mathrm{MH}}} \left(\theta_i^m - \theta_i^{\mathrm{true}}\right)^2},$$
where $\theta_i^m$ is the estimate of the $i$th parameter at the $m$th Metropolis-Hastings cycle and $\theta_i^{\mathrm{true}}$ is the true $i$th parameter.

The RMSEs for both the KBF and EnKBF, as shown in Figure 4(b), indicate that the performance of the EnKBF matches that of the KBF for the 21 parameters. These heuristic results corroborate the theoretical findings. The boxplot, Figure 4(a), shows the dispersion of parameter samples in the case when the EnKBF is used. The result indicates that the estimates match the true parameter values, with few outliers. This is indicative of the performance not only of the EnKBF but also of the Metropolis-Hastings procedure in locating the true parameter values and in ensuring that no large excursions are made from them.

We now implement Algorithms 2 and 3 for the discretised wave equation, (33).

Example 2. Wave equation.
We take the wave equation of Section 5.2 and the associated initial conditions, Equations (30a) and (30b). The measurements are given by (42). The initial values of $u$, $W$, and $B$ are uncorrelated. The aim is to estimate the spatially varying velocity, $c(x)$, by means of the filter likelihood and the Metropolis-Hastings algorithm.

The discretisation of the wave equation is as shown in Section 5.2. Figures 5(a) and 5(b) show the results.

The results in Figure 5 indicate a close match in the performance of the EnKBF and KBF. This is as was anticipated in the theoretical findings, some of which are found in [17]. Notice also that the two filters converge to the true parameter values after a few parameter draws (about 50 in Figure 5(a) and 100 in Figure 5(b))—which is indicative of the robustness of the Metropolis-Hastings algorithm atop the EnKBF and KBF filters. The results also show that there are no wide excursions from the true parameter values, which testifies to the good performance of Algorithms 2 and 3.

In Figure 6 are plotted the boxplots showing the dispersion of the 21 hyperparameter estimates resulting from the use of EnKBF and the root mean square errors for parameter estimates for both the EnKBF and KBF.

The EnKBF elicits an optimal performance, as can be seen in Figure 6(b), where the RMSEs for both the EnKBF and the optimal filter (KBF) match for all the hyperparameters. This also, as in the advection equation above, is in agreement with the theoretical findings that the EnKBF attains an optimal estimate in the limit of a large ensemble size. The boxplot, Figure 6(a), shows that the mean of the EnKBF parameter estimates matches the true parameter values. What is more, there are not many outliers in the estimates. All these show that the EnKBF-Metropolis-Hastings algorithm is robust.

In the next section, we apply the concepts on dual estimation to stochastic hyperbolic PDEs, which, in this case, are the advection and the wave equations.

8. Simultaneous Estimation of the State and Spatially Varying Parameters

The dual filter (see Section 4.2 for details) can be adapted to allow for estimating both the state and spatially varying parameters, contemporaneously. The idea here is to replace the parameters in the dual filter with the hyperparameters of the varying parameters to be approximated. The hyperparameters are then updated simultaneously and in parallel with the state at each iteration, where one filter estimates the state and the other filter updates the hyperparameters, with each filter making use of the outcome of the other. To illustrate this argument, we turn to the advection and wave equations described in Examples 1 and 2, respectively, and use the KBF-EnKBF dual filter, in which the state is propagated and updated by means of the KBF whilst the hyperparameters are updated using the EnKBF, and the EnKBF dual filter, where both the state and the hyperparameters are propagated and updated using two EnKBFs running in parallel. The spatially varying velocity is as shown in Section 6.

In the KBF-EnKBF dual filter, we update, for every $i$th parameter particle $\theta^{(i)}$, the state estimate, $(m^{(i)}, P^{(i)})$, using the KBF; that is,
$$\mathrm{d}m^{(i)} = A^{(i)} m^{(i)}\,\mathrm{d}t + P^{(i)} H^{\top} \Gamma^{-1}\left(\mathrm{d}Y_t - H m^{(i)}\,\mathrm{d}t\right), \quad (47a)$$
$$\frac{\mathrm{d}P^{(i)}}{\mathrm{d}t} = A^{(i)} P^{(i)} + P^{(i)} \big(A^{(i)}\big)^{\top} + Q - P^{(i)} H^{\top} \Gamma^{-1} H P^{(i)}, \quad (47b)$$
where $A^{(i)}$ is the system matrix induced by $\theta^{(i)}$ and $H$ is the observation matrix.

The parameters are updated using the EnKBF; that is, each parameter hypothesis, $\theta^{(i)}$, is updated using
$$\mathrm{d}\theta^{(i)} = K_{\theta}\left(\mathrm{d}Y_t - \frac{1}{2}\left(H m^{(i)} + \overline{H m}\right)\mathrm{d}t\right), \quad (48)$$
where $K_{\theta} = P_{\theta y}\,\Gamma^{-1}$, in which $P_{\theta y}$ is the empirical cross-covariance between the parameter particles and the predicted measurements, and $\overline{H m}$ is the ensemble mean of the predicted measurements.

Algorithm 4 gives a summary of the KBF-EnKBF dual filter.

Require: the initial parameter ensemble $\{\theta^{(i)}_0\}_{i=1}^{M}$, $m_0$, $P_0$, $\Delta t$, and $N$.
Ensure: $\hat{\theta}_k$, $\hat{x}_k$.
1: for $k = 1$ to $N$, do
2: for $i = 1$ to $M$ do
3:  Update the state estimate $(m^{(i)}_k, P^{(i)}_k)$ using Equations (47a) and (47b)
4:  Update the parameters $\theta^{(i)}_k$ using (48)
5:  end for
6:  Compute $\hat{\theta}_k = \frac{1}{M}\sum_{i=1}^{M}\theta^{(i)}_k$
7:  Compute $\hat{x}_k = \frac{1}{M}\sum_{i=1}^{M} m^{(i)}_k$
8: end for
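One cycle of the procedure may be sketched in MATLAB as follows, assuming preallocated arrays m (state means), P (state covariances), and theta (parameter particles), observation increments dy, and hypothetical helpers kbf_step and enkbf_param_update.

% Sketch of one cycle of the KBF-EnKBF dual filter (Algorithm 4): each
% parameter particle carries its own KBF state estimate, and the
% parameter ensemble is then updated with an EnKBF-type step.
% kbf_step and enkbf_param_update are hypothetical helpers.
M = 50;                                   % parameter ensemble size
for k = 1:nsteps
    for i = 1:M
        [m(:, i), P(:, :, i)] = kbf_step(m(:, i), P(:, :, i), ...
                                         theta(:, i), dy(:, k));
    end
    theta     = enkbf_param_update(theta, m, dy(:, k));  % cf. (48)
    theta_hat = mean(theta, 2);           % parameter estimate at step k
    x_hat     = mean(m, 2);               % state estimate at step k
end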

The EnKBF dual filter consists of an update of particles of the state, $u^{(i,j)}$, for every parameter particle, $\theta^{(i)}$, using the EnKBF; that is,
$$\mathrm{d}u^{(i,j)} = f\big(u^{(i,j)}; \theta^{(i)}\big)\,\mathrm{d}t + \Sigma^{1/2}\,\mathrm{d}W^{(i,j)}_t + K^{(i)}\left(\mathrm{d}Y_t - h\big(u^{(i,j)}\big)\,\mathrm{d}t - \Gamma^{1/2}\,\mathrm{d}V^{(i,j)}_t\right), \quad (50)$$
where $W^{(i,j)}_t$ and $V^{(i,j)}_t$ are standard Brownian motion vector processes and the gain $K^{(i)}$ is computed from the empirical covariance of the $i$th state ensemble. The parameters are updated using the EnKBF given by (48). The summary of the EnKBF dual filter is given in Algorithm 5.

Require: the initial parameter ensemble $\{\theta^{(i)}_0\}_{i=1}^{M}$, the initial state ensembles $\{u^{(i,j)}_0\}_{j=1}^{M_x}$, $\Delta t$, and $N$.
Ensure: $\hat{\theta}_k$, $\hat{x}_k$.
1: for $k = 1$ to $N$, do
2: for $i = 1$ to $M$ do
3:  for $j = 1$ to $M_x$ do
4:   Calculate $u^{(i,j)}_k$ using (50)
5:  end for
6:  Update the $i$th state ensemble statistics using Equation (49c)
7:  Update the parameters $\theta^{(i)}_k$ using (48)
8: end for
9: Compute $\hat{\theta}_k = \frac{1}{M}\sum_{i=1}^{M}\theta^{(i)}_k$
10: Compute $\hat{x}_k = \frac{1}{M M_x}\sum_{i,j} u^{(i,j)}_k$
11: end for

Figures 7(a) and 7(b) show the results for the first two parameters in (39) when the dual filters are applied to the advection equation.

Evidently, from Figure 7, the performance of the EnKBF and KBF-EnKBF dual filters almost matches. Both filters converge to the true estimate after a few iterations (about 100 in Figure 7(a)). This is another testament to the fact that the EnKBF attains optimal estimates for large ensemble sizes. Moreover, both the EnKBF and KBF-EnKBF parameter estimates are less spread out than in the previous case where Metropolis-Hastings was used. This is more evident in the following results for the 21 hyperparameters estimated.

In the following panels are plotted the root mean square errors for parameter estimates for both the EnKBF and KBF-EnKBF dual filters and the boxplots showing the dispersion of parameter estimates resulting from the use of the EnKBF dual filter applied to the advection equation.

That the boxplots of EnKBF dual filter parameter estimates have short whiskers (see Figure 8(a)) and few outliers and that the estimates match the true parameter values indicate that the EnKBF dual filter is robust in this setting. The EnKBF dual filter registers a slight variation in RMSE from the KBF-EnKBF dual filter. This indicates that the EnKBF, in this setting, performs optimally.

We now repeat the same procedure but with the wave equation described in Section 5.2 in place of the advection equation. The following panels show the results. The results shown in Figure 9 are indicative of a dismal performance of the two dual filters, the KBF-EnKBF and EnKBF dual filters, when applied to the wave equation, as compared to the results obtained when the dual filters are applied to the advection equation (see Figure 7). We note that the wave equation is partially observed; that is, only the displacement $u$ is observed in the discretised wave equation, (33), whereas the advection equation is fully observed. Furthermore, the wave equation carries both displacement and momentum unknowns, so the number of unknowns in its state-parameter system is roughly twice the 200 of the advection equation. These factors account for the dismal performance of the KBF-EnKBF and EnKBF dual filters when applied to the wave equation.

In Figure 10 are plotted the root mean square errors for the parameter estimates for both the EnKBF and KBF-EnKBF dual filters and the boxplots showing the dispersion of the parameter estimates resulting from the use of the EnKBF dual filter.

9. Conclusions

We have considered a special case of parameter estimation where the parameter to be estimated is spatially varying. Such a case arises, for example, in the velocity of a wave travelling through an inhomogeneous medium. We proposed and studied two approaches: the use of the filter likelihood with the Metropolis-Hastings procedure, and joint estimation of the state and parameters. The parameter is expressed as a Fourier series with constant coefficients. The coefficients are approximated and then substituted back into the Fourier series to obtain an approximation of the velocity. The Kalman-Bucy filter and the ensemble Kalman-Bucy filter are used. The filter likelihood with the Metropolis-Hastings procedure registers a better performance compared to the joint estimation procedure in both the advection and wave equations. From the foregoing, Metropolis-Hastings with the filter evidence performs well in the estimation of parameters compared to the dual filters, especially when the number of unknowns is large. This is indicative of the robustness of the Metropolis-Hastings algorithm in searching for, and remaining in, the high-probability region of the state-space.

Data Availability

The data were obtained from computer simulation using MATLAB software. The codes generating the results can be provided on request to the author via [email protected].

Conflicts of Interest

The author declares that they have no conflicts of interest.

Acknowledgments

The author would like to acknowledge the assistance of Professor Dr. Sebastian Reich, of Potsdam University, Germany, in supervising their doctoral research [17], of which this paper is part. The manuscript of this work can be found in [18].