Abstract

The paper is motivated by a present lack of clear model performance guidelines for shelf sea and estuarine modellers seeking to demonstrate to clients and end users that a model is fit for purpose. It addresses the common problems associated with data availability, errors, and uncertainty and examines the model build process, including calibration and validation. It also looks at common assumptions, data input requirements, and statistical analyses that can be applied to assess the performance of models of estuaries and shelf seas. Specifically, it takes account of inherent modelling uncertainties and defines metrics of performance based on practical experience. It is intended as a reference point both for numerical modellers and for specialists tasked with interpreting the accuracy and validity of results from hydrodynamic, wave, and sediment models.

1. Introduction

Although a need to standardise model build, calibration, and validation processes around one agreed approach is widely acknowledged, only limited guidance is available (e.g., [1, 2]), and often ambiguous and sometimes conflicting advice is offered in the grey literature (e.g., [3, 4]). A wide variety of different modelling practices are employed by consultants and academics, and frequently insufficient attention is given to the potential errors associated with the measured (and modelled) data used for model calibration and validation. This can result in poor model performance and unreliable model predictions. Without an agreed methodology and a performance standard for model calibration and validation, there is a risk that the quality of different approaches will vary, efforts will be wasted following inefficient or inappropriate calibration methods, and inconsistencies in methodologies will make model intercomparisons problematic.

This paper provides an evidence-based review and presents examples of calibration data sources and of model calibration and validation practices for estuarine and shelf sea models. It is intended to provide guidance to the assessment and use of model calibration data and to offer procedural clarity and simplification to the model calibration and validation process. In doing so, it acknowledges that some degree of compromise between the complexity of the natural system and the model representation must be reached. For this reason, the paper does not address complex modelling issues around wave-driven currents, littoral drift, and shoreline evolution where specialist models (e.g., the nonhydrostatic version of XBeach and CFD) must be employed.

Since the accuracy of the model calibration depends critically on the calibration data used, attention is given to some of the most common issues associated with data quality. The paper also provides the end users of model data with more specialist guidance on (a) modelling approaches, (b) the calibration procedures most frequently applied, and (c) the uncertainty in the model predictions. The paper draws on practical experience of modelling and expands on the earlier and limited guidance on model calibration and validation that focuses on Eulerian point-based criteria defining model performance (e.g., [2, 5, 6]). It also takes account of results and recommendations from modelling case studies where calibration issues have been the focus of the work (e.g., [7–10]).

Specifically, the paper describes (1) general factors that must be considered at the outset of all numerical modelling activities, (2) the quantitative assessment of model performance, (3) data sources and modelling guidelines for hydrodynamic, wave, and noncohesive and cohesive sediment models, and (4) morphological models. Special attention is given to one of the greatest challenges to the modelling community concerned with measuring and modelling sediment transport and associated erosion and accretion. While the focus of the work is based on practical applications of modelling shelf sea and estuarine processes, many of the issues discussed are relevant to a wide range of geophysical models.

2. What Is Model Calibration and Validation?

It is important from the outset to define the terms commonly used by numerical modellers: (a) calibration is a process which requires the adjustment of certain model parameters to achieve the best performance of the model for specific locations and applications; (b) verification ascertains if the model implements correctly the assumptions made; and (c) validation seeks to establish the agreement between the predictions and the observations (e.g., [11]). Validation is achieved by running the model using data covering an alternative period and/or a different location without making any additional adjustment to the model parameters (e.g., [12]). Of course, the accuracy of the model outputs cannot be proved to be greater than the accuracy of the original calibration data used, and validation does not imply verification, nor does verification imply validation. However, in practice, when measured data are available for the system being modelled, validation is often blended with verification [11]. If a comparison of measurements and model results suggests that the predictions from the model are close to the measurements, then the implemented model is assumed to be both a verified implementation of the assumptions and a valid representation of the system being modelled.

Irrespective of the model accuracy, the model calibration must (a) express the level of agreement achieved; (b) express how realistic the representation of the processes is; and (c) define the criteria by which it has been judged as being fit for purpose. The quantitative assessment of data error, accuracy, and uncertainty in models then defines metrics against which model performance can be judged.

As an illustration of the typical calibration and validation processes applied in most coastal and estuarine models, a schematic diagram of steps followed for a hydrodynamic model is shown in Figure 1. In the initial model run, model parameters are set to the recommended values provided by the modelling software guidance (i.e., “factory settings”). Critical parameters affecting model performance (e.g., bed roughness) are then adjusted to achieve the best possible agreement between model predictions and measurements. Care must be taken to ensure the values set for these adjusted parameters are physically meaningful and appropriate [1]. Achieving a good model calibration for the wrong reasons is as bad as a poorly calibrated model.

A useful first step in the calibration and validation process is the determination of the most sensitive parameters in the model. While expert judgment can be helpful, less-experienced model users should undertake sensitivity analyses. Here, the aim is to determine the rate of change in model output with respect to changes in model inputs (parameters). To undertake sensitivity analyses, it is necessary to identify key model parameters and to define the parameter precision required for the calibration (e.g., [13]). Sensitivity analysis approaches can be (a) local, where parameter values are changed one at a time, or (b) global, where all parameters are adjusted simultaneously. Both approaches have drawbacks. For example, the sensitivity of one parameter often depends on the value of other related parameters so that the correct values of other fixed parameters cannot be determined. In global sensitivity analyses, many simulations are required. Despite these drawbacks, both approaches provide insight into the sensitivity of the model parameters and are necessary steps in the model calibration process. However, “manual” calibration of models, where parameters are adjusted in a stepwise fashion, can be very time-consuming and inefficient.
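
As an illustration of the local (one-at-a-time) approach, the sketch below perturbs each parameter in turn about its default value and records the relative change in a scalar model output. It is a minimal sketch only: the run_model function and the parameter names are hypothetical placeholders for whatever interface the chosen modelling package provides.

```python
# Minimal sketch of a local (one-at-a-time) sensitivity analysis.
import numpy as np

def run_model(params):
    # Toy stand-in for the real model: returns a scalar performance measure
    # (e.g., RMSE of predicted vs observed water level). Replace with calls
    # to the actual modelling package.
    return abs(params["bed_roughness"] - 0.02) * 50.0 + abs(params["eddy_viscosity"] - 2.0)

base_params = {"bed_roughness": 0.025, "eddy_viscosity": 1.0}  # "factory settings"
perturbation = 0.10  # vary each parameter by +/-10%

base_output = run_model(base_params)
sensitivities = {}
for name, value in base_params.items():
    outputs = []
    for factor in (1 - perturbation, 1 + perturbation):
        trial = dict(base_params)
        trial[name] = value * factor
        outputs.append(run_model(trial))
    # Central-difference estimate of the relative change in output per
    # relative change in the parameter, so that parameters with different
    # units can be compared.
    sensitivities[name] = (outputs[1] - outputs[0]) / (2 * perturbation * base_output)

print(sensitivities)  # larger magnitude => more sensitive parameter
```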

The second step in the calibration process is undertaken to reduce the uncertainty in the model predictions. Normally, this uses carefully selected values for model input parameters and compares model predictions with observed data for the same conditions. In common with the process described in step 1, this is often done iteratively without any fixed rule and is guided by the experience of the user and knowledge of the processes being modelled. The third step in the calibration process involves validation of the model output of interest (e.g., water level, flow speed, and direction). Validation involves running a model using parameters that were determined during the calibration process and comparing the predictions to observed data not used in the calibration.

The use of automated techniques for model calibration is now widespread (e.g., [14]). Typically, autocalibration procedures rely on Monte Carlo or other sampling schemes to estimate the best choice of values for multiple input parameters. For example, the autocalibration procedure described by van Liew et al. [15] is based on the shuffled complex evolution algorithm of Duan et al. [16], which allows for the calibration of model parameters based on a single objective function. While autocalibration can provide a powerful, labour-saving tool that can be used to substantially reduce the frustration and subjectivity that frequently characterises manual calibrations, care must be exercised when using these approaches to ensure the theoretical boundaries for each specific input parameter are not violated.
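
A minimal sketch of such a sampling-based autocalibration against a single objective function (here the RMSE) is given below. The run_model function is a toy stand-in, and the parameter bounds are illustrative only; in a real application, they must respect the theoretical limits of each input parameter noted above.

```python
# Minimal sketch of a Monte Carlo style autocalibration.
import numpy as np

rng = np.random.default_rng(42)

observations = np.array([0.45, 0.62, 0.71, 0.66, 0.50])  # illustrative observed speeds (m/s)

def run_model(params):
    # Toy stand-in: predictions scale with the drag coefficient chosen.
    return observations * (0.02 / params["drag_coefficient"]) ** 0.5

def rmse(sim, obs):
    # Single objective function used to rank parameter sets.
    return np.sqrt(np.mean((sim - obs) ** 2))

bounds = {"drag_coefficient": (0.005, 0.05)}  # physically plausible range (illustrative)

best_params, best_score = None, np.inf
for _ in range(500):  # sample size is a trade-off against run time
    trial = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
    score = rmse(run_model(trial), observations)
    if score < best_score:
        best_params, best_score = trial, score

print(best_params, best_score)  # the best score approaches zero near Cd = 0.02
```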

Frequently, the evaluation of postcalibration numerical model results is subjective and based on specialist interpretation of graphical output only. Examples of this approach include water-level curves or discharge time series, current vector distributions, and spreading patterns of heat and spills. Indeed, for many practitioners, a good visual fit between model predictions and observations is often sufficient to demonstrate good model performance without the need to quantify this further. Objective measures of model performance are also not new and are used, for example, in the Deltares (semi-) automated model calibration tools and in adjoint modelling (e.g., [17]). However, the increasing complexity of model functionality, and the use of model output by technical end users requiring information on model accuracy to reduce risks, has led to an increasing need for better guidance on how to quantify and evaluate the performance of models. A description of the calibration process applied to a biological-physical model [18] provides a useful example of typical procedures followed in the calibration process.

3. General Considerations

Irrespective of the model being used, there are several generic elements that require consideration prior to and during the model build phase. These elements will each impact on model performance and include bathymetry, bed roughness, model grid setup, the incorporation of specific structures and features, data accuracy and uncertainty, and model boundary conditions.

3.1. Bathymetry

One of the most common problems associated with the calibration of a hydrodynamic model concerns errors in the underlying bathymetric data. The use of accurate bathymetry is pivotal in all shelf sea and estuarine modelling studies, and effort is required to ensure that the best possible bathymetric information is used. As standard practice, the analysis of bathymetric data should ensure (e.g., through a data review of the study area) that the most recent bathymetry survey data are used. Key features and contours should be checked against historical maps and charts. LiDAR data over the water surface must be discarded. Suitable grid dimensions should be determined that reflect the spatial distribution of the bathymetric data, and where data are already gridded, poor interpolations/reductions/extrapolations onto model grids must be identified by reference to the original data sources.

A summary of bathymetric and topographic data requirements for models is shown in Table 1 [3]. Here, a distinction is made between application types, with the most exact being associated with scheme designs (e.g., flood defences) with less accuracy required for appraisal and/or strategy studies. These distinctions are used in other tables and are useful as they define the accuracy of key data required to build a model for different applications. The correct use of the most appropriate data for a given application can save time and effort. While Table 1 reflects the bathymetric and topographic data requirements for modelling estuaries, including specifications for average distances between survey positions, the minimum acceptable channel cross-section spacing, survey age, and the age and resolution for LiDAR data, they provide equally useful guidelines for shelf sea models.

Careful checks on the horizontal and vertical survey datum should always be undertaken prior to any model runs, and models should always aim to use a common reference datum. Typically for vertical positions, national reference points (e.g., Ordnance Datum Newlyn (ODN) in the UK), chart datum (related to the lowest astronomical tide or to mean lower low water), or mean sea level (MSL) are used widely. However, while national reference points are useful in local-scale models, MSL has wider utility in larger regional models at all geographical locations. In UK waters, the Vertical Offshore Reference Frame (VORF (http://www.ucl.ac.uk/vorf)) provides spatial maps of values that can be used to convert between vertical datums. Similarly, VERTCON [19] and more recently VDATUM (http://vdatum.noaa.gov/) in the USA allow vertical transformation of geospatial data among a variety of tidal, orthometric, and ellipsoidal datums. In addition, satellite altimetry data can also be used to inform the offset between one or more tidal layers and the relevant satellite or geoid-based datum.

To illustrate a simple datum error, Figure 2(a) shows nearshore bathymetry from a coastal location in southern Portugal with a clear vertical datum problem. This issue is resolved in Figure 2(b) using a simple datum correction. This is a simple case for illustrative purposes only, and often datum errors are more complex and harder to correct.

Other errors can arise in hydrodynamic models due to (a) changes in charting properties (e.g., older Admiralty charts from the UK projected to OSGB, which has since been superseded by WGS84), (b) data types, which have inherent weaknesses (e.g., poorly interpolated bathymetry which may lead to an underestimation of depth and thus tidal volume), and (c) postprocessing using GIS or other “smoothing” software. To minimise the errors introduced in the model bathymetry, it is recommended initially to visually inspect the raw bathymetry using suitable software (e.g., Matlab or Fledermaus). Abrupt changes in elevations and spikes in data should be treated with caution. It is also helpful to examine gradients, and where possible, to compare interpolated data with known soundings.

Careful consideration should be given to data interpolation since interpolation routines can vary significantly between programs, and the options available (e.g., linear, nearest neighbour, inverse distance weighted, and spline methods) can also result in significantly different answers. Furthermore, some interpolation methods are better suited to sparse data sets (e.g., inverse distance weighted) and others to well-populated data distributions (e.g., nearest neighbour). The selection of the interpolation methods should always recognise this. It is important also to consider the scale of features on the bed that requires resolving in a model. For example, large bedforms, such as sand banks, redirect flows and must be resolved in the model. Smaller bedforms such as sand waves provide a resistance to the flow that can be parameterised through the bed friction term. Eliminating the need to resolve these features individually can reduce the model run time. In other applications, sand waves may need to be resolved on an individual basis to assess, for example, migration rates and pipeline or cable routes.
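
The sketch below illustrates one way of comparing two interpolation options on the same scattered soundings using SciPy's griddata; the synthetic survey data and the 10 m grid spacing are illustrative only, and the method names do not imply a recommendation for any particular data set.

```python
# Sketch comparing two interpolation options for gridding scattered soundings.
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1000.0, 400)                 # easting (m), synthetic survey positions
y = rng.uniform(0.0, 1000.0, 400)                 # northing (m)
z = -5.0 - 3.0 * np.sin(x / 200.0) - 0.002 * y    # synthetic depths (m)

# Target model grid at 10 m spacing.
xi = np.arange(0.0, 1000.0, 10.0)
yi = np.arange(0.0, 1000.0, 10.0)
XI, YI = np.meshgrid(xi, yi)

z_linear = griddata((x, y), z, (XI, YI), method="linear")
z_nearest = griddata((x, y), z, (XI, YI), method="nearest")

# The difference map highlights where the choice of method matters most
# (typically steep channel flanks and sparsely surveyed areas), and these
# areas should be checked against the original soundings.
difference = z_linear - z_nearest
print("max |difference| between methods:", np.nanmax(np.abs(difference)))
```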

Time and care spent ensuring that the underlying bathymetry has been correctly interpolated (datum, projection) and that it is free of spurious values and correctly represents the features of interest will contribute to improving model performance. Without good underlying bathymetric data, the task of trying to calibrate a hydrodynamic model will be extremely difficult, especially in shallow coastal and estuarine areas. For example, Cea and French [20] have investigated how errors in bathymetry can impact on the performance of estuarine shallow water models. They demonstrate that correcting errors in the measured depth can be significantly more efficient than a “classic” calibration approach based only on adjustment of the hydrodynamic roughness of the bed. Their proposed bathymetry calibration framework may offer improved performance from the current generation of numerical models. Further guidance on the use of bathymetry in models is given by Plant et al. [21, 22] and Mourre et al. [23].

3.2. Bed Roughness

The hydrodynamic roughness of the bed (hereafter termed “bed roughness”) is a primary calibration variable for all coastal and estuarine models. It is also essential for modelling other processes accurately such as sediment transport and wave attenuation. Irrespective of the method chosen for defining bed roughness, values are typically manipulated iteratively by the user within the ranges reported in the literature. Any bed roughness can generally be expressed as a so-called equivalent sand roughness, ks [24]. The equivalent sand roughness depends on the arrangement (pattern), distance (density), and shape of the roughness elements such as sand grains and ripples. However, in most models, bed roughness is typically parameterised by (a) a drag coefficient defined at a specified height above the bed, (b) the Manning number, n [25], or (c) the Chézy number, C [26].
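
As a simple illustration of how these descriptors relate, the sketch below applies widely used conversions (the White–Colebrook relation for Chézy C from ks, the Manning–Chézy relation, and Cd = g/C² for a depth-averaged drag coefficient); the depth and roughness values are illustrative only, and other conversion formulae exist.

```python
# Sketch of commonly used conversions between bed roughness descriptors.
import numpy as np

g = 9.81  # gravitational acceleration (m/s^2)

def chezy_from_ks(h, ks):
    """White-Colebrook relation: C = 18 log10(12 h / ks), with h and ks in metres."""
    return 18.0 * np.log10(12.0 * h / ks)

def chezy_from_manning(h, n):
    """Manning-Chezy relation: C = h^(1/6) / n (h approximating the hydraulic radius)."""
    return h ** (1.0 / 6.0) / n

def drag_from_chezy(C):
    """Depth-averaged drag coefficient: Cd = g / C^2."""
    return g / C ** 2

h = 10.0   # water depth (m), illustrative
ks = 0.05  # equivalent sand roughness (m), e.g., a rippled sand bed (illustrative)
C = chezy_from_ks(h, ks)
print(f"C = {C:.1f} m^0.5/s, Cd = {drag_from_chezy(C):.4f}")
```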

To illustrate the range of drag coefficient values appropriate for different estuarine and coastal environments, Table 2 shows empirically derived values of the drag coefficient C100 measured at 1 m above the bed for different bottom types. In the absence of data to define the bed roughness accurately, these “typical” values are often employed in model applications. However, in many cases, this is an oversimplification, and care must be taken to obtain as much information as possible about bed characteristics so that appropriate bed roughness values can be assigned.

In “industry-standard” models such as MIKE21 and Delft3D, “roughness maps” can be used to define the spatial distribution of bed roughness values across the model domain. A good account of this approach is given by Lefebvre and Lyons [27]. Figure 3 presents a typical example of a bed roughness map showing the spatial distribution of (a) the measured median grain size, D50, obtained from seabed samples and (b) the derived drag coefficient, Cd, which accounts for D50 and bedforms detected in a multibeam survey.

Bed roughness has been mapped by an ADCP (e.g., [28]), and high-resolution bathymetry and granulometry samples have been used by Huybrechts et al. [29] to derive bed roughness maps used in a TELEMAC model (cf. [30, 31]). Recently, the use of high-resolution multibeam sonar has provided bathymetric data at a resolution of less than 1 m and revealed the details of sea bed features (e.g., DORIS (http://www.dorsetwildlifetrust.org.uk/doris.html)) as well as provided information on sediment properties. The use of these data in modelling studies is currently experimental and requires high computing power to resolve the details. However, it offers the possibility of better defining bed roughness and thus may contribute significantly to reducing the effort needed for model calibration.

3.3. Model Grid Setup

The selection and setup of the model grid is a very important initial stage in the model build process. While some models still employ regular grid structures, most models now employ some form of flexible mesh usually comprising triangular elements. This approach allows high resolution of areas of interest and lower resolution over areas where bathymetry and/or processes are largely spatially invariant. In virtually all coastal and estuarine applications, the use of flexible mesh models provides the best model grid solution (e.g., [32]).

Taking a generic estuarine model as an example, some key points about model grids emerge: (a) the model grid should be designed to ensure the grid resolution can define the main morphological features (including structures) that could have influence on hydrodynamics; (b) narrow channels and banks should have at least three grid cells (preferably 5) to determine the base or crest widths; (c) as far as possible areas of increased grid resolution should follow the course of the main channels, particularly in a curvilinear grid; and (d) when considering the location of upstream and downstream boundaries in the model, boundaries should not be fixed too close to areas of interest. However, this may be constrained by the actual aims of the modelling as well as available boundary data. Further examples of the issues arising when defining a model grid are given by Hsu et al. [33], Kernkamp et al. [34], Liu and Ren [35], and Maynard and Johnson [36].

Table 3 shows an example of the typical model grid resolution required for estuarine models intended for studies of water levels and flow velocities [3]. Table 3 indicates also the minimum number of model grid points required to correctly represent features such as channels and sand banks in the model. Similar grid requirements apply equally to coastal models. It is noted that a fine grid resolution (<2 m) is required to correctly represent the deliberate breaching of flood defences when designing managed realignment schemes.

During the process of grid generation, irrespective of the modelling software used, interpolation of the bathymetry will take place. Again using the example of a generic estuarine model to illustrate some key points, the following checks should be made on completion of bathymetric interpolation processes: (a) the gridded bathymetry must show the same characteristics as the original bathymetry; (b) the gridding process must not displace and/or narrow/constrict channels; (c) different interpolation methods should be assessed; (d) channels must not be widened or narrowed, particularly when these make up a considerable proportion of the estuary cross section; and (e) depths adjacent to the boundaries should be inspected to ensure correct interpolation has occurred.

A further key point to note concerns the spatial resolution in the computational grid of a given model. Typically, a model prediction is only applicable at the spatial resolution defined by the computational grid. In contrast, the measurements, typically used to calibrate and verify model predictions, are obtained at a single location and represent the local environment only. Thus, when comparing a model prediction with a measured value at a point, consideration must be given to the tolerance of this spatial resolution. Most often, the grid cell most closely colocated with the declared deployment location is used in model/measurement intercomparisons. If the grid size is large, the single point measurement may not be representative of the wider area in the model grid element. Conversely, if the instrument is deployed on a mooring line that has the capacity to move (such as a wave buoy), then, through time, the instrument may pass through multiple grid points if the model has a fine-scale resolution.

In some circumstances, data extraction from a grid cell adjacent to the measurement location may better represent the actual conditions at the measurement point if that point lies close to the boundary of a model element. Depending on the model grid resolution, it is advisable therefore to extract model outputs from all grid cells adjacent to the measurement location and to make comparisons with the observations.

Taking the modelling of fluvial bedforms as an example, El Kheiashy et al. [37] discuss the selection of an appropriate model grid. In common with many modelling applications, a compromise must be reached between the resolution needed to define the bathymetry accurately and the consequent execution time required for a certain grid resolution. Their study showed that the apparent bed resistance (shear stress) and bedform steepness decreased with increasing grid spacing. Increasing the grid spacing also created artificial bedform fields giving rise to grid-dependent resistance. The model grid therefore has a significant influence on the model predictions. It is therefore important to be aware of these issues when interpreting model results and to check them whenever possible against all available data sources.

3.4. Model Boundary Conditions

Experienced modelling practitioners will ensure that the intended boundary type is being used at each open boundary and that the cell notation and order of data are correct. Indeed, most industry-standard models (e.g., Delft3D) give a visual representation of the boundaries for checking purposes. It is recommended practice to align boundaries with the dominant flow direction, tidal characteristics (avoiding amphidromes), waves, or geographic features.

The input data to estuarine and shelf sea model boundaries typically fall into 2 types: (a) a water level and (b) a flux (discharge). A water-level boundary is normally obtained from existing models or measurements. In the case of model-derived boundary data, knowledge of how the boundary data are produced by the larger model is required to define the accuracy and reliability (e.g., the number of constituents used and the spatial and temporal resolution). However, a water-level boundary may not be applicable in areas with little or no tidal height variation.

Noting that a flux (discharge) boundary carries momentum directly, whereas water-level variations across the model domain have to generate momentum, it can be argued that a flux (discharge) boundary condition is a more robust option for model calibration purposes. However, it is usually much more difficult to describe and apply. Reliance on water levels only can lead to serious model underperformance, and wherever possible, attempts must be made to use both water level and flux at the model boundaries.

So that the appropriate model calibration accuracy can be obtained, the following boundary condition issues also need to be fully understood: (a) spikes in modelled boundary data attributable to instabilities in the original boundary data and (b) the selection of boundary data from a larger model domain at unsuitable locations (e.g., close to land or to elements that dry).

A flux (discharge) can be applied at any model boundary (e.g., the point of freshwater input into an estuarine model). Generally, these data are provided by measurements or derived from a coarser-scale model. The quality of these data depends on the accuracy of the measuring device or model. It is recommended that in areas of small tidal variation, or where multiple boundaries are included, at least one boundary is of a flux (discharge) type. It may be noted that, by using flexible grids (or nested models), the model domain can be extended to provide more robust boundary conditions where large phase differences and gradient effects occur across the model domain. However, in practice, many model setups use only water levels as the primary driving force, and in many applications, this proves to be successful.

It is recommended that if measured tidal levels are used as model boundary conditions, then these are checked to ensure consistent phase and amplitude with values obtained from the harmonic constituents. While there will be small differences in amplitude attributable to meteorological effects, the phase should be very similar.
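
A minimal sketch of such a check is given below: a water-level series is reconstructed from a small set of harmonic constituents and compared with the gauge record. The constituent amplitudes and phases shown are hypothetical and would normally come from a harmonic analysis of a long record or from published constituent tables; here the “measured” series is synthetic (reconstruction plus a small meteorological residual) so the sketch runs as written.

```python
# Sketch of a consistency check between measured tidal levels and a
# harmonic reconstruction.
import numpy as np

SPEEDS = {"M2": 28.9841042, "S2": 30.0}                    # constituent speeds (deg/hour)
constituents = {"M2": (1.50, 120.0), "S2": (0.50, 150.0)}  # amplitude (m), phase lag (deg), hypothetical

def harmonic_levels(t_hours, z0=0.0):
    """Water level relative to datum: z0 + sum of A*cos(speed*t - phase)."""
    eta = np.full_like(t_hours, z0, dtype=float)
    for name, (amp, phase) in constituents.items():
        eta += amp * np.cos(np.radians(SPEEDS[name] * t_hours - phase))
    return eta

t = np.arange(0.0, 14 * 24.0, 0.5)              # half-hourly values over a fortnight
predicted = harmonic_levels(t)
measured = predicted + 0.08 * np.sin(t / 40.0)  # synthetic surge-like residual

# The residual should be dominated by meteorological effects; a consistent
# time shift of the peaks would instead indicate a phase error.
residual = measured - predicted
print("mean residual (m):", residual.mean(), "rms residual (m):", residual.std())
```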

4. Assessing Model Performance

In engineering and environmental modelling studies, the use of quantitative model evaluation methods is perceived as providing more objective, consistent, and reproducible model validation and assessment. However, it is also self-evident that the identification of systematic or random errors in model results can also be detected quickly by the human eye. In practice, the assessment of model output is most effective when both qualitative and quantitative approaches are employed. For example, in most shelf sea or estuarine applications, a combined visual and quantitative evaluation may be achieved by presenting the spatial distribution of current vectors for visual examination together with statistics that quantify differences between measured and predicted current speeds from several locations. These statistics can provide useful additional information about spatial coherence, correlations, and consistency and will often indicate explanations and origins of the possible differences between the model results and the measurements.

It is also important that model results receive expert assessment, ideally against a conceptual understanding of processes in each model domain established using a range of data resources. This might include some obvious checks on current speed, phase, and direction as well as more detailed investigations of sedimentation patterns. It is recommended that the initial assessment of model performance by whatever means should be undertaken before running models for extended periods. However, the period chosen for this preliminary examination depends on the processes being modelled. For example, a model of tidal currents run over one or two tidal cycles should be sufficient to determine how well the model is performing and which adjustments might be necessary. On the other hand, a model of sediment transport may require considerable time before the effects of net sediment movement are evident through changes in the bathymetry.

4.1. Error, Accuracy, and Uncertainty of Model Calibration Data

As it defines the metric against which model performance will be judged, the assessment of error, accuracy, and uncertainty in the data used for model calibration is an important step in the modelling process. Indeed, the accuracy of a numerical model is governed in part by the degree of error present in the model calibration data. It is essential therefore to quantify error, accuracy, and uncertainty through understanding of the instrumentation, the instrument deployment method, and its location as well as any data postprocessing issue.

It is necessary to distinguish between systematic and random measurement errors. All measurements are prone to systematic errors resulting, for example, from imperfect instrument calibration (zero error) and changes in the environmental conditions. Similarly, random errors are usually present in a measurement or other observations and result from inherently unpredictable fluctuations in the readings of a measurement apparatus or in the experimenter’s interpretation of an instrumental reading or the environment. Different results for ostensibly the same repeated measurement are a clear indication of a random error. The error can be quantified by comparing multiple measurements and reduced by averaging multiple measurements. Systematic errors cannot be detected this way because they always “push” the results in the same direction. However, when identified, they are easier to eliminate from a data set using trend removal techniques (e.g., regression analysis).

Instruments collecting data from different spatial locations may also apply range-dependent spatial averaging to the recorded data, leading to variable spatial resolution. For example, the horizontal averaging across spreading ADCP beams results in a measurement footprint that increases in size with distance from the instrument. Taking as an example the calibration of a 2D depth-averaged hydrodynamic model using ADCP data, it is first necessary to derive the depth-mean current from the ADCP measurement. This requires making assumptions about the vertical structure of the marine boundary layer (often occupying the region from the bed to the air-water interface) before temporally and spatially averaging the ADCP data to obtain a depth-mean current speed. These data processing steps introduce errors which are difficult to quantify. These problems are further compounded when attempting to extract a meaningful depth-mean representation of the current direction, especially in areas subject to significant current veering (e.g., adjacent to sand banks). Furthermore, if a given measurement footprint is within a highly turbulent flow field, then the accuracy of the mean flow measurement will be governed by the sampling time and can lead to significant errors if the flow is not sampled correctly at that location. With this example in mind, when comparing predictions from a grid point in a model with measurements from single or multiple locations, attention must be given to spatial and temporal inconsistencies that might lead to calibration error and/or bias.

4.2. Sensitivity Analyses

Sensitivity analyses are used to study how the uncertainty in the output from a model can be apportioned to different sources of uncertainty in its inputs. Sensitivity analyses are undertaken by varying input parameters (within a physically realistic range) and examining the model response. Sensitivity analyses can be useful for a range of purposes including (a) testing the robustness of model results in the presence of uncertainty, (b) increasing the understanding of the relationships between input and output variables in a model, (c) identifying errors in the model by encountering unexpected relationships between inputs and outputs, and (d) simplifying models by identifying model inputs that have no effect on the output, or identifying and removing redundant parts of the model structure. Sensitivity analyses can also help to reduce uncertainty by identifying the model inputs that cause the greatest uncertainty in the output, thereby allowing adjustments to increase the robustness of the model. Importantly, by making model results more understandable, compelling, or persuasive, sensitivity analyses can enhance interactions between modellers and the end users of modelling output. Sensitivity analyses are therefore a vital part of evaluating if a model is fit for purpose, and time must be set aside in any modelling study to undertake a credible model sensitivity study.

One area of sensitivity analysis that requires special consideration concerns the sensitivity of a given model to errors in the input data (e.g., bathymetry, water level, and depth-mean current speed). This is especially important when there are errors and/or uncertainty in more than one input data set which can result in compounded errors in the model output. For example, in an estuarine sediment transport model, errors in the water depth and/or current speed at a given location will result in an over- or underestimation of the bed shear stress. Since sediment transport is related to a power of the bed shear stress (typically quadratic for bed load and cubic for suspended load), small errors in predicted bed shear stress can result in large errors in predicted gross and net sediment transport.
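
A short worked example illustrates this amplification, assuming (for illustration) that bed shear stress varies with the square of the depth-mean current speed and using the transport exponents quoted above:

```latex
% Error growth for a +10% error in depth-mean current speed U, assuming
% \tau \propto U^2, bed load q_b \propto \tau^2, and suspended load q_s \propto \tau^3:
\[
\frac{\tau'}{\tau} = (1.10)^2 \approx 1.21, \qquad
\frac{q_b'}{q_b} = (1.10)^4 \approx 1.46, \qquad
\frac{q_s'}{q_s} = (1.10)^6 \approx 1.77 .
\]
```

That is, a 10% error in current speed becomes roughly a 21% error in bed shear stress, a 46% error in bed load, and a 77% error in suspended load, before any compounding over time is considered.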

4.3. Time Series and Statistical Output

In many cases, the presentation of data in time-series format helps to reveal the goodness of fit between model and observation data, with gaps between observed and predicted data indicating visually discrepancies between the model predictions and the calibration data. Calibration should aim to minimise these discrepancies, and statistical analysis should be used to quantify the goodness of fit. Additionally, it is also informative to compare like-with-like values using a scatter plot showing observed versus modelled values. Some examples are provided below.

To quantify the temporal aspect of the model calibration further, statistical approaches are used to demonstrate, in a clear and understandable way, that confidence can be placed in the model performance over the relevant time scales. The Danish Hydraulic Institute (DHI) Quality Indices Matrix, which calculates several goodness-of-fit statistics for comparisons between observations and simulated results, is an appropriate methodology to adopt. When necessary, and when data quality permits, additional types of analysis may be appropriate, such as Brier skill score analysis [38] or indices of agreement (e.g., [39]).

Simple statistics that demonstrate the level of agreement between measured/observed data and model prediction at a chosen location in the model domain include the mean and peak differences (often expressed as a percentage) and the standard deviation. In addition, there are several quality indices that can be used to demonstrate the statistical agreement between model predictions and observations (Table 4). In the table, Oi and Si are the measured and predicted values of a given parameter at time ti, respectively, and N is the total number of data points. The statistics are now defined.

Accuracy expresses the difference between the measured and modelled data, which is defined as difi = Oi − Si (often expressed as a percentage of the observed value). In all cases, the aim should be to reduce the value of difi to the smallest value practicable. Ideally, difi should not exceed 10%, although this will be highly variable depending on the parameter being considered and the accuracy of the calibration data used in the model. The accuracy of the modelled data can also be quantified using the root mean square error (RMSE) statistic (Table 4). The RMSE value is often expressed as a percentage, where lower values indicate less residual variance and thus better model performance.

The bias expresses the difference between an estimator’s expectation and the true value of the parameter being estimated and can be defined as being equal to the mean error in the data. Systematic bias reflects external influences that may affect the accuracy of statistical measurements. Detection bias is where a phenomenon is more likely to be observed and/or reported for a set of study subjects. Reporting bias involves a skew in the availability of data, such that observations of a certain kind may be more likely to be reported and consequently used in research.

The agreement or otherwise between measured/observed data and model prediction time series is frequently quantified using the Pearson product-moment coefficient, R (Table 4). It is essential to test the statistical significance of the correlation coefficient. In most cases, the Pearson method (one- or two-tailed) is appropriate. In statistical significance testing, if the null hypothesis is true, the p value is the probability of obtaining a test statistic at least as extreme as the one that was observed. One often “rejects the null hypothesis” when the p value is less than 0.05 or 0.01, corresponding, respectively, to a 5% or 1% chance of rejecting the null hypothesis when it is true. When the null hypothesis is rejected, the result is said to be statistically significant. In estuarine and shelf sea modelling studies, statistical significance at around the 95% confidence level is judged to be acceptable for most practical applications.

A range of statistical indices of model performance has been developed (e.g., [40–43]). The widely used Brier skill score, BSS [38], and Willmott’s dimensionless index of agreement [44] compare the mean square difference between the prediction and observation with the mean square difference between baseline prediction and observation. For example, perfect agreement gives a BSS score of 1, and negative values indicate that predictions are worse than the baseline value. van Rijn et al. [45] provide an interpretation of BSS values where 0 < BSS < 0.3, 0.3 < BSS < 0.6, 0.6 < BSS < 0.8, and BSS > 0.8 indicate poor, reasonable/fair, good, and excellent performance, respectively. However, it has been recognised that the larger errors, when squared, overweight the influence of those errors on the sum of squared errors. This issue has recently been addressed by Willmott et al. [46] who present a nontrivial improvement to the earlier index of agreement recommended for a wide range of model performance applications. Examples of model skill assessments for estuarine models are given by Sheng and Kim [47] and Warner et al. [48].
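
For reference, a sketch of the BSS in the form commonly quoted for such applications (e.g., [38, 45]) is given below, written with the notation of Table 4 and with Bi denoting the baseline prediction (often the initial, unchanged bed in morphological applications):

```latex
% Brier skill score: unity for perfect agreement, negative when the
% prediction is worse than the baseline.
\[
\mathrm{BSS} = 1 - \frac{\left\langle \left(S_i - O_i\right)^2 \right\rangle}
                        {\left\langle \left(B_i - O_i\right)^2 \right\rangle}
\]
```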

The scatter index, SI, is the RMSE normalised with the mean value. In most cases, the scatter index provides a useful indication of the model performance. However, taking wave model results as an example, the scatter index may appear to understate the skill of the model, as it tends to be large in shelf sea applications. The reason is that the RMSE of the significant wave height is normalised with the average significant wave height, which is usually rather small in shelf sea regions. For example, an RMSE of 0.25 m in the significant wave height in complex field conditions seems reasonable, but if the mean value is only 0.5 m, the scatter index attains the rather high value of 50%. The diagnostic model performance index MPI indicates the degree to which the model reproduces the observed changes of the waves. Like the scatter index, it is defined in terms of RMSE values in the form MPI = 1 − (RMSE/RMSC). Here, the definition of RMSC is identical to that of RMSE, except that all Si values are replaced by the incident Oi values. For a perfect model (RMSE = 0), the value of the MPI would obviously be 1, whereas it would be 0 for a model that (erroneously) predicts no changes (RMSE = RMSC) (cf. [49]).
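
As a practical illustration, the sketch below gathers the point-based indices discussed in this section into a single function, using the Table 4 notation (obs for Oi, sim for Si); the optional incident argument corresponds to the incident Oi values used to form RMSC for the MPI. This is a generic implementation of the standard definitions, not the DHI tool itself, and the example values are illustrative.

```python
# Sketch of the point-based quality indices discussed above.
import numpy as np

def quality_indices(obs, sim, incident=None):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    err = sim - obs
    rmse = np.sqrt(np.mean(err ** 2))
    bias = np.mean(err)                       # mean error
    scatter_index = rmse / np.mean(obs)       # SI: RMSE normalised with the mean observation
    r = np.corrcoef(obs, sim)[0, 1]           # Pearson product-moment coefficient
    stats = {"RMSE": rmse, "bias": bias, "SI": scatter_index, "R": r}
    if incident is not None:                  # MPI = 1 - RMSE/RMSC
        rmsc = np.sqrt(np.mean((np.asarray(incident, float) - obs) ** 2))
        stats["MPI"] = 1.0 - rmse / rmsc
    return stats

obs = np.array([0.42, 0.55, 0.61, 0.48, 0.39])  # measured speeds (m/s), illustrative
sim = np.array([0.40, 0.58, 0.66, 0.45, 0.42])  # modelled speeds (m/s), illustrative
print(quality_indices(obs, sim))
```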

5. Hydrodynamic Models

5.1. Data Sources

Water-level gauges and pressure sensors typically provide information on the water level relative to a defined datum at a suitable temporal resolution (typically no more than 30-minute intervals). Ideally, water level and current information should be obtained from as many key locations within the model domain as possible, and specifically, in areas of interest and areas of significant variation. Typically, errors associated with these kinds of data include (a) incorrect time references (e.g., GMT/BST), (b) errors in datum corrections (see below), (c) errors in correctly defining the measured data locations in the model domain, and (d) instrument calibration error. Problems with the measuring device often appear as offsets and/or spikes in the measured data. Spikes should be either removed or substituted with artificial data. Interpolation over large gaps in the data should not be attempted, and alternative data sources with better temporal cover should be sought.

To define the confidence limits of the measured data, a quality review is required. This may result in the rejection of some data, or the adoption of other data with stated caveats. The more data that are available (depending on quality, format, and spatial and temporal resolution), the more reliable the model calibration is likely to be. To minimise potential uncertainties in model performance and to optimise model calibration, common misunderstandings and typical errors and uncertainties in hydrodynamic model input data are described below along with some suggested approaches which can aid model setup and calibration.

5.2. Performance Guidelines
5.2.1. Water Level

A model calibration for water level should include examination of amplitude, phase, and asymmetry. Specifically, the test should look at (a) differences in maximum and minimum surface elevations; differences in tidal phase, at high and low water; and RMSE (noting that this is not corrected for bias, and unless the bias is insignificant, this parameter can be difficult to interpret), (b) bias, and (c) scatter index (SI). It is recommended that the minimum-level model performance required for shelf sea areas is (a) water levels to within ±0.10 m (or to within 10% and 15% of spring and neap tidal ranges, resp.) and (b) timing of high water to within ±15 minutes. For estuaries, it is recommended that the minimum-level model performance required is (a) water levels to within ±0.10 m at the mouth, ±0.30 m at the head (or to within 10% and 15% of spring and neap tidal ranges, resp.) and (b) timing of high water at the mouth to within ±15 min, ±25 min at the head.

5.2.2. Current Speed

In 2D depth-averaged hydrodynamic models, current speed predictions should be examined with respect to amplitude, phase, direction, and asymmetry. Specifically, the test should look at (a) differences in peak flow speeds (ebb and flood tides), (b) mean flow direction, (c) RMSE, (d) bias, and (e) SI. However, appropriate depth-averaged current speed values must normally be derived from either point measurements at some reference height in the water column or measured vertical current profile data (e.g., ADCP data). In both cases, depth-averaged current speed can be calculated using the 1/7 power law (e.g., [50], p. 49) or similar. Normally for 3D hydrodynamic models, ADCP data can be used directly for calibration at one or more levels in the model. However, if the model layers are large in vertical extent, they may span one or more ADCP measurement bins, and the 1/7 power law or similar must be applied to interpolate an appropriate current speed value for the model layer.
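
A minimal sketch of the point-to-depth-mean conversion is shown below, using the 1/7th power law profile in the form U(z) = Ubar (z/0.32h)^(1/7) (cf. [50]); the measurement height, water depth, and speed are illustrative only.

```python
# Sketch of converting a point current measurement to a depth-mean value
# using a 1/7th power law velocity profile.
def depth_mean_from_point(u_z, z, h):
    """Depth-mean speed from a measurement u_z (m/s) at height z (m) above
    the bed in water depth h (m), assuming a 1/7th power law profile."""
    return u_z / (z / (0.32 * h)) ** (1.0 / 7.0)

# Example: a current meter 1 m above the bed in 10 m of water.
print(depth_mean_from_point(u_z=0.80, z=1.0, h=10.0))  # approx. 0.94 m/s depth mean
```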

It is recommended here that predicted current speeds from 2D and 3D hydrodynamic models in shelf sea areas and estuaries be within ±0.20 m/s (or ±10% to ±20%) of the measured speed. To express the accuracy of tidal current speed predictions by models, Cefas (www.cefas.defra.gov.uk/media/.../report-on-first-asmo-workshop.pdf, accessed March 2014) expresses performance in terms of error in the maximum predicted velocity so that errors of < 0.05 m/s, < 0.1 m/s, < 0.2 m/s, and > 0.2 m/s indicate very good, good, moderate, and poor performance, respectively.

Results from statistical analyses of model performance need to be interpreted with care. The RMSE value provides a quantitative measure of how well the model fits the data based on the mean of the data. However, if there is significant bias in the data, then this goodness-of-fit measure is not an appropriate statistic to use. It is recommended here that bias < 0.2, SI < 0.5, and RMSE < 0.2 demonstrate a statistically significant fit.

5.2.3. Current Direction

Since current direction is derived from vector quantities, it cannot be treated in the same way as other parameters (e.g., speed). However, the accuracy of predicted current direction can be examined using time-series plots and quantified, for example, using bias and SI statistics. To remove ambiguity from current direction data, the following steps are recommended: (a) detect whether the absolute difference between the directions is greater than 180°; (b) if it is greater than 180°, then add 360° to the lesser direction before subtracting the greater direction; or (c) if it is less than 180°, then calculate the absolute difference between the directions. This method returns an absolute (positive) value describing the difference in directions which will always be less than 180°. For practical applications, it is suggested that preserving the sign (negative or positive) of the direction difference is not necessary, as retaining the sign prevents a meaningful mean bias from being calculated from those differences. Once the absolute difference between the directions has been calculated, it is possible to calculate the bias. For shelf sea areas and estuaries, the minimum-level model performance is recommended here to be ±10° and ±15°, respectively.
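
The sketch below implements the direction-difference steps listed above; directions are in degrees, and the returned value is always a positive difference of at most 180°.

```python
# Sketch of the absolute direction-difference procedure described above.
def direction_difference(dir_a, dir_b):
    """Absolute difference between two directions (degrees), never more than 180."""
    diff = abs(dir_a - dir_b)
    if diff > 180.0:
        # Add 360 to the lesser direction before subtracting the greater.
        diff = abs(min(dir_a, dir_b) + 360.0 - max(dir_a, dir_b))
    return diff

# Examples: 350 deg and 10 deg differ by 20 deg, not 340 deg.
print(direction_difference(350.0, 10.0))   # 20.0
print(direction_difference(100.0, 130.0))  # 30.0
```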

5.2.4. Bed Shear Stress

Except for some specialist research instruments, for example, a field-deployable shear plate prototype reported by Oebius [51] and laboratory-based shear plates reported by Grass et al. [52] and Rankin and Hires [53], no reliable direct way of measuring the bed shear stress is yet available. For most practical applications, the use of measurements to calibrate/validate bed shear stress values predicted by a model is therefore not possible.

When considering bed shear stress in the context of hydrodynamic and/or sediment transport, it is critically important to distinguish between the skin friction component of total bed shear stress responsible for sediment mobilisation and transport and the form drag imposed on the flow by pressure losses in the wake of bed obstacles such as bedforms. Most models predict the total bed shear stress using the quadratic stress law. This relates a depth-average flow speed to stress via a drag coefficient that characterises the hydrodynamic “roughness” of the bed. For skin friction bed shear stress, the roughness parameter expresses the drag attributable only to sediment grains. Form drag (in part responsible for maintaining suspended sediment status) is then obtained through a partitioning approach. It is very important to understand how a model deals with drag partitioning and to use any resulting estimate of bed shear stress correctly. Soulsby [50] provides a clear account of bed shear stress components and their calculation and application.
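
As an illustration of the quadratic stress law, the sketch below computes a total bed shear stress from a depth-mean speed using a drag coefficient derived from the logarithmic velocity profile (a form given by Soulsby [50]); the roughness length and flow values are illustrative, and the skin friction/form drag partitioning discussed above is not represented here.

```python
# Sketch of the quadratic stress law for total bed shear stress.
import numpy as np

RHO = 1027.0   # sea water density (kg/m^3)
KAPPA = 0.40   # von Karman constant

def drag_coefficient(h, z0):
    """Depth-averaged drag coefficient from a log profile: Cd = [kappa / (1 + ln(z0/h))]^2."""
    return (KAPPA / (1.0 + np.log(z0 / h))) ** 2

def bed_shear_stress(u_bar, h, z0):
    """Quadratic stress law: tau0 = rho * Cd * Ubar^2 (N/m^2)."""
    return RHO * drag_coefficient(u_bar if False else h, z0) * u_bar ** 2 if False else RHO * drag_coefficient(h, z0) * u_bar ** 2

# Example: 1 m/s depth-mean flow in 10 m of water with a roughness length of
# 6 mm (illustrative of a rippled sand bed) gives Cd ~ 0.0039 and tau0 ~ 4.0 N/m^2.
print(bed_shear_stress(u_bar=1.0, h=10.0, z0=0.006))
```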

When dealing with subgrid-scale bedforms, it is normal to parameterise bed roughness using a friction coefficient or an equivalent grain roughness. This can vary spatially (and in some cases temporally) and provides a means of moderating or enhancing the local bed shear stress and thereby “tune” the model against observational flow data. This needs to be undertaken with care to avoid implementation of unrealistic friction coefficient values, and guidance on appropriate friction coefficients should be sought (e.g., software Guides for the model being used; Soulsby [50]). To avoid significant over- or underestimation of sediment transport, it is recommended that bed shear stress requires estimation to within ±0.05 N/m2 for shelf sea and estuarine models. However, small errors in bed shear stress can be compounded over time, especially in morphological models. It is also noted that bed shear stress data can be usefully postprocessed to obtain estimates of bedforms and bed load and suspended transport using a range of empirical formulae [50]. However, these estimates are constrained by the data used to generate them and the accuracy of the algorithms used to estimate hydrodynamic stresses.

A simple illustration of a depth-averaged hydrodynamic model calibration using bed roughness is shown in Figure 4. Figure 4(a) shows time series of measured and predicted current speed at locations P1 and P2 in the mouth of a small tidal inlet. In this initial model run, a drag coefficient, Cd, value of 0.035 is assumed, leading to an underestimation of the current speed by the model. Reference to available bed sediment data suggests that a Cd value of 0.02 is more appropriate, resulting in much better agreement between the measured and predicted current speed values (Figure 4(b)). However, it was also noted that erosion had recently occurred in the inlet since the last bathymetric survey. Iterative adjustments to the water depth in a subsequent series of model runs finally resulted in very good agreement between the measured and predicted current speed (Figure 4(c)). The lowering of the bed of the inlet channel by 0.45 m was subsequently confirmed by a repeat bathymetric survey undertaken after the modelling was completed.

As a further example of calibration targets to achieve, required hydrodynamic model performance statistics for estuarine flooding models from Defra/EA [3] are shown in Table 5. The statistics include RMSE for storm surge elevation (hsurge), RMSE for high water levels (hmax), the tolerances for predicted peak water levels, RMSE for flow velocity (U), the tolerances for predicted fluvial inputs (Q), the flood area required to be predicted correctly for two or more historical floods (A), and the predicted flood depth error (derr). While these statistical tests are specific to the flooding application and are exacting since flood predictions must be accurate, they are typical of the model performance criteria that should be used for all shelf sea and estuarine models.

6. Wave Models

6.1. Data Sources

Typically, there are four sources of data available for use in wave model calibration: visual observations; buoys and platforms; satellites; and numerical models driven by surface wind fields. Each has a different level of confidence and uncertainty, and good accounts of wave measurement and data are given by Steele et al. [54], Komen et al. [55], Krogstad et al. [56, 57], and Lindroth and Leijon [58].

6.1.1. Visual Observations

Visual observations of wave parameters (height, period, and direction) taken from ships of opportunity are sometimes available for long periods (decades). This data source has clear limitations and many potential errors, particularly in stormy weather (cf. [59]). Other significant limitations include the number of observations and extent of data coverage which is typically limited to shipping lanes. However, there are many areas of the world where wave measurements by other means are absent, and, particularly for past periods, visual observations may be the only source of information.

6.1.2. Buoys and Platforms

Surface-following buoys are the most common instrument used to measure waves, with deployment depths between 10 m and a few hundred metres (cf. [56, 57, 60]). There is a large variation in the quality of the data available from these devices depending on their age and type. Typically, the latest devices can capture an estimate of the main wave parameters (significant wave height, Hs; mean and peak periods, Tm and Tp; and related directional information such as mean direction, mean directional spread, kurtosis, and skewness for the full 2D spectrum). Typically, the measurements are taken at 1-hour intervals. Wave-measuring buoys are accurate instruments, and the related error for Hs is usually only a few percent. Uncertainty occurs due to sampling variability and resolution of the frequency distribution (peak periods). In the high Hs range, buoys tend to “slip” around the highest crests, introducing a negative bias in the estimation of the higher wave height values.

6.1.3. Remote Sensing

Satellite altimeter and scatterometer data are now available from a range of platforms (e.g., ERS2, ASCAT METOP-A, Krogstad and Barstow [61]). The altimeter provides information on wind speed and wave height, and the scatterometer provides a wider band of information on wave and wind parameters. However, in areas of complex geometry (typically shelf sea and estuarine areas), satellite data usually provide a poor estimate of sea state due to the strong spatial gradients which cannot be well resolved by the satellite sensors. Other limitations include poor temporal coverage due to satellite overpass frequency which can prevent acquisition of high-frequency time series for a chosen location. However, developments reported by Young et al. [62] demonstrate clearly that useful global wave data sets can now be assembled using data from a range of remote sensing platforms and that these data have high utility in regions of the world where wave data are scarce. Wave data can also be obtained using HF (e.g., [63]) and X-band (e.g., [64]) radar systems and through the use of video (cf. Argus Video [65]). Although these approaches require some calibration, each has a capability of measuring nearshore waves and can help calibrate and validate wave models in complex regions where reflection, refraction, and diffraction processes may be present.

6.2. Performance Guidelines

Typically, for waves, the required model performance at the calibration and validation stage is judged to be acceptable if the wave model outputs are biased to within (a) ±10% of the mean observed height, (b) ±20% of the mean observed period, and (c) ±15° of the mean observed direction. Considering design, appraisal, and strategy applications, Table 6 provides practical wave model performance guidelines concerned with model resolution, minimum record (or hindcast) lengths required to define extreme wave statistics, and RMSE values for Hm0 and average peak Hm0 [3].

These wave model performance statistics are intended only as a guide, and often more stringent agreement between observed and modelled data may be required. Equally, these criteria might be too exacting to meet at all locations in the modelled area. Meeting these criteria for at least 90% of position/time combinations is likely to be a less stringent and acceptable criterion in most circumstances. In cases where waves from more than one direction are present simultaneously (e.g., swell and wind sea), mean wave direction is meaningless and reference must be made to the directional wave spectra to characterise the observed and modelled wave field. Scatter plots and correlation statistics are also useful to demonstrate agreement between measured and modelled wave direction for multidirectional sea states. It is also helpful in some circumstances to examine directional wave spreading since many third-generation spectral wave models tend to underestimate this parameter (cf. [66, 67]).

Examples of useful plots that help assess wave model performance are shown in Figure 5, which compares modelled Hs, Tp, and direction data from the SWAN model [68] with measured values from a Directional Waverider buoy. Good agreement between the model and the observations is demonstrated. Figure 6(a) shows a scatter diagram of measured and modelled Hs. Here, the line of unity indicates that the model slightly overestimates Hs. In Figure 6(b), a peak over threshold (POT) analysis for modelled and measured Hs data is shown. These diagrams both help identify agreement or otherwise between measured and modelled wave data, and it is recommended that these visual checks are used when evaluating wave predictions.

In the case of spectral wave models, it is also helpful to examine the frequency domain differences in the energy distribution between modelled and measured spectra. The same approach applies to directional wave spectra, noting, however, that it is unusual to have measured wave spectra from more than one location in the model domain.

7. Sediment Models

Accurately simulating the behaviour of sediments in numerical models presents one of the greatest challenges. The principal aim of sediment models is to reproduce the observed spatial and temporal variations in erosion and accretion. Here, guidance is provided for the calibration of numerical models for sand (median grain diameter, D50 > 63 µm) and silt/mud (D50 < 63 µm); sediment coarser than sand (e.g., shingle) cannot be represented reliably through 2-dimensional modelling. Attention is first given to the essential data required to successfully calibrate sediment models of estuaries and open shelf sea environments. The methods used to measure bed load, suspended load, and net sediment transport are reviewed briefly, drawing attention to potential errors and uncertainties that must be considered in the model calibration process [69]. The issues associated with the calibration of cohesive, noncohesive, and mixed grain-size sediment models are then discussed.

It is emphasised from the outset that a primary requirement of all sediment modelling is accurate information about the physical properties of the sediment (grain-size distribution, bulk density, porosity, etc.), bedforms (active and moribund), and the spatial distribution and thickness of the sediments. Sight should also not be lost of the fact that, even where the physical characteristics of the sediments are well described, biological mediation and the behaviour of some cohesive sediments remain difficult to parameterise in models [70]. Although some of these problems can be overcome using in situ measurements (e.g., field flumes [71]), these are costly to deploy and are not normally undertaken in practice.

7.1. Data Sources

Obtaining suitable data of sufficient quality for the calibration of a sediment model is a widely recognised challenge [72]. In addition to sediment data obtained directly from in situ water and bed samples (grabs), typically, there are two primary types of data which are required for the calibration of sediment models: (a) measurements of the sediment transport flux (bed load and suspended load) over time scales of a few tidal cycles and (b) measurements of bed-level changes attributable to local erosion and accretion to provide information on net sediment transport over a period of weeks and months. A comprehensive review of instrumentation used to measure sediment transport is given by Williams [73].

7.1.1. Measuring Bed Load and Bedforms

In estuaries and shelf sea environments, bed load is the dominant mode of sediment transport for sand. Sediment traps, frequently used to measure bed load in rivers, have been deployed in estuarine and shelf sea environments with mixed success (cf. [74]). The Arnhem, Helley–Smith, and Delft Nile samplers are the most commonly used devices owing to their robustness and ease of handling in the field. However, their accuracy depends on the number of samples collected which may be restricted by high analysis costs.

As an alternative to traps, bed load is sometimes estimated using fluorescent tracers (e.g., [32]) and less frequently using magnetic tracers (e.g., [75]). The method involves deploying a quantity of material at a known location, with subsequent sampling campaigns on a grid of sample positions to determine the dispersion of the material. Both approaches use materials with the same dynamic behaviour as the natural sediments and with sufficiently distinct characteristics to make them easily detectable in very low concentrations (http://www.partrac.com/ (accessed on 1 August 2014)). Sediment tracing techniques have value in studies examining sediment flux and have been used effectively to study dredging impacts and disruptions to sediment supply attributable to structures. A comprehensive review of tracing techniques and options is given by Black et al. [76].

Passive acoustic techniques using hydrophones to record the sediment-generated noise (SGN) arising during bed load transport of coarse sediments have been used (e.g., [77]), but to be effective, they require objective calibration which can be very costly. However, improvements in processing software and computing power now allow automated analysis of video images to detect particle displacements at subsecond temporal resolution across the entire field of view, making the analysis of bed load imagery a useful source of data. Attempts to quantify bed load have also exploited the bottom tracking feature of ADCPs in combination with conventional pressure difference samplers. Together, these instruments can be used to determine the bed load transport velocity and bed load transport rate, respectively (e.g., [78, 79]). Rates of bed load transport have also been inferred from rates of bedform migration measured using rotary sonar devices (e.g., [80, 81]). At a much larger scale, remote sensing techniques have been applied to link large bedform migrations with bed load sediment transport rates (e.g., [82]).

7.1.2. Measuring Suspended Load

Only very fine sand, silt, and mud are transported in suspension in estuarine and shelf sea environments. In the simplest approach used to quantify the suspended load, water samples are collected in situ to determine the concentration of suspended particulate matter (SPM) and the grain-size distribution either at the surface or at a specified depth in the water column using, for example, triggered water bottles or pump sampling (cf. [83, 84]). Samples can be collected either at discrete times or at set times throughout a tidal cycle. There is a low-to-moderate level of certainty in the resultant SPM data owing to potential errors in the way the water samples are collected, the short temporal sampling period, and the presence of varying quantities of organic particles. The deployment of colocated current meters enables the sediment flux to be determined, which in turn can provide useful information on sediment resuspension and settling velocity.

Turbidity meters can provide continuous or discrete measurements of SPM concentrations by detecting the attenuation of light passing through the instrument’s sampling volume. They are best suited to suspensions of silt and clay-size particles. Self-logging turbidity meters are capable of recording turbidity accurately at a single depth within the water column for long periods [85]. Turbidity data are also obtained using a CTD probe (conductivity, temperature, and depth) equipped with a turbidity sensor. In estuaries and shelf sea locations, CTDs are normally lowered and raised through the water column for a period of 12.5 hours to provide information on the temporal changes in the SSC profile during a tidal cycle. However, there is the potential for the sensors to become fouled over time, giving erroneous data, especially if the sensor becomes exposed during low water and the optical systems are compromised by sediment and/or biological films, leading to unrealistically high turbidity values. In addition, the gain setting (sensitivity of the instrument) must be correctly adjusted to accommodate the range of SPM concentrations in the area. For example, turbidity measurements could saturate at the upper limit of the instrument if the gain setting is too high or barely register turbidity if the gain setting is too low. This problem can be overcome using an instrument with a logarithmic response which allows measurements of SPM spanning several orders of magnitude. Although this overcomes the problems associated with saturation and aliasing, the overall instrument precision is reduced as a result.

Optical turbidity instruments are calibrated using primary solutions such as formazin, and turbidity is expressed in formazin turbidity units (FTU) or nephelometric turbidity units (NTU). The main problem with this approach arises when conversions are made between FTU (or NTU) and in situ water samples, where differences can be as large as ±200% [86]. Furthermore, the material in suspension may be a complex mixture of organic and inorganic particles, which adds further complexity to the conversion between FTU and SSC. Calibration is therefore required to convert turbidity readings into SPM concentrations in units meaningful for sediment modelling purposes (e.g., mg/l). This can be achieved either by collecting water samples at specific times to calibrate the measurements or by calibrating the instrument in laboratory conditions for a range of concentrations before and after deployment. It is important that calibration is performed over an applicable range of SPM concentration values. However, there are additional problems attributable to flocculation when measuring muddy sediments. In these cases, water sampling can destroy the delicate floc structures, and changes in temperature/salinity can enhance or reduce flocculation potential. Both factors can lead to errors, and thus, these kinds of data must be treated with caution in the context of model calibration [87].
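As a simple illustration of the calibration step described above (and not a prescription from the cited references), the following Python sketch fits a linear conversion from turbidity readings to SPM concentration using paired water samples; the numerical values are hypothetical, and the fit is only valid within the sampled range.

import numpy as np

# Paired calibration data: turbidity readings (NTU) logged at the same times as
# gravimetric SPM concentrations (mg/l) from water samples (values hypothetical)
ntu_samples = np.array([5.0, 12.0, 30.0, 55.0, 90.0])
spm_samples = np.array([8.0, 20.0, 48.0, 95.0, 150.0])

# Least-squares linear fit: SPM = slope * NTU + intercept
slope, intercept = np.polyfit(ntu_samples, spm_samples, 1)

def ntu_to_spm(ntu):
    """Convert a turbidity reading (NTU) to an SPM concentration (mg/l).
    Extrapolation beyond the calibrated range should be flagged, not trusted."""
    return slope * np.asarray(ntu) + intercept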

Optical backscatter sensors (OBSs) measure turbidity and suspended solids concentrations by detecting infrared light scattered from SPM (cf. [88, 89]). OBS instruments are best suited to suspensions of silt and clay-size particles. The response of the OBS sensors strongly depends on the size, composition, and shape of the suspended particles, and calibration like that used for turbidity sensors is required to obtain SPM concentration data. OBS instruments are subject to the same problems with biofouling and other optical contamination as turbidity sensors (e.g., [90]). SSC profiles can be obtained using vertical arrays of OBS. Further information is given by, for example, Kineke and Sternberg [91], Hoitink and Hoekstra [92], and Boss et al. [93].

Aerial and satellite remote sensing imagery can be used in some circumstances to indicate the advection rate and direction of suspended sediment plumes in the surface and near-surface layers of the water column. Remote sensing algorithms have been widely used to extract information on suspended sediment concentrations from multispectral sensor data (e.g., [94, 95]).

For sand-size particles, the use of multiple-frequency acoustic backscatter (ABS) to measure the concentration of suspended sediment is becoming more widespread. Inversion techniques can be applied to obtain suspended sediment concentration (SSC) profiles directly (e.g., [96]), and information about the grain size in suspension can also be extracted (e.g., [97]). Typically, these instruments measure SSC at intervals of 1 cm up to a few metres above the bed where the bulk of suspended sediment is present. SSC profiles can also be derived from ADCP data, albeit with less spatial resolution, using a similar acoustic inversion technique (e.g., [98]). While one or more samples are required for calibration and measurements are spatially averaged, the instrument can provide useful SSC profile information over extended periods.

7.1.3. Estimations of Net Sediment Transport

In many modelling studies, the required outcome concerns the prediction of net sediment transport over periods of days, weeks, or months. Several types of data can assist the model calibration process to this end. In areas where frequent (e.g., annual) maintenance dredging is undertaken, information is likely to be available to describe the frequency and volumes of sediment removed. These data can be used to define changes in bed levels (e.g., accretion amounts and rates over known periods), and through comparisons between predicted accretion rates and rates derived from dredging data, it may be possible to calibrate a sediment model, albeit with limited accuracy (e.g., [99]). In addition, dredging volume data can be used to provide an indication of the interannual variability of accretion and guide the modelling process. However, owing to sediment loss during the dredging activities and to uncertainty about the bulk density of the material removed, these measurements may not be as accurate as might be desired and should only be used to provide an indication of the volumes of sediment accreting in the area. It is not possible to attribute accretion to a particular mode of transport, and thus, sediment formulae that predict total sediment transport must be used.

Several acoustic systems have been developed to image the bed at a large scale, including echo sounding devices and side-scan and multibeam sonar (e.g., [100–102]). These data can be used to determine net sediment budgets and transport pathways and assist model calibration (e.g., [103, 104]). Repeat subtidal bathymetric surveys can provide valuable information on bedform mobility from which net sediment responses can be determined (e.g., [105]). At the scale of estuaries, Mason et al. [77] illustrate how areas and volumes of sediment accretion and erosion can be estimated using the waterline method employing remote sensing and hydrodynamic modelling. Recent advances in LiDAR now make it possible to penetrate water to depths exceeding 10 m, provided water clarity is good enough, and thus allow subtidal survey opportunities (e.g., [106]). Monitoring of large-scale changes in morphology and/or bathymetry in coastal and estuarine environments brought about by sediment mobilisation, transport, and accretion can also now be undertaken routinely with systems such as ARGUS (http://www.planetargus.com/ (accessed on 1 August 2014)) and X-band radar (http://www.oceanwaves.de/ (accessed on 1 August 2014)) (e.g., [107]). Although remote sensing would never be selected to generate a primary bathymetric data set, it has been used in situations where monitoring of rapid bathymetric changes may be required (e.g., following beach nourishments, breaching, etc.).

Bathymetric and topographic survey data are obtained at irregularly spaced locations. To make the data usable in a numerical model, it is necessary to use an interpolation routine to transform the data onto a grid of regularly spaced data points. Care must be taken to select the most appropriate interpolation method as any error will impact on calculations of change in bathymetry and topography.
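As an illustration of this gridding step (a sketch only, with a hypothetical input file name and an assumed grid spacing), the following Python code interpolates scattered survey points onto a regular grid using triangulation-based linear interpolation, with nearest-neighbour values used only to patch cells falling outside the surveyed area.

import numpy as np
from scipy.interpolate import griddata

# Scattered survey points with columns x, y (m) and z (m relative to datum);
# "survey_points.txt" is a hypothetical file name used here for illustration
xyz = np.loadtxt("survey_points.txt")
xi = np.arange(xyz[:, 0].min(), xyz[:, 0].max(), 10.0)  # assumed 10 m grid spacing
yi = np.arange(xyz[:, 1].min(), xyz[:, 1].max(), 10.0)
XI, YI = np.meshgrid(xi, yi)

# Linear interpolation within the survey hull; nearest-neighbour only as a patch
z_lin = griddata(xyz[:, :2], xyz[:, 2], (XI, YI), method="linear")
z_near = griddata(xyz[:, :2], xyz[:, 2], (XI, YI), method="nearest")
z_grid = np.where(np.isnan(z_lin), z_near, z_lin)

Comparing grids produced by two or more interpolation methods gives a first-order indication of the contribution of interpolation error to apparent bathymetric change.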

LiDAR data are especially useful for intertidal areas. Although spatial positioning is accurate (typically ±5 cm), the vertical accuracy is at best ±20 cm. Furthermore, standing water on the beach can result in spurious data, and significant postprocessing may be required. Although the use of LiDAR to determine accurate accretion and erosion rates is not recommended, it does provide extensive (and rapid) spatial cover which may prove to be useful in several applications. In some instances, an assessment of changes in beach topography might be enhanced through reference to fixed, identifiable structures (e.g., quay walls and engineering structures) which can be used to calibrate repeat surveys.

7.2. Sediment Transport Models

Shelf sea and estuarine models normally provide output defining the predicted cumulative erosion/sedimentation for a stated bulk density giving the cumulative change in bed level over the model period. Total sand transport is usually expressed as a net value over a specified period, allowing transport vectors to be plotted which may be comparable with information directly available from the literature.

A wide range of sediment transport formulae are available to predict bed load transport, suspended load transport, and total load transport of noncohesive and cohesive sediments (e.g., [108]). All are derived to represent the best fit to empirical data sets derived in the laboratory or in some cases from the field. The sediment calibration data available will determine the accuracy of the model and limit how much validation is possible. Typically, a sediment model will be calibrated using SSC data, with validation utilising measured sedimentation rates (e.g., [109]). It is important to keep in mind that the sediment transport model is driven by modelled hydrodynamics and that highly nonlinear relationships exist between bed shear stress, flow turbulence, and sediment mobilisation, transport, and accretion. Thus, any limitation with the initial hydrodynamic calibration could impact significantly on the sediment model. It is therefore critically important to obtain the best hydrodynamic calibration possible.

When modelling sediment transport, it is important to recognise the heterogeneity of the seabed and the homogeneity assumed by most sediment transport models. It is therefore essential that the roughness maps described previously are used to characterise as accurately as possible the physical properties of the sediments (grain roughness) and the morphology of the bed (form drag). It should be remembered when interpreting model outputs that, since sediment transport formulae are empirical and are based on a limited amount of calibration data from laboratory and/or field studies, the prediction of sediment responses to hydrodynamic forcing is at best accurate to within a factor of two ([110–112]).

7.2.1. Sediment Properties

As the physical and dynamic properties of noncohesive sediments are less complex, the amount of information needed to set up and calibrate sand transport models is less than that required for mud. In the absence of measurements, the specific (grain) density, porosity, and dry bulk density for quartz sand are assumed to be 2650 kg/m3, 0.45 (dimensionless), and 1460 kg/m3, respectively. The median grain diameter (D50) is normally measured using grain-size analysis of samples or taken from published data (e.g., BGS (http://www.bgs.ac.uk/discoverymetadata/13605549.html)). While the spatial distribution of sand-sized sediment and information on the depth of any deposit are helpful, they are rarely available, leading to ambiguity about sediment source limitations. Information about bedforms is available either from the observations described above or generated through theoretical equations linking bedform dimensions with the sediment grain size and the hydrodynamic regime [50]. The threshold bed shear stress, a critical parameter in sediment models defining the bed shear stress required to mobilise the sediments, is normally calculated using a selected empirical formula (e.g., [50]) and expressed as a Shields parameter.
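For transparency, the following short Python sketch (an illustration added here, not part of the cited guidance) shows how these default values relate to one another and how a threshold bed shear stress is converted to a Shields parameter; the seawater density and the example threshold stress and grain size are assumed values.

# Consistency of the default sand properties and the Shields parameter conversion
RHO_S = 2650.0    # grain (specific) density of quartz sand, kg/m3
RHO_W = 1025.0    # seawater density, kg/m3 (assumed value)
POROSITY = 0.45   # dimensionless
G = 9.81          # gravitational acceleration, m/s2

dry_bulk_density = (1.0 - POROSITY) * RHO_S   # ~1460 kg/m3, as quoted above

def shields_parameter(tau_crit, d50):
    """Dimensionless Shields parameter for a threshold bed shear stress
    tau_crit (N/m2) acting on sediment of median diameter d50 (m)."""
    return tau_crit / ((RHO_S - RHO_W) * G * d50)

# Illustrative example only: a 0.2 N/m2 threshold for 0.2 mm sand gives ~0.06
theta_cr = shields_parameter(0.2, 200e-6)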

For cohesive sediment transport models, the following sediment data are normally required: sediment density, grain size, settling velocity, and the threshold bed shear stresses for erosion and deposition. If this information cannot be obtained from in situ measurements and/or analysis of samples, Whitehouse et al. [113] provide a good account of formulae for deriving some useful properties of cohesive sediments. It is common practice to measure “wet” sediment density in situ using a density probe, which can then be converted to “dry” density (e.g., [113]). To obtain the correct dry density required by some models, it is recommended that the porosity factor is adjusted until the computed wet density matches the measured density. Typically, porosity values between 0.75 and 0.98 should be applied for sediments consolidated for less than 1 year and 0.25 to 0.75 for longer periods of consolidation.
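The conversion implied above can be made explicit with the following Python sketch (an illustration only, assuming a fully saturated grain-plus-water mixture with assumed grain and pore water densities); it recovers the porosity consistent with a measured wet density and the corresponding dry density.

RHO_GRAIN = 2650.0   # mineral grain density, kg/m3 (assumed)
RHO_WATER = 1025.0   # pore (sea) water density, kg/m3 (assumed)

def porosity_from_wet_density(rho_wet):
    """Porosity consistent with a measured saturated (wet) bulk density,
    assuming a two-phase grain-plus-water mixture."""
    return (RHO_GRAIN - rho_wet) / (RHO_GRAIN - RHO_WATER)

def dry_density_from_wet(rho_wet):
    """Dry bulk density corresponding to the same porosity."""
    n = porosity_from_wet_density(rho_wet)
    return (1.0 - n) * RHO_GRAIN

# Illustrative probe reading for recently deposited mud: 1300 kg/m3
# gives a porosity of ~0.83 and a dry density of ~450 kg/m3
rho_dry = dry_density_from_wet(1300.0)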

Typical settling velocities for mud range from 0.0001 m/s to 0.003 m/s and can be calculated from the grain size (if known) using empirical formulae (cf. [113]). However, caution must be exercised when using this approach because of flocculation, which can significantly increase the size (and hence settling velocity) of particles in suspension. Flocs may also incorporate organic matter in their matrix, thereby affecting their density (and possibly reducing the settling velocity). Furthermore, sampling of suspended sediments in situ frequently destroys the flocs or alters significantly their physical properties. There is no simple solution to this problem, and modelling assumptions and limitations must be stated clearly. Although knowledge of mineralogy, salinity, turbulent kinetic energy, and water temperature makes it possible to calculate the potential floc size, this is further complicated by temporal and spatial variations in these parameters. It is also noted that many mud models use the SSC, not the grain size, as the parameter defining the settling velocity.
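For single, unflocculated fine particles, the classical Stokes law gives a first estimate of the settling velocity and is sketched below in Python (an illustration only; it is not the empirical formulation of [113], takes no account of flocculation or hindered settling, and assumes the stated fluid properties).

def stokes_settling_velocity(d, rho_s=2650.0, rho_w=1025.0, mu=1.1e-3, g=9.81):
    """Stokes settling velocity (m/s) for a spherical particle of diameter d (m);
    valid only at low particle Reynolds numbers (fine silt and clay sizes)."""
    return (rho_s - rho_w) * g * d**2 / (18.0 * mu)

# Illustrative example: a 20 micron silt particle settles at ~3e-4 m/s,
# within the 0.0001-0.003 m/s range quoted above
ws = stokes_settling_velocity(20e-6)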

The critical bed shear stress for erosion, τcrit_E, can be estimated using the Mitchener et al. [114] formula which accounts for sediment density. Several methods to measure τcrit_E exist and comprise laboratory devices to analyse samples from the field and carousel flumes for field deployment [71]. These can be very effective and allow investigation of how τcrit_E changes as erosion of a given sample proceeds (normally increasing). As with most sediment dynamics, extreme care should be exercised when attempting to parameterise physical properties and processes using empirical approaches. The critical bed shear stress for deposition, τcrit_D, is frequently used as a calibration parameter. However, it is highly dependent on the local conditions. Generally, values between 0.1 N/m2 and 0.3 N/m2 provide effective calibration settings for mud models. It is important to note that the default value in some models may not be appropriate for a case (e.g., in Delft3D τcrit_D = 1000 N/m2 and must be changed prior to any model runs). Examples to guide the use and calibration of cohesive sediment models are provided by van Kessel et al. [115] and Carniello et al. [116].

7.3. Performance Guidelines
7.3.1. Noncohesive and Cohesive Sediments: Suspended Sediments

Sediment model calibration success can be assessed visually by comparing measured and predicted average concentrations over a set period, typically a spring-neap cycle. Example time series of measured water level and measured and predicted SSC over a period of 8 days are shown in Figure 7. In this case, SSC is measured continuously using a turbidity instrument. A second example of model output and SSC calibration data is shown in Figure 8. In this case, SSC data were obtained from water samples. Both Figures 7 and 8 demonstrate that the general pattern of SSC is similar for the modelled and measured data. For most applications, the aim should be to achieve a model calibration of ±20% of the measured average concentrations. In areas where time series of SSC measurements are available from multiple sites, a calibration level of ±30% for average SSC at most of the sites would be deemed as a good level of calibration. If there are only discrete values of SSC from water samples (or a handheld turbidity probe), experience shows that calibration of only ±40% is achievable since the discrete measurements are subject to higher levels of uncertainty.
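The tolerance levels above can be checked with a few lines of code; the Python sketch below (with hypothetical site-averaged values) compares spring-neap averaged SSC at several sites against the ±30% criterion.

import numpy as np

# Hypothetical spring-neap averaged SSC (mg/l) at four monitoring sites
obs_mean = np.array([120.0, 85.0, 200.0, 60.0])
mod_mean = np.array([105.0, 110.0, 230.0, 49.0])

# Percentage difference between modelled and measured average SSC at each site
diff_pct = 100.0 * (mod_mean - obs_mean) / obs_mean
within_30 = np.abs(diff_pct) <= 30.0
print(f"{within_30.sum()} of {len(diff_pct)} sites within +/-30% of measured average SSC")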

7.3.2. Noncohesive and Cohesive Sediments: Sedimentation

Provided sufficient good quality data are available, sedimentation rates provide one of the best means of validating the longer-term performance of sediment models and provide an integrated view of the net result of modelled suspended sediments and bed load. Given the complexity of sediment transport and the errors associated with measurements and empirical sediment transport formulae alluded to above, it is normal practice to apply a scaling factor to the modelled sediment transport rates. In effect, this is a global correction factor to the sediment transport rates predicted by the model that provides an effective means of matching the model predictions of sedimentation with the observations. The scaling factor has no physical meaning and simply represents many complex physical processes not present within the model including, for example, biological and sediment consolidation factors which can significantly alter the physical properties of the sediment with respect to mobilisation and transport.

As a general rule, the scaling factor should be less than 5. Higher values indicate a more significant issue with the accuracy of the modelling, or with site-specific complexity, that may require a nonstandard approach. In such cases, the model approximations cannot be relied upon to describe sedimentation/accretion, and field monitoring is recommended to supplement the model deficiencies.
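A minimal Python sketch of this check is given below (the function name and the handling of the rule of thumb are choices made here, not part of any cited guideline); the scaling factor is simply the ratio of observed to modelled net sedimentation, flagged if it exceeds the suggested limit of 5.

def transport_scaling_factor(observed_sedimentation, modelled_sedimentation, limit=5.0):
    """Global scaling factor applied to modelled transport rates so that the
    predicted sedimentation matches the observations (same units for both)."""
    factor = observed_sedimentation / modelled_sedimentation
    if factor > limit:
        # A large factor signals a more fundamental problem with the model or the
        # site; field monitoring may be needed to supplement the model
        raise ValueError(f"Scaling factor {factor:.1f} exceeds {limit:.0f}")
    return factor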

Dredging data are frequently used as a measure of long-term sedimentation and are normally expressed as the volume of sediment removed from an area per year. Although such data are useful, they are frequently complicated by a poorly defined relationship between the dredged volume and the rarely provided bulk density value, which can give rise to significant errors. Recourse must be made to estimated bulk density values, and sensitivity analyses should be used to quantify sedimentation for a plausible range of values. When validating a model using dredging data, the volume of sediment accumulation predicted by the model (normally over a 15-day spring-neap cycle) would normally be scaled to match as closely as possible the measurements. In recognition of the many sources of errors and uncertainty, a model predicting the dredged volumes to within 50% of the measured rates is normally deemed to be satisfactory for most practical applications. For example, in a study of the Humber, the modelled sedimentation volume was 2,180,000 m3/yr, while the average volume of sediment dredged was 1,830,000 m3/yr, with values ranging from 790,000 m3/yr to 3,915,000 m3/yr over a 5-year period (Mott MacDonald, pers. comm.).

8. Morphological Models

Morphological modelling in estuarine and coastal environments is challenging, and useful description of the range of approaches employed is provided by Roelvink [117]. The primary limitation to the accuracy of morphological models concerns the length of time over which the model is run, with results from long runs (e.g., monthly–decadal) likely to deviate significantly from reality [118, 119].

From the outset, it is very important to establish a conceptual understanding of sediment transport and historical morphological changes in each study area before attempting a morphological model. This must draw together existing evidence and provide a qualitative description of the process controls and how the morphology of the system responds to these drivers. For long-term assessments of morphology, this also requires consideration of climate change factors. A conceptual understanding can provide the hypothesis (e.g., sources and sinks) with which to test the performance of the model and to provide some guidance on expected magnitudes and directions of sediment transport and the associated morphological changes.

The largest constraint to calibrating morphological models is the availability of high-quality data sets that adequately describe the model parameters over a sufficient length of time. In an assessment of data requirements by Splinter et al. [120], it was concluded that (a) calibration of a seasonally dominated site required longer data sets but was less sensitive to sampling interval and (b) calibration of a storm-dominated site required shorter and more frequently sampled data sets. Most studies show that morphological calibrations that are based on short observational records (i.e., < one year) are not robust. To determine initial estimates of calibration coefficients and to hindcast short-term (1–5 years) shoreline variability, Splinter et al. [120] recommend monthly monitoring programs for at least two years. For longer-term predictions of morphology, longer data sets are required to improve the performance of the models.

Morphodynamic models of shelf sea and estuarine environments usually comprise a controlling programme that sequentially invokes subroutines predicting hydrodynamics (e.g., [68, 121, 122]), sediment transport (e.g., [123]), and bed level (via the sediment continuity equation). These are all then linked via the well-established morphodynamic feedback loop (e.g., [124, 125]). Examples of morphological models include (a) the deterministic process-based model Delft3D-MOR (e.g., [126]), suitable for short-term morphology predictions, (b) ASMITA (e.g., [127]), a semiempirical model using large process-based units which can iterate towards an equilibrium condition and predict large-scale changes in sediment balances (sediment budget) over medium- to long-term periods, and (c) XBeach, a deterministic process-based model suitable for predicting morphological changes resulting from storm impacts (e.g., [128, 129]). An example of outputs from an XBeach model set up to predict the impact of a shore-normal groyne is shown in Figure 9.

However, bed-level changes occur over long time scales compared with the hydrodynamic forcing, and thus until recently, owing to computational limitations, shelf sea morphodynamic models have been unable to predict very far into the future using traditional morphodynamic upscaling techniques such as the “continuity correction” method. To overcome this limitation, Lesser et al. [122] and Roelvink [130] have developed the morphological acceleration factor (MORFAC) concept to enable morphological predictions to extend over decadal (e.g., [131]) and centennial [132] time scales.
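The MORFAC idea can be illustrated with a minimal one-dimensional Python sketch of the bed update from the sediment continuity (Exner) equation, in which the bed change computed for one hydrodynamic time step is multiplied by the acceleration factor; this is a conceptual illustration only and is not the implementation of [122] or [130].

import numpy as np

def morfac_bed_update(zb, q, dx, dt, morfac, porosity=0.4):
    """One explicit bed-level update from the 1D sediment continuity (Exner)
    equation, with the bed change scaled by a morphological acceleration factor
    so that one hydrodynamic step represents `morfac` morphological steps.

    zb : bed levels at cell centres (m)
    q  : volumetric sediment transport rate at cell centres (m2/s)
    """
    dqdx = np.gradient(q, dx)             # spatial gradient of transport
    dzb = -dt * dqdx / (1.0 - porosity)   # bed change over one hydrodynamic step
    return zb + morfac * dzb

The choice of MORFAC is itself a calibration decision: too large a value distorts the interaction between the tidal (or wave) forcing and the evolving bed.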

Whilst the certainty in predicting morphology cannot be proven, it may be possible to bound the uncertainty using sensitivity analysis for key process drivers and to determine a range of possible outcomes. Where possible, a range of different morphological modelling approaches should be applied, and where there is general agreement between approaches, then it may be possible to draw additional confidence from the results using an ensemble of model outputs.

Defining what is and what is not a good morphological model performance depends on the spatial and temporal scales considered. At a minimum: (a) the observed/measured sedimentation-erosion patterns must be broadly in agreement with the model outputs; (b) contour plots of measured and computed sedimentation and erosion need to agree as closely as practicable; (c) predicted volume changes over control areas must agree as closely as practicable with soundings or dredging figures; and (d) the shape, migration, and area change of measured and computed cross sections are required to agree.

The incomplete description of the physics underpinning morphological processes and an imperfect knowledge of the initial conditions and parameters will always lead to increasing errors in the model predictions and limit the ability of shelf sea and estuarine morphodynamic models to accurately predict the future true state of the environment.

9. Improving Predictions and Reducing Uncertainty

An alternative emerging approach to address the problem of model prediction uncertainties involves the application of data assimilation techniques. These techniques keep model parameters fixed and produce an updated model state that matches as closely as possible the true state by combining observational data with model predictions. This updated model state is then used to initiate the next model forecast. However, even if the initial system state can be described flawlessly, model parameters simplify the physical processes and, by doing so, will result in the growth of prediction errors. At present, assimilation methods being developed to improve morphological forecast reliability are producing encouraging results (e.g., [133–137]). For example, ad hoc data assimilation schemes and techniques using more refined heuristic tuning of model state variables are being used to improve the performance of suspended sediment transport models (e.g., [138, 139]).
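The basic state-update step common to many such schemes can be sketched as follows in Python (a generic, scalar optimal-interpolation or Kalman-type update included here for illustration; it is not the specific scheme of any of the cited studies, and the example numbers are hypothetical).

def assimilate_scalar(forecast, observation, var_forecast, var_obs):
    """Blend a model forecast of a single state variable with an observation,
    weighting each by the inverse of its error variance; the analysis is then
    used to initialise the next model forecast."""
    gain = var_forecast / (var_forecast + var_obs)
    analysis = forecast + gain * (observation - forecast)
    var_analysis = (1.0 - gain) * var_forecast
    return analysis, var_analysis

# Hypothetical example: modelled bed level 2.30 m, surveyed 2.10 m
analysis, var_a = assimilate_scalar(2.30, 2.10, var_forecast=0.04, var_obs=0.01)
# analysis ~ 2.14 m, with a reduced analysis variance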

Two main types of uncertainty pervade morphological models: (a) scenario uncertainty, which stems from uncertainty about the nature of future weather and weather events (magnitude and frequency) responsible for driving morphological change, and (b) response uncertainty, which relates to the uncertainty in predicting how the morphological system will respond to given forcing conditions. To reduce uncertainty in morphological model predictions, the ensemble approach widely adopted by climate change scientists (e.g., [140]) may prove to be helpful. The ensemble modelling approach aims to address uncertainty arising from two main sources: (a) incomplete description of the physical processes bringing about morphological changes and (b) limited computing power that constrains how accurately processes can be parameterised in models. For example, in models of estuaries or shelf sea systems, subgrid scale processes such as turbulence can only be represented in a simple way. There are two possible routes to take in ensemble modelling: (a) perturbed-physics studies, which investigate how model predictions are affected by the choice of input parameters by systematically running a single model with different parameter values, and (b) multimodel studies, which investigate how predictions differ between different models. The effect of the initial conditions of the model can also be tested using both approaches.
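As an illustration of the perturbed-physics route (a sketch only: the parameter names, their ranges, and the stand-in model response are hypothetical), the following Python code runs a single model function over a grid of uncertain parameter values and summarises the spread of the predictions.

import itertools
import numpy as np

def run_model(d50, tau_crit_d, morfac):
    """Stand-in for a single morphological model run; in practice this would
    launch the actual model and return, e.g., the predicted net volume change (m3).
    The response surface below is purely illustrative so the script runs end to end."""
    return 5.0e4 * (200e-6 / d50) * (0.2 / tau_crit_d) * (1.0 + 0.01 * morfac)

# Uncertain parameters perturbed systematically (hypothetical ranges)
d50_values = [150e-6, 200e-6, 250e-6]    # median grain diameter, m
tau_crit_d_values = [0.1, 0.2, 0.3]      # critical shear stress for deposition, N/m2
morfac_values = [10, 20]                 # morphological acceleration factor

results = np.array([run_model(d, t, m)
                    for d, t, m in itertools.product(d50_values, tau_crit_d_values, morfac_values)])
print(f"ensemble mean = {results.mean():.0f} m3, spread (std) = {results.std():.0f} m3")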

10. Summary and Conclusions

The modelling guidance presented in the paper has drawn on published guidelines and on the extensive practical experience of the authors and their colleagues using a range of model types in modelling projects concerned with, for example, managed realignment, environmental impact assessments for offshore wind farms, tidal energy, coastal defences, dredging/disposal sites, beach and estuarine morphodynamics, barrages, and cooling water discharges. Statistical guidelines to establish calibration standards for a minimum level of performance for coastal and estuarine hydrodynamic and sediment models are summarised in Table 7 and are based in part on the recommendations from Evans [1] and Bartlett [2].

While naturally these guidelines remain open to challenges from modellers requiring more exacting model performance, they have been found to deliver models with a good prognostic performance across a broad range of metrics and recognise the practical limitations imposed on model calibration processes by the accuracy and the temporal and spatial resolutions of the available calibration data. Their use in coastal and estuarine modelling studies is therefore recommended.

Nomenclature

ADCP:Acoustic Doppler current profiler
ADV:Acoustic Doppler velocimeter
AWAC:Acoustic waves and currents
BST:British Summer Time
CD:Chart datum
CTD:Conductivity, temperature, and depth
DHI:Danish Hydraulic Institute
DGPS:Differential global positioning system
DEM:Digital elevation model
D50:Median grain diameter
D90:Grain diameter for which 90% of the sample is finer
FTU:Formazin turbidity units
GMT:Greenwich Mean Time
HAT:Highest astronomical tide
HF:High frequency
hmax:Maximum wave height
Hm0:Hs computed using spectral analysis
Hs:Significant wave height
JONSWAP:Joint North Sea Wave Project
LAR:Largest astronomical range
LAT:Lowest astronomical tide
LiDAR:Light detection and ranging
MHWN:Mean high water of neap tides
MHWS:Mean high water of spring tides
MLWS:Mean low water of spring tides
MLWN:Mean low water of neap tides
MSL:Mean sea level
N:Newton
Ni:Total number of data points
NTU:Nephelometric turbidity units
OBS:Optical backscatter sensors
ODN:Ordnance Datum Newlyn
OSGB:Ordnance Survey Great Britain
Oi:Observed wave data
POT:Peak over threshold
PT:Pressure transducer
Q:Sediment transport
QA:Quality assurance
QMS:Quality management system
R:Pearson product-moment correlation coefficient
RMS:Root mean square
SET:Sedimentation erosion table
SI:Scatter index
Si:Simulated or model data
SSC:Suspended sediment concentration
Tm:Mean wave period
Tp:Spectral peak wave period
Tz:Mean zero-crossing wave period
WGS84:World Geodetic System 1984
WW3:WaveWatch 3
g:Gram
kg:Kilogram
ks:Equivalent sand roughness
l:Litre
mg:Milligram
s:Second
ε:Rate of turbulent energy dissipation
κ:Turbulent kinetic energy
μ:Micron (10⁻⁶ m)
θ:Wave direction (degrees)
θm:Mean wave direction (degrees)
τ:Bed shear stress (N/m2).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.