Data Assimilation

CGCAM can have the large-scale wind and thermodynamic fields prescribed as a function of space and time from an external source such as a mesoscale model or a measurement database. The large-scale fields or "background" quantities are supplied on coarse space and time sampling intervals and then interpolated onto the fine CGCAM mesh as required. Special procedure are implemented so that the CGCAM solution will exactly track the interpolated background fields when no other forcing mechanisms are present. For more interesting cases with forcing mechanisms such as orography or convection, the background tendencies will still be imposed but the solution will deviate from the background in response to the forcing. The assimilation procedure is described in some detail below.

We start with compressible Navier-Stokes equations

\[\begin{split}\begin{gather*} \frac{\partial \rho}{\partial t} = -\frac{\partial \rho u_j}{\partial x_j}\\[0.1in] \frac{\partial \rho u_i}{\partial t} = -\frac{\partial}{\partial x_j}\Big( u_i \; \rho u_j \Big) - \rho g\delta_{i3} + \frac{\partial \Sigma_{ij}}{\partial x_j}\\[0.1in] \frac{\partial \rho E}{\partial t} = -\frac{\partial}{\partial x_j}\Big( E \; \rho u_j\Big) - \rho g w + \frac{\partial u_i\Sigma_{ij}}{\partial x_j} - \frac{\partial q_j}{\partial x_j} \end{gather*}\end{split}\]

where \(q_j\) is the thermal conduction and \(\Sigma_{ij}\) is net surface force, which is the sum of the viscous and pressure forces:

\[\Sigma_{ij} = \sigma_{ij} - p\delta_{ij}\]

For reference, the quantities \(q_j\) and \(\sigma_{ij}\) are defined in section Governing Equations.

Let the imposed background quantities be denoted by an overbar. If these quantities are interpolated onto the CGCAM computational mesh and the CGCAM operators applied, we will have relations such as

\[ \left(\frac{\partial \bar{\rho} \bar{u}_i}{\partial t}\right)_C = -\frac{\partial}{\partial x_j}\Big( \bar{u}_i \; \bar{\rho}\bar{u}_j\Big) - \bar{\rho}g\delta_{i3} + \frac{\partial \bar{\Sigma}_{ij}}{\partial x_j}\]

Where the subscript 'C' on the time derivative denotes a quantity computed with the CGCAM operators.

The background field database also provides time rates of change and we shall denote these with a subscript 'D'. Due to a variety of sources, the CGCAM and data time rates of change will generally not agree. Sources for disagreement are numerical errors, differences in forcing terms such as tidal or Coriolis forces, and even direct data assimilation in the large-scale model used to produce the background database. The disagreement can be rectified through the addition of a source term \(S_i\), defined as

\[S_i = \left(\frac{\partial \bar{\rho} \bar{u}_i}{\partial t}\right)_D - \left(\frac{\partial \bar{\rho} \bar{u}_i}{\partial t}\right)_C\]

giving

\[ \left(\frac{\partial \bar{\rho} \bar{u}_i}{\partial t}\right)_D = \left(\frac{\partial \bar{\rho} \bar{u}_i}{\partial t}\right)_{C_S} = -\frac{\partial}{\partial x_j}\Big( \bar{u}_i \; \bar{\rho}\bar{u}_j \Big) - \bar{\rho}g\delta_{i3} + \frac{\partial \bar{\Sigma}_{ij}}{\partial x_j} + S_i\]

where \(C_S\) indicates a CGCAM evaluation including the source term \(S_i\). This form makes it evident that the source term \(S_i\) brings the CGCAM-computed time rate of change in line with the data rate of change. Thus if the CGCAM field is initialized with the background quantities and there are no other forcing mechanisms (such as terrain or convection) present, the CGCAM-computed solution including the source term will exactly track the background solution in time. While this is a necessary condition for success it is not terribly useful, as almost all CGCAM runs will contain some sort of forcing mechanism. The basic idea in these cases is to retain the source term, exactly as it is defined above in order to provide the correct tendencies to encourage the background fields to track the input data. The full solution will of course not track, but at least the large-scale component should. Mathematically we simply delete the overbars from the solution variables while retaining the source term to get

\[ \frac{\partial \rho u_i}{\partial t} = -\frac{\partial}{\partial x_j}\Big( u_i \rho u_j \Big) - \rho g\delta_{i3} + \frac{\partial \Sigma_{ij}}{\partial x_j} + S_i\]

A more useful form can be derived by combining the above relations to give

\[ \frac{\partial \rho u_i}{\partial t} = -\frac{\partial}{\partial x_j}\Big( u_i \; \rho u_j - \bar{u}_i \; \bar{\rho}\bar{u}_j \Big) - (\rho - \bar{\rho})g\delta_{i3} + \frac{\partial}{\partial x_j}\left( \Sigma_{ij}- \bar{\Sigma}_{ij}\right) + \left(\frac{\partial \bar{\rho} \bar{u}_i}{\partial t}\right)_D\]

This form shows that we simply exclude the background components of the momentum flux, pressure force, buoyancy force, and viscous stress from the momentum balance and replace these with the given time rate of change provided by the external data source. Thus we directly control the background time rate of change instead of letting it be determined from a momentum balance.

Exactly same idea can be applied to the energy conservation law to give

\[\begin{split}\frac{\partial \rho E}{\partial t} = -\frac{\partial}{\partial x_j}\Big( E \; \rho u_j - \bar{E} \; \bar{\rho}\bar{u}_j \Big) - (\rho w -\bar{\rho}\bar{w})g + \frac{\partial}{\partial x_j}\left( u_i\Sigma_{ij} - \bar{u}_i\bar{\Sigma}_{ij}\right) - \\[0.1in] \frac{\partial}{\partial x_j}( q_j - \bar{q}_j) + \left(\frac{\partial \bar{\rho}\bar{E}}{\partial t}\right)_D\end{split}\]

While the source term concept is plausible for the momentum and energy equations, it is questionable for mass conservation as there should really be no unknown sources for mass. Numerical errors may exist, but it probably better to correct for these directly instead of working with background fields that do not conserve mass. Thus an alternative procedure is developed below for imposing a mass balance on the background field.

When applied to the CGCAM operators, the mass conservation statement for the background is

\[\left(\frac{\partial \bar{\rho}}{\partial t}\right)_C = -\left( \frac{\partial \bar{\rho} \bar{u}}{\partial x} + \frac{\partial \bar{\rho} \bar{v}}{\partial y} + \frac{\partial \bar{\rho} \bar{w}}{\partial z} \right)\]

Here the time derivative term \((\partial\bar{\rho}/\partial t)_C\) will in general differ from the rate of change implied by the background data. One way to fix this problem is to make small adjustments to the background velocities in order for force agreement in the rates of change. While there are many ways to make such adjustments, probably the easiest is to change just one of the velocity components. If we decide to change \(\bar{v}\), we can integrate the above relation while replacing \((\partial\bar{\rho}/\partial t)_C\) with \((\partial\bar{\rho}/\partial t)_D\) to get

\[\bar{v} = \frac{1}{\bar{\rho}}\left\{ \bar{\rho}_0\bar{v}_0 - \int_{y_0}^y \left[ \left(\frac{\partial \bar{\rho}}{\partial t}\right)_D + \frac{\partial \bar{\rho} \bar{u}}{\partial x} + \frac{\partial \bar{\rho} \bar{w}}{\partial z} \right]dy \right\}\]

In many cases \(\bar{w}\) will be zero, which simplifies the above relation somewhat.

If the adjusted \(\bar{v}\) is used in the y-momentum source term, then the background velocity field will conserve mass while automatically tracking the background density, all while using the unperturbed mass conservation statement

\[\frac{\partial \rho}{\partial t} = -\frac{\partial \rho u_j}{\partial x_j}\]

Round-off Error Reduction

The flux terms in the momentum and energy equations involve quantities \(u_i\) and \(E\) and these must be determined from the solution variables (\(\rho u_i\) and \(\rho E\)) by dividing by the density. While this would appear to be a trivial operation, the divide is subject to round-off error which can render \((\rho u)/\rho\) slightly different from \(u\). While it appears difficult to reduce this error in the general case, we can at least use a procedure that exactly recovers \(\bar{u}\) when \((\rho u)=\bar{\rho}\bar{u}\). An approach satisfying this condition is useful for verifying the correctness of the flux terms when the solution is initialized with the background quantities. The following formula

\[u = \bar{u} + \left[ \frac{(\rho u) - \rho\bar{u} }{\rho} \right]\]

does the trick since numerator of the bracketed term will evaluate to zero exactly when \((\rho u)=\bar{\rho}\bar{u}\) and \(\rho=\bar{\rho}\).

While one may think that the above formula would also reduce the round-off error over \(u=(\rho u)/\rho\) when the bracketed term is small, careful tests showed that this is not the case. The reason for this is that, under the floating point representation, the size of the numerator is handled by the exponent, while the finite precision is handled by the mantissa. The exponent simply shifts the solution whereas round-off appears when the mantissa of the divide is truncated. This means that the division is subject to the machine precision (at the level of the least significant bit) independent of the numerator size.

Boundary Conditions

When evaluating the background terms in CGCAM it is important to apply boundary conditions that are consistent with the model or data acquisition system used to generate the data. For example, if the data come from a coarse mesoscale model run and only provide horizontal wind components on a coarsely-sampled vertical mesh, then there is not enough information in the data to account for either terrain or the surface boundary layer. Thus inviscid boundary conditions on a flat lower boundary should be used in this case.