Improving data assimilation algorithms for faster execution, higher resolution and application coupling for U.S. West and East coast ocean forecast systems
Project Lead: John Wilkin, Rutgers, The State University of New Jersey
Co-PIs: Hernan Arango (Rutgers), Andrew Moore (UC Santa Cruz), Christopher Edwards (UC Santa Cruz)
Federal partners: NOAA Coast Survey Development Laboratory (CSDL)
IOOS Regional Association partners: MARACOOS, CeNCOOS, SCCOOS, and NANOOS
Project Overview and Results
The modeling in this project uses the Regional Ocean Modeling System (ROMS; www.myroms.org), its 4D-Var (four-dimensional variational) data assimilation system, and the coupled NEMURO biogeochemical and ecosystem model.
4D-Var is an iterative data assimilation algorithm that identifies the most-likely ocean state given all available prior information and observations (Moore et al. 2011a). The components of ROMS 4D-Var are depicted schematically in the figure below.
The most-likely state estimate is identified by minimizing a nonlinear cost function, which is difficult to do directly. A common approach is to linearize the problem using the tangent linear (TL) and adjoint (AD) versions of ROMS and to minimize a sequence of linearized cost functions during the so-called inner loops of the algorithm. The ocean state estimate about which the inner loops are linearized is periodically updated in an outer loop, which is a run of the full nonlinear model with all its physics and forcing. At the conclusion, the final outer-loop integration yields the most-likely ocean state.
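The inner-loop and outer-loop structure described above can be illustrated with a toy problem. The sketch below is not ROMS code: it assumes a scalar logistic-growth model, observations of the state at every time step, and hypothetical error variances (`B_VAR`, `R_VAR`). All function names are invented for illustration. Because the control variable is a single number, one steepest-descent step with an exact line search solves each linearized problem; a realistic system needs many inner iterations of a more capable minimizer.

```python
import numpy as np

# Hypothetical toy settings; none of these values come from the project.
GROWTH, DT, NSTEPS = 1.0, 0.1, 20
B_VAR, R_VAR = 0.04, 0.01  # assumed background and observation error variances


def tl_factor(x):
    """Derivative of one NL time step with respect to the state."""
    return 1.0 + DT * GROWTH * (1.0 - 2.0 * x)


def nl_model(x0):
    """Nonlinear (NL) model: logistic growth; returns the full trajectory."""
    traj = [x0]
    for _ in range(NSTEPS):
        x = traj[-1]
        traj.append(x + DT * GROWTH * x * (1.0 - x))
    return np.array(traj)


def tl_model(traj, dx0):
    """Tangent linear (TL) model, linearized about the NL trajectory."""
    dtraj = [dx0]
    for x in traj[:-1]:
        dtraj.append(tl_factor(x) * dtraj[-1])
    return np.array(dtraj)


def ad_model(traj, forcing):
    """Adjoint (AD) model: integrates backward, adding forcing at each time."""
    lam = forcing[-1]
    for k in range(NSTEPS - 1, -1, -1):
        lam = tl_factor(traj[k]) * lam + forcing[k]
    return lam


def fourdvar(xb, obs, n_outer=3, n_inner=2):
    """Incremental 4D-Var for the initial condition of the toy model."""
    x0 = xb
    for _ in range(n_outer):
        traj = nl_model(x0)          # outer loop: full NL run
        d = obs - traj               # innovations (every time step is observed)
        dx = 0.0
        # Gradient of the linearized cost function at dx = 0.
        g = (x0 - xb) / B_VAR - ad_model(traj, d / R_VAR)
        for _ in range(n_inner):     # inner loops: one TL and one AD run each
            if g == 0.0:
                break
            # Hessian-vector product of the quadratic (linearized) cost.
            hg = g / B_VAR + ad_model(traj, tl_model(traj, g) / R_VAR)
            alpha = (g * g) / (g * hg)   # exact line search on a quadratic
            dx -= alpha * g
            g -= alpha * hg
        x0 = x0 + dx                 # update the linearization state
    return x0, nl_model(x0)          # final NL run gives the state estimate


# Usage: noise-free synthetic observations from a "true" initial state of 0.2,
# assimilated starting from a background (prior) initial state of 0.3.
truth = nl_model(0.2)
xa, analysis_traj = fourdvar(xb=0.3, obs=truth)
print(f"background 0.300, analysis {xa:.3f}, truth 0.200")
```

Each pass through the inner `for` loop calls `tl_model` once and `ad_model` once, which mirrors the cost accounting used in the next paragraph.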
A 4D-Var inner loop comprises one integration each of TL and AD ROMS, and each requires ~50% more CPU time than a nonlinear (NL) ROMS calculation spanning the same time interval. A single inner loop is therefore equivalent to ~3 NL model runs (2 × 1.5). The current CeNCOOS and MARACOOS forecast systems require ~50 times the computational cost of running the NL model to compute a single 4D-Var analysis, which places significant constraints on feasible model resolutions. This project will develop multi-resolution and mixed-precision computation capabilities that will improve the efficiency of the ROMS 4D-Var system.
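The cost arithmetic above can be written out as a small calculation. The 1.5 factors come from the text. The loop counts in the example call are hypothetical, chosen only to show how a total near 50 NL-run equivalents can arise; they are not the actual CeNCOOS or MARACOOS configuration.

```python
TL_COST = 1.5  # TL run cost relative to one NL run (~50% more, from the text)
AD_COST = 1.5  # AD run cost relative to one NL run (~50% more, from the text)


def inner_loop_cost():
    """Cost of one inner loop (one TL run plus one AD run) in NL-run units."""
    return TL_COST + AD_COST


def analysis_cost(n_outer, n_inner):
    """Cost of one analysis in NL-run units.

    Assumes each outer loop is one NL run followed by n_inner inner loops,
    plus one final NL run at the end.
    """
    return n_outer * (1.0 + n_inner * inner_loop_cost()) + 1.0


print(inner_loop_cost())                     # 3.0
print(analysis_cost(n_outer=1, n_inner=16))  # 50.0 (hypothetical loop counts)
```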
Biogeochemical (BGC) ocean state estimates have been a component of the CeNCOOS near-real-time (NRT) ROMS system since 2015. The BGC component, based on the NEMURO model (Kishi et al. 2007), estimates 11 ecosystem components (inorganic nitrogen and silicon nutrients, 2 phytoplankton classes, and 3 zooplankton). It is incorporated into UCSC ROMS 4D-Var and assimilates satellite surface chlorophyll data. Though not in the NRT product, recent model enhancements add oxygen and carbonate chemistry for the prediction of pH and aragonite saturation, which are key indicators of ocean acidification.
Multi-resolution ROMS 4D-Var
We propose a multi-resolution approach for ROMS 4D-Var, facilitated by a new feature in ROMS that allows the NL, TL and AD components of 4D-Var to run as separate executables at different grid resolutions. This “split” capability was introduced as part of the integration of ROMS into the Joint Effort for Data Assimilation Integration (JEDI).
Model-data misfits, the so-called “innovations” in data assimilation (DA) parlance, are computed from a high-resolution NL model forecast with the most accurate possible physics, but the AD and TL iterations, which together account for the bulk of the compute effort, run at a lower resolution. The low-resolution increments are then interpolated back to the fine-resolution model to execute the forecast.
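The final step, moving a coarse-grid increment onto the fine grid, can be sketched in one dimension. This is an illustration only: the grids, fields, and the `prolong_increment` helper are hypothetical, and linear interpolation with `numpy.interp` stands in for whatever interpolation scheme the multi-resolution system actually uses.

```python
import numpy as np


def prolong_increment(x_coarse, dx_coarse, x_fine):
    """Linearly interpolate a coarse-grid increment onto the fine grid."""
    return np.interp(x_fine, x_coarse, dx_coarse)


# Hypothetical 1-D grids: 6 coarse points and 21 fine points on [0, 1].
x_coarse = np.linspace(0.0, 1.0, 6)
x_fine = np.linspace(0.0, 1.0, 21)

# Fine-resolution background state and an increment computed on the coarse grid.
background_fine = np.sin(2.0 * np.pi * x_fine)
dx_coarse = 0.1 * np.cos(np.pi * x_coarse)

# The fine-grid analysis is the fine background plus the interpolated increment.
analysis_fine = background_fine + prolong_increment(x_coarse, dx_coarse, x_fine)
print(analysis_fine.shape)
```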
Mixed-precision 4D-Var in the multi-resolution system
Most earth system models are written in double precision, yet research in Numerical Weather Prediction (NWP) suggests that many floating-point operations can be performed at lower precision without an appreciable degradation in the final product. In applications where performance is limited by memory and/or I/O, such as 4D-Var, the savings for large ocean grids may be substantial.
In mixed-precision ROMS, the outer loops are computed at double precision for accuracy in the ocean dynamics, but the inner loops that inform the iterative search directions of the cost function minimization run at single precision. This will be tested in the MARACOOS and CeNCOOS systems and in a re-run of the West Coast Operational Forecast System (WCOFS) at UC Santa Cruz. In combination with the multi-resolution 4D-Var approach described above, the use of mixed precision could potentially yield an order of magnitude reduction in the computational effort required to perform a 4D-Var analysis.
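The memory side of this trade-off is easy to demonstrate with NumPy. The sketch below assumes a hypothetical grid size and random fields; it only shows that storing an increment in single precision halves its memory footprint while introducing a small relative rounding error, and that the increment is promoted back to double precision before it is added to the outer-loop state. It says nothing about how ROMS itself handles precision.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (20, 200, 150)  # hypothetical (levels, rows, columns) grid

# Outer-loop state kept in double precision.
background = rng.standard_normal(shape)

# The same inner-loop increment stored in double and in single precision.
increment64 = 0.01 * rng.standard_normal(shape)
increment32 = increment64.astype(np.float32)

# Promote the increment to double precision before updating the state.
analysis = background + increment32.astype(np.float64)

memory_ratio = increment64.nbytes / increment32.nbytes
rounding_error = np.max(np.abs(increment32.astype(np.float64) - increment64))
relative_error = rounding_error / np.max(np.abs(increment64))

print(f"memory ratio (double/single): {memory_ratio}")
print(f"largest rounding error relative to largest increment: {relative_error:.1e}")
```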
Time- and space-averaged observations
Altimetry, gliders, and coastal HF radars provide high temporal resolution observations to IOOS forecast systems. Direct assimilation of these raw data streams, however, can degrade the performance of 4D-Var if the cost function is dominated by high-frequency signals that are not well resolved by the model. An alternative approach is to assimilate the time-average of the observations or, in the case of altimetry, observations that are pre-processed to best represent slowly evolving geophysical signals. The existing ROMS observation operators are not configured to accept time-averaged data, so we will develop a more general operator in the TL model forcing to allow the assimilation of such data.
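A time-averaging observation operator and its adjoint can be sketched as follows. This is an assumption-laden illustration, not the operator planned for ROMS: the model trajectory is taken to be already sampled at the observation locations, the averaging window is given as a pair of hypothetical time indices, and both function names are invented. The forward operator averages over the window; the adjoint spreads each residual evenly over the same time steps as forcing. The last lines apply the standard dot-product test for an operator and its adjoint.

```python
import numpy as np


def time_avg_obs(traj, start, stop):
    """Forward operator: average the trajectory over time steps [start, stop)."""
    return traj[start:stop].mean(axis=0)


def time_avg_obs_adjoint(residual, n_times, start, stop):
    """Adjoint operator: spread the residual evenly over the averaging window."""
    forcing = np.zeros((n_times,) + np.shape(residual))
    forcing[start:stop] = np.asarray(residual) / (stop - start)
    return forcing


# Hypothetical example: 24 time steps at 5 observed points, window = steps 6..17.
rng = np.random.default_rng(1)
traj = rng.standard_normal((24, 5))
resid = rng.standard_normal(5)

# Dot-product test: <H x, y> should equal <x, H^T y>.
lhs = np.dot(time_avg_obs(traj, 6, 18), resid)
rhs = np.sum(traj * time_avg_obs_adjoint(resid, 24, 6, 18))
print(np.isclose(lhs, rhs))
```

Because the averaging is linear, the same forward function serves as its own tangent linear operator.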
Ecosystem/BGC modeling with WCOFS
One impediment to the development of ecosystem forecast models is the time and expertise required to configure a skillful underlying physical model, especially with DA. The ROMS 4D-Var systems in the Regional Associations (RAs) and at CSDL each took years to develop. We will create an infrastructure to quickly and easily re-run WCOFS analyses and forecasts locally, and use it to explore how existing DA model development can be leveraged in support of ecosystem modeling.
ROMS 4D-Var sandbox
Creating a cloud-based sandbox for experimentation with ROMS 4D-Var DA is central to our project transition activities. The IOOS Cloud Sandbox (ICS) (github.com/ioos/Cloud-Sandbox) presently enables re-running selected instances of NOAA operational forecast systems and RA systems.
In collaboration with RPS, we will add to the ICS archival instances (not NRT sustained data) of MARACOOS and WCOFS 4D-Var encompassing all necessary code and supporting input files. This will enable researchers to explore the sensitivity of WCOFS (and MARACOOS) 4D-Var to changes in the 4D-Var configuration, but more importantly to changes in the assimilated data. This might include the impact of new data streams that are not yet operational or, via Observing System Simulation Experiments (OSSE), the potential value of novel observing platforms or redesign of deployment strategies.
References
Kishi, M.J., Kashiwai, M., Ware, D.M., Megrey, B.A., Eslinger, D.L., Werner, F.E., Noguchi-Aita, M., Azumaya, T., Fujii, M., Hashimoto, S. and Huang, D., 2007. NEMURO—a lower trophic level model for the North Pacific marine ecosystem. Ecological Modelling, 202(1-2), pp.12-25.
Moore, A.M., Arango, H.G., Broquet, G., Edwards, C.A., Veneziani, M., Foley, B.P.D., Doyle, J., Costa, D., and Robinson, P., 2011a. The Regional Ocean Modeling System (ROMS) 4-dimensional variational data assimilation systems. Part I: System overview and formulation. Prog. Oceanogr. 91, 34–49.