# Understanding the ICON dynamical core
Objective: Better understand the science behind the ICON dynamical core, in particular the solution to the non-hydrostatic fully compressible Euler equations.
To this end we first walk through the current `solve_nh` code, and build a list of questions and observations which can then be shared with the authors at DWD.
## Notes
Printout of some vertical-level parameters for the current MCH experiment (a small sketch of how they relate follows the listing):
```
nflatlev(jg) 7
nflat_gradp(jg) 36
kstart_dd3d(jg) 5
nrdmax(jg) 13
kstart_moist(jg) 1
nlev 80
nlevp1 81
```
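The relations between these parameters, as currently understood (a minimal Python sketch; the comments mix information from the answers further down in these notes with assumptions, and are not taken from the ICON sources):
```
from dataclasses import dataclass

@dataclass
class VerticalGridParams:
    """Vertical grid indices as printed above (1-based, following the Fortran convention)."""
    nflatlev: int      # levels above this index are simply flat (see the Questions section)
    nflat_gradp: int   # from here down the multi-level pressure-gradient computation is needed
    kstart_dd3d: int   # divdamp_order=24: 2D divergence damping below, 3D above this threshold
    nrdmax: int        # depth of the Rayleigh damping layer at the model top (assumption)
    kstart_moist: int  # level where moisture starts in the vertical domain
    nlev: int          # number of full levels
    nlevp1: int        # number of half levels (= nlev + 1)

    def check(self) -> None:
        assert self.nlevp1 == self.nlev + 1
        assert 1 <= self.nflatlev <= self.nflat_gradp <= self.nlev

VerticalGridParams(nflatlev=7, nflat_gradp=36, kstart_dd3d=5, nrdmax=13,
                   kstart_moist=1, nlev=80, nlevp1=81).check()
```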
## TODOs:
- [x] @Magdalena Documentation: open [EXCLAIM confluence](https://unlimited.ethz.ch/pages/viewpage.action?pageId=166668220) page with links to Hackmd files.
- [x] Build up a [glossary](https://hackmd.io/PALNc5ecTv6Gz-md5tipNw) with shortnames/abbreviations
- [ ] Identify kernels/stencils that are part of the main Euler Equations and which ones are workarounds needed for stability.
## Observations
* `nstep` counts the substeps; in `mo_nh_stepping.f90`: `SUBSTEPS: nstep = 1, ndyn_substeps_var(jg)`
* for the first dynamics substep some fields are set to zero (`lclean_mflx` does the same)
* `jstep`: the actual timestep
* `vn` and `vt` are the normal (prognostic) and tangential (diagnostic) velocities, defined at edge centers. `w` is the vertical velocity, defined at cell centers horizontally but on layer interfaces (half levels) in the vertical.
* there is a CFL threshold value of 0.85 (`mo_nh_stepping.f90:set_ndyn_substeps`); if it is exceeded, all kinds of extra things are done: extra diffusion is run in `velocity_advection`, the number of dynamics substeps is changed, ...
* `ddt_` variables: there are around 30 to 40 of them in the diagnostic state; some are optional, but not all of them.
* `btraj` means back trajectory, needed for tracer advection, for example for advecting `theta` and `rho`
* `lvn_only`: if .FALSE. it calculates vertical velocity (`w`) and tangential velocity (`vt`) in `velocity_advection`
* output of *velocity advection*
* `p_diag%vn_ie`: `vn` interpolated to half levels ("interface levels"); declared e.g. in `atm_phy_les/mo_sgs_turbulence.f90` as `REAL(wp) :: vn_ie(nproma,p_patch%nlevp1,p_patch%nblks_e)`
* `p_diag%vt`: tangential velocity (diagnostic)
* `p_diag%w_concorr_c`: contravariant vertical correction [m/s]
* `p_diag%ddt_vn_apc_pc`: normal wind tendency from advection plus Coriolis (see the entry further down)
* `p_diag%ddt_w_adv_pc`: vertical wind tendency from advection, again stored per predictor/corrector step
* `maxcvfl` : maximum CFL over all cells.
* `solve_nonhydro`:
* corrector step: `lvn_only` is always false, so everything is calculated for corrector steps.
* output solve_nonhydro:
* first block
* corrector step: `rho_ic` (`rho` at interface centers, i.e. half levels), `theta_v_ic`
* predictor step: `exner_pr`, `rho_ic`, `theta_v_ic`
* second block
* only for nesting
* third block (predictor only):
* handles the lower, more distorted levels
* what is `nflat_gradp`? (answered in the Questions section below)
* 4th block (predictor only)
* updates 2nd cell halo line. Apparently this field can be calculated locally from other already present halo fields.
* 5th block (predictor only): horizontal advection; it calls horizontal advection schemes from tracer advection. They advect virtual potential temperature (`theta_v`) and density (`rho`).
* 6th block: corrector:
* 7th block: horizontal gradient of the Exner pressure: different stencils for different k levels, depending on how distorted they are and on `igradp_method`; `z_gradh_exner` and `z_hydro_corr` are updated
* 8th block: predictor, igradp_method = 3 or 5: updates `z_gradh_exner` with the just computed `z_hydro_corr`, could be integrated into the section before
* 9th block: update vn
* for predictor: update vn
* corrector: update vn, takes average over tendencies of predictor and corrector
* corrections to vn then follow:
* Rayleigh damping: we are using the default = 2 (`RAYLEIGH_KLEMP`) and not `RAYLEIGH_CLASSIC`. Entire block not ported.
* HALO exchange of vn
* reconstruction over diamond edges of normal winds
* mass flux on edges: consumes the z_theta_v flux and mass flux.
* `concorr`: contravariant correction
* calculation of `vn` on vertical interface levels and of horizontal kinetic energy (the same thing is calculated in velocity advection; see the note below on duplicated stencils in velocity advection and solve_nonhydro)
* done with edge computation (marked by timer)
* start of the vertically implicit solver part
* `z_alpha`, `z_beta`, `z_gamma`: form the three diagonals of the tridiagonal solver's matrix.
* update density, exner pressure, theta first inside domain,
* then on lateral boundary.
* `lprep_adv`: computations in preparation for tracer advection.
* moist calculation only for first substep
* EXCLAIM uses `divdamp_order` = 24, `igradp_method` = 3
* If you support `divdamp_order = 24`, you also support all stencils you need for `divdamp_order = 4` and `divdamp_order = 2`
* `ddt_vn_apc_pc(:,:,:,:)`: normal wind tendency from advection plus Coriolis, for predictor (p) and corrector (c) steps; dimensions `(nproma,nlev,nblks_e,1:3)`
* `exner`: Exner pressure (not temperature). Alternative prognostic variable, the non-dimensionalized pressure.
* `theta_v`: virtual potential temperature [K]
* `gradp_method`: method to compute horizontal pressure gradient.
* `z_rth_pr`: contains the perturbation values of rho and theta_v
* dycore and tracer advection work on cells, edges and vertices. The physics modules only work on cells. `nproma` is defined by the number of cells, but the same value is used for all dimensions.
* The predictor-corrector scheme used takes a number of substeps per timestep (example: dt=10s, 5 substeps: each substep will be 2 seconds). In `perform_dyn_substepping` this substep duration is computed (see the sketch after the snippet):
```
dt_dyn = dt_phy/ndyn_substeps_var(jg)
```
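For instance, a minimal numeric sketch in Python (the function name is made up; only the division mirrors the Fortran line above):
```
def dyn_substep_length(dt_phy: float, ndyn_substeps: int) -> float:
    """Length of one dynamics substep: the large (physics) time step divided
    by the number of substeps, as computed in perform_dyn_substepping."""
    return dt_phy / ndyn_substeps

# the example from above: dt = 10 s with 5 substeps gives 2 s per substep
assert dyn_substep_length(dt_phy=10.0, ndyn_substeps=5) == 2.0
```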
* There are some booleans set in `perform_dyn_substepping` which become relevant in `nh_solve` (their behaviour over the substeps is sketched after the snippet):
```
! nullify prep_adv fields at first substep
lclean_mflx = (nstep==1)
l_recompute = lclean_mflx
! logical checking for the last substep
llast = (nstep==ndyn_substeps_var(jg))
! save massflux at first substep
lsave_mflx = (p_patch%n_childdom > 0 .AND. nstep == 1 )
```
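How these flags behave over the substeps, sketched in Python (illustrative only; the nesting condition is reduced to a simple `n_childdom` argument):
```
def substep_flags(ndyn_substeps: int, n_childdom: int = 0):
    """Yield the control flags that solve_nh sees for each dynamics substep."""
    for nstep in range(1, ndyn_substeps + 1):
        lclean_mflx = (nstep == 1)                    # nullify prep_adv fields
        l_recompute = lclean_mflx                     # recompute on the first substep
        llast = (nstep == ndyn_substeps)              # last substep of this large step
        lsave_mflx = (n_childdom > 0 and nstep == 1)  # save mass flux for nesting
        yield nstep, lclean_mflx, l_recompute, llast, lsave_mflx

for flags in substep_flags(ndyn_substeps=5):
    print(flags)   # (1, True, True, False, False), (2, False, False, False, False), ...
```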
* The predictor-corrector is programmed as a loop `DO istep=1,2`. This is almost certainly done in order to minimize the shared code. But essentially the first half of this DO loop consists of case distinctions between `istep=1` and `istep=2`. (It was also brutally difficult to successfully port this DO loop to OpenACC, since intermediate values in the predictor were then influencing the corrector step.) It would be easier, clearer, and probably not much longer for the ICON4Py dycore to simply do one step after the other (see the sketch at the end of this section).
* It would be extremely useful to keep a glossary of the meaning of the configuration variables which are used in the case distinctions:
* `istep` : the predictor step (1) or the corrector step (2)
* `itime_scheme`: time scheme, defined in non-hydro configuration (for us: 4)
* `iadv_rhotheta` : advection scheme for rho (density) and theta_v (virtual potential temperature)
* `igradp_method` : Method for computing the horizontal pressure gradient (for us: 3)
* `divdamp_order` : divergence damping order (for us: 24)
* `l_limited_area`: limited area mode (i.e., not global)
* `l_vert_nested` : vertical nesting (for us: .FALSE.)
* `lhdiff_rcf` : Compute horizontal diffusion also at the large time step
* `idiv_method` : divergence method (for us: 1)
* `l_open_ubc` : open boundary condition at top (for us: .FALSE.)
* `rayleigh_damp` : type of Rayleigh damping (for us: 2, `RAYLEIGH_KLEMP`)
* `divdamp_type` : divergence damping type (for us: 3)
* `kstart_moist` : Where moisture starts in vertical domain (for us: 1)
* `lprep_adv` : Prepare for the advection (for us: .TRUE.). Always needed when tracers are advected?
* solve_nonhydro_stencil_{36,37,38} are the same as velocity_advection_stencil_{02,05,06}
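
Sketch referenced above for the `DO istep=1,2` discussion: one dynamics substep written as an explicit predictor call followed by an explicit corrector call. This is a toy in Python with made-up tendencies and a single scalar standing in for `vn`; only the control structure and the averaging of predictor and corrector tendencies (9th block above) are meant to carry over.
```
def predictor(vn_now: float, dtime: float) -> tuple[float, float]:
    """Toy stand-in for the predictor: provisional vn plus the predictor tendency."""
    ddt_vn = -0.1 * vn_now            # placeholder tendency
    return vn_now + dtime * ddt_vn, ddt_vn

def corrector_tendency(vn_provisional: float) -> float:
    """Toy stand-in for the corrector tendency, recomputed from the provisional state."""
    return -0.1 * vn_provisional

def solve_nh_substep(vn_now: float, dtime: float) -> float:
    """One substep as two explicit phases instead of a DO istep=1,2 loop full of
    istep case distinctions (structure only; the real code updates many fields,
    does halo exchanges and runs the vertical solver, and the 0.5 weights here
    are illustrative)."""
    vn_provisional, ddt_pred = predictor(vn_now, dtime)
    ddt_corr = corrector_tendency(vn_provisional)
    # corrector update: average predictor and corrector tendencies
    return vn_now + dtime * 0.5 * (ddt_pred + ddt_corr)

print(solve_nh_substep(vn_now=10.0, dtime=2.0))
```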
## Questions
* "Time increment for backward-shifting of lateral boundary mass flux" => What is backward shifting?
* Related to `dt_shift` defined at the start of the dycore?
* Recalculates the mass fluxes for each of the dynamics substeps
* ensures mass conservation
* relevant for LAM, but mass conservation is not ensured there
* used for `prep_adv` from the dycore, which is then used for the advection
* Can `div_avg` be deleted completely?
* DWD will consider this
* This would allow removing the `prepare_tracer` method from the `perform_dyn_substepping` subroutine
* But would still be required for standalone advection runs (without dycore)...
* The `nnow` and `nnew` concept implies there could be multiple time stepping schemes, but isn't the reality that there is only one: `nnow=1, nnew=2` which are then swapped repeatedly?
* No: `nnow` and `nnew` are toggled to avoid copying
* Should be solved by a pointer swap (a small sketch of the toggle follows further down, before the `mo_nh_interface_nwp` excerpt)
* `nnow` always stands for the current timestep, and `nnew` is used either for predicted or corrected values of the new timestep
* Is the CFL x 0.85 threshold value commonly surpassed in production? Under what conditions does it happen?
* Enhanced monitoring starts at 0.85 (could be made an implementation 'constant')
* Breaking gravity waves over steep orography
* this really happens in production: generally over South America
* number of substeps is increased (control time step remains fixed)
* Occurs between 5-10km resolution, never happens at 1km
* `lextra_diffu` is activated at CFL 0.85; if the CFL continues to increase, a time step reduction is applied
* 1.05 is the critical CFL so that the number of substeps is increased (this should be a namelist variable, since it might decrease for higher resolution)
* Timestep returns to original value if CFL is low enough (something like ~0.85)
* How confident are you in this as a solution to high velocities?
* The important thing is the stability, which is why artificial diffusion is added for CFL > 0.85. Of course this makes the results less accurate.
* The advection of the fluid itself, more precisely `z_theta_v_e` and `z_rho_e` is done with the same algorithms as are used in the tracer advection (flavors of Miura). For the 3rd order the `upwind_hflux_miura3` is called directly (including flux limiting). However, for the 2nd order advection (which is used for the majority of model runs) the functions called in the body of `upwind_hflux_miura` are called directly within `nh_solve` (code duplication!), and there is no flux limiting.
* DWD will delete the call to 'upwind_hflux_miura3' in the dycore -- never used
* Is this done for performance reasons? ==> Yes!
* Is flux limiting unnecessary for timesteps as small as 2 s?
* Not needed for the dynamical core, because the gradients are smaller
* Positive-definite flux limiting does not make sense here anyway, because temperature and density are positive by definition
* Why are there two versions of the `grad_green_gauss` method, one for the advection and one for the dycore?
* '_dycore' version can use 'vp', computes two outputs rather than one (for optimization)
* How should we compartmentalize the dynamical core? As it is done in the time output? `veltend`, `cell_comp`, `edge_comp`, `vnupd`, `vimpl`, `exchg`
* These are not methods, but rather a grouping of stencil patterns. Current structure is dictated by the optimization for cache and the minimization of halo exchanges.
* Cell computations should not be mixed with edge computations, again for performance.
* How much work would it be to have three separate `nproma` values for cells, vertices and edges?
* DWD: good question!
* Most of the work would be in the GridManager in the set-up phase.
* Also nproma per domain
* `enh_` always means "enhanced"? For example, `enh_smag_fac`
* Yes: the diffusion coefficient is enhanced (increased below the model top)
* 'enh_diffu_3d' is a cold pool thing. These are crucial, particularly in Switzerland
* `i_gradp_method`? Method of dealing with steep topography?
* Method of calculating the horz. pressure gradient, crucial for steep orography.
* Also used by MCH.
* `vn_ie`: what is the meaning of `_ie`? Defined on interface edge?
* `mass_flx_me`: what does the `me` stand for? *middle (vertical), edge (horizontal)*?
* m stands for mass point
* `mass_flx_ic` what does the `ic` stand for? *interface(vertical)_cell(horizontal)*?
* i stands for interface
* `exner_pr` is Exner Pressure; what is prognostic variable `exner`, and what are its units?
* _pr : stands for Perturbation from reference
* Should it be `w_concorr_ic` instead of `w_concorr_c`?
* Yes, in principle, but it is obvious that it is on interfaces
* contravariant correction on `w`, what is the meaning of it?
* the horizontal coordinate system is always parallel to the topography, so the vertical wind speed is zero in that system
* `w_concorr_ic` is the difference between `w` and the contravariant wind speed, which is zero at the lower boundary
* most important when calculating fluxes near the surface
* `nflat_gradp`: is this where the layers really become "bumpy"?
* Less than level 7 is simply flat
* 'nflat_gradp' (calculated in the setup phase) is where it is so bumpy that you need more than one level for the gradient calculation, and then a multi-level calculation with interpolation is performed.
* kstart_dd3d: for divergence damping method 24, use 2D below and 3D above this threshold
* Could we now replace all lists in ICON with masks? (Would that limit efficiency on any platform?)
* One example of useful lists: 'igradp_method' calculation of 'z_grad_exner' (but this is a static list)
* For vector machines it can be very slow without masks, therefore lists are sometimes necessary
* Apparently the vector machine executes both code paths and decides which result to take using the mask at the very end. This is very expensive compared to lists if the mask is only true for very few points.
* Divergence damping: what is this condition `divdamp_fac_o2 <= 4 * divdamp_fac`?
* to more effectively manage the increments from data assimilation
* use more harsh divergence damping for the first 30 minutes (2nd order) and then from 30 to 120 minutes use 4th order divergence damping
* If the condition is true, additional computation is done in the transition phase between 2nd-order and 4th-order divergence damping (during the transition phase both are computed and then weighted)
* Why the `iau` update just before the vertical tridiagonal solver? Can `iau` be fed in at a higher level?
* Possibly, but it has to be performed before the 'vn' halo exchange, so it would be difficult to put elsewhere
* Numerically it is the correct place to put it, so inherently the dycore requires analysis increments and `iau` cannot be separated into a different interface
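
Sketch referenced in the `nnow`/`nnew` question above: the two time levels are indices into the same container and are toggled after every step, so nothing is copied (Python toy; in ICON `nnow`/`nnew` index the prognostic state list, and in a port this could become the pointer swap suggested above):
```
states = [{"vn": 10.0}, {"vn": 0.0}]   # two time levels with toy content
nnow, nnew = 0, 1

for substep in range(5):
    # "integrate": write the new state as a function of the current one
    states[nnew]["vn"] = 0.99 * states[nnow]["vn"]
    # toggle the indices instead of copying states[nnew] back into states[nnow]
    nnow, nnew = nnew, nnow

print(states[nnow]["vn"])   # nnow now points at the most recent state
```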
From `atm_phy_nwp/mo_nh_interface_nwp.f90`:
```
pt_diag%ddt_vn_phy(jce,jk,jb) = pt_int_state%c_lin_e(jce,1,jb) &
& * ( z_ddt_u_tot(iidx(jce,jb,1),jk,iblk(jce,jb,1)) &
& * pt_patch%edges%primal_normal_cell(jce,jb,1)%v1 &
& + z_ddt_v_tot(iidx(jce,jb,1),jk,iblk(jce,jb,1)) &
& * pt_patch%edges%primal_normal_cell(jce,jb,1)%v2 )&
& + pt_int_state%c_lin_e(jce,2,jb) &
& * ( z_ddt_u_tot(iidx(jce,jb,2),jk,iblk(jce,jb,2)) &
& * pt_patch%edges%primal_normal_cell(jce,jb,2)%v1 &
& + z_ddt_v_tot(iidx(jce,jb,2),jk,iblk(jce,jb,2)) &
& * pt_patch%edges%primal_normal_cell(jce,jb,2)%v2 )
pt_prog%vn(jce,jk,jb) = pt_prog%vn(jce,jk,jb) + dt_loc * ( &
pt_int_state%c_lin_e(jce,1,jb) &
& * ( prm_nwp_tend%ddt_u_turb(iidx(jce,jb,1),jk,iblk(jce,jb,1)) &
& * pt_patch%edges%primal_normal_cell(jce,jb,1)%v1 &
& + prm_nwp_tend%ddt_v_turb(iidx(jce,jb,1),jk,iblk(jce,jb,1)) &
& * pt_patch%edges%primal_normal_cell(jce,jb,1)%v2 )&
& + pt_int_state%c_lin_e(jce,2,jb) &
& * ( prm_nwp_tend%ddt_u_turb(iidx(jce,jb,2),jk,iblk(jce,jb,2)) &
& * pt_patch%edges%primal_normal_cell(jce,jb,2)%v1 &
& + prm_nwp_tend%ddt_v_turb(iidx(jce,jb,2),jk,iblk(jce,jb,2)) &
& * pt_patch%edges%primal_normal_cell(jce,jb,2)%v2 ) )
```
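In compact form, the excerpt projects the (u, v) tendencies of the two cells adjacent to an edge onto the edge's primal normal and combines them with the `c_lin_e` interpolation weights. A hedged Python sketch of that projection for a single edge and level (names shortened, values made up):
```
def uv_tendency_to_vn_tendency(c_lin_e, ddt_u, ddt_v, normal_u, normal_v):
    """Project the cell-based (u, v) tendencies of the two neighbouring cells
    onto the edge normal and blend them with the c_lin_e interpolation weights.
    All arguments are length-2 sequences, one entry per adjacent cell."""
    return sum(
        c_lin_e[i] * (ddt_u[i] * normal_u[i] + ddt_v[i] * normal_v[i])
        for i in range(2)
    )

# toy numbers: equal weights, purely zonal tendencies, normal pointing east
ddt_vn = uv_tendency_to_vn_tendency(
    c_lin_e=[0.5, 0.5], ddt_u=[1.0e-4, 2.0e-4], ddt_v=[0.0, 0.0],
    normal_u=[1.0, 1.0], normal_v=[0.0, 0.0],
)
print(ddt_vn)   # 1.5e-4: the edge-normal tendency
```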
* `ddt_vn_phy` holds physics tendencies; are they added to the dynamics in only one place?
* Only for 'exner' and 'vn', similar to 'iau'
* Must be before the explicit solver due to the mathematical formulation
* Exner combines density and virtual temperature, which allows a vertically implicit solver for 2 variables instead of 3
* What are the mathematical meanings of `z_w_expl` and `z_contr_w_fl_l`?
* `expl` is the explicit part of the vertical velocity inside the vertical solver
* the implicit part is needed in order to circumvent the CFL restriction for vertical sound waves
* `z_contr_w_fl_l`: contravariant vertical density flux of the vertical velocity
* Is there a sponge layer at model top?
* `RAYLEIGH_KLEMP` is a linear interpolation with respect to `w(:,1,:)`?
* 'w' is not zero with vertical nesting
* The magic is in 'z_raylfac' -- hyperbolic tangent values; it starts at a factor of 1/1000 away from the boundary and grows very quickly towards the boundary.
* vertical wind damping
* Why is approx. of w=0 at model top good enough for nwp?
* Difficult to do something better...
* No way to guarantee mass conservation at the model top
* When is `lprep_adv` true? Always?
```
IF ( idiv_method == 1 .AND. (ltransport .OR. p_patch%n_childdom > 0 .AND. grf_intmethod_e >= 5)) THEN
lprep_adv = .TRUE. ! do computations for preparing tracer advection in solve_nh
ELSE
lprep_adv = .FALSE.
ENDIF
```
* Is not true for idealised dynamical core test cases
* Otherwise true, also always true for real world cases
* With nesting it is also used in dycore test cases, due to the lateral boundary condition for the mass flux.
* `lhdiff_rcf` : compute horizontal diffusion also at the large time step (???) The two source comments below give divergent definitions
* rcf stands for reduced calling frequency; if the flag is true, diffusion is only called at the fast-physics timestep, i.e. the large time step (a small sketch of the calling pattern follows the comment lines below)
* Reminder: fix the comment in `configure_model/mo_nonhydrostatic_config.f90`
* always .TRUE. in real simulations, but we should keep the code
```
configure_model/mo_nonhydrostatic_config.f90: LOGICAL :: lhdiff_rcf ! if true: compute horizontal diffusion also at the large time step
namelists/mo_nonhydrostatic_nml.f90: LOGICAL :: lhdiff_rcf ! if true: compute horizontal diffusion only at the large time step
```
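Current reading of the flag, as a Python sketch (interpretation of the answer above, not taken from the ICON sources; `solve_nh` and `horizontal_diffusion` are just stand-ins):
```
def solve_nh(nstep: int) -> None: ...          # stand-in for one dynamics substep
def horizontal_diffusion() -> None: ...        # stand-in for the diffusion call

def run_large_step(ndyn_substeps: int, lhdiff_rcf: bool) -> None:
    """Where horizontal diffusion is called depending on lhdiff_rcf."""
    for nstep in range(1, ndyn_substeps + 1):
        solve_nh(nstep)
        if not lhdiff_rcf:
            horizontal_diffusion()   # every dynamics substep
    if lhdiff_rcf:
        horizontal_diffusion()       # only once, at the large (fast-physics) time step

run_large_step(ndyn_substeps=5, lhdiff_rcf=True)
```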
* Why is there a k_start_moist loop in the solve_nonhydro and what are these pressure values used for?
* Used to prepare temperature tendencies inside dynamical core to pass to the convection schemes
* Computed only at those levels where they are needed
* Significance of `p_nh%metrics%mask_prog_halo_c`?
* for indices of halo points in the lateral boundary zone
* some of the lateral boundary points are not sorted into the row, and therefore have to be treated individually
* lateral boundary cell lines 1 and 2 tend to be part of the lateral boundary memory region, lateral boundary cell lines 3 and 4 tend to be part of the halo memory region ('refin_ctrl' defines these distances from the lateral boundary)
* For a related discussion: https://gitlab.dkrz.de/fprill/code-snippets/-/wikis/ICON-start_block-and-start_blk
* in 'mo_vertical_grid', loop 750-765, there is a list in the IF .TRUE. branch and a mask in the .FALSE. branch, which suggests that they are inverses.
* "halo points lying in the boundary interpolation zone" How is this list of points calculated? Why is `theta_v` misused?
```
DO jk = 1, nlev
p_nh%prog(nnew)%theta_v(jc,jk,jb) = p_nh%prog(nnew)%exner(jc,jk,jb)
! Diagnose exner from rho*theta
p_nh%prog(nnew)%exner(jc,jk,jb) = EXP(rd_o_cvd*LOG(rd_o_p0ref* &
p_nh%prog(nnew)%rho(jc,jk,jb)*p_nh%prog(nnew)%theta_v(jc,jk,jb)))
ENDDO
```
* Lateral boundary calculation is done for different prognostic variables, must be done with one intensive and one extensive variable.
* This avoids one halo exchange (this is thus a performance optimization)
* Why is `exner` then recalculated in the next stencil on the lateral boundary? We did that before the halo update.
* In line 3862 'exner' is actually *misused* to carry 'theta_v' ("DO NOT TOUCH THIS!" in line 2924) -- again, this is done to avoid a second halo exchange (a sketch of the diagnosed Exner formula follows at the end of this section)
* Are the two masks/lists of the third-to-last and last stencils partitions of the halo set (i.e., do they have no overlap)?
* We would like to believe this is true
* Is the halo size (number of lines) in ICON variable?
* For all practical purposes it is fixed
* In early development bigger halos were tested, adapting the computations on the halos in order to keep the same level of optimization, but the code was slower
* For this reason the halo size is fixed
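
Sketch referenced in the lateral-boundary question above: written out, the `EXP(rd_o_cvd*LOG(rd_o_p0ref*rho*theta_v))` stencil is simply the power `exner = (R_d/p0ref * rho * theta_v)**(R_d/c_vd)`. A small Python check with standard constants (values are textbook numbers, not read from the ICON sources):
```
RD = 287.04         # gas constant of dry air [J/(kg K)]
CVD = 1004.64 - RD  # specific heat at constant volume, cpd - rd [J/(kg K)]
P0REF = 100000.0    # reference pressure [Pa]

def exner_from_rho_theta_v(rho: float, theta_v: float) -> float:
    """Diagnose the Exner function from density and virtual potential temperature:
    pi = (R_d * rho * theta_v / p0ref)**(R_d / c_vd), which is what the stencil
    above expresses as EXP(rd_o_cvd * LOG(rd_o_p0ref * rho * theta_v))."""
    return (RD * rho * theta_v / P0REF) ** (RD / CVD)

print(exner_from_rho_theta_v(rho=1.2, theta_v=290.0))   # close to 1.0 near the surface
```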
## Code refactoring suggested by DWD
* Remove '__SWAPDIM'
* Remove 'div_avg' ('idiv_method==2')
* Remove the call to 'upwind_hflux_miura3' in dycore
* Remove the call to 'btraj_compute_o1' in dycore
* Remove OpenACC COPYIN (`bdry_idx/blk`) in line 767 of `mo_vertical_grid`, since list is now in 'p_metrics'
* Remove "Compute contribution of thermal expansion to vertical wind at model top"
* Remove all OpenACC-related comments (WS)