# [Blueline] Continue tasks in model verification
- Shaped by: Abishek
- Appetite: 1.5 - 2 weeks
- Developers:
## Problem
- This is an overflow task from cycle 16, with new additions
- Apart from submitting the abstract to the NCAR workshop, all tasks from that cycle remain open.
- **New:** Need to re-run Zeman test on latest `icon-dsl`, especially after Christoph's bug fixes for cfl_clipping.
- **New:** C2SM (Annika Lauber) and MCH (Daniel Hupp) are starting work on improving probtest. Some of their tasks align with the questions asked here, so it makes sense to work together.
(From last cycle)
We have performed probtest and Zeman verification tests for the aquaplanet case, giving us reasonable confidence that the `icon-dsl` code is "correct" when compared to the CPU and GPU (OpenACC) versions. Of these two tests, the Zeman test appears the more robust and also involves a longer simulation time (10-12 days). That comes with a non-trivial computational cost, which may make it hard to automate in the future.
The probtest, on the other hand, is very cheap as it is only run for 10-15 timesteps. It relies on the predictability of roundoff-level error growth in the initial timesteps. This, however, presents a problem for the aquaplanet case.

In the figure above, we see that the perturbed CPU runs (black curves) jump to very large differences (from 1e-14 to 1e-7), relative to the unperturbed CPU run, within a few timesteps. The red/blue curves are from unperturbed gt4py/openacc runs and represent a more "reasonable" error growth. In cases with only fairly simple dycore stencils, the error growth is usually more gradual. However, the addition of physics parametrizations, among other things, likely contributes to the jumps we're seeing.
The main issue is that the gap between the black and red curve represents an area where errors will not be picke
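
The tolerance idea behind the probtest, as described above, can be illustrated with a minimal sketch. This is not probtest's actual code or API: the logistic map is a toy stand-in for the model, and all function names, the ensemble size, the 1e-14 perturbation amplitude and the safety factor of 5 are assumptions made for illustration only.

```python
import random


def run_model(x0, n_steps, r=3.9):
    """Toy stand-in for a model run (hypothetical): iterate the logistic map."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(r * traj[-1] * (1.0 - traj[-1]))
    return traj


def rel_diff(run, ref):
    """Relative difference to the reference run at every timestep."""
    return [abs(a - b) / max(abs(b), 1e-300) for a, b in zip(run, ref)]


def ensemble_tolerance(x0, n_steps, n_members=10, amplitude=1e-14,
                       factor=5.0, seed=0):
    """Per-timestep tolerance from an ensemble of perturbed runs.

    Each member starts from x0 with a relative perturbation of at most
    `amplitude`. The tolerance is the largest relative difference seen
    across members at each step, scaled by `factor` (an assumed value).
    """
    rng = random.Random(seed)
    ref = run_model(x0, n_steps)
    worst = [0.0] * (n_steps + 1)
    for _ in range(n_members):
        start = x0 * (1.0 + amplitude * rng.uniform(-1.0, 1.0))
        diffs = rel_diff(run_model(start, n_steps), ref)
        worst = [max(w, d) for w, d in zip(worst, diffs)]
    return ref, [factor * w for w in worst]


def passes(candidate, ref, tol):
    """A candidate passes if it stays within the tolerance at every step."""
    return all(d <= t for d, t in zip(rel_diff(candidate, ref), tol))


if __name__ == "__main__":
    ref, tol = ensemble_tolerance(0.3, 15)
    for step, t in enumerate(tol):
        print(f"step {step:2d}  tolerance {t:.3e}")
```

In this sketch, any candidate whose differences stay under the tolerance curve passes, whatever the cause of those differences. The faster the perturbed ensemble diverges, the wider the band of errors that such a check cannot distinguish from roundoff.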
## Tasks
- [ ] **Re-apply the Zeman test to the latest `icon-dsl` configuration** and compare with icon-dsl v0.1.1 (the last "verified" version)
  - Learn to apply Christian's scripts to verification, and think about how easy it is to automate.
  - If Praveen's schedule permits, work with him to do even longer runs (4-6 months) and validate results with icon and icon-dsl.
- [ ] **Research question 1**: Which components of the current aquaplanet setup cause the bulk of the sensitivity to O(1e-14) perturbations? Is it a particular physics parametrization, or the interpolation coefficients?
- [ ] **Establish a base aquaplanet configuration with minimal/no physics, such that roundoff-level perturbations do not grow beyond a few orders of magnitude in a few timesteps**
  - Annika and others will also look at this issue - coordinate with them.
- [ ] **Research question 2**: Does the Zeman test pick up intentionally introduced namelist errors of O(1e-8)?
- [ ] Coordination for a possible paper + NCAR Correctness and Reproducibility [workshop](https://ncar.github.io/correctness-workshop/) talk (Nov 9-10)
- [ ] **Finish documenting** (hackmd/confluence) all of the verification procedures in detail (up to whatever we have at the end of this cycle)
- [ ] **Meet with Anurag, Praveen, Christian**, and others to discuss scientifically novel/interesting parts of our verification workflows.