# [Greenline] Cleanup / closing loose ends
<!-- Add the tag for the current cycle number in the top bar -->
- Shaped by: Magdalena,
- Appetite (FTEs, weeks):
- Developers: <!-- Filled in at the betting table unless someone is specifically required here -->
## Problem
<!-- The raw idea, a use case, or something we’ve seen that motivates us to work on this -->
### backend settings - Nikki - done
Currently, icon4py uses a global setting for the selected backend. This makes the backend hard to change and does not allow simple backend switching for CI runs, which is important for performance measurements.
### initialization of python granules - Nikki - done
The `SolveNonHydro` and `Diffusion` classes separate construction from initialization. This was originally done because the prototype of the CFFI Fortran wrapper needed a granule instance to exist before calling the init function. That is no longer needed, and the split has the drawback that a user might create a granule instance that is not ready to run because it has not been initialized.
### fix various issues and quick fixes from GPU runs
GPU runs of the JW test case and the granule integration led to various type casts (int to int32) and device/host copies scattered throughout the code. This should be consolidated.
### ~~KHalfDim in diffusion~~
Postponed. (Observation: the group of people who want to push this is growing.)
This cleanup is a step towards a better user interface in icon4py. Currently, due to a gt4py restriction, icon4py uses the same dimension `K` for fields on full and half levels. There is a half-level dimension `KHalfDim`, which is in some places used to allocate fields with the correct vertical size, but the dimension is dropped after allocation. Some exploratory work was done to fully use the half-level dimension in `Diffusion`.
The work could not be finished due to restrictions in gt4py. The idea is a simple implementation on the gt4py side that has no performance penalties and does not require a full "staggered" dimension implementation, while still allowing consistent use of the vertical dimension in icon4py.
TODO (magdalena): check with Hannes/Enrique, to decide whether we should work on this.
## Appetite
<!-- Explain how much time we want to spend and how that constrains the solution -->
Half a cycle, depending on what we do.
## Solution
<!-- The core elements we came up with, presented in a form that’s easy for people to immediately understand -->
### backend settings
These steps should be done for all model components (SolveNonHydro, Diffusion, Saturation Adjustment):
- Pass the backend as a parameter to the constructor of the model components (diffusion, dycore).
- Cache the stencils that are run in the component with `.with_backend(backend)` (see the sketch below).
- Use the backend for the allocation of local fields.
- Use the "cached" `.with_backend` stencils everywhere at `run` time.
- Tests: use the `backend` fixture in the datatest tests.
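A minimal sketch of the intended pattern; the stencil (`apply_diffusion`), the grid/config attributes, and the exact gt4py allocation call are assumptions for illustration:
```
# Sketch only: `apply_diffusion`, `grid`, and the field shapes are hypothetical,
# and the exact gt4py allocation call may differ.
import gt4py.next as gtx

CellDim = gtx.Dimension("Cell")
KDim = gtx.Dimension("K", kind=gtx.DimensionKind.VERTICAL)


class Diffusion:
    def __init__(self, grid, config, backend):
        self._grid = grid
        self._backend = backend
        # compile and cache the stencils once with the selected backend
        self._apply_diffusion = apply_diffusion.with_backend(backend)
        # allocate local (temporary) fields through the backend's allocator
        self._tmp = gtx.zeros(
            {CellDim: grid.num_cells, KDim: grid.num_levels},
            allocator=backend,
        )

    def run(self, prognostic_state, dtime):
        # run time uses only the cached, backend-bound stencils
        self._apply_diffusion(
            prognostic_state.w,
            self._tmp,
            dtime,
            offset_provider=self._grid.offset_providers,
        )
```
With this pattern the backend becomes an explicit per-instance choice, so CI can instantiate the same component with different backends for performance measurements.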
Additional steps:
- Replace the `xp` import from `settings.py` with a simple conditional import that uses cupy if it is installed (see the sketch after this list). If CUDA is available it will be used, and when converting to gt4py fields `gtx.as_field` takes care of possible device/host copies.
- IconGrid: with the conditional import, the `on_gpu` flag for the grid can be removed and the conditional `xp` import used instead. As an exception, the `start_index` and `end_index` arrays need to stay on the CPU.
- Move `settings.py` to `tools/py2fgen` and remove the `backend` argument from the stencils (needs to be discussed with Sam!).
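A minimal sketch of the conditional `xp` import (where exactly it lives, `settings.py` or `tools/py2fgen`, is part of the open discussion above):
```
import numpy as np  # index arrays (start_index / end_index) always stay on host

try:
    import cupy as xp  # use cupy when it is installed, i.e. for GPU runs
except ImportError:
    xp = np            # CPU fallback: xp is plain numpy
```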
### fix various issues and quick fixes from GPU runs
see PRs [555](https://github.com/C2SM/icon4py/pull/555), [553](https://github.com/C2SM/icon4py/pull/553)
#### strategy
- Agree on common patterns (see the sketch below):
  - device/host copies for fields: through gt4py allocation or conversion.
  - for arrays: use `xp.asarray` only where we really need raw arrays; they should be on device when running on GPU.

There are some special cases:
- `start_indices`, `end_indices` *must* be on host; explicitly use numpy.
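A minimal sketch of these patterns; `backend` and `xp` are taken from the backend-settings section above, the field/index values are made up, and the `allocator` keyword of `gtx.as_field` is assumed:
```
import numpy as np
import gt4py.next as gtx

CellDim = gtx.Dimension("Cell")
host_values = np.zeros(10, dtype=np.float64)

# fields: let gt4py do the (possible) host-to-device copy
field = gtx.as_field((CellDim,), host_values, allocator=backend)
# raw arrays: only where really needed; on device when xp is cupy
arr = xp.asarray(host_values)
# index arrays: explicitly numpy, they must stay on host
start_index = np.asarray([0, 3, 7], dtype=np.int32)
```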
Keep in mind the 0d array/scalar issue in numpy vs. cupy: numpy treats 0d values as scalars, whereas cupy treats them as 0d arrays since it does not have scalar types.
Cupy:
```
import cupy as cp

x = cp.asarray(2)  # 0d array - cp.ndarray (dtype defaults to int64)
val = x.item()     # scalar, but a standard Python int (int64-like, not int32); device-to-host copy
```
Numpy vs. cupy (via `xp`):
```
import numpy as np
import cupy as cp  # only available in GPU installations; the last two lines show the cupy case

xp = np  # or cp, depending on the conditional import

ar = xp.arange(10, dtype=xp.int32)
v = ar[2].item()              # plain Python int (for cupy: device-to-host copy)
v = ar[2]                     # numpy: np.int32, behaves as a scalar; cupy: 0d array (cupy has no scalars)
v = ar[2][()]                 # np.int32 in numpy; still a 0d array in cupy
v32 = xp.int32(ar[2].item())  # copy the value to a Python int and cast to int32
v_np = cp.asnumpy(ar[2])      # np.ndarray: explicit device-to-host copy
v2 = cp.asnumpy(ar)[2][()]    # np.int32: copy the whole array to host, then index
```
### initialization of python granules
Initialize the component/granule in the constructor and remove the `init` function.
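A minimal before/after sketch (constructor arguments are hypothetical):
```
# before: two-step setup, easy to end up with a granule that is not ready to run
diffusion = Diffusion()
diffusion.init(grid, config, params)
diffusion.run(prognostic_state, dtime)

# after: the constructor does the full initialization, no separate init() call
diffusion = Diffusion(grid, config, params)
diffusion.run(prognostic_state, dtime)
```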
## Rabbit holes
<!-- Details about the solution worth calling out to avoid problems -->
## No-gos
<!-- Anything specifically excluded from the concept: functionality or use cases we intentionally aren’t covering to fit the ## appetite or make the problem tractable -->
## Progress
<!-- Don't fill during shaping. This area is for collecting TODOs during building. As first task during building add a preliminary list of coarse-grained tasks for the project and refine them with finer-grained items when it makes sense as you work on them. -->
- [x] Task 1 ([PR#xxxx](https://github.com/GridTools/gt4py/pulls))
- [x] Subtask A
- [x] Subtask X
- [ ] Task 2
- [x] Subtask H
- [ ] Subtask J
- [ ] Discovered Task 3
- [ ] Subtask L
- [ ] Subtask S
- [ ] Task 4