# Test best practices
###### tags: `discussion`
[TOC]
## Goals
- Test folder structure [✔]
- Find common patterns and decide best practices
- Data / fixtures / parametrization issues
- Removing manual boilerplate
- Declarative / readable style?
- Reduce testing time
- Remove tests?
- Make tests faster?
## Tasks
- Change directory layout [✔]
- Split and rename test files (TODO)
- Implement single-feature integration tests [✔]
- Lowering test utilities (string interpolation and parsing) (WIP @havogt)
## Issues to discuss
### ffront
- ~~Folder structure for unit tests should replicate the same structure as the original package~~
- ~~Redefinition / global definition of `Dimension`s, `Offset`s, ...~~ -> Define globally for all tests
- ~~End-to-end/integration tests structure?~~
- ~~Should we split the huge `test_execution` into smaller tests? `test_math_builtin_execution`, `test_math_unary_builtins`. Duplications?~~
- ~~Remove `test_math_unary_builtins` and integrate it into `test_math_builtin_execution`.~~
- ~~Maybe:~~
- ~~integration_tests~~
- ~~single_feature_tests~~
- ~~multi_feature_tests~~
- ~~Reducing boilerplate in tests (mostly `test_execution.py`, ...)~~
- ~~Concise utilities to allocate fields with sample data (do not use random data, but e.g. a range) -> use `test_copy` as a vehicle to find a good structure (continue with `test_offset_field`)~~
- ~~Same for tests which require a mesh with connectivities~~
- Naming scheme for fields / data used in tests (mostly `test_execution.py`, ...)
- ~~Input test data:~~
- ~~Meshes & connectivities~~
- ~~GT4Py sources: expressions, operators, programs (depend on meshes)~~
- ~~Fields (might depend on meshes)~~
- Should we add a regression test category where we are forced to add a (minimal) test for each bug fix?
- ~~Lowering tests (e.g. `foast_lowering.py`):~~
+ ~~Long term solution: execute and compare output results against embedded field view~~ -> Add TODO to lowering tests file
+ ~~Short term solution: maybe substitute current ITIR maker utilities with string templates (which can be easily inspected) and then parse them to generate the expected ITIR~~
+ Do we want autogenerated human-readable expected ITIR code? (It potentially could be done with `cog`) -> Nice to have but not urgent. Maybe in the future
- ~~`past_lowering`/`past_parsing`: parsing is implicitly tested in lowering.~~
- ~~failures should be addressed in parsing first, but that may not be easy to spot if lowering tests are executed first~~
- ~~`test_program` tests execution of programs. Should it be in the same category as other end-to-end tests, but in a subcategory that emphasizes aspects of the program part (e.g. handling of the domain)?~~
- ~~Move some tests from `test_execution` into this category: `domain`+`tuple` out argument focused.~~
- ~~`test_type_deduction`: currently we build first a huge list of parameters, then tests are small. Readability is not ideal.~~ -> No action for now. In the future we might look into writing the list of cases with a more readable pattern.
### Iterator
- ~~big problem: no folder structure~~
- ~~integration test setup should be possible to share with ffront (e.g. `hdiff_reference`, `fvm_nabla_setup`)~~ -> Both should be moved to `multi_feature_tests`
- ~~Transformation tests: we need to continue and conclude the discussion on how to work with iterator IR~~
### Transformation tests
- We should define best practices to write Iterator IR transformations and tests.
+ Do we want to use globally defined `itir.Deref` SymRefs in tests or in transformations? They should not be used for building nodes (to avoid clashes), so how do we differentiate between the two uses? (See the sketch below.)
- Look here: https://hackmd.io/sqPtFEroSNyKBWFPv04hOg
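A minimal sketch of the two options, assuming the `gt4py.next.iterator.ir` node classes; everything else is illustrative:
```python
from gt4py.next.iterator import ir as itir

# Option A: a module-level constant shared by the tests only; transformations keep
# building their own nodes, so the shared object cannot leak into them.
DEREF = itir.SymRef(id="deref")

def test_with_shared_symref():
    node = itir.FunCall(fun=DEREF, args=[itir.SymRef(id="x")])
    assert isinstance(node.fun, itir.SymRef)

# Option B: build every node locally (more verbose, but no shared state at all).
def test_with_local_symref():
    node = itir.FunCall(fun=itir.SymRef(id="deref"), args=[itir.SymRef(id="x")])
    assert isinstance(node.fun, itir.SymRef)
```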
## Concrete patterns
### single-op: Testing a single, simple field operator
- Parameterized by backend
- Needs input data, expected data
- Needs input and output storage allocation
- Assert that output data is equal to expected data
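A minimal sketch of this pattern; the operator, data, and the `run_with_backend` helper are placeholders (the concrete helpers are proposed further below):
```python
import numpy as np
import pytest

def run_with_backend(backend, inp, out):
    """Placeholder for 'compile the field operator with `backend` and execute it'."""
    out[...] = inp + 1.0

@pytest.mark.parametrize("backend", ["backend_a", "backend_b"])  # illustrative ids only
def test_plus_one(backend):
    inp = np.arange(10.0)      # deterministic input data (a range, not random values)
    expected = inp + 1.0       # expected data computed independently of the operator
    out = np.zeros_like(inp)   # output storage allocation

    run_with_backend(backend, inp, out)
    assert np.allclose(out, expected)
```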
### multiple-op: Similar to `single-op` but with multiple field operators
...
### field-to-itir-lowering: Test lowering code from FOAST to ITIR
- Compare the expected ITIR against the ITIR produced by the lowering (exact mechanism still open; see the sketch below)
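Roughly, the comparison could look like the sketch below; `lower` is a placeholder for the actual FOAST-to-ITIR entry point, and the expected tree is built by hand (or, as discussed above, parsed from a string template):
```python
from gt4py.next.iterator import ir as itir

def lower(field_operator):
    """Placeholder for the real FOAST -> ITIR lowering entry point."""
    raise NotImplementedError

def test_copy_lowering(copy_field_operator):  # hypothetical fixture providing the operator
    actual = lower(copy_field_operator)
    # Expected ITIR built explicitly; could instead be parsed from a string template.
    expected = itir.FunCall(fun=itir.SymRef(id="deref"), args=[itir.SymRef(id="inp")])
    assert actual == expected  # structural comparison of the IR trees
```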
## Discussion & solutions
### Naming and structuring conventions [✔]
#### Should we use Python packages or bare folders for tests?
+ **✅** Using Python packages seems useful for sharing resources, and we are already using them in multiple places anyway
+ **❌** Is it still needed to share data/resources if we have a clear test folder structure?
+ **Decision**: we use Python packages for now.
+ **Open issues**: _relative_ vs _absolute_ imports (needs a bit of exploration to decide)
`cd tests/; python -m storage_tests.unit_tests.shared_resources`
`cd tests/storage_tests; python this_file.py`
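For illustration only (the module names are invented), the two import styles inside a test file that wants to reuse shared resources:
```python
# Absolute import: works whenever the `next_tests` package is importable, e.g. after
# `cd tests/` as above, or when running modules via `python -m ...`.
from next_tests.integration_tests import shared_definitions  # hypothetical module

# Relative import: shorter, but only works when this file is imported as part of the
# package; plain `python this_file.py` breaks.
# from ..integration_tests import shared_definitions
```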
#### Folder structure
- **Decision**:
        tests/
            cartesian_tests/
            eve_tests/
            next_tests/
                integration_tests/
                    single_feature_tests/
                        ffront_tests/                  (tests features starting from this module)
                            test_feature_file.py
                        iterator_tests/
                            test_feature_file.py
                    multi_feature_tests/
                        test_some_implementation.py
                        my_complex_example/
                            reference_file.py
                            test_ffront_implementation_file.py
                            test_iterator_implementation_file.py
                unit_tests/                            (fully reflect the subpackage structure)
                    ffront_tests/
                        test_module.py
                    iterator_tests/
                        test_module_name.py
                        test_module_name_feature.py    (allowed temporarily; review later)
                regression_tests/
            storage_tests/
- **Alternatives**:
        next_tests/
            integration_tests/
                single_feature_tests/
                multi_feature_tests/
            ffront_tests/
            iterator_tests/
                integration_tests/
                    single_feature_tests/
                        test_feature_file.py    (test_ffront, test_iterator)?
                    multi_feature_tests/
                transformation_tests/
#### pytest markers **(TODO)**
- We want to use this feature. Add the list of markers to a file and start a discussion on how to organize them with multiple categories and axes (see the sketch below).
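A minimal sketch of what that could look like; the marker names below are placeholders for whatever categories/axes we agree on:
```python
# In pyproject.toml (or pytest.ini), register the markers so typos are caught
# when running with `--strict-markers`:
#
# [tool.pytest.ini_options]
# markers = [
#     "uses_unstructured_mesh: test requires a mesh with connectivities",
#     "requires_gpu: test requires a GPU backend",
# ]

import pytest

@pytest.mark.uses_unstructured_mesh
def test_nabla_on_mesh():
    ...
```
Tests can then be selected per axis, e.g. `pytest -m "not requires_gpu"`.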
#### Implementation strategy
- ~~First PR: just move and rename files and folders, without splitting files or changing content.~~
- Second PR: split files and (maybe) rename tests:
- `test_execution` should be split, with most of it going to files in `feature_tests`
- `test_math_builtin_execution` should be integrated with `test_math_unary_builtins`
- `test_program` should most likely be split into different files in `feature_tests` (one of them could be `test_program`)
### Pattern for writing _single-feature integration_ tests **(TODO)**
- Examples: `ffront` and `iterator`
- Problems:
- How to get the data buffers
- How to select backend
- How to parametrize tests
- How much boilerplate is needed for each test
- ...
- Proposal:
+ Test configuration: always set `default_backend = NotExecutable` to forbid execution of operators or programs without an explicit backend
+ Definitions: convenient definitions of Dimensions, Offsets and Fields
```python
I = Dim(...)
J = Dim(...)
K = Dim(...)
IJKField = Field[[I, J, K], float]
```
+ Fixtures: have a small number of test case fixtures that are parametrized on backend and other useful things (e.g. meshes for unstructured)
```python
cartesian_case = ...    # parametrized on (at least) the backend
unstructured_case = ... # additionally parametrized on the mesh / connectivities
```
+ Functions: a small set of utility functions to automate the most common tasks
```python
# Field allocation reading the type annotations from the definition
inp = allocate(copy, "inp").zeros(extend={I:(-1,1), J:(-1,1)})
# Test case validation. It should accept both field operators
# and programs.
def verify(case, prog, *inps, out, ref):
    prog.with_backend(case.backend)(*inps, out=out, offset_provider=case.offset_provider)
    assert np.allclose(out, ref)
```
+ Example:
```python=
def test_copy(cartesian_case):
    @field_operator
    def copy(inp: IJKField) -> IJKField:
        return inp

    inp = allocate(copy, "inp").zeros(extend={I: (-1, 1), J: (-1, 1)})
    out = allocate(copy, "out").zeros()  # output buffer, allocated analogously

    # Verifying the field operator directly (by using the trivial
    # program generation from field operators)
    verify(cartesian_case, copy, inp, out=out, ref=inp)

    # Or using an explicit program defined in the test case
    @program
    def prog(inp, out):
        copy(inp, out=out)

    cartesian_case.verify(prog, inp, out=out, ref=inp)
```
- Playground:
```python=
# Sketches of possible styles, collected to compare how much boilerplate each needs.

def test_copy(backend):
    inp = field.default(backend)  # default = range
    out = field.zeros(backend)

    @field_operator(backend=backend)
    def copy(inp):
        return inp

    copy(inp, out)
    assert np.allclose(inp, out)


def test_copy(backend):
    inp = field.default(backend)  # default = range
    out = field.zeros(backend)

    @execute_field_operator(inp, out=out)
    def copy(inp):
        return inp

    assert np.allclose(inp, out)


def test_copy(backend):
    inp = field.default()  # default = range
    out = field.zeros()

    @field_operator
    def copy(inp):
        return inp

    copy(inp, out=out)
    assert np.allclose(inp, out)


def test_copy(backend, mesh):
    context = make_context(backend, mesh)
    inp = field.default()  # default = range
    out = field.zeros()

    def copy(inp: backend.Default):
        return inp

    # context:
    #     backend_fixture = "gt.."
    #     mesh_fixture = "mesh1"
    #     dims:
    #     offsets:
    #     backend: ----
    #     offset_providers:

    check_field_operator(
        copy,
        inp,
        out=out,
        ref=inp,
    )
    execute_field_operator(copy, inp, out=out)
    assert np.allclose(inp, out)
    assert gt4py.allclose(inp, out)


def test_mesh(backend, mesh):
    inp = mesh.edges.default()  # default = range
    out = mesh.vertices.zeros()

    def copy(inp):
        return inp

    execute_field_operator(copy, inp, out=out)
    check_field_operator(
        copy,
        inp,
        out=out,
        ref=inp,
    )
    assert np.allclose(inp, out)
    assert gt4py.allclose(inp, out)


@cartesian_test
def test_copy():
    inp = field.default()  # default = range
    out = field.zeros()

    @field_operator
    def copy(inp):
        return inp

    copy(inp, out=out)
    assert np.allclose(inp, out)


def test_copy(cartesian):
    inp = cartesian.default()  # default = range
    out = cartesian.zeros()

    @cartesian.field_operator
    def copy(inp):
        return inp

    copy(inp, out=out)
    assert np.allclose(inp, out)


def test_mesh(mesh):
    inp = mesh.edges()  # default = range
    out = mesh.edges.zeros()

    @mesh.field_operator
    def copy(inp):
        return inp

    copy(inp, out=out)
    assert np.allclose(inp, out)
```
### Lowering tests utilities **(WIP)**
- Short term solution: substitute current ITIR maker utilities with string templates (which can be easily inspected) and then parse them to generate the expected ITIR.
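A sketch of the idea; the textual IR syntax and the `parse_itir` helper below are made up and only stand in for the utilities being worked on:
```python
# Expected IR written as a readable, easily inspectable string template.
EXPECTED_UNARY_TEMPLATE = "{builtin}(deref(inp))"  # made-up textual syntax

def parse_itir(src: str):
    """Hypothetical parser turning the textual form back into ITIR nodes."""
    raise NotImplementedError

def expected_unary_call(builtin: str):
    # Interpolate the template, then parse it into the expected ITIR tree that
    # the lowering output is compared against.
    return parse_itir(EXPECTED_UNARY_TEMPLATE.format(builtin=builtin))
```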
### Parsing and lowering field-view tests
- `past_lowering`, `past_parsing`: define test cases with all expected outputs (expected exceptions, expected parsing/lowering results, ...) and then add automated unit tests for parsing and lowering
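One possible structure, with invented names (`parse`, `lower`, and the case fields are placeholders): bundle each case with all of its expected outputs and derive both the parsing and the lowering tests from the same list.
```python
import dataclasses
from typing import Any, Optional, Type

import pytest

@dataclasses.dataclass
class ProgramCase:
    """One PAST test case together with all of its expected outputs (illustrative)."""
    name: str
    definition: Any                                   # the program under test
    expected_error: Optional[Type[Exception]] = None  # expected parsing error, if any
    expected_past: Any = None                         # expected parsing output
    expected_itir: Any = None                         # expected lowering output

CASES: list = []  # concrete cases would be collected here

@pytest.mark.parametrize("case", CASES, ids=lambda c: c.name)
def test_parsing(case):
    if case.expected_error is not None:
        with pytest.raises(case.expected_error):
            parse(case.definition)                    # hypothetical parsing entry point
    else:
        assert parse(case.definition) == case.expected_past

@pytest.mark.parametrize("case", CASES, ids=lambda c: c.name)
def test_lowering(case):
    if case.expected_error is None:
        assert lower(parse(case.definition)) == case.expected_itir  # hypothetical lowering entry point
```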
### Tests based on long lists of cases
- In the future we might look for the best pattern or plugin to write such lists of cases in a readable way (e.g. `pytest-cases`?)
- `test_type_deduction`?
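For reference, plain `pytest.param` with explicit ids already improves readability of long case lists; `pytest-cases` would go further by turning each case into a named function. Everything below (expressions, expected types, `deduce_type`) is purely illustrative:
```python
import pytest

def deduce_type(expr: str) -> str:
    """Placeholder for the actual type deduction entry point."""
    raise NotImplementedError

@pytest.mark.parametrize(
    ("expr", "expected_type"),
    [
        pytest.param("1 + 2", "int64", id="int-addition"),
        pytest.param("1.0 + 2", "float64", id="mixed-addition"),
        pytest.param("True & False", "bool", id="bool-and"),
    ],
)
def test_type_deduction(expr, expected_type):
    assert deduce_type(expr) == expected_type
```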