# Test best practices
###### tags: `discussion`
[TOC]
## Goals
- Test folder structure [✔]
- Find common patterns and decide best practices
- Data / fixtures / parametrization issues
- Removing manual boilerplate
- Declarative / readable style?
- Reduce testing time
- Remove tests?
- Make tests faster?
## Tasks
- Change directory layout [✔]
- Split and rename test files (TODO)
- Implement single-feature integration tests [✔]
- Lowering test utilities (string interpolation and parsing) (WIP @havogt)
## Issues to discuss
### ffront
- ~~Folder structure for unit tests should replicate the same structure as the original package~~
- ~~Redefinition / global definition of `Dimension`s, `Offset`s, ...~~ -> Define globally for all tests
- ~~End-to-end/integration tests structure?~~
- ~~Should we split the huge `test_execution` into smaller tests? `test_math_builtin_execution`, `test_math_unary_builtins`. Duplications?~~
- ~~Remove `test_math_unary_builtins` and integrate it into `test_math_builtin_execution`.~~
- ~~Maybe:~~
- ~~integration_tests~~
- ~~single_feature_tests~~
- ~~multi_feature_tests~~
- ~~Reducing boilerplate in tests (mostly `test_execution.py`, ...)~~
- ~~Concise utilities to allocate fields with sample data (do not use random data, but e.g. a range) -> use `test_copy` as a vehicle to find a good structure (continue with `test_offset_field`)~~
- ~~Same for tests which require a mesh with connectivities~~
- Naming scheme for fields / data used in tests (mostly `test_execution.py`, ...)
- ~~Input test data:~~
- ~~Meshes & connectivities~~
- ~~GT4Py sources: expressions, operators, programs (depend on meshes)~~
- ~~Fields (might depend on meshes)~~
- Should we add a regression test category where we are forced to add a (minimal) test for each bug fix?
- ~~Lowering tests (e.g. `foast_lowering.py`):~~
+ ~~Long term solution: execute and compare output results against embedded field view~~ -> Add TODO to lowering tests file
+ ~~Short term solution: maybe substitute current ITIR maker utilities with string templates (which can be easily inspected) and then parse them to generate the expected ITIR~~
+ Do we want autogenerated human-readable expected ITIR code? (It potentially could be done with `cog`) -> Nice to have but not urgent. Maybe in the future
- ~~`past_lowering`/`past_parsing`: parsing is implicitly tested in lowering.~~
- ~~failures should be addressed in parsing first, but that may not be easy to spot if lowering tests are executed first~~
- ~~`test_program` tests execution of programs. Should it be in the same category as other end-to-end tests, but in a subcategory that emphasizes aspects of the program part (e.g. handling of the domain)?~~
- ~~Move some tests from `test_execution` into this category: `domain`+`tuple` out argument focused.~~
- ~~`test_type_deduction`: currently we build first a huge list of parameters, then tests are small. Readability is not ideal.~~ -> No action for now. In the future we might look into writing the list of cases with a more readable pattern.
### Iterator
- ~~big problem: no folder structure~~
- ~~integration test setup should be possible to share with ffront (e.g. `hdiff_reference`, `fvm_nabla_setup`)~~ -> Both should be moved to `multi_feature_tests`
- ~~Transformation tests: we need to continue and conclude the discussion on how to work with iterator IR~~
### Transformation tests
- We should define best practices to write Iterator IR transformations and tests.
+ Do we want to use globally defined `itir.Deref` SymRefs in tests or in transformations? They should not be used for building nodes (to avoid clashes), so how do we differentiate between the two uses? (See the sketch below.)
- Look here: https://hackmd.io/sqPtFEroSNyKBWFPv04hOg
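A minimal sketch of the two options, assuming the `gt4py.next.iterator.ir` node classes; everything else is illustrative:
```python
from gt4py.next.iterator import ir as itir

# Option A: a module-level constant shared by the tests only; transformations keep
# building their own nodes, so the shared object cannot leak into them.
DEREF = itir.SymRef(id="deref")

def test_with_shared_symref():
    node = itir.FunCall(fun=DEREF, args=[itir.SymRef(id="x")])
    assert isinstance(node.fun, itir.SymRef)

# Option B: build every node locally (more verbose, but no shared state at all).
def test_with_local_symref():
    node = itir.FunCall(fun=itir.SymRef(id="deref"), args=[itir.SymRef(id="x")])
    assert isinstance(node.fun, itir.SymRef)
```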
## Concrete patterns
### single-op: Testing a single, simple field operator
- Parameterized by backend
- Needs input data, expected data
- Needs input and output storage allocation
- Assert that output data is equal to expected data
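A minimal sketch of this pattern; the operator, data, and the `run_with_backend` helper are placeholders (the concrete helpers are proposed further below):
```python
import numpy as np
import pytest

def run_with_backend(backend, inp, out):
    """Placeholder for 'compile the field operator with `backend` and execute it'."""
    out[...] = inp + 1.0

@pytest.mark.parametrize("backend", ["backend_a", "backend_b"])  # illustrative ids only
def test_plus_one(backend):
    inp = np.arange(10.0)      # deterministic input data (a range, not random values)
    expected = inp + 1.0       # expected data computed independently of the operator
    out = np.zeros_like(inp)   # output storage allocation

    run_with_backend(backend, inp, out)
    assert np.allclose(out, expected)
```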
### multiple-op: Similar to `single-op` but with multiple field operators
...
### field-to-itir-lowering: Test lowering code from FOAST to ITIR
- Compare the expected ITIR against the ITIR produced by the lowering (exact mechanism still open; see the sketch below)
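Roughly, the comparison could look like the sketch below; `lower` is a placeholder for the actual FOAST-to-ITIR entry point, and the expected tree is built by hand (or, as discussed above, parsed from a string template):
```python
from gt4py.next.iterator import ir as itir

def lower(field_operator):
    """Placeholder for the real FOAST -> ITIR lowering entry point."""
    raise NotImplementedError

def test_copy_lowering(copy_field_operator):  # hypothetical fixture providing the operator
    actual = lower(copy_field_operator)
    # Expected ITIR built explicitly; could instead be parsed from a string template.
    expected = itir.FunCall(fun=itir.SymRef(id="deref"), args=[itir.SymRef(id="inp")])
    assert actual == expected  # structural comparison of the IR trees
```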
## Discussion & solutions
### Naming and structuring conventions [✔]
#### Should we use Python packages or bare folders for tests?
+ **✅** Using Python packages seems useful for sharing resources, and we are already using them in multiple places anyway
+ **❌** Is it still needed to share data/resources if we have a clear test folder structure?
+ **Decision**: we use Python packages for now.
+ **Open issues**: _relative_ vs _absolute_ imports (needs a bit of exploration to decide)
`cd tests/; python -m storage_tests.unit_tests.shared_resources`
`cd tests/storage_tests; python this_file.py`
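For illustration only (the module names are invented), the two import styles inside a test file that wants to reuse shared resources:
```python
# Absolute import: works whenever the `next_tests` package is importable, e.g. after
# `cd tests/` as above, or when running modules via `python -m ...`.
from next_tests.integration_tests import shared_definitions  # hypothetical module

# Relative import: shorter, but only works when this file is imported as part of the
# package; plain `python this_file.py` breaks.
# from ..integration_tests import shared_definitions
```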
#### Folder structure
- **Decision**:
        tests/
            cartesian_tests/
            eve_tests/
            next_tests/
                integration_tests/
                    single_feature_tests/
                        ffront_tests/                  (tests features starting from this module)
                            test_feature_file.py
                        iterator_tests/
                            test_feature_file.py
                    multi_feature_tests/
                        test_some_implementation.py
                        my_complex_example/
                            reference_file.py
                            test_ffront_implementation_file.py
                            test_iterator_implementation_file.py
                unit_tests/                            (fully reflect the subpackage structure)
                    ffront_tests/
                        test_module.py
                    iterator_tests/
                        test_module_name.py
                        test_module_name_feature.py    (allowed temporarily; review later)
                regression_tests/
            storage_tests/
- **Alternatives**:
        next_tests/
            integration_tests/
                single_feature_tests/
                multi_feature_tests/
            ffront_tests/
            iterator_tests/
                integration_tests/
                    single_feature_tests/
                        test_feature_file.py    (test_ffront, test_iterator)?
                    multi_feature_tests/
                transformation_tests/
#### pytest markers **(TODO)**
- We want to use this feature. Add the list of markers to a file and start a discussion on how to organize them with multiple categories and axes (see the sketch below).
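A minimal sketch of what that could look like; the marker names below are placeholders for whatever categories/axes we agree on:
```python
# In pyproject.toml (or pytest.ini), register the markers so typos are caught
# when running with `--strict-markers`:
#
# [tool.pytest.ini_options]
# markers = [
#     "uses_unstructured_mesh: test requires a mesh with connectivities",
#     "requires_gpu: test requires a GPU backend",
# ]

import pytest

@pytest.mark.uses_unstructured_mesh
def test_nabla_on_mesh():
    ...
```
Tests can then be selected per axis, e.g. `pytest -m "not requires_gpu"`.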
#### Implementation strategy
- ~~First PR: just move and rename files and folders, without splitting files or changing content.~~
- Second PR: split files and (maybe) rename tests:
- `test_execution` should be split, with most of it going to files in `feature_tests`
- `test_math_builtin_execution` should be integrated with `test_math_unary_builtins`
- `test_program` should most likely be split into different files in `feature_tests` (one of them could be `test_program`)
### Pattern for writing _single-feature integration_ tests **(TODO)**
- Examples: `ffront` and `iterator`
- Problems:
- How to get the data buffers
- How to select backend
- How to parametrize tests
- How much boilerplate is needed for each test
- ...
- Proposal:
+ Test configuration: always set `default_backend = NotExecutable` to forbid execution of operators or programs without an explicit backend
+ Definitions: convenient definitions of Dimensions, Offsets and Fields
```python
I = Dim(...)
J = Dim(...)
K = Dim(...)
IJKField = Field[[I, J, K], float]
```
+ Fixtures: have a small number of test case fixtures that are parametrized on backend and other useful things (e.g. meshes for unstructured)
```python
cartesian_case = ...    # parametrized on (at least) the backend
unstructured_case = ... # additionally parametrized on the mesh / connectivities
```
+ Functions: a small set of utility functions to automate the most common tasks
```python
# Field allocation reading the type annotations from the definition
inp = allocate(copy, "inp").zeros(extend={I:(-1,1), J:(-1,1)})
# Test case validation. It should accept both field operators
# and programs.
def verify(case, prog, *inps, out, ref):
    prog.with_backend(case.backend)(*inps, out=out, offset_provider=case.offset_provider)
    assert np.allclose(out, ref)
```
+ Example:
```python=
def test_copy(cartesian_case):
    @field_operator
    def copy(inp: IJKField) -> IJKField:
        return inp

    inp = allocate(copy, "inp").zeros(extend={I: (-1, 1), J: (-1, 1)})
    out = allocate(copy, "out").zeros()  # output buffer, allocated analogously

    # Verifying the field operator directly (by using the trivial
    # program generation from field operators)
    verify(cartesian_case, copy, inp, out=out, ref=inp)

    # Or using an explicit program defined in the test case
    @program
    def prog(inp, out):
        copy(inp, out=out)

    cartesian_case.verify(prog, inp, out=out, ref=inp)
```
- Playground:
```python=
# Sketches of possible styles, collected to compare how much boilerplate each needs.

def test_copy(backend):
    inp = field.default(backend)  # default = range
    out = field.zeros(backend)

    @field_operator(backend=backend)
    def copy(inp):
        return inp

    copy(inp, out)
    assert np.allclose(inp, out)


def test_copy(backend):
    inp = field.default(backend)  # default = range
    out = field.zeros(backend)

    @execute_field_operator(inp, out=out)
    def copy(inp):
        return inp

    assert np.allclose(inp, out)


def test_copy(backend):
    inp = field.default()  # default = range
    out = field.zeros()

    @field_operator
    def copy(inp):
        return inp

    copy(inp, out=out)
    assert np.allclose(inp, out)


def test_copy(backend, mesh):
    context = make_context(backend, mesh)
    inp = field.default()  # default = range
    out = field.zeros()

    def copy(inp: backend.Default):
        return inp

    # context:
    #     backend_fixture = "gt.."
    #     mesh_fixture = "mesh1"
    #     dims:
    #     offsets:
    #     backend: ----
    #     offset_providers:

    check_field_operator(
        copy,
        inp,
        out=out,
        ref=inp,
    )
    execute_field_operator(copy, inp, out=out)
    assert np.allclose(inp, out)
    assert gt4py.allclose(inp, out)


def test_mesh(backend, mesh):
    inp = mesh.edges.default()  # default = range
    out = mesh.vertices.zeros()

    def copy(inp):
        return inp

    execute_field_operator(copy, inp, out=out)
    check_field_operator(
        copy,
        inp,
        out=out,
        ref=inp,
    )
    assert np.allclose(inp, out)
    assert gt4py.allclose(inp, out)


@cartesian_test
def test_copy():
    inp = field.default()  # default = range
    out = field.zeros()

    @field_operator
    def copy(inp):
        return inp

    copy(inp, out=out)
    assert np.allclose(inp, out)


def test_copy(cartesian):
    inp = cartesian.default()  # default = range
    out = cartesian.zeros()

    @cartesian.field_operator
    def copy(inp):
        return inp

    copy(inp, out=out)
    assert np.allclose(inp, out)


def test_mesh(mesh):
    inp = mesh.edges()  # default = range
    out = mesh.edges.zeros()

    @mesh.field_operator
    def copy(inp):
        return inp

    copy(inp, out=out)
    assert np.allclose(inp, out)
```
### Lowering tests utilities **(WIP)**
- Short term solution: substitute current ITIR maker utilities with string templates (which can be easily inspected) and then parse them to generate the expected ITIR.
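A sketch of the idea; the textual IR syntax and the `parse_itir` helper below are made up and only stand in for the utilities being worked on:
```python
# Expected IR written as a readable, easily inspectable string template.
EXPECTED_UNARY_TEMPLATE = "{builtin}(deref(inp))"  # made-up textual syntax

def parse_itir(src: str):
    """Hypothetical parser turning the textual form back into ITIR nodes."""
    raise NotImplementedError

def expected_unary_call(builtin: str):
    # Interpolate the template, then parse it into the expected ITIR tree that
    # the lowering output is compared against.
    return parse_itir(EXPECTED_UNARY_TEMPLATE.format(builtin=builtin))
```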
### Parsing and lowering field-view tests
- `past_lowering`, `past_parsing`: define test cases with all expected outputs (expected exceptions, expected parsing/lowering results, ...) and then add automated unit tests for parsing and lowering
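One possible structure, with invented names (`parse`, `lower`, and the case fields are placeholders): bundle each case with all of its expected outputs and derive both the parsing and the lowering tests from the same list.
```python
import dataclasses
from typing import Any, Optional, Type

import pytest

@dataclasses.dataclass
class ProgramCase:
    """One PAST test case together with all of its expected outputs (illustrative)."""
    name: str
    definition: Any                                   # the program under test
    expected_error: Optional[Type[Exception]] = None  # expected parsing error, if any
    expected_past: Any = None                         # expected parsing output
    expected_itir: Any = None                         # expected lowering output

CASES: list = []  # concrete cases would be collected here

@pytest.mark.parametrize("case", CASES, ids=lambda c: c.name)
def test_parsing(case):
    if case.expected_error is not None:
        with pytest.raises(case.expected_error):
            parse(case.definition)                    # hypothetical parsing entry point
    else:
        assert parse(case.definition) == case.expected_past

@pytest.mark.parametrize("case", CASES, ids=lambda c: c.name)
def test_lowering(case):
    if case.expected_error is None:
        assert lower(parse(case.definition)) == case.expected_itir  # hypothetical lowering entry point
```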
### Tests based on long lists of cases
- In the future we might look for the best pattern or plugin to write such lists of cases in a readable way (e.g. `pytest-cases`?)
- `test_type_deduction`?
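For reference, plain `pytest.param` with explicit ids already improves readability of long case lists; `pytest-cases` would go further by turning each case into a named function. Everything below (expressions, expected types, `deduce_type`) is purely illustrative:
```python
import pytest

def deduce_type(expr: str) -> str:
    """Placeholder for the actual type deduction entry point."""
    raise NotImplementedError

@pytest.mark.parametrize(
    ("expr", "expected_type"),
    [
        pytest.param("1 + 2", "int64", id="int-addition"),
        pytest.param("1.0 + 2", "float64", id="mixed-addition"),
        pytest.param("True & False", "bool", id="bool-and"),
    ],
)
def test_type_deduction(expr, expected_type):
    assert deduce_type(expr) == expected_type
```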