# [DaCe] Minimalistic DaCe-Folder
<!-- Add the tag for the current cycle number on top -->
- Shaped by: Philip
- Appetite (FTEs, weeks): ~1 week
- Developers: <!-- Filled in at the betting table unless someone is specifically required here -->
## Problem
For each SDFG, DaCe generates a folder containing the generated source code, a dumped version of the SDFG, the build script and other things that are not needed to run the compiled code and hence occupy space unnecessarily.
In this project we want to reduce the amount of data that has to be stored.
### Current State
GT4Py configures DaCe to create its SDFG-Folders inside GT4Py's cache folder.
In case the program is not yet in GT4Py's cache, it is lowered and optimized, then the SDFG is compiled by calling `sdfg.compile()`, which will create the folder.
In case the program is already in the cache, the SDFG is loaded and then `sdfg.compile()` is called as well.
However, DaCe will now find the existing folder and load the already compiled code[^PureDaCeBehaviour].
### The Minimal Folder
The minimal folder is defined by what is needed to construct a `CompiledSDFG` object, given that the `SDFG` object (which is supplied by GT4Py) is already available.
Technically, this is only the compiled `.so` file and the associated "stub"-library[^StubLibrary].
#### Removing the "Stub"-Library
The stub-library is the same for all SDFGs so it could also be removed, provided that the library is available from somewhere else.
However, this involves some non-trivial changes and should be considered a stretch goal.
## Appetite
It should be doable in a week.
Depending on how things go, it should even be possible to complete the stretch goal within this time frame.
## Solution
> Before this project is started the proposed solution should be discussed with the DaCe developers.
There are two aspects here:
1) The generation of the SDFG folder
2) Loading of the SDFG from the folder
It is important that the selected design allows:
- Adding other versions, i.e. folder formats, at a later stage.
- Preserving the old behaviour.
Another question that must be answered is how to "tag the folder", i.e. how does DaCe know which version the folder has.
One idea would be to infer it from the content of the folder.
While simple, this is also error prone. It is therefore proposed to add a new file, for example `VERSION`, that states which version the folder has; if the file is not present, the original version is assumed.
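The tagging scheme could be sketched as follows. This is only an illustration under stated assumptions: the file name `VERSION`, the value `"0"` for the original layout and both helper functions are hypothetical and do not exist in DaCe today.

```python
from pathlib import Path

# Hypothetical names: neither the VERSION file nor these constants exist in DaCe.
VERSION_FILE = "VERSION"
ORIGINAL_VERSION = "0"


def read_folder_version(folder: Path) -> str:
    """Return the format version of an SDFG folder.

    A missing VERSION file means the folder uses the original (full) layout.
    """
    version_path = Path(folder) / VERSION_FILE
    if not version_path.is_file():
        return ORIGINAL_VERSION
    return version_path.read_text(encoding="utf-8").strip()


def write_folder_version(folder: Path, version: str) -> None:
    """Tag a folder with its format version."""
    (Path(folder) / VERSION_FILE).write_text(f"{version}\n", encoding="utf-8")
```

Because the absence of the file maps to the original version, folders created by an unmodified DaCe would keep working without any migration.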
### Generation of the SDFG Folder
To select which version of the folder should be generated, the best option would be to add a new DaCe configuration variable; a first, admittedly bad, name proposal would be `folder_version`.
With its default value the full folder is generated.
Another point to consider is whether the versions should use numbers, i.e. default `0` and reduced version `1`, or names.
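Numbers and names do not have to exclude each other. The sketch below, which is an assumption and not part of DaCe, uses an `IntEnum` so that the configuration value can be given either way; the member names `FULL` and `MINIMAL` are placeholders.

```python
import enum


class FolderVersion(enum.IntEnum):
    """Hypothetical folder formats; names and numbers are placeholders."""

    FULL = 0     # current behaviour, everything is generated
    MINIMAL = 1  # only what is needed to build a `CompiledSDFG`


def parse_folder_version(value) -> FolderVersion:
    """Accept a number (0), a numeric string ("0") or a name ("full")."""
    if isinstance(value, str) and not value.strip().isdigit():
        return FolderVersion[value.strip().upper()]
    return FolderVersion(int(value))
```

With this, `parse_folder_version("minimal")` and `parse_folder_version(1)` select the same format, and new formats can be added later by extending the enum.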
A question that is not yet answered is how this should be implemented.
Should there be a dedicated function/build infrastructure for every version, or should we always start with the full version and then transform the folder accordingly?
For ease of implementation a middle ground is proposed here:
the program folder is generated as usual, but everything that is not used, such as `perf/` or `sample/`, is not generated.
Then, after the library has been built, the parts that are no longer needed, such as the source code and build artifacts, are removed.
Note that this document also proposes to move the compiled library out of the `build/` subfolder.
This ensures that we get a flat hierarchy, where everything (2 to 3 files) is on the same level.
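A minimal sketch of this clean-up step, assuming that both the program library and the stub library end up in `build/` and can be recognised by their file suffix. `minimize_sdfg_folder()` is a hypothetical helper, not an existing DaCe function, and which files can safely be dropped still has to be confirmed with the DaCe developers.

```python
import shutil
from pathlib import Path
from typing import List

# Assumption: shared libraries are identified by their suffix.
LIBRARY_SUFFIXES = {".so", ".dylib", ".dll"}
# Assumption: a top-level VERSION file, if present, is kept.
KEEP_ALWAYS = {"VERSION"}


def minimize_sdfg_folder(folder: Path) -> List[Path]:
    """Flatten a fully built SDFG folder down to its shared libraries.

    Moves every library found in `build/` to the top level, deletes
    everything else and returns the paths of the libraries that were kept.
    """
    folder = Path(folder)
    build_dir = folder / "build"
    kept: List[Path] = []
    if build_dir.is_dir():
        for lib in sorted(build_dir.iterdir()):
            if lib.is_file() and lib.suffix in LIBRARY_SUFFIXES:
                target = folder / lib.name
                shutil.move(str(lib), str(target))
                kept.append(target)

    keep_names = {p.name for p in kept} | KEEP_ALWAYS
    # Materialise the listing first because entries are deleted while looping.
    for entry in list(folder.iterdir()):
        if entry.name in keep_names:
            continue
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return kept
```

Running the clean-up as a separate pass after the build keeps the existing build infrastructure untouched, which matches the middle ground proposed above.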
### Generation of the `CompiledSDFG`
This is the second step, creating a `CompiledSDFG` object from an SDFG and the folder.
As (partially) outlined above this boils down to:
- Determine which format/version the folder has.
- Locate the compiled SDFG and the stub file.
- Construct the `CompiledSDFG`.
If we use a `VERSION` file then the first step is simple.
In case we even want to remove that file (something this proposal is recommending), one could also rely on a configuration option.
One could either introduce a new option or assume that, if the `VERSION` file is not present, the folder has the format indicated by `folder_version`.
Once the format is determined, the remaining steps are straightforward.
It is very likely that the functions `get_program_handle()` & `load_from_file()`, which are currently used to turn a folder into a `CompiledSDFG`, are no longer adequate, since they essentially expect a path to the `build/` subfolder.
For compatibility they should be deprecated and replaced by a function that takes the SDFG as argument.
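The lookup step listed above could be sketched as follows. Everything here is an assumption for illustration: the function `locate_libraries()` is hypothetical, the version strings `"0"` (full layout) and `"1"` (flat layout) are placeholders, and the file name patterns `lib<name>.<ext>` and `libdacestub_<name>.<ext>` should be checked against the DaCe sources before relying on them.

```python
from pathlib import Path
from typing import Tuple


def locate_libraries(
    folder: Path, sdfg_name: str, version: str, lib_extension: str = "so"
) -> Tuple[Path, Path]:
    """Return (program library, stub library) for a given folder version.

    Hypothetical helper: version "0" expects the libraries in `build/`,
    version "1" expects them directly in the folder.
    """
    folder = Path(folder)
    if version == "0":
        lib_dir = folder / "build"
    elif version == "1":
        lib_dir = folder
    else:
        raise ValueError(f"Unknown folder version: {version!r}")

    # Assumed naming scheme, see the note above the code.
    program_lib = lib_dir / f"lib{sdfg_name}.{lib_extension}"
    stub_lib = lib_dir / f"libdacestub_{sdfg_name}.{lib_extension}"
    for lib in (program_lib, stub_lib):
        if not lib.is_file():
            raise FileNotFoundError(f"Expected library not found: {lib}")
    return program_lib, stub_lib
```

A replacement for `get_program_handle()` and `load_from_file()` could then take the SDFG, derive the folder and name from it, and pass the located program library on to the existing `CompiledSDFG` construction.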
### Important Pieces to Look at
- [`SDFG::compile()`](https://github.com/spcl/dace/blob/0ec62e20d44b6cb5f83f89b2fbc9b61f218e48e5/dace/sdfg/sdfg.py#L2461): Entrypoint to the compilation.
- [`CompiledSDFG`](https://github.com/spcl/dace/blob/0ec62e20d44b6cb5f83f89b2fbc9b61f218e48e5/dace/codegen/compiled_sdfg.py#L178): Representation of a compiled SDFG in Python.
- [`ReloadableDLL`](https://github.com/spcl/dace/blob/0ec62e20d44b6cb5f83f89b2fbc9b61f218e48e5/dace/codegen/compiled_sdfg.py#L24): Class that controls how an `so`-file is loaded.
- [`dacestub.cpp`](https://github.com/spcl/dace/blob/main/dace/codegen/tools/dacestub.cpp): Source of the "stub"-library, which is managed by `ReloadableDLL`.
- [`get_program_handle()`](https://github.com/spcl/dace/blob/0ec62e20d44b6cb5f83f89b2fbc9b61f218e48e5/dace/codegen/compiler.py#L394) & [`load_from_file()`](https://github.com/spcl/dace/blob/0ec62e20d44b6cb5f83f89b2fbc9b61f218e48e5/dace/codegen/compiler.py#L400): Loading compiled SDFGs from disk (it seems that the only difference between these functions is the order of arguments and that the second one checks if the file exists).
- [`load_precompiled_sdfg()`](https://github.com/spcl/dace/blob/0ec62e20d44b6cb5f83f89b2fbc9b61f218e48e5/dace/sdfg/utils.py#L1665): Loads an SDFG from a folder, i.e. it loads the `program.sdfgz` file.
## Rabbit holes
Since the "stub"-library is redundant it is tempting to also remove it.
However, this should only be done as a second step!
## No-gos
<!-- Anything specifically excluded from the concept: functionality or use cases we intentionally aren’t covering to fit the ## appetite or make the problem tractable -->
## Progress
<!-- Don't fill during shaping. This area is for collecting TODOs during building. As first task during building add a preliminary list of coarse-grained tasks for the project and refine them with finer-grained items when it makes sense as you work on them. -->
- [x] Task 1 ([PR#xxxx](https://github.com/GridTools/gt4py/pulls))
- [x] Subtask A
- [x] Subtask X
- [ ] Task 2
- [x] Subtask H
- [ ] Subtask J
- [ ] Discovered Task 3
- [ ] Subtask L
- [ ] Subtask S
- [ ] Task 4
<!--------------------------------------------------------->
[^PureDaCeBehaviour]: DaCe has a special behaviour that is not important for GT4Py but is mentioned here for the sake of completeness.
If a compiled SDFG, i.e. its `.so` file, is already loaded and should be loaded again, then the `.so` file is copied with a new name and then the copy is loaded.
[^StubLibrary]: See [`dace/dace/codegen/tools/dacestub.cpp`](https://github.com/spcl/dace/blob/main/dace/codegen/tools/dacestub.cpp).
This library is used to detect if the actual library is already loaded and then to actually load and unload the library into the address space of the interpreter.