# Writing easyconfigs for Python packages
## TODO (subsections to write)
- https://github.com/easybuilders/easybuild-docs/pull/132 (merged PR - module cheatsheet)
- general info
- Python + incl. extension
- SciPy-bundle
- PythonBundle vs PythonPackage easyblock
- when to use which
- recommended easyconfig parameters
- download_dep_fail
- use_pip
- sanity_pip_check
- (will be default in EasyBuild v5.0)
- commonly used easyconfig parameters
- modulename (to tweak import check)
- sanity check paths (vs default in PythonBundle)
- sanity_check_commands
- template `%(pyshortver)s`
- modextrapaths (for updating $PYTHONPATH when installing Python bindings on top of an installation performed with another easyblock)
-------------------------------------------------------
# Writing easyconfigs for Python-based packages
This page explains the basics of creating an easyconfig file for any package that is installed with `pip` or by running its `setup.py` script, as well as some common errors and how to troubleshoot them.
The easyblocks most commonly used to install these packages are [`PythonPackage`](http://docs.easybuild.io/version-specific/generic-easyblocks/?h=pythonpackage#pythonpackage) and [`PythonBundle`](http://docs.easybuild.io/version-specific/generic-easyblocks/?h=pythonbundle#pythonbundle).
## Difference between `PythonPackage` and `PythonBundle` [[Filip]]
There are a few differences between these easyblocks.
The `PythonPackage` easyblock expects you to specify how to obtain the package through the [`sources`](http://docs.easybuild.io/writing-easyconfig-files/?h=#common_easyconfig_param_sources) parameter (together with `source_urls`, etc.). This easyblock isn't viable if you also need to install extensions (for instance from PyPI).
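As a rough illustration, a `PythonPackage` easyconfig could look like the sketch below. The package name `example-pkg`, its version, the URLs and the toolchain and dependency versions are all placeholders, and checksums are left out for brevity.

```python
# Hypothetical easyconfig sketch: 'example-pkg', the URLs and all versions
# below are placeholders, not a real package.
easyblock = 'PythonPackage'

name = 'example-pkg'
version = '1.0.0'

homepage = 'https://example.org/example-pkg'
description = "Placeholder description of a single Python package."

toolchain = {'name': 'GCCcore', 'version': '12.2.0'}

# where the source tarball is downloaded from, and what it is called;
# %(name)s and %(version)s are filled in by EasyBuild
source_urls = ['https://example.org/downloads/']
sources = ['%(name)s-%(version)s.tar.gz']

dependencies = [('Python', '3.10.8')]

# settings that are recommended for Python packages
use_pip = True
download_dep_fail = True
sanity_pip_check = True

moduleclass = 'lib'
```

Everything in this sketch describes a single package: one source tarball, installed on top of an existing `Python` dependency.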
The `PythonBundle` easyblock doesn't use `sources` or `source_urls`. Instead, it uses [`exts_list`](http://docs.easybuild.io/writing-easyconfig-files/?h=exts_list#module_extensions), a list of tuples, to obtain all the extensions needed. The package that you are actually trying to install must be specified as the last element of this list.
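A `PythonBundle` easyconfig could be sketched as follows. Again, `example-pkg`, `example-dep` and all versions are hypothetical placeholders, and per-extension checksums are omitted.

```python
# Hypothetical easyconfig sketch: package names and versions are placeholders.
easyblock = 'PythonBundle'

name = 'example-pkg'
version = '1.0.0'

homepage = 'https://example.org/example-pkg'
description = "Placeholder description of a package installed with its extensions."

toolchain = {'name': 'foss', 'version': '2022b'}

dependencies = [('Python', '3.10.8')]

use_pip = True
sanity_pip_check = True

# each extension is a (name, version, options) tuple;
# the package we actually want to install comes last
exts_list = [
    ('example-dep', '0.3.1', {
        # per-extension options (e.g. 'checksums') go in this dict
    }),
    (name, version, {
    }),
]

moduleclass = 'lib'
```

Listing the extensions in installation order means that `example-dep` is already available by the time `example-pkg` itself is installed.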
Usually for both of these easyblocks, you want to set `use_pip` and `sanity_pip_check` to `True` in your easyconfig file. \<fill in info about `download_dep_fail`\>
FIXME
## Finding out dependencies for packages [[ Denis ]]
For software to work correctly, all the dependencies it needs must be installed. Here are some places to look when determining the dependencies of a Python package:
1. `Pipfile`: Pipenv users can view a package's dependencies by examining its `Pipfile`, which specifies the packages and versions installed in the virtual environment. Example: https://github.com/Pocket/recommendation-api/blob/main/Pipfile.lock
2. `requirements.txt`: Another often used file that lists dependencies. Example: https://github.com/pymc-devs/pymc/blob/main/requirements.txt
3. `pyproject.toml`: This file is becoming increasingly popular in the Python ecosystem as a way to specify dependencies. It contains metadata about the package, including its dependencies. Example: https://github.com/pandas-dev/pandas/blob/master/pyproject.toml
4. `setup.py`: The `setup.py` file contains metadata about the package, including its dependencies. It can be found in the package's root directory. Example: https://github.com/Hoohm/CITE-seq-Count/blob/master/setup.py
5. `setup.cfg`: The `install_requires` option in the `setup.cfg` file can be used to specify the package's dependencies. Example: https://github.com/theislab/scib/blob/main/setup.cfg
6. Conda environment definition: You can view a package's dependencies by examining its environment definition. It is a YAML file and it specifies the packages and versions installed in the Conda environment. Example: https://github.com/theislab/scib-pipeline/blob/main/envs/scib-pipeline-R4.0.yml
7. `README` file: Some Python packages have a README file containing information about the package, including its dependencies. It can be found in the package's root directory.
### Finding dependencies using `pipdeptree`
Pipdeptree is a Python command-line tool that helps you visualize the dependency tree of a project managed by pip, the package installer for Python. It shows you a tree-like representation of the installed Python packages and their dependencies, including both direct and indirect dependencies.
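For a quick look at the direct dependencies only, the requirements that an installed package declares can also be read with Python's standard-library `importlib.metadata`. The sketch below is not part of pipdeptree and does not resolve indirect dependencies; the helper name `direct_requirements` is made up for this example.

```python
from importlib import metadata


def direct_requirements(package):
    """Return the requirement strings declared by an installed package.

    Returns an empty list if the package declares no requirements
    or is not installed in the current environment.
    """
    try:
        return list(metadata.requires(package) or [])
    except metadata.PackageNotFoundError:
        return []


# usage: print the declared requirements of 'pip' (if it is installed)
print(direct_requirements("pip"))
```

The strings returned are the raw requirement specifiers from the package metadata, which is where overly strict version pins show up.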
## EasyConfig parameter `options['modulename']`
By default, EasyBuild performs an import check when installing Python packages. The `modulename` option specifies the name of the Python module that should be imported during the sanity check after the installation. It is particularly useful when the module name that EasyBuild derives by default does not match the actual module name, or when you want to check that a specific submodule can be imported.
`options = {'modulename': 'example'}`
(you will need to change 'example' here, of course)
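As a concrete illustration, PyYAML is a package whose import name (`yaml`) differs from its package name. The sketch below shows how this might be expressed for an extension in `exts_list` and for a stand-alone `PythonPackage` easyconfig; the version number is only illustrative and checksums are omitted.

```python
# As an extension in a PythonBundle: 'modulename' goes in the extension's options
exts_list = [
    ('PyYAML', '6.0', {
        'modulename': 'yaml',  # the import check runs 'import yaml'
    }),
]

# In a stand-alone PythonPackage easyconfig: set it through 'options'
options = {'modulename': 'yaml'}
```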
TODO: partially the same thing as Ex.3 at http://tutorial.easybuild.io/2021-lust/creating_easyconfig_files/#exercises
## Recommended Easyconfig parameters
These are the parameters whose use is highly recommended.
- `download_dep_fail = True`
    - With `download_dep_fail` enabled, the installation fails when the install procedure downloads dependencies that are not explicitly specified in the easyconfig, instead of silently installing them.
- `use_pip = True`
    - With `use_pip = True`, EasyBuild installs the Python package with `pip`. With `use_pip = False`, EasyBuild falls back to its default method, typically running the `setup.py` script.
- `sanity_pip_check = True`
    - With `sanity_pip_check = True`, EasyBuild runs `pip check` as part of the sanity check phase. This command inspects the installed Python packages and their dependencies to make sure there are no broken requirements or conflicts. If any issues are detected, the installation is halted and an error is raised.
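To see what `pip check` reports for an environment, the command can be run by hand. The sketch below wraps it in a small Python helper (`run_pip_check` is a made-up name); it only mimics the check and is not how EasyBuild itself invokes it.

```python
import subprocess
import sys


def run_pip_check():
    """Run 'pip check' with the current interpreter; return (ok, output)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True,
        text=True,
    )
    # 'pip check' exits with a non-zero status if it finds problems
    return result.returncode == 0, result.stdout + result.stderr


ok, report = run_pip_check()
print("pip check passed" if ok else report)
```

Running this after loading the freshly installed module gives a preview of the kind of error that would make the sanity check fail.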
## Making dependencies less strict [[ Denis ]]
todo: `sed`-ing a setup.py file etc.
A common problem is that a software package requires one very specific version of a dependency, which can make it difficult to install. This often happens because a less experienced or less careful developer simply dumps everything they had installed while developing the software into `requirements.txt` or a similar file (see above for the other places where dependencies can be listed), without realising that this locks the software to those exact versions.
Since EasyBuild works with toolchain generations (in which software versions are usually refreshed only twice a year), this is bad news for us, as we consequently cannot meet these strict requirements.
In this section, we will explore how to use patches and `sed` to relax dependency requirements and make installing software and creating easyconfigs easier (or possible at all).
### Using Patch Files
Patch files are text files that contain the differences between two files. They are commonly used to apply updates or modifications to software packages. In the context of dependencies, patch files can be used to modify the requirements of a package. For example, if a package requires version 2.0 of a dependency, but version 2.1 is available and compatible, a patch file can be created to modify the requirement to version 2.1.
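To make the idea of a patch concrete, the following minimal sketch builds a unified diff in memory with Python's standard-library `difflib`. The file names and the `otherdep` line are hypothetical, chosen only to mirror the version 2.0 example; this illustrates what a patch file contains and is not a replacement for the `diff` command.

```python
import difflib

# contents of a hypothetical requirements.txt before and after editing
original = ["otherdep>=1.0\n", "dependency==2.0\n"]
modified = ["otherdep>=1.0\n", "dependency==2.*\n"]

# unified_diff yields the lines of a patch in the same format as 'diff -u'
patch = "".join(difflib.unified_diff(
    original,
    modified,
    fromfile="foo_orig/requirements.txt",
    tofile="foo/requirements.txt",
))
print(patch)
```

Lines starting with `-` are removed from the original file and lines starting with `+` are added, while unchanged lines are kept as context.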
To create a patch file, you need both the original file and the modified file. The `diff` command can be used to generate a patch file from the differences between the two. For example, if the `requirements.txt` file of a package called `foo` pins a dependency to version 2.0, we can create a patch file to relax this requirement as follows:
1. Download the package source code.
2. Create a copy of the source code, appending `_orig` to the name of the new folder:
```bash=
cp -r foo foo_orig
```
3. Edit the `requirements.txt` file in the `foo` folder, replacing the pinned versions (e.g. rewrite `dependency==2.0` to `dependency==2.*`). Needless to say, you need to make sure that the dependency versions you now allow are actually compatible with your software; too large a version jump may lead to compatibility issues.
4. Create the patch file:
```bash=
diff -ruN foo_orig foo > foo_patch.patch
```
5. Place the patch file in a folder where EasyBuild can find it. Finally, tell EasyBuild to use the patch: it is applied automatically during the installation, so you only need to specify the name of the patch file. More often than not, you'll be using `PythonBundle`, so the syntax is as follows:
```python=
# attach the patch to the package whose requirements are being relaxed,
# i.e. the one that depends on `dependency`, not to the dependency itself
(name, version, {
    'patches': ['foo_patch.patch'],
}),
```
### Using `sed`
The `sed` command is a powerful tool for manipulating files. It can be used to modify specific lines or patterns within a file. In the context of dependencies, `sed` can be used to modify the version requirements specified in a configuration file (e.g. `requirements.txt`).
For example, if a package requires version 2.0 of a dependency in a file called `requirements.txt`, we can use `sed` to relax this requirement so that version 2.1 is allowed as well (this time we take a more general approach, to futureproof the easyconfig and its later revisions):
```bash=
sed -i 's/dependency==2.0/dependency>=2.0/g' requirements.txt
```
Of course, this has to be put in the context of an easyconfig, so for `PythonBundle` it will look like this:
```python=
(name, version, {
    # replace fixed (==) versions in requirements.txt with minimal versions (>=);
    # the trailing '&&' is needed because preinstallopts is prepended to the install command
    'preinstallopts': "sed -i 's/dependency==2.0/dependency>=2.0/g' requirements.txt && ",
}),
```
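To check what such a substitution does before putting it in an easyconfig, it can be tried out on a sample string. The sketch below uses Python's `re.sub` as a stand-in for `sed`; the sample contents and the `otherdep` line are hypothetical. Unlike the `sed` expression above, the dot is escaped here so that it only matches a literal `.`.

```python
import re

# contents of a hypothetical requirements.txt
requirements = "otherdep>=1.0\ndependency==2.0\n"

# same replacement as the sed expression: turn the exact pin into a lower bound
relaxed = re.sub(r"dependency==2\.0", "dependency>=2.0", requirements)
print(relaxed)
```

Only the pinned line changes; requirements that were already flexible are left untouched.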