# Building a Singularity container builder service for IRIDA/Workbench use
Galaxy tools contain a `<requirements>` section that expresses the software (command line programs or other software dependencies) the tool requires for operation. These requirements are either software package names or (much less commonly) names of software containers.
The requirements for a Galaxy tool are combined to define its dependencies. The dependency specification is passed to "dependency resolvers" to ensure that the dependencies are satisfied at tool runtime. Since installing software is a time-consuming process, dependencies are translated into a string name that can act as a cache key. This is straightforward if there is a single dependency: the name is the package name of the dependency, with or without a version number requirement. For example, a requirement for `samtools` might imply a dependency `__samtools@1.13` or (if a version is not specified) `_samtools@_uv_`.
If more than one package is required, the requirements are combined using a process of mulling. This means that a hash string is created of the combined requirements, e.g. `mulled-v1-7f197292788ac0c39321dcb559c0e8de191431d8b9522034fbbe44259e040dee`.
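To illustrate the idea (but not reproduce the exact algorithm — the real implementation lives in `galaxy-tool-util` and normalizes and orders the package targets before hashing), a mulled-style name is essentially a hash over a canonical form of the combined requirements:

```python
import hashlib


def mulled_style_name(requirements, version='v1'):
    # Illustrative sketch only: hash a canonical (sorted) requirements string.
    # The real name generation is in galaxy.tool_util.deps.mulled.util
    # (v1_image_name / v2_image_name) and differs in detail.
    canonical = ','.join(sorted(requirements))
    digest = hashlib.sha256(canonical.encode('utf-8')).hexdigest()
    return f'mulled-{version}-{digest}'


print(mulled_style_name(['samtools==1.13', 'bcftools==1.13']))
```

Note that the name is independent of the order in which requirements are listed; for real names, use the functions from `galaxy.tool_util.deps.mulled.util`, as in the script at the end of this note.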
Historically, dependency resolution involved using various packaging systems, most commonly conda, to install packages and make them available at runtime. You can read more about dependency resolver configuration in the [Galaxy admin docs](https://docs.galaxyproject.org/en/master/admin/dependency_resolvers.html). In recent Galaxy versions, dependency resolvers are supplemented by container resolvers, which link Galaxy tool execution to software containers (either Docker or Singularity) that provide the required packages.
Container resolvers are configured using the [container_resolvers_conf.xml](https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/config/sample/container_resolvers_conf.xml.sample). This provides configuration for:
1. Explicitly named containers (i.e. when a tool states that it requires a particular container by name)
2. Cached mulled containers, i.e. when a container can be looked up in a locally stored cache based on the string name generated from the requirements (as described above).
3. Mulled containers, where the requirements' "mulled" string name is used to look up a container. The [auto-mulled](https://github.com/BioContainers/auto-mulled) project builds containers for many Galaxy tools, e.g. those in the tools-iuc collection.
4. Building containers: when none of the previously mentioned resolvers find a container, a new container can be built. This uses [involucro](https://github.com/involucro/involucro) to build the container based on the tool's requirements specification.
5. A fallback container that can be configured by the Galaxy admin.
Containers either fetched using the mulled container resolver or built using `involucro` are stored in a cache for future use by the cached mulled container resolver.
Note that there are Docker and Singularity versions of each of these resolvers. Singularity images are stored on disk, by default in the `database/container_cache` folder.
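For reference, a Singularity-focused resolver chain might be configured roughly as follows. This is a sketch modeled on the sample file linked above; the element names are assumptions that should be checked against the sample for your Galaxy version:

```xml
<containers_resolvers>
  <!-- 1. explicitly named containers -->
  <explicit_singularity />
  <!-- 2. locally cached mulled images -->
  <cached_mulled_singularity />
  <!-- 3. published mulled images -->
  <mulled_singularity />
  <!-- 4. build on demand with involucro -->
  <build_mulled_singularity />
</containers_resolvers>
```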
Given the above explanation, several steps are involved in going from a Galaxy tool specification to running Galaxy tools with container resolvers in place:
0. Using the `workflow-to-tools` command from [Ephemeris](https://github.com/galaxyproject/ephemeris), we can go from a Galaxy workflow JSON specification (".ga file") to a list of tools to install (in a YAML file). This is used in the [irida-plugin-builder](https://github.com/COMBAT-TB/irida-plugin-builder).
1. A Galaxy toolshed name, tool name, author and toolshed commit ID specifies how to find a Galaxy tool. Ephemeris can install tools from a tool specification (in the YAML file mentioned above) into a Galaxy server.
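For illustration, a tool-list entry in that YAML looks roughly like this (keys modeled on Ephemeris's `shed-tools` format; the section label is hypothetical, the other values are from the Read It And Keep example used in the next step):

```yaml
tools:
- name: read_it_and_keep
  owner: iuc
  tool_shed_url: toolshed.g2.bx.psu.edu
  revisions:
  - 1563b58905f4
  tool_panel_section_label: Sequence Filtering  # hypothetical
```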
2. Using the bioblend interface to the toolshed API, we can get info about a tool from its tool specification, e.g.
```python
# this is how to get tool info from the toolshed
import sys

from bioblend import toolshed

toolshed_name = 'toolshed.g2.bx.psu.edu'
tool_name = 'read_it_and_keep'
tool_author = 'iuc'
tool_revision = '1563b58905f4'
toolshed_url = f'https://{toolshed_name}'
ts = toolshed.ToolShedInstance(url=toolshed_url)
result = ts.repositories.get_repository_revision_install_info(
    tool_name, tool_author, tool_revision)
for dictionary in result:
    if 'valid_tools' in dictionary:
        spec_strs = []
        # this dict contains a list of installable tools
        for tool in dictionary['valid_tools']:
            print(tool['id'])
            for requirement in tool['requirements']:
                if 'version' in requirement:
                    spec_str = f'{requirement["name"]}=={requirement["version"]}'
                    spec_strs.append(spec_str)
                    print(spec_str)
                else:
                    print(f'unversioned {requirement["name"]}', file=sys.stderr)
        print(','.join(spec_strs))
```
3. The `mulled-build` command from the [galaxy-tool-util](https://pypi.org/project/galaxy-tool-util) package can be used to build images from the spec derived from a tool's requirements. Using the above example of Read It And Keep, the spec string is `read-it-and-keep==0.2.2,python==3.10` and the command to use is `mulled-build build-and-test --test echo --singularity 'read-it-and-keep==0.2.2,python==3.10'`.
This builds a Docker container image, using conda to install the required tools (so the tools need to be in conda-forge or bioconda), runs a test (using the command specified with `--test`; `echo` here is a placeholder test) and then builds a Singularity container in the `./singularity_import` directory (or another directory specified with `--singularity-image-dir`).
As an aside, this command uses [involucro](https://github.com/involucro/involucro), a container building system that defines its tasks in Lua files. In the case of Galaxy, [this file](https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/tool_util/deps/mulled/invfile.lua) is used. Look for the `inv.task` sections in that file to understand the available tasks (e.g. `build`, `build-and-test`, `singularity`, etc.). A further detail: the current release of `mulled-build` uses Singularity version 2 to build these images.
4. Finally the image from the `./singularity_import` directory can be copied to the cache directory (by default `database/container_cache/singularity/mulled`) of the Galaxy server.
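That final copy step can be scripted; here is a minimal sketch (the path layout follows the defaults mentioned above, and the function name is hypothetical):

```python
import shutil
from pathlib import Path


def install_image(image_path: str, galaxy_root: str,
                  subdir: str = 'singularity/mulled') -> Path:
    """Copy a built Singularity image into Galaxy's container cache.

    Default cache layout per this note:
    <galaxy_root>/database/container_cache/<subdir>
    """
    cache_dir = Path(galaxy_root) / 'database' / 'container_cache' / subdir
    cache_dir.mkdir(parents=True, exist_ok=True)
    dest = cache_dir / Path(image_path).name
    shutil.copy2(image_path, dest)
    return dest
```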
The workflow described thus far assumes that containers are going to be built on the Galaxy server where they are used and assumes the presence of all infrastructure (Ephemeris, galaxy-tool-util, Docker) on that server. Alternatively, a list of tools (in the tool specification YAML) can be used to build Singularity container images ahead of time that can then be copied to the Galaxy server as needed.
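A sketch of that ahead-of-time approach, generating one `mulled-build` invocation per requirements spec (the helper name and structure are hypothetical; the flags are the ones described in step 3 above):

```python
import shlex


def build_commands(spec_strs, image_dir='./singularity_import'):
    """Generate one mulled-build command line per requirements spec string."""
    return [
        'mulled-build build-and-test --test echo --singularity '
        f'--singularity-image-dir {shlex.quote(image_dir)} {shlex.quote(spec)}'
        for spec in spec_strs
    ]


for cmd in build_commands(['read-it-and-keep==0.2.2,python==3.10']):
    print(cmd)
```

The resulting commands can be run on any machine with Docker and `galaxy-tool-util` installed, and the images copied to the Galaxy server afterwards.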
### From Tool specification to Mulled ID
This script, which uses the `bioblend` and `galaxy-tool-util` libraries, prints a mulled ID when given a tool specification (tool name, author, revision).
```python
#!/usr/bin/env python3
import argparse
import sys
from typing import Union

from bioblend import toolshed
from galaxy.tool_util.deps.mulled.mulled_build import target_str_to_targets
from galaxy.tool_util.deps.mulled.util import v1_image_name, v2_image_name


def get_tool_requirements(tool_name: str, tool_author: str, tool_revision: str,
                          toolshed_name: str = 'toolshed.g2.bx.psu.edu') -> Union[str, None]:
    """Given a tool description, get the list of requirements"""
    # this is how to get tool info from the toolshed
    toolshed_url = f'https://{toolshed_name}'
    ts = toolshed.ToolShedInstance(url=toolshed_url)
    result = ts.repositories.get_repository_revision_install_info(
        tool_name, tool_author, tool_revision)
    for dictionary in result:
        if 'valid_tools' in dictionary:
            spec_strs = []
            # this dict contains a list of installable tools
            for tool in dictionary['valid_tools']:
                for requirement in tool['requirements']:
                    if 'version' in requirement:
                        spec_str = f'{requirement["name"]}=={requirement["version"]}'
                        spec_strs.append(spec_str)
                    else:
                        print(f'unversioned {requirement["name"]}', file=sys.stderr)
            return ','.join(spec_strs)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--mulled_version', default='v2')
    parser.add_argument('tool_name')
    parser.add_argument('tool_author')
    parser.add_argument('tool_revision')
    args = parser.parse_args()
    targets = target_str_to_targets(get_tool_requirements(
        args.tool_name, args.tool_author, args.tool_revision))
    if args.mulled_version == 'v2':
        image_name = v2_image_name
    else:
        image_name = v1_image_name
    print(image_name(targets))
```