# Conda store thoughts
https://conda-store.readthedocs.io/en/latest/contributing.html#architecture is
a very useful link, describing the architecture.
Ultimately, this is *deeply tied* to conda - the db schema makes very specific
references to conda.
I'm working on a TLJH plugin (https://github.com/yuvipanda/tljh-conda-store)
to test this out.
## Components
- API server (conda-store-server)
- Database (SQLAlchemy compatible, postgresql preferred, but this can just be)
    - What is actually in this?
    - https://conda-store.readthedocs.io/en/latest/contributing.html#database-model
    - Contains all the actual user provided environment files and content
    - Contains references to all the built artifacts
    - Does *not* seem to contain user info? I think that's all external
- Object Storage (s3 compatible, like minio - or filesystem is enough)
    - Is this transient and used as a cache? Or?
    - Not just used as a cache, as there are direct references to it from the db schema
- Celery (runs conda-store-worker)
- Message broker for Celery (could be same postgresql used by SQLAlchemy)
So I think the minimum set of processes really is JupyterHub + api server (sqlite + filesystem) + celery (with sqlite)
## Authentication
It seems to have its own authentication implementation, complete with
RBAC for permissions on a user / group basis.
https://github.com/Quansight/conda-store/blob/main/conda-store-server/conda_store_server/server/auth.py
It does support just piggy-backing on JupyterHub auth though,
https://github.com/Quansight/conda-store/blob/0a578730c73579467a84336bd635e9bd81fa19a2/conda-store-server/conda_store_server/server/auth.py#L567
so that is nice
## Comparisons to BinderHub
(Need to validate this comparison)
BinderHub is a service that takes a collection of files (environment.yml, apt.txt, etc.)
stored *externally* (on GitHub, etc.), uses Kubernetes (or now just Docker) to run
repo2docker to produce an environment in the form of a docker image.
conda-store is a service that takes *one* file (environment.yml), stored internally
(in its db), uses celery to run conda to produce an environment in the form of a tarball
(or, optionally, a docker image).
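
To make the conda-store half of that comparison concrete, here is a rough sketch of the commands a worker could run for one build. This is an assumption about the general shape, not conda-store's actual implementation: `build_commands` and the paths are hypothetical, and the second step assumes the separate `conda-pack` tool is installed.

```python
from pathlib import Path


def build_commands(env_file: Path, prefix: Path, tarball: Path) -> list[list[str]]:
    """Return the commands a hypothetical worker would run, in order."""
    return [
        # Solve and install the environment described by environment.yml.
        ["conda", "env", "create", "--file", str(env_file), "--prefix", str(prefix)],
        # Archive the installed prefix into a tarball (requires conda-pack).
        ["conda", "pack", "--prefix", str(prefix), "--output", str(tarball)],
    ]


if __name__ == "__main__":
    commands = build_commands(
        Path("environment.yml"), Path("envs/demo"), Path("demo.tar.gz")
    )
    for cmd in commands:
        # To actually execute, pass each list to subprocess.run(cmd, check=True).
        print(" ".join(cmd))
```

The function only builds the command lists, so it can be inspected without conda being installed.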
## Tech choices
- The API Server is flask, not an async setup like tornado (like all of Jupyter) or fastapi.
Response from Chris O.
Totally agree with it not being async. I want it to be and there is no reason it has to be flask. I started with flask for prototyping and it has persisted since only because I haven't justified the development work. I'm actually using pydantic already so I would be eager to adopt fastapi though.
## Questions?
- What is in the database?
Response from Chris O.
The docs should give a good description of what is in the database. Overall it keeps track of namespaces, environments, and builds. Associated with each build it keeps track of the conda packages that exist within the environment along with all build artifacts (filesystem env, conda pack, docker image, etc.).
I have put some thought into whether this should be a more general environment manager, but for now I think this would restrict the number of build artifacts that can be built for each. For example, if we were to support apt packages, those could not be made available in the conda-lock or conda-pack format. But I do agree this would be a useful service; it would just increase the scope a lot.
- [docs database diagram](https://conda-store.readthedocs.io/en/latest/contributing.html#database-model)
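
As a way of picturing the response above, here is a minimal sketch of the namespace / environment / build hierarchy as plain dataclasses. It is a simplification for these notes, not the real schema (the linked diagram is authoritative); the field names, and where the environment.yml text is attached, are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    kind: str  # e.g. "filesystem", "conda-pack", "docker"
    key: str   # hypothetical pointer into object storage


@dataclass
class Build:
    spec: str  # environment.yml text for this build (placement is an assumption)
    packages: list[str] = field(default_factory=list)
    artifacts: list[Artifact] = field(default_factory=list)


@dataclass
class Environment:
    name: str
    builds: list[Build] = field(default_factory=list)


@dataclass
class Namespace:
    name: str
    environments: list[Environment] = field(default_factory=list)


if __name__ == "__main__":
    build = Build(
        spec="dependencies:\n  - python\n",
        packages=["python"],
        artifacts=[Artifact(kind="conda-pack", key="demo.tar.gz")],
    )
    ns = Namespace("quansight", [Environment("demo", [build])])
    print(ns)
```

Note that nothing here models users, which matches the observation that user info seems to live outside the database.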
- How does it handle auth?
Response from Chris O.
Conda-Store tries not to be responsible for auth and always defers to other services (for example jupyterhub, keycloak, auth0, etc.). To Conda-Store, a "user" or "service" has a "primary_namespace" where the user's default environments are created, along with a set of permissions, e.g. `{"quansight/*": ["admin"], "*/*": ["viewer"]}`. All of this is stored in a jwt token, making conda-store stateless with respect to authentication/authorization. I would like to add an api token as well though, which looks like it will need to be stored in a database, similar to jupyterhub. But overall I don't want to grow Conda-Store's auth capabilities, and it should rely on other services.
- [Authentication docs](https://conda-store.readthedocs.io/en/latest/contributing.html#authentication-model)
- [Authorization docs](https://conda-store.readthedocs.io/en/latest/contributing.html#authorization-model)
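
To check my understanding of the "everything lives in the token" model, here is a small stdlib-only sketch. It assumes the permission keys are glob patterns over `namespace/environment` and that roles from all matching patterns are combined; conda-store's real matching rules may differ. The HS256 token helpers are hand-rolled for illustration only (real code should use a JWT library), and `encode_token`, `decode_token`, `roles_for`, and the `demo-user` namespace are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
from fnmatch import fnmatchcase


def _b64(data: bytes) -> str:
    # JWT uses URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _sign(message: str, secret: bytes) -> str:
    return _b64(hmac.new(secret, message.encode(), hashlib.sha256).digest())


def encode_token(claims: dict, secret: bytes) -> str:
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64(json.dumps(claims).encode())
    return f"{header}.{body}.{_sign(f'{header}.{body}', secret)}"


def decode_token(token: str, secret: bytes) -> dict:
    header, body, signature = token.split(".")
    if not hmac.compare_digest(signature, _sign(f"{header}.{body}", secret)):
        raise ValueError("bad signature")
    padded = body + "=" * (-len(body) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def roles_for(permissions: dict, namespace: str, environment: str) -> set:
    """Collect the roles of every pattern matching namespace/environment."""
    target = f"{namespace}/{environment}"
    roles = set()
    for pattern, granted in permissions.items():
        if fnmatchcase(target, pattern):
            roles.update(granted)
    return roles


if __name__ == "__main__":
    secret = b"not-a-real-secret"
    token = encode_token(
        {
            "primary_namespace": "demo-user",
            "permissions": {"quansight/*": ["admin"], "*/*": ["viewer"]},
        },
        secret,
    )
    claims = decode_token(token, secret)
    # No database lookup: the verified claims alone decide access.
    print(roles_for(claims["permissions"], "quansight", "demo"))
    print(roles_for(claims["permissions"], "other", "demo"))
```

The point of the sketch is that the server only needs the shared secret to authorize a request, which is what makes the design stateless; an api token, by contrast, would need a lookup table.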