Erik Sundell
# repo2docker-service - A JupyterHub service to use repo2docker

BinderHub and tljh-repo2docker are two pieces of software that both build Docker images using repo2docker. Users who directly deploy a [JupyterHub Helm chart](https://github.com/jupyterhub/zero-to-jupyterhub-k8s) don't have a way to let their users build images using repo2docker though.

## What this text includes and what it doesn't

This text includes two main sections. First a technical background of tljh-repo2docker, and then a proposal to develop a new JupyterHub service to use repo2docker that should both function as a building block via its REST API and be useful on its own.

This text does not include a section about how the proposed JupyterHub repo2docker service could be used beyond its quite narrow scope, for example to provide a "click link -> build & push image -> launch user server" workflow. A workflow like that is meant to be facilitated by a service like this though.

## Technical background of [tljh-repo2docker](https://github.com/plasmabio/tljh-repo2docker)

This text aims to summarize the code in tljh-repo2docker in order to evaluate what code could be adjusted and shared for a JupyterHub service (accessed under `/services/repo2docker`, and scoped to build images with repo2docker).

### [`__init__.py`](https://github.com/plasmabio/tljh-repo2docker/blob/master/tljh_repo2docker/__init__.py)

`__init__.py` includes `tljh_custom_jupyterhub_config` and `tljh_extra_hub_pip_packages`, which are detected as [TLJH plugin hooks](https://tljh.jupyter.org/en/latest/contributing/plugins.html). This is how installing tljh-repo2docker can influence a [TLJH](https://github.com/jupyterhub/the-littlest-jupyterhub) distribution of JupyterHub.

The `tljh_custom_jupyterhub_config` hook registers additional _tornado web request handlers_ like below.
```python
c.JupyterHub.extra_handlers.extend(
    [
        (r"environments", ImagesHandler),
        (r"api/environments", BuildHandler),
        (r"api/environments/([^/]+)/logs", LogsHandler),
        (r"environments-static/(.*)", CacheControlStaticFilesHandler, ...),
    ]
)
```

### Extra JupyterHub web request handlers

JupyterHub, as a [tornado](https://www.tornadoweb.org/) based web application, responds with HTML or JSON when HTTP web requests arrive. Web requests to different paths (`/hub/home` vs `/hub/admin` etc.) are handled by different handlers, which can also behave differently based on the web request's HTTP method (`GET`, `POST`, `DELETE`, ...).

A [tornado web request handler](https://www.tornadoweb.org/en/stable/web.html) is a class with functions (`get`, `post`, ...) that react to an incoming web request and provide an HTTP response, often containing HTML or JSON.

tljh-repo2docker registers a few additional tornado web request handlers with JupyterHub. These handlers also rely on the Python decorators [`@web.authenticated`](https://www.tornadoweb.org/en/stable/web.html#tornado.web.authenticated) and [`@admin_only`](https://github.com/jupyterhub/jupyterhub/blob/75e03ef1d977dfee680708289c98432e2893ed5a/jupyterhub/utils.py#L342-L354), provided by tornado and jupyterhub for use by the tornado application itself (they can't be used in a standalone web application).

- **[images.py](https://github.com/plasmabio/tljh-repo2docker/blob/master/tljh_repo2docker/images.py)**

  The `ImagesHandler` is registered to handle requests arriving at `/environments` and renders the [jinja2](https://jinja.palletsprojects.com) HTML template [images.html](https://github.com/plasmabio/tljh-repo2docker/blob/master/tljh_repo2docker/templates/images.html). This location is available only to authenticated JupyterHub admin users.
  ```python
  from inspect import isawaitable

  from jupyterhub.handlers.base import BaseHandler
  from jupyterhub.utils import admin_only
  from tornado import web

  from .docker import list_containers, list_images


  class ImagesHandler(BaseHandler):
      @web.authenticated
      @admin_only
      async def get(self):
          # Built images and containers that are still building are listed together.
          images = await list_images()
          containers = await list_containers()
          result = self.render_template(
              "images.html",
              images=images + containers,
              default_mem_limit=self.settings.get("default_mem_limit"),
              default_cpu_limit=self.settings.get("default_cpu_limit"),
          )
          # render_template is awaitable in some JupyterHub versions and not in others.
          if isawaitable(result):
              self.write(await result)
          else:
              self.write(result)
  ```

- **[builder.py](https://github.com/plasmabio/tljh-repo2docker/blob/master/tljh_repo2docker/builder.py)**

  The `BuildHandler` is registered to handle HTTP `POST` and HTTP `DELETE` requests arriving at `api/environments`. When handling a `POST` request it builds an image, and when handling a `DELETE` request it deletes an already built image. As an API, the handlers respond with basic JSON messages like `{"status": "ok"}` after finishing the task they were meant to accomplish.

- **[logs.py](https://github.com/plasmabio/tljh-repo2docker/blob/master/tljh_repo2docker/logs.py)**

  The `LogsHandler` is registered to handle HTTP `GET` requests arriving at `api/environments/.../logs`, where `...` is a name associated with a build container to get repo2docker build logs from.

### Jinja HTML templates

When JupyterHub responds with HTML to a user, the HTML is rendered from a jinja template given data such as the name of the user.

- **[page.html](https://github.com/plasmabio/tljh-repo2docker/blob/master/tljh_repo2docker/templates/page.html) - A JupyterHub template**

  Providing this template overrides the template provided by JupyterHub. By doing so, `tljh-repo2docker` is able to add an `Environments` link leading to `/environments`. Adding that link is this template's sole purpose.

  This strategy is [in conflict with for example jupyterhub-nativeauthenticator](https://github.com/plasmabio/tljh-repo2docker/issues/58), which also overrides `page.html` in order to provide a link to its custom pages and handlers.
- **[admin.html](https://github.com/plasmabio/tljh-repo2docker/blob/master/tljh_repo2docker/templates/admin.html) - A JupyterHub template**

  The `/hub/admin` view relies on the `admin.html` template, which is expanded to also list the image used in a spawned server.

- **[images.html](https://github.com/plasmabio/tljh-repo2docker/blob/master/tljh_repo2docker/templates/images.html) - A dedicated `tljh-repo2docker` template**

  This is the only template that is directly rendered by `tljh-repo2docker` itself. It provides a user interface to interact with the `api/environments` paths via the bundled [images.js](https://github.com/plasmabio/tljh-repo2docker/blob/master/tljh_repo2docker/static/js/images.js) JavaScript.

### Logic to execute repo2docker

tljh-repo2docker doesn't execute repo2docker directly. Instead it starts a docker container using the image `quay.io/jupyterhub/repo2docker:main`, and builds a new image from within that container using the host machine's docker runtime, which is mounted into the container. `docker.py` is the code that starts that build container.

- **[docker.py](https://github.com/plasmabio/tljh-repo2docker/blob/master/tljh_repo2docker/docker.py)**

  This file has three functions.

  - `list_images` lists built images.
  - `list_containers` lists build containers running the repo2docker image to build images.
  - `build_image` runs the repo2docker image to build a given repository.

  The file is small, with very little source code, centered around calling `docker` to run a container with repo2docker installed, which in turn builds a new image.

  The `build_image` function has a signature like below.

  ```python
  async def build_image(
      repo,
      ref,
      name="",
      memory=None,
      cpu=None,
      username=None,
      password=None,
      extra_buildargs=None,
  ):
      ...  # body omitted, only the signature is of interest here
  ```

  Note how `memory` is passed. It is used to label the image being built (and the container building it) with `tljh_repo2docker.mem_limit`. The `list_...` functions are coupled in the same way to `mem_limit` and `cpu_limit`.
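To make the build flow described above more concrete, here is a minimal sketch of the kind of `docker run` invocation it amounts to. This is not the code in `docker.py`: the function name `repo2docker_build_command`, the default docker socket path, and the example repository and label value are assumptions made for illustration. The sketch only assembles the command and doesn't run it.

```python
import shlex

REPO2DOCKER_IMAGE = "quay.io/jupyterhub/repo2docker:main"
# Assumption: the conventional location of the docker socket on a Linux host.
DOCKER_SOCKET = "/var/run/docker.sock"


def repo2docker_build_command(repo, ref, image_name, labels=None):
    """Assemble (but don't run) a docker command that builds `repo` at `ref`.

    The host's docker socket is mounted into the build container so that
    repo2docker, running inside the container, builds with the host's
    docker runtime.
    """
    cmd = [
        "docker", "run", "--rm",
        "--volume", f"{DOCKER_SOCKET}:{DOCKER_SOCKET}",
        REPO2DOCKER_IMAGE,
        "jupyter-repo2docker",
        "--no-run",  # only build the image, don't start a container from it
        "--ref", ref,
        "--image-name", image_name,
    ]
    # Arbitrary labels instead of hard-coded mem/cpu limits.
    for key, value in sorted((labels or {}).items()):
        cmd += ["--label", f"{key}={value}"]
    cmd.append(repo)
    return cmd


cmd = repo2docker_build_command(
    "https://github.com/binder-examples/requirements",
    "HEAD",
    "example-environment:latest",
    labels={"tljh_repo2docker.mem_limit": "2G"},
)
print(shlex.join(cmd))
```

Accepting a generic `labels` mapping, rather than dedicated `memory` and `cpu` arguments, is one way the build logic could avoid knowing about spawner options.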
### Summary

As we consider creating a general purpose JupyterHub service to build images using repo2docker, I'd like to highlight how the parts of tljh-repo2docker that we could hope to share are coupled in ways that prevent us from sharing them.

1. Extra web request handlers, but we need a dedicated web application

   As we plan to build a dedicated JupyterHub service exposed via `/services/repo2docker`, we must also start a dedicated web server separate from the JupyterHub tornado server. Because of that, we can't easily take these web request handlers and re-use them.

2. HTML templates are coupled to JupyterHub's web application

   The HTML templates are tightly coupled with the other templates JupyterHub provides. For example, images.html leads with the line `{% extends "page.html" %}`, and the provided page.html leads with the line `{% extends "templates/page.html" %}`, which I think refers to a JupyterHub bundled template.

3. UI, API, and built images are coupled with the spawning options cpu_limit and mem_limit

   The HTML/JS based user interface presented under `/environments`, the API it interacts with, and the logic to build an image are all coupled specifically to `cpu_limit` and `mem_limit`.

## Proposal of a jupyterhub repo2docker service

This is a proposal to develop a [JupyterHub service](https://jupyterhub.readthedocs.io/en/stable/reference/services.html) tightly scoped to do the few things needed to build images using repo2docker. The idea is that it can be usable on its own, but also serve as a building block for more advanced composed functionality.

### What the service should and shouldn't do

#### Should

- Be possible to use with a z2jh based JupyterHub, where it should also be able to facilitate pushing of images to a container registry.
- Be its own web server.
- Work specifically against JupyterHub as an identity provider, and if needed its RBAC system with custom scopes, to determine which users are allowed to perform which actions.
  Users must be logged in to view anything, and users with further permissions, for example to build images, are identified by having a JupyterHub RBAC scope.
- Provide a REST API to:
  - build (and optionally push) images
  - provide relevant information
    - list built images
    - list images currently building
    - logs of recently built or building images
- Support being run with or without a pre-configured container registry.
- Provide an HTML/JS based user interface to interact with the REST API.

#### Shouldn't

- Assume it is running on the same machine as JupyterHub.
- Persist state to the local file system or JupyterHub's database, but instead rely on its in-memory state, the local docker runtime, and the optional remote container registry.
- Couple directly with things related to how the built images are used.

  As an example, consider a JupyterHub Spawner that wants to use information from this service about built images, as the Spawner thinks of them as images that users can start containers from. Then it is the Spawner that is responsible for asking this service for that information.
- Build images in a distributed manner like BinderHub can do by creating dedicated build Pods. This service should allow itself to run on a single machine.

### Choice of web server (tornado, fastapi, flask, etc.)

We need a web server, but what software should we build from and why? Our needs are probably not very advanced, so we can focus on more basic aspects.

- Smooth handling of authentication and authorization with JupyterHub
- Something the JupyterHub team overall is already used to

JupyterHub uses Tornado, [jupyter_server](https://github.com/jupyter-server/jupyter_server) uses Tornado, and [jupyverse](https://github.com/jupyter-server/jupyverse) uses FastAPI.

For the time being, let's assume we use Tornado and compare against it if we consider something else.

### Managing authentication (who?) and authorization (allowed?)
This service should be developed to rely on JupyterHub to validate a visitor's identity. If needed, deciding whether the user is allowed to do something should be done using the new JupyterHub RBAC system. This way, we can define new kinds of RBAC scopes that we tie to various permissions in the service. These scopes could then be granted to various JupyterHub groups of users as well.

A great thing is that from the perspective of this repo2docker service, it can just ask JupyterHub which user is interacting with it and what permissions that user has.

I think that a user could also be assigned the relevant permissions indirectly by belonging to a JupyterHub group, and then management of permissions could be done by managing the users in a group.

### Q/A: Could code be shared between a new repo2docker service and tljh-repo2docker?

Yes, hopefully. The vision is that this service is developed tightly scoped, to function as a general purpose building block for use by more feature rich tooling like tljh-repo2docker.

### Open questions

Note that there are more open questions beyond these that I have not managed to formulate, such as details on the UI to be provided.

- **Avoiding image name conflicts**

  With users building and pushing images via this service, it can be security critical that they can't replace images built by other users. How do we ensure that this doesn't happen? I think the resolution is to ensure that the image name has a section for the username. A configuration like `image_name_template` could be relevant.

- **Container registry API assumptions**

  What assumptions about the container registry do we make? I think there are standards for which REST API a container registry provides, and the answer to this question should at least clarify which standard we rely on. We need this to keep track of what is available in the container registry.
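As a rough illustration of the `image_name_template` idea, the sketch below renders an image name that always contains a section derived from the username. Everything in it is hypothetical: the function names, the `{username}` and `{repo}` placeholders, and the choice to append a truncated hash are assumptions, not existing configuration. The hash is there because lowercasing and replacing characters alone could map two different usernames (say `Alice` and `alice`) to the same image name.

```python
import hashlib
import re


def safe_component(value):
    """Turn an arbitrary string into a lowercase image name component.

    A short digest of the original value is always appended, so two values
    that sanitize to the same slug still end up with different components.
    A real implementation should weigh digest length against name length.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return f"{slug}-{digest}" if slug else digest


def render_image_name(template, username, repo):
    """Render a template such as "{username}/{repo}" into an image name."""
    return template.format(
        username=safe_component(username),
        repo=safe_component(repo),
    )


name_a = render_image_name("{username}/{repo}", "Alice", "my-repo")
name_b = render_image_name("{username}/{repo}", "alice", "my-repo")
print(name_a)
print(name_b)
```

Whether the service itself should enforce the username section, or merely offer the template as configuration, is part of the open question.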
# Followup, 2i2c slack's #binderhub-jupyterhub channel on Friday September 23rd ([link](https://2i2c.slack.com/archives/C03RLNFM43F/p1663928687936299))

I just had a meeting with Min RK, who kindly helped out. Here are some notes from that, and some next steps that so far are planned only in my head.

## My on-the-spot planned next steps

- practically start building something functioning as a hello world of the relevant techniques
- detail a REST API to cover all relevant interactions we want to support with this service, for it to function as a reliable building block
- detail a primitive UI for /services/repo2docker for direct use of the service
- validate that a jupyterhub admin can manage whether a user is part of a group or not, which by the initial setup can imply permissions to do certain things in the service.

## My refined notes from the meeting with Min

- **My ideas about how the JupyterHub RBAC can be used by the service were verified.** I wanted to make sure that it's planned so that, for example, jupyterlab extensions could be developed to interact with the service as well, and wasn't sure this would be viable - but it sure is practically viable!
- **Understanding that a dedicated webserver is needed was verified.** This holds no matter what when we work to provide this as a jupyterhub service (either managed or external). What tljh-repo2docker does would not require this though, as it just registers additional web request handlers for the main jupyterhub application to also handle.
- **Input about the use of the HubOAuth class (not tornado specific) and the HubOAuthenticated class (tornado specific).** I learned what existing Python code related to OAuth could be reused, but also got confident that it would be fine to implement this anew if needed, as it's mainly a standard OAuth2 procedure. JupyterHub related documentation: https://jupyterhub.readthedocs.io/en/stable/api/services.auth.html.
- **Tornado and FastAPI were discussed**, and both are serious options as the webserver for the service, with some known and some unknown pros and cons making it not obvious which to go for. There are examples of services developed with various webservers that can be used as input: https://github.com/jupyterhub/jupyterhub/tree/main/examples.

On a very technical level, making a service that relies on JupyterHub RBAC to manage a custom set of permissions would involve something like...

1. Declare custom RBAC scopes (`JupyterHub.custom_scopes`). See https://jupyterhub.readthedocs.io/en/stable/rbac/scopes.html#custom-scopes.
2. Declare that the service exists (`JupyterHub.services`). See https://jupyterhub.readthedocs.io/en/stable/api/service.html.
3. Declare a role with scopes and a group of users to have that role, and add users to that group. See also https://jupyterhub.readthedocs.io/en/stable/rbac/scopes.html#custom-scopes.
4. Declare that the service's allowed scopes include the custom scopes (?)
5. Optionally declare that the spawned servers should request the custom scopes as well, to help a JupyterLab extension make requests to the service. This would be `spawner.oauth_client_allowed_scopes`. If the user has such scopes, a JupyterLab extension could access the service via a "pageconfig token" or similar.
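The five steps above might look roughly like the sketch below when written as JupyterHub configuration. It is untested against a running hub and rests on several assumptions: the scope name `custom:repo2docker:build`, the role, group, and user names, and the service URL are all made up, and using the service's `oauth_client_allowed_scopes` for step 4 is my reading of that still uncertain step. In a real `jupyterhub_config.py` the `c` object is provided by JupyterHub; here a stand-in is created so that the sketch can run on its own.

```python
from types import SimpleNamespace

# Hypothetical scope name. Custom scope names must start with "custom:".
BUILD_SCOPE = "custom:repo2docker:build"


def configure(c):
    """Apply the five steps to a JupyterHub config object `c`."""
    # 1. Declare the custom scope.
    c.JupyterHub.custom_scopes = {
        BUILD_SCOPE: {"description": "Build images using the repo2docker service"},
    }
    # 2. Declare the service. The URL is a placeholder.
    # 4. Let the service's OAuth client request the custom scope (assumption).
    c.JupyterHub.services = [
        {
            "name": "repo2docker",
            "url": "http://127.0.0.1:10101",
            "oauth_client_allowed_scopes": [BUILD_SCOPE],
        }
    ]
    # 3. Declare a role with the scope, assign it to a group, and add users
    #    to that group. Access to the service itself is a separate scope.
    c.JupyterHub.load_roles = [
        {
            "name": "repo2docker-builder",
            "scopes": ["access:services!service=repo2docker", BUILD_SCOPE],
            "groups": ["image-builders"],
        }
    ]
    c.JupyterHub.load_groups = {"image-builders": ["example-user"]}
    # 5. Optionally let spawned servers request the custom scope as well.
    c.Spawner.oauth_client_allowed_scopes = [BUILD_SCOPE]


# Stand-in for the config object JupyterHub normally provides.
c = SimpleNamespace(JupyterHub=SimpleNamespace(), Spawner=SimpleNamespace())
configure(c)
```

With a setup like this, an admin would grant or revoke the build permission by managing membership of the `image-builders` group, which is the workflow the next steps set out to validate.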
