# Jupyter security map

Questions we want to be able to answer:

- What does component X need to talk to (directly), e.g. Hub->proxy, proxy->singleuser/hub, etc.?
- If credentials for component X are compromised, what actions could be taken, and how can I detect/mitigate this?
- Where are credentials for component X typically stored?
- If the _process_ of component X is compromised, what actions could be taken?
- What configuration options mitigate vulnerabilities (e.g. disable_user_config, token expiration, user scopes, session expiration, etc.)?

Outline looks something like:

**Component**:
- capabilities: (e.g. access to other components)
- access: (e.g. network)
- authentication/authorization: (e.g. token in header)

**Kernel**:
- accessed via ZeroMQ
- sockets are bound (listen) on localhost (tcp) or BSD sockets
- authenticated via HMAC signatures
- not encrypted
- capabilities: arbitrary code execution

**Jupyter Server**:
- http (usually not https) (JupyterHub has internal_ssl)
- start/stop kernels
- authenticated: token, etc.

**JupyterHub**:

**Configurable HTTP Proxy**:

Notes:
- https://github.com/jupyterhub/jupyterhub-the-hard-way
- https://github.com/jupyterhub/jupyterhub-tutorial
- http://the-littlest-jupyterhub.readthedocs.io

![](https://i.imgur.com/FoXnAcr.png)

https://httpstatuses.com

# COMPONENTS

**Kernels**:
- Accessed via ZeroMQ
- Sockets are bound (listen) on localhost (tcp) or BSD sockets
- Authenticated via HMAC signatures
- Control over the virtualenv alone does not affect the environment(s) in which the user's kernel(s) may run
- Users cannot submit arbitrary commands to another user's kernel
- Encryption does not yet cover the zmq tcp sockets between the notebook client and kernel
- Capable of arbitrary code execution

**Spawner**:
# Capabilities:
- Starts each single-user notebook server
- A spawner should be able to take three actions:
    - Start the process
    - Poll whether the process is still running
    - Stop the process
- The spawner's underlying system or cluster is responsible for enforcing CPU limits and guarantees
- Runs in the Hub process, so it has access to everything the Hub can do
- Spawning many servers at the same time can cause performance problems for the Hub or the underlying spawning system

**Jupyter Server**:
- Runs on http (usually not https) (JupyterHub has internal_ssl)
- Capable of starting/stopping **kernels**
- Authenticated via an OAuth token issued by the Hub

**Database (default: SQLite)**:
- Stores information about users, services, and other data needed for operating the Hub
- Requires no administration
- The sqlite3 command-line shell can be used for data analysis (see the sketch after this list)
- SQLite uses reader/writer locks to control access to the database
- Contains:
    - usernames
    - activity events
    - server URLs
    - _hashed_ tokens
    - (optionally) encrypted 'auth state'
- Authentication: varies by implementation
- Only the Hub talks directly to the database
- Compromise of the database can lead to addition, modification, and deletion of its data
- Runs in the Hub process
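Since only the Hub talks to the database and tokens are stored hashed, one way to check this is to open the Hub's SQLite file with Python's built-in `sqlite3` module. This is only a sketch: the filename `jupyterhub.sqlite` is the usual default, and the `users`/`api_tokens` table names are assumptions based on recent JupyterHub versions, so the code enumerates tables first rather than relying on a fixed schema.

```python
import sqlite3

# Default database file created by the Hub in its working directory;
# the location is configurable (c.JupyterHub.db_url), so adjust as needed.
DB_PATH = "jupyterhub.sqlite"

with sqlite3.connect(DB_PATH) as conn:
    # List tables instead of assuming a fixed schema, since table and
    # column names can change between JupyterHub versions.
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"
        )
    ]
    print("tables:", tables)

    if "users" in tables:
        print("usernames:", [n for (n,) in conn.execute("SELECT name FROM users")])

    if "api_tokens" in tables:
        # Only hashed token values (plus metadata) are persisted here;
        # the raw token is never written to the database.
        cur = conn.execute("SELECT * FROM api_tokens LIMIT 3")
        print("api_tokens columns:", [col[0] for col in cur.description])
```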
**JupyterHub**:
# Capabilities:
- Provides a configuration option that can be set to prevent user-owned configuration files from being loaded
- Can run single-user servers on their own subdomains
- The pages of the JupyterHub application are generated from Jinja templates
- The Hub manages the proxy as a subprocess by default (it can also be run externally, as is typical in production deployments)
- Reports on the user servers via its REST API (/users)
- Can provide access to other interfaces, such as RStudio, that provide their own access to a language kernel
- Prometheus can be configured to repeatedly poll JupyterHub's /metrics endpoint to parse and save its current state
- Uses a SQLite database by default, but others (PostgreSQL, MySQL) can also be used
- Can be configured to record structured events from a running server using Jupyter's telemetry system
- Filters API responses so that only the resource attributes corresponding to the passed scopes are accessible
- The payload of an API call can be filtered both horizontally (which resources) and vertically (which attributes)
- Can share links to a particular notebook from one user to others through the special URL /hub/user-redirect
- Can control computing infrastructure but does not provide it
- Provides remote access to Jupyter Notebook and JupyterLab for many users
- Controls computational resources via a spawner
- For more sophisticated computational resources (like distributed computing), JupyterHub can connect with other infrastructure tools (like Dask or Spark)
- Services authenticating with JupyterHub use two helpers (HubOAuth and HubAuth)
- By default, the Hub listens on localhost only. This address must be accessible from the proxy and user servers, so it may need to be set to a public IP or '' (all interfaces) if the proxy or user servers are in containers or on a different host
- JupyterHub's OAuthenticator currently supports the following popular services: Auth0, Azure AD, Bitbucket, CILogon, GitHub, GitLab, Globus, Google, MediaWiki, Okpy, OpenShift
- The JupyterHub Helm chart manages resources in the cloud using Kubernetes

# Access:
- JupyterHub is supported on Linux/Unix based systems
- JupyterHub configures the proxy to forward URL prefixes to the single-user notebook servers (Jupyter Server)
- Login data is handed over to the authenticator for validation
- Compromise of the Hub is a compromise of the database, spawner, and authenticator
- JupyterHub's authentication setup prevents a user from writing arbitrary HTML and serving it to another user
- JupyterHub ships with a default PAM-based authenticator for logging in with local user accounts via a username and password
- JupyterHub is also compatible with other authentication schemes (e.g. token-based) and custom authenticators
- JupyterHub has a REST API that can be used by external services; to run an external service, an API token must be created and provided to the service (see the API sketch after this list)
- JupyterHub provides default roles (user, admin, token, and server); default roles cannot be deleted
- The RBAC framework allows requesting tokens with specific existing roles
- Unauthorized access to tokens could lead to privilege escalation
- In a role definition, the name field is required, while all other fields are optional (see the role sketch after this list)
- Role names must be 3-255 characters; use ASCII lowercase, numbers, and the 'unreserved' URL punctuation -_.~; start with a letter; and end with a letter or number
- It is not possible to implicitly add a new user to the database by defining a new role
- A token's scopes reveal which resources it provides access to
- Access to the user's own resources and subresources is covered by the metascope self, which includes the user's model, activity, servers, and tokens
- The JupyterHub API requires authorization to access its endpoints
- Users can be added to and removed from the Hub via either the admin panel or the REST API
- Each API endpoint has a list of scopes which can be used to access it; if no scopes are listed, the endpoint is not authenticated and can be accessed without any permissions
- Enabling SSL for all internal communication provides end-to-end encryption between all JupyterHub components
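To make the REST API point concrete, here is a minimal sketch of an external service listing users via the Hub API. It assumes the Hub API base URL is available in `JUPYTERHUB_API_URL` (falling back to a local default) and that an API token with sufficient scope (e.g. `read:users`) has been exported as `JUPYTERHUB_API_TOKEN`.

```python
import os

import requests

# Assumed setup: the Hub API base URL and an API token with sufficient
# scope (e.g. read:users) are provided via environment variables.
hub_api = os.environ.get("JUPYTERHUB_API_URL", "http://127.0.0.1:8000/hub/api")
token = os.environ["JUPYTERHUB_API_TOKEN"]

# The Hub authenticates REST API requests via the Authorization header.
resp = requests.get(
    f"{hub_api}/users",
    headers={"Authorization": f"token {token}"},
)
resp.raise_for_status()

# The payload is filtered by the token's scopes: users outside the granted
# scopes are dropped (horizontal filtering) and attributes the token may not
# read are omitted (vertical filtering).
for user in resp.json():
    print(user["name"], "-> servers:", list(user.get("servers", {})))
```

For Hub-managed services, the Hub itself sets `JUPYTERHUB_API_TOKEN` and `JUPYTERHUB_API_URL` in the service's environment, so the same code works unchanged there.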
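The role-definition rules above can be illustrated with a sketch of a custom role in `jupyterhub_config.py` using the `load_roles` configuration. The role name `metrics-reader` and the user `monitoring-bot` are made-up examples; only `name` is required, and listing a user here does not implicitly create that user in the database.

```python
# jupyterhub_config.py (sketch)
c = get_config()  # noqa: F821  (provided by JupyterHub's config loader)

c.JupyterHub.load_roles = [
    {
        # "name" is the only required field; it must be 3-255 characters,
        # lowercase ASCII / numbers / the unreserved punctuation -_.~,
        # start with a letter, and end with a letter or number.
        "name": "metrics-reader",
        "description": "Read-only access to the list of users and servers",
        # Scopes granted by this role; requests outside them are rejected
        # or filtered out of the response.
        "scopes": ["read:users"],
        # Role assignment to an *existing* user (hypothetical name);
        # defining a role does not implicitly add users to the database.
        "users": ["monitoring-bot"],
    }
]
```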
**Proxy**:
# Capabilities:
- The Hub and single-user servers are placed in a single domain, behind the proxy
- configurable-http-proxy exposes last-activity data for routes
- Does not persist the routing table by default
- Provides a REST API for managing the routing table
- Capable of running as a service separate from the Hub

# Access:
- REST API token for the routing table
- If the proxy authentication token is not set, the Hub generates a random one itself, which means that any time the Hub is restarted the proxy must also be restarted (default config)
- The Hub authenticates its requests to the proxy using a secret token that the Hub and proxy agree upon
- Not all proxy implementations use an auth token

### Credentials

**Database password or file permissions**
- Stored in jupyterhub_config.py or the environment

**OAuth tokens**
- Issued by the Hub
- Scopes govern permissions
- Stored (hashed) in the database
- Stored (encrypted, not hashed) in browser cookies
- Stored (not encrypted, not hashed) in server environment variables
- Tokens cannot be assigned roles through role definitions, but may be assigned specific roles when requested via the API
- By default, expire when the cookie reaches its expiry time of 2 weeks (or after 1 hour in JupyterHub versions < 1.3.0)
- OAuth tokens can generally only be used to identify a user, not to take actions on the user's behalf

**Cookie secret**
- Stored in a jupyterhub_cookie_secret file
- Encrypts the browser cookies used for authentication
- Recommended permissions for the cookie secret file are 600 (owner-only read/write)

### ACCESS CONTROL