# Jupyter security map
Questions we want to be able to answer:
- What does component X need to talk to (directly), e.g. Hub->proxy, proxy->singleuser/hub, etc.
- If credentials for component X are compromised, what actions could be taken, and how can I detect/mitigate this?
- Where are credentials for component X typically stored?
- If the _process_ of component X is compromised, what actions could be taken?
- What configuration options mitigate vulnerabilities (e.g. disable_user_config, token expiration, user scopes, session expiration, etc.)
Outline looks something like:
**Component**:
- capabilities: (e.g. access to other components)
- access: (e.g. network)
- authentication/authorization: (e.g. token in header)
**Kernel**:
- accessed via zeromq
- sockets are bound (listen) on localhost (tcp) or BSD sockets
- authenticated via HMAC signatures
- not encrypted
- capabilities: arbitrary code execution
**Jupyter Server**:
- http (usually not https) (JupyterHub has internal_ssl)
- start/stop kernels
- authenticated: token, etc.
**JupyterHub**:
**Configurable HTTP Proxy**:
Notes:
- https://github.com/jupyterhub/jupyterhub-the-hard-way
- https://github.com/jupyterhub/jupyterhub-tutorial
- http://the-littlest-jupyterhub.readthedocs.io
![](https://i.imgur.com/FoXnAcr.png)
https://httpstatuses.com
# COMPONENTS
**kernels**
- Accessed via zmq
- Sockets are bound (listen) on localhost (tcp) or BSD sockets
- Authenticated via HMAC signatures
- Control over the notebook server's virtualenv alone does not affect the environment(s) in which the user's kernel(s) may run
- Users cannot submit arbitrary commands to another user’s kernel
- Encryption does not yet cover the zmq tcp sockets between the Notebook client and kernel
- Capable of arbitrary code execution
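The HMAC-signing scheme above can be sketched as follows. The key normally comes from the kernel's connection file; the key and message frames here are invented for illustration and are not a full wire-protocol message:

```python
import hashlib
import hmac

def sign_message(key: bytes, frames: list) -> str:
    """HMAC-SHA256 over the serialized message frames, in order."""
    h = hmac.new(key, digestmod=hashlib.sha256)
    for frame in frames:
        h.update(frame)
    return h.hexdigest()

def verify_message(key: bytes, frames: list, signature: str) -> bool:
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(sign_message(key, frames), signature)

key = b"key-from-connection-file"  # placeholder value
frames = [b'{"msg_type": "execute_request"}', b"{}", b"{}", b'{"code": "1+1"}']
sig = sign_message(key, frames)
assert verify_message(key, frames, sig)
assert not verify_message(key, [b"tampered"] + frames[1:], sig)
```

Note that the signature authenticates the message but does not encrypt it, matching the "not encrypted" point above.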
**Spawner**:
# Capabilities:
- Starts each single-user notebook server.
- Spawner should be able to take three actions:
- Start the process
- Poll whether the process is still running
- Stop the process
- The spawner's underlying system or cluster is responsible for enforcing CPU limits and providing guarantees for CPU usage.
- Runs in the Hub process, so has access to everything the Hub can do
- Spawning lots of servers at the same time can cause performance problems for the Hub or the underlying spawning system.
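The three spawner actions can be sketched as a toy process manager; JupyterHub's real Spawner base class is asynchronous and far richer, so this is a minimal illustration only:

```python
import signal
import subprocess
import sys

class MiniSpawner:
    """Toy sketch of the three spawner actions: start, poll, stop."""

    def __init__(self, cmd):
        self.cmd = cmd
        self.proc = None

    def start(self) -> int:
        self.proc = subprocess.Popen(self.cmd)
        return self.proc.pid

    def poll(self):
        # None means "still running"; an integer is the exit status,
        # matching the convention JupyterHub spawners follow.
        return self.proc.poll() if self.proc else 0

    def stop(self):
        if self.proc and self.proc.poll() is None:
            self.proc.send_signal(signal.SIGTERM)
            self.proc.wait(timeout=10)

spawner = MiniSpawner([sys.executable, "-c", "import time; time.sleep(60)"])
spawner.start()
assert spawner.poll() is None      # still running
spawner.stop()
assert spawner.poll() is not None  # terminated
```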
**Jupyter Server**:
- Runs on http (usually not https) (JupyterHub has internal_ssl)
- Capable of starting/stopping **kernels**
- Authenticated via: OAuth token issued by Hub
**Database (default: SQLite)**:
- Stores information about users, services, and other data needed for operating the Hub.
- Requires no administration.
- Can employ sqlite3 command-line shell for data analysis.
- SQLite uses reader/writer locks to control access to the database
- Contains:
- usernames
- activity events
- server URLs
- _hashed_ tokens
- (optionally) encrypted 'auth state'
- Authentication: varies by implementation
- Only the Hub talks directly to the database
- Compromise of the database can lead to the addition, modification, and deletion of its data.
- Runs in the Hub process.
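The "_hashed_ tokens" point above can be sketched with a toy in-memory store: only a hash of each API token is persisted, so a database dump alone does not yield usable credentials. The table name, column names, and plain-SHA-512 hashing are simplifications for the example, not JupyterHub's exact schema or (salted) hashing scheme:

```python
import hashlib
import secrets
import sqlite3

# Toy model: store only token hashes (illustrative schema, simplified hashing).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE api_tokens (username TEXT, hashed TEXT)")

def issue_token(username: str) -> str:
    token = secrets.token_hex(32)  # shown to the user once, never stored
    hashed = hashlib.sha512(token.encode()).hexdigest()
    db.execute("INSERT INTO api_tokens VALUES (?, ?)", (username, hashed))
    return token

def check_token(token: str):
    hashed = hashlib.sha512(token.encode()).hexdigest()
    row = db.execute(
        "SELECT username FROM api_tokens WHERE hashed = ?", (hashed,)
    ).fetchone()
    return row[0] if row else None

tok = issue_token("alice")
assert check_token(tok) == "alice"
assert check_token("stolen-guess") is None
```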
**JupyterHub**:
# Capabilities:
- JupyterHub provides a configuration option (disable_user_config) which can be set to prevent user-owned configuration files from being loaded
- JupyterHub provides the ability to run single-user servers on their own subdomains.
- The pages of the JupyterHub application are generated from Jinja templates
- The Hub manages by default the proxy as a subprocess (it can be run externally, as well, and typically is in production deployments).
- JupyterHub reports about the user servers via its REST API (/users)
- JupyterHub can provide access to other interfaces, such as RStudio, that provide their own access to a language kernel.
- Prometheus can be configured to repeatedly poll JupyterHub’s /metrics endpoint to parse and save its current state.
- JupyterHub utilises SQLite database by default, but others (PostgreSQL, MySQL) can also be utilized.
- JupyterHub can be configured to record structured events from a running server using Jupyter’s Telemetry System
- Utilizes filtering procedures to refine the API response so that only resource attributes corresponding to the passed scopes are accessible
- The payload of an API call can be filtered both horizontally and vertically
- JupyterHub is capable of sharing links to a particular notebook from one user to others through a special URL, /hub/user-redirect
- JupyterHub can control computing infrastructure but does not provide it
- JupyterHub provides remote access to Jupyter Notebook and JupyterLab for many users
- JupyterHub controls computational resources via a spawner
- For more sophisticated computational resources (like distributed computing), JupyterHub can connect with other infrastructure tools (like Dask or Spark)
- Authenticating services with JupyterHub requires two levels of authentication (HubOAuth and HubAuth).
- By default, the hub listens on localhost only. This address must be accessible from the proxy and user servers. It could be set to a public ip or '' for all interfaces if the proxy or user servers are in containers or on a different host.
- JupyterHub's OAuthenticator currently supports the following popular services: Auth0, Azure AD, Bitbucket, CILogon, GitHub, GitLab, Globus, Google, MediaWiki, Okpy, OpenShift
- The JupyterHub Helm Chart manages resources in the cloud using Kubernetes.
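The localhost-binding point above can be written as a jupyterhub_config.py fragment; the option name is real, but the addresses are illustrative:

```python
# jupyterhub_config.py -- hub API binding (addresses are illustrative).
# Default: the hub listens on localhost only.
c.JupyterHub.hub_ip = '127.0.0.1'

# If the proxy or user servers run in containers or on other hosts,
# bind to an address they can reach, or to all interfaces:
# c.JupyterHub.hub_ip = '10.0.1.4'
# c.JupyterHub.hub_ip = ''   # all interfaces
```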
# Access:
- JupyterHub is supported on Linux/Unix-based systems
- JupyterHub configures the proxy to forward URL prefixes to single-user notebook servers (Jupyter Server)
- Login data is handed over to the authenticator for validation
- Compromise of the Hub is also a compromise of the database, spawner, and authenticator
- JupyterHub’s authentication setup prevents a user writing arbitrary HTML and serving it to another user
- JupyterHub ships with the default PAM-based Authenticator, for logging in with local user accounts via a username and password.
- JupyterHub is also compatible with other authentication schemes (token) and custom authenticators.
- JupyterHub has a REST API that can be used by external services
- To run an external service, an API token must be created and provided to the service.
- JupyterHub provides default roles (user, admin, token, and server); default roles cannot be deleted
- The RBAC framework allows for requesting tokens with specific existing roles
- Unauthorized access to tokens could enable privilege escalation.
- In a role definition, the name field is required, while all other fields are optional.
- Role names must be 3-255 characters, use ASCII lowercase letters, numbers, and the 'unreserved' URL punctuation -_.~, start with a letter, and end with a letter or number
- It is not possible to implicitly add a new user to the database by defining a new role.
- A JupyterHub scope reveals which resources it provides access to.
- Access to the user’s own resources and subresources is covered by metascope self. This metascope includes the user’s model, activity, servers and tokens.
- The JupyterHub API requires authorization to access its APIs
- Users can be added to and removed from the Hub via either the admin panel or the REST API
- Each API endpoint has a list of scopes which can be used to access the API. If no scopes are listed, the API is not authenticated and can be accessed without any permissions
- Enabling SSL for all internal communication enables end-to-end encryption between all JupyterHub components
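The role-definition rules above (only name required, naming constraints, no implicit user creation) can be illustrated with a jupyterhub_config.py fragment; the role name, scopes, and username are invented for the example:

```python
# jupyterhub_config.py -- illustrative role definition.
# Only `name` is required; all other fields are optional.
c.JupyterHub.load_roles = [
    {
        "name": "server-monitor",   # 3-255 chars, lowercase, starts with a letter
        "description": "Read-only view of user servers and activity",
        "scopes": ["read:servers", "read:users:activity"],
        "users": ["ops-bot"],       # must already exist; roles never create users
    }
]
```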
**Proxy**:
# Capabilities:
- Hub and single-user servers are placed in a single domain, behind a proxy
- configurable-http-proxy exposes last-activity data for routes
- Does not persist the routing table by default
- REST API for managing routing table
- Capable of running as a service separate from the Hub
# Access:
- REST API token for routing table
- If the Proxy authentication token is not set, the Hub will generate a random one itself, which means that any time the Hub is restarted the Proxy must also restart (default config)
- The Hub authenticates its requests to the Proxy using a secret token that the Hub and Proxy agree upon.
- Not all proxy implementations use an auth token.
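The routing-table REST API above can be sketched as a request builder. The token-style Authorization header follows configurable-http-proxy's conventions, but the port, token value, and route below are assumptions for illustration:

```python
import json
import urllib.request

# Placeholder values: the proxy REST API commonly listens on port 8001 and
# expects the token from the CONFIGPROXY_AUTH_TOKEN environment variable.
PROXY_API = "http://127.0.0.1:8001"
AUTH_TOKEN = "replace-with-proxy-auth-token"

def add_route(routespec: str, target: str) -> urllib.request.Request:
    """Build the POST request that maps a URL prefix to a backend server."""
    return urllib.request.Request(
        f"{PROXY_API}/api/routes{routespec}",
        data=json.dumps({"target": target}).encode(),
        headers={
            "Authorization": f"token {AUTH_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Usage (requires a running proxy):
# urllib.request.urlopen(add_route("/user/alice", "http://127.0.0.1:50001"))
```

This is the same pattern the Hub itself uses when it updates the proxy's routes on its users' behalf.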
### Credentials
**Database password or file permissions**
- Stored in jupyterhub_config.py or environment
**OAuth tokens**
- issued by hub
- scopes govern permissions
- (hashed) and stored in database
- stored (encrypted, not hashed) in browser cookies
- stored (not encrypted, not hashed) in server environment variables
- Tokens cannot be assigned roles through role definition but may be assigned specific roles when requested via API
- By default expires when the cookie reaches its expiry time of 2 weeks (or after 1 hour in JupyterHub versions < 1.3.0).
- OAuth tokens can generally only be used to identify a user, not take actions on the user's behalf
**Cookie secret**
- Stored in a jupyterhub_cookie_secret file
- Encrypts browser cookies used for authentication
- Recommended permissions for the cookie secret file are 600 (owner-only rw)
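Creating the secret file with the recommended 600 permissions can be sketched as below; writing into a temp directory and the 32-byte length are choices made for this example, not JupyterHub's exact behavior:

```python
import os
import secrets
import tempfile

# Sketch: write a random cookie secret with owner-only (600) permissions,
# similar in spirit to what JupyterHub does when it first creates the file.
secret_file = os.path.join(tempfile.mkdtemp(), "jupyterhub_cookie_secret")

secret = secrets.token_hex(32)  # 32 random bytes, hex-encoded
fd = os.open(secret_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
with os.fdopen(fd, "w") as f:
    f.write(secret)

assert os.stat(secret_file).st_mode & 0o777 == 0o600
assert len(open(secret_file).read()) == 64  # 32 bytes -> 64 hex chars
```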
### ACCESS CONTROL