# Parameterized Kernelspecs

## Overview

This design document outlines discussion topics around parameterizing Jupyter kernelspecs.

Kernelspecs are the means by which applications that use Jupyter kernels can instantiate those kernels. Any system that expects to launch Jupyter kernels will likely need some way to read, process, and execute kernelspecs. The most common clients that launch kernels are the Jupyter notebook server (which also backs the nteract web and JupyterLab UIs), jupyter console (which runs as a CLI), nbconvert/papermill (which use the jupyter_client APIs directly), and spawnteract (a node.js library used by CoCalc, Hydrogen, and nteract desktop).

There are a few goals for this session:

* Impart common ground on the problem around parameterized kernelspecs
* Facilitate an open discussion around the challenges, pitfalls and opportunities with parameterized kernelspecs
* Enable collaborative prototyping and exploration during the afternoon Open Studio session
* Develop one or more potential plans for how to implement this functionality
* If we can consolidate on a single plan by the end of the week, that would be wonderful, but given the complexity of the topic, this seems like it may be difficult to achieve.

## User Desire / UI simplification

![The big notebook drop down list](https://i.imgur.com/GrvGDMX.png)

![Grid of notebooks to select from](https://i.imgur.com/gcXgu2L.png)

## Technical background

Kernelspecs can refer to objects at a couple of different levels of detail.

**Kernelspec as Directory:** A kernelspec would be a directory found inside the conventional Jupyter kernelspec file system locations. These are also known as kernel “resource directories”.

|        | Unix | Windows |
|--------|------|---------|
| System | `/usr/share/jupyter/kernels`<br>`/usr/local/share/jupyter/kernels` | `%PROGRAMDATA%\jupyter\kernels` |
| Env    | `{sys.prefix}/share/jupyter/kernels` | |
| User   | `~/.local/share/jupyter/kernels` (Linux)<br>`~/Library/Jupyter/kernels` (Mac) | `%APPDATA%\jupyter\kernels` |

This directory is expected to at least contain a file named kernel.json (NB: from this we get the second meaning of “kernelspec”). It is also advised that the kernelspec contain an image that can be displayed in graphical UI contexts (like the in-browser Jupyter notebook, nteract and JupyterLab UIs) to act as a signifier for the underlying kernelspec. Additional files are conventionally kept there as well if they are to be used when launching the resulting kernels (one example of this is the launcher.jar that almond places in the kernelspec directories it creates).

**Kernelspec as File:** A kernelspec file is named kernel.json and is located within a kernelspec directory. Overall the file needs to have a "display_name", "language", and "argv" field. Most important from this perspective is the argv field, as it determines what command will be run and will likely be the field that needs to be parameterized.

For example, you might have a custom kernel like the following, where the connection file and a parameters file are filled in at launch time:

kernel.json
```
{
  "display_name": "conda kernel",
  "language": "python",
  "argv": [
    "/usr/local/bin/conda-kernel",
    "-c", "{connection_file}",
    "-p", "{parameters_file}"
  ]
}
```

The parameters file would be a JSON file holding the set parameters:

```
{
  "sparkVersion": "2.1.1",
  "anotherSetting": true
}
```
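To make the mechanics concrete, here is a minimal sketch of how a client might render such an argv template and launch the kernel. The `{connection_file}` substitution mirrors what jupyter_client already does; the `{parameters_file}` placeholder and the `launch_parameterized_kernel` helper are assumptions of this proposal, not existing behavior.

```python
import json
import subprocess


def launch_parameterized_kernel(kernelspec, parameters, connection_file,
                                parameters_file="parameters.json"):
    # Write the user-supplied parameters alongside the connection file.
    with open(parameters_file, "w") as f:
        json.dump(parameters, f)

    # Fill in the template fields declared in kernel.json's argv.
    cmd = [
        arg.format(connection_file=connection_file,
                   parameters_file=parameters_file)
        for arg in kernelspec["argv"]
    ]
    return subprocess.Popen(cmd)


# Hypothetical usage:
#   spec = json.load(open("kernel.json"))
#   launch_parameterized_kernel(spec, {"sparkVersion": "2.1.1"}, "1234-abcd.json")
```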
The hard part, then, is “discovery”: how do we fill in parameters that say they are strings, but that ideally would be filled in from a specific set of strings determined dynamically (on server query)?

One approach: a JSON Schema in an `arguments.json` file:

```
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "description": "Schema for kernel with parameters",
  "type": "object",
  "properties": {
    "sparkVersion": {
      "type": "string",
      # somehow we want this to be a dropdown that gets provided
      # default behavior is to provide a text box, but we’d like to be able to
      # query the available options for spark
      "default": "2.1.1",
      "enum": [
        ## Can we get version numbers like this?
        "$ref": "https://location/of/version/numbers"
      ]
    },
    "anotherSetting": {
      "type": "boolean",  # checkbox
      "default": true
    }
  },
  "required": ["sparkVersion", "anotherSetting"]
}
```

To view other fields that may be passed in, I recommend looking at the kernelspec docs. Of special note: `"argv": […"{connection_file}"…]`. The kernelspec defined by a file will be formatted to replace the connection_file variable with a path to an appropriate connection file. The connection file is used by the Jupyter client to coordinate the ports used by the resulting Jupyter kernel’s zmq channels. Usually, this path is determined on the fly since a connection_file does not already exist.

## Business Context

Because kernelspecs define the commands that are being run, any context in which a business would want to define that command at runtime would benefit from the introduction of parameterized kernelspecs. This can apply to:

* UIs that allow users to define their own kernel
* A reduction in the number of static kernelspecs that need to exist to provide access to multiple versions of Jupyter kernels
* Runtime classpath definition (e.g., for providing additional jars at instantiation time to JVM-based kernels such as almond)

### Motivational areas

#### Conda environments

Instead of installing N kernels for N conda environments, make a single kernel type that is parameterized by the name of the conda environment (and language type?).

Let's say that we don't take this parameterized approach. We then have to create a kernel(spec) for every conda environment: N kernelspecs for N conda environments. If the user deletes a conda environment, the kernelspec sticks around, both on disk and in every UI from which it can be launched. This is a problem.

Instead, if the kernel itself could be launched by specifying the conda environment by name, we could have just one kernel and pass the conda environment as a single field. The `parameters.json` file, written alongside the connection file `1234-abcd.json`, would look like:

```
{
  "conda_environment": "magic_panda"
}
```

while a Jupyter client would execute `/usr/local/bin/conda-kernel -c "1234-abcd.json" -p "parameters.json"` to start the kernel.

## Opportunities/Areas to explore

* Currently, there is no static kernelspec JSON Schema; it is worth investigating whether that would be useful to establish before broadening kernelspec capabilities
* Explore custom KernelManagers as a way to prototype kernelspec parameterization by changing the format_kernel_cmd command (see the sketch after this list)
* Define how to provide and validate schema for allowed parameter keys and values in a safe (well-typed & validated) manner
* Provide a JSON Schema file to validate the parameters before populating the command. This will likely make it easier to make the jump from jupyter_console prototyping to surfacing parameterized kernelspecs on the notebook server.
* How will users’ previously parameterized kernels be stored?
* How will we express default values for parameters?
* How are we to identify kernelspecs when neither a static display name nor a static image suffice?
* Prototype web-based UI designs for how to specify parameters (and to access them from previously filled out kernelspecs)
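As a rough sketch of the custom-KernelManager prototyping path mentioned above: the `{parameters_file}` placeholder and the `ParameterizedKernelManager` class are hypothetical, while `format_kernel_cmd` is the existing jupyter_client hook that already fills in `{connection_file}`.

```python
from traitlets import Unicode
from jupyter_client.manager import KernelManager


class ParameterizedKernelManager(KernelManager):
    """Prototype: fill a {parameters_file} placeholder in argv the same way
    {connection_file} is filled in today."""

    # Path to the JSON file of user-supplied parameters (hypothetical).
    parameters_file = Unicode("").tag(config=True)

    def format_kernel_cmd(self, extra_arguments=None):
        cmd = super().format_kernel_cmd(extra_arguments=extra_arguments)
        # Substitute the placeholder wherever it appears in the argv template.
        return [arg.replace("{parameters_file}", self.parameters_file)
                for arg in cmd]
```

Such a manager could plausibly be wired into jupyter console (and, with more work, the notebook server) through the existing `kernel_manager_class` configuration point.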
## Use Cases

* Providing a nicer UI for selecting kernels (currently, it can be difficult if you have more than one kernel with the same icon or name)
* Providing greater type safety for kernels as they exist today
* Reducing the number of kernels that need to be deployed to provide access to many
* Providing jars to JVM-based kernels at runtime by modifying the classpath

## Technical Considerations

### Solutions for jupyter console vs. the notebook server

It will be easiest to prototype much of this functionality via jupyter console rather than via the notebook server. This is because we have access to the command line when using jupyter console (which can set arbitrary traitlet values), while we don’t have that access when we run kernels through the notebook server. Thus any solution for the server will need to be more general and encompass a way to process some kind of REST call to parameterize the kernel.

### Dropdown menus & selecting kernels

Currently kernelspecs are surfaced to web-based front-ends using a dropdown menu set of selectors. This will not work for parameterized kernelspecs, as much more information will be needed to fill out the parameters.

## Appendix: Extended implementation notes

### Python details

#### Kernelspec as File

A kernelspec file is named kernel.json and is located within a kernelspec directory. There is no static JSON Schema file that a kernel must meet; however, runtime-level validation is provided through traitlets defined on the KernelSpec class. Overall the file needs to have a "display_name", "language", and "argv" field. Most important from this perspective is the argv field, as it determines what command will be run and will likely be the field that needs to be parameterized. For example, you might have a custom kernel like the following that would allow you to define whatever kernel command you wanted by setting the COMMAND environment variable:

```
{
  "display_name": "my custom kernel",
  "language": "bash",
  "argv": [
    "/usr/local/bin/bash",
    "-c",
    "$COMMAND"
  ]
}
```

To view other fields that may be passed in, I recommend looking at the kernelspec docs. Of special note: `"argv": […"{connection_file}"…]`. The kernelspec defined by a file will be formatted by a KernelManager (via a call to format_kernel_cmd) to replace the connection_file variable with a path to an appropriate connection file. The connection file is used by the Jupyter client to coordinate the ports used by the resulting Jupyter kernel’s zmq channels. Usually this path is determined on the fly since a connection_file does not already exist. This is the only example of a runtime parameter being set in the current KernelManager.

#### Kernelspec as Runtime Python Object

In the jupyter_client library, there is a KernelSpec class created by reading the kernel.json file into memory based on the resource_dir name and populating its traitlet values. This is what currently handles validation.
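For reference, this is how that runtime object is obtained with jupyter_client today (assuming a kernelspec named `python3` is installed):

```python
from jupyter_client.kernelspec import KernelSpecManager

ksm = KernelSpecManager()
spec = ksm.get_kernel_spec("python3")  # reads kernel.json; traitlets validate the fields

print(spec.resource_dir)               # the kernelspec directory
print(spec.display_name, spec.language)
print(spec.argv)                       # the (unformatted) command template
```

Any parameterization scheme would presumably extend this object, or sit beside it (e.g., the proposed arguments.json), rather than replace it.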
### Typescript/Javascript details

See `fs-kernels/kernelspecs.ts`.

### Notes from discussion

One possible proposal:

* The kernel should start regardless of the validity of the parameters, and send back a sensible error message over ZeroMQ when the parameters were invalid
  * A return code is not enough
  * Since the kernel can start with no parameters, any channel can be used to send back error messages

Questions:

* What's fillable?
* What can they take? (Validity, Type)
* Whose responsibility is it to handle validity?

Each stage of validation of inputs for a parameterized kernel start:

* **UI**: Initial validity (was it a string, was it one of the enums)
* **Server**: Will perform validation no matter what (it's an API, so we have to make sure we got the right stuff)
* **Kernel**: Final say in how it went. This raises questions of how the kernel should indicate failure.

Example `POST /api/kernels`:

```json
{
  "name": "condaKernel",
  "params": {
    "env_name": "fluffy_snail"
  }
}
```

#### Kernel logs

`/api/kernels/{uuid}/logs`

* General logs dumped (stdout, stderr), read by the server; the KernelManager writes them for remote kernels
* The kernel writes its own structured error log: `kernel_crash.json`
  * Con: no good for remote kernels (but the KernelManager could be responsible for copying it to the server)

#### Kernel lifecycle messages

The kernel should connect to iopub regardless of whether the interpreter is ready. Then it can send context about:

* acquiring resources
* unable to get resources
* going to die
* death

Should we have a way to re-send these on request? Is it the case that you always want the (JSON) API to have/hold the state/transitions that occur over ZeroMQ?

- We are bleeding into the server-side model
- These lifecycle messages seem like ones we should store server side

An alternative to having a `parameters-UUID.json` file would be to put the parameters in the connection file itself.
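As a rough illustration of the "Server" validation stage described above: the sketch below checks the `params` payload from `POST /api/kernels` against the proposed arguments.json schema before any launch is attempted. Both arguments.json and the `validate_params` helper are assumptions of this document, and `jsonschema` is an off-the-shelf validator, not something the notebook server does today.

```python
import json
from pathlib import Path

import jsonschema  # third-party JSON Schema validator


def validate_params(resource_dir, params):
    """Check a POST /api/kernels params payload against the kernel's
    (proposed) arguments.json before attempting to launch the kernel."""
    schema = json.loads((Path(resource_dir) / "arguments.json").read_text())
    try:
        jsonschema.validate(instance=params, schema=schema)
    except jsonschema.ValidationError as err:
        # Report a sensible error to the client rather than starting the kernel.
        raise ValueError(f"Invalid kernel parameters: {err.message}") from err
```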