# Models Spec

:::warning
Draft Specification: functionality has not been implemented yet.

Feedback: [HackMD: Models Spec](https://hackmd.io/ulO3uB1AQCqLa5SAAMFOQw)
:::

- Design Principles
  - Don't go for simplicity yet
    - The underlying abstractions are changing very frequently (e.g. ggufv3)
    - Provide a minimalist framework over the abstractions that takes care of coordination between tools
    - Show direct system state for now

## Stuff that we will KIV to Model Spec V2

- "OpenAI" and Azure
- Importing via URL
- Multiple partitions

## Overview

Jan's Model API aims to be as similar as possible to [OpenAI's Models API](https://platform.openai.com/docs/api-reference/models), with additional methods for managing and running models locally.

### Objectives

- Users can download, import, and delete models
- Users can use remote models (e.g. OpenAI, OpenRouter)
- Users can start/stop models and use them in a thread (or via the Chat Completions API)
- Users can configure default model parameters at the model level (to be overridden later at the `chat/completions` or `assistant`/`thread` level)

## Models Folder

Models in Jan are stored in the `/models` folder and organized into folders, each of which is an atomic representation of a model for easy packaging and version control.

A model's folder name is its `model.id`, and the folder contains:

- `<model-id>.json`, i.e. the [Model Object](#model-object)
- Binaries (may be downloaded later)
  - Decision: binaries are dumb (ask Alan)

Open questions:

- Why multiple folders?
  - Model partitions (e.g. Llava in the future)
- Why a folder and config file for each quantization?
  - Differently quantized models are completely different models
- Should we call them `model.json`s instead?
  - Decision: `model.json`
- 1st December:
  - Catalogue of recommended models; anything else = mutate the filesystem
- Should we have an API to help quantize models? (from Linh's question)
  - Could be a really cool feature to have (i.e. import from HF, quantize the model, run on CPU)
- We should have a helper function to handle hardware compatibility
  - `model/{model-id}/compatibility`
  - Louis: we are combining states & manifest; needs thinking through

#### Folder Structure

```sh
/jan                              # Jan root folder
  /models
    # GGUF model
    /llama2-70b-q4_k_m
      model-binary-1.gguf
      model-binary-2.gguf
      model.json

    # Recommended model (yet to be downloaded)
    /mistral-7b-gguf-q3_k_l
      model.json                  # Contains download instructions
      mistral-7b-q3-K-L.gguf
    /mistral-7b-gguf-q8_k_m
      model.json                  # Contains download instructions
      mistral-7b-q8_k_m.gguf

    # Remote model (note: no binaries)
    /azure-openai-gpt3-5
      azure-openai-gpt3-5.json

    # Multiple binaries: COMING SOON
    # Multiple quantizations: COMING SOON

    # Imported model (autogenerated .json)
    # random-model-q4_k_m.bin will be moved into an autogenerated folder:
    /random-model-q4_k_m
      random-model-q4_k_m.bin
      random-model-q4_k_m.json    # autogenerated
```

### Importing Models

:::warning
- This has not been confirmed
- Dan's view: Jan should auto-detect models and create folders automatically
- Jan's UI will allow users to rename folders and add metadata
:::

You can import a model by just dragging it into the `/models` folder, similar to Oobabooga.

- Jan will detect the file and generate a corresponding `model-filename.json` based on the filename
- Jan will move it into its own `/model-id` folder once you define a `model-id` via the UI
- Jan will populate the model's `model-id.json` as you add metadata through the UI

## Model Object

:::warning
- This is currently not finalized
- Dan's view: the current JSON is extremely clunky
  - We should move `init` to the top level (e.g. "settings"?)
  - We should move `runtime` to the top level (e.g. "parameters"?)
  - `metadata` is extremely overloaded and should be refactored
- Dan's view: we should make the Model Object very extensible
  - A `GGUF` model would "extend" a common Model Object with extra fields (at the top level)
- Dan's view: `state` is extremely badly named
  - Recommended: `downloaded`, `started`, `stopped`, `null` (for yet-to-download)
  - We should also note that this applies only to local models (not remote)
:::

Jan represents models as `json`-based Model Object files, known colloquially as `model.json`s. Jan aims for rough equivalence with [OpenAI's Model Object](https://platform.openai.com/docs/api-reference/models/object), with additional properties to support local models.

Jan's models follow a `model_id.json` naming convention and are built to be extremely lightweight: the only mandatory field is a `source_url` from which to download the model binaries.

<ApiSchema example pointer="#/components/schemas/Model" />

### Types of Models

:::warning
- This is currently not in the Model Object and requires further discussion
- Dan's view: we should have a field to differentiate between `local` and `remote` models
:::

There are 3 types of models:

- Local model
- Local model, yet to be downloaded (we have the URL)
- Remote model (i.e. OpenAI API)

#### Local Models

:::warning
- This is currently not finalized
- Dan's view: we should have `download_url` and `local_url` for local models (and possibly more)
:::

A `model.json` for a local model should always reference the following fields:

- `download_url`: the original download source of the model
- `local_url`: the current location of the model binaries (may be an array of multiple binaries)

```json
// ./models/llama2/llama2-7bn-gguf.json
"local_url": "~/Downloads/llama-2-7bn-q5-k-l.gguf",
```

#### Remote Models

:::warning
- This is currently not finalized
- Dan's view: each cloud model should be provided via a system module, or define its own params field on the `model` or `model.init` object
:::

A `model.json` for a remote model should always reference the following fields:

- `api_url`: the API endpoint of the model
- Any authentication parameters

```json
// Dan's view: this needs to be refactored pretty significantly
"source_url": "https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo",
"parameters": {
  "init": {
    "API-KEY": "",
    "DEPLOYMENT-NAME": "",
    "api-version": "2023-05-15"
  },
  "runtime": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "presence_penalty": "0",
    "top_p": "1",
    "stream": "true"
  }
},
"metadata": {
  "engine": "api" // Dan's view: this should be a `type` field
}
```

### Importers

:::caution
This is only an idea; it has not been confirmed as part of the spec.
:::

Jan builds "importers" for users to seamlessly import models from a single URL. We currently only provide this for [TheBloke models on Huggingface](https://huggingface.co/TheBloke) (i.e. one of the patron saints of llama.cpp), but we plan to add more in the future.
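Such an importer might begin by parsing the repo URL into a `model.json` stub. A minimal sketch, assuming Python; the `parse_thebloke_url` helper and its field choices are illustrative, not part of the spec:

```python
from urllib.parse import urlparse

def parse_thebloke_url(url: str) -> dict:
    """Sketch: turn a TheBloke Huggingface repo URL into a model.json stub.

    Hypothetical helper -- a real importer would also fetch the model card
    and enumerate the repo's .gguf files.
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "TheBloke":
        raise ValueError("not a TheBloke Huggingface URL")
    repo = parts[1]                        # e.g. "zephyr-7B-beta-GGUF"
    return {
        "id": repo.lower(),                # folder name / model id
        "name": repo.replace("-", " "),
        "source_url": f"https://huggingface.co/TheBloke/{repo}",
    }
```

The stub would then be written into its own `/model-id` folder, following the same flow as a drag-and-drop import.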
Currently, pasting a TheBloke Huggingface link in the Explore Models page will fire an importer, resulting in:

- A nicely formatted model card
- A fully annotated `model.json` file

### Multiple Binaries

:::warning
- This is currently not finalized
- Dan's view: having these fields under `model.metadata` is not maintainable
  - We should explore some sort of `local_url` structure
:::

- The model has multiple binaries: `model-llava-1.5-ggml.json`
- See [source](https://huggingface.co/mys/ggml_llava-v1.5-13b)

```json
"source_url": "https://huggingface.co/mys/ggml_llava-v1.5-13b",
"parameters": { "init": {}, "runtime": {} },
"metadata": {
  "mmproj_binary": "https://huggingface.co/mys/ggml_llava-v1.5-13b/blob/main/mmproj-model-f16.gguf",
  "ggml_binary": "https://huggingface.co/mys/ggml_llava-v1.5-13b/blob/main/ggml-model-q5_k.gguf",
  "engine": "llamacpp",
  "quantization": "Q5_K"
}
```

## Models API

:::warning
- We should use the OpenAPI spec to discuss APIs
- Dan's view: this needs @louis and the App Pod to review, as they are more familiar with it
- Dan's view: Start/Stop Model should have some UI indicator (show state, block input)
:::

See http://localhost:3001/api-reference#tag/Models.

| Method         | API Call                        | OpenAI-equivalent |
| -------------- | ------------------------------- | ----------------- |
| List Models    | GET /v1/models                  | true              |
| Get Model      | GET /v1/models/{model_id}       | true              |
| Delete Model   | DELETE /v1/models/{model_id}    | true              |
| Start Model    | PUT /v1/models/{model_id}/start |                   |
| Stop Model     | PUT /v1/models/{model_id}/stop  |                   |
| Download Model | POST /v1/models/                |                   |

## Examples

### Local Model

- The model has 1 binary: `model-zephyr-7B.json`
- See [source](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/)

#### Model.json

- type: "model"
- version: "1"
- id: "zephyr-7b" // used in chat-completions; matches the folder name
- name: "Zephyr 7B"
- owned_by: "" // OpenAI compatibility
- created: 1231231 // unix timestamp
- description: "..."
- state: enum(null, downloading, available) // @James
- remote: bool // Default: local (i.e. remote: false). If remote, the JSON will have a `remote` key
  - // KIV: remote: // Subsequent
  - // KIV: type: "llm" // For a future where there are different types
- format: "ggufv3" // States the format, rather than the engine
- source_url: "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_M.gguf"
- settings:
  - ctx_len: "2048"
  - ngl: "100"
  - embedding: "true"
  - n_parallel: "4"
  - pre_prompt: "A chat between a curious user and an artificial intelligence"
  - user_prompt: "USER: "
  - ai_prompt: "ASSISTANT: "
- parameters:
  - temperature: "0.7"
  - token_limit: "2048"
  - top_k: "0"
  - top_p: "1"
  - stream: "true"
  - // (whatever the "system prompt" is called)
- metadata: // DELETE the engine ("llamacpp"), quantization ("Q3_K_L"), and size ("7B") fields
  - // Need generic metadata for the inference engine
- assets: // formerly "binaries"
  - "file://.../zephyr-7b-q4_k_m.bin"
  - // Allow adding metadata to files in the folder, e.g. { url: "zephyr-7b-q4_k_m.bin", field: "..." }
- Missing fields?
- Question: how does `config.json` fit into this?
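The split above between model-level `settings`/`parameters` and per-request values implies a merge step at `chat/completions` time, matching the objective that model-level parameters are defaults to be overridden later. A minimal sketch of that override behavior (the `resolve_parameters` helper is hypothetical):

```python
def resolve_parameters(model_defaults: dict, request_params: dict) -> dict:
    """Request-level parameters take precedence over model-level defaults."""
    return {**model_defaults, **request_params}

# Model-level defaults, taken from the zephyr-7b example above
defaults = {"temperature": "0.7", "token_limit": "2048", "top_p": "1", "stream": "true"}

# A hypothetical chat/completions call overriding only the temperature
resolved = resolve_parameters(defaults, {"temperature": "0.2"})
```

The same merge could apply at the `assistant`/`thread` level, layered between the model defaults and the request.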
### Remote Model

- Using a remote API to access the model: `model-azure-openai-gpt4-turbo.json`
- See [source](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython&pivots=rest-api)

```json
"source_url": "https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo",
"parameters": {
  "init": {
    "API-KEY": "",
    "DEPLOYMENT-NAME": "",
    "api-version": "2023-05-15"
  },
  "runtime": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "presence_penalty": "0",
    "top_p": "1",
    "stream": "true"
  }
},
"metadata": {
  "engine": "api"
}
```

### Deferred Download

- Jan ships with default model folders containing recommended models
- Only the Model Object `json` files are included
- Users must later explicitly download the model binaries

```sh
models/
  mistral-7b/
    mistral-7b.json
  hermes-7b/
    hermes-7b.json
```

### Multiple Quantizations

- Each quantization has its own `Jan Model Object` file
- TODO: `model.json`?

```sh
llama2-7b-gguf/
  llama2-7b-gguf-Q2.json
  llama2-7b-gguf-Q3_K_L.json
  .bin
```

### Multiple Model Partitions

- A model that is partitioned into several binaries uses just 1 file

```sh
llava-ggml/
  llava-ggml-Q5.json
  .proj
  ggml
```

### Locally Fine-tuned Model

```sh
llama-70b-finetune/
  llama-70b-finetune-q5.json
  .bin
```

## Other Fields?

- [x] LMStudio
- [ ] Oobabooga
- [ ] Ollama

- LMStudio: Inference Parameters ![image](https://hackmd.io/_uploads/Skf5Xuu4a.png)
- LMStudio: Prompt Format ![image](https://hackmd.io/_uploads/SkjqXd_N6.png)
- LMStudio: Pre-prompt/System Prompt ![image](https://hackmd.io/_uploads/rybi7OdEa.png)
- LMStudio: Model Initialization ![image](https://hackmd.io/_uploads/rJujQuuNp.png)
- Oobabooga: ![image](https://hackmd.io/_uploads/r1w7Qt_VT.png)
- Hardware Parameters? ![image](https://hackmd.io/_uploads/HyENmtuET.png)
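Tying the deferred-download layout back to the draft `state` field from the zephyr-7b example: a model folder holding only a `model.json` is yet to be downloaded. A hypothetical sketch of deriving that state from the filesystem (the helper name and binary-extension list are assumptions, not part of the spec):

```python
from pathlib import Path

# Binary extensions assumed for illustration; the spec does not fix this list.
BINARY_SUFFIXES = {".gguf", ".bin", ".ggml"}

def model_state(model_folder: Path):
    """Derive a local model's draft state from its folder contents.

    Follows the draft enum(null, downloading, available): a folder holding
    binaries is "available"; one with only a model.json returns None
    (yet to be downloaded). "downloading" would come from the download
    manager, not the filesystem.
    """
    has_binary = any(p.suffix in BINARY_SUFFIXES for p in model_folder.iterdir())
    return "available" if has_binary else None
```

This matches the "show direct system state" design principle: the state is read off the `/models` folder rather than tracked in a separate database.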