### Filesystem
- `name.json` = "Model Name"
```sh
/janroot
  /models
    llama2-70b.json
    llama2-7b-gguf.json
    huggingface.co/ # Model registries (de facto open source)
      meta-llama/
        llama2-70b-chat-hf/
          # 1. Model registry download: downloaded binary files go here (if downloaded) and model.json is updated
          # If the user drags and drops, update the `model_bin` field inside model.json instead (we will handle this later)
        llama2-7b-chat/
      thebloke/
        llama2-70b-chat-hf-gguf/
        llama2-7b-chat/
          llama7b_q2_K_L.gguf
          llama7b_q3_K_L.gguf
    model.louis.ai/ # Private model registries
      meta-llama/
        llama2-70b-chat-hf-tensorrt-llm/
        llama2-70b-chat-hf-awq/
          model.json
      thebloke/
        llava-1-5-gguf/ # Use case with multiple model files
          mmproj.bin
          model-q5.ggml
    llama-70b-finetune.bin
    llama-70b-finetune.json
```
- Problems
- Hard for
### `model.json` format
llama-70b.json
```sh
# Required
"url":
    huggingface_link for binary
    api.openai.com / Azure OpenAI / Claude (can infer)
# Optional
"import_format":
    default  # downloads the whole thing
    thebloke # custom importer (detects from URL)
    janhq    # custom importers
"default_download": llama-2-13b-chat.ggmlv3.q2_K.bin # optional
# Optional: OpenAI format
"id": "/huggingface.co/the-bloke/llama-70b-gguf",
"object": "model",
"created": 1686935002,
"owned_by": "the-bloke"
# Optional: params
# Question: How does config.json fit into this?
"parameters": { # Currently specific to Nitro with llama.cpp for LLMs, but a flat map keeps it generic
  "temperature": "..",
  "token_limit": "..",
  "top_k": "..",
  "top_p": "..",
  "pre_prompt": "A chat between a curious user and an artificial intelligence",
  "user_prompt": "USER: ",
  "ai_prompt": "ASSISTANT: "
},
// Jan specific configs
"metadata": { // @Q: should we put all under "jan"?
  "engine": "" // Defaults to: llamacpp. It is hard to decide now so let's put it here.
}
```
### Jan's Threads -> Models
```sh
# thread.json
{
  "model": {
    "uses": "llama2",
    "parameters": {
    }
  }
}
```
1. File structure
```sh
/janroot
  /models
    huggingface.co/ # Model registries (de facto open source)
      meta-llama/
        llama2-70b-chat-hf/
          model.json # Single source of truth
          # 1. Model registry download: downloaded binary files go here (if downloaded) and model.json is updated
          # If the user drags and drops, update the `model_bin` field inside model.json instead (we will handle this later)
        llama2-7b-chat/
          model.json
      thebloke/
        llama2-70b-chat-hf-gguf/
          model.json
        llama2-7b-chat/
          model.json
          llama7b_q2_K_L.gguf
          llama7b_q3_K_L.gguf
    jan.ai/ # Private model registries
      meta-llama/
        llama2-70b-chat-hf-tensorrt-llm/
          model.json
        llama2-70b-chat-hf-awq/
          model.json
      thebloke/
        llava-1-5-gguf/ # Use case with multiple model files
          model.json
          mmproj.bin
          model-q5.ggml
    mine/
      ${whoami}/
        fine_tuned_model.bin
        model.json
```
2. APIs
- Equivalent to: https://platform.openai.com/docs/api-reference/models
```sh
# List models
GET https://localhost:1337/v1/models?filter=[enum](running, downloaded, downloading)
List[model_object]
# Get model object
GET https://localhost:1337/v1/models/{model} # json file name as {model}
model_object <Map>
# Delete model
DELETE https://localhost:1337/v1/models/{model}
# Stop model
PUT https://localhost:1337/v1/models/{model_id}/stop
# Start model
PUT https://localhost:1337/v1/models/{model_id}/start
{
  "id": [string]
  "model_parameters": [jsonPayload], # the inference engine dictates what it expects
  "engine": [string] # e.g. llamacpp expects 1-3 file paths, tensorrt-llm expects different inputs, etc.
}
# Unload model reuses the same structure as delete model: Nitro won't touch the FS, so a delete request just unloads the model from Nitro's index
# Download model from HF link
POST https://localhost:1337/v1/models/remote/add
{
"model_path": (required) [string](huggingface_handles)
"engine": (required) [enum](llamacpp-ver, torchscript, etc)
}
# Creating a model from local model upload (user manually adds a .bin)
POST https://localhost:1337/v1/models/local/add
{
"model_path": (required) [string]
"engine": (required) [enum](llamacpp-ver, torchscript, etc)
}
# maybe just /models/import
```
---
title: "Models"
---
# Models
- [ ] HackMD should read from Github (commit to GH)
- [ ] Start with "UX" (helps people understand)
## Changelog
| Date | Changes |
| ----------- | ------------------------------------------------------ |
| 13 Nov 2023 | Initial discussion by @vuonghoainam, @0xsage, @dan-jan. Hiro previous docs: https://hackmd.io/Nu463_xqSaq9wXMNq0rglA |
Models are AI models like Llama and Mistral.
## Overview
Jan's Models are equivalent to OpenAI's models.
> OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models
## User Experience
- [ ] Put in Wireframes
## Model Object
Note:
- `*` indicates OpenAI compatibility
- All fields are **optional** and have a `default` fallback.
    - Rationale: don't expect users to write a `model.json` every time. Enable a simple drag-and-drop experience for model binaries.
| Property | Type | Description | Optional |
| ------------ | ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------ |
| id\* | string | A `uuid` of the model. Same as `model_path`. | See [id](#id). |
| object\* | enum: `model` | The object type, which is always `model`. | Defaults to `model` |
| created\* | string | The Unix timestamp (in seconds) when the model was created. | Defaults to `time.now` upon creation |
| owned_by\* | string | The source of the model | See [id](#id). |
| model_name | string | A display name for the model. | See [id](#id). |
| model_path | string | Path to a folder containing model binary(s). | Defaults to current folder. See [id](#id). |
| parameters | map | Set of `n` key-value pairs. This is useful for storing model `run parameters` in a structured format. Keys are max 64 chars and values are max 512 chars. e.g. `temperature`, `seed`, `top_p` | See [parameters](#parameters). |
| instructions | map | Set of `n` key-value pairs. Contains model run instructions, e.g. prompt templates, system prompts, user prompts and more | See [instructions](#instructions). |
| engine | enum: `llamacpp`, `tensorrt` | Model backend. Used by Jan to determine how to run model. | Defaults to: `llamacpp` |
| metadata | map | Set of `n` key-value pairs. This is useful for storing misc information about models in a structured format. Keys are max 64 chars and values are max 512 chars. e.g. `author`, `labels`, `license` | See [metadata](#metadata). |
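To make the table concrete, here is a minimal TypeScript sketch of the model object, plus a hypothetical helper that applies the documented defaults. The type and function names are illustrative, not part of the spec:

```typescript
// Illustrative sketch of the model object above; all fields are optional.
type ModelEngine = "llamacpp" | "tensorrt";

interface ModelObject {
  id?: string;           // uuid, same as model_path
  object?: "model";      // always "model"
  created?: number;      // Unix timestamp in seconds
  owned_by?: string;     // source of the model
  model_name?: string;   // display name
  model_path?: string;   // folder containing model binary(s)
  parameters?: Record<string, unknown>;   // run parameters: temperature, seed, top_p, ...
  instructions?: Record<string, unknown>; // prompt templates, system prompts, ...
  engine?: ModelEngine;  // defaults to "llamacpp"
  metadata?: Record<string, unknown>;     // misc: author, labels, license
}

// Hypothetical helper applying the documented fallbacks to a sparse model.json.
function withDefaults(m: ModelObject, folder: string): ModelObject {
  return {
    object: "model",
    created: Math.floor(Date.now() / 1000), // "time.now upon creation"
    engine: "llamacpp",
    model_path: folder, // "defaults to current folder"
    id: m.model_path ?? folder, // id is the same as model_path
    ...m, // anything set explicitly in model.json wins over the defaults
  };
}
```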
### `id`
We need to think through whether `id` === `model.json` === `model_path` === the folder name containing binaries.
Approach A (current):
- `id` == `PLATFORM/[OWNED_BY]/MODEL_NAME`
- `id` == `model_path`
- Quantizations share the same `id`
- But then how do we attach quant-level metadata: `source`, `size`?
- What if users want to delete a specific quantization?
Approach B:
- `id` == `PLATFORM/[OWNED_BY]/MODEL_NAME/BINARY_FILE_NAME`
- `id` == `model_path` + file name
- Quantizations have unique ids
- `model.json` !== model object
- `model.json` file DOESN'T contain `id`, only `model_path`.
- We'll have to define a separate template for `model.json` that drops some quantization-specific properties in the `model` object
=> [Hiro] We can discuss Approach B (borrowing from e-commerce/booking): the model as the product, the model variant/quantization as the SKU.
Users refer to a product, and once they are more interested, they get control at SKU granularity.
Ref: https://laracasts.com/discuss/channels/laravel/database-design-for-e-commerce-product-variants-with-laravel?page=1&replyId=851110
```sh
# Hugging Face:
/huggingface/the-bloke/llama-7bn-gguf
# More remote source examples:
/civit/meow/1-cat
/my-own-cdn-mirror/the-bloke/llama-7bn-gguf
# User uploaded it locally:
/$(whoami)/my-custom-model
```
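For illustration, a hypothetical parser for Approach A ids (`PLATFORM/[OWNED_BY]/MODEL_NAME`, with `OWNED_BY` optional); the function name and return shape are assumptions, not part of the spec:

```typescript
// Hypothetical: split an Approach A id into its path components.
function parseModelId(id: string) {
  const parts = id.replace(/^\//, "").split("/");
  if (parts.length === 2) {
    // OWNED_BY is optional, e.g. "/$(whoami)/my-custom-model"
    const [platform, modelName] = parts;
    return { platform, modelName };
  }
  const [platform, ownedBy, modelName] = parts;
  return { platform, ownedBy, modelName };
}

// parseModelId("/huggingface/the-bloke/llama-7bn-gguf")
// => { platform: "huggingface", ownedBy: "the-bloke", modelName: "llama-7bn-gguf" }
```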
### `parameters`
```json
// GGUF sample
{
"llama_model_path": "/path/to/your_model.gguf",
"ctx_len": 2048,
"ngl": 100,
"embedding": true,
"n_parallel": 4,
"pre_prompt": "A chat between a curious user and an artificial intelligence",
"user_prompt": "USER: ",
"ai_prompt": "ASSISTANT: "
}
```
### `instructions`
```json
// GGUF sample
{
"messages": [
{"content": "Hello there 👋", "role": "assistant"},
{"content": "Can you write a long story", "role": "user"}
],
"stream": true,
"model": "<model_name>",
"max_tokens": 2000
}
```
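The sample above looks essentially like a chat-completions payload. As a sketch of how the prompt-template parameters (`pre_prompt`, `user_prompt`, `ai_prompt`) could render such messages into the final prompt string (the helper below is hypothetical):

```typescript
// Hypothetical: render chat messages into a prompt string using the
// pre_prompt / user_prompt / ai_prompt parameters from the GGUF sample.
interface ChatMessage { role: "user" | "assistant"; content: string; }

function renderPrompt(
  messages: ChatMessage[],
  p: { pre_prompt: string; user_prompt: string; ai_prompt: string },
): string {
  const turns = messages
    .map((m) => (m.role === "user" ? p.user_prompt : p.ai_prompt) + m.content)
    .join("\n");
  // The trailing ai_prompt cues the model to generate the assistant's reply.
  return `${p.pre_prompt}\n${turns}\n${p.ai_prompt}`;
}
```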
### `metadata`
```json
// GGUF sample
// @hiro TODO
// Nothing on my mind atm
```
## `model.json` Template
- What fields are omitted from Model Object?
- i.e. I don't think `object`, `created`, etc. need to be explicitly included in `model.json` files
## Model API
See [/model](/api/model)
- Equivalent to: https://platform.openai.com/docs/api-reference/models
*Manage Models*
```sh
# List models
GET https://localhost:1337/v1/models
{
  "object": "list",
  "data": [
    {
      "id": [string](model_path),
      "object": "model",
      "created": [string](unix_timestamp),
      "owned_by": [enum] [<hf_handles>, "anonymous"] # "anonymous" when the user uses a local/fine-tuned model
    }
  ]
}
# Get model object
GET https://localhost:1337/v1/models/{model}
{
  "id": [string](model_path),
  "object": "model",
  "created": [string](unix_timestamp),
  "owned_by": [enum] [<hf_handles>, "anonymous"] # "anonymous" when the user uses a local/fine-tuned model
}
# Delete model
DELETE https://localhost:1337/v1/models/{model}
{
  "id": [string](model_path),
  "object": "model",
  "deleted": true
}
# TODO @alan
# Start model
PUT https://localhost:1337/v1/models/{model_id}/start
{
  "model_parameters": [jsonPayload], # generic, since payloads differ across inference engines
}
# Stop model
PUT https://localhost:1337/v1/models/{model_id}/stop
# Load model (Nitro only)
POST https://localhost:1337/v1/models
{
  "id": [string]
  "model_config": [jsonPayload], # the inference engine dictates what it expects
  "engine": [string] # e.g. llamacpp expects 1-3 file paths, tensorrt-llm expects different inputs, etc.
}
# Unload model (Nitro only)
# Unload model reuses the same structure as delete model: Nitro won't touch the FS, so a delete request just unloads the model from Nitro's index
```
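A hedged usage sketch against the endpoints above, using Node 18+ global `fetch`. The model id and parameter values are made up, and the payload shapes follow this draft, so they may change:

```typescript
const BASE = "https://localhost:1337/v1";

// List models, start one with engine-specific parameters, then stop it.
async function manageModelsDemo(): Promise<void> {
  const list = await (await fetch(`${BASE}/models`)).json();
  console.log(list.data.map((m: { id: string }) => m.id));

  const id = "huggingface/thebloke/llama7b-gguf"; // hypothetical id
  await fetch(`${BASE}/models/${encodeURIComponent(id)}/start`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    // model_parameters is engine-specific; llamacpp-style values shown here
    body: JSON.stringify({ model_parameters: { ctx_len: 2048, ngl: 100 } }),
  });

  await fetch(`${BASE}/models/${encodeURIComponent(id)}/stop`, { method: "PUT" });
}
```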
*Download/import Models*
```sh
# Download model from HF link
POST https://localhost:1337/v1/models/remote/add
{
"model_path": (required) [string](huggingface_handles)
"engine": (required) [enum](llamacpp-ver, torchscript, etc)
}
# Creating a model from local model upload (user manually adds a .bin)
POST https://localhost:1337/v1/models/local/add
{
"model_path": (required) [string]
"engine": (required) [enum](llamacpp-ver, torchscript, etc)
}
```
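And the import side, under the same assumptions (the Hugging Face handle and local file path below are made up):

```typescript
// Hedged sketch: register a remote Hugging Face model, then import a local binary.
async function importModelsDemo(): Promise<void> {
  const post = (path: string, body: object) =>
    fetch(`https://localhost:1337/v1${path}`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });

  // Remote: model_path is a Hugging Face handle per this draft.
  await post("/models/remote/add", {
    model_path: "thebloke/llama2-7b-chat", // hypothetical handle
    engine: "llamacpp",
  });

  // Local: point at a binary the user manually dropped into /janroot/models.
  await post("/models/local/add", {
    model_path: "/janroot/models/mine/my-custom-model.bin",
    engine: "llamacpp",
  });
}
```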
## Model Filesystem @Hiro
How `models` map onto your local filesystem
### Default `/models`
- Jan ships with a list of `recommended` models' `model.json` files, packaged under `/models`.
- Model binaries are only fetched (via `wget`) after users explicitly choose to download them.
> Changelog: this means we don't maintain a remote GitHub Model-Catalog anymore!
```sh
# File structure
/janroot
  /models
    /huggingface                # PLATFORM
      meta-llama/               # OWNED_BY
        llama2-70b-chat-hf/     # MODEL_NAME
          model.json            # see below
          # Empty until user downloads binaries
```
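A minimal sketch of that download-on-demand step, assuming the binary lands next to `model.json` and is recorded in a `model_bin` field (the field name follows the drag-and-drop note earlier; the helper itself is hypothetical and buffered for brevity, where a real implementation would stream):

```typescript
import { readFile, writeFile } from "node:fs/promises";
import path from "node:path";

// Hypothetical: fetch a model binary into the model folder, then record it in model.json.
async function downloadBinary(modelDir: string, url: string): Promise<void> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`download failed: ${res.status}`);
  const file = path.join(modelDir, path.basename(new URL(url).pathname));
  await writeFile(file, Buffer.from(await res.arrayBuffer()));

  const jsonPath = path.join(modelDir, "model.json");
  const model = JSON.parse(await readFile(jsonPath, "utf8"));
  model.model_bin = path.basename(file); // assumed field, per the drag-and-drop note
  await writeFile(jsonPath, JSON.stringify(model, null, 2));
}
```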
```json
// Template model.json
// OpenAI compatible metadata
"id": "/huggingface/the-bloke/llama-7bn-gguf",
"object": "model",
"created": 1686935002,
"owned_by": "the-bloke"
// Run configs
"name": "" // Defaults to: llama-7bn-gguf
"path": "", // Defaults to: `id`
"variants": Object {
"default": Int(index),
"quantizations": [
{
"metadata": {
"source": "",
"remote_model":"",
"quantization": "",
"variant": "",
"isLora": ""
etc
},
"name": "",
"model_file": [Object] {
# Llava
}
}
]
}
"parameters": { // @hiro: K-V similar to #parameters
"temperature": "..",
"token_limit": "..",
"top_k": "..",
"top_p": ".."
},
"instructions": {}, // K-V similar to #instructions
// Jan specific configs
"metadata": { // @Q: should we put all under "jan"
"engine": "", // Defaults to: llamacpp. It's hard to decide now so let's put it here.
}
```
### Case: Jan "Global" Assistant
- Jan ships with a default assistant called "Jan"
- Assistant Jan lets you chat with all downloaded models in `janroot/models`
- See [Jan Assistant.json](#Jan-“Global”-Assistant)
```sh
/janroot
  /assistants
    /jan
      assistant.json # See below
      # No models.json or /models subfolder
```
### Case: Model has many quantizations
> e.g. https://huggingface.co/TheBloke/Llama-2-7B-GGUF/tree/main
Think through the following:
- Quantizations share the same `id`
  - Q: should `id` include the binary name as well?
  - Q: what should `model_name` be?
- Quantizations share the same folder
- Quantizations share the same `model.json`
  - What happens to `model_name`, `source`?
    -> `model_name`: extract the file name (`name.split('/')[-1]`)
    -> `source`:
      - If Hugging Face: use the Hugging Face handle
      - If local: use `anonymous`, allow the user to edit
    (a sketch of these rules follows the file tree below)
Ref: https://huggingface.co/TheBloke/Llama-2-7B-GGUF/tree/main
```sh
/janroot
  /models
    huggingface/
      thebloke/ # Pls use lowercase
        llama7b-gguf/
          model.json
          llama7b_q2_K_L.gguf
          llama7b_q3_K_L.gguf
```
-> See `model.json` at #Default-models
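A hypothetical sketch of the two derivation rules above, i.e. `model_name` from the binary file name, and `source` from the registry handle or `anonymous`:

```typescript
// Hypothetical helpers for the rules above.
function modelNameFromFile(filePath: string): string {
  // e.g. ".../llama7b_q2_K_L.gguf" -> "llama7b_q2_K_L.gguf"
  return filePath.split("/").at(-1) ?? filePath;
}

function sourceFor(platform: string, hfHandle?: string): string {
  // Hugging Face models keep their handle; local files default to
  // "anonymous" (user-editable).
  return platform === "huggingface" && hfHandle ? hfHandle : "anonymous";
}
```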
### Case: Model has many sizes, e.g. 7Bn vs 13Bn
- Models with different size params have different model.jsons because the run parameters might be entirely different.
```sh
/janroot
  /models
    huggingface.co/
      meta-llama/
        llama2-70b-chat-hf/
          model.json
        llama2-7b-chat/
          model.json
      thebloke/
        llama2-70b-chat-hf-gguf/
          model.json
        llama2-7b-chat/
          model.json
```
### Case: Model is composed of multiple required binaries
> e.g. https://huggingface.co/mys/ggml_llava-v1.5-13b/tree/main
```sh
TODO @Hiro
```
### Case: Model contains other models under the hood
> e.g. https://github.com/ggerganov/whisper.cpp/tree/master/examples/talk-llama
=> This case is very marginal: one assistant can use multiple models at once, so we do not need to deal with this!
```sh
TODO @Hiro
```
### Case: Threads can override default `model.json`
- Within the scope of each conversation thread, users can actually override default model settings in the right sidebar
- @Hiro: Ashley/we need to know what can be configured at the thread level, so we can incorporate it into the right sidebar designs
-> @Nicole: We have to decide high level design:
- Do we need to track changes for the assistant/model? If we consider assistants and models to have different life cycles, then my proposal is:
    - Users cannot chat directly with models on Jan, only with Assistants. If they choose a model, the assistant gets a `model.json` that by default mirrors the model's `model.json`, and changes are recorded at the `assistant` level. We can keep track by recording every user change into a `model-<unix>.json` and using the latest json by default; the user can also choose an older model.json, but then would that file have to be renamed??? (See the merge sketch below.)
```sh
TODO @Hiro
```
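One possible shape for the override merge, assuming thread-level settings simply shadow the model's defaults; this is a hypothetical helper, pending the design decision above:

```typescript
// Hypothetical: thread-level model settings shadow the defaults from model.json.
interface ThreadModelOverride {
  uses: string;                         // model id, e.g. "llama2"
  parameters?: Record<string, unknown>; // partial overrides only
}

function effectiveParameters(
  modelDefaults: Record<string, unknown>,
  thread: ThreadModelOverride,
): Record<string, unknown> {
  // Later spread wins: anything set at the thread level overrides the default.
  return { ...modelDefaults, ...(thread.parameters ?? {}) };
}
```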
## Model UI/UX
Lo-fi wireframes from Dan/Nicole
- https://link.excalidraw.com/l/kFY0dI05mm/6taqaC1SNDM
- https://app.excalidraw.com/s/kFY0dI05mm/5t0B03L5zpV
### Explore Models Pages (TODO: @Ashley)
- User can `Explore Models` and download models
*Download Recommended Model*
- User should be able to see model cards of recommended models
- Model card should only include useful information (not duplicated, useless ones)
- User should be able to be recommended the most compatible models
- User should be able to click “download” to start downloading
*Download Model by HF URL*
- User should be able to paste a Hugging Face link into the search bar
- User should be able to see the model card, including autogenerated compatibility (e.g. Not enough RAM)
*Import new Model locally*
- User should be able to manually upload a bin/gguf custom model
- Users should be able to configure properties in `model.json` via the UI, for the new model
*Import existing `model.json` locally*
- User should be able to manually upload an existing /models/modelfolder with a preconfigured `model.json` & `model.bin`
- Is this automatic, or do users do it through a UI?
*Find Model*
- User should be able to filter recommended models by size, params, RAM required (See Traveloka flight search for an example)
### Manage Models Page (TODO: @Ashley)
@Ashley TODO See: https://link.excalidraw.com/l/kFY0dI05mm/6taqaC1SNDM
_See Downloaded Models_
- User should be able to see all downloaded models
- User should be able to see "downloaded models" (i.e. downloaded through Jan into a named folder)
- User should be able to see "imported models" (i.e. user just drops the `.bin` file in `/models`)
_Start, Stop, Delete Models_
- User should be able to see whether model is currently running
- User should be able to stop a running model
- User should be able to delete a model
_See System Status_ (might be system monitor)
- User can see total filesize of downloaded models
- User can see which model is currently active/running
- User can see amount of RAM or VRAM utilized
---
## MVP `assistant.json`
We sort of have to define parts of `assistant.json` if we want to implement the Jan Global Assistant
Note:
- `*` indicates OpenAI compatibility
- All fields are **optional** and have a `default` fallback.
| Property | Type | Description | Optional |
| -------------- | --------------- | ----------------------------------------------------------------------- | -------- |
| id\* | string | The identifier, which can be referenced in API endpoints. | |
| object\* | string | The object type, which is always assistant. | |
| created_at\* | string | The Unix timestamp (in seconds) for when the assistant was created. | |
| name\* | string | The name of the assistant. The maximum length is 256 characters. | |
| description\* | string | The description of the assistant. The maximum length is 512 characters. | |
| model\* | string OR array | ID of the model to use; Jan also supports declaring an array of multiple model ids | |
| instructions\* | string | Inherits from the model.json corresponding to the `model` ids | |
| tools\* | array | Coming soon | |
| file_ids\* | array | Coming soon | |
| metadata\* | array | Misc metadata/labels used by Jan UI, etc | |
### Jan "Global" Assistant
- `model` field actually refers to model `id(s)`, not the underlying model binaries
```json
// assistant.json
"model": ["janroot/models/**"]
```
### Future Other Assistants
- Don't worry about this yet
```json
// assistant.json
// Case: assistant just supports 1 model
"model": "./model.json"
// Case: assistant just supports multiple models
"model": [
"./model_0.json",
"./model_1.json",
"janroot/models/**/model.json",
]
```
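For illustration, a hypothetical resolver for the `model` field: a single id, a list, or a glob over `janroot/models`. Only the literal `**` case shown above is handled, and all names here are assumptions:

```typescript
import { readdirSync } from "node:fs";
import path from "node:path";

// Hypothetical: resolve an assistant's `model` field (string or array,
// possibly a glob like "janroot/models/**") into concrete model.json paths.
function resolveModels(model: string | string[], janRoot = "/janroot"): string[] {
  const entries = Array.isArray(model) ? model : [model];
  return entries.flatMap((entry) =>
    entry.includes("**") ? findModelJsons(path.join(janRoot, "models")) : [entry],
  );
}

// Recursively collect every model.json under a directory.
function findModelJsons(dir: string): string[] {
  const out: string[] = [];
  for (const d of readdirSync(dir, { withFileTypes: true })) {
    const p = path.join(dir, d.name);
    if (d.isDirectory()) out.push(...findModelJsons(p));
    else if (d.name === "model.json") out.push(p);
  }
  return out;
}
```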