### Filesystem
- `name.json` = "Model Name"
```sh
/janroot
  /models
    llama2-70b.json
    llama2-7b-gguf.json
    huggingface.co/ # Model registries (de facto open source)
      meta-llama/
        llama2-70b-chat-hf/
          # 1. Model registry download: downloaded binary files go here (if downloaded) and model.json is updated
          # If the user drags and drops, update the `model_bin` field inside model.json instead (we will handle this later)
        llama2-7b-chat/
      thebloke/
        llama2-70b-chat-hf-gguf/
        llama2-7b-chat/
          llama7b_q2_K_L.gguf
          llama7b_q3_K_L.gguf
    model.louis.ai/ # Private model registries
      meta-llama/
        llama2-70b-chat-hf-tensorrt-llm/
        llama2-70b-chat-hf-awq/
          model.json
      thebloke/
        llava-1-5-gguf/ # Use case with multiple model files
          mmproj.bin
          model-q5.ggml
    llama-70b-finetune.bin
    llama-70b-finetune.json
```
- Problems
- Hard for
### `model.json` format
llama-70b.json
```sh
# Required
"url":
    huggingface_link for binary
    api.openai.com / Azure OpenAI / Claude (can infer)
# Optional
"import_format":
    default  # downloads the whole thing
    thebloke # custom importer (detects from URL)
    janhq    # custom importers
"default_download": llama-2-13b-chat.ggmlv3.q2_K.bin # optional
# Optional: OpenAI format
"id": "/huggingface.co/the-bloke/llama-70b-gguf",
"object": "model",
"created": 1686935002,
"owned_by": "the-bloke"
# Optional: params
# Question: How does config.json fit into this?
"parameters": { # Currently specific to Nitro with llama.cpp for LLMs, but a flat map keeps it generic
  "temperature": "..",
  "token_limit": "..",
  "top_k": "..",
  "top_p": "..",
  "pre_prompt": "A chat between a curious user and an artificial intelligence",
  "user_prompt": "USER: ",
  "ai_prompt": "ASSISTANT: "
},
// Jan specific configs
"metadata": { // @Q: should we put all under "jan"?
  "engine": "" // Defaults to: llamacpp. It is hard to decide now so let's put it here.
}
```
### Jan's Threads -> Models
```sh
# thread.json
{
  "model": {
    "uses": "llama2",
    "parameters": {
    }
  }
}
```
1. File structure
```sh
/janroot
  /models
    huggingface.co/ # Model registries (de facto open source)
      meta-llama/
        llama2-70b-chat-hf/
          model.json # Single source of truth
          # 1. Model registry download: downloaded binary files go here (if downloaded) and model.json is updated
          # If the user drags and drops, update the `model_bin` field inside model.json instead (we will handle this later)
        llama2-7b-chat/
          model.json
      thebloke/
        llama2-70b-chat-hf-gguf/
          model.json
        llama2-7b-chat/
          model.json
          llama7b_q2_K_L.gguf
          llama7b_q3_K_L.gguf
    jan.ai/ # Private model registries
      meta-llama/
        llama2-70b-chat-hf-tensorrt-llm/
          model.json
        llama2-70b-chat-hf-awq/
          model.json
      thebloke/
        llava-1-5-gguf/ # Use case with multiple model files
          model.json
          mmproj.bin
          model-q5.ggml
    mine/
      ${whoami}/
        fine_tuned_model.bin
        model.json
```
2. APIs
- Equivalent to: https://platform.openai.com/docs/api-reference/models
```sh
# List models
GET https://localhost:1337/v1/models?filter=[enum](running, downloaded, downloading)
List[model_object]
# Get model object
GET https://localhost:1337/v1/models/{model} # json file name as {model}
model_object <Map>
# Delete model
DELETE https://localhost:1337/v1/models/{model}
# Stop model
PUT https://localhost:1337/v1/models/{model_id}/stop
# Start model
PUT https://localhost:1337/v1/models/{model_id}/start
{
  "id": [string]
  "model_parameters": [jsonPayload], # the inference engine dictates what it expects
  "engine": [string] # e.g. llamacpp expects 1-3 file paths, tensorrt-llm expects different inputs, etc.
}
# Unload model reuses the same structure as delete model: Nitro won't touch the FS, so a delete request just unloads the model from Nitro's index
# Download model from HF link
POST https://localhost:1337/v1/models/remote/add
{
"model_path": (required) [string](huggingface_handles)
"engine": (required) [enum](llamacpp-ver, torchscript, etc)
}
# Creating a model from local model upload (user manually adds a .bin)
POST https://localhost:1337/v1/models/local/add
{
"model_path": (required) [string]
"engine": (required) [enum](llamacpp-ver, torchscript, etc)
}
# maybe just /models/import
```
---
title: "Models"
---
# Models
- [ ] HackMD should read from Github (commit to GH)
- [ ] Start with "UX" (helps people understand)
## Changelog
| Date | Changes |
| ----------- | ------------------------------------------------------ |
| 13 Nov 2023 | Initial discussion by @vuonghoainam, @0xsage, @dan-jan. Hiro previous docs: https://hackmd.io/Nu463_xqSaq9wXMNq0rglA |
Models are AI models like Llama and Mistral.
## Overview
Jan's Models are equivalent to OpenAI's models.
> OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models
## User Experience
- [ ] Put in Wireframes
## Model Object
Note:
- `*` indicates OpenAI compatibility
- All fields are **optional** and have a `default` fallback.
    - Rationale: don't expect users to write a `model.json` every time. Enable a simple drag-and-drop experience for model binaries.
| Property | Type | Description | Optional |
| ------------ | ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------ |
| id\* | string | A `uuid` of the model. Same as `model_path`. | See [id](#id). |
| object\* | enum: `model` | The object type, which is always `model`. | Defaults to `model` |
| created\* | string | The Unix timestamp (in seconds) when the model was created. | Defaults to `time.now` upon creation |
| owned_by\* | string | The source of the model | See [id](#id). |
| model_name | string | A display name for the model. | See [id](#id). |
| model_path | string | Path to a folder containing model binary(s). | Defaults to current folder. See [id](#id). |
| parameters | map | Set of `n` key-value pairs. This is useful for storing model `run parameters` in a structured format. Keys are max 64 chars and values are max 512 chars. e.g. `temperature`, `seed`, `top_p` | See [parameters](#parameters). |
| instructions | map | Set of `n` key-value pairs. Contains model run instructions, e.g. prompt templates, system prompts, user prompts and more | See [instructions](#instructions). |
| engine | enum: `llamacpp`, `tensorrt` | Model backend. Used by Jan to determine how to run model. | Defaults to: `llamacpp` |
| metadata | map | Set of `n` key-value pairs. This is useful for storing misc information about models in a structured format. Keys are max 64 chars and values are max 512 chars. e.g. `author`, `labels`, `license` | See [metadata](#metadata). |
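To make the table concrete, here is a minimal TypeScript sketch of the model object, plus a hypothetical helper that applies the documented defaults. The type and function names are illustrative, not part of the spec:

```typescript
// Illustrative sketch of the model object above; all fields are optional.
type ModelEngine = "llamacpp" | "tensorrt";

interface ModelObject {
  id?: string;           // uuid, same as model_path
  object?: "model";      // always "model"
  created?: number;      // Unix timestamp in seconds
  owned_by?: string;     // source of the model
  model_name?: string;   // display name
  model_path?: string;   // folder containing model binary(s)
  parameters?: Record<string, unknown>;   // run parameters: temperature, seed, top_p, ...
  instructions?: Record<string, unknown>; // prompt templates, system prompts, ...
  engine?: ModelEngine;  // defaults to "llamacpp"
  metadata?: Record<string, unknown>;     // misc: author, labels, license
}

// Hypothetical helper applying the documented fallbacks to a sparse model.json.
function withDefaults(m: ModelObject, folder: string): ModelObject {
  return {
    object: "model",
    created: Math.floor(Date.now() / 1000), // "time.now upon creation"
    engine: "llamacpp",
    model_path: folder, // "defaults to current folder"
    id: m.model_path ?? folder, // id is the same as model_path
    ...m, // anything set explicitly in model.json wins over the defaults
  };
}
```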
### `id`
We need to think through whether `id` === `model.json` === `model_path` === the folder name containing binaries.
Approach A (current):
- `id` == `PLATFORM/[OWNED_BY]/MODEL_NAME`
- `id` == `model_path`
- Quantizations share the same `id`
- But then how do we attach quant-level metadata: `source`, `size`?
- What if users want to delete a specific quantization?
Approach B:
- `id` == `PLATFORM/[OWNED_BY]/MODEL_NAME/BINARY_FILE_NAME`
- `id` == `model_path` + file name
- Quantizations have unique ids
- `model.json` !== model object
- `model.json` file DOESN'T contain `id`, only `model_path`.
- We'll have to define a separate template for `model.json` that drops some quantization-specific properties in the `model` object
=> [Hiro] We can discuss Approach B (borrowing from e-commerce/booking): the model as the product, the model variant/quantization as the SKU.
Users refer to a product, and once they are more interested, they get control at SKU granularity.
Ref: https://laracasts.com/discuss/channels/laravel/database-design-for-e-commerce-product-variants-with-laravel?page=1&replyId=851110
```sh
# Hugging Face:
/huggingface/the-bloke/llama-7bn-gguf
# More remote source examples:
/civit/meow/1-cat
/my-own-cdn-mirror/the-bloke/llama-7bn-gguf
# User uploaded it locally:
/$(whoami)/my-custom-model
```
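For illustration, a hypothetical parser for Approach A ids (`PLATFORM/[OWNED_BY]/MODEL_NAME`, with `OWNED_BY` optional); the function name and return shape are assumptions, not part of the spec:

```typescript
// Hypothetical: split an Approach A id into its path components.
function parseModelId(id: string) {
  const parts = id.replace(/^\//, "").split("/");
  if (parts.length === 2) {
    // OWNED_BY is optional, e.g. "/$(whoami)/my-custom-model"
    const [platform, modelName] = parts;
    return { platform, modelName };
  }
  const [platform, ownedBy, modelName] = parts;
  return { platform, ownedBy, modelName };
}

// parseModelId("/huggingface/the-bloke/llama-7bn-gguf")
// => { platform: "huggingface", ownedBy: "the-bloke", modelName: "llama-7bn-gguf" }
```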
### `parameters`
```json
// GGUF sample
{
"llama_model_path": "/path/to/your_model.gguf",
"ctx_len": 2048,
"ngl": 100,
"embedding": true,
"n_parallel": 4,
"pre_prompt": "A chat between a curious user and an artificial intelligence",
"user_prompt": "USER: ",
"ai_prompt": "ASSISTANT: "
}
```
### `instructions`
```json
// GGUF sample
{
"messages": [
{"content": "Hello there 👋", "role": "assistant"},
{"content": "Can you write a long story", "role": "user"}
],
"stream": true,
"model": "<model_name>",
"max_tokens": 2000
}
```
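The sample above looks essentially like a chat-completions payload. As a sketch of how the prompt-template parameters (`pre_prompt`, `user_prompt`, `ai_prompt`) could render such messages into the final prompt string (the helper below is hypothetical):

```typescript
// Hypothetical: render chat messages into a prompt string using the
// pre_prompt / user_prompt / ai_prompt parameters from the GGUF sample.
interface ChatMessage { role: "user" | "assistant"; content: string; }

function renderPrompt(
  messages: ChatMessage[],
  p: { pre_prompt: string; user_prompt: string; ai_prompt: string },
): string {
  const turns = messages
    .map((m) => (m.role === "user" ? p.user_prompt : p.ai_prompt) + m.content)
    .join("\n");
  // The trailing ai_prompt cues the model to generate the assistant's reply.
  return `${p.pre_prompt}\n${turns}\n${p.ai_prompt}`;
}
```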
### `metadata`
```json
// GGUF sample
// @hiro TODO
// Nothing on my mind atm
```
## `model.json` Template
- What fields are omitted from Model Object?
- i.e. I don't think `object`, `created`, etc. need to be explicitly included in `model.json` files
## Model API
See [/model](/api/model)
- Equivalent to: https://platform.openai.com/docs/api-reference/models
*Manage Models*
```sh
# List models
GET https://localhost:1337/v1/models
{
  "object": "list",
  "data": [
    {
      "id": [string](model_path),
      "object": "model",
      "created": [string](unix_timestamp),
      "owned_by": [enum] [<hf_handles>, "anonymous"] # "anonymous" when the user uses a local/fine-tuned model
    }
  ]
}
# Get model object
GET https://localhost:1337/v1/models/{model}
{
  "id": [string](model_path),
  "object": "model",
  "created": [string](unix_timestamp),
  "owned_by": [enum] [<hf_handles>, "anonymous"] # "anonymous" when the user uses a local/fine-tuned model
}
# Delete model
DELETE https://localhost:1337/v1/models/{model}
{
  "id": [string](model_path),
  "object": "model",
  "deleted": true
}
# TODO @alan
# Start model
PUT https://localhost:1337/v1/models/{model_id}/start
{
  "model_parameters": [jsonPayload], # generic, since payloads differ across inference engines
}
# Stop model
PUT https://localhost:1337/v1/models/{model_id}/stop
# Load model (Nitro only)
POST https://localhost:1337/v1/models
{
  "id": [string]
  "model_config": [jsonPayload], # the inference engine dictates what it expects
  "engine": [string] # e.g. llamacpp expects 1-3 file paths, tensorrt-llm expects different inputs, etc.
}
# Unload model (Nitro only)
# Unload model reuses the same structure as delete model: Nitro won't touch the FS, so a delete request just unloads the model from Nitro's index
```
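A hedged usage sketch against the endpoints above, using Node 18+ global `fetch`. The model id and parameter values are made up, and the payload shapes follow this draft, so they may change:

```typescript
const BASE = "https://localhost:1337/v1";

// List models, start one with engine-specific parameters, then stop it.
async function manageModelsDemo(): Promise<void> {
  const list = await (await fetch(`${BASE}/models`)).json();
  console.log(list.data.map((m: { id: string }) => m.id));

  const id = "huggingface/thebloke/llama7b-gguf"; // hypothetical id
  await fetch(`${BASE}/models/${encodeURIComponent(id)}/start`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    // model_parameters is engine-specific; llamacpp-style values shown here
    body: JSON.stringify({ model_parameters: { ctx_len: 2048, ngl: 100 } }),
  });

  await fetch(`${BASE}/models/${encodeURIComponent(id)}/stop`, { method: "PUT" });
}
```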
*Download/import Models*
```sh
# Download model from HF link
POST https://localhost:1337/v1/models/remote/add
{
"model_path": (required) [string](huggingface_handles)
"engine": (required) [enum](llamacpp-ver, torchscript, etc)
}
# Creating a model from local model upload (user manually adds a .bin)
POST https://localhost:1337/v1/models/local/add
{
"model_path": (required) [string]
"engine": (required) [enum](llamacpp-ver, torchscript, etc)
}
```
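And the import side, under the same assumptions (the Hugging Face handle and local file path below are made up):

```typescript
// Hedged sketch: register a remote Hugging Face model, then import a local binary.
async function importModelsDemo(): Promise<void> {
  const post = (path: string, body: object) =>
    fetch(`https://localhost:1337/v1${path}`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });

  // Remote: model_path is a Hugging Face handle per this draft.
  await post("/models/remote/add", {
    model_path: "thebloke/llama2-7b-chat", // hypothetical handle
    engine: "llamacpp",
  });

  // Local: point at a binary the user manually dropped into /janroot/models.
  await post("/models/local/add", {
    model_path: "/janroot/models/mine/my-custom-model.bin",
    engine: "llamacpp",
  });
}
```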
## Model Filesystem @Hiro
How `models` map onto your local filesystem
### Default `/models`
- Jan ships with a list of `recommended` models' `model.json` files, packaged under `/models`.
- Model binaries are only fetched (via `wget`) after users explicitly choose to download them.
> Changelog: this means we don't maintain a remote GitHub Model-Catalog anymore!
```sh
# File structure
/janroot
  /models
    /huggingface                # PLATFORM
      meta-llama/               # OWNED_BY
        llama2-70b-chat-hf/     # MODEL_NAME
          model.json            # see below
          # Empty until user downloads binaries
```
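A minimal sketch of that download-on-demand step, assuming the binary lands next to `model.json` and is recorded in a `model_bin` field (the field name follows the drag-and-drop note earlier; the helper itself is hypothetical and buffered for brevity, where a real implementation would stream):

```typescript
import { readFile, writeFile } from "node:fs/promises";
import path from "node:path";

// Hypothetical: fetch a model binary into the model folder, then record it in model.json.
async function downloadBinary(modelDir: string, url: string): Promise<void> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`download failed: ${res.status}`);
  const file = path.join(modelDir, path.basename(new URL(url).pathname));
  await writeFile(file, Buffer.from(await res.arrayBuffer()));

  const jsonPath = path.join(modelDir, "model.json");
  const model = JSON.parse(await readFile(jsonPath, "utf8"));
  model.model_bin = path.basename(file); // assumed field, per the drag-and-drop note
  await writeFile(jsonPath, JSON.stringify(model, null, 2));
}
```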
```json
// Template model.json
// OpenAI compatible metadata
"id": "/huggingface/the-bloke/llama-7bn-gguf",
"object": "model",
"created": 1686935002,
"owned_by": "the-bloke"
// Run configs
"name": "" // Defaults to: llama-7bn-gguf
"path": "", // Defaults to: `id`
"variants": Object {
"default": Int(index),
"quantizations": [
{
"metadata": {
"source": "",
"remote_model":"",
"quantization": "",
"variant": "",
"isLora": ""
etc
},
"name": "",
"model_file": [Object] {
# Llava
}
}
]
}
"parameters": { // @hiro: K-V similar to #parameters
"temperature": "..",
"token_limit": "..",
"top_k": "..",
"top_p": ".."
},
"instructions": {}, // K-V similar to #instructions
// Jan specific configs
"metadata": { // @Q: should we put all under "jan"
"engine": "", // Defaults to: llamacpp. It's hard to decide now so let's put it here.
}
```
### Case: Jan "Global" Assistant
- Jan ships with a default assistant called "Jan"
- Assistant Jan lets you chat with all downloaded models in `janroot/models`
- See [Jan Assistant.json](#Jan-“Global”-Assistant)
```sh
/janroot
  /assistants
    /jan
      assistant.json # See below
      # No models.json or /models subfolder
```
### Case: Model has many quantizations
> e.g. https://huggingface.co/TheBloke/Llama-2-7B-GGUF/tree/main
Think through the following:
- Quantizations share the same `id`
  - Q: should `id` include the binary name as well?
  - Q: what should `model_name` be?
- Quantizations share the same folder
- Quantizations share the same `model.json`
  - What happens to `model_name`, `source`?
    -> `model_name`: extract the file name (`name.split('/')[-1]`)
    -> `source`:
      - If Hugging Face: use the Hugging Face handle
      - If local: use `anonymous`, allow the user to edit
    (a sketch of these rules follows the file tree below)
Ref: https://huggingface.co/TheBloke/Llama-2-7B-GGUF/tree/main
```sh
/janroot
  /models
    huggingface/
      thebloke/ # Pls use lowercase
        llama7b-gguf/
          model.json
          llama7b_q2_K_L.gguf
          llama7b_q3_K_L.gguf
```
-> See `model.json` at #Default-models
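A hypothetical sketch of the two derivation rules above, i.e. `model_name` from the binary file name, and `source` from the registry handle or `anonymous`:

```typescript
// Hypothetical helpers for the rules above.
function modelNameFromFile(filePath: string): string {
  // e.g. ".../llama7b_q2_K_L.gguf" -> "llama7b_q2_K_L.gguf"
  return filePath.split("/").at(-1) ?? filePath;
}

function sourceFor(platform: string, hfHandle?: string): string {
  // Hugging Face models keep their handle; local files default to
  // "anonymous" (user-editable).
  return platform === "huggingface" && hfHandle ? hfHandle : "anonymous";
}
```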
### Case: Model has many sizes, e.g. 7Bn vs 13Bn
- Models with different size params have different model.jsons because the run parameters might be entirely different.
```sh
/janroot
  /models
    huggingface.co/
      meta-llama/
        llama2-70b-chat-hf/
          model.json
        llama2-7b-chat/
          model.json
      thebloke/
        llama2-70b-chat-hf-gguf/
          model.json
        llama2-7b-chat/
          model.json
```
### Case: Model is composed of multiple required binaries
> e.g. https://huggingface.co/mys/ggml_llava-v1.5-13b/tree/main
```sh
TODO @Hiro
```
### Case: Model contains other models under the hood
> e.g. https://github.com/ggerganov/whisper.cpp/tree/master/examples/talk-llama
=> This case is very marginal: one assistant can use multiple models at once, so we do not need to deal with this!
```sh
TODO @Hiro
```
### Case: Threads can override default `model.json`
- Within the scope of each conversation thread, users can actually override default model settings in the right sidebar
- @Hiro: Ashley/we need to know what can be configured at the thread level, so we can incorporate it into the right sidebar designs
-> @Nicole: We have to decide high level design:
- Do we need to track changes for the assistant/model? If we consider assistants and models to have different life cycles, then my proposal is:
    - Users cannot chat directly with models on Jan, only with Assistants. If they choose a model, the assistant gets a `model.json` that by default mirrors the model's `model.json`, and changes are recorded at the `assistant` level. We can keep track by recording every user change into a `model-<unix>.json` and using the latest json by default; the user can also choose an older model.json, but then would that file have to be renamed??? (See the merge sketch below.)
```sh
TODO @Hiro
```
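One possible shape for the override merge, assuming thread-level settings simply shadow the model's defaults; this is a hypothetical helper, pending the design decision above:

```typescript
// Hypothetical: thread-level model settings shadow the defaults from model.json.
interface ThreadModelOverride {
  uses: string;                         // model id, e.g. "llama2"
  parameters?: Record<string, unknown>; // partial overrides only
}

function effectiveParameters(
  modelDefaults: Record<string, unknown>,
  thread: ThreadModelOverride,
): Record<string, unknown> {
  // Later spread wins: anything set at the thread level overrides the default.
  return { ...modelDefaults, ...(thread.parameters ?? {}) };
}
```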
## Model UI/UX
Lo-fi wireframes from Dan/Nicole
- https://link.excalidraw.com/l/kFY0dI05mm/6taqaC1SNDM
- https://app.excalidraw.com/s/kFY0dI05mm/5t0B03L5zpV
### Explore Models Pages (TODO: @Ashley)
- User can `Explore Models` and download models
*Download Recommended Model*
- User should be able to see model cards of recommended models
- Model card should only include useful information (not duplicated, useless ones)
- User should be able to be recommended the most compatible models
- User should be able to click “download” to start downloading
*Download Model by HF URL*
- User should be able to paste a Hugging Face link into the search bar
- User should be able to see the model card, including autogenerated compatibility (e.g. Not enough RAM)
*Import new Model locally*
- User should be able to manually upload a bin/gguf custom model
- Users should be able to configure properties in `model.json` via the UI, for the new model
*Import existing `model.json` locally*
- User should be able to manually upload an existing /models/modelfolder with a preconfigured `model.json` & `model.bin`
- Is this automatic, or do users do it through a UI?
*Find Model*
- User should be able to filter recommended models by size, params, RAM required (See Traveloka flight search for an example)
### Manage Models Page (TODO: @Ashley)
@Ashley TODO See: https://link.excalidraw.com/l/kFY0dI05mm/6taqaC1SNDM
_See Downloaded Models_
- User should be able to see all downloaded models
- User should be able to see "downloaded models" (i.e. downloaded through Jan into a named folder)
- User should be able to see "imported models" (i.e. user just drops the `.bin` file in `/models`)
_Start, Stop, Delete Models_
- User should be able to see whether model is currently running
- User should be able to stop a running model
- User should be able to delete a model
_See System Status_ (might be system monitor)
- User can see total filesize of downloaded models
- User can see which model is currently active/running
- User can see amount of RAM or VRAM utilized
---
## MVP `assistant.json`
We sort of have to define parts of `assistant.json` if we want to implement the Jan Global Assistant
Note:
- `*` indicates OpenAI compatibility
- All fields are **optional** and have a `default` fallback.
| Property | Type | Description | Optional |
| -------------- | --------------- | ----------------------------------------------------------------------- | -------- |
| id\* | string | The identifier, which can be referenced in API endpoints. | |
| object\* | string | The object type, which is always assistant. | |
| created_at\* | string | The Unix timestamp (in seconds) for when the assistant was created. | |
| name\* | string | The name of the assistant. The maximum length is 256 characters. | |
| description\* | string | The description of the assistant. The maximum length is 512 characters. | |
| model\* | string OR array | ID of the model to use; Jan also supports declaring an array of multiple model ids | |
| instructions\* | string | Inherits from the model.json corresponding to the `model` ids | |
| tools\* | array | Coming soon | |
| file_ids\* | array | Coming soon | |
| metadata\* | array | Misc metadata/labels used by Jan UI, etc | |
### Jan "Global" Assistant
- `model` field actually refers to model `id(s)`, not the underlying model binaries
```json
// assistant.json
"model": ["janroot/models/**"]
```
### Future Other Assistants
- Don't worry about this yet
```json
// assistant.json
// Case: assistant just supports 1 model
"model": "./model.json"
// Case: assistant just supports multiple models
"model": [
"./model_0.json",
"./model_1.json",
"janroot/models/**/model.json",
]
```
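For illustration, a hypothetical resolver for the `model` field: a single id, a list, or a glob over `janroot/models`. Only the literal `**` case shown above is handled, and all names here are assumptions:

```typescript
import { readdirSync } from "node:fs";
import path from "node:path";

// Hypothetical: resolve an assistant's `model` field (string or array,
// possibly a glob like "janroot/models/**") into concrete model.json paths.
function resolveModels(model: string | string[], janRoot = "/janroot"): string[] {
  const entries = Array.isArray(model) ? model : [model];
  return entries.flatMap((entry) =>
    entry.includes("**") ? findModelJsons(path.join(janRoot, "models")) : [entry],
  );
}

// Recursively collect every model.json under a directory.
function findModelJsons(dir: string): string[] {
  const out: string[] = [];
  for (const d of readdirSync(dir, { withFileTypes: true })) {
    const p = path.join(dir, d.name);
    if (d.isDirectory()) out.push(...findModelJsons(p));
    else if (d.name === "model.json") out.push(p);
  }
  return out;
}
```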