# Import Models: A Research Scratchpad
---
## Overview
This document is a more granular spec on how users can import **uncataloged** models from various sources.
## Questions
- How do we catalog the models from HuggingFace? See the Catalog Options below.
### Catalog Option 1
**Dynamically scrape HF at runtime**, i.e. the user pastes in a URL like `https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF`, and we scrape the contents of the page to generate a model card (see the sketch after this list).
- Pro: Not as tedious as Option 2; users see the newest models
- Con: Slower to load results; the user must provide a complete, valid URL; we can't suggest all models under, say, TheBloke
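A minimal sketch of Option 1's runtime lookup, assuming the public HF API endpoint `/api/models/{org}/{repo}` (the `siblings` field is that API's file listing; `fetchModelCard` and its return shape are hypothetical):
```ts
// Resolve a pasted HF URL into model-card data at runtime (Option 1 sketch).
async function fetchModelCard(sourceUrl: string) {
  const match = sourceUrl.match(/huggingface\.co\/([^/]+)\/([^/]+)/);
  if (!match) throw new Error("Invalid HuggingFace URL");
  const [, org, repo] = match;

  const res = await fetch(`https://huggingface.co/api/models/${org}/${repo}`);
  if (!res.ok) throw new Error(`HF lookup failed: ${res.status}`);
  const info = await res.json();

  // For GGUF repos, each quantization is a separate .gguf file in `siblings`.
  const ggufFiles = (info.siblings ?? [])
    .map((s: { rfilename: string }) => s.rfilename)
    .filter((f: string) => f.endsWith(".gguf"));

  return { id: info.id, files: ggufFiles };
}
```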
### Catalog Option 2
**Mirror HuggingFace and pre-index all models** (e.g. LM Studio)
- Pro: Fast to load results. Users can search by partial terms rather than a full URL, e.g. typing `TheBloke` suggests all TheBloke models. Builds a moat.
- Con: Have to build a daily/hourly HF scraper and overcome HF IP throttling (see the sketch after this list)
- See further thoughts: https://hackmd.io/4R2aN42GR-GlT516_uixOQ
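A sketch of what one Option 2 indexing pass could look like, assuming the public HF list API's `author` filter; pagination and rate-limit handling (the hard parts per the con above) are omitted:
```ts
// Local index so search can match partial terms offline (Option 2 sketch).
const localIndex = new Map<string, { id: string; downloads: number }>();

async function indexAuthor(author: string): Promise<void> {
  const res = await fetch(
    `https://huggingface.co/api/models?author=${author}&limit=100`
  );
  const models: { id: string; downloads: number }[] = await res.json();

  // Key by lowercase id so a query like `thebloke` matches too.
  for (const m of models) {
    localIndex.set(m.id.toLowerCase(), m);
  }
}
```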
### Catalog Option 3
**Curate a Library of recommended, popular models** (e.g. Ollama)
- Pro: Curated, with good tags/descriptions. Builds a moat.
- Con: Users might not see the newest models; tedious to maintain
## UX Principles
- Users should be able to import and use a model in less than 3 clicks, in under a minute.
- Don't gatekeep new models. Users shouldn't depend on us to "support" new models from TheBloke, for example.
- Maintain a handful of `Recommended Models`: popular open-source models with optimal parameters already configured.
## UX: Model import flows (assuming Option 1)
### 1. User downloads a model via GUI (using URL)
1. User makes a `POST /models` request with parameter `$USER_SOURCE_URL` (see [valid URL paths](#SOURCE_URL)); an example request follows the steps below
2. The Model Importer infers the Model Format from `SOURCE_URL`. If it is a custom format, it is handled accordingly.
3. User sees a `model card` based on the Model Format (designs pending), e.g. if GGUF, render the available quantizations
4. User chooses a model/variation to download
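For illustration, step 1 might look like the request below; the host/port and body shape are placeholders, not a decided API:
```ts
// Hypothetical import request for step 1; assumes the URL travels as `source_url`.
await fetch("http://localhost:1337/models", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    source_url: "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF",
  }),
});
```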
**Initial model file**
- It doesn't exist
**Final model file**
```json
{
  "source_url": $USER_SOURCE_URL,
  "state": "ready",
  // Q: How to express the place where the model binary was actually saved?
  // "downloaded_path": "/models/$FOLDERNAME", // Alt names: binary, model/binary/file_location
  // Alternate:
  // "entry_point": "./models/$FOLDERNAME/$BINARYFILE", // In case of multiple model binaries
  "metadata": {
    "format": "gguf", // For custom supported formats
    "custom_format_tags_here": "tba"
  }
}
```
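The same shape as a type, for reference; since `downloaded_path` vs. `entry_point` is still an open question, both are optional here, and all field names are working assumptions:
```ts
// Sketch of the final model file shape, mirroring the JSON above.
interface ModelFile {
  source_url: string;
  state: "ready" | null;     // null until the binary is downloaded
  downloaded_path?: string;  // e.g. "/models/$FOLDERNAME"
  entry_point?: string;      // e.g. "./models/$FOLDERNAME/$BINARYFILE"
  metadata: {
    format: string;          // e.g. "gguf", for custom supported formats
    [tag: string]: string;   // custom format tags, TBA
  };
}
```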
### 2. User downloads a Recommended Model via GUI (via model.json)
1. Jan ships with a few Recommended GGUF models whose downloads are deferred
2. They render as Recommended Model Cards in the UI, and users can choose whether to actually download the models
**Initial model file**
```json
{
  "state": null,
  "parameters": { ...fully_defined }
}
```
**Final model file**
```json
{
  "state": "ready",
  ...
}
```
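A sketch of the deferred-download transition, reusing the `ModelFile` type above; `downloadBinary` is a stub standing in for the actual download logic:
```ts
// Recommended Models ship with state: null and flip to "ready" once the
// binary lands on disk.
const downloadBinary = async (url: string): Promise<void> => {
  /* fetch the binary and write it under /models */
};

async function activateRecommendedModel(model: ModelFile): Promise<ModelFile> {
  await downloadBinary(model.source_url);
  return { ...model, state: "ready" };
}
```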
### 3. (KIV) User imports a model from local filesystem (using model.json)
- User drags and drops a complete model package (with `model.json` and binaries) into `/models`
- Handle this later.
### SOURCE_URL
Valid URL paths we handle (a parser sketch follows the examples):
1. `https://huggingface.co/$ORG_NAME/$MODEL_NAME*`
- https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF
- https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/tree/main
- https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/blob/main/llama-2-7b-chat.Q3_K_M.gguf
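A sketch of validating these three shapes; it returns the parsed parts, or `null` for anything we don't handle:
```ts
// Parse a SOURCE_URL into its parts (repo root, /tree/<branch>, or a /blob file).
function parseSourceUrl(
  url: string
): { org: string; model: string; file?: string } | null {
  const m = url.match(
    /^https:\/\/huggingface\.co\/([^/]+)\/([^/]+)(?:\/tree\/[^/]+|\/blob\/[^/]+\/(.+))?$/
  );
  if (!m) return null;
  const [, org, model, file] = m;
  return file ? { org, model, file } : { org, model };
}
```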
## Supported Model Formats
The following model formats have custom import logic (a GGUF detection sketch follows the list).
- [GGUF](#GGUF)
- (KIV) AWQ
- (KIV) GPTQ
- (KIV) PyTorch
- (KIV) Safetensors
- (KIV) TensorRT
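Where the URL or extension is ambiguous, GGUF can also be detected from the file itself: every GGUF file starts with the 4-byte ASCII magic `GGUF`. A sketch (the other formats above would need their own checks):
```ts
import { open } from "node:fs/promises";

// Check the 4-byte GGUF magic at the start of a local file.
async function isGguf(path: string): Promise<boolean> {
  const fh = await open(path, "r");
  try {
    const buf = Buffer.alloc(4);
    await fh.read(buf, 0, 4, 0);
    return buf.toString("ascii") === "GGUF";
  } finally {
    await fh.close();
  }
}
```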
## Benchmarking
### Ollama
- Import UX: Ollama maintains a Library of supported models.
- Users just run `ollama run mistral:variant` and it works out of the box
- Supported formats: `GGUF`, `PyTorch` or `Safetensors`
- Shaping up to be a HF competitor, letting users upload custom models: https://github.com/jmorganca/ollama/blob/main/docs/import.md#publishing-your-model-optional--early-alpha
### LM Studio
- The most friendly UX for non-technical users
- They scrape HuggingFace daily and index all models and variants
- Users can search for any terms and get partial matches, e.g. `TheBloke/llama` returns a list of many `TheBloke/*llama*` models
### Faraday
- Similar to Ollama: maintains a curated Library of models.
- As a result, not all (or the latest) models are shown
### Ooba
### SillyTavern
- Depends on you running a separate inference server
- The main, recommended way to use it is via a remote API
- Q: What is @dan-jan referring to when he says ST has a good models experience?
### KoboldCPP