## Goal

Implement the HF Inference Provider as a first-class (un-deletable) provider in Jan's Model Provider list.

## Background Information

### Regarding Inference Providers

HF Inference Providers let you choose which specific provider you want through their client library. However, if we use their OpenAI-compatible endpoint (which requires much less effort on Jan's side), the provider is selected automatically on the server side. [resource](https://huggingface.co/docs/inference-providers/en/index)

This means a few things:

#### If we choose the OpenAI-compatible API (which Victor has already implemented, so the work is more or less done)

1. We cannot implement what Julien suggested regarding highlighting the provider list. Since we cannot interact with the provider, it would be a non-functional feature that only confuses users. https://files.slack.com/files-pri/T1RCG4490-F0942TB5Q3U/image.png
2. It limits the scope of potential future collaboration (e.g. a Jan-nano endpoint where people can select Menlo as the provider).

#### If we choose to implement through their client

1. It requires us to build with the Hugging Face inference client, so the HuggingFace provider becomes a separate component in terms of technical implementation.
2. More resources need to be committed to build and maintain the HF provider.
3. Do we want to place that much importance on the HF Inference Provider?

### Regarding Inference Endpoints

> Due to its cost basis, I don't think anyone will ever use Inference Endpoints as backend for any Jan/chat UI app. - Lucain

This is probably a hint from HF not to pursue this direction: if many casual users were asking for it, HF would have already pushed it themselves. So we should not implement this, and instead leave it as a Custom Provider option plus extra documentation on Jan's side.

## Success metric

The implementation should include the following points:

#### 1.
If we implement only the OpenAI-compatible API:

- Can show the list of provided models (e.g. DeepSeek-R1-0528)
  - Discussion point: do we want to list all the models, or ask users to visit the HF website to see which models are available? Their list will grow in the same way OpenRouter's does.
- Can chat with a selected model through an OpenAI-compatible endpoint

#### 2.

If we implement through their client:

- Can show the list of provided models (e.g. DeepSeek-R1-0528)
- Can chat with a selected model (limited to chat completions only)
- Can select a provider for a model through the Jan UI (the UI should focus on the providers, not on HF, per HF's request)

## Reference design and implementation

#### Currently implemented by Victor

https://github.com/menloresearch/jan/pull/5808

#### Julien's suggestion

https://files.slack.com/files-pri/T1RCG4490-F0942TB5Q3U/image.png
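For reference, the OpenAI-compatible path (option 1) could look like the sketch below. This is a hedged illustration, not Victor's actual implementation: the base URL `https://router.huggingface.co/v1` is the one documented for HF's Inference Providers router, and `buildChatRequest` / `chatOnce` are hypothetical helper names.

```typescript
// Sketch of the OpenAI-compatible path (assumption: HF's router exposes an
// OpenAI-style API at https://router.huggingface.co/v1, per their docs).
const HF_ROUTER_BASE = "https://router.huggingface.co/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Pure helper (hypothetical name): builds the OpenAI-style request body.
function buildChatRequest(model: string, messages: ChatMessage[]) {
  return { model, messages, stream: false };
}

// Sends a single chat completion request using an HF access token.
async function chatOnce(apiKey: string, model: string, prompt: string): Promise<string> {
  const res = await fetch(`${HF_ROUTER_BASE}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(
      buildChatRequest(model, [{ role: "user", content: prompt }]),
    ),
  });
  if (!res.ok) throw new Error(`HF router error: ${res.status}`);
  const data = await res.json();
  // Note: on this path the serving provider is chosen server-side;
  // Jan cannot influence or display it.
  return data.choices[0].message.content;
}
```

This also makes the first limitation above concrete: nothing in the request or response lets Jan pick or highlight a provider.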
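The client path (option 2) could be sketched as below. Assumptions to verify before building on this: that `@huggingface/inference` exposes an `InferenceClient` whose `chatCompletion` accepts a per-request `provider` option (check the package docs for the exact API and version); `resolveProvider` and `chatViaClient` are hypothetical helper names for the Jan-side provider picker.

```typescript
// Hypothetical Jan-side helper: pick the user's preferred provider when the
// model supports it, otherwise fall back to letting HF choose ("auto").
function resolveProvider(preferred: string, supported: string[]): string {
  return supported.includes(preferred) ? preferred : "auto";
}

// Sketch of a chat call through HF's own client (assumption: the
// @huggingface/inference package exposes InferenceClient with a
// `provider` field on chatCompletion — verify against the package docs).
async function chatViaClient(
  token: string,
  model: string,
  prompt: string,
  provider: string,
): Promise<string> {
  // Loaded lazily so the HF client stays an optional dependency of this sketch.
  const { InferenceClient } = await import("@huggingface/inference");
  const client = new InferenceClient(token);
  const out = await client.chatCompletion({
    model,
    // Cast because the package types `provider` as a union of known provider ids.
    provider: provider as any, // e.g. "together"; this is what the Jan UI would surface
    messages: [{ role: "user", content: prompt }],
  });
  return out.choices[0].message.content ?? "";
}
```

The `resolveProvider` step is where success metric 2's "select a provider through the UI" would plug in: the UI lists providers, and the chosen id is passed per request.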