# GPUtopia/Arguflow bounty integration
## Requirements
- [X] OpenAI-compatible API over an LLM through [LocalAI](https://github.com/arguflow/LocalAI) running on the GPUtopia network
- [X] Text-embedding model running in the [Arguflow embedding server](https://github.com/arguflow/arguflow/tree/main/embedding-server) on the GPUtopia network
- [ ] ~10-minute deployment of Arguflow on top of GPUtopia compute
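Since the first requirement is an OpenAI-compatible API, any standard OpenAI client or plain HTTP call should work against it. A minimal sketch of what such a request looks like, assuming a hypothetical base URL and model name (substitute the real GPUtopia endpoint and a token obtained after signing up):

```python
import json

# Assumptions: BASE_URL and the model name are placeholders, not the
# real GPUtopia values -- replace them with the actual endpoint/token.
BASE_URL = "https://api.gputopia.example/v1"
API_TOKEN = "YOUR_API_TOKEN"

def build_chat_request(prompt: str, model: str = "llama-2-7b-chat"):
    """Build the URL, headers, and JSON body for an OpenAI-compatible
    /chat/completions call; any HTTP client (curl, requests, the
    openai SDK) can send it unchanged."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

url, headers, body = build_chat_request("Hello!")
```

Because the wire format matches OpenAI's, existing tooling (including Arguflow's LocalAI integration) needs only the base URL and token swapped.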
## Bounty amount
## Proposed path forward for GPUtopia
- advanced routing and model preloading
  - always picking handlers that already have the model loaded
- sending out preloads in anticipation of future requests
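The two routing ideas above can be sketched as a toy scheduler: prefer workers with the model already resident, and compute preload hints from recent request counts. All names here (`Router`, `pick_worker`, `preload_candidates`) are illustrative, not an existing GPUtopia API:

```python
import random
from collections import defaultdict

class Router:
    """Toy sketch: route to warm workers when possible, and suggest
    preloads of the hottest models to workers that lack them."""

    def __init__(self):
        self.loaded = defaultdict(set)        # worker -> models resident in memory
        self.request_counts = defaultdict(int)  # model -> recent request count

    def pick_worker(self, model, workers):
        """Prefer a worker that already has the model loaded;
        fall back to any worker (which will cold-load it)."""
        self.request_counts[model] += 1
        warm = [w for w in workers if model in self.loaded[w]]
        return random.choice(warm) if warm else random.choice(workers)

    def preload_candidates(self, workers, top_n=1):
        """(worker, model) pairs worth preloading in anticipation of
        future requests, based on the hottest recent models."""
        hot = sorted(self.request_counts,
                     key=self.request_counts.get, reverse=True)[:top_n]
        return [(w, m) for m in hot for w in workers
                if m not in self.loaded[w]]
```

A production version would also weigh worker load and VRAM, but the warm-first selection plus speculative preload is the core of the idea.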
- check fastembed/ONNX vs. torch for executable size and performance, especially parallelism; one of these should work
- bge-large-en-v1.5
- ember-v1
- gte-large
- stella-base-en-v2
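For the backend comparison above, a small harness that measures embeddings/sec under a thread pool makes the parallelism differences concrete. The harness is backend-agnostic; `dummy_embed` is a stand-in (a real run would plug in, e.g., a fastembed, ONNX Runtime, or torch embedder for one of the listed models):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def throughput(embed_fn, texts, workers=4):
    """Embed all texts with `workers` threads and return
    (embeddings per second, list of embeddings)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(embed_fn, texts))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed, results

# Stand-in embedder so the sketch runs without any model download;
# swap in the real backend's embed call to benchmark it.
def dummy_embed(text):
    return [float(len(text))] * 8

rate, vecs = throughput(dummy_embed, ["hello", "world"] * 50)
```

Running the same harness with each backend (and varying `workers`) gives a like-for-like throughput number; executable size would be checked separately on the packaged worker binary.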
## For Arguflow
- please test our API endpoint (sign up, get an API token, and try it out)
- send feedback and requests until it works nicely