# Limitations of the llama.cpp server (29-Nov)
Currently the llama.cpp server uses cpp-httplib (https://github.com/yhirose/cpp-httplib), which is blocking IO.
### Drogon features
- Non-blocking IO, in contrast to cpp-httplib's blocking IO

- Static threading model giving total control over OS threads (see the sketch after this list): https://github.com/drogonframework/drogon/wiki/ENG-FAQ-1-Understanding-drogon-threading-model
- Possibility to extend use cases with the built-in DB engine
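
A minimal sketch of the first two features in Drogon: a fixed pool of event-loop threads set with `setThreadNum`, and a handler that replies through a callback instead of blocking its thread. The `/health` route, port, and thread count are illustrative, not llama.cpp's actual endpoints.

```cpp
#include <drogon/drogon.h>

int main() {
    drogon::app()
        .setThreadNum(4)  // fixed pool of IO event-loop threads
        .registerHandler(
            "/health",    // illustrative route, not an actual llama.cpp endpoint
            [](const drogon::HttpRequestPtr &req,
               std::function<void(const drogon::HttpResponsePtr &)> &&callback) {
                auto resp = drogon::HttpResponse::newHttpResponse();
                resp->setBody("OK");
                callback(resp);  // non-blocking: reply via callback
            },
            {drogon::Get})
        .addListener("0.0.0.0", 8080)
        .run();  // event loops run on the fixed threads
}
```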

### Model features
- llama.cpp server does not support in-flight model loading and unloading
# What an upstream merge from Nitro can add
### What we can add in a "llama drogon" PR
- Non-blocking IO
- Static thread-control model (a fixed number of threads will significantly improve webserver performance)
- A bigger framework and built-in toolset (Drogon) to extend the native C++ server capabilities (one example of the toolset is sketched below)
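
As one illustration of that toolset, Drogon bundles database clients (SQLite3, PostgreSQL, MySQL) behind a single ORM interface. A hypothetical sketch, assuming Drogon was built with SQLite3 support; the `nitro.db` filename and `models` table are made up for illustration:

```cpp
#include <drogon/orm/DbClient.h>
#include <iostream>

int main() {
    // One-connection SQLite3 client; "nitro.db" is an illustrative filename.
    auto client = drogon::orm::DbClient::newSqlite3Client("filename=nitro.db", 1);

    // Synchronous query for brevity; Drogon also offers async variants.
    client->execSqlSync(
        "CREATE TABLE IF NOT EXISTS models (name TEXT, path TEXT)");
    std::cout << "table ready" << std::endl;
}
```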
### What we might consider
- Model management (see the sketch after this list):
  - Model warmup
  - Model loading
  - Model unloading
  - Model list (not yet)
  - Model folder (not yet)
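
A hypothetical sketch of what in-flight loading and unloading could look like on top of llama.cpp's C API (`llama_load_model_from_file` / `llama_free_model`); the `ModelManager` type and locking scheme are assumptions for illustration, not Nitro's actual implementation.

```cpp
#include <llama.h>
#include <mutex>
#include <string>

// Hypothetical manager for in-flight model load/unload.
struct ModelManager {
    std::mutex mu;                 // serializes load/unload against each other
    llama_model *model = nullptr;

    bool load(const std::string &path) {
        std::lock_guard<std::mutex> lock(mu);
        if (model != nullptr) {
            llama_free_model(model);  // unload the previous model first
        }
        llama_model_params params = llama_model_default_params();
        model = llama_load_model_from_file(path.c_str(), params);
        return model != nullptr;
    }

    void unload() {
        std::lock_guard<std::mutex> lock(mu);
        if (model != nullptr) {
            llama_free_model(model);
            model = nullptr;
        }
    }
};
```

A real implementation would also need to drain or block in-flight inference requests before freeing the model; the mutex here only serializes load/unload calls against each other.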
# How it works in the overall landscape
The contribution model will look like the chart below, except that the functionality will follow the exhaustive list above.
