# Cortex Roadmap
**Mission**
> To add intelligence to all hardware, in particular Robots
**Vision**
> To become the go-to tool to put AI on any device
This document highlights where Cortex currently stands in its development, what needs to be addressed (ordered by priority), and what lies ahead in 2025.
## Table of Contents
1. State of Cortex
2. Path to Success
   - Configuration
   - Portability
   - Python Engine
   - Data Management
   - Performance
   - Benchmarks & Metrics
   - Model Management
3. Content
   - Documentation
   - Guides
   - Videos
   - Conferences
4. Next Stage
   - Cortex Platform
     - Functional Requirements
     - Non-Functional Requirements
     - Design
     - Monetization
   - Model Hub
   - Hardware Integration
5. QA
   - Tests
   - CI/CD
   - Hardware Assurance
6. Roadmap & Action Items
   - Milestones
   - TODOs
   - v2
## 1. State of Cortex
The current version of Cortex, `v1.0.10`, allows developers to run LLMs across different platforms after installation, but it falls short of being truly v1-ready.
What's working well
- Cortex has a clean CLI with straightforward commands, taking inspiration from the way Docker manages images and initializes containers.
- The OpenAI-compatible API provides a familiar way to communicate with models, making it easy for even non-developers to start using Cortex with a few commands (see the sketch after this list). :raised_hands:
- The API docs generated upon starting the server are very good. :ok_hand:
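Because the API follows the OpenAI convention, pointing an existing client at Cortex is mostly a matter of swapping the base URL. A minimal sketch, assuming the default host and port from `.cortexrc` (127.0.0.1:39281), the standard `/v1/chat/completions` route, and a model that has already been pulled and started (`<model-id>` is a placeholder):
```sh
# Talk to a locally running Cortex server through its OpenAI-compatible API.
# Host, port, and route are assumptions based on the defaults described in
# this document; replace <model-id> with a model you have already loaded.
curl http://127.0.0.1:39281/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "<model-id>",
        "messages": [{"role": "user", "content": "Hello from Cortex!"}]
      }'
```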
What needs improvement
- Cortex doesn't provide a complete way to customize the `.cortexrc` file it creates in the home directory of its host. This leads to having to manually tweak settings like `apiServerHost:` to `0.0.0.0` in order to self-host Cortex on a remote machine (a minimal workaround is sketched after this list).
- Not enough educational materials like videos, tutorials and guides. At the moment, users need to know, or have an idea of, what they want to do when they come to Cortex.
- The distribution of Cortex is too manual. Users need to go to the documentation and copy a command in order to use it.
- The multi-branch approach doesn't seem to offer a lot of benefits on top of the one-branch-to-rule-them-all approach. It would be beneficial to A/B test this hypothesis inside the `cortexso` hub to make sure we want to continue investing time in this approach.
- `llama.cpp`, our main engine, comes from a repo called `cortex.llamacpp` adding complexity into the codebase.
- Interactions with a model are stateless.
- It is not possible to use different modalities at the moment.
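Until the configuration work described later lands, the `apiServerHost` limitation above has only a manual workaround: editing `.cortexrc` by hand. A minimal sketch, assuming the file sits in the home directory as described above:
```sh
# Manual workaround (assumes ~/.cortexrc as described above): bind the API
# server to all interfaces so it can be reached from another machine.
sed -i 's/^apiServerHost: .*/apiServerHost: 0.0.0.0/' ~/.cortexrc
# Restart the Cortex server afterwards for the change to take effect.
```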
Competitors
- Ollama - They have good momentum, to the point that users do the marketing for them. This is a great place to be in.
- [ZML](https://github.com/zml/zml) - uses a granular inference pipeline where the model's forward computation is compiled into an accelerator-specific executable, taking advantage of type-safe tensor constructs and explicit buffer management to minimize overhead and give fine-grained control over memory and compute operations. The video below by one of the founders is quite good.
{%youtube hLHITkWb77s %}
## 2. Path to Success
The following path is meant to serve as a tentative blueprint to make Cortex's internals achieve a high degree of reliability, flexibility, and usability in 2025 alongside good distribution.
### Configuration
At the moment, Cortex has minimal support for editing its own configuration via the CLI or HTTP. This makes it challenging for developers wanting to deploy it on a VM in the cloud, or in an environment where the ultimate goal is to let Cortex talk to other tools via the server.
The first stage in making Cortex more configurable would involve adding a CLI flag for most of the options in the `.cortexrc` file. At the moment, the generated `.cortexrc` file contains the following parameters.
```
logFolderPath: /home/user/cortexcpp
logLlamaCppPath: ./logs/cortex.log
logTensorrtLLMPath: ./logs/cortex.log
logOnnxPath: ./logs/cortex.log
dataFolderPath: /home/user/cortexcpp
maxLogLines: 100000
apiServerHost: 127.0.0.1
apiServerPort: 39281
checkedForUpdateAt: 1740630061
checkedForLlamacppUpdateAt: 1740628158149
latestRelease: v1.0.10
latestLlamacppRelease: v0.1.49
huggingFaceToken: <redacted>
gitHubUserAgent: ""
gitHubToken: ""
llamacppVariant: linux-amd64-avx2-cuda-12-0
llamacppVersion: v0.1.49
enableCors: true
allowedOrigins:
- http://localhost:39281
- http://127.0.0.1:39281
- http://0.0.0.0:39281
proxyUrl: ""
verifyProxySsl: true
verifyProxyHostSsl: true
proxyUsername: ""
proxyPassword: ""
noProxy: example.com,::1,localhost,127.0.0.1
verifyPeerSsl: true
verifyHostSsl: true
sslCertPath: ""
sslKeyPath: ""
supportedEngines:
- llama-cpp
- onnxruntime
- tensorrt-llm
- python-engine
- python
checkedForSyncHubAt: 0
```
To start a server, we currently only offer three options:
```shell
cortex start --port 7777 --loglevel DBUG --help
```
The most crucial option needed in early 2025 is undoubtedly `apiServerHost`, to be able to deploy Cortex on a remote VM. Ideally, we would provide users with the full menu of options to start the Cortex server with different configurations. For example:
First, a little abstraction for better DX:
- `logFolderPath` --> `--logspath </path/to/nirvana>`
- `logLlamaCppPath` --> `--logsllama </path/to/llamaland>`
- `logTensorrtLLMPath` --> Needs to be removed
- `logOnnxPath` --> `--logsonnx </path/to/devsdevsdevs>`
- `dataFolderPath` --> `--datapath </path/to/dataland>`
- `maxLogLines` --> `--loglines <100000>`
- `apiServerHost` --> `--host <0.0.0.0>`
- `apiServerPort` --> `--port <7777>`
- `checkedForUpdateAt` --> ... Not needed to start the server
- `checkedForLlamacppUpdateAt` --> ... Not needed to start the server
- `latestRelease` --> ... Not needed to start the server
- `latestLlamacppRelease` --> ... Not needed to start the server
- `huggingFaceToken` --> `--hf-token <token>`
- `gitHubUserAgent` --> `--gh-agent <that-thing>`
- `gitHubToken` --> `--gh-token <that-token>`
- `llamacppVariant` --> ... Not needed to start the server
- `llamacppVersion` --> ... Not needed to start the server
- `enableCors` --> `--cors 1` (1 = true & 0 = false)
- `allowedOrigins` --> `--origins <list of origins>`
- `proxyUrl` --> `--proxy-url "https://hey.you"`
- `verifyProxySsl` --> `--verify-proxy`
- `verifyProxyHostSsl` --> `--verify-proxy-host`
- `proxyUsername` --> `--proxy-username`
- `proxyPassword` --> `--proxy-password`
- `noProxy` --> `--no-proxy "example.com,::1,localhost,127.0.0.1"`
- `verifyPeerSsl` --> `--verify-ssl-peer`
- `verifyHostSsl` --> `--verify-ssl-host`
- `sslCertPath` --> `--ssl-cert-path`
- `sslKeyPath` --> `--ssl-key-path`
- `supportedEngines` --> ... Not needed to start the server
- `checkedForSyncHubAt` --> ... Not needed to start the server
```sh
cortex start --host "0.0.0.0" \
--port 7777 \
--hf-token "<some-token>" \
--cors 1 \
--logspath "/some/interesting/path" \
...
```
The second stage would involve allowing Cortex to live as a long-running process within the system in which it is installed, using `systemd` or whatever might be available on the user's device. We could implement this in different ways. Here is one example:
```sh
sudo touch /etc/systemd/system/cortex.service
sudo chmod 664 /etc/systemd/system/cortex.service
```
In the `cortex.service` file we would include:
```txt
[Unit]
Description=Cortex
[Service]
ExecStart=/usr/path/to/cortex/binary start
[Install]
WantedBy=multi-user.target
```
Then we can reload `systemctl` with:
```sh
sudo systemctl daemon-reload
```
And operate on the service as a long-running process:
```shell
sudo systemctl start cortex.service
sudo systemctl stop cortex.service
sudo systemctl restart cortex.service
sudo systemctl enable cortex.service
systemctl status cortex.service
```
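Since systemd captures the unit's stdout and stderr in the journal, this setup also gives us a basic log view for free:
```sh
# Follow the service logs live, or look at today's entries only.
journalctl -u cortex.service -f
journalctl -u cortex.service --since today
```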
### Portability
Portability here means making Cortex more accessible via package managers or other distribution channels that might be more appropriate for different hardware. For example, users targeting tiny devices or micro-controllers might opt for Alpine Linux, as it weighs on average 30-50 MB. One pathway to get Cortex installed would be via the respective package manager of each platform; in the case of Alpine, that would be through the `apk` package manager.
Ideally, we would add the required workflows to our CI to distribute Cortex via the channels below (a sketch of the resulting install commands follows the list):
Mac
- Homebrew - `brew`
- Nix - `nix`
- MacPorts - `port`
Windows
- Chocolatey - `choco`
- Scoop
- Winget
Linux
- Alpine - `apk`
- Arch - `pacman` or `yay` / `paru`
- Fedora - `dnf`
- Debian - `apt` or `apt-get`
- NixOS - `nix`
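None of these packages exist yet, but the target experience is the usual one-liner per platform. A sketch of what that could look like, with `cortexcpp` as a placeholder package name:
```sh
# Hypothetical install commands once packages are published; the package
# name "cortexcpp" is a placeholder and may differ per repository.
brew install cortexcpp      # macOS (Homebrew)
choco install cortexcpp     # Windows (Chocolatey)
apk add cortexcpp           # Alpine Linux
pacman -S cortexcpp         # Arch (or via an AUR helper such as yay/paru)
dnf install cortexcpp       # Fedora
apt install cortexcpp       # Debian / Ubuntu
```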
In addition, we would provide Docker images with Cortex installed in different OS environments (a usage sketch follows the list), for example:
- `menloltd/cortex-ubuntu:latest`
- `menloltd/cortex-ubuntu-nogpu:latest`
- `menloltd/cortex-arch:latest`
- `menloltd/cortex-fedora:latest`
- `menloltd/cortex-alpine:latest`
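As a rough sketch of how one of these images might be used, assuming the container exposes the default API port from `.cortexrc` and persists its data folder in a volume (image name, port, and paths are all proposals, not published artifacts):
```sh
# Hypothetical usage of a proposed image; tag, port mapping, and volume
# layout are assumptions until the images actually exist.
docker run -d --name cortex \
  -p 39281:39281 \
  -v cortex-data:/root/cortexcpp \
  menloltd/cortex-ubuntu:latest
```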
### Flexibility via the Python Engine
The Python engine will provide flexibility in different ways, the main two being development velocity and the library ecosystem. In addition, the Python engine would provide **the ability to serve models of different modalities** like image, audio, video, robotics actions, and so on, the **ability to serve unquantized models** if the user desires it, and the **ability to offer additional services** on top of it. For example:
**Custom Tools**: Users might want to create tools that interact with their deployed models. These might include bespoke benchmarking tools, metrics, and so on.
**Fine-tuning**:
- On-device fine-tuning could happen as follows:
  - The user sends a copy of the training file and runs the fine-tuning step on the device.
  - If a Cortex server sits on a central Menlo or Raspberry Pi providing intelligence to other devices where the data is being collected, copies of the model could be sent to those devices for fine-tuning, and the weights would be sent back to the main device for integration.
- Via their own cloud
- Menlo Cloud(?)
### Data Management
Single-node applications might not have access to the internet for a while, or may work under networks with limited bandwidth. This means that providing a way to save interactions with the model, or to save logs and other kinds of metadata, would enable different use cases that users would appreciate and that we can add functionality on top of, for example, the fine-tuning option previously mentioned.
**Data Saved to the Database**
The `cortexcpp` directory we create when Cortex is installed on a system contains a `cortex.db` SQLite database. This database could be leveraged to store interactions with different models and provide capabilities such as:
- **Long-term memory**: similar to what systems like mem0 and MemGPT offer.
**Logs**
We do collect a limited amount of logs from both the server and the CLI, but there is a lot of room for improvement. We could provide different views into the logs via the CLI, for example, a simple TUI that pops up with `cortex view logs`, allowing you to scan the different information available in them. Something similar to the [tui-logger crate](https://github.com/gin66/tui-logger) below:

**Metadata**
Information coming from the model providers, highly customized models, metrics, benchmarks, and more can all be considered metadata. Organizing these in easily accessible tables inside `cortex.db` would add a delightful touch many tools lack.
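To make this concrete, here is a sketch of what adding an interactions table to the existing database could look like. The table name and columns are hypothetical; only the database location follows the default `dataFolderPath` shown earlier:
```sh
# Hypothetical schema for storing chat interactions in the existing cortex.db
# (path follows the default dataFolderPath shown earlier in this document).
sqlite3 ~/cortexcpp/cortex.db <<'SQL'
CREATE TABLE IF NOT EXISTS interactions (
  id         INTEGER PRIMARY KEY AUTOINCREMENT,
  model      TEXT NOT NULL,
  role       TEXT NOT NULL,      -- "user" or "assistant"
  content    TEXT NOT NULL,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
SQL
```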
### Performance
- Ability to run models on NVIDIA and non-NVIDIA GPUs.
- Ability to run efficiently on CPU-only architectures.
### Benchmarks & Metrics
Being able to provide benchmark data on a per-model and per-hardware basis would serve the following purposes:
- It will provide developers with useful information regarding their model and hardware of interest.
- It will help populate the Model Hub's **BenchCards**, which are a sister to the Model Cards provided on the HuggingFace Hub.
At the moment, `robobench` provides a suite of benchmarks for Cortex across seven areas:
1. Model Initialization - tracks the model's startup performance:
- Disk to RAM loading time :heavy_check_mark:
- Cold vs warm start times :heavy_check_mark:
- Model switching overhead :heavy_check_mark:
- Memory spike during initialization :heavy_check_mark:
- Multi-GPU loading efficiency (when available) :x: (not thoroughly tested)
- Initial memory footprint :heavy_check_mark:
2. Runtime Performance - measures inference capabilities:
- Time to first token (latency) :heavy_check_mark:
- Tokens per second (throughput) :heavy_check_mark:
- Token generation consistency :x: (not thoroughly tested)
- Streaming performance :x: (not thoroughly tested)
- Response quality vs speed tradeoffs :x: (not thoroughly tested)
- Context window utilization :heavy_check_mark:
- KV cache efficiency (Somewhat useful)
- Memory usage per token :heavy_check_mark:
- Batch processing efficiency :x: (not thoroughly tested)
3. Resource Utilization - monitors system resource usage:
- Memory management patterns
  - Peak usage
  - Growth patterns
  - Cache efficiency
  - Fragmentation
- Hardware utilization
  - CPU core scaling
  - GPU memory bandwidth
  - PCIe bandwidth
  - Temperature impacts
  - Power consumption
4. Advanced Processing - evaluates complex scenarios:
- Multi-model GPU sharing :pray: ideal
- Layer allocation efficiency :pray: ideal
- Inter-model interference :pray: ideal (pipeline setting with more than one model loaded)
- Memory sharing effectiveness
- Multi-user performance
- Request queuing behavior
- Resource contention handling
5. Workload Performance - tests different scenarios:
- Short vs long prompt handling
- Code generation performance
- Mathematical computation speed
- Multi-language capabilities
- System prompt impact
- Mixed workload handling :pray: ideal (very cool with multiple models of different modalities loaded)
- Session management :pray: ideal
- Error recovery :pray: ideal (similar to the recovery behavior in section 7, but with a different level of detail)
6. System Integration - measures API and system performance:
- API latency :heavy_check_mark:
- Bandwidth utilization :pray: ideal
- Connection management :pray: ideal
- WebSocket performance :pray: ideal
- Request queue behavior :pray: ideal
- Inter-process communication :pray: ideal
- Monitoring overhead :thinking_face: maybe
7. Reliability and Stability - tracks long-term performance:
- Performance degradation patterns :pray: ideal
- Memory leak detection :pray: ideal
- Error rates and types :pray: ideal
- Recovery behavior
- Thermal throttling impact :pray: ideal
- System stability under load :pray: ideal
**Usage and Output**
Robobench provides these metrics via a simple CLI:
```bash
# Basic benchmark
robobench "model-name:quantization"
# Specific benchmark type
robobench "model-name:quantization" --type runtime
# Extended stability test
robobench "model-name:quantization" --type stability --duration 24
```
Results are displayed in clear, formatted tables and can be exported to JSON as well.
The ideal pipeline would be that when these benchmarks are run, either through CI or a separate workflow, the results would be fed directly into the new Cortex Model Hub, populating its BenchCards.
What we are currently not including is information regarding the open benchmarks like MMLU, SWE, MATH, and so on.
### Model Management
At the moment, model management is minimal. Candidate improvements include:
- Hardware-based suggestion of models upon installation
- Model merging capabilities
## 3. Content
### Documentation
Cortex will continue to grow and include features that will not be available in previous versions or that might be removed at a later time. Because of this, we want to improve our documentation strategy and include:
**Different versions:** This means keeping documentation for the last 3 to 5 versions so developers can find the docs that match the version they are running.
If Cortex will be used on edge devices, or in situations where the internet is flaky or not available, we should assume developers and companies using Cortex won't be able to live on the bleeding edge of our software. Therefore, it is important that we give them the appropriate documentation for their version as things change.
Some examples:
The Zig Programming Language

Pydantic

**Feedback Widgets**
As our software matures, it would be useful to passively get feedback on how things are going for our users. A nice way of doing this is via widgets at the bottom of the pages in our docs. OrbitCSS does this quite nicely.

**Chatbot or Search & Ask functionality similar to Claire**
The chatbot piece might be overkill, but a nice search-and-chat bar could be quite useful. I find that the way Drizzle did this, by separating the two, is quite nice.

Their search bar is powered by Algolia but the chatbot is a separate widget powered by [inkeep](https://inkeep.com).

### Guides
For developers and companies to adopt Cortex we need to show them what they can do with it. That means creating examples running models via Cortex on Raspberry Pis, Orange Pis, Arduinos, and others with practical and/or cool use cases.
**Examples**

- Smart Camera --> adjust lenses, increase accuracy, adjust for movement, detect depth, etc., all via a model
- Mobile phones --> fine-tune a model on-device to better match the user's behavior when using their phone.
- Personal Laptop --> Offline use cases.
- Grocery cart --> a small device to detect items in the cart.
- Support animal --> a device that detects the environment and helps the dog take better care of the person.
### Videos
Video tutorials are key for showing developers how to use our software, troubleshooting different situations that might arise, and increasing engagement.
The lineup for videos this year includes at least one a month on our YouTube channel covering the following topics.
- Introduction to Cortex
- Use Cases on Top of Cortex
  - Structured Outputs
  - Guardrails
  - Tool-calling
  - MCP
  - ...
- How-to
  - Deploy on a Raspberry Pi
  - Deploy on an Orange Pi
  - Deploy on an Arduino
  - ...
- Practical & Fun Examples
  - Smart Home use cases
  - Airplane coding
  - ...
### Conferences
Conferences represent the perfect environment to connect with developers and potential users of our software. In addition, they let you see in real time what is working well and what isn't, and take notes to iterate faster.
Some conferences we might want to attend:
- OSS
  - Open Source Summit Japan
- Python
  - PyCon
  - PyData
  - SciPy
  - [EuroSciPy](https://euroscipy.org/2025/)
  - EuroPython
- C++
  - C++ Now
  - CppCon
  - Cpp North
- AI
  - AI_dev
- Science
  - ODSC
  - Sci
Meetup events can often be mini-conferences in and of themselves, and while these are more location-dependent, they are a good opportunity to get the wider Menlo team involved in community-related activities.
## 4. Next Stage
The next stage of Cortex involves thinking about the future and **making it a sustainable product**. This means monetizing the value it adds to teams at companies and corporations while keeping the core of it available in OSS form to users. We want to provide a batteries-included tool with everything a team would want to build around Cortex if **the NeoCortex Platform** didn't exist. Here are some ideas.
### Cortex Platform
> **NeoCortex**
or simply
> **Cortex Platform**

The Cortex Platform would be the management hub for the deployment of Cortex into multiple devices. It would provide users with a way to oversee, test, and manage their deployed instances of Cortex in devices such as Menlo Pi, Raspberry Pi, Orange Pi, and more, using an intuitive and extensible user interface.
The following two sections describe the dish we offer our patrons, what we do in the kitchen to make the dish, and how it should look when it reaches our patrons' plates.
#### Functional Requirements
Control instances of Cortex on different platforms.

Connections could be managed via SSH and secret access keys. Menlo Pis, for example, would provide a quick and straightforward experience for connecting to the Cortex Platform. Other platforms with an OS could use hardware-specific Docker images, the OS package manager, or another method to provide access to Cortex on their bespoke hardware.
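A minimal sketch of what that SSH path could look like in practice; the hostname, key, and the exact commands the platform would run are illustrative:
```sh
# Illustrative only: reach a device with a provisioned key and check on the
# Cortex service plus whichever models are currently loaded.
ssh -i ~/.ssh/neocortex_device_key pi@menlo-pi.local \
  'systemctl status cortex.service --no-pager && cortex ps'
```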
The platform would allow users to load and unload, test, quantize, and benchmark models. In addition, it would provide data management and sync capabilities, as well as the ability to:
- Benchmark hardware
- Track metrics
- Troubleshoot via SSH
- Fine-tune models
  - on-device
  - on-prem
  - in the cloud
  - in further customized setups using the Cortex Platform
- Sync layer
- Hardware-model visualization
- Model merging
#### Non-Functional Requirements
- Note-taking capabilities
- Collaborate with team members(?)
- Comment on a deployment
- Start a thread
- Notifications via Slack or Discord regarding a deployment
#### Design





#### Monetization :moneybag:
Revenue will flow in through different tiers of NeoCortex, but there are unexplored avenues in this document that could prove quite lucrative, for example, bespoke contracts to set up Menlo Pis, Cortex, and NeoCortex, or engagements with institutions.
**Free Tier**
NeoCortex will be free to download but with limited features from the get-go. As developers or teams move into different tiers, they would be able to access more and more functionality, individually or for their team.
**Indie Developer Tier**
The Indie Developer Tier will be a step up from the Free Tier and include model-hardware visualization, a nicer logs view, and potentially something else.
This could cost (in USD)
- $20/month
- $200/year (save $40)
**Teams**
This would include everything in the Indie Developer Tier plus the ability to fine-tune on-device, collaboration features (comments, reports, sharing dashboards, inviting viewers), and a sync layer between deployed instances of Cortex and their desired DB.
This could cost (in USD)
- Flat monthly fee of $100
- $20/user
**Enterprise**
Everything in Teams plus Service Level Agreements, early access to new features, and more.
This could cost (in USD)
- Flat monthly fee of $1000
- $20/user
**Bespoke Engagement**
These could include:
- Hardware-software setup
- Model merging
- Model fine-tuning
- Consulting on how to extract the most out of Cortex
### Model Hub
We want to provide users with a good overview of how models perform on different hardware. In order to do this, we will revamp the Model Hub and add our own flavor of Model Cards called BenchCards.
For starters, the hub will provide high-level details on each model via a quick drop-down as follows:


The model card would look somewhat like this.

### Hardware Integration
## 5. QA
### Tests
The current test suite does not l
### CI/CD
### Hardware Assurance
## 6. Roadmap & Action Items
Individual-level focus
Ramon
- Create content
- `robobench` - which will feed information to the new Model Hub
- Polish Model-Hardware Visualization
- NeoCortex
  - Design
  - Initial prototype in Tauri
Thien
- Python Engine
Harry
- Improve the testing suite and CI/CD pipelines of Cortex (if you don't have a lot of experience with C++, you can create tests in your favorite language and convert it to C++ [with this tool](https://codingfleet.com/code-converter/python/))
Sang
- Improve the configuration capabilities of Cortex
Akarsham
- Enable Cortex to run models on different GPUs
- Make Cortex go Brrr on CPU
Minh
- Improve the Cortex distribution mechanism, i.e. make it installable via the package managers mentioned in the sections above.
### Milestones
```mermaid
gantt
title Milestones
dateFormat YYYY-MM-DD
axisFormat %d-%m
excludes weekends
section M1-Flexibility
Python Engine :a1, 2025-03-04, 30d
Intel GPU :a2, after a1, 20d
AMD GPU :a3, after a2, 20d
Other (Web?)GPU :after a3, 20d
section M2-Distribution
Arch - AUR :2025-03-10, 2d
Debian - APT :2025-03-12, 2d
Fedora - dnf :2025-03-17, 2d
Alpine - apk :2025-03-19, 2d
NixOS/MacOS - Nix :2025-03-24, 2d
Win - chocolatey :2025-03-26, 2d
Win - Scoop :2025-03-31, 2d
Mac - Homebrew :2025-04-02, 2d
Docker Img Variants :2025-04-04, 21d
Tutorials :2025-03-05, 14d
Guides :2025-03-14, 21d
New Docs Site :2025-03-26, 21d
New Model Hub :after b1, 21d
section M3-Performance
Benchmarks :b1, 2025-03-12, 10d
Metrics :b2, 2025-03-12, 24d
Robust Testing :after b2, 24d
section M4-Monetization
Cortex Platform Prototype :2025-04-15, 60d
```
#### Flexibility
- Run models in
  - different formats
  - quantized and non-quantized form
  - different quantization methods
- Run on different platforms
- Run on different GPUs
- Provide a straightforward memory layer (to start) via SQLite or a similar DB
- Provide SDKs that go beyond talking to a model via the OpenAI SDK
#### Distribution
- Make Cortex accessible via package managers
- Make distinct Docker images
- Create better documentation with
  - guides on different hardware
  - tutorials on how to do X with Cortex
#### Performance
- Make it fast
- Make it small
- Enable metrics
- Give users peace of mind with increased testing
#### Monetization
- Launch an alpha version of the Cortex Platform
### v2
Nvidia alternative
Wrap all ideas into Cortex Enterprise?
+
What are the Unix-like libraries that can come together into the main solution?
Cleaning up the GitHub project
How do we organize the team to tackle the whole of Cortex?
Cortex positioning for now: Firmware on top of Hardware
