Documentation - API Reference - Changelog - Bug reports - Discord
⚠️ Cortex is currently in development: expect breaking changes and bugs!
Cortex is a C++ AI engine that comes with a Docker-like command-line interface and client libraries. It supports running AI models using the ONNX, TensorRT-LLM, and llama.cpp engines. Cortex can function as a standalone server or be integrated as a library.
Cortex supports the following engines:

- `cortex.llamacpp`: a C++ inference library that can be dynamically loaded by any server at runtime. We use this engine to support GGUF models; llama.cpp is optimized for performance on both CPU and GPU.
- `cortex.onnx`: a C++ inference library for Windows that leverages onnxruntime-genai and uses DirectML to provide GPU acceleration across a wide range of hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs.
- `cortex.tensorrt-llm`: a C++ inference library designed for NVIDIA GPUs. It incorporates NVIDIA's TensorRT-LLM for GPU-accelerated inference.
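Engines can be listed, installed, and inspected from the CLI (see the CLI reference below); a minimal sketch, assuming the engine names above are the values accepted by the `engines` subcommands:

```bash
# List the engines Cortex knows about
cortex engines list

# Install an additional engine (e.g. TensorRT-LLM on a machine with an NVIDIA GPU)
cortex engines install cortex.tensorrt-llm

# Inspect a single engine's configuration
cortex engines get cortex.llamacpp
```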
```bash
# macOS
brew install cortex-engine

# Windows
winget install cortex-engine

# Linux (Debian/Ubuntu)
sudo apt install cortex-engine
```

Other installation methods: Coming Soon!
To install Cortex from source, follow the steps below:

1. Clone the Cortex repository here.
2. Navigate to the `cortex-js` folder.
3. Open the terminal and run the following command to build the Cortex project:

   ```bash
   npx nest build
   ```

4. Make the `command.js` executable:

   ```bash
   chmod +x '[path-to]/cortex/cortex-js/dist/src/command.js'
   ```

5. Link the package globally:

   ```bash
   npm link
   ```
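After linking, the `cortex` command should resolve on your `PATH`; a quick way to check the build, assuming that (as in the quickstart below) running `cortex` with no arguments starts the server:

```bash
# Verify the linked CLI is on PATH
which cortex

# Start the server to confirm the build works (Ctrl+C to stop)
cortex
```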
To run and chat with a model in Cortex:

```bash
# Start the Cortex server
cortex

# Start a model
cortex run [model_id]

# Chat with a model
cortex chat [model_id]
```
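As a concrete end-to-end example, using one of the smaller models from the tables below (this assumes `cortex run` downloads the model on first use; otherwise fetch it first with `cortex pull`):

```bash
# Download (if needed), start, and chat with TinyLlama (1.1B, GGUF)
cortex run tinyllama:1b-gguf
cortex chat tinyllama:1b-gguf "Hello"
```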
Cortex supports a list of models available on Cortex Hub. Here are examples of models that you can use with each supported engine:

`cortex.llamacpp` (GGUF) models:

Model ID | Variant (Branch) | Model size | CLI command |
---|---|---|---|
codestral | 22b-gguf | 22B | cortex run codestral:22b-gguf |
command-r | 35b-gguf | 35B | cortex run command-r:35b-gguf |
gemma | 7b-gguf | 7B | cortex run gemma:7b-gguf |
llama3 | gguf | 8B | cortex run llama3:gguf |
llama3.1 | gguf | 8B | cortex run llama3.1:gguf |
mistral | 7b-gguf | 7B | cortex run mistral:7b-gguf |
mixtral | 7x8b-gguf | 46.7B | cortex run mixtral:7x8b-gguf |
openhermes-2.5 | 7b-gguf | 7B | cortex run openhermes-2.5:7b-gguf |
phi3 | medium-gguf | 14B - 4k ctx len | cortex run phi3:medium-gguf |
phi3 | mini-gguf | 3.82B - 4k ctx len | cortex run phi3:mini-gguf |
qwen2 | 7b-gguf | 7B | cortex run qwen2:7b-gguf |
tinyllama | 1b-gguf | 1.1B | cortex run tinyllama:1b-gguf |

`cortex.onnx` models:

Model ID | Variant (Branch) | Model size | CLI command |
---|---|---|---|
gemma | 7b-onnx | 7B | cortex run gemma:7b-onnx |
llama3 | onnx | 8B | cortex run llama3:onnx |
mistral | 7b-onnx | 7B | cortex run mistral:7b-onnx |
openhermes-2.5 | 7b-onnx | 7B | cortex run openhermes-2.5:7b-onnx |
phi3 | mini-onnx | 3.82B - 4k ctx len | cortex run phi3:mini-onnx |
phi3 | medium-onnx | 14B - 4k ctx len | cortex run phi3:medium-onnx |

`cortex.tensorrt-llm` models:

Model ID | Variant (Branch) | Model size | CLI command |
---|---|---|---|
llama3 | 8b-tensorrt-llm-windows-ampere | 8B | cortex run llama3:8b-tensorrt-llm-windows-ampere |
llama3 | 8b-tensorrt-llm-linux-ampere | 8B | cortex run llama3:8b-tensorrt-llm-linux-ampere |
llama3 | 8b-tensorrt-llm-linux-ada | 8B | cortex run llama3:8b-tensorrt-llm-linux-ada |
llama3 | 8b-tensorrt-llm-windows-ada | 8B | cortex run llama3:8b-tensorrt-llm-windows-ada |
mistral | 7b-tensorrt-llm-linux-ampere | 7B | cortex run mistral:7b-tensorrt-llm-linux-ampere |
mistral | 7b-tensorrt-llm-windows-ampere | 7B | cortex run mistral:7b-tensorrt-llm-windows-ampere |
mistral | 7b-tensorrt-llm-linux-ada | 7B | cortex run mistral:7b-tensorrt-llm-linux-ada |
mistral | 7b-tensorrt-llm-windows-ada | 7B | cortex run mistral:7b-tensorrt-llm-windows-ada |
openhermes-2.5 | 7b-tensorrt-llm-windows-ampere | 7B | cortex run openhermes-2.5:7b-tensorrt-llm-windows-ampere |
openhermes-2.5 | 7b-tensorrt-llm-windows-ada | 7B | cortex run openhermes-2.5:7b-tensorrt-llm-windows-ada |
openhermes-2.5 | 7b-tensorrt-llm-linux-ada | 7B | cortex run openhermes-2.5:7b-tensorrt-llm-linux-ada |
Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 14B models, and 32 GB to run the 32B models.
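The variant (branch) after the colon selects both the build and the engine it runs on, so the same model ID can target different engines:

```bash
# Same model, different engines: the branch after ':' picks the build
cortex run llama3:gguf                          # cortex.llamacpp (GGUF)
cortex run llama3:onnx                          # cortex.onnx (Windows / DirectML)
cortex run llama3:8b-tensorrt-llm-linux-ampere  # cortex.tensorrt-llm (NVIDIA Ampere, Linux)
```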
Note: For more detailed CLI reference documentation, please see here.
- `cortex`
- `cortex chat [options] [model_id] [message]`
- `cortex embeddings [options] [model_id] [message]`
- `cortex pull <model_id>` (this command can also pull models from Hugging Face)
- `cortex run [options] [model_id]:[engine]`
- `cortex models get <model_id>`
- `cortex models list [options]`
- `cortex models remove <model_id>`
- `cortex models start [model_id]`
- `cortex models stop <model_id>`
- `cortex models update [options] <model_id>`
- `cortex engines get <engine_name>`
- `cortex engines install <engine_name> [options]`
- `cortex engines list [options]`
- `cortex engines set <engine_name> <config> <value>`
- `cortex ps`
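Taken together, a typical CLI session pulls a model, starts it, chats, then inspects and stops it; a sketch using a model ID from the tables above:

```bash
cortex pull mistral:7b-gguf            # download the model
cortex models start mistral:7b-gguf    # load it
cortex ps                              # show running models
cortex chat mistral:7b-gguf "Write a haiku about GPUs"
cortex models stop mistral:7b-gguf     # unload it
```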
Cortex has a REST API that runs at `localhost:1337`.
```bash
curl --request POST \
  --url http://localhost:1337/v1/models/{model_id}/pull
```
```bash
curl --request POST \
  --url http://localhost:1337/v1/models/{model_id}/start \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt_template": "system\n{system_message}\nuser\n{prompt}\nassistant",
    "stop": [],
    "ngl": 4096,
    "ctx_len": 4096,
    "cpu_threads": 10,
    "n_batch": 2048,
    "caching_enabled": true,
    "grp_attn_n": 1,
    "grp_attn_w": 512,
    "mlock": false,
    "flash_attn": true,
    "cache_type": "f16",
    "use_mmap": true,
    "engine": "cortex.llamacpp"
  }'
```
```bash
curl http://localhost:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [
      {
        "role": "user",
        "content": "Hello"
      }
    ],
    "stream": true,
    "max_tokens": 1,
    "stop": [
      null
    ],
    "frequency_penalty": 1,
    "presence_penalty": 1,
    "temperature": 1,
    "top_p": 1
  }'
```
```bash
curl --request POST \
  --url http://localhost:1337/v1/models/mistral/stop
```
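The same lifecycle can be driven entirely over the REST API. A sketch chaining the endpoints shown above for the `mistral` model; the minimal start payload (just the `engine` field) and the non-streaming `"stream": false` flag are assumptions, not confirmed defaults:

```bash
#!/usr/bin/env bash
BASE=http://localhost:1337/v1

# 1. Download the model
curl --request POST --url "$BASE/models/mistral/pull"

# 2. Load it (assumption: omitted fields fall back to server defaults)
curl --request POST --url "$BASE/models/mistral/start" \
  --header 'Content-Type: application/json' \
  --data '{"engine": "cortex.llamacpp"}'

# 3. Ask for a single, non-streaming chat completion
curl "$BASE/chat/completions" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "mistral",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'

# 4. Unload the model
curl --request POST --url "$BASE/models/mistral/stop"
```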
Note: Check our API documentation for a full list of available endpoints.
- For support, please file a GitHub ticket.
- For questions, join our Discord here.
- For long-form inquiries, please email hello@jan.ai.