
Cortex


Documentation - API Reference - Changelog - Bug reports - Discord

⚠️ Cortex is currently in Development: Expect breaking changes and bugs!

About

Cortex is a C++ AI engine that comes with a Docker-like command-line interface and client libraries. It supports running AI models using ONNX, TensorRT-LLM, and llama.cpp engines. Cortex can function as a standalone server or be integrated as a library.

Cortex Engines

Cortex supports the following engines:

  • cortex.llamacpp: a C++ inference library that can be dynamically loaded by any server at runtime. Cortex uses this engine for inference with GGUF models; llama.cpp is optimized for performance on both CPU and GPU.
  • cortex.onnx: a C++ inference library for Windows that leverages onnxruntime-genai and uses DirectML to provide GPU acceleration across a wide range of hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs.
  • cortex.tensorrt-llm: a C++ inference library designed for NVIDIA GPUs. It incorporates NVIDIA's TensorRT-LLM for GPU-accelerated inference.

Installation

macOS

brew install cortex-engine

Windows

winget install cortex-engine

Linux

sudo apt install cortex-engine

Docker

Coming Soon!

Libraries

Build from Source

To install Cortex from source, follow the steps below:

  1. Clone the Cortex repository.
  2. Navigate to the cortex-js folder.
  3. Open a terminal and run the following command to build the Cortex project:
npx nest build
  4. Make command.js executable:
chmod +x '[path-to]/cortex/cortex-js/dist/src/command.js'
  5. Link the package globally:
npm link

Quickstart

To run and chat with a model in Cortex:

# Start the Cortex server
cortex

# Start a model
cortex run [model_id]

# Chat with a model
cortex chat [model_id]

Model Library

Cortex supports a list of models available on Cortex Hub.

Here are examples of models you can use with each supported engine:

llama.cpp

| Model ID | Variant (Branch) | Model size | CLI command |
|---|---|---|---|
| codestral | 22b-gguf | 22B | cortex run codestral:22b-gguf |
| command-r | 35b-gguf | 35B | cortex run command-r:35b-gguf |
| gemma | 7b-gguf | 7B | cortex run gemma:7b-gguf |
| llama3 | gguf | 8B | cortex run llama3:gguf |
| llama3.1 | gguf | 8B | cortex run llama3.1:gguf |
| mistral | 7b-gguf | 7B | cortex run mistral:7b-gguf |
| mixtral | 7x8b-gguf | 46.7B | cortex run mixtral:7x8b-gguf |
| openhermes-2.5 | 7b-gguf | 7B | cortex run openhermes-2.5:7b-gguf |
| phi3 | medium-gguf | 14B - 4k ctx len | cortex run phi3:medium-gguf |
| phi3 | mini-gguf | 3.82B - 4k ctx len | cortex run phi3:mini-gguf |
| qwen2 | 7b-gguf | 7B | cortex run qwen2:7b-gguf |
| tinyllama | 1b-gguf | 1.1B | cortex run tinyllama:1b-gguf |

ONNX

| Model ID | Variant (Branch) | Model size | CLI command |
|---|---|---|---|
| gemma | 7b-onnx | 7B | cortex run gemma:7b-onnx |
| llama3 | onnx | 8B | cortex run llama3:onnx |
| mistral | 7b-onnx | 7B | cortex run mistral:7b-onnx |
| openhermes-2.5 | 7b-onnx | 7B | cortex run openhermes-2.5:7b-onnx |
| phi3 | mini-onnx | 3.82B - 4k ctx len | cortex run phi3:mini-onnx |
| phi3 | medium-onnx | 14B - 4k ctx len | cortex run phi3:medium-onnx |

TensorRT-LLM

| Model ID | Variant (Branch) | Model size | CLI command |
|---|---|---|---|
| llama3 | 8b-tensorrt-llm-windows-ampere | 8B | cortex run llama3:8b-tensorrt-llm-windows-ampere |
| llama3 | 8b-tensorrt-llm-linux-ampere | 8B | cortex run llama3:8b-tensorrt-llm-linux-ampere |
| llama3 | 8b-tensorrt-llm-linux-ada | 8B | cortex run llama3:8b-tensorrt-llm-linux-ada |
| llama3 | 8b-tensorrt-llm-windows-ada | 8B | cortex run llama3:8b-tensorrt-llm-windows-ada |
| mistral | 7b-tensorrt-llm-linux-ampere | 7B | cortex run mistral:7b-tensorrt-llm-linux-ampere |
| mistral | 7b-tensorrt-llm-windows-ampere | 7B | cortex run mistral:7b-tensorrt-llm-windows-ampere |
| mistral | 7b-tensorrt-llm-linux-ada | 7B | cortex run mistral:7b-tensorrt-llm-linux-ada |
| mistral | 7b-tensorrt-llm-windows-ada | 7B | cortex run mistral:7b-tensorrt-llm-windows-ada |
| openhermes-2.5 | 7b-tensorrt-llm-windows-ampere | 7B | cortex run openhermes-2.5:7b-tensorrt-llm-windows-ampere |
| openhermes-2.5 | 7b-tensorrt-llm-windows-ada | 7B | cortex run openhermes-2.5:7b-tensorrt-llm-windows-ada |
| openhermes-2.5 | 7b-tensorrt-llm-linux-ada | 7B | cortex run openhermes-2.5:7b-tensorrt-llm-linux-ada |

Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 14B models, and 32 GB to run the 32B models.

Cortex CLI Commands

Note: For a more detailed CLI Reference documentation, please see here.

Start Cortex Server

cortex 

Chat with a Model

cortex chat [options] [model_id] [message]

Embeddings

cortex embeddings [options] [model_id] [message]

Pull a Model

cortex pull <model_id>

This command can also pull models from Hugging Face.

Download and Start a Model

cortex run [options] [model_id]:[engine]

Get Model Details

cortex models get <model_id>

List Models

cortex models list [options]

Remove a Model

cortex models remove <model_id>

Start a Model

cortex models start [model_id]

Stop a Model

cortex models stop <model_id>

Update a Model Config

cortex models update [options] <model_id>

Get Engine Details

cortex engines get <engine_name>

Install an Engine

cortex engines install <engine_name> [options]

List Engines

cortex engines list [options]

Set an Engine Config

cortex engines set <engine_name> <config> <value>

Show Model Information

cortex ps

REST API

Cortex has a REST API that runs at localhost:1337.

Pull a Model

curl --request POST \
  --url http://localhost:1337/v1/models/{model_id}/pull

Start a Model

curl --request POST \
  --url http://localhost:1337/v1/models/{model_id}/start \
  --header 'Content-Type: application/json' \
  --data '{
  "prompt_template": "system\n{system_message}\nuser\n{prompt}\nassistant",
  "stop": [],
  "ngl": 4096,
  "ctx_len": 4096,
  "cpu_threads": 10,
  "n_batch": 2048,
  "caching_enabled": true,
  "grp_attn_n": 1,
  "grp_attn_w": 512,
  "mlock": false,
  "flash_attn": true,
  "cache_type": "f16",
  "use_mmap": true,
  "engine": "cortex.llamacpp"
}'
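
The start options above are forwarded to the selected engine. Below is an annotated sketch of the same payload in TypeScript; the comments follow common llama.cpp server conventions and are best-effort assumptions, not official Cortex documentation.

// Annotated copy of the start payload above; comments are assumptions based on
// common llama.cpp server options, not official Cortex documentation.
const startModelBody = {
  prompt_template: "system\n{system_message}\nuser\n{prompt}\nassistant", // chat template applied to requests
  stop: [],                  // additional stop sequences
  ngl: 4096,                 // number of model layers to offload to the GPU
  ctx_len: 4096,             // context window size in tokens
  cpu_threads: 10,           // CPU threads used for inference
  n_batch: 2048,             // prompt-processing batch size
  caching_enabled: true,     // reuse the prompt/KV cache between requests
  grp_attn_n: 1,             // group-attention factor (self-extend)
  grp_attn_w: 512,           // group-attention width (self-extend)
  mlock: false,              // lock the model weights in RAM
  flash_attn: true,          // use flash attention when available
  cache_type: "f16",         // KV cache precision
  use_mmap: true,            // memory-map the model file instead of loading it fully
  engine: "cortex.llamacpp", // Cortex engine that should serve the model
};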

Chat with a Model

curl http://localhost:1337/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "mistral",
  "messages": [
    {
      "role": "user",
      "content": "Hello"
    }
  ],
  "stream": true,
  "max_tokens": 128,
  "stop": [],
  "frequency_penalty": 1,
  "presence_penalty": 1,
  "temperature": 1,
  "top_p": 1
}'
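
Because this endpoint follows the OpenAI Chat Completions format, existing OpenAI client libraries can usually be pointed at the local server instead. The TypeScript sketch below uses the official openai npm package; the base URL and model name mirror the curl example above, and the placeholder API key assumes the local server does not validate keys.

import OpenAI from "openai";

// Point the standard OpenAI client at the local Cortex server.
// The API key is a placeholder; a local server is assumed not to check it.
const client = new OpenAI({
  baseURL: "http://localhost:1337/v1",
  apiKey: "not-needed",
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "mistral",
    messages: [{ role: "user", content: "Hello" }],
  });
  console.log(completion.choices[0].message.content);
}

main();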

Stop a Model

curl --request POST \
  --url http://localhost:1337/v1/models/mistral/stop
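
Putting the endpoints above together, a model can also be managed programmatically. The TypeScript sketch below (Node 18+ for the built-in fetch) uses only the routes shown in this section; it assumes each call finishes before the next one is issued (pulling a large model can take a while) and that starting a model with just the engine field is sufficient, so treat it as an outline rather than a complete client.

// Lifecycle sketch: pull a model, start it, send one chat request, stop it.
// Only the endpoints documented above are used; error handling is omitted.
const BASE = "http://localhost:1337/v1";

async function runOnce(modelId: string) {
  // Pull (download) the model.
  await fetch(`${BASE}/models/${modelId}/pull`, { method: "POST" });

  // Start the model; a fuller payload is shown in the "Start a Model" example.
  await fetch(`${BASE}/models/${modelId}/start`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ engine: "cortex.llamacpp" }),
  });

  // Send a single, non-streaming chat request.
  const res = await fetch(`${BASE}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: modelId,
      messages: [{ role: "user", content: "Hello" }],
      stream: false,
    }),
  });
  console.log(await res.json());

  // Stop the model when done.
  await fetch(`${BASE}/models/${modelId}/stop`, { method: "POST" });
}

runOnce("mistral");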

Note: Check our API documentation for a full list of available endpoints.

Contact Support
