This project provides a user-friendly interface for running LLMs with llama-cpp-python, with API key authentication and usage tracking. It offers:
- Easy Model Setup: Run models with your choice of backend (CPU, CUDA, Metal, OpenBLAS)
- API Key Authentication: Secure your API with user-specific API keys
- Usage Tracking: Track API usage statistics per user
- System Monitoring: Monitor CPU, memory, and GPU usage
- User Management: Add and delete API keys through the UI
The system consists of the following components:
- Electron App: The main user interface for managing models, users, and viewing system information
- Setup Script: Installs llama-cpp-python with the appropriate backend and downloads models
- llama-cpp-python Server: Runs the model and provides an OpenAI-compatible API
- FastAPI Middleware: Handles API key authentication and usage tracking
When you click "Run Model" in the UI, the system (see the sketch after this list):
- Runs the setup.sh script to install llama-cpp-python with the selected backend
- Downloads the model if a Hugging Face model ID was provided
- Starts the llama-cpp-python server on port 8000
- Starts the FastAPI middleware on port 8080
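For reference, here is a rough Python sketch of that sequence run outside the UI. The setup.sh flags, model path, and middleware module name are illustrative assumptions, not the project's exact invocation:

```python
import subprocess

# Install llama-cpp-python for the chosen backend and fetch the model.
# (setup.sh's flags are assumed here for illustration.)
subprocess.run(
    ["bash", "setup.sh", "--backend", "cuda",
     "--model", "TheBloke/Llama-2-7B-GGUF"],
    check=True,
)

# Start the llama-cpp-python OpenAI-compatible server on port 8000.
llama_server = subprocess.Popen(
    ["python", "-m", "llama_cpp.server",
     "--model", "models/llama-2-7b.gguf", "--port", "8000"]
)

# Start the FastAPI middleware on port 8080 ("middleware:app" is a
# hypothetical module path).
middleware = subprocess.Popen(["uvicorn", "middleware:app", "--port", "8080"])
```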
The FastAPI middleware (sketched below):
- Authenticates API requests using API keys
- Tracks usage statistics per user
- Forwards requests to the llama-cpp-python server
- Returns responses to the client
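A minimal sketch of how such a middleware can be structured. The key store, field names, and in-memory storage here are assumptions for illustration, not the project's actual implementation:

```python
import time

import httpx
from fastapi import FastAPI, Header, HTTPException, Request
from fastapi.responses import Response

LLAMA_SERVER = "http://localhost:8000"

app = FastAPI()

# In-memory stores for illustration; the real app persists keys and stats.
API_KEYS = {"example-key": "alice"}   # api key -> username
USAGE = {}                            # username -> statistics

@app.api_route("/v1/{path:path}", methods=["GET", "POST"])
async def proxy(path: str, request: Request, api_key: str = Header(None)):
    # 1. Authenticate: FastAPI reads the "api-key" header into api_key.
    user = API_KEYS.get(api_key)
    if user is None:
        raise HTTPException(status_code=401, detail="Invalid API key")

    # 2. Forward the request to the llama-cpp-python server.
    async with httpx.AsyncClient(timeout=None) as client:
        upstream = await client.request(
            request.method,
            f"{LLAMA_SERVER}/v1/{path}",
            content=await request.body(),
            headers={"content-type": request.headers.get("content-type", "application/json")},
        )

    # 3. Track usage (token counts could be parsed from the upstream
    #    JSON "usage" field; omitted here for brevity).
    stats = USAGE.setdefault(user, {"requests": 0, "endpoints": {}})
    stats["requests"] += 1
    stats["endpoints"][path] = stats["endpoints"].get(path, 0) + 1
    stats["last_request"] = time.time()

    # 4. Return the upstream response to the client.
    return Response(content=upstream.content,
                    media_type=upstream.headers.get("content-type"))
```

Note that this sketch buffers the full upstream response, so streaming responses are not handled; it shows the authenticate-forward-track-return flow, not production behavior.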
To use the API (a complete example follows this list):
- Send requests to http://localhost:8080/v1/... (same endpoints as OpenAI API)
- Include your API key in the "api-key" header
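For example, a chat completion request from Python (replace your-api-key with a key created through the UI):

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"api-key": "your-api-key"},  # key issued through the UI
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```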
The middleware provides the following endpoints:
- All OpenAI-compatible endpoints from llama-cpp-python (forwarded)
- /admin/usage: Get usage statistics for all users (example below)
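For example (whether /admin/usage itself requires authentication depends on your configuration; add an api-key header if it does):

```python
import requests

stats = requests.get("http://localhost:8080/admin/usage").json()
print(stats)
```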
The system tracks the following usage statistics per user (an illustrative record follows the list):
- Total number of requests
- Total number of tokens used
- Last request timestamp
- Endpoint usage counts
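Conceptually, each user's record looks something like the following. The field names here are illustrative, not a guaranteed schema:

```python
usage = {
    "alice": {
        "total_requests": 42,                     # total number of requests
        "total_tokens": 18731,                    # total number of tokens used
        "last_request": "2024-05-01T12:34:56Z",   # last request timestamp
        "endpoints": {                            # endpoint usage counts
            "/v1/chat/completions": 40,
            "/v1/embeddings": 2,
        },
    },
}
```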
Running the system requires:
- Python 3.8+
- Node.js 14+
- Electron
- For CUDA backend: NVIDIA GPU with CUDA toolkit
- For Metal backend: Apple Silicon or AMD GPU on macOS
This project is licensed under the MIT License.