Add support for GGUF models by gaby · Pull Request #866 · serge-chat/serge · GitHub

Merged · 14 commits · Nov 17, 2023
2 changes: 1 addition & 1 deletion .github/workflows/model-check.yml
```diff
@@ -45,4 +45,4 @@ jobs:
       - name: Run model health check
         working-directory: ./api
         run: |
-          poetry run python -m pytest test/healthcheck_models.py
+          poetry run python -m pytest -v --color=yes test/healthcheck_models.py
```
59 changes: 6 additions & 53 deletions README.md
```diff
@@ -3,7 +3,7 @@
 ![License](https://img.shields.io/github/license/serge-chat/serge)
 [![Discord](https://img.shields.io/discord/1088427963801948201?label=Discord)](https://discord.gg/62Hc6FEYQH)
 
-Serge is a chat interface crafted with [llama.cpp](https://github.com/ggerganov/llama.cpp) for running Alpaca models. No API keys, entirely self-hosted!
+Serge is a chat interface crafted with [llama.cpp](https://github.com/ggerganov/llama.cpp) for running GGUF models. No API keys, entirely self-hosted!
 
 - 🌐 **SvelteKit** frontend
 - 💾 **[Redis](https://github.com/redis/redis)** for storing chat history & parameters
```
```diff
@@ -57,18 +57,10 @@ Instructions for setting up Serge on Kubernetes can be found in the [wiki](https
 
 | Category | Models |
 |:-------------:|:-------|
-| **Alpaca 🦙** | Alpaca-LoRA-65B, GPT4-Alpaca-LoRA-30B |
-| **Chronos 🌑**| Chronos-13B, Chronos-33B, Chronos-Hermes-13B |
-| **GPT4All 🌍**| GPT4All-13B |
-| **Koala 🐨** | Koala-7B, Koala-13B |
-| **LLaMA 🦙** | FinLLaMA-33B, LLaMA-Supercot-30B, LLaMA2 7B, LLaMA2 13B, LLaMA2 70B |
-| **Lazarus 💀**| Lazarus-30B |
-| **Nous 🧠** | Nous-Hermes-13B |
-| **OpenAssistant 🎙️** | OpenAssistant-30B |
-| **Orca 🐬** | Orca-Mini-v2-7B, Orca-Mini-v2-13B, OpenOrca-Preview1-13B |
-| **Samantha 👩**| Samantha-7B, Samantha-13B, Samantha-33B |
-| **Vicuna 🦙** | Stable-Vicuna-13B, Vicuna-CoT-7B, Vicuna-CoT-13B, Vicuna-v1.1-7B, Vicuna-v1.1-13B, VicUnlocked-30B, VicUnlocked-65B |
-| **Wizard 🧙** | Wizard-Mega-13B, WizardLM-Uncensored-7B, WizardLM-Uncensored-13B, WizardLM-Uncensored-30B, WizardCoder-Python-13B-V1.0 |
+| **CodeLLaMA** | 7B, 13B |
+| **LLaMA** | 7B, 13B, 70B |
+| **Mistral** | 7B-Instruct, 7B-OpenOrca |
+| **Zephyr** | 7B-Alpha, 7B-Beta |
 
 Additional weights can be added to the `serge_weights` volume using `docker cp`:
 
```
```diff
@@ -80,45 +72,6 @@ docker cp ./my_weight.bin serge:/usr/src/app/weights/
 
 LLaMA will crash if you don't have enough available memory for the model:
 
-| Model | Max RAM Required |
-|-------------|------------------|
-| 7B | 4.5GB |
-| 7B-q2_K | 5.37GB |
-| 7B-q3_K_L | 6.10GB |
-| 7B-q4_1 | 6.71GB |
-| 7B-q4_K_M | 6.58GB |
-| 7B-q5_1 | 7.56GB |
-| 7B-q5_K_M | 7.28GB |
-| 7B-q6_K | 8.03GB |
-| 7B-q8_0 | 9.66GB |
-| 13B | 12GB |
-| 13B-q2_K | 8.01GB |
-| 13B-q3_K_L | 9.43GB |
-| 13B-q4_1 | 10.64GB |
-| 13B-q4_K_M | 10.37GB |
-| 13B-q5_1 | 12.26GB |
-| 13B-q5_K_M | 11.73GB |
-| 13B-q6_K | 13.18GB |
-| 13B-q8_0 | 16.33GB |
-| 33B | 20GB |
-| 33B-q2_K | 16.21GB |
-| 33B-q3_K_L | 19.78GB |
-| 33B-q4_1 | 22.83GB |
-| 33B-q4_K_M | 22.12GB |
-| 33B-q5_1 | 26.90GB |
-| 33B-q5_K_M | 25.55GB |
-| 33B-q6_K | 29.19GB |
-| 33B-q8_0 | 37.06GB |
-| 65B | 50GB |
-| 65B-q2_K | 29.95GB |
-| 65B-q3_K_L | 37.15GB |
-| 65B-q4_1 | 43.31GB |
-| 65B-q4_K_M | 41.85GB |
-| 65B-q5_1 | 51.47GB |
-| 65B-q5_K_M | 48.74GB |
-| 65B-q6_K | 56.06GB |
-| 65B-q8_0 | 71.87GB |
-
 ## 💬 Support
 
 Need help? Join our [Discord](https://discord.gg/62Hc6FEYQH)
```
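With the per-quantization RAM table dropped from the README, a rough rule of thumb is that a GGUF model's resident memory tracks its file size. The helper below is a hedged sketch, not part of this PR: the function name and the ~10% overhead margin (for KV cache and scratch buffers) are illustrative assumptions, and real usage varies with context length and the llama.cpp build.

```shell
# Hypothetical helper, not from this PR: estimate the RAM (in MiB) needed to
# load a GGUF file, assuming file size plus a ~10% margin for KV cache and
# scratch buffers. Actual usage depends on context length and build options.
estimate_ram_mb() {
  local file_bytes=$1
  # Integer MiB of file data, scaled by a 10% safety margin
  echo $(( file_bytes / 1048576 * 110 / 100 ))
}

# Example: a ~4.1 GiB q4_K_M 7B file
estimate_ram_mb 4368439296
```

Compare the result against free memory (e.g. `free -m`) before starting a chat, since llama.cpp will fail or crash if the model cannot be mapped.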
````diff
@@ -139,4 +92,4 @@ To run Serge in development mode:
 ```bash
 git clone https://github.com/serge-chat/serge.git
 docker compose -f docker-compose.dev.yml up -d --build
-```
+```
````
3 changes: 2 additions & 1 deletion api/.dockerignore
```diff
@@ -1 +1,2 @@
-./weights/*.bin**
+./weights/*.bin**
+./weights/*.gguf**
```
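The updated ignore patterns mean the weights volume now holds `.gguf` files alongside the older `.bin` format. The same extension check can guard a local copy script; this is a minimal sketch, and the helper name is hypothetical (not part of Serge's codebase):

```shell
# Hypothetical guard mirroring the .dockerignore patterns: accept only the
# weight-file extensions Serge expects before copying into the container.
is_weight_file() {
  case "$1" in
    *.gguf|*.bin) return 0 ;;
    *) return 1 ;;
  esac
}

is_weight_file "mistral-7b-instruct.Q4_K_M.gguf" && echo "ok to copy"
is_weight_file "notes.txt" || echo "skipping non-weight file"
```

A guard like this keeps stray files out of `docker cp ./my_weight.bin serge:/usr/src/app/weights/`-style copies.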