Add support for GGUF models by gaby · Pull Request #866 · serge-chat/serge · GitHub

Merged · 14 commits · Nov 17, 2023
2 changes: 1 addition & 1 deletion .github/workflows/model-check.yml
```diff
@@ -45,4 +45,4 @@ jobs:
       - name: Run model health check
         working-directory: ./api
         run: |
-          poetry run python -m pytest test/healthcheck_models.py
+          poetry run python -m pytest -v --color=yes test/healthcheck_models.py
```
59 changes: 6 additions & 53 deletions README.md
```diff
@@ -3,7 +3,7 @@
 ![License](https://img.shields.io/github/license/serge-chat/serge)
 [![Discord](https://img.shields.io/discord/1088427963801948201?label=Discord)](https://discord.gg/62Hc6FEYQH)
 
-Serge is a chat interface crafted with [llama.cpp](https://github.com/ggerganov/llama.cpp) for running Alpaca models. No API keys, entirely self-hosted!
+Serge is a chat interface crafted with [llama.cpp](https://github.com/ggerganov/llama.cpp) for running GGUF models. No API keys, entirely self-hosted!
 
 - 🌐 **SvelteKit** frontend
 - 💾 **[Redis](https://github.com/redis/redis)** for storing chat history & parameters
```
```diff
@@ -57,18 +57,10 @@ Instructions for setting up Serge on Kubernetes can be found in the [wiki](https
 
 | Category | Models |
 |:-------------:|:-------|
-| **Alpaca 🦙** | Alpaca-LoRA-65B, GPT4-Alpaca-LoRA-30B |
-| **Chronos 🌑**| Chronos-13B, Chronos-33B, Chronos-Hermes-13B |
-| **GPT4All 🌍**| GPT4All-13B |
-| **Koala 🐨** | Koala-7B, Koala-13B |
-| **LLaMA 🦙** | FinLLaMA-33B, LLaMA-Supercot-30B, LLaMA2 7B, LLaMA2 13B, LLaMA2 70B |
-| **Lazarus 💀**| Lazarus-30B |
-| **Nous 🧠** | Nous-Hermes-13B |
-| **OpenAssistant 🎙️** | OpenAssistant-30B |
-| **Orca 🐬** | Orca-Mini-v2-7B, Orca-Mini-v2-13B, OpenOrca-Preview1-13B |
-| **Samantha 👩**| Samantha-7B, Samantha-13B, Samantha-33B |
-| **Vicuna 🦙** | Stable-Vicuna-13B, Vicuna-CoT-7B, Vicuna-CoT-13B, Vicuna-v1.1-7B, Vicuna-v1.1-13B, VicUnlocked-30B, VicUnlocked-65B |
-| **Wizard 🧙** | Wizard-Mega-13B, WizardLM-Uncensored-7B, WizardLM-Uncensored-13B, WizardLM-Uncensored-30B, WizardCoder-Python-13B-V1.0 |
+| **CodeLLaMA** | 7B, 13B |
+| **LLaMA** | 7B, 13B, 70B |
+| **Mistral** | 7B-Instruct, 7B-OpenOrca |
+| **Zephyr** | 7B-Alpha, 7B-Beta |
 
 Additional weights can be added to the `serge_weights` volume using `docker cp`:
 
```
```diff
@@ -80,45 +72,6 @@ docker cp ./my_weight.bin serge:/usr/src/app/weights/
 
 LLaMA will crash if you don't have enough available memory for the model:
 
-| Model | Max RAM Required |
-|-------------|------------------|
-| 7B | 4.5GB |
-| 7B-q2_K | 5.37GB |
-| 7B-q3_K_L | 6.10GB |
-| 7B-q4_1 | 6.71GB |
-| 7B-q4_K_M | 6.58GB |
-| 7B-q5_1 | 7.56GB |
-| 7B-q5_K_M | 7.28GB |
-| 7B-q6_K | 8.03GB |
-| 7B-q8_0 | 9.66GB |
-| 13B | 12GB |
-| 13B-q2_K | 8.01GB |
-| 13B-q3_K_L | 9.43GB |
-| 13B-q4_1 | 10.64GB |
-| 13B-q4_K_M | 10.37GB |
-| 13B-q5_1 | 12.26GB |
-| 13B-q5_K_M | 11.73GB |
-| 13B-q6_K | 13.18GB |
-| 13B-q8_0 | 16.33GB |
-| 33B | 20GB |
-| 33B-q2_K | 16.21GB |
-| 33B-q3_K_L | 19.78GB |
-| 33B-q4_1 | 22.83GB |
-| 33B-q4_K_M | 22.12GB |
-| 33B-q5_1 | 26.90GB |
-| 33B-q5_K_M | 25.55GB |
-| 33B-q6_K | 29.19GB |
-| 33B-q8_0 | 37.06GB |
-| 65B | 50GB |
-| 65B-q2_K | 29.95GB |
-| 65B-q3_K_L | 37.15GB |
-| 65B-q4_1 | 43.31GB |
-| 65B-q4_K_M | 41.85GB |
-| 65B-q5_1 | 51.47GB |
-| 65B-q5_K_M | 48.74GB |
-| 65B-q6_K | 56.06GB |
-| 65B-q8_0 | 71.87GB |
-
 ## 💬 Support
 
 Need help? Join our [Discord](https://discord.gg/62Hc6FEYQH)
```
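With the per-quantization RAM table dropped from the README, a rough rule of thumb is that a GGUF model's resident memory tracks its file size. The helper below is a hedged sketch, not part of this PR: the function name and the ~10% overhead margin (for KV cache and scratch buffers) are illustrative assumptions, and real usage varies with context length and the llama.cpp build.

```shell
# Hypothetical helper, not from this PR: estimate the RAM (in MiB) needed to
# load a GGUF file, assuming file size plus a ~10% margin for KV cache and
# scratch buffers. Actual usage depends on context length and build options.
estimate_ram_mb() {
  local file_bytes=$1
  # Integer MiB of file data, scaled by a 10% safety margin
  echo $(( file_bytes / 1048576 * 110 / 100 ))
}

# Example: a ~4.1 GiB q4_K_M 7B file
estimate_ram_mb 4368439296
```

Compare the result against free memory (e.g. `free -m`) before starting a chat, since llama.cpp will fail or crash if the model cannot be mapped.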
````diff
@@ -139,4 +92,4 @@ To run Serge in development mode:
 ```bash
 git clone https://github.com/serge-chat/serge.git
 docker compose -f docker-compose.dev.yml up -d --build
-```
+```
````
3 changes: 2 additions & 1 deletion api/.dockerignore
```diff
@@ -1 +1,2 @@
-./weights/*.bin**
+./weights/*.bin**
+./weights/*.gguf**
```
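The updated ignore patterns mean the weights volume now holds `.gguf` files alongside the older `.bin` format. The same extension check can guard a local copy script; this is a minimal sketch, and the helper name is hypothetical (not part of Serge's codebase):

```shell
# Hypothetical guard mirroring the .dockerignore patterns: accept only the
# weight-file extensions Serge expects before copying into the container.
is_weight_file() {
  case "$1" in
    *.gguf|*.bin) return 0 ;;
    *) return 1 ;;
  esac
}

is_weight_file "mistral-7b-instruct.Q4_K_M.gguf" && echo "ok to copy"
is_weight_file "notes.txt" || echo "skipping non-weight file"
```

A guard like this keeps stray files out of `docker cp ./my_weight.bin serge:/usr/src/app/weights/`-style copies.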