nicknochnack/FakeServer


LLaMA

Run sick LLM apps hyper fast on your local machine for funzies.

See it live and in action 📺

Startup 🚀

  1. Clone llama.cpp: git clone https://github.com/ggerganov/llama.cpp
  2. Run the make commands:
  • Mac: cd llama.cpp && make
  • Windows (from here):
    1. Download the latest fortran version of w64devkit.
    2. Extract w64devkit on your pc.
    3. Run w64devkit.exe.
    4. Use the cd command to reach the llama.cpp folder.
    5. From here you can run:
      make
  3. pip install openai 'llama-cpp-python[server]' pydantic instructor streamlit
  4. Start the server:
  • Single Model Chat
    python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf
  • Single Model Chat with GPU Offload
    python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf --n_gpu_layers -1
  • Single Model Function Calling with GPU Offload
    python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf --n_gpu_layers -1 --chat_format functionary
  • Multiple Model Load with Config (a sample config.json sketch follows this list)
    python -m llama_cpp.server --config_file config.json
  • Multi Modal Models
    python -m llama_cpp.server --model models/llava-v1.5-7b-Q4_K.gguf --clip_model_path models/llava-v1.5-7b-mmproj-Q4_0.gguf --n_gpu_layers -1 --chat_format llava-1-5
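
For the multi-model option, the server reads its settings from a JSON config file. Below is a minimal sketch of what config.json might look like, using the model files from the commands above; the aliases are placeholders and the exact keys should be checked against your llama-cpp-python version.

    {
      "host": "0.0.0.0",
      "port": 8000,
      "models": [
        {
          "model": "models/mistral-7b-instruct-v0.1.Q4_0.gguf",
          "model_alias": "mistral-instruct",
          "n_gpu_layers": -1
        },
        {
          "model": "models/llava-v1.5-7b-Q4_K.gguf",
          "model_alias": "llava",
          "chat_format": "llava-1-5",
          "clip_model_path": "models/llava-v1.5-7b-mmproj-Q4_0.gguf",
          "n_gpu_layers": -1
        }
      ]
    }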
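
Once running, the server exposes an OpenAI-compatible API (by default at http://localhost:8000/v1), so the openai package installed above can talk to it directly. A minimal sketch, assuming the default host/port; the model name is a placeholder and the API key can be any string since the local server doesn't validate it.

    from openai import OpenAI

    # Point the client at the local llama_cpp.server instance instead of api.openai.com
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    response = client.chat.completions.create(
        model="mistral-7b-instruct-v0.1",  # placeholder; use the name/alias your server reports
        messages=[{"role": "user", "content": "Write a haiku about local LLMs."}],
    )
    print(response.choices[0].message.content)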
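
The pydantic and instructor packages in the install step are typically used for the function-calling / structured-output workflow. A hedged sketch (not code from this repo), assuming the server was started with --chat_format functionary and the same default base URL; the schema and model name are illustrative.

    import instructor
    from openai import OpenAI
    from pydantic import BaseModel

    # Schema the model's reply will be coerced into (illustrative example)
    class UserInfo(BaseModel):
        name: str
        age: int

    # instructor patches the OpenAI client so completions can return pydantic objects
    client = instructor.patch(OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed"))

    user = client.chat.completions.create(
        model="mistral-7b-instruct-v0.1",  # placeholder; whatever model the server is serving
        response_model=UserInfo,
        messages=[{"role": "user", "content": "John Doe is 31 years old."}],
    )
    print(user.name, user.age)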

Models Used 🤖

Who, When, Why?

👨🏾‍💻 Author: Nick Renotte
📅 Version: 1.x
📜 License: This project is licensed under the MIT License

About

An end to end walkthrough of LLaMA CPP's server.
