nicknochnack/FakeServer


LLaMA

Run sick LLM apps hyper fast on your local machine for funzies.

See it live and in action 📺

Startup 🚀

  1. Clone llama.cpp: git clone https://github.com/ggerganov/llama.cpp
  2. Run the make commands:
  • Mac: cd llama.cpp && make
  • Windows (from here):
    1. Download the latest fortran version of w64devkit.
    2. Extract w64devkit on your pc.
    3. Run w64devkit.exe.
    4. Use the cd command to reach the llama.cpp folder.
    5. From here you can run:
      make
  3. pip install openai 'llama-cpp-python[server]' pydantic instructor streamlit
  4. Start the server:
  • Single Model Chat
    python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf
  • Single Model Chat with GPU Offload
    python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf --n_gpu_layers -1
  • Single Model Function Calling with GPU Offload
    python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf --n_gpu_layers -1 --chat_format functionary
  • Multiple Model Load with Config (a sample config.json sketch follows this list)
    python -m llama_cpp.server --config_file config.json
  • Multi Modal Models
    python -m llama_cpp.server --model models/llava-v1.5-7b-Q4_K.gguf --clip_model_path models/llava-v1.5-7b-mmproj-Q4_0.gguf --n_gpu_layers -1 --chat_format llava-1-5
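
For the multi-model option, the server reads its settings from a JSON config file. Below is a minimal sketch of what config.json might look like, using the model files from the commands above; the aliases are placeholders and the exact keys should be checked against your llama-cpp-python version.

    {
      "host": "0.0.0.0",
      "port": 8000,
      "models": [
        {
          "model": "models/mistral-7b-instruct-v0.1.Q4_0.gguf",
          "model_alias": "mistral-instruct",
          "n_gpu_layers": -1
        },
        {
          "model": "models/llava-v1.5-7b-Q4_K.gguf",
          "model_alias": "llava",
          "chat_format": "llava-1-5",
          "clip_model_path": "models/llava-v1.5-7b-mmproj-Q4_0.gguf",
          "n_gpu_layers": -1
        }
      ]
    }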
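
Once running, the server exposes an OpenAI-compatible API (by default at http://localhost:8000/v1), so the openai package installed above can talk to it directly. A minimal sketch, assuming the default host/port; the model name is a placeholder and the API key can be any string since the local server doesn't validate it.

    from openai import OpenAI

    # Point the client at the local llama_cpp.server instance instead of api.openai.com
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    response = client.chat.completions.create(
        model="mistral-7b-instruct-v0.1",  # placeholder; use the name/alias your server reports
        messages=[{"role": "user", "content": "Write a haiku about local LLMs."}],
    )
    print(response.choices[0].message.content)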
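
The pydantic and instructor packages in the install step are typically used for the function-calling / structured-output workflow. A hedged sketch (not code from this repo), assuming the server was started with --chat_format functionary and the same default base URL; the schema and model name are illustrative.

    import instructor
    from openai import OpenAI
    from pydantic import BaseModel

    # Schema the model's reply will be coerced into (illustrative example)
    class UserInfo(BaseModel):
        name: str
        age: int

    # instructor patches the OpenAI client so completions can return pydantic objects
    client = instructor.patch(OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed"))

    user = client.chat.completions.create(
        model="mistral-7b-instruct-v0.1",  # placeholder; whatever model the server is serving
        response_model=UserInfo,
        messages=[{"role": "user", "content": "John Doe is 31 years old."}],
    )
    print(user.name, user.age)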

Models Used 🤖

Who, When, Why?

👨🏾‍💻 Author: Nick Renotte
📅 Version: 1.x
📜 License: This project is licensed under the MIT License

About

An end to end walkthrough of LLaMA CPP's server.
